Regex For All Printable Characters

Basic ASCII Printable Characters

To match all printable ASCII characters, you can use the following regex:

[ -~]

Explanation:

  • [ -~] matches all characters in the range from space (ASCII DEC 32) to tilde (ASCII DEC 126).
  • This range includes all printable ASCII characters: letters, numbers, punctuation, and symbols.
  • The space character is at the start of the printable range, and the tilde ~ is at the end.
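
As a quick check of the range described above, the following Python sketch tests the class against every ASCII code point. The name PRINTABLE_ASCII is chosen here for illustration only.

```python
import re

# Illustrative name, not from any library.
PRINTABLE_ASCII = re.compile(r'[ -~]')

# Collect every ASCII code point (0-127) that the class matches.
matched = [cp for cp in range(128) if PRINTABLE_ASCII.fullmatch(chr(cp))]

print(matched[0], matched[-1], len(matched))  # 32 126 95
```

The class covers 95 characters, from space (32) through tilde (126), and nothing outside that range.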

JavaScript

const regex = /[ -~]/g;
const text = "Hello\nWorld\t123!";
const printable = text.match(regex).join("");

// Result: "HelloWorld123!"

Python

import re
text = "Hello\nWorld\t123!"
printable = ''.join(re.findall(r'[ -~]', text))

# Result: "HelloWorld123!"

Ruby

text = "Hello\nWorld\t123!"
printable = text.scan(/[ -~]/).join

# Result: "HelloWorld123!"

PHP

$text = "Hello\nWorld\t123!";
preg_match_all('/[ -~]/', $text, $matches);
$printable = implode('', $matches[0]);

// Result: "HelloWorld123!"

Unicode: Excluding Control Characters

If your regex flavor supports Unicode properties, the simplest way to match printable characters beyond ASCII is to match anything that is not a control character:

\P{Cc}

Explanation:

  • \P{Cc} matches any character that's not a control character.
  • This works for both ASCII control characters [\x00-\x1F\x7F] and Latin1 control characters [\x80-\x9F] (also known as the C1 control characters).
  • The uppercase \P means "not in this category", and Cc is the Unicode category for control characters.
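
To see which code points the Cc category covers, the sketch below enumerates them with Python's standard unicodedata module. It also builds an explicit-range class, NOT_CONTROL (a name made up for this example), as a stand-in for \P{Cc} in engines without Unicode property support, such as Python's built-in re module.

```python
import re
import sys
import unicodedata

# Every code point whose general category is Cc (control).
control = [cp for cp in range(sys.maxunicode + 1)
           if unicodedata.category(chr(cp)) == 'Cc']

# Cc is exactly U+0000-U+001F plus U+007F-U+009F: 65 code points.
assert control == list(range(0x00, 0x20)) + list(range(0x7F, 0xA0))

# Hypothetical stand-in for \P{Cc}, built from explicit ranges.
NOT_CONTROL = re.compile(r'[^\x00-\x1f\x7f-\x9f]')

text = "Hello\nWorld\t✓"
printable = ''.join(NOT_CONTROL.findall(text))
print(printable)  # HelloWorld✓
```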

JavaScript (ES2018+)

const regex = /\P{Cc}/gu;
const text = "Hello\nWorld\t✓";
const printable = text.match(regex).join("");

// Result: "HelloWorld✓"

Python

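# The standard re module does not support \p{...} or \P{...};
# this uses the third-party regex package (pip install regex).
import regex
text = "Hello\nWorld\t✓"
printable = ''.join(regex.findall(r'\P{Cc}', text))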

# Result: "HelloWorld✓"

Ruby

text = "Hello\nWorld\t✓"
printable = text.scan(/\P{Cc}/).join

# Result: "HelloWorld✓"

PHP

$text = "Hello\nWorld\t✓";
preg_match_all('/\P{Cc}/u', $text, $matches);
$printable = implode('', $matches[0]);

// Result: "HelloWorld✓"

POSIX Classes & Platform Differences

The problem with POSIX classes like [:print:] or \p{Print} is that they can match different things depending on the regex flavor and, possibly, the locale settings of the underlying platform.

Java Behavior

In Java, POSIX classes are ASCII-only by default (that is, unless the pattern is compiled with the UNICODE_CHARACTER_CLASS flag):

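import java.util.regex.Pattern;

String text = "Hello\nWorld\t✓";

// \p{Print} matches only ASCII printing characters [\x20-\x7E]
Pattern pattern1 = Pattern.compile("\\p{Print}");
// Deleting every match leaves "\n\t✓": the check mark was not matched
String rest1 = pattern1.matcher(text).replaceAll("");

// \P{Cntrl} matches everything that's not an ASCII control character [^\x00-\x1F\x7F]
// This includes non-ASCII characters AND C1 control characters
Pattern pattern2 = Pattern.compile("\\P{Cntrl}");
// Deleting every match leaves "\n\t": the check mark was matched
String rest2 = pattern2.matcher(text).replaceAll("");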

Explanation:

  • \p{Print} matches only ASCII printing characters [\x20-\x7E].
  • \P{Cntrl} (note the capital P) matches any character that is not an ASCII control character. That includes every non-ASCII character, the C1 control characters among them.
  • This means \P{Cntrl} in Java is not equivalent to \P{Cc} in Unicode-aware regex engines.
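
The gap between the two is easiest to see on a C1 control character such as U+0085 (NEXT LINE). The Python sketch below emulates both behaviors with explicit ranges, since Python's built-in re module supports neither \P{Cntrl} nor \P{Cc}; the pattern names are illustrative.

```python
import re

# Emulates Java's \P{Cntrl}: anything except ASCII controls.
not_ascii_control = re.compile(r'[^\x00-\x1f\x7f]')
# Emulates \P{Cc}: anything except ASCII and C1 controls.
not_any_control = re.compile(r'[^\x00-\x1f\x7f-\x9f]')

nel = "\x85"  # U+0085 NEXT LINE, a C1 control character

java_style = bool(not_ascii_control.fullmatch(nel))
unicode_style = bool(not_any_control.fullmatch(nel))
print(java_style, unicode_style)  # True False
```

The ASCII-oriented class lets U+0085 through, while the Cc-based class rejects it.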

Comparing Different Approaches

Here's a quick reference for different regex patterns and what they match:

Pattern | Matches | Notes
[ -~] | ASCII printable characters (32-126) | Simple and reliable for ASCII-only text
\P{Cc} | All characters except control characters | Best for Unicode text; excludes ASCII and C1 controls
\p{Print} | Printable characters | Behavior varies by regex flavor and locale
\P{Cntrl} | Non-control characters | Java-specific; ASCII-oriented
[^\x00-\x1F\x7F] | Non-ASCII-control characters | Explicit ASCII control exclusion

Practical Examples

Removing Non-Printable Characters From a String

// JavaScript - ASCII only
const text = "Hello\x00World\x1F!";
const cleaned = text.replace(/[^ -~]/g, "");
// Result: "HelloWorld!"

// JavaScript - Unicode
const text2 = "Hello\x00World\x1F✓";
const cleaned2 = text2.replace(/\p{Cc}/gu, "");
// Result: "HelloWorld✓"

# Python - ASCII only
import re
text = "Hello\x00World\x1F!"
cleaned = re.sub(r'[^ -~]', '', text)
# Result: "HelloWorld!"

# Python - Unicode
# The standard re module rejects \p{Cc}; this uses the third-party regex package.
import regex
text2 = "Hello\x00World\x1F✓"
cleaned2 = regex.sub(r'\p{Cc}', '', text2)
# Result: "HelloWorld✓"

# Ruby - ASCII only
text = "Hello\x00World\x1F!"
cleaned = text.gsub(/[^ -~]/, '')
# Result: "HelloWorld!"

# Ruby - Unicode
text2 = "Hello\x00World\x1F✓"
cleaned2 = text2.gsub(/\p{Cc}/, '')
# Result: "HelloWorld✓"

// PHP - ASCII only
$text = "Hello\x00World\x1F!";
$cleaned = preg_replace('/[^ -~]/', '', $text);
// Result: "HelloWorld!"

// PHP - Unicode
$text2 = "Hello\x00World\x1F✓";
$cleaned2 = preg_replace('/\p{Cc}/u', '', $text2);
// Result: "HelloWorld✓"

Validating That a String Contains Only Printable Characters

// JavaScript - ASCII only
const regex = /^[ -~]+$/;
console.log(regex.test("Hello World!")); // true
console.log(regex.test("Hello\nWorld")); // false

// JavaScript - Unicode
const regex2 = /^\P{Cc}+$/u;
console.log(regex2.test("Hello World ✓")); // true
console.log(regex2.test("Hello\nWorld")); // false
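
# Python - ASCII only
import re
ascii_pattern = re.compile(r'[ -~]+')
# fullmatch anchors both ends; '$' alone would also accept a trailing newline
print(bool(ascii_pattern.fullmatch("Hello World!"))) # True
print(bool(ascii_pattern.fullmatch("Hello\nWorld"))) # False

# Python - Unicode
# The standard re module does not support \P{Cc}; this uses the third-party regex package.
import regex
unicode_pattern = regex.compile(r'\P{Cc}+')
print(bool(unicode_pattern.fullmatch("Hello World ✓"))) # True
print(bool(unicode_pattern.fullmatch("Hello\nWorld"))) # False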
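
# Ruby - ASCII only
# In Ruby, ^ and $ match at every line boundary, so /^[ -~]+$/ would accept
# "Hello\nWorld". Use \A and \z to anchor the whole string instead.
regex = /\A[ -~]+\z/
puts regex.match?("Hello World!") # true
puts regex.match?("Hello\nWorld") # false

# Ruby - Unicode
regex2 = /\A\P{Cc}+\z/
puts regex2.match?("Hello World ✓") # true
puts regex2.match?("Hello\nWorld") # false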

// PHP - ASCII only
$regex = '/^[ -~]+$/';
var_dump(preg_match($regex, "Hello World!") === 1); // true
var_dump(preg_match($regex, "Hello\nWorld") === 1); // false

// PHP - Unicode
$regex2 = '/^\P{Cc}+$/u';
var_dump(preg_match($regex2, "Hello World ✓") === 1); // true
var_dump(preg_match($regex2, "Hello\nWorld") === 1); // false

Recommendation

  • For ASCII-only text, use [ -~] for simplicity and consistency across platforms
  • For Unicode text, use \P{Cc} if your regex engine supports Unicode properties
  • Avoid relying on [:print:] or \p{Print} unless you're certain about how they behave in your specific regex flavor and locale
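
One way to fold these recommendations into a single helper is sketched below in Python. The function name strip_nonprintable and its ascii_only parameter are invented for this example, and the Unicode branch uses explicit ranges in place of \p{Cc} because Python's built-in re module has no Unicode property support.

```python
import re

# Hypothetical helper patterns; names are illustrative only.
_NON_PRINTABLE_ASCII = re.compile(r'[^ -~]')
_CONTROL = re.compile(r'[\x00-\x1f\x7f-\x9f]')  # same set as \p{Cc}

def strip_nonprintable(text: str, ascii_only: bool = False) -> str:
    """Remove characters outside the chosen printable set."""
    pattern = _NON_PRINTABLE_ASCII if ascii_only else _CONTROL
    return pattern.sub('', text)

print(strip_nonprintable("Hello\x00World\x1F✓"))                   # HelloWorld✓
print(strip_nonprintable("Hello\x00World\x1F✓", ascii_only=True))  # HelloWorld
```

With ascii_only=True the check mark is dropped as well, since it lies outside the 32-126 range.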