Basic ASCII Printable Characters
To match all printable ASCII characters, you can use the following regex:
[ -~]
Explanation:
- `[ -~]` matches all characters in the range from space (ASCII DEC 32) to tilde (ASCII DEC 126).
- This range includes all printable ASCII characters: letters, numbers, punctuation, and symbols.
- The space character is at the start of the printable range, and the tilde `~` is at the end.
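As a quick sanity check of the range described above, the following sketch enumerates the 7-bit ASCII code points and records which ones the class accepts. The names `PRINTABLE_ASCII` and `matched` are illustrative only.

```python
import re

# The character class from the text: space (32) through tilde (126).
PRINTABLE_ASCII = re.compile(r"[ -~]")

# Collect every 7-bit ASCII code point that the class accepts.
matched = [cp for cp in range(128) if PRINTABLE_ASCII.fullmatch(chr(cp))]

print(matched[0], matched[-1], len(matched))  # 32 126 95
```

The 95 matches are exactly the code points 32 through 126, so none of the control characters (0-31 and 127) are included.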
JavaScript
const regex = /[ -~]/g;
const text = "Hello\nWorld\t123!";
const printable = text.match(regex).join("");
// Result: "HelloWorld123!"
Python
import re
text = "Hello\nWorld\t123!"
printable = ''.join(re.findall(r'[ -~]', text))
# Result: "HelloWorld123!"
Ruby
text = "Hello\nWorld\t123!"
printable = text.scan(/[ -~]/).join
# Result: "HelloWorld123!"
PHP
$text = "Hello\nWorld\t123!";
preg_match_all('/[ -~]/', $text, $matches);
$printable = implode('', $matches[0]);
// Result: "HelloWorld123!"
Unicode: Excluding Control Characters
If your regex flavor supports Unicode properties, the best way to match all printable characters (including Unicode characters) is:
\P{Cc}
Explanation:
- `\P{Cc}` matches any character that's not a control character.
- This works for both ASCII control characters `[\x00-\x1F\x7F]` and Latin1 control characters `[\x80-\x9F]` (also known as the C1 control characters).
- The uppercase `\P` means "not in this category", and `Cc` is the Unicode category for control characters.
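To see which code points the `Cc` category actually contains, here is a small sketch that uses Python's standard `unicodedata` module rather than a regex engine. The helper `to_ranges` is a hypothetical name introduced for this illustration.

```python
import sys
import unicodedata

# Enumerate every code point whose Unicode general category is Cc (control).
control_points = [
    cp for cp in range(sys.maxunicode + 1)
    if unicodedata.category(chr(cp)) == "Cc"
]

def to_ranges(points):
    """Collapse a sorted list of integers into inclusive (start, end) ranges."""
    ranges = []
    for cp in points:
        if ranges and cp == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], cp)
        else:
            ranges.append((cp, cp))
    return ranges

for start, end in to_ranges(control_points):
    print(f"U+{start:04X}-U+{end:04X}")
```

Because U+007F (DEL) sits directly before the C1 block, the output shows two contiguous ranges: U+0000-U+001F and U+007F-U+009F, 65 code points in total.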
JavaScript (ES2018+)
const regex = /\P{Cc}/gu;
const text = "Hello\nWorld\t✓";
const printable = text.match(regex).join("");
// Result: "HelloWorld✓"
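Python
# The standard re module does not support \p{..} or \P{..}; this uses the third-party regex package (pip install regex)
import regex
text = "Hello\nWorld\t✓"
printable = ''.join(regex.findall(r'\P{Cc}', text))
# Result: "HelloWorld✓"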
Ruby
text = "Hello\nWorld\t✓"
printable = text.scan(/\P{Cc}/).join
# Result: "HelloWorld✓"
PHP
$text = "Hello\nWorld\t✓";
preg_match_all('/\P{Cc}/u', $text, $matches);
$printable = implode('', $matches[0]);
// Result: "HelloWorld✓"
POSIX Classes & Platform Differences
The problem with POSIX classes like `[:print:]` or `\p{Print}` is that they can match different things depending on the regex flavor and, possibly, the locale settings of the underlying platform.
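One way to see how definitions of "printable" can diverge is to compare two of them on the same characters. The sketch below uses Python's built-in `str.isprintable()` (Python's own definition, not a POSIX class) next to a "not a control character" test based on the `Cc` category. The sample set and the helper `is_control` are chosen for illustration only.

```python
import unicodedata

samples = {
    "LATIN SMALL LETTER E WITH ACUTE": "\u00e9",
    "NO-BREAK SPACE": "\u00a0",
    "ZERO WIDTH SPACE": "\u200b",
    "LINE FEED": "\n",
}

def is_control(ch):
    """True if the character is in Unicode general category Cc."""
    return unicodedata.category(ch) == "Cc"

for name, ch in samples.items():
    print(f"{name}: isprintable={ch.isprintable()} not_control={not is_control(ch)}")
```

The two tests agree on the accented letter and on the line feed, but disagree on the no-break space and the zero width space: neither is a control character, yet `str.isprintable()` rejects both.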
Java Behavior
In Java, POSIX classes are ASCII-only by default (unless the UNICODE_CHARACTER_CLASS flag is set):
import java.util.regex.Pattern;

String text = "Hello\nWorld\t✓";
// \p{Print} matches only ASCII printing characters [\x20-\x7E]
Pattern pattern1 = Pattern.compile("\\p{Print}");
// \P{Cntrl} matches everything that's not an ASCII control character [^\x00-\x1F\x7F]
// This includes non-ASCII characters AND C1 control characters
Pattern pattern2 = Pattern.compile("\\P{Cntrl}");
Explanation:
- `\p{Print}` matches only ASCII printing characters, `[\x20-\x7E]`.
- `\P{Cntrl}` (note the capital `P`) matches any ASCII character that isn't a control character, as well as any non-ASCII character, including the C1 control characters.
- This means `\P{Cntrl}` in Java is not equivalent to `\P{Cc}` in Unicode-aware regex engines.
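The gap between the two definitions can be measured without a Java runtime. In this sketch, the explicit class `[^\x00-\x1F\x7F]` stands in for Java's default `\P{Cntrl}` (an assumption based on the description above), and a `unicodedata` lookup stands in for `\P{Cc}`. The names are illustrative.

```python
import re
import unicodedata

# Stand-in for an ASCII-oriented "not a control character" class.
NOT_ASCII_CONTROL = re.compile(r"[^\x00-\x1F\x7F]")

def not_unicode_control(ch):
    """Stand-in for \\P{Cc}: true when the character is not in category Cc."""
    return unicodedata.category(ch) != "Cc"

# Code points in the Latin-1 range accepted by the ASCII-oriented class
# but rejected by the Unicode category test.
disagreements = [
    cp for cp in range(0x100)
    if NOT_ASCII_CONTROL.fullmatch(chr(cp)) and not not_unicode_control(chr(cp))
]

print(f"U+{disagreements[0]:04X}..U+{disagreements[-1]:04X}", len(disagreements))
# U+0080..U+009F 32
```

The disagreement is exactly the 32 C1 control characters, which the ASCII-oriented class lets through.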
Comparing Different Approaches
Here's a quick reference for different regex patterns and what they match:
| Pattern | Matches | Notes |
|---|---|---|
| [ -~] | ASCII printable characters (32-126) | Simple and reliable for ASCII-only text |
| \P{Cc} | All characters except control characters | Best for Unicode text; excludes ASCII and C1 controls |
| \p{Print} | Printable characters | Behavior varies by regex flavor and locale |
| \P{Cntrl} | Everything except ASCII control characters (in Java) | Java POSIX class; ASCII-only by default, so C1 controls still match |
| [^\x00-\x1F\x7F] | Everything except ASCII control characters | Explicit ASCII control exclusion; C1 controls still match |
Practical Examples
Removing Non-Printable Characters From a String
// JavaScript - ASCII only
const text = "Hello\x00World\x1F!";
const cleaned = text.replace(/[^ -~]/g, "");
// Result: "HelloWorld!"
// JavaScript - Unicode
const text2 = "Hello\x00World\x1F✓";
const cleaned2 = text2.replace(/\p{Cc}/gu, "");
// Result: "HelloWorld✓"
# Python - ASCII only
import re
text = "Hello\x00World\x1F!"
cleaned = re.sub(r'[^ -~]', '', text)
# Result: "HelloWorld!"
# Python - Unicode (third-party regex package; the standard re module rejects \p{..})
import regex
text2 = "Hello\x00World\x1F✓"
cleaned2 = regex.sub(r'\p{Cc}', '', text2)
# Result: "HelloWorld✓"
# Ruby - ASCII only
text = "Hello\x00World\x1F!"
cleaned = text.gsub(/[^ -~]/, '')
# Result: "HelloWorld!"
# Ruby - Unicode
text2 = "Hello\x00World\x1F✓"
cleaned2 = text2.gsub(/\p{Cc}/, '')
# Result: "HelloWorld✓"
// PHP - ASCII only
$text = "Hello\x00World\x1F!";
$cleaned = preg_replace('/[^ -~]/', '', $text);
// Result: "HelloWorld!"
// PHP - Unicode
$text2 = "Hello\x00World\x1F✓";
$cleaned2 = preg_replace('/\p{Cc}/u', '', $text2);
// Result: "HelloWorld✓"
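When no regex engine with Unicode property support is available, the same cleanup can be sketched with Python's standard library alone. The function name `strip_controls` is hypothetical. It filters on the `Cc` general category directly instead of using a pattern.

```python
import unicodedata

def strip_controls(text):
    """Return text with every Unicode control character (category Cc) removed."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cc")

print(strip_controls("Hello\x00World\x1F\u2713"))  # HelloWorld✓
```

This removes the C1 controls as well, so a string such as `"a\x85b"` comes back as `"ab"`.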
Validating That a String Contains Only Printable Characters
// JavaScript - ASCII only
const regex = /^[ -~]+$/;
console.log(regex.test("Hello World!")); // true
console.log(regex.test("Hello\nWorld")); // false
// JavaScript - Unicode
const regex2 = /^\P{Cc}+$/u;
console.log(regex2.test("Hello World ✓")); // true
console.log(regex2.test("Hello\nWorld")); // false
# Python - ASCII only
import re
pattern = re.compile(r'^[ -~]+$')
print(bool(pattern.match("Hello World!"))) # True
print(bool(pattern.match("Hello\nWorld"))) # False
# Python - Unicode (third-party regex package; the standard re module rejects \P{..})
import regex
pattern2 = regex.compile(r'^\P{Cc}+$')
print(bool(pattern2.match("Hello World ✓"))) # True
print(bool(pattern2.match("Hello\nWorld"))) # False
# Ruby - ASCII only (\A and \z anchor the whole string; in Ruby, ^ and $ match at every line)
regex = /\A[ -~]+\z/
puts regex.match?("Hello World!") # true
puts regex.match?("Hello\nWorld") # false
# Ruby - Unicode
regex2 = /\A\P{Cc}+\z/
puts regex2.match?("Hello World ✓") # true
puts regex2.match?("Hello\nWorld") # false
// PHP - ASCII only
$regex = '/^[ -~]+$/';
var_dump(preg_match($regex, "Hello World!") === 1); // true
var_dump(preg_match($regex, "Hello\nWorld") === 1); // false
// PHP - Unicode
$regex2 = '/^\P{Cc}+$/u';
var_dump(preg_match($regex2, "Hello World ✓") === 1); // true
var_dump(preg_match($regex2, "Hello\nWorld") === 1); // false
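One detail worth checking when validating is how the end anchor treats a trailing newline. In Python's `re` module, `$` also matches just before a newline at the very end of the string, so a sketch that relies on `fullmatch` is stricter. The function name `is_printable_ascii` is illustrative.

```python
import re

PRINTABLE_ASCII_RUN = re.compile(r"[ -~]+")

def is_printable_ascii(text):
    """True only if the entire string consists of printable ASCII characters."""
    return PRINTABLE_ASCII_RUN.fullmatch(text) is not None

# `$` tolerates a single trailing newline, so this anchored pattern accepts "Hello\n".
print(bool(re.match(r"^[ -~]+$", "Hello\n")))  # True
print(is_printable_ascii("Hello\n"))           # False
print(is_printable_ascii("Hello World!"))      # True
```

Other flavors have their own ways to anchor strictly to the end of the input, so it is worth testing a string with a trailing newline in whichever engine you use.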
Recommendation
- For ASCII-only text, use `[ -~]` for simplicity and consistency across platforms.
- For Unicode text, use `\P{Cc}` if your regex engine supports Unicode properties.
- Avoid relying on `[:print:]` or `\p{Print}` unless you're certain about how they behave in your specific regex flavor and locale.