In Python, there are several ways to check whether a string contains only non-ASCII characters, that is, characters whose code points are above 127. Here are three approaches:
Method 1: Using Regular Expressions
The re module in Python allows for pattern matching, making it simple to determine if a string contains only non-ASCII characters.
import re

def contains_only_non_ascii(s):
    # fullmatch succeeds only if the pattern covers the whole string
    return re.fullmatch(r'[^\x00-\x7F]*', s) is not None

# Example usage
test_string = "こんにちは"
print(contains_only_non_ascii(test_string))  # Output: True
In this function, the pattern [^\x00-\x7F]* matches a run of characters outside the ASCII range (code points 0-127), and re.fullmatch succeeds only if that run spans the entire string.
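Two details of this kind of pattern are worth knowing. Because * also matches zero characters, an empty string passes the check. And a pattern anchored with ^...$ and run through re.match also accepts a string ending in a newline, because $ matches just before a trailing newline; re.fullmatch has no such gap. A minimal sketch that rejects both cases, assuming an empty string should not count as "only non-ASCII" (the function name is hypothetical):

```python
import re


def contains_only_non_ascii_nonempty(s: str) -> bool:
    # '+' requires at least one character, so "" no longer matches.
    # fullmatch must consume the whole string, including any trailing "\n".
    return re.fullmatch(r'[^\x00-\x7F]+', s) is not None


print(contains_only_non_ascii_nonempty("こんにちは"))  # True
print(contains_only_non_ascii_nonempty(""))            # False
print(contains_only_non_ascii_nonempty("日本\n"))      # False
```

Swap + back to * if you prefer the empty string to return True.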
Method 2: Using String Encoding
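You can also use the encode method, but the strict form needs care. s.encode('ascii') raises UnicodeEncodeError at the first non-ASCII character, so it tells you whether the string contains at least one non-ASCII character, not whether every character is non-ASCII. Passing errors='ignore' instead drops each character that ASCII cannot represent; if nothing remains, the string contained no ASCII characters.
def contains_only_non_ascii(s):
    # errors='ignore' discards characters outside ASCII,
    # so only the ASCII characters survive the encoding.
    ascii_part = s.encode('ascii', errors='ignore')
    # An empty result means no ASCII characters were present.
    return ascii_part == b''

# Example usage
test_string = "こんにちは"
print(contains_only_non_ascii(test_string))  # Output: True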
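A strict s.encode('ascii') call and an errors='ignore' call answer different questions, which is easiest to see on a mixed string. A minimal sketch comparing them; both function names are hypothetical:

```python
def has_any_non_ascii(s):
    # Strict encoding fails at the first character outside ASCII.
    try:
        s.encode('ascii')
    except UnicodeEncodeError:
        return True
    return False


def has_only_non_ascii(s):
    # Dropping unencodable characters leaves only the ASCII ones.
    return s.encode('ascii', errors='ignore') == b''


for sample in ["こんにちは", "hello", "hello こんにちは", ""]:
    print(repr(sample), has_any_non_ascii(sample), has_only_non_ascii(sample))
```

The mixed string is where they diverge: has_any_non_ascii returns True for it while has_only_non_ascii returns False. For the empty string the strict check returns False and the ignore-based check returns True.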
Method 3: Using Unicode Code Points
You can also iterate over the string and check each character's Unicode code point with ord(). ASCII characters have code points 0 through 127, so a character is non-ASCII when ord(char) > 127.
def contains_only_non_ascii(s):
    # all() stops at the first ASCII character; it is True for "".
    return all(ord(char) > 127 for char in s)

# Example usage
test_string = "こんにちは"
print(contains_only_non_ascii(test_string))  # Output: True
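On Python 3.7 and later, str.isascii() expresses the same per-character test without spelling out the number 127. A sketch assuming Python 3.7+; the function name is illustrative:

```python
def contains_only_non_ascii(s: str) -> bool:
    # str.isascii() (Python 3.7+) is True for code points 0-127,
    # so negating it per character matches ord(char) > 127.
    return all(not char.isascii() for char in s)


print(contains_only_non_ascii("こんにちは"))        # True
print(contains_only_non_ascii("こんにちは world"))  # False
```

Calling isascii() on the whole string answers "is every character ASCII?", so not s.isascii() only means that at least one non-ASCII character is present; the per-character form is needed to require that all of them are.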
Conclusion
Any of these methods can determine whether a string contains only non-ASCII characters; the choice mostly comes down to readability and performance. Regular expressions are concise, the encoding approach relies on Python's built-in codec machinery, and the code-point check is the most explicit about what it tests. If performance matters, measure the candidates on representative input rather than assuming one is fastest, and decide up front whether an empty string should count as a match.
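One way to compare the approaches is to run them side by side on a few edge cases and time them with timeit. A minimal sketch; the function names and sample strings are illustrative, and timings will vary with the machine, Python version, and input:

```python
import re
import timeit

_NON_ASCII_RUN = re.compile(r'[^\x00-\x7F]*')


def via_regex(s: str) -> bool:
    # The whole string must consist of characters outside 0-127.
    return _NON_ASCII_RUN.fullmatch(s) is not None


def via_encode(s: str) -> bool:
    # Only ASCII characters survive errors='ignore'.
    return s.encode('ascii', errors='ignore') == b''


def via_ord(s: str) -> bool:
    # Every code point must be above 127.
    return all(ord(char) > 127 for char in s)


FUNCS = (via_regex, via_encode, via_ord)
SAMPLES = ["こんにちは", "hello", "こんにちは world", "日本\n", ""]

# Confirm the three implementations agree on each sample.
for sample in SAMPLES:
    results = {f.__name__: f(sample) for f in FUNCS}
    assert len(set(results.values())) == 1, (sample, results)
    print(repr(sample), results["via_ord"])

# Time each implementation on a longer all-non-ASCII string.
long_text = "こんにちは" * 1000
for func in FUNCS:
    seconds = timeit.timeit(lambda: func(long_text), number=200)
    print(f"{func.__name__}: {seconds:.4f} s for 200 calls")
```

The agreement check matters as much as the timings: a fast check that mishandles trailing newlines or empty strings is not a fair comparison.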