How do you check if a string contains only non-ASCII characters in Python?

Posted by PaulAnd

Last Updated: August 21, 2024

In Python, checking if a string contains only non-ASCII characters can be accomplished using several methods. Here are a few effective approaches:

Method 1: Using Regular Expressions

The re module in Python allows for pattern matching, making it simple to determine if a string contains only non-ASCII characters.

import re

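def contains_only_non_ascii(s):
    # fullmatch requires the whole string to match, so no ^ or $ anchors are needed
    return bool(re.fullmatch(r'[^\x00-\x7F]*', s))

# Example usage
test_string = "こんにちは"
print(contains_only_non_ascii(test_string))  # Output: True

In this function, the pattern [^\x00-\x7F]* matches any run of characters outside the ASCII range (code points 0-127), and re.fullmatch succeeds only if that run covers the entire string. This is safer than re.match with ^...$, because $ also matches just before a trailing newline, so a string ending in "\n" would wrongly pass. Note that the empty string also returns True, since it contains no ASCII characters.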
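The trailing-newline edge case is easy to verify directly. The sketch below compares an anchored re.match against re.fullmatch on the same input; the helper names are hypothetical and chosen only for this illustration.

```python
import re

NON_ASCII_RUN = r'[^\x00-\x7F]*'

def only_non_ascii_match(s):
    # Anchored variant: $ also matches just before a final "\n"
    return bool(re.match('^' + NON_ASCII_RUN + '$', s))

def only_non_ascii_fullmatch(s):
    # fullmatch must consume the entire string, including any final "\n"
    return bool(re.fullmatch(NON_ASCII_RUN, s))

sample = "\u00e9\u00e8\n"  # two accented letters followed by an ASCII newline
print(only_non_ascii_match(sample))      # True, although "\n" is ASCII
print(only_non_ascii_fullmatch(sample))  # False
```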

Method 2: Using String Encoding

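You can also use the encode method. Encoding to ASCII with errors='ignore' silently drops every character that has no ASCII representation, so whatever bytes remain are the ASCII characters in the string. If nothing remains, the string contains only non-ASCII characters. A plain try/except around s.encode('ascii') is not enough here: UnicodeEncodeError is raised as soon as one non-ASCII character is present, so a mixed string such as "café" would be misreported.

def contains_only_non_ascii(s):
    # Non-ASCII characters are dropped; any bytes left over are ASCII characters
    return not s.encode('ascii', errors='ignore')

# Example usage
test_string = "こんにちは"
print(contains_only_non_ascii(test_string))  # Output: True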
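Mixed input is the case most worth testing with an encoding-based check, because a failed ASCII encode only proves that at least one non-ASCII character exists. A minimal sketch contrasting the two questions; both helper names are hypothetical:

```python
def has_any_non_ascii(s):
    # Strict encoding fails on the first character outside ASCII
    try:
        s.encode('ascii')
        return False
    except UnicodeEncodeError:
        return True

def has_only_non_ascii(s):
    # errors='ignore' drops non-ASCII characters; leftover bytes are ASCII
    return len(s.encode('ascii', errors='ignore')) == 0

for text in ["abc", "caf\u00e9", "\u00e9\u00e8"]:
    print(ascii(text), has_any_non_ascii(text), has_only_non_ascii(text))
# 'abc' False False
# 'caf\xe9' True False
# '\xe9\xe8' True True
```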

Method 3: Using Unicode Code Points

Python provides a way to iterate through each character in the string and check its Unicode code point. All ASCII characters have code points in the range of 0 to 127.

def contains_only_non_ascii(s):
    return all(ord(char) > 127 for char in s)

# Example usage
test_string = "こんにちは"
print(contains_only_non_ascii(test_string))  # Output: True
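When the check fails, it can help to see which characters caused it. The sketch below is a hypothetical helper, not part of the methods above, that reuses the same code point test to list each ASCII character with its index:

```python
def find_ascii_chars(s):
    # Collect (index, character) pairs for every code point in the ASCII range 0-127
    return [(i, ch) for i, ch in enumerate(s) if ord(ch) <= 127]

print(find_ascii_chars("\u00e9a\u00e8 b"))  # [(1, 'a'), (3, ' '), (4, 'b')]
print(find_ascii_chars("\u00e9\u00e8"))     # []
```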

Conclusion

Each of these methods can determine whether a string contains only non-ASCII characters, as long as the check covers every character rather than stopping at the first non-ASCII one. The choice depends on your needs, such as performance or readability: the regular expression is concise, the encoding approach avoids an explicit loop, and the code point check states the rule most directly. Whichever you choose, test it against mixed strings such as "café" and against the empty string, since those are the cases where implementations most often disagree.
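Since performance is one of the deciding factors, a rough comparison can be made with timeit. The sketch below is only illustrative: the function names and the sample string are assumptions, the regex and encoding variants use re.fullmatch and errors='ignore', and timings vary by machine and Python version, so no results are shown.

```python
import re
import timeit

_PATTERN = re.compile(r'[^\x00-\x7F]*')

def by_regex(s):
    return bool(_PATTERN.fullmatch(s))

def by_encoding(s):
    return not s.encode('ascii', errors='ignore')

def by_code_points(s):
    return all(ord(ch) > 127 for ch in s)

CANDIDATES = {
    'regex': by_regex,
    'encoding': by_encoding,
    'code points': by_code_points,
}

def compare(sample, repeats=1000):
    # All three approaches should agree before their timings are worth comparing
    results = {name: func(sample) for name, func in CANDIDATES.items()}
    assert len(set(results.values())) == 1, results
    return {name: timeit.timeit(lambda f=func: f(sample), number=repeats)
            for name, func in CANDIDATES.items()}

timings = compare("\u00e9" * 1000)
for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s")
```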
