In Python, finding the longest substring with all distinct characters can be efficiently accomplished using the sliding window technique combined with a set to track characters. The algorithm operates in linear time complexity, making it suitable for large strings. Below is a Python function that implements this approach.
Function Implementation
def longest_distinct_substring(s: str) -> str:
char_set = set() # Set to store unique characters
left = 0 # Left pointer for the window
longest = 0 # Length of the longest substring found
start_index = 0 # Starting index of the longest substring
for right in range(len(s)):
# If character is already in the set, shrink the window from the left
while s[right] in char_set:
char_set.remove(s[left])
left += 1
# Add the character to the set
char_set.add(s[right])
# Update the longest substring if the current is longer
current_length = right - left + 1
if current_length > longest:
longest = current_length
start_index = left
# Return the longest substring
return s[start_index:start_index + longest]
# Example usage
input_string = "abcabcbb"
result = longest_distinct_substring(input_string)
print(f"The longest substring with all distinct characters is: '{result}'")
Explanation of the Code
1. Initialization:
- A set char_set is created to track the characters in the current substring.
- Two pointers left and right define the window's boundaries in the string.
- Variables longest and start_index keep track of the longest substring's length and starting index.
2. Iterating Through the String:
- As the right pointer moves through each character in the string, it checks if the character is already in the char_set.
- If the character is found, the left pointer is moved right, and characters are removed from the set until the current character can be included.
3. Updating the Longest Substring:
- The length of the current substring is calculated. If it exceeds longest, update both longest and start_index.
4. Returning the Result:
- After the loop, the longest substring with all distinct characters is retrieved using the stored start_index and length.
Example Output
For the input string "abcabcbb", the output will be:
The longest substring with all distinct characters is: 'abc'
This function is effective for any input string and can handle various cases, including empty strings and strings with all identical characters.