In Python, finding the longest substring with at most k distinct characters can be efficiently achieved using the sliding window technique along with a hash map to keep track of the character counts. Below is a detailed implementation of the function that accomplishes this task.
Function Implementation
def longest_substring_k_distinct(s, k):
if k == 0 or not s:
return ""
left = 0
char_count = {}
max_length = 0
start_index = 0
for right in range(len(s)):
char_count[s[right]] = char_count.get(s[right], 0) + 1
while len(char_count) > k:
char_count[s[left]] -= 1
if char_count[s[left]] == 0:
del char_count[s[left]]
left += 1
# Update maximum length and starting index if needed
current_length = right - left + 1
if current_length > max_length:
max_length = current_length
start_index = left
return s[start_index:start_index + max_length]
Explanation of the Code
1. Function Signature: The function longest_substring_k_distinct(s, k) accepts a string s and an integer k.
2. Edge Cases: It checks if k is zero or if the string is empty. In either case, it immediately returns an empty string because no valid substring can exist.
3. Initialization: Variables are initialized:
- left is used for the left boundary of the sliding window.
- char_count is a dictionary to keep track of the frequency of characters in the current window.
- max_length tracks the length of the longest valid substring found.
- start_index records the starting index of the longest substring.
4. Sliding Window Technique:
- The right variable iterates over each character index in the string.
- The character at s[right] is added to char_count and its count is incremented.
- While the number of distinct characters (length of char_count) exceeds k, the window is shrunk from the left:
- The count of the character at s[left] is decremented, and if it drops to zero, that character is removed from char_count.
- left is incremented to shrink the window.
5. Updating the Maximum Length:
- After adjusting the window, the length of the current valid substring is calculated. If it is greater than max_length, the maximum length and starting index are updated.
6. Return Value: Finally, the function returns the longest substring found, slicing the original string s from start_index to start_index + max_length.
Example Usage
result = longest_substring_k_distinct("araaci", 2)
print(result) # Output: "araa"
This function efficiently finds the longest substring with at most k distinct characters in linear time, making it suitable for large strings.