Write a Python function to find the longest substring with at most k distinct characters.
Posted by TinaGrn
Last Updated: August 31, 2024
In Python, finding the longest substring with at most k distinct characters can be efficiently achieved using the sliding window technique along with a hash map to keep track of the character counts. Below is a detailed implementation of the function that accomplishes this task.
Function Implementation
def longest_substring_k_distinct(s, k):
    if k == 0 or not s:
        return ""

    left = 0
    char_count = {}
    max_length = 0
    start_index = 0
    
    for right in range(len(s)):
        char_count[s[right]] = char_count.get(s[right], 0) + 1
        
        while len(char_count) > k:
            char_count[s[left]] -= 1
            if char_count[s[left]] == 0:
                del char_count[s[left]]
            left += 1
            
        # Update maximum length and starting index if needed
        current_length = right - left + 1
        if current_length > max_length:
            max_length = current_length
            start_index = left
            
    return s[start_index:start_index + max_length]
Explanation of the Code
1. Function Signature: The function longest_substring_k_distinct(s, k) accepts a string s and an integer k. 2. Edge Cases: It checks if k is zero or if the string is empty. In either case, it immediately returns an empty string because no valid substring can exist. 3. Initialization: Variables are initialized: - left is used for the left boundary of the sliding window. - char_count is a dictionary to keep track of the frequency of characters in the current window. - max_length tracks the length of the longest valid substring found. - start_index records the starting index of the longest substring. 4. Sliding Window Technique: - The right variable iterates over each character index in the string. - The character at s[right] is added to char_count and its count is incremented. - While the number of distinct characters (length of char_count) exceeds k, the window is shrunk from the left: - The count of the character at s[left] is decremented, and if it drops to zero, that character is removed from char_count. - left is incremented to shrink the window. 5. Updating the Maximum Length: - After adjusting the window, the length of the current valid substring is calculated. If it is greater than max_length, the maximum length and starting index are updated. 6. Return Value: Finally, the function returns the longest substring found, slicing the original string s from start_index to start_index + max_length.
Example Usage
result = longest_substring_k_distinct("araaci", 2)
print(result)  # Output: "araa"
This function efficiently finds the longest substring with at most k distinct characters in linear time, making it suitable for large strings.