Write a Python function to find the longest repeating substring in a string.
Posted by GraceDv
Last Updated: August 18, 2024
Finding the Longest Repeating Substring in a String Using Python
The task of finding the longest repeating substring within a string can be efficiently tackled using a combination of suffix arrays and the longest common prefix (LCP) array. The approach leverages sorting and comparison techniques to identify the longest repeated sequence without explicitly checking every possible substring, which would be computationally prohibitive for larger strings. Here’s a step-by-step breakdown of the algorithm along with the implementation in Python.
Steps to the Solution
1. Build Suffix Array: Generate all suffixes of the string and sort them. This gives a structured view of where substrings appear in the original string. 2. Calculate LCP Array: For each pair of consecutive suffixes in the sorted suffix array, compute the longest common prefix. This enables the identification of the longest repeating substring. 3. Identify the Longest Repeating Substring: Traverse the LCP array to find the maximum value, which corresponds to the length of the longest repeating substring.
Python Implementation
Below is the Python function that implements the above approach:
def longest_repeating_substring(s: str) -> str:
    def build_suffix_array(s: str):
        suffixes = sorted((s[i:], i) for i in range(len(s)))
        return [suff[1] for suff in suffixes]

    def build_lcp(s: str, suffix_array):
        n = len(s)
        rank = [0] * n
        lcp = [0] * n
        for i, suffix in enumerate(suffix_array):
            rank[suffix] = i
        h = 0
        for i in range(n):
            if rank[i] > 0:
                j = suffix_array[rank[i] - 1]  # previous suffix in sorted order
                while (i + h < n) and (j + h < n) and (s[i + h] == s[j + h]):
                    h += 1
                lcp[rank[i]] = h  # LCP for the suffix at rank[i]
                if h > 0:
                    h -= 1  # decrement h for the next suffix comparison
        return lcp

    if not s:
        return ""

    suffix_array = build_suffix_array(s)
    lcp = build_lcp(s, suffix_array)

    max_length = 0
    index = 0
    for i in range(1, len(lcp)):
        if lcp[i] > max_length:
            max_length = lcp[i]
            index = suffix_array[i]
    
    return s[index:index + max_length]

# Example Usage
input_string = "banana"
result = longest_repeating_substring(input_string)
print(f"The longest repeating substring in '{input_string}' is '{result}'")
Explanation of the Code
- Suffix Array Construction: The build_suffix_array function generates a sorted list of suffixes along with their starting indices. - LCP Calculation: The build_lcp function computes the longest common prefixes between sorted suffixes, resulting in an array that signifies the lengths of common substrings. - Finding the Result: The final loop through the LCP array identifies the maximum length and corresponding starting index of the longest repeating substring.
Example
For the string "banana", the longest repeating substring is "ana", which appears multiple times in the original string. The provided implementation effectively identifies and returns this substring. Using this method, it is possible to efficiently determine the longest repeating substring in a string, even for larger inputs.