## Finding the Longest Common Subsequence in Python
The Longest Common Subsequence (LCS) problem is a classic algorithmic problem often encountered in fields such as bioinformatics, version control, and text comparison. The challenge is to find the longest subsequence present in both given sequences, which can be derived by deleting some characters without changing the order of the remaining characters.
Understanding the Algorithm
A common method to solve the LCS problem is by using dynamic programming. The basic idea is to create a 2D table where each cell (i, j) holds the length of the LCS of the first i characters of string X and the first j characters of string Y. The algorithm follows these key steps:
1. If the characters X[i-1] and Y[j-1] are the same, the length of the LCS at that cell is 1 + LCS(i-1, j-1).
2. If they are different, the length of the LCS is the maximum between LCS(i-1, j) and LCS(i, j-1).
Python Implementation
The following Python function implements the above logic to find the length of the LCS, along with a utility to reconstruct the LCS itself.
def longest_common_subsequence(X, Y):
m = len(X)
n = len(Y)
# Create a 2D array to store lengths of longest common subsequence.
dp = [[0] * (n + 1) for _ in range(m + 1)]
# Build the dp array
for i in range(1, m + 1):
for j in range(1, n + 1):
if X[i - 1] == Y[j - 1]:
dp[i][j] = dp[i - 1][j - 1] + 1
else:
dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
# The length of LCS is found in dp[m][n]
lcs_length = dp[m][n]
# Reconstructing the LCS
lcs = []
while m > 0 and n > 0:
if X[m - 1] == Y[n - 1]:
lcs.append(X[m - 1])
m -= 1
n -= 1
elif dp[m - 1][n] > dp[m][n - 1]:
m -= 1
else:
n -= 1
# Since we've built lcs backwards, we need to reverse it
lcs.reverse()
return lcs_length, ''.join(lcs)
# Example usage
if name == "main":
str1 = "AGGTAB"
str2 = "GXTXAYB"
length, sequence = longest_common_subsequence(str1, str2)
print(f"Length of LCS: {length}")
print(f"LCS: {sequence}")
Explanation of the Code
1. Initialization: A 2D list dp is initialized to store lengths of common subsequences. dp[i][j] indicates the length of LCS of the first i characters of X and the first j characters of Y.
2. Building the DP Table: Two nested loops iterate through each character of both strings. If characters match, the diagonal value in dp is incremented. If they do not match, the larger value between the left or top cell is taken.
3. LCS Reconstruction: After filling the dp table, another while loop retrieves the sequence by tracing back through the dp table.
4. Output: The function returns both the length of the LCS and the subsequence itself.
This function is efficient, operating with a time complexity of \(O(m \times n)\) and space complexity of \(O(m \times n)\), where \(m\) and \(n\) are the lengths of the two input strings.