Algorithm to Find Lexicographically Smallest Longest Common Subsequence
When dealing with two strings, finding a longest common subsequence (LCS) is a well-understood problem. Often, however, several different common subsequences share the maximum length, and we need the lexicographically smallest of them. This article presents a modified dynamic programming approach that returns that smallest LCS.
Algorithm Breakdown
To find the lexicographically smallest LCS of two strings A and B, we can modify the traditional dynamic programming (DP) approach to computing the LCS. Let's go through the steps:
Dynamic Programming Table Construction
1. Let m and n be the lengths of strings A and B. Create a 2D array dp of size (m + 1) × (n + 1), where dp[i][j] will hold the length of the longest common subsequence of the prefixes A[0..i-1] and B[0..j-1].
2. Initialize the table such that dp[0][j] and dp[i][0] are 0 for all valid i and j.
Filling the DP Table
For each i from 1 to m and each j from 1 to n, compare the character pair A[i-1] and B[j-1]. If they match, set dp[i][j] = dp[i-1][j-1] + 1. If they do not match, set dp[i][j] = max(dp[i-1][j], dp[i][j-1]).
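As a minimal sketch of the table construction and filling steps, assuming both inputs are Python strings, the table can be built as follows. The function name lcs_length_table is hypothetical, chosen for illustration; it returns the whole table so that individual entries can be inspected.

```python
def lcs_length_table(A: str, B: str) -> list[list[int]]:
    """Return dp where dp[i][j] is the LCS length of A[:i] and B[:j]."""
    m, n = len(A), len(B)
    # Row 0 and column 0 stay 0: an empty prefix has no common subsequence.
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if A[i - 1] == B[j - 1]:
                # Matching characters extend the LCS of the shorter prefixes.
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                # Otherwise drop the last character of A or of B, whichever is better.
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp


table = lcs_length_table("abcfbc", "abfcab")
for row in table:
    print(row)
print("LCS length:", table[-1][-1])  # LCS length: 4
```

For "abcfbc" and "abfcab", the bottom-right entry is 4, the length of their longest common subsequences.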
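Reconstructing the Lexicographically Smallest LCS
Lexicographic order is decided by the first position at which two equal-length strings differ. The usual backtracking starts at dp[m][n], where m and n are the lengths of A and B, and fixes the last character of the LCS first, so breaking ties during that backward walk cannot guarantee the smallest result. The LCS therefore has to be built from its first character onward, which requires the LCS length of every pair of suffixes A[i..m-1] and B[j..n-1]. Filling the same table over the reversed strings (call it R) provides exactly that: R[m-i][n-j] is the LCS length of A[i..m-1] and B[j..n-1].
1. Start at i = 0 and j = 0, with k = R[m][n] characters still to place.
2. Try the characters that occur in both strings in increasing order. For a candidate c, find its first occurrence p ≥ i in A and its first occurrence q ≥ j in B. If both exist and R[m-p][n-q] = k, then c can begin a longest common subsequence of what remains: append c, set i = p + 1 and j = q + 1, and decrease k by 1. The first candidate that passes this test is the smallest possible next character. Taking the earliest occurrences is safe because they leave the longest suffixes behind, so if any occurrence of c in both strings works, the earliest pair does too.
3. Repeat until k = 0. The characters are collected in order, so no final reversal is needed.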
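Reconstructing an LCS from its first character needs the LCS length of every pair of suffixes, not just of prefixes. The sketch below, with hypothetical helper names, assumes the suffix lengths are read from an ordinary prefix table filled over the reversed strings, and checks that entry [m - i][n - j] of that table agrees with a table filled directly over suffixes (the same recurrence run from the bottom-right corner). Either table could drive the reconstruction; the reversed-string form simply reuses the filling code unchanged.

```python
def prefix_table(X: str, Y: str) -> list[list[int]]:
    """Ordinary LCS table: t[a][b] is the LCS length of X[:a] and Y[:b]."""
    t = [[0] * (len(Y) + 1) for _ in range(len(X) + 1)]
    for a in range(1, len(X) + 1):
        for b in range(1, len(Y) + 1):
            if X[a - 1] == Y[b - 1]:
                t[a][b] = t[a - 1][b - 1] + 1
            else:
                t[a][b] = max(t[a - 1][b], t[a][b - 1])
    return t


def suffix_table(A: str, B: str) -> list[list[int]]:
    """Same recurrence run from the bottom-right: s[i][j] = LCS length of A[i:], B[j:]."""
    m, n = len(A), len(B)
    s = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if A[i] == B[j]:
                s[i][j] = s[i + 1][j + 1] + 1
            else:
                s[i][j] = max(s[i + 1][j], s[i][j + 1])
    return s


A, B = "abcfbc", "abfcab"
m, n = len(A), len(B)
R = prefix_table(A[::-1], B[::-1])  # table over the reversed strings
S = suffix_table(A, B)

# Every suffix pair (i, j) maps to entry [m - i][n - j] of the reversed table.
assert all(R[m - i][n - j] == S[i][j] for i in range(m + 1) for j in range(n + 1))
print(S[0][0], S[2][3])  # 4 2
```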
Implementation in Python
Here is a Python implementation of the above algorithm:
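def lexicographically_smallest_lcs(A, B):
    m, n = len(A), len(B)

    # Step 1: fill the usual LCS length table, but over the reversed strings.
    # dp[a][b] is the LCS length of the last a characters of A and the last
    # b characters of B, so the suffixes A[i:] and B[j:] map to dp[m - i][n - j].
    RA, RB = A[::-1], B[::-1]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if RA[i - 1] == RB[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    # Step 2: build the LCS from the front. At each step, take the smallest
    # character whose earliest remaining occurrences in A and B still leave
    # room for a common subsequence of the full remaining length k.
    alphabet = sorted(set(A) & set(B))
    lcs = []
    i, j, k = 0, 0, dp[m][n]
    while k > 0:
        for c in alphabet:
            p = A.find(c, i)  # earliest occurrence of c in A[i:], or -1
            q = B.find(c, j)  # earliest occurrence of c in B[j:], or -1
            if p != -1 and q != -1 and dp[m - p][n - q] == k:
                lcs.append(c)
                i, j, k = p + 1, q + 1, k - 1
                break

    # Step 3: characters were collected in order, so no reversal is needed.
    return ''.join(lcs)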
Example Usage
Suppose we have the following strings:
A = "abcfbc"
B = "abfcab"
With the above function, the output would be:
result = lexicographically_smallest_lcs(A, B)
print(result)  # Output: abcb
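For very short strings, the expected output can be double-checked by brute force. The sketch below is an illustrative check only (its running time is exponential in the length of A, and the helper names are hypothetical): it tries subsequences of A from longest to shortest, keeps those that are also subsequences of B, and returns the longest ones in sorted order.

```python
from itertools import combinations


def is_subsequence(s: str, t: str) -> bool:
    """True if s can be obtained from t by deleting characters."""
    it = iter(t)
    # `ch in it` advances the iterator past the first match, preserving order.
    return all(ch in it for ch in s)


def all_longest_common_subsequences(A: str, B: str) -> list[str]:
    """Brute force: try subsequences of A from the longest length downward."""
    # Length 0 always succeeds with the empty string, so the loop always returns.
    for length in range(min(len(A), len(B)), -1, -1):
        candidates = ("".join(A[k] for k in idx) for idx in combinations(range(len(A)), length))
        found = {s for s in candidates if is_subsequence(s, B)}
        if found:
            return sorted(found)


lcs_set = all_longest_common_subsequences("abcfbc", "abfcab")
print(lcs_set)     # ['abcb', 'abfb', 'abfc']
print(lcs_set[0])  # abcb
```

For the example strings it finds three longest common subsequences, and "abcb" is the smallest of them.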
Explanation of the Code
The dp table supplies LCS lengths. The reconstruction then uses them to pick, at each position, the smallest character that can still begin a longest common subsequence of the remaining suffixes. Because the choice is made from the first character onward, every tie is resolved at the position that actually decides lexicographic order.
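Filling the table takes O(m·n) time and memory, and the reconstruction adds at most O(|Σ|·m·n) time, where Σ is the set of characters the two strings share. Together they find the lexicographically smallest longest common subsequence in polynomial time.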