TechTorch

Algorithm to Find Lexicographically Smallest Longest Common Subsequence

March 29, 2025

When dealing with two strings, the task of finding the longest common subsequence (LCS) is well understood. However, two strings often have several distinct LCSs, and we may need to identify the lexicographically smallest one. This article presents a modified dynamic programming approach that finds it.

Algorithm Breakdown

To find the lexicographically smallest LCS of two strings A and B, we can modify the traditional dynamic programming (DP) approach for computing the LCS. Let's go through the steps:

Dynamic Programming Table Construction

1. Create a 2D array dp of size (m+1) x (n+1), where m and n are the lengths of A and B, and dp[i][j] holds the length of the longest common subsequence of the prefixes A[0..i-1] and B[0..j-1].

2. Initialize the table such that dp[0][j] and dp[i][0] are 0 for all valid i and j.

Filling the DP Table

For each character pair A[i-1] and B[j-1]:

If they match, set dp[i][j] = dp[i-1][j-1] + 1.
If they do not match, set dp[i][j] = max(dp[i-1][j], dp[i][j-1]).
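The fill rule can be sketched on a small input to make the table concrete. The helper name lcs_length_table and the sample strings "abc" and "acb" are assumptions chosen for illustration, not part of the original article.

```python
def lcs_length_table(A, B):
    """Return the (m+1) x (n+1) table of LCS lengths for all prefix pairs."""
    m, n = len(A), len(B)
    # Row 0 and column 0 stay 0: an empty prefix shares nothing with anything
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if A[i - 1] == B[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp

table = lcs_length_table("abc", "acb")
for row in table:
    print(row)
# [0, 0, 0, 0]
# [0, 1, 1, 1]
# [0, 1, 1, 2]
# [0, 1, 2, 2]
```

In the bottom-right cell the values above and to the left are both 2. This is the tie case: "ab" and "ac" are both LCSs of these two strings, and the table of lengths alone does not say which one is smaller.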

Backtracking to Find LCS

1. Start from dp[m][n] where m and n are the lengths of strings A and B.

2. Use a recursive or iterative approach to backtrack through the dp table:

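If A[i-1] == B[j-1], that character is the last character of every LCS of the two prefixes. Record it and move diagonally up-left in the table.
If A[i-1] != B[j-1], move in the direction that has the larger value in the dp table.
If both directions are equal, follow both and keep the lexicographically smaller of the two complete strings. Comparing single characters at the tie is not enough: backtracking fixes the end of the string first, while lexicographic order is decided from the front.

3. Memoize the smallest LCS found for each cell (i, j) so that no cell is solved twice. Filled bottom-up, this becomes a second table whose entry for (i, j) is the lexicographically smallest LCS of A[0..i-1] and B[0..j-1]. Each entry is built by appending to a shorter one, so the characters are already in order and no final reversal is needed.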
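A recursive form of the backtracking can be sketched as below. It is a minimal illustration, not a definitive implementation: the function name smallest_lcs_recursive is hypothetical, the length of each returned string stands in for the dp value of that cell, and a tie is settled by following both directions and comparing the complete strings. Memoization via functools.lru_cache keeps each cell from being solved more than once.

```python
from functools import lru_cache

def smallest_lcs_recursive(A, B):
    """Top-down sketch: smallest LCS of A and B via memoized backtracking."""

    @lru_cache(maxsize=None)
    def solve(i, j):
        # Smallest LCS of the prefixes A[:i] and B[:j]
        if i == 0 or j == 0:
            return ""
        if A[i - 1] == B[j - 1]:
            # A shared last character ends every LCS of these prefixes
            return solve(i - 1, j - 1) + A[i - 1]
        up, left = solve(i - 1, j), solve(i, j - 1)
        if len(up) != len(left):
            # Only the longer side can still be an LCS
            return up if len(up) > len(left) else left
        # Tie: both are LCSs of equal length, keep the smaller string
        return min(up, left)

    return solve(len(A), len(B))

print(smallest_lcs_recursive("abc", "acb"))  # prints ab
```

The recursion depth grows with len(A) + len(B), so for long inputs a bottom-up table is the safer choice in Python.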

Implementation in Python

Here is a Python implementation of the above algorithm:

def lexicographically_smallest_lcs(A, B):
    m, n = len(A), len(B)
    # Step 1: Create and fill the dp table of LCS lengths
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if A[i - 1] == B[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    # Step 2: best[i][j] is the lexicographically smallest LCS of A[:i] and B[:j].
    # It follows the same moves as a backtrack, memoized bottom-up.
    best = [[""] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if A[i - 1] == B[j - 1]:
                # A shared last character ends every LCS of these prefixes
                best[i][j] = best[i - 1][j - 1] + A[i - 1]
            elif dp[i - 1][j] > dp[i][j - 1]:
                best[i][j] = best[i - 1][j]
            elif dp[i - 1][j] < dp[i][j - 1]:
                best[i][j] = best[i][j - 1]
            else:
                # Both directions are equal: compare the complete candidates
                best[i][j] = min(best[i - 1][j], best[i][j - 1])
    # Step 3: Entries are built front to back, so no reversal is needed
    return best[m][n]

Example Usage

Suppose we have the following strings:

A = "abcfbc"
B = "abfcab"

With the above function, the output would be:

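result = lexicographically_smallest_lcs(A, B)
print(result)  # Output: abcb

These strings have three LCSs of length 4 ("abcb", "abfb", and "abfc"), and "abcb" is the lexicographically smallest.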
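For short strings, the result can be cross-checked by brute force. The sketch below is an independent check and not part of the algorithm: the helper name brute_force_smallest_lcs is hypothetical, and it enumerates every subsequence of each length, so its cost grows exponentially and it is only suitable for inputs of a dozen characters or so.

```python
from itertools import combinations

def brute_force_smallest_lcs(A, B):
    """Exhaustive check: smallest common subsequence of the greatest length."""

    def subsequences(s, k):
        # combinations keeps the original order, so each tuple is a subsequence
        return {"".join(chars) for chars in combinations(s, k)}

    # Try the longest possible length first; k = 0 always matches with ""
    for k in range(min(len(A), len(B)), -1, -1):
        common = subsequences(A, k) & subsequences(B, k)
        if common:
            return min(common)

print(brute_force_smallest_lcs("abcfbc", "abfcab"))  # prints abcb
```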

Explanation of the Code

The dp table is built to calculate the lengths of the LCS for every pair of prefixes. During the reconstruction phase, a match extends the result, unequal dp values select the larger side, and a tie is settled by comparing the two candidate strings in full and keeping the lexicographically smaller one. Both candidates in a tie have the same length, so the comparison is a plain lexicographic one.

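This approach finds the lexicographically smallest longest common subsequence by visiting O(m * n) table cells. Because each cell may store and compare a string of up to min(m, n) characters, the worst-case time and memory are O(m * n * min(m, n)).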