Back to blog
Dec 06, 2024
4 min read

LeetCode 940: Distinct Subsequences II

Leetcode 940: Distinct Subsequences II solution in Python

Problem Description

LeetCode Problem 940

Given a string s, return the number of distinct non-empty subsequences of s. Since the answer may be very large, return it modulo 109 + 7.

A subsequence of a string is a new string that is formed from the original string by deleting some (can be none) of the characters without disturbing the relative positions of the remaining characters. (i.e., “ace” is a subsequence of abcde while “aec” is not.

 

Example 1:

Input: s = “abc” Output: 7 Explanation: The 7 distinct subsequences are “a”, “b”, “c”, “ab”, “ac”, “bc”, and “abc”.

Example 2:

Input: s = “aba” Output: 6 Explanation: The 6 distinct subsequences are “a”, “b”, “ab”, “aa”, “ba”, and “aba”.

Example 3:

Input: s = “aaa” Output: 3 Explanation: The 3 distinct subsequences are “a”, “aa” and “aaa”.

 

Constraints:

  • 1 <= s.length <= 2000
  • s consists of lowercase English letters.

Difficulty: Hard

Tags: string, dynamic programming

Rating: 97.87%

Solution Approach

Intuition

  1. For each position i in the string, we want to calculate the number of distinct subsequences that end at position i.

  2. When we encounter a new character, we can:

    • Either use it to form new subsequences by appending it to all existing subsequences
    • Or start a new subsequence with just this character
  3. However, we need to handle duplicates carefully. If we’ve seen a character before, we need to subtract some combinations to avoid counting duplicates.

Dynamic Programming Solution

Here’s my Python solution with detailed explanations:

class Solution:
    def distinctSubseqII(self, s: str) -> int:
        MOD = 10**9 + 7
        dp = [1]  # Initialize with 1 (empty subsequence)
        last_occ = {}  # Track last occurrence of each character
        
        for i, c in enumerate(s):
            current = dp[-1]  # Get count from previous position
            # Double the count (append current char to all existing subsequences)
            new_count = (current * 2) % MOD
            
            # If we've seen this character before
            if c in last_occ:
                j = last_occ[c]
                # Subtract the count of subsequences that would create duplicates
                new_count = (new_count - dp[j]) % MOD
            
            dp.append(new_count)
            last_occ[c] = i  # Update last occurrence
            
        return (dp[-1] - 1) % MOD  # Subtract empty subsequence
  1. Base Case: We start with dp[0] = 1, representing the empty subsequence.

  2. State Transition: For each character c at position i:

    • We multiply the previous count by 2 because we can append c to all existing subsequences
    • This gives us all old subsequences plus all new ones with c appended
  3. Handling Duplicates: When we encounter a character we’ve seen before:

    • We need to subtract the count of subsequences that would create duplicates
    • We can find this by looking at the dp value at the last occurrence of the character
  4. Final Result: We subtract 1 from the final answer to exclude the empty subsequence

Example Walkthrough

Let’s trace through the example s = “aba”:

  1. Start with empty subsequence: dp = [1]

  2. Process ‘a’:

    • Double previous (1 * 2 = 2)
    • Haven’t seen ‘a’ before
    • dp = [1, 2] (“a”)
  3. Process ‘b’:

    • Double previous (2 * 2 = 4)
    • Haven’t seen ‘b’ before
    • dp = [1, 2, 4] (“a”, “b”, “ab”)
  4. Process ‘a’:

    • Double previous (4 * 2 = 8)
    • Seen ‘a’ before at position 0
    • Subtract dp[0] = 1
    • dp = [1, 2, 4, 7] (“a”, “b”, “ab”, “aa”, “ba”, “aba”)
  5. Final result: 7 - 1 = 6 distinct non-empty subsequences

Complexity Analysis

  • Time Complexity: O(n), where n is the length of the string
  • Space Complexity: O(n) to store the dp array and last occurrences

The solution is efficient because we only need to iterate through the string once and perform constant-time operations at each step.