Introduction
Imagine searching for a specific word or phrase within a large document. A brute-force approach tries every possible starting position in the text and compares the pattern character by character from there. As the text and pattern grow, this becomes increasingly inefficient: for a text of length n and a pattern of length m, the worst case needs on the order of n × m character comparisons. This is where the Knuth-Morris-Pratt (KMP) algorithm shines. It finds the pattern in time proportional to n + m, because it never re-examines a text character that has already been matched.
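As a baseline for comparison, here is a minimal sketch of the brute-force search described above. The function name `naive_search` and the comparison counter are illustrative assumptions, not part of any library; the counter is only there to make the repeated work visible.

```python
def naive_search(text: str, pattern: str) -> tuple[int, int]:
    """Brute-force search.

    Returns (index of first match or -1, number of character comparisons).
    """
    comparisons = 0
    # Try every starting position where the pattern could still fit.
    for start in range(len(text) - len(pattern) + 1):
        for offset in range(len(pattern)):
            comparisons += 1
            if text[start + offset] != pattern[offset]:
                break  # mismatch: give up on this starting position
        else:
            # Inner loop finished without a mismatch: full match.
            return start, comparisons
    return -1, comparisons


# Each failed start re-compares the same run of 'A' characters.
print(naive_search("AAAAAAAAAAB", "AAAB"))  # (7, 32)
```

On this input, 32 comparisons are spent on an 11-character text, because every failed attempt re-reads text characters that were already examined.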
The Power of the KMP Algorithm
The KMP algorithm finds a specific pattern within a larger string. It leverages the "prefix function" to optimize the search. KMP still compares characters one at a time, but after a mismatch it does not restart from the next text position. Instead, it consults a lookup table precomputed from the pattern to determine how far the pattern can safely be shifted while keeping the part that is already known to match. This avoids redundant comparisons, so the text pointer never moves backward and the search runs faster.
Understanding the Algorithm's Core Concepts
At the heart of the KMP algorithm lies the concept of the "prefix function." This function is calculated for the pattern and helps determine the maximum shift to be made after a mismatch. Let's break down this concept:
1. Prefix Function: The Key to Efficient Shifts
The prefix function for a pattern is a table that indicates, for each index in the pattern, the length of the longest proper prefix that is also a suffix of the substring ending at that index. A proper prefix is a prefix that is not the entire string.
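To make the definition concrete, the following sketch computes the table directly from the definition by checking every proper prefix of every substring. It is deliberately slow and is meant only as an illustration; the name `prefix_function_naive` is an assumption made for this example.

```python
def prefix_function_naive(pattern: str) -> list[int]:
    """Compute the prefix function straight from its definition (slow)."""
    table = []
    for end in range(len(pattern)):
        sub = pattern[: end + 1]  # substring ending at index `end`
        best = 0
        # Proper prefixes have lengths 1 .. len(sub) - 1.
        for length in range(1, len(sub)):
            if sub[:length] == sub[-length:]:
                best = length  # lengths increase, so the last hit is the longest
        table.append(best)
    return table


print(prefix_function_naive("ABAB"))  # [0, 0, 1, 2]
```

For "ABAB", the substring "ABA" has the proper prefix "A" that is also a suffix (value 1), and "ABAB" has "AB" (value 2).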
2. Computing the Prefix Function: Building the Lookup Table
We calculate the prefix function iteratively. Consider the pattern "AABAACAADA". The prefix function for this pattern would be:
Index | Character | Longest Proper Prefix that is also a Suffix | Prefix Function Value |
---|---|---|---|
0 | A | (none) | 0 |
1 | A | A | 1 |
2 | B | (none) | 0 |
3 | A | A | 1 |
4 | A | AA | 2 |
5 | C | (none) | 0 |
6 | A | A | 1 |
7 | A | AA | 2 |
8 | D | (none) | 0 |
9 | A | A | 1 |
This table serves as a lookup for the KMP algorithm, guiding the pattern shifts.
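The iterative computation can be sketched as follows. This is one common linear-time formulation, shown under the assumption that the table is stored as a Python list; the names `prefix_function`, `table`, and `k` are chosen for this example.

```python
def prefix_function(pattern: str) -> list[int]:
    """Build the prefix-function table in time linear in len(pattern)."""
    table = [0] * len(pattern)
    k = 0  # length of the longest prefix-suffix for the previous index
    for i in range(1, len(pattern)):
        # On a mismatch, fall back to the next shorter candidate prefix.
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]
        # If the next character extends the candidate, lengthen it.
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


print(prefix_function("AABAACAADA"))  # [0, 1, 0, 1, 2, 0, 1, 2, 0, 1]
```

The printed list agrees with the Prefix Function Value column for "AABAACAADA". The key idea is that each entry is derived from earlier entries instead of being recomputed from scratch.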
3. Pattern Matching with the KMP Algorithm: A Step-by-Step Process
The KMP algorithm operates by iteratively comparing the pattern with the text. Let's illustrate this with an example:
- Text: "ABABDABACDABABCABAB"
- Pattern: "ABABCABAB"
- Initialization: Initialize two pointers, i for the text and j for the pattern, both starting at 0.
- Comparison: Compare the characters at positions i and j.
- Match: If they match, increment both i and j.
- Mismatch: If they differ and j is greater than 0, use the prefix function to determine the shift. If j is 0, there is nothing to fall back to, so increment i.
- Shift: Shift the pattern by j - prefixFunction[j - 1] positions by resetting j to prefixFunction[j - 1]. The text pointer i does not move.
- Termination: Repeat the comparison, match, and mismatch steps until j equals the length of the pattern or i reaches the end of the text. If j reaches the end of the pattern, a match is found starting at text index i - j.
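The steps above can be sketched in Python as follows. This is a minimal illustration that returns only the first occurrence; the function names `prefix_function` and `kmp_search`, and the convention of returning -1 when nothing is found, are assumptions made for this example.

```python
def prefix_function(pattern: str) -> list[int]:
    """Build the prefix-function table for the pattern."""
    table = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def kmp_search(text: str, pattern: str) -> int:
    """Return the index of the first occurrence of pattern in text, or -1."""
    if not pattern:
        return 0  # assumption: an empty pattern matches at index 0
    table = prefix_function(pattern)
    i = j = 0  # i indexes the text, j indexes the pattern
    while i < len(text):
        if text[i] == pattern[j]:
            i += 1
            j += 1
            if j == len(pattern):
                return i - j  # full match ends just before i
        elif j > 0:
            # Shift the pattern: keep the longest prefix that still matches.
            j = table[j - 1]
        else:
            # No partial match to reuse; move on in the text.
            i += 1
    return -1


print(kmp_search("ABABDABACDABABCABAB", "ABABCABAB"))  # 10
```

Note that i is never decreased, which is what keeps the search linear in the length of the text.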
Example:
The prefix function for the pattern "ABABCABAB" is [0, 0, 1, 2, 0, 1, 2, 3, 4]. The table below lists every comparison made while searching the text.
i | Text[i] | j | Pattern[j] | Result | Action |
---|---|---|---|---|---|
0 | A | 0 | A | Match | Increment i and j |
1 | B | 1 | B | Match | Increment i and j |
2 | A | 2 | A | Match | Increment i and j |
3 | B | 3 | B | Match | Increment i and j |
4 | D | 4 | C | Mismatch | Set j to prefixFunction[3] = 2 (shift 2) |
4 | D | 2 | A | Mismatch | Set j to prefixFunction[1] = 0 (shift 2) |
4 | D | 0 | A | Mismatch | j is 0, increment i |
5 | A | 0 | A | Match | Increment i and j |
6 | B | 1 | B | Match | Increment i and j |
7 | A | 2 | A | Match | Increment i and j |
8 | C | 3 | B | Mismatch | Set j to prefixFunction[2] = 1 (shift 2) |
8 | C | 1 | B | Mismatch | Set j to prefixFunction[0] = 0 (shift 1) |
8 | C | 0 | A | Mismatch | j is 0, increment i |
9 | D | 0 | A | Mismatch | j is 0, increment i |
10 | A | 0 | A | Match | Increment i and j |
11 | B | 1 | B | Match | Increment i and j |
12 | A | 2 | A | Match | Increment i and j |
13 | B | 3 | B | Match | Increment i and j |
14 | C | 4 | C | Match | Increment i and j |
15 | A | 5 | A | Match | Increment i and j |
16 | B | 6 | B | Match | Increment i and j |
17 | A | 7 | A | Match | Increment i and j |
18 | B | 8 | B | Match | Increment i and j; j reaches 9 |
After the last row, j equals the pattern length (9) and i is 19, so the pattern occurs in the text starting at index i - j = 10. The search made 23 comparisons, and i never moved backward.
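A walkthrough of this kind can be reproduced mechanically. The sketch below records every comparison the search makes for a given text and pattern; the function name `kmp_trace` and the tuple layout are assumptions made for this illustration.

```python
def prefix_function(pattern: str) -> list[int]:
    """Build the prefix-function table for the pattern."""
    table = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def kmp_trace(text: str, pattern: str):
    """Return (steps, found_at) for a search for the first occurrence.

    Each step is a tuple (i, j, text[i], pattern[j], matched).
    found_at is the match index, or -1 if the pattern does not occur.
    """
    table = prefix_function(pattern)
    steps = []
    i = j = 0
    while i < len(text) and j < len(pattern):
        matched = text[i] == pattern[j]
        steps.append((i, j, text[i], pattern[j], matched))
        if matched:
            i += 1
            j += 1
        elif j > 0:
            j = table[j - 1]  # shift the pattern, keep i in place
        else:
            i += 1  # nothing to reuse, advance in the text
    found_at = i - j if j == len(pattern) else -1
    return steps, found_at


steps, found_at = kmp_trace("ABABDABACDABABCABAB", "ABABCABAB")
for i, j, t, p, matched in steps:
    print(i, t, j, p, "match" if matched else "mismatch")
print("found at", found_at)  # found at 10
```

For this text and pattern the trace contains 23 comparisons, 7 of which are mismatches, and the value of i never decreases from one step to the next.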