We present the first formal proof that partial evaluation of a quadratic string matcher can yield the precise behaviour of the KMP string matcher. Obtaining the KMP is a canonical example of partial evaluation: one starts from the naive, quadratic program checking whether a pattern occurs in a text; one observes that backtracking can be performed at partial-evaluation time (a binding-time shift); and one obtains a specialized program that does not back up on the text, a la KMP. We are not aware, however, of any formal proof that partial evaluation of a staged string matcher precisely yields the KMP string matcher. To compare the performance of the KMP and the result of specializing a staged string matcher, we have compared the traces of the successive indices at which the text is addressed. We have found that staged string matchers keeping no negative information do not perform as well as the KMP; that staged string matches keeping one character of negative information operate in lock step with the KMP; and that staged string matchers keeping a list of characters of negative information perform better than the KMP. In this article, we formally prove that a staged string matcher keeping one character of negative information operates in lock step with the KMP. We also state the (mild) conditions under which specializing this staged string matcher with respect to a pattern string provably yields a specialized string matcher whose size and time complexity is proportional to the length of this pattern string. Finally, we show how tabulating one of the functions in this staged string matcher gives rise to the `next´table of the original KMP. The method scales for obtaining other linear string matchers, be they known or new.
Asia-pepm '02 Proceedings of the Asian Symposium on Partial Evaluation and Semantics-based Program Manipulation, 2002, p. 32-46