Newsgroups: comp.lang.apl
Path: watmath!watserv2.uwaterloo.ca!mach1!torn!utcsri!rpi!zaphod.mps.ohio-state.edu!howland.reston.ans.net!spool.mu.edu!darwin.sura.net!newsserver.jvnc.net!phage!wchang
From: wchang@phage.cshl.org (William Chang in Marr Lab - CSHL)
Subject: Re: Quality of Implementation
Message-ID: <C1uEv8.96A@phage.cshl.org>
Keywords: Needless copying, one data point.
Organization: Cold Spring Harbor Lab, Long Is New York
References: <1539@kepler1.rentec.com>
Date: Tue, 2 Feb 1993 22:37:56 GMT
Lines: 38

Here's a simple dynamic programming algorithm for
"approximate string matching":

P[1..m] = pattern
T[1..n] = text
D = (n+1)x(m+1) table whose entry 
D[i,j] = minimum number of edit operations needed to
         convert P[1..j] to _some_ T[i'..i]
where edit operations are substitutions, insertions,
or deletions of single letters.

So D[0,j] = j; D[i,0] = 0; /* boundary conditions: a match may start anywhere in T */
D[i,j] = min(D[i-1,j] + 1, /* insert T[i] */
             D[i,j-1] + 1, /* delete P[j] */
             D[i-1,j-1] + (0 if T[i]=P[j], 
                           1 o.w. /* substitute */) );
and D[i,m] = cost of the best match of the whole pattern P ending at T[i].
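A minimal Python sketch of the recurrence as the naive double loop, for comparison. The name `asm_naive` is hypothetical. Python strings are 0-based, so P[j] and T[i] above are p[j-1] and t[i-1] here, and the boundary conditions are D[i,0] = 0 and D[0,j] = j, which is what the APL function below sets up.

```python
def asm_naive(p: str, t: str) -> list[int]:
    """Return [D[1,m], ..., D[n,m]]: cost of the best match of p ending at each t[i].

    asm_naive is a hypothetical name for this sketch of the plain O(n*m) double loop.
    """
    m, n = len(p), len(t)
    # d[i][j] for 0 <= i <= n (text position), 0 <= j <= m (pattern position)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1):
        d[0][j] = j                      # empty text: delete all j pattern letters
    for i in range(1, n + 1):
        d[i][0] = 0                      # empty pattern matches anywhere at no cost
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + 1,                           # insert T[i]
                d[i][j - 1] + 1,                           # delete P[j]
                d[i - 1][j - 1] + (t[i - 1] != p[j - 1]),  # substitute (free on a match)
            )
    return [d[i][m] for i in range(1, n + 1)]
```

For example, `asm_naive("abc", "xxabcxx")` returns `[3, 3, 2, 1, 0, 1, 2]`: the exact occurrence ending at the fifth letter of the text costs 0.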

The naive doubly nested loop is very slow in APL; however,
with the simple transformation
  D'[i,j] = D[i,j] - i
the insertion term D[i-1,j] + 1 becomes just D'[i-1,j], so the
inner loop on i can be replaced by a min-SCAN (min\):

(! iota  ? rho  @ quad  ~ drop  _ high minus  min floor  <- assign  -> branch  o} lamp (comment)  [APL?!])

    Z <- P ASM T ; @IO ; J
[1] @IO <- 0         o} origin 0
[2] Z <- -!1+?T      o} transformation
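[3] J <- 1          o} J = current pattern position (column of D)
[4] LOOPJ: Z <- min\ J, (1~Z+1) min ((_1~Z)-P[J-1]=T)   o} column J of D' from column J-1
[5] ->LOOPJ -! (?P)>=J<-J+1   o} LOOPJ -! cond is "if": !1 is ,0 and !0 is empty in origin 0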
[6] Z <- 1~Z+!1+?T   o} undo transformation

which is not too slow (but still slower than C).
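Spelling out the transformation step, as an illustrative derivation from the recurrence above. Here e(i,j) is shorthand introduced only for this derivation: the substitution cost, 0 if T[i]=P[j] and 1 otherwise. The boundary values are D[i,0] = 0 and D[0,j] = j, which is what line [2] and the leading J in line [4] of ASM assume.

```latex
\begin{aligned}
D'[i,j] &= D[i,j] - i \\
        &= \min\bigl(D[i-1,j] + 1,\ D[i,j-1] + 1,\ D[i-1,j-1] + e(i,j)\bigr) - i \\
        &= \min\bigl(D'[i-1,j],\ D'[i,j-1] + 1,\ D'[i-1,j-1] + e(i,j) - 1\bigr) \\[4pt]
X[0]    &= D'[0,j] = j, \qquad
X[i] = \min\bigl(D'[i,j-1] + 1,\ D'[i-1,j-1] + e(i,j) - 1\bigr), \quad 1 \le i \le n \\
D'[i,j] &= \min\bigl(D'[i-1,j],\ X[i]\bigr) = \min_{0 \le k \le i} X[k]
\end{aligned}
```

The term along i carries no +1, so column j of D' is just the running minimum (min\) of X[0..n]. In line [4], (1~Z+1) is D'[i,j-1] + 1, and (_1~Z)-P[J-1]=T is D'[i-1,j-1] + e(i,j) - 1, since P[J-1]=T is 1 exactly where e(i,j) is 0.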

Give this a try!
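For readers without an APL interpreter at hand, here is a rough Python transliteration of ASM. It is a sketch, not the original code: the name `asm_scan` is hypothetical, Python strings are 0-based, and `itertools.accumulate(..., min)` stands in for `min\`.

```python
from itertools import accumulate


def asm_scan(p: str, t: str) -> list[int]:
    """Cost of the best match of p ending at each t[i], one column of D' per step.

    asm_scan is a hypothetical name; this transliterates ASM, it is not the original.
    """
    n = len(t)
    # [2] Z <- -!1+?T : column 0 of D' is D[i,0] - i = -i
    z = [-i for i in range(n + 1)]
    # [3]-[5] loop over pattern positions J = 1..m
    for j in range(1, len(p) + 1):
        pj = p[j - 1]
        # (1~Z+1) min ((_1~Z)-P[J-1]=T), for text positions i = 1..n
        x = [min(z[i] + 1, z[i - 1] - (t[i - 1] == pj)) for i in range(1, n + 1)]
        # [4] min\ J, ... : prepend D'[0,j] = j, then take the running minimum
        z = list(accumulate([j] + x, min))
    # [6] Z <- 1~Z+!1+?T : undo the transformation and drop row 0
    return [z[i] + i for i in range(1, n + 1)]
```

`asm_scan("abc", "xxabcxx")` returns `[3, 3, 2, 1, 0, 1, 2]`, the same as a plain double loop over the recurrence; only the loop over pattern positions remains explicit.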

-- Bill Chang (wchang@cshl.org)
