Newsgroups: comp.lang.apl
Path: watmath!watserv1!70530.1226@compuserve.com
From: Mike Kent <70530.1226@CompuServe.COM>
Subject: Your mileage may vary (APL execution efficiency)
Message-ID: <920318062402_70530.1226_CHC90-1@CompuServe.COM>
Sender: payne@watserv1.waterloo.edu (Doug Payne [DCS])
Organization: University of Waterloo
Date: Wed, 18 Mar 1992 06:24:03 GMT
Lines: 53

In article <1992Mar17.185047.8403@csi.jpl.nasa.gov>, <sam@jpl.nasa.gov>
(Sam Sirlin) asks (re speed of APL vs. C or FORTRAN for a specific
problem)

 >> Any references?


About three or four years ago I was working on optimizing some APL code
for speed.  One routine that I looked at was the usual algorithm for doing a
partitioned sum; if V is the vector to be part-summed and parallel bit
vector B marks partition ends, take the first item and then first
differences of B/+\V.  The application was spending about 10% of its time
doing this [exclusive of interface time *within* AP126/GDDM].  I figured I
could slice a good chunk of this time away by using FORTRAN via []NA, so I
coded up a FORTRAN subroutine which looped over a supplied vector of
partition lengths, and slipped along V.  The FORTRAN version was SLOWER
than the APL, so I decided to ignore the red line on the VS FORTRAN
compiler (the warning about OPT(2) sometimes leading to incorrect
results), recompiled, link-edited ... and APL2 was   s t i l l   faster
except in cases that did not arise in practice:

In all cases, V is 6000 items; timings are relative (APL2 = 1); the
variable that makes the difference is the number of partitions:

    # partitions      FORTRAN/APL ratio
    ------------      -----------------
        1              0.8                    this is really just +/
        8              1
       20              1.?
       50              1.?
      100              2.? 
      500              3.?
     1200              4.1                    the real case
     3000              4.?
     5000              5.?

Timings remembered only approximately where the ?s appear, run on a 3090
under VM; neither FORTRAN nor APL2 was using vector assist.  Just to make
sure that the []NA interface wasn't getting in the way of the timings, I
passed the same arguments with the same "names file" descriptors to a
BR14 assembler routine, and confirmed that the interface time was at
worst 1 or 2% of the cost of the FORTRAN.  Good thing the FORTRAN was so
easy to code right, or I wouldn't have had time to get any useful work
done that day.
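
For readers who don't read APL, the partitioned-sum idiom described above
(B/+\V, then the first item followed by first differences) can be sketched
roughly as follows in Python.  This is only an illustration of the idea,
not the original code; the names part_sum, v and b are made up for the
example, and b is assumed to hold a 1 at the last item of each partition.

```python
from itertools import accumulate, compress

def part_sum(v, b):
    """Partitioned sum; b[i] == 1 marks the last item of a partition."""
    running = list(accumulate(v))        # +\V   : running totals of V
    ends = list(compress(running, b))    # B/+\V : totals at partition ends
    # First item, then first differences of the selected totals.
    return [cur - prev for prev, cur in zip([0] + ends, ends)]

# Partitions of [1..6] are (1 2), (3 4 5), (6).
print(part_sum([1, 2, 3, 4, 5, 6], [0, 1, 0, 0, 1, 1]))
```

Note that the work is three whole-vector passes with no explicit loop over
partitions, so the cost depends mostly on the length of V rather than on
how many partitions there are.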

Of course, if I had subsequently translated the FORTRAN directly to APL2
and run timings on the resulting nested-loop APL fn, the results would
probably have been about the same, only with the roles of APL2 and FORTRAN
reversed (slow APL2, fast FORTRAN).
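
The nested-loop shape in question, looping over a vector of partition
lengths and moving along V, looks roughly like this.  It is a Python
sketch under my own assumptions about the structure, not the FORTRAN
subroutine itself; part_sum_loop, v and lengths are illustrative names.

```python
def part_sum_loop(v, lengths):
    """Partitioned sum driven by a vector of partition lengths."""
    sums = []
    pos = 0                              # start of the current partition
    for n in lengths:                    # outer loop: one pass per partition
        total = 0
        for item in v[pos:pos + n]:      # inner loop: items of this partition
            total += item
        sums.append(total)
        pos += n                         # move along V to the next partition
    return sums

# Same data as partitions (1 2), (3 4 5), (6) of [1..6].
print(part_sum_loop([1, 2, 3, 4, 5, 6], [2, 3, 1]))
```

In an interpreter, each trip around the outer loop carries its own
overhead, which is why a direct transliteration of this form would be
expected to suffer as the number of partitions grows.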

Moral(s):  Sometimes APL isn't slow, it's just that the problem takes
           time to solve.

