Newsgroups: comp.lang.apl
Path: watmath!watserv1!utgpu!news-server.csri.toronto.edu!torsqnt!jtsv16!itcyyz!yrloc!rbe
From: rbe@yrloc.ipsa.reuter.COM (Robert Bernecky)
Subject: Re: APL runtime optimizations?
Message-ID: <1992Mar12.234040.21580@yrloc.ipsa.reuter.COM>
Reply-To: rbe@yrloc.ipsa.reuter.COM (Robert Bernecky)
Organization: Snake Island Research Inc, Toronto
References: <ZIGGY.92Mar5192805@plex.ai.mit.edu> <1992Mar06.141720.23993@watson.ibm.com> <1992Mar10.160235.20913@yrloc.ipsa.reuter.COM> <1992Mar11.031256.5622@watson.ibm.com>
Distribution: comp
Date: Thu, 12 Mar 92 23:40:40 GMT

In article <1992Mar11.031256.5622@watson.ibm.com> gerth@watson.ibm.com (John Gerth) writes:
>optimizations and points squarely at what I think is one of the most
>important motivations for true compilation - memory allocation.
 
That, and reducing the overhead caused by primitive function setup:
"Are these arrays conformable? Are the types OK? How many loops
 do I need? What execution and fetch types are required?"
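For those who haven't looked inside an interpreter, here is a rough
sketch (Python, with names I made up for illustration) of the kind of
setup a dyadic scalar primitive must redo on every single call, before
it ever gets to the loop proper:

```python
def dyadic_scalar(fn, a, b):
    """Apply a dyadic scalar function elementwise, with the per-call
    setup checks an interpreter repeats every time (illustrative)."""
    # 1. Conformability: equal lengths, or one argument is a scalar.
    if len(a) != len(b) and len(a) != 1 and len(b) != 1:
        raise ValueError("LENGTH ERROR")
    # 2. Scalar extension: stretch a one-element argument to match.
    if len(a) == 1:
        a = a * len(b)
    if len(b) == 1:
        b = b * len(a)
    # 3. Allocate the result and finally run the loop.
    return [fn(x, y) for x, y in zip(a, b)]

dyadic_scalar(lambda x, y: x + y, [1, 2, 3], [10, 20, 30])  # [11, 22, 33]
dyadic_scalar(lambda x, y: x * y, [5], [1, 2, 3])           # [5, 10, 15]
```

A compiler can hoist most of steps 1 and 2 out of the inner code
entirely, once it knows the types and shapes.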

>...static idiom recognition
>  
>  The final bit of enabling technology for incremental compilation was
>  eventually used for static idiom recognition.  Essentially this was
>  a process of salting the run-time token string for defined objects with
>  distinguished tokens indicating that an APL idiom had been recognized
>  earlier when the object was fixed.  The mechanism could afford to be fairly
>  optimistic since the implementation provided a robust fallback to normal
>  interpretation (i.e. all the tokens were still there, but could be
>  "skipped" if the idiom succeeded).  As I remember, idioms ranged from
>  obvious ones like "rank of array" up to "blank counting " (+/^\' '=...).

I think there is a lot of promise here in the compiler world, where you
can afford to spend a fair bit of time staring at code to find patterns,
perform replacements, and make algebraic simplifications.

>  
>  All of this was revisited for the 370 vector facility in the mid-80s --
>  a happy upshot of which was forcing us to adopt IPSA's exact computation
>  of comparison tolerance (replacing APL\360's venerable unnormalized

I got into the comparison tolerance fixup biz by accident: I was trying
to write a version of membership (and indexof) for floating point arrays.
With the baroque "salimi slicer" version of comparison tolerance used
up until then ("Are the first N bits of these numbers the same? Oh, OK,
then they're equal."), it was just impossible.

I ended up working with Doug Forkes on the new definitions, and we
wound up with a far superior system in many other respects as well.
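For reference, the exact (relative) definition is roughly the
following, sketched in Python; the real IPSA rules treat signs and
edge cases more carefully than this does:

```python
def tolerant_eq(a, b, ct=1e-13):
    """Relative-tolerance equality: a and b match when their
    difference is small relative to the larger magnitude.
    (A sketch of the idea, not the exact IPSA specification.)"""
    if a == b:
        return True
    return abs(a - b) <= ct * max(abs(a), abs(b))

tolerant_eq(1.0, 1.0 + 1e-15)   # True
tolerant_eq(1.0, 1.1)           # False
```

Because the tolerance scales with the magnitudes, membership and
indexof over floating arrays become well-defined: x is a member of y
when tolerant_eq(x, yi) holds for some yi, which the bit-matching
"salimi slicer" scheme could never deliver consistently.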

>  number hack).  Due to the nature of the 370 vector hardware, Dick Dunbar
>  proposed and implemented extensions which permitted consecutive scalar
>  functions (strings) to execute on a single set of data (I don't know
>  any details of this as I was already at Watson and just doing the
>  basic vector work on a pro bono basis).       

Based on the timings I've seen, the "loop jamming" technique
(do A+B*C with one loop, rather than two loops, two sets of fetches and
stores, two storage allocations/deallocations, etc.) made a far more
substantial performance improvement than the vector facility itself.
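A rough sketch of what jamming buys you, using Python loops as a
stand-in for the interpreter's inner loops (function names are mine):

```python
def unfused(a, b, c):
    """Naive interpreter evaluation of A+B*C: B*C allocates and
    stores a temporary, then a second loop adds A. Two loops,
    double the fetch/store traffic, one alloc/dealloc."""
    t = [bi * ci for bi, ci in zip(b, c)]   # temporary array
    return [ai + ti for ai, ti in zip(a, t)]

def fused(a, b, c):
    """Jammed version: one loop, no temporary, each element is
    fetched once and stored once."""
    return [ai + bi * ci for ai, bi, ci in zip(a, b, c)]

fused([1, 2], [3, 4], [5, 6])   # [16, 26], same result as unfused
```

The results are identical; only the memory traffic differs, which is
exactly where the time goes on large arrays.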

>In closing, I should mention again support for real compilation.
>The problem with all the run-time interpreter optimizations is that
>interpreters don't deal with a large enough clump of code at one
>time to be able to make real optimizations (i.e. those which avoid
>unnecessary operations and/or temporary storage allocations).  

Interpreters are also bound by performance limitations: You can't spend
a half second staring at a function line to figure out how to make it
go fast in an interpreter. A compiler has the luxury of time to do this.

Robert Bernecky      rbe@yrloc.ipsa.reuter.com  bernecky@itrchq.itrc.on.ca 
Snake Island Research Inc  (416) 368-6944   FAX: (416) 360-4694 
18 Fifth Street, Ward's Island
Toronto, Ontario M5J 2B9 
Canada
