Newsgroups: comp.lang.apl
Path: watmath!watserv1!utgpu!news-server.csri.toronto.edu!rpi!think.com!yale.edu!jvnc.net!darwin.sura.net!haven.umd.edu!socrates!socrates!rockwell
From: rockwell@socrates.umd.edu (Raul Deluth Miller-Rockwell)
Subject: Re: J (Re: A useful example for Perl (was Re: Which language can write this but Perl))
In-Reply-To: andrew@rentec.com's message of 15 Mar 92 19:10:52 GMT
Message-ID: <ROCKWELL.92Mar15213024@socrates.umd.edu>
Sender: rockwell@socrates.umd.edu (Raul Deluth Miller-Rockwell)
Organization: Traveller
References: <id.OVWN.R6D@ferranti.com> <FISCHER.92Mar14183922@solsort.iesd.auc.dk>
	<ROCKWELL.92Mar15010848@socrates.umd.edu> <726@kepler1.rentec.com>
Date: Mon, 16 Mar 1992 02:30:24 GMT

Andrew Mullhaupt:
   Iverson clearly set out to prove that it was possible to have a
   less readable language than APL without using a unique, runic
   character set.

Funny, that's what most of the APL programmers who I know say about
_any_ language which doesn't use the APL character set.

   Wait a while. As extensible as any language purports to be, as the
   years go by, people end up having to extend the language by
   reclaiming the syntax errors.  If the language survives, the syntax
   errors will go away.

I dunno... what kind of extensions did you have in mind?

I wrote:
   >   not_in=:     -. @ e.
   >   short_seq=:  >:   ,&0 @}.
   >   delim=:      short_seq  in not_in esc
   >   out=:   ; transl {~ alph i. delim <;.2 in

Andrew Mullhaupt:
   Ah yes. It's all becoming clear to me. This means that you must
   have gone wrong somewhere. Should you be using so many alphabetic
   characters?

Obviously an invitation for more documentation:

e.  is a set membership operator.  It checks each item in the left
argument for membership in the set of objects in the right argument.
The result is boolean, and each 1 or 0 corresponds to an element in
the left argument.  For example,
        'This is a test' e. 'hers'
0 1 0 1 0 0 1 0 0 0 0 1 1 0
Here, the first element of the result is 0 because 'T' does not occur
in 'hers'.
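
For readers more at home in a conventional language, here is a rough
Python model of dyadic e. on flat lists.  It is only a sketch: the
name 'member' is made up for illustration, and J's real e. also
handles higher-rank arrays, which this ignores.

```python
def member(left, right):
    """Model of x e. y for flat sequences: 1 where an item of x occurs in y."""
    pool = list(right)
    return [1 if item in pool else 0 for item in left]

print(member('This is a test', 'hers'))
# prints [0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0]
```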

-. is logical negation.  It just changes 0s to 1s and vice versa.

@ is function composition.  If you're not used to functional
languages, probably the best analogy is to unix's pipe.  -. @ e. takes
the result of e. and feeds it into -.  So the line that says
        not_in=:     -. @ e.
defines a boolean operation which returns 1 for each element of the
left argument which is not in the right argument.
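
The pipe analogy can be sketched in Python as follows.  The helper
names ('member', 'negate', 'atop') are invented for this sketch, and
it assumes flat lists only, so it glosses over how @ interacts with
rank in real J.

```python
def member(left, right):
    """Model of x e. y: 1 where an item of x occurs in y."""
    pool = list(right)
    return [1 if item in pool else 0 for item in left]

def negate(bits):
    """Model of monadic -. on a boolean list: swap 0s and 1s."""
    return [1 - b for b in bits]

def atop(f, g):
    """Model of f @ g used dyadically: g sees both arguments, f sees g's result."""
    return lambda x, y: f(g(x, y))

not_in = atop(negate, member)

print(not_in('ab/c///de/f/g/', '/'))
# prints [1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0]
```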

}. drops an item off the front of an array.  For example,
        }. 1 2 3 4
2 3 4

, is a generic catenate operation.  For example,
        2 3 4 , 0
2 3 4 0

& curries an infix operation by fixing one of the arguments.  The
result is a prefix operation.  Therefore, the part of the code which
says 
        ,&0 @}.
defines a left shift operation.  There are other ways of defining left
shift, but that's not important here.
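
As a rough illustration, the same left shift can be modelled in
Python, with functools.partial standing in for & fixing the right
argument of catenate.  The function names are hypothetical, and only
flat lists are handled.

```python
from functools import partial

def catenate(left, right):
    """Model of dyadic , on flat lists."""
    return list(left) + list(right)

def behead(xs):
    """Model of monadic }. : drop the first item."""
    return list(xs)[1:]

# Model of ,&0 : catenate with its right argument fixed to 0.
append_zero = partial(catenate, right=[0])

def shift_left(xs):
    """Model of ,&0 @}. : drop the first item, then append a 0."""
    return append_zero(behead(xs))

print(shift_left([1, 2, 3, 4]))
# prints [2, 3, 4, 0]
```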

>: is similar to C's >= (in other words, it returns 1 where the left
argument is greater than or equal to the right argument).  The reason
>: is used instead of >= lies in J's parsing rules -- in J, >: is a
single token while >= is two tokens.  If you feed >: boolean
arguments, the result behaves according to this truth table:

           right
           arg
      >:   0  1

left  0    1  0
arg   1    1  1
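
The table can be checked mechanically.  Below is a small Python
sketch that treats >: as an item-by-item comparison of two lists of
equal length; 'ge' is just a name picked for the sketch.

```python
def ge(left, right):
    """Model of x >: y on equal-length lists: 1 where the left item >= the right item."""
    return [1 if l >= r else 0 for l, r in zip(left, right)]

# Walk the four rows of the boolean truth table: left, right, result.
for l in (0, 1):
    for r in (0, 1):
        print(l, r, ge([l], [r])[0])
```

The only 0 in the result appears when the left argument is 0 and the
right argument is 1.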


Next, a naked sequence of two functions results in a derived function.
If f and g are functions, and x is data, 
        (f g) x
is equivalent to
        x (f g) x
which is equivalent to
        x f g x
which is equivalent to
        x f (g x)
It's equivalent to other things, but I'll stop here.
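
In Python terms, and only as an approximation, this two-function
form could be modelled by a helper that builds a one-argument
function out of a two-argument f and a one-argument g.  The names
'hook', 'times' and 'square' are illustrative, not J vocabulary used
elsewhere in this post.

```python
def hook(f, g):
    """Model of (f g) applied to one argument x: compute x f (g x)."""
    return lambda x: f(x, g(x))

def times(a, b):
    return a * b

def square(a):
    return a * a

cube = hook(times, square)   # x times (square x)

print(cube(3))
# prints 27
```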

Anyways, the function definition
        short_seq=:  >:   ,&0 @}.
applies to a boolean list and returns a 1 for each 1 in the original.
It also returns a 1 for each 0 in the original which has a 0 to the
right [and, because ,&0 shifts a 0 onto the right end of the list,
you're guaranteed that the rightmost element of the result is a 1].
But, short_seq returns a 0 for each 0 in the argument which has a 1 to
the right.  In other words, if 0s in the argument mark each occurrence
of an escape character, 0s in the result mark each occurrence of an
escape character followed by a non-escape character.

For example:
        short_seq  1 1 0 1 0 0 0 1 1 0 1 0 1 0
1 1 0 1 1 1 0 1 1 0 1 0 1 1

Or, if 'in' is a variable holding the text 'ab/c///de/f/g/', and 'esc'
is a variable holding '/', then
        delim=: short_seq  in not_in esc
will set delim to: 1 1 0 1 1 1 0 1 1 0 1 0 1 1
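
Here is a Python sketch of short_seq and the delim computation, for
anyone who wants to trace it by hand.  It assumes flat boolean lists
and a single-character escape; 'text' stands in for the variable
'in', which is a reserved word in Python.

```python
def not_in(left, right):
    """Model of -. @ e. : 1 for each item of left that is not in right."""
    return [0 if item in right else 1 for item in left]

def short_seq(bits):
    """Model of >: ,&0 @}. : compare each bit with its right-hand neighbour."""
    shifted = list(bits[1:]) + [0]          # left shift, 0 comes in at the end
    return [1 if b >= s else 0 for b, s in zip(bits, shifted)]

text = 'ab/c///de/f/g/'
esc = '/'
delim = short_seq(not_in(text, esc))

print(delim)
# prints [1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1]
```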

;. is a functional which will apply a function to each part of a
sequence.  If f is a function, n is a number, x is a boolean array
with a 1 indicating delimiting characters, and y is the sequence to be
parsed,
        x  f;.n  y
will apply f to each of the subsequences in y indicated by x.  The
number n indicates if delimiters are leading or trailing delimiters,
and whether or not the delimiters are to be seen by f.  If n is 2,
delimiters are trailing, and delimiters are seen by f.

< when used as a monadic operation is analogous to & used as a monadic
operation in C.  In other words, it returns a reference to an array.
In J, array references have a print representation which consists of a
box drawn around the contents of the array.

So, with the sample I've been using ('in' defined as
'ab/c///de/f/g/'), monadic < is applied to the following sequences:
        'a'
        'b'
       '/c'
        '/'
        '/'
       '/d'
        'e'
       '/f'
       '/g'
        '/'
And, the print representation of that is:
+-+-+--+-+-+--+-+--+--+-+
|a|b|/c|/|/|/d|e|/f|/g|/|
+-+-+--+-+-+--+-+--+--+-+
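
A loose Python model of x f;.2 y might look like the following.
'cut_trailing' is a made-up name, each box is modelled as a plain
Python string, and the sketch covers only the n=2 case described
above (trailing delimiters, included in each piece).

```python
def cut_trailing(marks, seq, f):
    """Model of marks f;.2 seq: each 1 in marks ends a piece, and f sees the piece."""
    pieces = []
    start = 0
    for i, mark in enumerate(marks):
        if mark:
            pieces.append(f(seq[start:i + 1]))
            start = i + 1
    # Anything after the last 1 belongs to no piece and is dropped.
    return pieces

delim = [1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1]
text = 'ab/c///de/f/g/'

# The identity function stands in for monadic < here.
print(cut_trailing(delim, text, lambda piece: piece))
# prints ['a', 'b', '/c', '/', '/', '/d', 'e', '/f', '/g', '/']
```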

i. is a lookup function: it looks up each item of the right argument
in the left argument, and returns the index at which it is found.

For example, if 'alph' is a list which contains character sequences
(either single character, or escaped characters), such as:
+-+-+-+-+-+-+-+-+-+-+-+--+--+--+--+--+--+--+--+--+--+
|/|a|b|c|d|e|f|g|h|i|j|/a|/b|/c|/d|/e|/f|/g|/h|/i|/j|
+-+-+-+-+-+-+-+-+-+-+-+--+--+--+--+--+--+--+--+--+--+
then 
        alph i.  delim <;.2 in
would have the result
        1 2 13 0 0 14 5 16 17 0

Ideally, alph would contain every ASCII character, and every escape
sequence, but for test cases that is not necessary.
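
The lookup can be sketched in Python with the boxes modelled as
strings in a list.  'index_of' is a hypothetical helper; like J's
i., it returns the length of the table for an item that is not
found.

```python
def index_of(table, items):
    """Model of table i. items: index of each item in table, or len(table) if absent."""
    return [table.index(it) if it in table else len(table) for it in items]

# The small test alphabet from above: '/', a..j, then /a../j.
alph = ['/'] + list('abcdefghij') + ['/' + c for c in 'abcdefghij']
pieces = ['a', 'b', '/c', '/', '/', '/d', 'e', '/f', '/g', '/']

print(index_of(alph, pieces))
# prints [1, 2, 13, 0, 0, 14, 5, 16, 17, 0]
```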

{ is an indexing function.  x{y is analogous to y[x] in C.

~ is a functional which reverses the order of arguments to an
operation.  So, a{~b is equivalent to b{a in J, and is similar to a[b]
in C.

So, if transl is a translate table, for instance
+-+-+-+-+-+-+-+-+-+-+-+---+---+---+---+---+---+---+---+---+---+
|/|a|b|c|d|e|f|g|h|i|j|APE|BAT|CAT|DOG|ELF|FLY|GNU|HOT|ILK|JON|
+-+-+-+-+-+-+-+-+-+-+-+---+---+---+---+---+---+---+---+---+---+
then 
        transl {~ 1 2 13 0 0 14 5 16 17 0
would yield
+-+-+---+-+-+---+-+---+---+-+
|a|b|CAT|/|/|DOG|e|FLY|GNU|/|
+-+-+---+-+-+---+-+---+---+-+
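
In Python, indexing and the argument swap could be modelled roughly
as below.  'select' and 'swap' are names invented for the sketch,
and the boxes are again plain strings in a list.

```python
def select(indices, table):
    """Model of indices { table: pick the items of table at the given indices."""
    return [table[i] for i in indices]

def swap(f):
    """Model of f~ used dyadically: call f with its two arguments exchanged."""
    return lambda x, y: f(y, x)

transl = ['/'] + list('abcdefghij') + [
    'APE', 'BAT', 'CAT', 'DOG', 'ELF', 'FLY', 'GNU', 'HOT', 'ILK', 'JON']

# Model of transl {~ 1 2 13 0 0 14 5 16 17 0
print(swap(select)(transl, [1, 2, 13, 0, 0, 14, 5, 16, 17, 0]))
# prints ['a', 'b', 'CAT', '/', '/', 'DOG', 'e', 'FLY', 'GNU', '/']
```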

; when used monadically, will take a list of array references and
catenate those arrays together.  For example
        ; transl {~ 1 2 13 0 0 14 5 16 17 0
yields
abCAT//DOGeFLYGNU/
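
Putting the pieces together, the whole of
        out=:   ; transl {~ alph i. delim <;.2 in
can be approximated by one Python function.  This is a sketch under
the same assumptions as the walkthrough (single-character escape,
small test tables); 'translate' and 'text' are names chosen here,
not part of the J code.

```python
def translate(text, esc, alph, transl):
    """Rough model of: ; transl {~ alph i. delim <;.2 in"""
    not_esc = [0 if c == esc else 1 for c in text]        # in not_in esc
    shifted = not_esc[1:] + [0]                           # ,&0 @}.
    delim = [1 if a >= b else 0
             for a, b in zip(not_esc, shifted)]           # short_seq
    pieces, start = [], 0
    for i, mark in enumerate(delim):                      # delim <;.2 in
        if mark:
            pieces.append(text[start:i + 1])
            start = i + 1
    # alph.index raises ValueError for a piece missing from alph.
    return ''.join(transl[alph.index(p)] for p in pieces)

alph = ['/'] + list('abcdefghij') + ['/' + c for c in 'abcdefghij']
transl = ['/'] + list('abcdefghij') + [
    'APE', 'BAT', 'CAT', 'DOG', 'ELF', 'FLY', 'GNU', 'HOT', 'ILK', 'JON']

print(translate('ab/c///de/f/g/', '/', alph, transl))
# prints abCAT//DOGeFLYGNU/
```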

I wrote:
   >To provide full generality, you'd want to replace the definition
   >of 'short_seq' with something a little more involved (like a state
   >machine with more than two states)...

Andrew Mullhaupt:
   Yes.  Preferably a universal self-replicating multi-tape Turing
   machine with proof of undecidability for every input, which
   rewrites itself into the shortest possible J representation and
   composes poetry, too.

Um... I don't think that's necessary.  All I was trying to say was
that if you want a slash in the result you might want to make it so
that '//' returns a slash.  If you don't need that (for instance, if
you want '/s' to return a slash), then the above code would do fine.
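
To make the '//' case concrete, here is one possible tokenizer,
sketched in Python rather than J.  It is not the code from this
post and not the larger state machine alluded to above: it is just
the smallest variation I can think of, with one "pending escape"
flag, in which an escape character always grabs the character after
it.  The name 'tokenize' is made up.

```python
def tokenize(text, esc='/'):
    """Split text into single characters and two-character escape sequences."""
    tokens = []
    pending = ''                      # an escape character awaiting its follower
    for ch in text:
        if pending:
            tokens.append(pending + ch)   # covers '/c' and also '//'
            pending = ''
        elif ch == esc:
            pending = ch
        else:
            tokens.append(ch)
    if pending:
        tokens.append(pending)        # dangling escape at the very end
    return tokens

print(tokenize('ab/c///de/f/g/'))
# prints ['a', 'b', '/c', '//', '/d', 'e', '/f', '/g', '/']
```

With this scheme alph and transl would need an entry for '//' that
maps to a slash.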

[Yes, I recognize Andrew's comments as sarcasm, but I think he's
laying it on a little thick.  Anyways, hopefully this will have
cleared up any major questions about that section of code...]

-- 
Raul Deluth Miller-Rockwell                   <rockwell@socrates.umd.edu>
