Newsgroups: comp.lang.apl
Path: watmath!watserv1!utgpu!news-server.csri.toronto.edu!rpi!usc!wupost!darwin.sura.net!haven.umd.edu!socrates!socrates!rockwell
From: rockwell@socrates.umd.edu (Raul Deluth Miller-Rockwell)
Subject: Re: building an emacs TAGS file
In-Reply-To: rockwell@socrates.umd.edu's message of Sun, 23 Feb 1992 18:43:29 GM
Message-ID: <ROCKWELL.92Feb23142124@socrates.umd.edu>
Sender: rockwell@socrates.umd.edu (Raul Deluth Miller-Rockwell)
Organization: Traveller
References: <ROCKWELL.92Feb22122907@socrates.umd.edu>
Date: Sun, 23 Feb 1992 19:21:24 GMT
Lines: 203

[ second posting of this text -- I think I cancelled the first copy
before it got out, but there were a couple things in the code that I
was a bit embarrassed to have displayed. ]

emacs has several features for dealing with source code spread out
over a number of files.  However, the program that builds the cross
reference information (/usr/local/emacs/etc/etags) cannot deal with
the j source code, because of its heavy use of the preprocessor.

The obvious solution, since my main goal is to learn my way around j,
is to write a version of etags in j which can deal with this code.
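
To make the target concrete, here is a rough C sketch of the sort of
line-oriented tagging involved.  It follows the TAGS record layout
described in the comments of the j script below (line text through the
defined name, DEL, line number, ',', offset of the start of the line),
and its prefix list mirrors the script's F1( ... TACT( rules.  The
rest is an assumption of the sketch, not the script's behavior: the
names tags_body and write_section are made up, only lines starting at
the left margin are examined, lines are counted from 1, the #define
and "any other line" rules are left out, and the body goes through a
fixed-size buffer.

```c
#include <stdio.h>
#include <string.h>

/* Prefixes taken from the script's w1..w6: a line starting with one of
   these defines the name that follows, up to the next ',' or ')'. */
static const char *def_words[] = { "F1(", "F2(", "S1(", "S2(", "FMTX(", "TACT(" };

/* Write the tag records for one file's text into body (at most cap bytes)
   and return the number of bytes written.  Each record is: the line's text
   through the ',' or ')' ending the name, DEL (127), the line number, ',',
   the offset of the start of the line, newline. */
size_t tags_body(const char *text, char *body, size_t cap)
{
    size_t len = 0;
    long line = 1;                          /* assumption: first line is 1 */
    for (const char *p = text; *p; line++) {
        const char *eol = strchr(p, '\n');
        size_t linelen = eol ? (size_t)(eol - p) : strlen(p);
        for (size_t k = 0; k < sizeof def_words / sizeof def_words[0]; k++) {
            size_t wl = strlen(def_words[k]);
            if (linelen < wl || strncmp(p, def_words[k], wl) != 0)
                continue;
            size_t end = wl;                /* find ',' or ')' after the prefix */
            while (end < linelen && p[end] != ',' && p[end] != ')')
                end++;
            if (end == linelen)             /* no terminator: skip this line */
                break;
            int n = snprintf(body + len, cap - len, "%.*s%c%ld,%ld\n",
                             (int)(end + 1), p, 127, line, (long)(p - text));
            if (n < 0 || (size_t)n >= cap - len)
                return len;                 /* buffer full: stop early */
            len += (size_t)n;
            break;                          /* at most one record per line */
        }
        p = eol ? eol + 1 : p + linelen;    /* advance to the next line */
    }
    return len;
}

/* Emit one TAGS section: form feed, newline, "name,body-length", newline,
   then the body itself. */
void write_section(FILE *out, const char *fname, const char *text)
{
    char body[4096];
    size_t n = tags_body(text, body, sizeof body);
    fprintf(out, "\f\n%s,%zu\n", fname, n);
    fwrite(body, 1, n, out);
}
```

A caller would invoke write_section once per source file, much as the
script's last line runs its tagger over every *.[ch] file.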

As I mentioned in an earlier article, the execution time for a couple
of derived functions (the scans +/\ and >./\) was a significant
obstacle to implementing this sort of program.  I had a couple of
problems getting LINKJ to compile, so eventually I just modified x.c
to support compiled versions of these functions directly.

Below is the code, in case anyone else wants to use it.  Efficiency is
still nothing to write home about, and the code could stand a facelift
or two, but this is something of a one-off job...

First off, here are the diffs for my modified x.c:

221a222,229
> static F1(p_scan){A z;
>  ASSERT(1==AR(w),EVRANK);       /* w must be vector */
>  ASSERT(BOOL&AT(w),EVDOMAIN);   /* w must be boolean */
>  RZ(z=ga(INT,AN(w),1L,0L));     /* allocate space for result */
>  *AV(z)=*AV(w);                 /* result has same shape as argument */
>  {I*zv=AV(z),s=0;B*wv=(B*)AV(w);DO(AN(w),zv[i]=s+=wv[i];);}; /* plus scan */
>  R z;
> }
222a231,240
> static F1(m_scan){A z;
>  ASSERT(1==AR(w),EVRANK);       /* w must be vector */
>  ASSERT(INT&AT(w),EVDOMAIN);    /* w must be integer */
>  RZ(z=ga(INT,AN(w),1L,0L));     /* allocate space for result */
>  *AV(z)=*AV(w);                 /* result has same shape as argument */
>  /* should return here if z empty, to avoid seg fault on some machines */
>  {I*zv=AV(z),*wv=AV(w),s=*wv;DO(AN(w),zv[i]=s=MAX(s,wv[i]););}; /* max scan */
>  R z;
> }
> 
289a308,309
>   case XC(10,1):  R CDERIV(CIBEAM, p_scan,  0L,       1L,   0L,   0L   );
>   case XC(10,2):  R CDERIV(CIBEAM, m_scan,  0L,       1L,   0L,   0L   );
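
For anyone who does not read the interpreter's C, here is a plain C
sketch of what the two new primitives compute (reached from the script
as 10!:1 and 10!:2 through the XC(10,1) and XC(10,2) cases above), and
of how the script turns a running maximum into "index of the most
recent hit" (its locate_hits and prev_in).  This is only a sketch, not
the interpreter's API: the names plus_scan, max_scan and prev_hit are
made up, and max_scan returns early on an empty vector, as the comment
in the diff says m_scan should.

```c
#include <stddef.h>

/* Running sum of a 0/1 vector, z[i] = w[0] + ... + w[i]; what +/\
   computes on a boolean list. */
void plus_scan(const unsigned char *w, long *z, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++) {
        s += w[i];
        z[i] = s;
    }
}

/* Running maximum, z[i] = max(w[0], ..., w[i]); what >./\ computes on an
   integer list.  Returns early on an empty vector instead of reading w[0]. */
void max_scan(const long *w, long *z, size_t n)
{
    if (n == 0)
        return;
    long s = w[0];
    for (size_t i = 0; i < n; i++) {
        if (w[i] > s)
            s = w[i];
        z[i] = s;
    }
}

/* prev_in in C: replace each hit by its index (0 elsewhere), then take the
   running maximum, giving for every position the index of the latest hit
   at or before it.  As in the J, a hit at index 0 looks like "no hit yet".
   tmp must have room for n elements. */
void prev_hit(const unsigned char *mask, long *tmp, long *z, size_t n)
{
    for (size_t i = 0; i < n; i++)
        tmp[i] = mask[i] ? (long)i : 0;
    max_scan(tmp, z, n);
}
```

Each of these is a single linear pass over its argument, which is all
the script needs from them.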

Second, the j script file (which I've named etags.js) follows, below
my signature line.

-- 
Raul Deluth Miller-Rockwell                   <rockwell@socrates.umd.edu>

NB. this is intended to be used as a silent script...

NB. Generate an emacs TAGS file for Hui's implementation of the J
NB. interpreter.  etags cannot deal with the source, because of the
NB. extensive use of preprocessor directives.  A totally general
NB. solution would use cpp to process the code and strip out comments,
NB. use the # line-number/file-name hints to locate which lines correspond
NB. to which lines of the original, replace any quoted strings with
NB. white space, then locate all defined names in the original text to
NB. generate the search targets...
NB.
NB. This version works by explicitly listing each "definition word"
NB. and having an extraction rule for each word.  Also, to keep things
NB. simple, all definitions must start on the left side of the line
NB. (with no leading whitespace) -- much like existing implementations
NB. of etags.
NB.
NB. The structure of a TAGS file is as follows:
NB.  12 10 { a.
NB.  source file name, ',', length of body, 10 { a.
NB.  0 or more repetitions of:
NB.        text from beginning of line up through defined word,
NB.        (127 { a.), line number definition appears on,
NB.        ',', character number of beginning of line, 10 { a.
NB. This whole thing is repeated once for each additional source file...
NB.
NB. In the case of multiple hits on the same line, the one with the
NB. longest text is used.
NB.
NB. Needless to say, the implemented algorithm is highly heuristic,
NB. and not likely to be robust in the face of other coding styles.
NB. Much like etags...

nl=. 10{a.
ff=. 12{a.
del=. 127{a.

NB. allow previously defined values to override 
ttyf=. '<''/dev/tty''' ". 'ttyf'        NB. use /dev/null for quietness
tagf=. '<''TAGS'''     ". 'tagf'        NB. tags file

cmd =. 0!:0                             NB. execute unix command
read =. 1!:1                            NB. read from file
write =. 1!:2                           NB. write to file (replace)
append =. 1!:3                          NB. append to file
noise   =. write&ttyf @:(,&nl)
noise 'Building etags function' 

NB. Words where the defined word is followed by the second space character
w0=. '#define '&E.

NB. Words where defined word is followed by right parenthesis
w1=. 'F1('&E.
w2=. 'F2('&E.
w3=. 'S1('&E.
w4=. 'S2('&E.
w5=. 'FMTX('&E.
w6=. 'TACT('&E.
wex=. _3&|. @:('PRE'&E.)                NB. exclude PREF1( and PREF2(
ww=. (wex < w1 +. w2) +. w3 +. w4 +. w5 +. w6

NB. All remaining non-blank lines which do not begin with one of these
NB. characters ('#/} e') are considered definitions [and the defining
NB. word is considered to be followed by a left parenthesis (included
NB. in TAGS file), or a newline character (not included in TAGS file),
NB. whichever comes first].
nw=. (_1&|. @:(=&nl)) > e.&('#/} e',nl)

NB. functions to correlate search hits with other nearby search hits
locate_hits =. *   i.@#                 NB. significant index numbers, in place

greatest_before =. >./\                 NB. propagate big numbers to end
greatest_before =. 10!:2                NB. *** use compiled code...
prev_in=. greatest_before @:locate_hits NB. previous hit indices

least_after =. <./\.                    NB. propagate small numbers to front
NB. find for each character the index of the next hit:
next_in =. >: @:least_after @:(# | <:@:locate_hits) 
NB. *** use compiled code:
next_in =. greatest_before &. (r :.r=.# | >: @:- @:|.) @:locate_hits

NB. lin_mat emits a two-row matrix with one column for each character
NB. in the argument (the functions below transpose it with |: before
NB. selecting rows).  The first row is the line number on which each
NB. character occurs; the second row is the offset (in characters)
NB. between the beginning of the file and the beginning of that line.
lin_mat =. (+/\ ,: >:@prev_in) @:(=&nl) NB. locate line of each character
lin_mat =. (10!:1,:>:@prev_in) @:(=&nl) NB. *** use compiled code...



NB. the next batch of functions are intended to be used as:
NB.           (fn  lin_mat) text
NB. that is, left arg is text, right is lin_mat...
NB. These functions reduce the data provided by lin_mat to occurrences
NB. of the search predicates defined earlier.  These functions also
NB. add a third column to the result, which is the offset from the
NB. beginning of the file to the end of the significant text for this
NB. particular hit.  I should probably have lin_mat compute the
NB. next_eol info, to avoid recomputing it each time [and, technically,
NB. defw should also be clipped at next_eol.]

next_space   =. next_in @:(=&' ')
second_space =. ({~ >:) @:next_space
def0 =.         w0@[ (# |:) ] , second_space@[ <. next_eol@[

next_rparen  =. next_in @:(e.&',)')
defw =.         ww@[ (# |:) ] , next_rparen@[

next_lparen  =. next_in @:(=&'(')
next_eol     =. <: @next_in @:(=&nl)
defn =.         nw@[ (# |:) ] , next_lparen@[ <. next_eol@[

NB. combine results, remove redundant defn's, then sort into a
NB. heuristically useful order (increasing length)
numeric_tags   =. lensort @:kill @:(\:~) @:(def0, defw, defn)
shift_minus_one=. _1&(|.!._1)           NB. x, y, z ->  _1, x, y
killA  =. #~ (~: shift_minus_one) @, @(0 _2&}.) NB. keep longest refs
killB  =. #~   * @, @(0 2&}.)           NB. kill deviance from t.c
kill   =. killA @:killB
lensort=. \:   -/ @:|: @:(0 1&}.)       NB. shorter hits are more specific

NB. These next functions format data for the TAGS file.  Intended to be used like this:
NB.     (tags_line"_ 1 (numeric_tags lin_mat)) text
NB. that is, left arg is text, right arg is start/end of text to extract
first =. 0&{
second =. 1&{
third   =. 2&{
extract  =. {~  second@] + i. @:>: @(third - second)@]
tags_line =. extract, sep3, ":@first@], sep1, ":@second@], sep2

tags_body   =. ; @:(<@tags_line"_ 1 (numeric_tags lin_mat))

sep0 =. (ff,nl)"_                       NB. marks new section on new file
sep1 =. ','"_                           NB. ending character number follows
sep2 =. nl"_                            NB. marks end of line (end of record)
sep3 =. del"_                           NB. marks end of search text
body_format   =. sep1, ":@#, sep2, ]    NB. need length as well as tags text 

NB. wrap into a noisy package
NB. each of these operations takes a filename argument
t_start       =. noise @:('Now extracting tags from '&,)
t_end         =. noise @:('Done with '&,)
tagify        =. append&tagf @:(sep0,  [,  body_format @tags_body @read @<)
noisy_tagify  =. ([ t_end) @:([ tagify) @:([ t_start)

(read tagf) write tagf ,&.> '~'         NB. quietly back up old tags file
1!:55 tagf                              NB. quietly delete original copy
noisy_tagify;._2 cmd '/bin/ls *.[ch]'
