                       Technical Information
                       for the APL Newsreader

                           November 1994


This document contains technical information about the newsreader.  You
need not read it if you are only interested in reading the comp.lang.apl
archives.

   The newsreader can be used to browse files other than the
comp.lang.apl archive.  I personally use the newsreader to maintain an
archive of my e-mail and to browse other USENET newsgroups that I
download from a Unix machine.

   A message file is an ASCII text file with each line followed by
either carriage return plus linefeed or linefeed alone (as is the
convention on Unix systems).  Each message should be followed by an
ASCII Record Separator character (decimal 30) on a line by itself.  The
software tolerates and ignores a Ctrl-Z end-of-file mark as the last
character of the file.
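
   As an illustration of this layout, here is a minimal Python sketch
that splits the raw bytes of a message file into messages.  It is not
the newsreader's own code; the name split_messages is hypothetical, and
keeping trailing text that lacks a final separator is my assumption.

```python
RS = b"\x1e"      # ASCII Record Separator (decimal 30)
CTRL_Z = b"\x1a"  # DOS end-of-file mark

def split_messages(data):
    """Split the raw bytes of a message file into a list of messages."""
    # Ignore a Ctrl-Z if it is the last character of the file.
    if data.endswith(CTRL_Z):
        data = data[:-1]
    # Treat CR+LF and bare LF line endings alike.
    lines = data.replace(b"\r\n", b"\n").split(b"\n")
    messages, current = [], []
    for line in lines:
        if line == RS:                 # separator on a line by itself
            messages.append(b"\n".join(current))
            current = []
        else:
            current.append(line)
    # Assumption: keep trailing text that has no separator after it.
    if any(line.strip() for line in current):
        messages.append(b"\n".join(current))
    return messages
```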

   Messages can be moved from a Unix system to a PC using the following
procedure:  Have your Unix newsreader write the messages to separate
files named cla.1, cla.2, cla.3, etc.  For example, using the "tin"
newsreader, you would tag all the messages (by pressing capital "T"
repeatedly), save them (by pressing "s"), give a file name prefix of
"cla", and specify no preprocessing (press "n").  Then exit the
newsreader and execute the following Unix commands to join the messages
together into a single file with a Record Separator character after
each message.

        touch cla                 -- create the output file
        cd News                   -- switch to the News directory
        foreach f (cla*)          -- Loop for each cla.1, cla.2, etc.
          cat $f >>../cla         --   append cla.n to cla
          echo ^^ >>../cla        --   append the Record Separator
                                       (Note: "^^" is entered as Ctrl-6.)
        end                       -- Endloop
        rm cla*                   -- erase cla.1, cla.2, etc.
        cd ..                     -- return to home directory 

This sequence assumes that the C shell is in use, and that the messages
are stored in a subdirectory named "News".  After doing this, the file
"cla" contains all the messages.  Zip it up and Zmodem it to your PC.
(How you do this will vary from system to system.)
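
   If the C shell is not available, the same join can be sketched in
Python.  This is only an illustration under my own assumptions: the
function name join_messages is hypothetical, the files are taken in
numeric order (cla.2 before cla.10), and, unlike the commands above,
the individual files are not erased.

```python
from pathlib import Path

RS_LINE = b"\x1e\n"  # Record Separator followed by a linefeed

def join_messages(news_dir, prefix, out_path):
    """Append every file named prefix.N in news_dir to out_path."""
    parts = [p for p in Path(news_dir).glob(prefix + ".*")
             if p.suffix[1:].isdigit()]
    parts.sort(key=lambda p: int(p.suffix[1:]))   # numeric order
    with open(out_path, "ab") as out:
        for part in parts:
            data = part.read_bytes()
            out.write(data)
            if data and not data.endswith(b"\n"):
                out.write(b"\n")   # keep the separator on its own line
            out.write(RS_LINE)
    return len(parts)              # number of files appended
```

For the example above, the call would be join_messages("News", "cla",
"cla").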


Extending a Message File

You can append new messages to a file by using the "extend" command from
command mode in the newsreader.  The syntax is:

        EXTEND  targetfile  sourcefile

"targetfile" is the name of the file that will be extended, and
"sourcefile" is the file containing the new data.  Both files should be
message files in the format described above.  All that EXTEND does is
catenate the two files.  Users who are programmers can easily write
their own APL functions or DOS procedures to accomplish the same task.
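
   For example, a do-it-yourself version might look like the Python
sketch below.  It is not the EXTEND command itself.  The function name
is hypothetical, and removing a trailing Ctrl-Z from the target before
appending is my own precaution, not documented behavior of EXTEND.

```python
def extend(target_path, source_path):
    """Append the bytes of source_path to the end of target_path."""
    with open(source_path, "rb") as src:
        new_data = src.read()
    with open(target_path, "r+b") as tgt:
        tgt.seek(0, 2)                  # go to end of file
        size = tgt.tell()
        if size:
            tgt.seek(size - 1)
            if tgt.read(1) == b"\x1a":  # trailing Ctrl-Z (assumption:
                tgt.seek(size - 1)      # drop it so it does not end
                tgt.truncate()          # up in the middle of the file)
        tgt.seek(0, 2)
        tgt.write(new_data)
```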

Note:  Do not use EXTEND to join two directory files (files having the
extension .DIR); the directory for the target file will be updated
automatically the next time you view the file.


Message Header Format

The first nonblank line in a message marks the start of the message
header.  The header ends with the first blank line following a line that
begins with "Date:".  The header is examined to find the date, author,
and subject of the message.  These are extracted from lines that begin
with "Date:", "From:", and "Subject:", respectively.  (The search is
case-sensitive.)  If a "Reply-To:" line is found, it will be used in
place of the "From:" line.  (This finds the correct author address for
people posting via the APL$L mailing list.)  If no "From:" line is
found, a line that begins ">From:" will be used as a substitute.
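
   The rules above can be sketched in Python as follows.  This is an
illustration only, not the newsreader's code.  The name parse_header is
hypothetical, and two details are my assumptions: the first occurrence
of each field wins, and blank lines that come before the "Date:" line
do not end the header.

```python
def parse_header(message):
    """Return the date, author, and subject lines of one message."""
    lines = message.splitlines()
    start = 0
    while start < len(lines) and not lines[start].strip():
        start += 1                      # skip leading blank lines
    keys = ("Date:", "From:", "Subject:", "Reply-To:", ">From:")
    fields = {}
    seen_date = False
    for line in lines[start:]:
        if not line.strip():
            if seen_date:
                break                   # blank line after "Date:" ends it
            continue
        for key in keys:                # case-sensitive match
            if line.startswith(key) and key not in fields:
                fields[key] = line[len(key):].strip()
        if line.startswith("Date:"):
            seen_date = True
    # "Reply-To:" replaces "From:"; ">From:" is the last resort.
    author = (fields.get("Reply-To:") or fields.get("From:")
              or fields.get(">From:"))
    return {"date": fields.get("Date:"), "author": author,
            "subject": fields.get("Subject:")}
```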

   The Date parser accepts the two forms commonly found in USENET
messages:

         Friday, 1 Jan 93 00:00:00 GMT
and
         Fri Jan 1 00:00:00 {GMT} 1993

It recognizes many (but not all) of the time zone codes found in
messages.  Unrecognized codes are treated as GMT and are displayed after
the directory generation is complete.  The variable TZONES contains the
list of recognized time zone codes and shifts.
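
   A parser for the two forms could be sketched in Python as shown
below.  It is not the newsreader's parser.  The TZONES table here is a
small hypothetical stand-in (the real contents and sign convention of
the TZONES variable are not given in this document), the shifts are
assumed to be hours added to reach GMT, and two-digit years are assumed
to mean 19xx.

```python
import re
from datetime import datetime, timedelta

# Hypothetical zone table: hours to add to local time to get GMT.
TZONES = {"GMT": 0, "UT": 0, "EST": 5, "EDT": 4, "CST": 6,
          "CDT": 5, "MST": 7, "MDT": 6, "PST": 8, "PDT": 7}
MONTHS = {m: i + 1 for i, m in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split())}

# Form 1:  Weekday, 1 Jan 93 00:00:00 GMT
FORM_A = re.compile(r"^(?:[A-Za-z]+,\s*)?(\d{1,2})\s+([A-Za-z]{3})\s+"
                    r"(\d{2,4})\s+(\d{1,2}):(\d{2}):(\d{2})"
                    r"(?:\s+(\S+))?\s*$")
# Form 2:  Wkd Jan 1 00:00:00 {GMT} 1993   (zone is optional)
FORM_B = re.compile(r"^[A-Za-z]+\s+([A-Za-z]{3})\s+(\d{1,2})\s+"
                    r"(\d{1,2}):(\d{2}):(\d{2})"
                    r"(?:\s+([A-Za-z]\S*))?\s+(\d{4})\s*$")

def parse_date(text, unrecognized=None):
    """Return the date as a GMT datetime, or None if it does not parse."""
    text = text.strip()
    m = FORM_A.match(text)
    if m:
        day, mon, year, hh, mm, ss, zone = m.groups()
    else:
        m = FORM_B.match(text)
        if not m:
            return None
        mon, day, hh, mm, ss, zone, year = m.groups()
    if mon not in MONTHS:
        return None
    year = int(year)
    if year < 100:
        year += 1900                    # assumption: 93 means 1993
    shift = 0
    if zone:
        if zone in TZONES:
            shift = TZONES[zone]
        elif unrecognized is not None:
            unrecognized.append(zone)   # treated as GMT, reported later
    local = datetime(year, MONTHS[mon], int(day),
                     int(hh), int(mm), int(ss))
    return local + timedelta(hours=shift)
```

Passing a list as the second argument collects the unrecognized zone
codes so that they can be displayed once the whole file has been
scanned.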

   The From parser uses any text within parentheses as the author's
name, or, if there are no parentheses, uses text up to the first "<" as
the name.  For example:

         rn@e-mail.address (Real Name)
         Real Name <rn@e-mail.address>
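
   In Python, this rule might be sketched as follows.  The function
name author_name is hypothetical, and returning the whole text when
there are neither parentheses nor a "<" is my assumption.

```python
def author_name(from_value):
    """Extract the author's name from the text of a From: line."""
    open_paren = from_value.find("(")
    close_paren = from_value.find(")", open_paren + 1)
    if open_paren != -1 and close_paren != -1:
        # Text within parentheses is the name.
        return from_value[open_paren + 1:close_paren].strip()
    # No parentheses: use the text up to the first "<".
    return from_value.split("<", 1)[0].strip()
```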

   The Subject parser drops the phrase "Re:" (in upper or lower case)
from the front of the subject.
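
   A minimal Python sketch of this step is shown below.  The name
clean_subject is hypothetical, and removing only a single leading
"Re:" is my assumption; the newsreader may behave differently when the
phrase is repeated.

```python
import re

# "Re:" in any mix of upper and lower case, at the front only.
RE_PREFIX = re.compile(r"^\s*re:\s*", re.IGNORECASE)

def clean_subject(subject):
    """Drop one leading "Re:" from a subject line."""
    return RE_PREFIX.sub("", subject, count=1)
```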
   

Directory File Format

The first line of a directory file is header information, consisting of
eleven numbers formatted as text:

[1]  number of records (messages) in the directory
[2]  length of each record in the directory (excluding the CR,LF at the
     end of each record)
[3]  flag: 1 if this directory is for e-mail (To: field included)
        or 0 if this is a news file (To: field omitted)
[4]  size (in bytes) of the message file when the directory was last 
     updated 
[5]  number of columns to the left of the Author field in the directory
[6]  width of the Author field
[7]  number of columns to the left of the Subject field in the directory
[8]  width of the Subject field 
[9]  number of columns to the left of the To field in the directory 
[10] width of the To field 
     (Elements [9] and [10] will be 0 if this is not an e-mail file.)
[11] default sort order for directory, packed as a decimal number where
     1=date, 2=author, 3=recipient, 4=subject, 5=thread.

The rest of the file (following the carriage return and linefeed that
end the header line) is conceptually a character matrix (whose shape is
Header[1 2]+0 2) having a row for each directory entry, and columns
holding:

[;1-4]   - pointer to start of msg in file, 32-bit integer 
[;5-8]   - length of message, 32-bit integer
[;9-16]  - timestamp for message, 64-bit IEEE real in FTIMEBASE format
[;17-97] - the directory entry (as displayed on screen)
[;98 99] - CR,LF ending the record

(The binary data is stored in the usual Intel little-endian format,
with the low-order byte at the lowest address.)
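
   To make the layout concrete, here is a Python sketch that reads a
directory file held in memory.  It is an illustration under stated
assumptions, not the newsreader's code: the name parse_directory is
hypothetical, the header numbers are assumed to be separated by blanks,
and the two integers are assumed to be signed.  The timestamp is
returned as a raw number because the FTIMEBASE format is not described
here.

```python
import struct

def parse_directory(data):
    """Return (header, entries) for the bytes of a directory file."""
    # The header line ends at the first CR,LF in the file.
    header_line, _, body = data.partition(b"\r\n")
    header = [int(tok) for tok in header_line.split()]
    count, reclen = header[0], header[1]
    stride = reclen + 2                 # record plus its CR,LF
    entries = []
    for i in range(count):
        rec = body[i * stride:i * stride + reclen]
        # "<" = little-endian, no padding: two 32-bit integers and
        # one 64-bit IEEE real, 16 bytes in all.
        start, length, stamp = struct.unpack("<iid", rec[:16])
        text = rec[16:].decode("ascii", errors="replace")
        entries.append((start, length, stamp, text))
    return header, entries
```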

   When the newsreader reads a directory file, it compares the current
size of the message file with the value given in Header[4].  If the
message file has grown since the directory was generated, new data past
the Header[4] ending point is scanned and the new directory entries are
appended to the existing entries.
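
   The size check can be sketched in Python as below.  The function
name unindexed_range is hypothetical, and "header" is assumed to be the
list of header numbers read from the directory file.  Note that
Header[4] in the 1-based numbering used above is header[3] in Python.

```python
import os

def unindexed_range(message_path, header):
    """Return (start, end) byte offsets of message data that is not
    yet covered by the directory, or None if nothing is new."""
    indexed_size = header[3]            # Header[4], counting from 1
    current_size = os.path.getsize(message_path)
    if current_size > indexed_size:
        return indexed_size, current_size
    return None
```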

                              - End -
