Newsgroups: comp.lang.apl
Path: watmath!watserv1!utgpu!cs.utexas.edu!wupost!darwin.sura.net!haven.umd.edu!socrates!socrates!rockwell
From: rockwell@socrates.umd.edu (Raul Deluth Miller-Rockwell)
Subject: Re: File I/O in J.
In-Reply-To: keil@apollo.hp.com's message of Wed, 29 Jan 1992 03:54:09 GM
Message-ID: <ROCKWELL.92Jan29001053@socrates.umd.edu>
Sender: rockwell@socrates.umd.edu (Raul Deluth Miller-Rockwell)
Organization: Traveller
References: <1992Jan28.024028.18052@newstand.syr.edu>
	<ROCKWELL.92Jan28091229@socrates.umd.edu>
	<1992Jan29.035409.1839@apollo.hp.com>
Date: Wed, 29 Jan 1992 05:10:53 GMT

Mark Keil:
     Ok, I can read & write ASCII & J objects. How would you go about
     reading RAW binary stuff written as C structs? How would you do
     the conversions for all the different C types (assuming one knows
     the struct layout)?

You seem to be assuming that there is a coherent standard for raw C
data.  As far as I'm aware, no such standard exists.  Different C
compilers are free to store their binary files in totally incompatible
formats.  Accordingly, the most reliable way of converting raw C data
to a form readable by J is to write a C program that reads in the
binary data and writes it back out as formatted ascii.
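
Here is a minimal sketch of such a converter.  The struct
"sample_record", its fields, and the text layout are all made up for
illustration; substitute whatever struct the data was actually written
with.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record layout, for illustration only. */
struct sample_record {
    int    id;
    double value;
    char   name[8];   /* not necessarily null terminated */
};

/* Format one record as a single line of ASCII text.
   Returns what snprintf returns. */
int format_record(const struct sample_record *r, char *out, size_t outsz)
{
    assert(r != NULL && out != NULL);
    /* %.8s prints at most 8 bytes, so name needs no terminator. */
    return snprintf(out, outsz, "%d %.17g %.8s", r->id, r->value, r->name);
}

/* Read raw records from `in`, write one text line per record to `out`.
   Returns the number of records converted. */
long dump_records(FILE *in, FILE *out)
{
    struct sample_record r;
    char line[128];
    long count = 0;

    assert(in != NULL && out != NULL);
    while (fread(&r, sizeof r, 1, in) == 1) {
        format_record(&r, line, sizeof line);
        fprintf(out, "%s\n", line);
        count++;
    }
    return count;
}
```

This only works if it is compiled with the same compiler, on the same
kind of machine, as the program that wrote the data.  The text format
may also need adjusting before J will take it [J writes negative
numbers with a leading underscore, for instance].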

As a general rule, it's going to be more complex to emulate C in J
than it is to just use C directly.  It's when you don't need to get
down to that level of grunginess that J starts to look good.

     This also brings up the question of how data is stored internally
     in J.

     I suspect that J Booleans are stored as bits, and J strings are
     stored as chars (zero terminated?), integers are stored as 32
     bit ints, and floats are stored as double precision floats. Is
     this true?

Well, the machine I normally run J on is down, so I can't check how
things are with version 4.  I'll have to rely on my memory of version
3.

Integers are stored as C longs [usually 32 bits], floating point
numbers as C double precision floats, and strings are stored as C
characters.  Further, there's an extra byte allocated with all strings
[presumably to stuff with a null character, so that C library routines
can be used without having to copy the string.]  Complex numbers, by
the way, are stored as two floats, side by side.

Booleans, however, are stored exactly as strings [one byte per
element], presumably because it is so difficult to index to the
nearest bit using C.  [All versions of J, so far, are written in C.]
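
To show what "difficult to index to the nearest bit" means, here is a
rough comparison of the two storage schemes in C.  The function names
are mine, and this is not taken from the J sources.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* One boolean per byte: indexing is a plain array access. */
int get_bool_byte(const unsigned char *bytes, size_t i)
{
    assert(bytes != NULL);
    return bytes[i] != 0;
}

/* Packed booleans: find the byte, then shift and mask. */
int get_bool_bit(const unsigned char *packed, size_t i)
{
    assert(packed != NULL);
    return (packed[i / CHAR_BIT] >> (i % CHAR_BIT)) & 1;
}

/* Setting a packed boolean is a read-modify-write of a whole byte. */
void set_bool_bit(unsigned char *packed, size_t i, int value)
{
    unsigned char mask = (unsigned char)(1u << (i % CHAR_BIT));

    assert(packed != NULL);
    if (value)
        packed[i / CHAR_BIT] |= mask;
    else
        packed[i / CHAR_BIT] &= (unsigned char)~mask;
}
```

The byte-per-boolean version costs more memory, but it can share code
with the character routines.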

Also, there is some header information that goes with each array,
which always takes at least 4 "C longs".  If you want to take a look
at specific examples, try this function:
   show_rep =.  a.&i.@(3!:1)

You can also build functions to just extract the headers [possibly
represented as longs, instead of four-byte-groups], or to present a
boxed array with the headers and the data all nicely grouped in their
own boxes.  But it's asking too much of my memory to expect to get
those right without double checking a couple things.

I do remember that the first long represents the "grammatical class"
and (for the case of nouns) the "storage class" of the object.  This
is the same information you get by running 3!:0 on a noun.  [Note that
version 2 had a bug that let you peek at verbs as well as nouns --
verbs, and most likely adverbs, conjunctions, and copulae have their
own "class number" that goes here.]  This is always a power of two,
which means that sets of types can be represented as a single long,
and membership can be tested with a C "bitwise and".
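
A small sketch of that kind of membership test follows.  The names and
the particular values are illustrative assumptions, not copied from
lj.h; the only thing that matters is that each code is a distinct
power of two.

```c
#include <assert.h>

/* Illustrative type codes (assumed values, one bit each). */
#define T_BOOL    1L
#define T_CHAR    2L
#define T_INT     4L
#define T_FLOAT   8L
#define T_COMPLEX 16L
#define T_BOX     32L

/* A set of types is just the codes or-ed together into one long. */
#define T_NUMERIC (T_BOOL | T_INT | T_FLOAT | T_COMPLEX)

/* Returns nonzero if type code t is a member of the set. */
int in_type_set(long t, long set)
{
    /* t must be a single power of two */
    assert(t > 0 && (t & (t - 1)) == 0);
    return (t & set) != 0;
}
```

So in_type_set(T_INT, T_NUMERIC) is true and
in_type_set(T_CHAR, T_NUMERIC) is false, with no table lookup needed.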

The second "long" is the "reference count" of the object.  [This saves
having to make a copy of the whole thing every time it is used as an
argument to a function.]

The third "long" is the number of elements contained in the array.

The fourth is the rank of the array.

Following the rank is the shape of the array.  The shape consists of
zero or more integers [however many the rank specifies].  Following
the shape is the array data.
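
Putting the pieces together, the layout I've described can be walked
in C roughly like this.  The struct and function names are
hypothetical [the real definitions are in lj.h and may well differ],
and the check that the shape multiplies out to the element count is my
own sanity test.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical view of the header: four longs, then the shape. */
struct array_header {
    long        type;      /* grammatical/storage class */
    long        refcount;  /* reference count */
    long        n;         /* number of elements */
    long        rank;      /* number of axes */
    const long *shape;     /* rank entries */
    const void *data;      /* first byte after the shape */
};

/* Fill in *h from a buffer of longs.  Returns 0 on success, -1 if the
   buffer is too short or the shape disagrees with the element count. */
int parse_header(const long *words, size_t nwords, struct array_header *h)
{
    long count = 1;
    long i;

    assert(words != NULL && h != NULL);
    if (nwords < 4)
        return -1;
    h->type     = words[0];
    h->refcount = words[1];
    h->n        = words[2];
    h->rank     = words[3];
    if (h->rank < 0 || (size_t)h->rank > nwords - 4)
        return -1;
    h->shape = words + 4;
    h->data  = words + 4 + h->rank;

    /* The product of the shape should equal the element count
       (an empty shape, rank 0, gives a product of 1). */
    for (i = 0; i < h->rank; i++)
        count *= h->shape[i];
    return count == h->n ? 0 : -1;
}
```

How the bytes at "data" should be interpreted depends on the type in
the first long.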

If you don't like playing around with specific cases to figure out
this kind of information, you can get a copy of LinkJ and look at the
file 'lj.h'.  This file is useful if you want to deal with J data in
C.  It's also handy for understanding some of the underlying
representations used by J [provided you can read C].

Once again, there is no single "C standard binary representation".
And, even if there were, there are some elements of J which do not map
easily into C.  So, if you write a program which uses this kind of
stuff, you probably will want to keep that part of the program
isolated [I/O will often have to be modified when porting a program].

Raul
<rockwell@socrates.umd.edu>
