2012-03-31

wordcode that includes grammar trees

3.6: adda/wordcode/including grammar trees:

. not only can we encode all words,
but we can also attach etrees (grammar trees) to the words .
. we should reserve some ascii control codes for this,
since it will be common .

. relative addressing can reduce pointer size,
ie, the pointers can be byte sized
if we know that their base address
(the address they are relative to)
is less than 256 units from the
farthest place the pointer can indicate .
. thus to be byte sized,
we need to have a separate etree for every phrase
(delimited by a semicolon or period)
and we need to know that the size of phrases
will be less than 128 words
(there will be as many pointers as words,
and the first node may have to point into
the middle of the phrase's text,
so it has to jump past all the pointers,
plus half the text).
[3.7:
. we can give the pointers more range by
making them relative to where the text begins .
. we could have a code that says
"this is the beginning of an etree;
what follows is a link to the beginning of the text" .
. if the first byte of this link is a negative number,
then the link is only one byte,
and its absolute value shows
how many bytes ahead the text is;
it also says the etree's pointers are byte-sized .
. if the first byte is positive,
then the link is composed of 2 bytes,
showing how far ahead the text is;
this also says the etree's pointers are 2-byte sized .]
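
. a minimal sketch of reading that link, under some assumptions the notes don't settle:
bytes are read as signed two's-complement,
the 2-byte link is big-endian,
the distance is counted from the link's first byte,
and a first byte of zero is treated like a positive one .
. the name read_etree_link is hypothetical .

```python
import struct

def read_etree_link(buf: bytes, pos: int):
    """Decode the link that follows an etree-begin code.

    Returns (text_start, pointer_size, next_pos).
    Assumes pos is the index of the link's first byte.
    """
    first = struct.unpack_from("b", buf, pos)[0]  # signed byte
    if first < 0:
        # one-byte link: its absolute value is the distance to the text,
        # and the etree's pointers are byte-sized
        return pos + (-first), 1, pos + 1
    # two-byte link (assumed big-endian; high bit is clear because
    # the first byte is non-negative); pointers are 2-byte sized
    distance = struct.unpack_from(">H", buf, pos)[0]
    return pos + distance, 2, pos + 2
```

. eg, a link byte of 0xFB is -5 as a signed byte,
so the text begins 5 bytes ahead and the pointers are one byte each .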

. you can tell what a node's pointer is pointing at
by looking at the 1st byte of what it's pointing at:
if it's not the code for a node, then it's text;
therefore a node needs to have 3 parts:
a 1-byte node code, and 2 pointers .
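
. the 3-part node and the "look at the 1st byte" test could be sketched like this,
assuming byte-sized pointers that are offsets from some base address,
and a made-up node code (0x1C is only a placeholder for
whichever ascii control code gets reserved);
it also assumes word codes never use that value .

```python
NODE_CODE = 0x1C  # hypothetical: stands in for the reserved control code

def kind(buf: bytes, addr: int) -> str:
    """Classify what a pointer indicates by the first byte at its target."""
    return "node" if buf[addr] == NODE_CODE else "text"

def read_node(buf: bytes, pos: int, base: int):
    """Read a node: 1-byte node code followed by 2 byte-sized pointers.

    Returns the two absolute addresses (base + stored offset).
    """
    if buf[pos] != NODE_CODE:
        raise ValueError("no node code at this position")
    return base + buf[pos + 1], base + buf[pos + 2]

# a tiny etree: node at 0 -> (node at 3, text at 6); node at 3 -> (text 6, text 7)
example = bytes([NODE_CODE, 3, 6, NODE_CODE, 6, 7, ord("a"), ord("b")])
left, right = read_node(example, 0, base=0)
print(kind(example, left), kind(example, right))
```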

. english syntax trees can have long tree nodes;
eg, (if * then * else * ) = 6 pointers in one node;
but, generally, all syntax trees can be reduced to
a sequence of minimal nodes (2 pointers);
so one node code could mean
it's the non-end of a node sequence,
and the other node code could mean
it's the end of a node sequence .
. but there is a more compact way:
we could have just one node code
followed by a 1-byte length field,
and that tells us how many pointers follow .
. or
if we have more codes to spare,
then we could have 5 node codes:
#1: 2 pointers (eg, the * )
#2: 3 pointers (eg, * unless * )
#3: 4 pointers (eg, if * then * )
#4: 6 pointers (eg, if * then * else * )
#5: n-pointers -- the generic case:
it means the number of pointers is in the following byte .
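
. a sketch of decoding under the 5-code scheme, assuming byte-sized pointers;
the code values below are placeholders, not a proposed assignment,
and read_pointers is a hypothetical name .

```python
# hypothetical control-code assignments for the 5 node codes
NODE_2, NODE_3, NODE_4, NODE_6, NODE_N = 0x1C, 0x1D, 0x1E, 0x1F, 0x1B

# codes #1..#4 imply their pointer count; #5 carries it in the next byte
FIXED_COUNT = {NODE_2: 2, NODE_3: 3, NODE_4: 4, NODE_6: 6}

def read_pointers(buf: bytes, pos: int) -> list:
    """Return the byte-sized pointers of the node starting at buf[pos]."""
    code = buf[pos]
    if code in FIXED_COUNT:
        count, first = FIXED_COUNT[code], pos + 1
    elif code == NODE_N:
        count, first = buf[pos + 1], pos + 2  # generic case: length byte
    else:
        raise ValueError("no node code at this position")
    return list(buf[first:first + count])
```

. the 4 fixed codes save the length byte on the common shapes;
the generic code costs one extra byte but covers any node of up to 255 pointers .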