2011-12-31

short pointer

12.22: adda/type/short pointer:

pos: kiss:
. build the obvious system first;
don't worry about having to
rebuild from scratch in order to merge an optimization;
it's important for testing that there are
simple, trustworthy substitutes .

summary:
. shorter pointers could save a lot of space;
in c, pointers may be 4 bytes long (or 8 on a 64-bit machine)
in order to address any place on the machine .
. addresses can be byte-sized only if needing
no more than 256 addresses; for instance,
an array(1..256)of word can hold 256 tree nodes;
and, each node has 2 bytes,
each of which can point to any place in that array .
. a larger tree would use word-sized pointers,
which means the nodes are 4bytes long
(2pointers * 2bytes).
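
. a minimal sketch in c of the two node sizes described above;
the names (SmallNode, LargeNode, SmallPool, small_deref) are hypothetical,
and indices run 0..255 here rather than 1..256 .

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* hypothetical names; a node is just a pair of short pointers */
typedef struct { uint8_t  left, right; } SmallNode;  /* 2 byte-pointers = 2 bytes */
typedef struct { uint16_t left, right; } LargeNode;  /* 2 word-pointers = 4 bytes */

/* the array that every byte-pointer of one small tree refers into */
typedef struct { SmallNode node[256]; } SmallPool;

/* following a byte-pointer is just an array lookup in its pool */
static SmallNode *small_deref(SmallPool *pool, uint8_t p) {
    assert(pool != NULL);
    return &pool->node[p];
}
```

. a whole small pool is then 512 bytes,
versus 2048 bytes for 256 nodes of two 4-byte system pointers .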

. the byte- and word-sized addresses
use an array index to do the pointing;
and, are collectively called  short pointers;
any use of a short pointer implies a reference to
some particular array;
eg, the root of a tree would be a descriptor
which identifies the tree as using short pointers
and then provides a link to the array being referred to .

. a tree descriptor would also include
a separate array for data;
because, arrays need to contain just one type
(unless their items all have bulky type tags);
and, while all internal tree nodes
are of type (ptr, ptr),
the leaf nodes contain data of varying sizes and types;
so, the data array might contain system pointers;
or, it could use a heap system,
where the pointer refers to several items:
the first item contains a size field
which indicates how many subsequent items
belong to this datum .
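
. a rough sketch of the descriptor and the heap-style data array;
everything named here (TreeDesc, DataHeap, heap_put, the 16-bit Item width)
is an assumption made for illustration, not a settled design .

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t Item;                 /* assumed width of one data item */

typedef struct {                       /* heap-style data array */
    Item  *items;
    size_t used, cap;
} DataHeap;

typedef enum { PTR_BYTE, PTR_WORD } PtrKind;

typedef struct {                       /* the root of a tree */
    PtrKind  kind;                     /* which short-pointer size the nodes use */
    void    *nodes;                    /* link to the array of (ptr, ptr) nodes */
    DataHeap data;                     /* separate array for leaf data */
} TreeDesc;

/* store a datum of n items; the first item written is the size field.
   returns the index of that size field, or -1 if the heap is full */
static long heap_put(DataHeap *h, const Item *src, Item n) {
    assert(h != NULL && src != NULL);
    if (h->used + 1 + (size_t)n > h->cap) return -1;
    size_t at = h->used;
    h->items[at] = n;
    for (Item i = 0; i < n; i++) h->items[at + 1 + i] = src[i];
    h->used += 1 + (size_t)n;
    return (long)at;
}

/* how many subsequent items belong to the datum at index `at` */
static Item heap_size(const DataHeap *h, size_t at) { return h->items[at]; }

/* the datum's items start right after its size field */
static const Item *heap_data(const DataHeap *h, size_t at) { return &h->items[at + 1]; }
```

. a leaf pointer then only needs to hold the index returned by heap_put .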

. large trees (using word-pointers)
may contain small trees (using byte-pointers);
any subtree of a large tree containing at most 256 nodes
could be repacked as a pointer to a small tree .
. as the list processor goes into a tree,
it is always keeping the current base addresses
for both the large tree and any small tree .

. a large tree is chosen over a small tree type
not only when it will always be large,
but also when it is likely to grow .
. often it will never approach its max size;
therefore the array being used for the
internal nodes of a large tree
should have a growable size to conserve space .
. one way of being growable is to be
an array of pointers to segments:
eg, suppose the segment is 10 units;
then for item# 78 (counting from 0),
follow pointer# 7 to a segment,
and use item# 8 on that segment .
. in practice,
the segments likely should be the size of small trees
( 256 nodes * 2pointers * 1 byte).
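
. the segment lookup can be sketched in c as below,
keeping the 10-unit segment of the example;
SegArray, seg_at, seg_free and MAX_SEGS are hypothetical names,
and allocating a segment on first touch is an assumption .

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

enum { SEG_UNITS = 10, MAX_SEGS = 16 };      /* assumed sizes, as in the example */

typedef struct { uint16_t *seg[MAX_SEGS]; } SegArray;   /* array of pointers to segments */

/* address of item i; its segment is allocated (zeroed) on first use.
   returns NULL if i is out of range or allocation fails */
static uint16_t *seg_at(SegArray *a, unsigned i) {
    unsigned s = i / SEG_UNITS, off = i % SEG_UNITS;
    assert(a != NULL);
    if (s >= MAX_SEGS) return NULL;
    if (a->seg[s] == NULL) {
        a->seg[s] = calloc(SEG_UNITS, sizeof(uint16_t));
        if (a->seg[s] == NULL) return NULL;
    }
    return &a->seg[s][off];
}

/* release every segment that was ever touched */
static void seg_free(SegArray *a) {
    assert(a != NULL);
    for (unsigned s = 0; s < MAX_SEGS; s++) {
        free(a->seg[s]);
        a->seg[s] = NULL;
    }
}
```

. only the segments that are actually used cost memory;
untouched ones stay as null pointers .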

. while the large tree is indexing the seg'd array,
a small subtree could index a segment therein .
[12.23:
. since the short pointers of a large tree are
pointing at a seg' pointer,
seg's could vary in type, and be type-tagged,
that way the large tree can point directly at a small tree
without having to redirect to a data array .
. the large tree is an array of
( pointer to seg
, enum"seg nodes are"{word, long}
)].

. both the large tree (word-pointers)
and the small tree (byte-pointers)
need a way to indicate
whether a pointer refers to tree or data .
. word-sized pointers use their sign bit for indicating
whether they point to tree or data;
byte-sized pointers exist in pairs on word-sized nodes,
and a byte is too small to sacrifice a bit,
so we'll divide the segment in half;*
if the positive side is used (a byte in 0..127)
that item is an internal node;
else (a byte in 128..255) the item is a leaf node .
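
. a sketch of both tagging schemes in c;
the function names are hypothetical,
and the polarity for word-pointers (sign bit set means data)
is an assumption chosen to match the byte case .

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* word-pointers: the sign (top) bit is the tag, the low 15 bits are the index */
static uint16_t word_make(uint16_t index, bool leaf) {
    assert(index <= 0x7FFFu);
    return leaf ? (uint16_t)(index | 0x8000u) : index;
}
static bool     word_is_leaf(uint16_t p) { return (p & 0x8000u) != 0; }
static uint16_t word_index(uint16_t p)   { return (uint16_t)(p & 0x7FFFu); }

/* byte-pointers: no bit to spare, so the half of the segment is the tag;
   0..127 are internal nodes, 128..255 are leaf nodes */
static bool byte_is_leaf(uint8_t p) { return p >= 128; }
```

. note the cost on the word side: with one bit given to the tag,
a word-pointer can index only 32768 items of each kind .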

*(conveniently, no matter what the shape of a tree,
the number of internal and leaf nodes is nearly the same:
#leaf = #internal + 1 )
A binary tree in which every internal node has exactly 2 children,
and which has N internal nodes, has exactly N+1 external (leaf) nodes .
Proof: by strong induction on N
    Base case:
        a binary tree with 0 internal nodes is exactly 1 external (leaf) node (the root)
    Induction step:
        . assume every binary tree with m < N internal nodes
        has exactly m+1 external (leaf) nodes
        A binary tree with N >= 1 internal nodes is made up of:
            A root (internal node)
            A left binary tree with k internal nodes
            A right binary tree with (N - 1 - k) internal nodes
        According to the induction assumption:
            The left binary tree with k internal nodes has exactly
                   k+1 external nodes
            The right binary tree with (N - 1 - k) internal nodes
                   has exactly (N - k) external nodes
            So the total number of external nodes
                in a binary tree with N internal nodes is:
                k+1 + (N - k) = (N + 1) external nodes .
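
. the count can be checked on a small tree with a short c sketch;
SmallNode and count_nodes are hypothetical names,
the input is assumed to be a proper (acyclic) tree,
and bytes 128..255 are treated as leaves as described above .

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint8_t left, right; } SmallNode;   /* hypothetical node type */

/* walk the subtree at p, tallying internal nodes (0..127) and leaves (128..255) */
static void count_nodes(const SmallNode *pool, uint8_t p,
                        unsigned *internal, unsigned *leaves) {
    assert(pool != NULL && internal != NULL && leaves != NULL);
    if (p >= 128) { (*leaves)++; return; }
    (*internal)++;
    count_nodes(pool, pool[p].left,  internal, leaves);
    count_nodes(pool, pool[p].right, internal, leaves);
}
```

. so a segment split in half is a good fit:
a tree with 127 internal nodes has 128 leaves,
which fills the leaf half exactly and leaves one internal slot unused .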

notes:

binary operation polymorphism:
. tree polymorphism works the same way as it does for numbers:
if a biop has to deal with (byte, word),
it converts byte to word for the (word, word) vm;
ie, it makes a copy of the tree but with new pointer types .
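
. a sketch of that widening copy in c;
widen_ptr and widen_tree are hypothetical names,
and mapping leaf byte 128+k to a word-pointer with the sign bit set and index k
is an assumption about how the two tag schemes line up .

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint8_t  left, right; } SmallNode;   /* byte-pointer node */
typedef struct { uint16_t left, right; } LargeNode;   /* word-pointer node */

/* convert one byte-pointer: internal indices 0..127 are kept as-is;
   leaf bytes 128..255 become (sign bit | leaf number 0..127) */
static uint16_t widen_ptr(uint8_t p) {
    return p < 128 ? (uint16_t)p : (uint16_t)(0x8000u | (p - 128u));
}

/* copy the n internal nodes of a small tree into word-pointer nodes */
static void widen_tree(const SmallNode *src, LargeNode *dst, unsigned n) {
    assert(src != NULL && dst != NULL && n <= 128);
    for (unsigned i = 0; i < n; i++) {
        dst[i].left  = widen_ptr(src[i].left);
        dst[i].right = widen_ptr(src[i].right);
    }
}
```

. the shape of the tree is unchanged; only the pointer type is new,
just as a byte keeps its value when converted to a word .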

. by making small arrays part of larger ones
you would be doing more of your own mem mgt
instead of using c mem mgt
which expects you to de/allocate in stack order;
use c mem mgt for arrays whenever possible;
but use small arrays for tree nodes
to keep tree-pointer size down .

. the segs of a seg'd array
can either come from malloc,
or be pointers into a reserve array
where you're doing your own mem mgt .
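
. both sources of segments can be sketched in c;
the names and sizes (Seg, reserve, RESERVE_SEGS, seg_from_reserve, seg_from_malloc)
are hypothetical, and the reserve here only hands segments out,
it does not yet take them back .

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

enum { SEG_ITEMS = 256, RESERVE_SEGS = 8 };   /* assumed sizes */

typedef uint16_t Seg[SEG_ITEMS];              /* one segment = word*256 */

static Seg      reserve[RESERVE_SEGS];        /* the reserve array (zeroed at start) */
static unsigned reserve_used = 0;             /* how many segments are handed out */

/* own mem mgt: hand out the next unused segment of the reserve, or NULL */
static uint16_t *seg_from_reserve(void) {
    assert(reserve_used <= RESERVE_SEGS);
    if (reserve_used == RESERVE_SEGS) return NULL;
    return reserve[reserve_used++];
}

/* c mem mgt: a zeroed segment from the c heap; the caller frees it */
static uint16_t *seg_from_malloc(void) {
    return calloc(SEG_ITEMS, sizeof(uint16_t));
}
```

. either way the seg'd array just stores the returned pointer;
it would have to remember which source a segment came from
in order to release it correctly .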

. the small tree needs arrays of size = word*256
-- that may be the right size for units used by seg'd arrays .

. of the pointer and its target,
which should contain needed subtype tags?
if the target contains the id,
then there is efficient aliasing;
but in some targets there would be bloat
because targets like system pointers need the whole obj,
and can't spare a few bits for type tagging,
so then an additional array of byte for tagging is needed .