2013-08-31

semi-type-specific subheaps

14: adda/oop/type-specific heaps:
[29: intro:
. in a type-specific heap,
each object reference has 2 parts:
# an ID of the type it belongs to,
# a serial number unique to that type . 31:
. this promotes encapsulation,
because the reference isn't a pointer,
the only one who knows the object's address
is the type mgt that owns the object .
]
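
. a minimal sketch of such a 2-part reference, in python;
the names Ref and TypeHeap are hypothetical (not part of adda),
and a dict stands in for whatever storage the type mgt really uses:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    type_id: int   # which type mgt owns the object
    serial: int    # unique only within that type


class TypeHeap:
    """Hypothetical per-type heap: only it knows where objects live."""

    def __init__(self, type_id):
        self.type_id = type_id
        self._slots = {}   # serial -> object; the "address" stays private
        self._next = 0

    def new(self, value):
        serial = self._next
        self._next += 1
        self._slots[serial] = value
        return Ref(self.type_id, serial)

    def get(self, ref):
        if ref.type_id != self.type_id:
            raise TypeError("reference belongs to another type")
        return self._slots[ref.serial]

    def free(self, ref):
        del self._slots[ref.serial]


INT_T, STR_T = 1, 2
ints = TypeHeap(INT_T)
strs = TypeHeap(STR_T)
a = ints.new(42)
b = strs.new("hi")
```

. both a and b have serial 0, yet they never collide,
because the type ID is half of the reference .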
. given the idea of type-specific heaps,
how does that mix with adda's oop?
--that's the efficient form of oop;
the inefficient way of doing oop
uses garbage collection
and that is a choice independent of
which heap an object is owned by,
whether local or global .
. using pointers is not what causes
pointer semantics and garbage:
that's caused when assignments are overwriting
the pointer instead of the pointer's target:
to avoid garbage collection,
assignments need to overwrite the objects themselves
not the pointers to the objects .
. instead of a global type-specific heap
local vars would be placed into
the subprogram's local heap
that can have within it type-specific subheaps .
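
. the two kinds of assignment described above
can be sketched in python (not adda syntax);
Cell is a hypothetical stand-in for any object:

```python
class Cell:
    """Hypothetical mutable object holding one value."""

    def __init__(self, value):
        self.value = value


# pointer-overwriting assignment: x is rebound, the old Cell lingers
x = Cell(1)
y = x              # y shares x's target
x = Cell(2)        # overwrites the pointer; y still sees the old object
pointer_style = (x.value, y.value)

# object-overwriting assignment: the target itself is updated in place
p = Cell(1)
q = p
p.value = 2        # overwrites the object; nothing new is allocated
object_style = (p.value, q.value)
```

. in the first case the old Cell survives only because y holds it,
and once y lets go it is garbage;
in the second case there is never a second object .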

15:
. for each type it uses,
the locality gets its own instance of that type's mgr;
but a type mgt may want to coordinate efforts
(eg, for statistics on global usage);
so, a type mgt may have both a program init'
which sets up a coordinator process,
and also have a local init',
that establishes a local instance
of a type-specific subheap .
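
. a rough sketch of the two init's, in python;
every name here is hypothetical,
and the coordinator is a plain shared object
rather than a real process, to keep the example short:

```python
class Coordinator:
    """Hypothetical program-wide part of a type mgt: gathers statistics."""

    def __init__(self):
        self.live_objects = 0


class LocalSubheap:
    """Hypothetical per-locality instance of the same type mgt."""

    def __init__(self, coordinator):
        self.coordinator = coordinator
        self.objects = {}
        self._next = 0

    def new(self, value):
        serial = self._next
        self._next += 1
        self.objects[serial] = value
        self.coordinator.live_objects += 1
        return serial

    def close(self):
        # the locality ends: all of its objects die together
        self.coordinator.live_objects -= len(self.objects)
        self.objects.clear()


class IntMgr:
    @staticmethod
    def program_init():
        return Coordinator()

    @staticmethod
    def local_init(coordinator):
        return LocalSubheap(coordinator)


coord = IntMgr.program_init()
f_heap = IntMgr.local_init(coord)   # subheap for one subprogram
g_heap = IntMgr.local_init(coord)   # subheap for another
f_heap.new(1)
f_heap.new(2)
g_heap.new(3)
count_before = coord.live_objects
f_heap.close()
count_after = coord.live_objects
```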

16: adda/oop/traditionally use pointer semantics:
. notice that in traditional oop,
when an overwrite happens, say to x,
others who were sharing what was overwritten
still get to access their intended value;
but they can no longer expect that value to be
updated by modifications to x;
thus, traditional oop has pointer semantics;
but, should oop want that ?
29:
. perhaps the traditional way is meant to
let you have your cake and eat it too:
assignment is supported,
but no objects were mutated
in this kinder, gentler assignment .
31:
. anyway, traditional oop is counterintuitive,
and yet it doesn't achieve functional programming .
. the shining achievement of oop
was the use of type-classing,
so that I could easily call integer+real
without having to worry about conversions,
because there was a supertype mgr
that automatically handled type.tags .
. everything else about traditional oop
was just a hack to bolt it onto
traditional procedural languages
without having to change them much .

20: adda/oop/semi-type-specific subheaps

. one key to oop is encapsulation;
( or "abstraction"
aka “encapsulation by convention
rather than enforcement”
;) meaning that when we
create a variable of a certain type,
only that type's mgt can operate on that var .
. one reason for having oop be pointer-based
(besides the efficiency of sharing)
is that only the var's type mgt can know
how large its object is going to be
because that is an impl' detail
that only the type mgt should be concerned with .
20: 25:
. in order to get away from pointers
we need an extendable object system:
the size of an object comes from 2 parts:
# the immediate part has a constant size;
# the trailer part is optional,
and has an indefinitely growable size .

. the default size for an immediate part
is sufficient to include a discriminant
and a local (short) pointer,
that can point to a local subheap:
that would likely be 2 bytes . [31:
. in space-saving mode the type.tag is implicit
and that's half of the object's reference;
15 of the 16 bits in the 2bytes are for the serial number,
assuming a locale didn't use more than 2^15 objects
of any particular type .
. to simplify things for now,
that default might be 4bytes instead .]
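
. a sketch of the 2-byte default, in python;
it assumes the discriminant is the top bit
and the short pointer (or serial number) is the low 15 bits,
which is only one possible layout:

```python
import struct

TRAILER_BIT = 0x8000   # assumed: top bit is the discriminant
SERIAL_MASK = 0x7FFF   # the remaining 15 bits


def pack_immediate(has_trailer, payload):
    """Pack a discriminant and a 15-bit payload into 2 bytes."""
    if not 0 <= payload <= SERIAL_MASK:
        raise ValueError("payload needs more than 15 bits")
    word = (TRAILER_BIT if has_trailer else 0) | payload
    return struct.pack("<H", word)


def unpack_immediate(raw):
    (word,) = struct.unpack("<H", raw)
    return bool(word & TRAILER_BIT), word & SERIAL_MASK


raw = pack_immediate(True, 300)
```

. the ValueError is where a locale would find out
it has used more than 2^15 objects of one type .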

. an immediate part can be less than 2 bytes
only if it decides it doesn't need a trailer .
. an immediate part will often be more than 2 bytes,
because the immediate part should accommodate
the object's typical size,
so that in the typical case
we can avoid using a pointer .
. on the other hand,
trailers can be easily shared
(by simply copying a pointer);
so, if there is often a chance to share,
then we'd want to minimize the immediate part's size .

. the type mgt can tell the compiler
the minimum size it will need per object;
eg, if a model truth value has only 2 states,
truth.type could require only one bit of storage,
implying it wouldn't need any trailer space .
. conversely, if a type mgr often needs nearly 8bytes,
it could require an 8-byte immediate size;
and then use that 8bytes for a variant record,
with one variant used for extending beyond 8bytes,
using 2 of the 8 bytes for a pointer to trailer .
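
. a sketch of that 8-byte variant record, in python;
BigNumMgr and its layout are hypothetical:
byte 0 is the variant tag,
the inline variant keeps up to 7 bytes of a nonnegative integer,
and the extended variant keeps a 2-byte pointer to a trailer:

```python
INLINE, EXTENDED = 0, 1


class BigNumMgr:
    """Hypothetical type mgt with an 8-byte immediate part."""

    def __init__(self):
        self.trailers = {}   # 2-byte serial -> trailer bytes
        self._next = 0

    def store(self, n):
        """Return the 8-byte immediate part for a nonnegative int n."""
        raw = n.to_bytes(max(1, (n.bit_length() + 7) // 8), "little")
        if len(raw) <= 7:
            return bytes([INLINE]) + raw.ljust(7, b"\0")
        serial = self._next
        self._next += 1
        self.trailers[serial] = raw
        # tag, 2-byte pointer to trailer, 5 unused bytes
        return bytes([EXTENDED]) + serial.to_bytes(2, "little") + bytes(5)

    def load(self, imm):
        if imm[0] == INLINE:
            return int.from_bytes(imm[1:], "little")
        serial = int.from_bytes(imm[1:3], "little")
        return int.from_bytes(self.trailers[serial], "little")


mgr = BigNumMgr()
small = mgr.store(1234)      # fits in the immediate part
big = mgr.store(2 ** 100)    # spills into a trailer
```

. either way the caller copies exactly 8 bytes;
only the type mgt knows which variant it got .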

. type mgr's themselves are in control of
whether they support extendability;
the writer of the type mgt did some homework,
deciding whether that type's set of possible objects
could ever be large enough to warrant
losing a bit for the optional pointer feature .
. when type mgt decides on an immediate size
it needs to keep in mind that
it is that size that must be copied
even if the trailer part can be shared .

. we can ask a type mgt at compile time
whether a particular subtype grows a trailer
and at run-time we can ask the type mgt
whether an object is currently with trailer .

. for example, look at type number;
some subtypes of number are trailered
while others not:
. if a var is typed number,
then it's saying we cannot be sure of
which numeric subtype it will be,
so we have to expect the worst-case size .
. but if the var is typed int16,
then we don't even need room for a type.tag,
since the symbol table's type is indicating
both the supertype and the subtype;
so, we know the object is 100% int16 .

20: 22: 23: 25: space-saving system:

. as part of a space-saving system,
we can find ways to safely share trailers .
. if a parameter is expecting a value
(rather than a pointer)
then to share a trailer instead of copying it,
the parameter should be typed read-only
and the assigned value should not change
(it should not be affected by other threads).
. even if a parameter is typed read-only,
if the assigned object is not constant
for the duration of the parameter's existence,
then the assigned trailer must be a copy
because that's the required semantics
of a value parameter:
the caller expects its object to not be modified,
and the callee expects the parameter to mirror
modifications made by the callee
without affecting the caller's object .
. we may share with a thread only if
the thread's parameter is read-only .

20: 22: 23: 25: so, how to figure sharability?:

. a call tree can be reduced to
a sequence of non-nested calls,
and all the calls at the same level in that call tree,
can be done in parallel .

. so the n-layered call tree could become
n-1 sequential statements;
and only the first sequence element is sure to have
a list of caller inputs as arguments,
all the successive statements
could be a mix of returns from child trees
and possibly some caller inputs .

. the contiguous subtrees of the same type
should be considered to be one call
as they will be sent in batch
to the appropriate type mgt .
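
. a sketch of reducing a call tree to sequential steps, in python;
it assumes a call is ready as soon as all of its child calls have returned
(so the calls are grouped by height),
and it leaves out the batching by type;
schedule is a hypothetical name:

```python
def schedule(tree):
    """Group the calls of a nested call tree into sequential steps.

    A call is a tuple (name, arg, ...); a caller input is a plain string.
    Step 0 holds the calls whose arguments are all caller inputs;
    the calls within one step do not depend on each other.
    """
    steps = []

    def height(node):
        if isinstance(node, str):    # a caller input, not a call
            return 0
        h = 1 + max((height(arg) for arg in node[1:]), default=0)
        while len(steps) < h:
            steps.append([])
        steps[h - 1].append(node[0])
        return h

    height(tree)
    return steps


# f(g(x), h(y, k(z)))
tree = ("f", ("g", "x"), ("h", "y", ("k", "z")))
steps = schedule(tree)
```

. here the tree has 4 layers (counting the inputs)
and it becomes 3 steps: g and k together, then h, then f .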

. for most of these sequential statements
there may be multiple calls we can run in parallel,
and if we plan to do that, then we need to
consider their inputs collectively:
they are part of the same
sequence element:
. given a sequence element's caller inputs,
has anything become large and trailered ?
of the parameters it is being assigned to,
if all parameters are typed read-only,
then just give them all copies of the immediate part,
and they will all be sharing the same trailer .
. for every parameter that is not read-only,
it needs its own copy of the trailer;
all the read-only's can share a copy .
. some parameters are a var type
(ie, it expects a var's address
and is assumed to want write access);
if the parameter's type is pointer,
that is the same as being a var type .
. if we are sharing an address across threads
then we expect the value to change,
and we synchronize by message queuing .
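
. the share-or-copy rule for value parameters
can be sketched in python;
Obj and bind_arguments are hypothetical,
a list stands in for trailer storage,
and the argument is assumed to stay constant during the calls:

```python
class Obj:
    """Hypothetical object: a small immediate part plus a trailer."""

    def __init__(self, immediate, trailer):
        self.immediate = immediate
        self.trailer = trailer   # a list stands in for trailer storage


def bind_arguments(arg, params):
    """params maps a parameter name to True if it is typed read-only."""
    bound = {}
    for name, read_only in params.items():
        if read_only:
            # copy only the immediate part; share the trailer
            bound[name] = Obj(arg.immediate, arg.trailer)
        else:
            # a writable value parameter needs its own trailer
            bound[name] = Obj(arg.immediate, list(arg.trailer))
    return bound


big = Obj(immediate=7, trailer=[1, 2, 3])
bound = bind_arguments(big, {"a": True, "b": True, "c": False})
bound["c"].trailer.append(4)   # the writer changes only its own copy
```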

23: obj pointers vs trailer pointers:

. if a parameter expecting a value
is instead given a pointer,
we just copy the pointer's value .
[23: 31:
. if a parameter expecting a pointer
is instead given a value,
as when a function returns a value,
that would be an error, because,
in adda's style of oop that avoids garbage,
a function that returns a value
implicitly expects the caller to provide
the address of where caller wants the value placed .
. the caller can't expect the function
to provide a new location,
because that would generate garbage .
. when a variable is of type pointer,
it is saying it doesn't own its target,
and that it expects its target
to be owned by some other variable .
. a separate issue is implementation:
a tree can be implemented with pointers,
but a tree assignment results in
copying the entire tree,
not just the hidden pointer;
if you don't want copying,
you need to type a var pointer to tree .]
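
. the implicit return parameter can be sketched in python;
add_into is a hypothetical function,
and an array stands in for the caller's variable:

```python
from array import array


def add_into(dest, a, b):
    """Write a + b into the caller-provided destination."""
    for i in range(len(dest)):
        dest[i] = a[i] + b[i]


result = array("i", [0, 0, 0])   # the caller owns the destination
same_buffer = result
add_into(result, array("i", [1, 2, 3]), array("i", [10, 20, 30]))
```

. the function never provides a new location for its result;
it only overwrites the one it was handed .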

. each subprogram has its own heap,
composed of subheaps for each type mgr
to hold the trailers of its locals .
. in addition to that,
individual symbols can have subheaps,
eg, a tree has its own subheap,
because it is composed of many nodes:
they are all local to that tree,
and they all die when that tree dies .
. trees never share nodes:
a tree leaf can be of type pointer to subtree,
and since it is a leaf,
we understand that this is not an internal pointer
that is implementing the tree,
it is a pointer claiming to not own its target .
. when the tree is copied,
that pointer's target is not copied .
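
. a sketch of that copying rule, in python;
Node, Ptr, and copy_tree are hypothetical names:

```python
class Node:
    def __init__(self, value, children=()):
        self.value = value
        self.children = list(children)


class Ptr:
    """Hypothetical leaf value: a pointer that does not own its target."""

    def __init__(self, target):
        self.target = target


def copy_tree(node):
    # owned nodes are duplicated; a Ptr leaf is copied as a pointer only
    if isinstance(node.value, Ptr):
        value = Ptr(node.value.target)
    else:
        value = node.value
    return Node(value, [copy_tree(child) for child in node.children])


other = Node("some other tree")
t1 = Node("root", [Node("a"), Node(Ptr(other))])
t2 = copy_tree(t1)
```

. t2 gets its own nodes,
but its pointer leaf still targets the same other tree as t1's does .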

19: adda/oop/complications of avoiding pointers:

19: 31:
. traditional oop's pointer semantics
is often confusing to users,
but it is simplifying for compilers;
because, everything reduces to a pointer assignment
with no concern for implicit return parameters .
. doing oop the adda way
(where assignments overwrite objects
instead of overwriting only pointers)
means that trees of calls need to be called
top-down instead of bottom-up .

[31: mis:
. this next part assumes an obsolete system;
in the new system for a call tree,
we make the root call first,
so the call's act'rec is on the stack
providing the return addresses
needed by child calls .]
. if there is a sequence of call trees
then you will want to reuse temp vars between trees
but is there any way that could get complicated?
[moot]
. you just have to ensure that the root
of the tree of calls is not a temp var:
every expression results in an assignment;
and the number of temp vars you need
is the number of parameter assignments
of the largest expression tree in that block .
. we must ensure that our own code
never wants to read the return object
before writing to it .

var versioning preserves value like pointers do:
. for debugging or auditing
we can specify that some vars be versioned;
-- like an SOA service, it keeps a log
of which statement wrote what value .
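
. a minimal sketch of a versioned var, in python;
VersionedVar is a hypothetical name,
and the statement labels are passed in by hand
where a compiler would supply them:

```python
class VersionedVar:
    """Hypothetical var that logs which statement wrote what value."""

    def __init__(self, name):
        self.name = name
        self.log = []   # (statement label, value), oldest first

    def assign(self, value, stmt):
        self.log.append((stmt, value))

    @property
    def value(self):
        return self.log[-1][1]


x = VersionedVar("x")
x.assign(1, "line 10")
x.assign(5, "line 42")
```

. the old value 1 stays reachable through the log,
much as a shared pointer would have preserved it .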