2013-12-07

target-owning vs aliasing pointers

12.7: summary:
. this explores types of pointers
needed for adda's space-efficient style of oop,
and support of safe pointers .

9.13: adda/dstr/pointers/local and symbolic types:
[12.1 ..12.7: intro:
. there are several types of pointers:
a unix hardlink is a file's serial number,
an index into an volume's array;
thus the scope of a hardlink is limited to
a volume (a partition within a physical device).
. a unix symbolic link is like a web address,
it's text describing the file's pathname,
and can refer to any volume that happens to be
mounted on the current file system .
. not only does the symbolic link have more reach
than a hardlink's address range,
it also uses a different method of addressing:
when using a hardlink,
the volume is seen as a linear array of objects,
whereas when using the symbolic link,
the volume is seen as a tree of folders .
. with a hardlink you can find a moved file
(if it was moved within the same volume)
because the move merely changed
which folder points at the file,
it did not change the file's actual location
within the volume's array;
whereas with a symbolic link,
you are finding a file name
along a particular path of subfolder names,
not a particular volume array entry .
. adda's pointers provide a superset
of the addressing styles provided by unix;
adda's hardlinks are called local direct pointers .

the direct pointer's size depends on its scope:
the global direct pointer to a file
is a tuple:
( the volume owning the file
, the file's unix hardlink
);
global direct pointers can point at
subprogram-local objects as well as files;
subprogram's objects are like files
in that they live in a particular volume:
the subprogram's exist within the addx run-time engine
which provides a virtual volume
in which to locate all subprogram objects .

here is the global direct pointer to
a subprogram's object:
( the volume id of the addx run-time
, the act'rec* owning the object
, the object's absolute address
);
*: an act'rec (activation record)
is the file system created by an app's startup
to serve as its set of local variables
(it is said to contain objects instead of files)
-- these objects are located within a virtual folder
within the addx virtual volume;
it is a separate namespace belonging to the app .
. a heap provides memory for growing objects;
a subheap is a subset of the global heap
that is considered to be local to an app;
and, all of the act'rec's objects
are growing from within the act'rec's subheap .

. the local direct pointer
needs only the physical address of the object;
because the obj's owning act'rec
is assumed to be the current act'rec;
and adda's process is the assumed volume .

. the adda-wide direct pointer
additionally needs the objects act'rec id
since the pointer is being shared across act'rec's;
as is found in subprogram parameters .
. when a local pointer is passed to
the parameter of a subprogram call,
it must be converted to an adda-wide pointer
by pairing it with the current act'rec id .

. a global direct pointer can address anything within
a particular computer account,
including all the files the user can see,
plus the objects local to an adda process .

. the adda symbolic pointer treats aggregates
(tuples, arrays, trees) like subfolders,
so the same pathname that applies to
files in folders,
also applies to subprogram locals in aggregates .
. an adda symbolic pointer is basically
the same as a web url,
except the format might not be text;
depending on subtype mgt's impl' choices .

. another dimension in types of pointers,
is whether the target is being aliased
vs owned by the pointer:
if a pointer is modified then
what happens to the target it pointed at?
the owning pointer will recycle it;
whereas, the aliasing pointer
assumes no ownership of its targets
and will not deallocate them .
. an example of an owning pointer
is a folder in a file system;
each folder is a pointer to a list of
files and folders;
so, when you replace a folder with another one,
that is deleting every subfolder
and deallocating the contained files .

proposed syntax for declaring pointers:
x/.t -- x is an owning ptr to type t;
x//.t -- aliasing ptr .

2013.9 notes with revisions:

13: 12.1: seeing a need for 2 pointer types:
. in our [2013.9] theory on pointer types,
we have one syntax (/.t)
for declaring both pointer types
-- aliasing vs owning --
and we tell them apart by this:
type mgt's are using only owning pointers,
and users of types are using only aliases .
. however,
there's trouble for that theory because
the type mgt may need internal components
that happen to be aliasing pointers,
which are alongside owning pointers .

12.3: 12.6: 12.7: intro to safe pointers:
. when pointers are not managed safely,
there can be memory leaks and dangling pointers;
leaks happen when all pointers lose access
to an object that has not been freed yet,
and there is no garbage collection system
to find such lost memory .
. a dangling pointer happens when
one pointer deallocates its target
but the target was aliased by another pointer
that should now be set to nil but isn't,
so the alias is still pointing at the same location
even though it belongs to the free mem list,
or was reallocated to a different object
that the alias is not supposed to be pointing at .
 . the owning pointer is safe because
the owner can use the pointer only to point at
the owner's private space,
and memory leaks get tracked back to the owner .
. a safe alias is a pointer that is
either a symbolic pointer
so that accesses involve run-time checks;
or, is a direct pointer that puts a
no-delete lock on its target,
or has some other assurance that
for the time that the alias is active,
the target's memory can't be deallocated,
or reallocated to some other object .
. to have the least impact on the rest of the system,
the only safe and simple alias is one that
avoids locks by being symbolic rather than direct .

16: 26: 27: 12.6:
. in looking at syntax, there are competing ideas:
an unsafe pointer is a rather simple concept,
but in our theory of safe pointers,
we need 2 very different sorts of pointers
whose capabilities complement each other .
. one implements the components of an object,
and another is an alias-only sort of pointer
that doesn't actually own any components
and won't be generating garbage
whenever it's assigned a new address .
. the owning pointer is soley responsible for
the memory that it points at;
so there is no reference counter on the object
because there is no mandatory sharing;
if an alias wants to avoid becoming nil,
it shouldn't point at any optional components;
eg, a tree in the same scope as the pointer
will be there as long as the pointer exists,
but a specific branch of that tree may evaporate .

21: mis: pointer subtypes:
. suppose the pointer type has a subtype.tag
that tells you whether the object is owned or not?
. if the user assigns a new obj,
the variant indicates ownership,
otherwise it indicates non-ownership [aliasing]?
[10.2:
. as detailed later, these are not compatible types,
hence they can't be subtypes of a generic pointer type .
12.7: well:
. they're not completely disjoint types either;
they both allow access to an object,
and if you are an subprogram accepting inputs
you don't care what type of pointer
is providing the access to your inputs
because you are copying the targeted value .
. anyway, the point that stands is that
we need separate notations for declarations
to distinguish aliases from owning pointers .]

21: 12.6: pointer subtypes/seems user-friendly but ...:
. users of a pointer type expect it to
either point to an existing object like an alias
or hold a new object provided by a function
like an owning pointer;
so a pointer with {owner, non-owner } subtypes
seems like a user-friendly idea;
because, then users wouldn't have to be
aware of the subtypes .
. in that case,
then one job of a function that returns pointer
is to specify this subtype?
the pointer's type mgt could set this subtype tag;
if it creates a new object and points at it,
then the subtype is set to owner;
if assignment is to an owner
then it deallocates before assignment .
... however:
. this will still be confusing to users of pointers
because they have implicit ideas about
who the owner should be;
ie, they assume an assignment never implies a delete
unless the last reference gets disconnected .
. what they need is object reference counting;
but, then every possible target object
has to include a reference counter !
. another reason to use a pointer
is to save space and time instead of copying
when an object is ReadOnly memory
and they can all share the same constant object .
. the point of avoiding the usual pointers
is to make oop efficient
but if you do offer a pointer type,
it should be the pointer that users expect
except hidden within the impl
not a user concern when trying to use the type .
... unfortunately, users can't have both
space efficiency and reference counting .

21: mis: trees need boundaries:
. what if some tree is owned
and some of its subtrees are not owned?
. are some tree additions not to local tree?
[10.2: 12.6:
. all we have to worry about
is that each object has a boundary,
and what they do with their aliasing pointers
happens to other objects,
who then need to own those changes .]

26: 27: compared to unix link varieties:
. unix hardlinks are local pointers;
ie, local to a volume;
unix symbolic links are url's
whereas adda's symbolic pointers are often not text;
they can be like machine instructions:
goto act'rec#..., goto symbol #...,
component given as a byte offset, etc .

26:
. there are 2 different kinds of pointers:
# local addresses impl'd with [12.6:
direct pointers -- not an index into
the target type's local heap ],
# potentially global symbolic addresses
that are impl'd as url's .

. the use of the syntax ( /.t )
for declaring an owning pointer
is reminding us of folders,
as when a folder is copied,
it copies all subfolders,
thus it has the semantics of a owning pointer .

. that is conflicting with a previous idea
that ( /.t ) should be an aliasing pointer:
being local [to an object]
is equivalent to ownership,
ie, the boundaries of objects are limited:
if an address points to a
component of that datastructure
that is the definition of an owning pointer
whereas if an address points to an externality,
something outside of the object,
then that is a non-owning pointer .

. so the new idea is that the user (12.1:
ie, the software of a subprogram
which is a user of various data types)
should be able to make data structures locally,
just like type mgt can;
in both situations
-- subprogram act'recs and type mgt bodies --
there is a need for both types of pointers:
aliases (//.t) that work like conventional pointers,
and owners (/.t) that are local to the
scope enclosing the object of type /.t,
and work like file folders .
[12.1: at the time,
I had not considered 4 types of pointers;
the "/" of /.t is reminding of folder sytems,
and thus of symbolic vs direct addressing ...
12.6: then again,
maybe symbolic vs direct should be a subtype
that the compiler could decide on;
so, users and perhaps type mgt would be
concerned only with owning pointers vs aliases  .]

. the syntax for symbolic links ( //.t )
reminds us of the global part of url's,
(ie, //global.address /local/file/system );
[12.1: another view,
 the "//" reminds of aliasing,
where an object is being targeted by
more than one pointer .]

. likewise, only symbolic links can
point to non-local things ...
[12.1: actually,
global direct pointers can do that too:
(act'rec id, obj id) identifies
any object within the addx virtual volume;
and "non-local" means outside a given act'rec .]

. assigments can happen between 3 classes of types:
both kinds of pointers and non-pointer types;
and, the outcome depends on type-matching:
[12.1:
. the 2 types of pointer at the time
were symbolic vs local;
what that means now is
aliasing vs owning pointer . 12.6:
. also consider pointer vs target assignments .]

# aliasing`= aliasing :
. the source's value is copied
(a conventional shallow assignment).
. to get the address of an alias
we can use the syntax: alias`address .

# aliasing`= owning :
# aliasing`= non-pointer :
. the source's symbolic address is assigned;
equivalent to:
aliasing`= src`address .

# owning`= owning :
# owning`= aliasing :
# aliasing/`= owning :
# aliasing/`= aliasing :
. the destination's target is deallocated,
and points to a copy of the source's target .

# owning`= non-pointer :
# aliasing/`= non-pointer :
. the destination is deallocated,
and points to a copy of the source .

safe pointers:
. notice that assignments to an owning pointer
always result in a deallocation
or reuse of the target,
so there is no garbage waiting to be collected
and no aliasing can take place [12.7:
(not by direct pointers only by symbolic addressing
where there are run-time checks target existence)]
whereas with assignments to aliasing pointers,
aliasing always happens,
because it got its target's address
from another pointer . [12.1:
. aliases can get addresses from functions too;
the function's return type
needs to match the alias's target type,
and then the return overwrites the target; 12.2:
. if the function returns an alias,
a shallow copy can be done;
but if the function returns an owning pointer,
that is a deep copy;
ie, instead of the alias losing what it pointed it,
the alias is using that target as the assignment target .
. if the alias has a null target at the moment,
then that is an error we can catch at run-time .]

21: mis: ID.type:
. to differentiate pointers from url's,
we can have an ID.type;
x.ID or x.ID.t .
. so can a user as well as type mgr
use owning pointers ( /.t )?
[10.2:
. we answer this elsewhere,
and settle on //.t instead of .ID.t .]

26: mis: @ is not the symbolic pointer type symbol:
. given that we need 2 types of pointers,
and that ( /.t ) is a local pointer to t,
what should the syntax of symbolic pointers be, @.t ?
. if ( @.t ) was a symbolic pointer type,
then given the current syntax design
the dereferencing operation would look like
x@y -- which conflicts with popular email syntax .

16: mis: what's missing without 2 pointer types:
. suppose we tried to simplify the syntax
by requiring that type mgt pointers
were always local [owning] pointers,
and user pointers were always aliases ?
. then there is no way for an object
to have a component of type [alias]  .
27:
. and conversely,
users could never use local [owning] pointers
which would obstruct the creation of
datastructures local to a subprogram  .

mis: minimal safety theory:
16:
. if we relaxed the safety requirments
then we could use one pointer type
to have the algorithms rather than the type mgt
determine how a pointer is used,
and whether it is an object boundary
or else an object component .
[ this is quite possible because
we are completely incapsulated,
and each type is responsible for defining
every operation on its type, including assignment;
so, the assignment operation would have its own method
for deciding whether a given pointer
needs a deep or shallow copy . ]
. in this sort of relaxed situation,
the system is still somewhat protected;
because, all mem'alloc's are [12.6:
attributable to a particular act'rec or type mgt; ]
so, even if an act'rec or obj is generating garbage,
it is all still tied to [12.6:
the source of the problem, providing accountability .]
27:
. this can still lead to very large mem'losses;
eg, a lax pointer assignment in a loop,
could use up the entire harddrive,
except for limitations imposed by the task mgt,
which would be monitoring the
per-act'rec resources usage .

adda/dstr/pointers/review of the dereference operator:
16:
. notice that Ada has no pointer dereference operator .
27:
. x.y.z could be a path down nested aggregates,
or it could be a path down a tree .
26:
. in adda the (/) is used
for dereferencing both kinds of pointers .
27:
... if adda uses (//.t) for typing symbolic pointers,
maybe it should use (//) for dereferencing .
. adda's local [owning] pointer syntax comes from
url notation, where subfolders are actually
links to other folders,
thus x/y/z is a path across links,
whereas www.x.com is the server www.x
with a server.type of com .

26: unix root confusion:
. when you see (dir/file), it should work like unix files?
a problem with being like unix
is that (/) means root .
27:
. that is only because unix has
an implicit current drive:
. if unix would follow the url syntax,
then they would name each volume,
and so, if it were also like MS`Windows,
(/) really means ( //c.drive/ ) .
12.6:
. well, the (/) notation means root because,
( file ) means
//my.acct/[current working dir]/file
whereas (/file) means //my.acct/file .

21: adda/dstr/pointers/the root object:
. every datastructure representing a graph
(a network of nodes) needs a root object:
when there is a pointer to a part of a tree,
that pointer needs to be paired with
a pointer to the root of the tree
or, at least a pointer to the graph's subheap
which is the array of nodes
that the whole graph is built of .
12.6:
. tree grafting is where a tree is composed of
subtrees that use different node pools,
rather than all nodes coming from the same pool;
this way a tree can use byte-sized pointers,
yet still have more than 255 nodes .
. in order to support grafting,
our subtree pointer should be paired with
the pointer to the current node pool .

21: adda/dstr/pointers/combining node-based objects:
. if you know a tree will be system-wide,
you could base its node pool in the global heap
and then merging subtrees is easy;
but if trees needing to be merged
are not of the same heap,
then we would need one of 2 approaches:
# grafting:
. make subtree a leaf node of type tree
indicating this subtree has its own nodes .
# hacking:
. add the trees by adding addresses:
trees are arrays of nodes in contiguous array addresses;
so, if tree#1 is in subheap#1(1.. 10)
and tree#2 is in subheap#2(1.. 10)
then copy subheap#2 to subheap#1(11..20)
and within the copied (and shifted) nodes,
shift the addresses those nodes contain
by simply adding 10 to them
(the size of copied part of subheap#2).

26: adda/dstr/pointers/the move operation:
. if you want the semantics where
2 pointers can alias the same object
then use a symbolic [12.6: aliasing] pointer type (//.t).
. when assigning local pointers
you are asking for a deep copy or a move;
therefore we need a move operator
as well as a copy-and-replace operator .
[27:
. according to the rule that
inout params should be done by the possessive syntax,
the move would look like this: ( x, y )`<- p="">because they are both getting modified
(one is getting overwritten,
and the other is getting nil'd).]

notes during review:

12.2:
. the pointer's context field is some act'rec id;
suppose we were to use subheap id's instead?
12.3: keep in mind all type mgt's needs:
the act'rec is 2 things:
# the array of type-specific subheaps,
# the stack record .
. type mgt needs access to that act'rec,
because the trailer for its objects
may be located in either
the act'rec's type-specific subheap for that type,
or the obj's private subheap .
. the object may be composed of more than one other type,
and the allocation of those components may be
based within the act'rec's other type-specific subheaps,
so type mgt needs a reference to the whole act'rec,
not just the the act'rec's
type-specific subheap for that type .
12.6:
. eg, a tree that has a private subheap for its nodes
may not want to so localize its leafage
consisting of strings,
choosing instead to use the string subheap
that belongs to the tree's home act'rec .
. in order to handle all cases,
the pointer would either need many context fields,
or it would need to contain just the home act'rec
knowing that from the act'rec
one could reach any type-specific subheap
that a type mgt in that scope might be interested in .

12.3: how can we shorten direct pointer size:
. first, let's assume that subheap mgt
has the freedom to relocate obj's
as part of growing them or their enclosing array;
then direct pointer size becomes variable .
. we have no problem with
direct pointers to nested tuples,
because we are talking about fixed public entries
that can be mapped to the stack,
but we do have a complication with extendable arrays,
and the trees impl'd with them;
because, these arrays are growing in a subheap space,
rather than on the stack .
. we need to keep the array index symbolic,
so that we can change absolute addresses
in order to accommodate array growth .
. direct pointers to components can thus be quite lengthy,
requiring an entry for
# the act'rec,
# each array index, and
# each tuple offset that is sandwiched
between array subscripts;
eg, say we have the path:
tuple1 tuple2 array tuple3 tuple4 tuple5,
then we need this direct pointer path:
( act'rec id -- for stack of first tuple
, stack index -- for tuple1 tuple2 array
-- array's fixed part contains array's subheap;
, array index -- for the array item
, item index -- for tuple3 tuple4 tuple5
) .
. now let's assume that
objects cannot be relocatable;
we can do this if subheaps can only grow,
they can never shrink,
and they grow by adding segments,
not by freeing the subheap and getting a larger one .
. in that case,
our direct pointer size is constant:
we need the obj's absolute address,
plus the obj's home act'rec
for its list of type-specific subheaps .

. but what about overwrites?
let's say I have a direct pointer to
an array item's component,
and then I replace that array with a new one .
. the easiest thing to do is free the old array,
and link-up the new one .
. if I can't shrink the array
then I can't delete it either,
so I would have to copy the source array,
into the components of the target array .

. we can have an aliasing direct pointer
if it doesn't point at optional components;
for simplicity,
alias typing should not make a distinction
between direct vs symbolic;
that could be a compiler decision .
12.6: ie:
. for any alias to be a safe pointer
while still pointing at components
and giving type mgt the freedom to
change the absolute address of a component,
we should assume it uses symbolic addressing,
but in special situations the compiler may use
a variant of aliasing that uses direct addressing .

12.4:
. there is still confusion about direct pointers;
when we access tuple components,
those are on the stack,
so we may use stack-relative short pointers,
but when accessing an extendable array,
that is generally going to be on the subheap,
so we can either use one machine pointer
(that would be an obviously direct pointer)
or we can use a path of short pointers
(array base, array index, ...)
--with additional array indices for any nested arrays .
(that would be a partially symbolic pointer;
it is considered to be partially direct because
if the array component is a nested tuple,
we can treat the array as one byte array
instead of the path (array# .tuple.tuple.tuple ...).
. so the local direct pointer
needs the physical address of the object;
the adda-wide direct pointer
additionally needs the object's act'rec
for possibly containing the object's trailer;
and finally, the global direct pointer
additionally needs the volume id,
(adda's process is on one virtual volume
containing all the processes under adda's control).

12.6:
. when a /.t type is declared,
the target type's init should be auto'ly allocated
and assigned to the pointer ...
does that work for trees?
sometimes you want things pointing to nil!
12.7:
. it should be up to the target type's init function .

12.7:
. notice that in previous ideas of subheap,
you had the obj's fixed part on the stack,
and the trailer was in a type-specific subheap;
so now with subprogram-local owning pointers
you are expecting the type-specific subheap
to have a place for the fixed part too?
maybe that should go in the act'rec's
"pointer's target subheap" ?
... if we know the pointer needs a target,
we can put the target on the stack .

12.7:
. shouldn't the direct pointer to adda objects
have the same semantics as direct pointers to files?
. those are like object references,
because while an absolute address represents
an arbitrary position within new objects,
an object id always refers to an object
not a location within that object .
. but that would severely impede space efficiency,
as all object entries would have to be pointers .

12.7:
. even with absolute addressing by pointers
our pointers are somewhat safe because,
the subheap heap manager insures encapsulation:
when one subheap is given a segment of memory
that segment can only be used by that subheap,
so after a pointer is obtained from
one type mgt or act'rec
it can never be used for accessing the space of
some other type mgt or act'rec .

No comments:

Post a Comment