managing self-modifying code

1.31: adda/managing self-modifying code:
(inspired by python unpickle vulnerability)
. the safe pickle is built by the system .
. it can be compared to a decompile;
how is it extensible? that is to ask:
how are objects built in the first place?
. consider an arbi'object (an arbitrarily typed one,
whose pointer structuring is not known
by anyone but the object itself) .

. in the simple case
there is no pointer structuring:
it is treating its entire space
as a contiguous byte string .
. generally however,
an object is non-contiguous,
a graph of nodes connected by pointers;
and, the system doesn't know
what's a pointer to follow .
. that can't be allowed:
the object must be able to
dynamically declare its data structures,
and it gives this declaration to the system .
. generally, the datatype is defined by
a typedef being virtually pointed at
by the obj's type.tag .
. in the case of arbi'types,
we don't have a typetag,
and would thus need an on-board way
to tell the system our object's structure .
. what we need for dynamics
is a special typetag to indicate that
it's not one of the system's known types
but is instead indicating
a typedef that is onboard the object itself .

. type.tags come in 2 parts:
the first part is defining
which scope the type.def is in:
# global,
# local to a library unit,
# local to the object itself .
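
. a minimal sketch of such a 2-part type.tag, under assumptions:
the notes define only the first part (the scope);
the names Scope, TypeTag, resolve_typedef, and the encoding of the second part
(a plain key for globals, a (lib unit id, node id) pair for lib units,
unused for onboard typedefs) are hypothetical .

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any


class Scope(Enum):
    GLOBAL = 0     # typedef is one of the system's known types
    LIB_UNIT = 1   # typedef is local to a library unit
    ONBOARD = 2    # typedef is carried by the object itself


@dataclass(frozen=True)
class TypeTag:
    scope: Scope     # first part: which scope holds the typedef
    ref: Any = None  # second part: assumed encoding, depends on the scope


def resolve_typedef(tag, global_types, lib_units, obj=None):
    """Return the typedef that a type.tag points at."""
    if tag.scope is Scope.GLOBAL:
        return global_types[tag.ref]
    if tag.scope is Scope.LIB_UNIT:
        unit_id, node_id = tag.ref
        return lib_units[unit_id][node_id]
    # ONBOARD: the object supplies its own declaration to the system.
    return obj["typedef"]


# Toy tables standing in for the system's real ones.
global_types = {0: "int32", 1: "string"}
lib_units = {3: {7: "matrix"}}
arbi_obj = {"typedef": "bytes(4), ptr, bytes(2)", "data": b""}

print(resolve_typedef(TypeTag(Scope.LIB_UNIT, (3, 7)), global_types, lib_units))
print(resolve_typedef(TypeTag(Scope.ONBOARD), global_types, lib_units, arbi_obj))
```

. the point of the onboard case is that the system never has to guess:
the special scope value tells it to ask the object for its own declaration .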

. within program code,
symbols are always pointers leading into
the current scope's symbol table .
. an obj's type.tag can be in
either the symbol table node,
or on the object itself (on the stack);
but if it is on the stack,
then the type is dynamic,
hence the type must be defined globally .
. if the type.tag is constant,
then it is contained in a symbol table node
as a 2-part address:
(lib unit id, node id)
. within any library unit,
we can have nested scopes,
each with their own symbol table
that is an array of ptrs to symbol nodes .
. each scope shallow-copies
the symbol table of its parent scope;
and, if it redeclares any of its parent's names
then it replaces that name's node
with a new node (an index into
the lib unit's node heap) .
. a non-global typedef
is identified by the library unit id
and a symbol node ptr:
no matter how many nestings a library unit has
all scopes within it
share the same symbol node space;
and, for the sake of reducing typedef pointer size
lib units should have separate spaces
for the symbol nodes of datatypes vs objects .
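
. the scope handling above can be sketched as follows;
this is only an illustration under assumptions:
LibUnit and ScopeTable are hypothetical names,
and the symbol table is modeled as a name-to-node-index dict
rather than the array of node ptrs the notes describe .

```python
class LibUnit:
    """One node heap shared by every scope nested in the unit."""

    def __init__(self, unit_id):
        self.unit_id = unit_id
        self.node_heap = []  # symbol nodes, addressed by index

    def new_node(self, name, info):
        self.node_heap.append((name, info))
        return len(self.node_heap) - 1


class ScopeTable:
    def __init__(self, unit, parent=None):
        self.unit = unit
        # Shallow copy: the child starts with the parent's node indices.
        self.table = dict(parent.table) if parent else {}

    def declare(self, name, info):
        # A (re)declaration points the name at a fresh node in the unit's
        # heap; the parent's table and the old node are left untouched.
        self.table[name] = self.unit.new_node(name, info)
        return (self.unit.unit_id, self.table[name])  # 2-part address

    def lookup(self, name):
        return self.unit.node_heap[self.table[name]]


unit = LibUnit(unit_id=3)
outer = ScopeTable(unit)
outer.declare("x", "int")
outer.declare("y", "int")
inner = ScopeTable(unit, parent=outer)
addr = inner.declare("x", "float")  # shadows the outer x

print(inner.lookup("x"), outer.lookup("x"), addr)
```

. because every scope allocates from the same node heap,
the pair (lib unit id, node id) identifies a node
no matter how deeply its scope is nested .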

. obj's can have a dynamic structuring
by using a graphing map
which is a set of tags to say:
( the records in my graph-structured data
use tags that describe how to
format that byte string into a
mixed sequence of byte strings and pointers .
. at the end of the map,
is a list of the types of each pointer;
all pointers are relative to a type
(ie, an index into a type mgt's heap
within an act'rec's subheap)
or an index into a lib unit
(ie, an index into the
current scope's symbol table
[3.9: or more likely, into the
lib unit's symbol node heap .]).).
. some pointers may refer to self .
. physically, self is always segmented
so that it can be constructed incrementally
whenever its size exceeds a sys'segment;
but virtually,
it can be either contiguous or segmented .
. so, when it has a ptr to a node of self
it really has an index
that the system can map to an actual sys ptr
that is 2 parts:
a sys'heap seg + an offset into that seg .
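
. a rough sketch of both ideas: a record map that splits a byte string
into byte strings and pointers, and a self ptr that is really an index
the system turns into (segment, offset) .
. everything named here is an assumption for illustration:
the map entry shapes ("b", n) and ("p",), the fixed SEG_SIZE,
big-endian 2-byte pointers, and the function names .

```python
SEG_SIZE = 256  # hypothetical size of one sys'segment, in bytes


def decode_record(raw, record_map, ptr_bytes=2):
    """Split raw bytes as the map dictates.

    ("b", n) means n opaque bytes; ("p",) means one self-relative pointer.
    """
    fields, pos = [], 0
    for entry in record_map:
        if entry[0] == "b":
            n = entry[1]
            fields.append(("bytes", raw[pos:pos + n]))
            pos += n
        else:
            index = int.from_bytes(raw[pos:pos + ptr_bytes], "big")
            fields.append(("ptr", index))
            pos += ptr_bytes
    return fields


def resolve_self_index(index, segments):
    """Map a self-relative index to (sys'heap segment, offset in segment)."""
    seg_no, offset = divmod(index, SEG_SIZE)
    return segments[seg_no], offset


raw = b"hi" + (300).to_bytes(2, "big") + b"!"
record_map = [("b", 2), ("p",), ("b", 1)]
fields = decode_record(raw, record_map)
print(fields)
print(resolve_self_index(fields[1][1], ["seg0", "seg1"]))
```

. the object only ever stores the index;
the system alone decides which physical segment it lands in,
so self can look contiguous while being built segment by segment .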

. a self ptr comes in several sizes
incrementing in half bytes,
assuming it doesn't mind pre-declaring heap size;
. if the user doesn't want to do that?
. make it part of the record's map:
the map is saying:
( I started with byte-sized ptrs;
but I ran out of that space;
so for this record,
all ptrs are longer than previously .
). however,
if records are very small that's too much space ?
[you could test it(that's real cs)]
. the simple idea for pointer size policy
is to assume a default of 16bit ptrs;
and otherwise the user can declare
either 1, 1.5, 2, 3, or 4 bytes,
(that is 8,12,16,24, or 32 bits).
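
. the size policy can be sketched like this, assuming pointers are
plain indices packed back to back (which is what makes a 12-bit size usable);
the function names and the big-endian, zero-padded packing are hypothetical choices .

```python
ALLOWED_PTR_BITS = (8, 12, 16, 24, 32)
DEFAULT_PTR_BITS = 16


def smallest_width(node_count):
    """Smallest allowed pointer width that can index node_count nodes."""
    for bits in ALLOWED_PTR_BITS:
        if node_count <= (1 << bits):
            return bits
    raise ValueError("too many nodes for a 32-bit self ptr")


def pack_ptrs(ptrs, bits=DEFAULT_PTR_BITS):
    """Pack indices at `bits` each, zero-padding the final byte."""
    acc = 0
    for p in ptrs:
        if p < 0 or p >> bits:
            raise ValueError(f"{p} does not fit in {bits} bits")
        acc = (acc << bits) | p
    total = bits * len(ptrs)
    pad = -total % 8  # bits needed to reach a whole byte
    return (acc << pad).to_bytes((total + pad) // 8, "big")


def unpack_ptrs(data, count, bits=DEFAULT_PTR_BITS):
    """Inverse of pack_ptrs for a known pointer count."""
    acc = int.from_bytes(data, "big") >> (len(data) * 8 - bits * count)
    mask = (1 << bits) - 1
    return [(acc >> (bits * (count - 1 - i))) & mask for i in range(count)]


packed = pack_ptrs([0xABC, 0x123], bits=12)
print(len(packed), unpack_ptrs(packed, 2, bits=12))
print(smallest_width(300))
```

. two 12-bit ptrs fit in 3 bytes instead of 4,
which is the kind of saving that matters when records are very small .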

managed code:
. in a managed system we can have
pointers with permissions built in:
user code can't change the ptr;
because, when the system is managed,
no arbitrary code can change a system field
without mgt'code doing it .
. we can still do c-style address manipulations
but they apply only to local pointers,
not system pointers imported from a parameter .
. that rule messes up orthogonality:
if we have parameters and global pointers
we should have them usable together;
the safe way to provide both is to
translate address modification into
an object-oriented address handling;
ie, pointers are never anonymous:
they always belong to some object,
and then when you arithmetically
generate a new address
you can't do anything with that address
except ask the originating object
to do something with it for you .
. what are the chances of that making sense?
so the same restriction practically applies
but by more-general rules
that stay true to orthogonality .
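
. one way to picture object-oriented address handling, as a sketch only:
Buffer and Addr are hypothetical names, and a Python list stands in for memory .
. arithmetic on an address is allowed anywhere,
but the result is useless except when handed back to its originating object .

```python
from dataclasses import dataclass


class Buffer:
    """Owns some storage; only it can act on addresses derived from it."""

    def __init__(self, size):
        self._cells = [0] * size

    def addr(self, index=0):
        return Addr(self, index)

    def _check(self, a):
        if a.owner is not self:
            raise PermissionError("address belongs to another object")
        if not 0 <= a.index < len(self._cells):
            raise IndexError("address is outside this object")

    def load(self, a):
        self._check(a)
        return self._cells[a.index]

    def store(self, a, value):
        self._check(a)
        self._cells[a.index] = value


@dataclass(frozen=True)
class Addr:
    owner: Buffer  # a pointer is never anonymous
    index: int

    def __add__(self, n):
        # C-style arithmetic yields a new address with the same owner;
        # it is validated only when the owner is asked to use it.
        return Addr(self.owner, self.index + n)


buf = Buffer(4)
a = buf.addr(1)
buf.store(a + 2, 9)
print(buf.load(buf.addr(3)))
```

. the restriction is the same one as before (you cannot reach outside the object),
but it now applies uniformly to local, global, and parameter pointers .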

managed code needs hardware-based isolation:
. any time unmanaged code is entered,
its domain has to be hardware-isolated;
or its inputs have to be protected;
and, without hardware-based isolation,
no code on the entire machine
can be invulnerable to crashing;
because that crash could be a stack or heap smashing
that causes execution to jump into your process;
and then your managed subsystem may be corrupted,
so either the whole system needs to be managed,
or there needs to be hardware-based isolation .

( . if they want 32 bits
then it's space-efficient to make those direct pointers?
but they are still safe because ?
-- you don't want direct pointers anyway,
because you are using a vax system
-- virtual address extension --
so that even if the program wants trillions of objects
the size is limited only by the terabytes of harddrive space;
the actual pointer will be even larger,
because often the pointer wants an arbi'type,
so there's a type code,
possibly an act'rec where the type is defined,
and finally an index into that type's heap space,
or into the heap space that type can use
in a particular act'rec .
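
. a sketch of such an indirect pointer, assuming three fields
(type code, act'rec id, index) and a manager that keeps
one heap per (act'rec, type) pair;
ManagedPtr, Manager, and the field names are hypothetical .

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ManagedPtr:
    type_code: int  # which type the target has
    act_rec: int    # act'rec whose subheap holds the target
    index: int      # index into that type's heap space


class Manager:
    def __init__(self):
        self.heaps = {}  # (act_rec, type_code) -> list of objects

    def alloc(self, act_rec, type_code, value):
        heap = self.heaps.setdefault((act_rec, type_code), [])
        heap.append(value)
        return ManagedPtr(type_code, act_rec, len(heap) - 1)

    def deref(self, ptr):
        # Every access goes through mgt'code, so a bad index is caught
        # here instead of touching unrelated memory.
        heap = self.heaps.get((ptr.act_rec, ptr.type_code), [])
        if not 0 <= ptr.index < len(heap):
            raise IndexError("index outside this type's heap")
        return heap[ptr.index]


mgr = Manager()
p = mgr.alloc(act_rec=1, type_code=7, value="a")
mgr.alloc(act_rec=1, type_code=7, value="b")
print(mgr.deref(p), mgr.deref(replace(p, index=1)))
```

. since the index is only ever interpreted inside one typed heap,
its width is independent of how large the backing store grows .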

(. no matter what arbi'code changes the value to,
the permissions and the type stay the same;
eg, any seg you switch to is your seg!
but that seg could be undefined; ...
-- nonsense .
. if it's an actual sys ptr
you can't chg the ptr value arb'ly without
poking unrelated parts of the system .
. so, backtracking:
the system must be managed
-- slow but fits in tight space .