web.adds/{super, supra, sur}-type:
summary:
. there are uses in the wild of each of
surtype, supratype, and supertype
to mean something like an ancestor type
if not specifically the parent type .
. in genetics, type has a special meaning:
what the gene expresses functionally,
so then the term, supertype,
is used for what the gene expresses incidentally, in tests;
in this system of nomenclature,
the highest type might be called the supratype .
. within knowledge engineering,
the terms are surtype or supertype,
rather than supratype .
. the term surtype was associated with
being complete; ie,
a surtype has several subtypes,
and every instance belongs to a subtype .
. surtype may be related to the term Surjection:
in the surjective mapping,
every instance of the function's codomain
belongs to the mapping .
. wikipedia's take is lean so far:
supertype redirects to Subtype polymorphism;
and neither wiktionary nor wikipedia
has supratype or surtype .
. wiktionary's supertype:
(computing) The data type represented by a superclass .
2012-02-29
the subfiler subprogram
2.23: adde/subfiler:
intro:
. the first subprogram I should make is to
input a subject file
and decompose it into several subfiles .
. pay attention to the datestamps,
applying them to each associated output file .
. it shows each of the subfiles intended for output
along with any chunks not assigned to separate files,
showing the relationship as a hierarchy .
. it shows the full internal file name
that it's basing that filename on
(it follows portability rules for filenames).
details:
. there are 2 main formats defined by
either {zero, one} number of spaces
between {title, body};
the one-spacer should have had
2 or more blank lines before title,
and then the end of that title's body is defined as
2 or more consecutive blank lines .
. for a zero-spacer,
the end of subfile is defined as the next blank line .
[. there may be some situation where an end of subfile
is not followed by either a {title, eof}
-- this is an error
or where one subfile is nested within another?
that should be rare, and review should correct it.]
. the title may be missing a colon,
but a subfile title has one div in it,
and is preceded by at least one blank line .
TOC: table of contents:
. it shows you a TOC of what it thinks the subfiles are,
and it shows any other lines that might be titles*
but that it decided were not,
so you can see how sane it is
(these are content lines).
*:( any line with a div in it preceded by a blank line )
. all content lines are indented,
so only top level (unindented lines)
are the heads of a separate subfile .
. for the first version,
instead of a fancy interactive menu for
adjusting mistakes you or it made,
it should give you the original line numbers
so then you can use your usual editor to visit those line#'s
(the next version can integrate an editor).
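. a minimal python sketch of that TOC pass, under stated assumptions:
the "div" is taken to be the path divider (/),
the first line of the file counts as preceded by a blank line,
and the name find_title_candidates is hypothetical .
. it reports each candidate with its original line number
and whether it is unindented (a subfile head) or indented (a content line):

```python
from typing import List, Tuple

DIV = "/"  # assumption: the notes' "div" is the path divider


def find_title_candidates(lines: List[str]) -> List[Tuple[int, str, bool]]:
    """Return (line number, text, is_head) for each candidate title.

    A candidate is a line containing DIV that follows a blank line
    (or starts the file); it is a subfile head only if unindented.
    """
    found = []
    prev_blank = True  # assumption: start of file acts like a blank line
    for number, line in enumerate(lines, start=1):
        if line.strip() and prev_blank and DIV in line:
            is_head = not line[0].isspace()
            found.append((number, line.strip(), is_head))
        prev_blank = not line.strip()
    return found


sample = [
    "2.23: adde/subfiler:",
    "intro:",
    "",
    "  see web/notes for more",
    "",
    "2.26: adda/aop:",
]
for number, text, is_head in find_title_candidates(sample):
    print(number, "HEAD" if is_head else "content", text)
```

. the line numbers are 1-based so they can be typed straight into an editor's goto-line command .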
[2.29:
. it can help you do the file syntax reviewing:
just print every line preceded by a blank line,
print a blank line if the file has
2 or more blank lines in a row .
. it also prints the associated line numbers .
. it then asks you "ok to run subfiler?" .]
. a good function to start with is
size.int`= [number of blank lines](inout [next line].string);
. after it returns,
[next line] contains the next non-blank line,
and size is the number of blanks .
. if [next line]`length =0,
then eof has been reached .
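. a python sketch of that starting function;
python has no inout params, so this version returns
both the blank count and the next line as a pair
(the name count_blank_lines and the use of a text stream are assumptions):

```python
import io
from typing import TextIO, Tuple


def count_blank_lines(stream: TextIO) -> Tuple[int, str]:
    """Skip blank lines; return (number skipped, next non-blank line).

    An empty string for the line means end of file was reached.
    """
    blanks = 0
    for raw in stream:
        line = raw.rstrip("\r\n")
        if line.strip():
            return blanks, line
        blanks += 1
    return blanks, ""


sample = io.StringIO("title/a:\n\n\nbody line\n")
print(count_blank_lines(sample))  # the first line is non-blank
print(count_blank_lines(sample))  # two blanks, then the body line
print(count_blank_lines(sample))  # end of file: the line is empty
```

. a line holding only whitespace is counted as blank here; that is a choice, not something the notes specify .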
. a next version might move things to proper folders
according to what the subfile paths are .
. the expectation is that a reviewer has
checked the syntax and made the changes to the file path
so that in case it was written a long time ago,
the path is changed to become currently relevant .
[2.29:
. a next version can help you edit like so:
it generates a file of numbered lines
rather than printing them;
then after you adjust any lines in your editor
and run subfiler on this file,
it copies your changes back to the original file .
it then proceeds to do the subfiler routine,
complete with parking files in the right folders .]
higher-order function programming
2.26: adda/aop/higher-order function programming:
. an example of AOP (aspect oriented prog'ing)
is a logger: you want every function to
log what it's doing .
. your job as a programmer is to
go into every new function and apply a recipe
for how to rewrite the function
so as to have it do this logging .
. in AOP, your job is to automate the
programming of that recipe
with something like this:
for each new function defined:
. if this function is recipe's target
(passes some tests like being of a certain type)
then search for some class of expressions,
and use list processing to modify the expression
(this is routinely done in lisp).
. just like you can find an expression with wildcards
(eg, find a function named f, with any params)
you can replace that find with something like
a declare block that includes f .
. this is the same way optimizations can be done;
and it could simplify the way
HLL-to-HLL compilers are designed
(these are compilers that translate one high-level lang
to another high-level lang, eg, adda lang -> C lang ).
Labels:
adda,
aop,
higher-order functions
adda's equivalent to Python decorators
2.26: adda/functions/Python decorators:
. a note of mine that was wondering how to
impl' AOP (aspect oriented prog'ing)
[@] 11.9.16: mis.adds/aop (aspect oriented prog'ing)
had me responding now that
a first-class lisp system would make it easy to
add AOP later .
. the topic of higher-order function
(the class of functions that
input and output other functions)
reminded me that python does decorators;
the syntax is:
@fun1 @fun2 ... def funx(): body .
. an adda-style syntax for that would be:
funx().proc: fun1 fun2 body .
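. for reference, a runnable python instance of stacked decorators
(in real python each decorator sits on its own line above the def);
the bodies of fun1 and fun2 are made up just to show the order of application,
which is fun1(fun2(funx)):

```python
def fun1(func):
    def wrapper():
        return "fun1(" + func() + ")"
    return wrapper


def fun2(func):
    def wrapper():
        return "fun2(" + func() + ")"
    return wrapper


@fun1
@fun2
def funx():
    return "body"


print(funx())  # fun1(fun2(body))
```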
set generator syntax
2.23: adda/cstr/set generator:
(^ x,y,z: int . f(x,y,z) ) -- this is the set generator;
see how the power param is terminated by a period?
the reason that can't be the usual math operators {: | }
is due to (:) being confused with the label declaration
and with other math uses for (| ) .
. (x.t, y.t) can also be defined by ( x,y: t ).
2.29:
. even if it was ok to overload (| ),
the form of it is opposite of what I need:
math puts the control var to the right of (|):
{ f(x) | x in t }
when I want the control var as the head:
{^ x in t .
f(x)
} -- that way var's are declared before they are used,
and it's easier to read generally,
when the body of the generator is getting large
(I have to admit that for math's one-liner's,
the math way looks neater ).
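. python happens to show both orderings side by side;
in this sketch, f and the small range standing in for int are assumptions .
. the comprehension follows the math form (body first),
while the loop declares the control vars at the head, as preferred above:

```python
from itertools import product


def f(x, y, z):
    return x + y + z


small_ints = range(3)  # stand-in for a finite slice of int

# math style: the body comes first, the control variables after
math_style = {f(x, y, z) for x in small_ints for y in small_ints for z in small_ints}

# head-first style: control variables are declared before the body uses them
head_first = set()
for x, y, z in product(small_ints, repeat=3):
    head_first.add(f(x, y, z))

print(math_style == head_first)
```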
multi-inheritance
2.20: adda/oop/multi-inheritance:
. the essence of what a call to the supertype does
is manage its own ivars (instance vars)
during alloc, init, dealloc;
so, when you multi-inherit, then your call to super is
sent to each parent class .
. self`super(function)
-- this would be a good syntax because
it shows the self is being modified by a super function
which takes as arg, self and a function to send to supers .
. so how does each supertype in a multi-inheritance
know where its ivars are within the given obj?
. the first part of the object
has to be a dictionary of classes that
tells each super where their ivars are located .
. in the most general form of oop operation
there are 2 type tags:
# which type among all unrelated types:
(eg, is it a character string or a number?);
# within a type, which subtype it is:
(eg, is the number an integer, float, or quotient?).
supertype.tags can be implicit:
. we can save space in a var's storage
by ensuring that it's always assigned
values of a particular type
so that the type tag is no longer needed
because the owning subprogram's symbol table
already has it .
. if, according to the symbol table,
this object is typed,
then we can expect that
only that type has been assigned;
if not, we can expect only that
the field#(-1) will be a supertype ID .
[2.29:
. now, the next job for multi-inheritance,
is accepting a message for a function to execute .
. the type mgt who accepted the multiple inheritances
was the final determiner of what a message means;
so, for example if type mgt inherits religious and hungry,
but they both implement [pass the bread],
then the type mgt in this case has to decide,
should [pass the bread] be sent to hungry, to religious,
or overridden to do something integrated ?
. thus there are 2 fixed fields in an obj:
# the dictionary of supers' ivars,
# the type mgt ID, that gets sent the message .]
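. a rough python model of that object layout, as a sketch only:
the object is a plain dict with the 2 fixed fields,
and the type mgt (here a hypothetical class named Monk)
decides what [pass the bread] means .
. all class and function names are invented for illustration:

```python
class Hungry:
    @staticmethod
    def init(ivars):
        ivars["appetite"] = 10

    @staticmethod
    def pass_the_bread(ivars):
        ivars["appetite"] -= 1
        return "eats"


class Religious:
    @staticmethod
    def init(ivars):
        ivars["prayers"] = 0

    @staticmethod
    def pass_the_bread(ivars):
        ivars["prayers"] += 1
        return "blesses"


class Monk:
    """Type mgt: inherits both and decides what each message means."""
    supers = (Religious, Hungry)

    @classmethod
    def new(cls):
        # fixed field 1: dictionary of supers' ivars; fixed field 2: type mgt ID
        obj = {"supers_ivars": {s.__name__: {} for s in cls.supers}, "type_mgt": cls}
        cls.send_super(obj, "init")
        return obj

    @classmethod
    def send_super(cls, obj, function):
        # the call to super is sent to each parent, with that parent's own ivars
        return [getattr(s, function)(obj["supers_ivars"][s.__name__]) for s in cls.supers]

    @classmethod
    def pass_the_bread(cls, obj):
        # integrated override: both supers get the message, in a chosen order
        return " then ".join(cls.send_super(obj, "pass_the_bread"))


def send(obj, message):
    """Every message goes to the type mgt named in the object."""
    return getattr(obj["type_mgt"], message)(obj)


monk = Monk.new()
print(send(monk, "pass_the_bread"), monk["supers_ivars"])
```

. send_super plays the role of self`super(function): each parent only ever touches its own entry in the dictionary .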
Labels:
adda,
multi-inheritance,
oop
dynamic module import
2.18: adda/cstr/dynamic module import:
. dynamic module import?
at first I was perplexed at how to parse functions
if you can't be sure what's active,
conflicting with each other in the namespace .
. but this does work out; and,
it's essential for the mandate to have
complete lisp-like control over your machine .
. lisp can treat any control structure (live or template)
as if it were also data:
inspecting its space, and then using what it finds .
. this could be called having 1st-class plug-in's .
parsing for conditional imports:
. this will be like generics, but instead of
the programmer specifying the type of instance,
the run-time is checking parameter types,
to figure out which of several subprograms to run .
. after an explicit import,
rather than just a situation where a program
looks around the file system on its own,
some of the imported names may be overloaded .
. if multiple imports are conditional,
then we must assume that any combination
of imports will happen .
. the programmer may have it arranged such that
in practice no overloading ever occurs;
but generally this can't be known at compile time .
. therefore,
we should be responding to every import
without regard to the condition intended;
and, if there is then overloading,
we should tell the compiler user
that a run-time binding has been used,
so they have the option of specifying a particular module
rather than slow things down by
testing the run-time param'type in order to
find the suitable subprogram from among
the multiplicity of modules currently in effect .
. a run-time binding works like this:
the function to call is dynamicCall,
which is given the intended subprogram's param's,
along with a link to the intended subprogram's symbol
(found in the caller's run-time symbol table);
and, that symbol has a list of pointers to
the subprograms of the same name
in the various imported modules .
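. a python sketch of such a run-time binding, with assumptions:
the symbol table is a dict from a name to the list of
(param types, subprogram) pairs gathered from the imported modules,
and dynamic_call and import_module are hypothetical names .
. the first overload whose types match the run-time params is chosen:

```python
from typing import Callable, Dict, List, Tuple

Overload = Tuple[Tuple[type, ...], Callable]

# symbol name -> overloads of that name from the various imported modules
symbol_table: Dict[str, List[Overload]] = {}


def import_module(exports: Dict[str, Overload]) -> None:
    """Register every exported subprogram under its name, keeping overloads."""
    for name, entry in exports.items():
        symbol_table.setdefault(name, []).append(entry)


def dynamic_call(name: str, *params):
    """Pick the first overload whose param types match the run-time params."""
    for param_types, subprogram in symbol_table[name]:
        if len(param_types) == len(params) and all(
            isinstance(p, t) for p, t in zip(params, param_types)
        ):
            return subprogram(*params)
    raise TypeError(f"no imported {name} accepts {params!r}")


import_module({"show": ((int,), lambda n: f"int {n}")})
import_module({"show": ((str,), lambda s: f"str {s}")})
print(dynamic_call("show", 7), dynamic_call("show", "hi"))
```

. the type test on every call is the slowdown that naming a particular module would avoid .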
. in summary,
the symbol table needs 2 listings:
one for the types it finds,
and another for the other (instance) symbols .
. in a simple example with no run-time binding,
the symbol table's list of instance symbols is temporary
(not of use after the compiler is done);
but, in the case of needing dynamic calls,
the symbol table needs a permanent list of instance symbols .
2012-02-28
wordcoding and code paging
2.12: adda/wordcode/
code page for programming symbols:
. some places where cocoa uses strings
it means to use symbols,
a 4byte number would do it:
each library has a 16-bit serial number;
it may contain a 16-bit number of symbols .
. english words will be encoded too,
so maybe both could be part of the same coding system?
or be modular, part of its own code page .
. the programming symbols can be used alone
or the english words can embed it by saying
this is jargon from computing:
the next 4bytes belong to it .
adda/wordcode/styles of code paging:
. code paging is a system for saving space
by using shorter codes for frequently used words .
. ascii is byte-sized, but its sign.bit is always 0 .
. there are 2 ways to signal a larger size:
# 2 bits as discriminants:
0x -> 1 byte: 7-bit space,
10 -> 2 bytes: 14-bit space,
11 -> 4 bytes: 30-bit space .
# sign.bit discriminant:
. sign.bit =1 means word is 2 bytes
so then word space is 15 bits .
. sign.bit=0 means 1 byte,
but reuse the control codes as discriminants:
most serve no addx purpose except
( 9: tab, 10: newline, 13: carriage return )
all the others -- ( 0..31 -{9,10, 13} ) --
could be indicating the code pages that are
using codes of size 2, 3, or 4 bytes .
. null could be the escape,
it would mean the next byte is the size field,
and then the next string of that size
should be taken as ascii -- even the control codes .
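. the first scheme (2 bits as discriminants) can be sketched in python;
this assumes big-endian byte order and that each code
is written in the smallest size that fits,
neither of which the notes specify; encode and decode are hypothetical names:

```python
def encode(code: int) -> bytes:
    """Encode one word code using the leading bits as the size discriminant."""
    if code < 0:
        raise ValueError("code must be non-negative")
    if code < 1 << 7:                       # 0xxxxxxx
        return code.to_bytes(1, "big")
    if code < 1 << 14:                      # 10xxxxxx xxxxxxxx
        return ((0b10 << 14) | code).to_bytes(2, "big")
    if code < 1 << 30:                      # 11xxxxxx plus 3 more bytes
        return ((0b11 << 30) | code).to_bytes(4, "big")
    raise ValueError("code needs more than 30 bits")


def decode(data: bytes, pos: int = 0):
    """Return (code, next position) for the word starting at pos."""
    first = data[pos]
    if first < 0x80:        # leading bit 0
        size, bits = 1, 7
    elif first < 0xC0:      # leading bits 10
        size, bits = 2, 14
    else:                   # leading bits 11
        size, bits = 4, 30
    value = int.from_bytes(data[pos:pos + size], "big")
    return value & ((1 << bits) - 1), pos + size


stream = encode(65) + encode(300) + encode(1_000_000)
pos = 0
while pos < len(stream):
    code, pos = decode(stream, pos)
    print(code)
```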
pages for strings:
. that same string idea could apply to all code pages;
code pages would come in twins,
one for single char, and one is a string
where the 2nd byte of the string is the size,
coded like so:
0: null terminated
1 ... 127: actual size in bytes
128..255: size is {128, 256, 512, ... 2^134 } .
. you would combine these large strings
just as the number system combines large digits .
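. the size byte's 3 ranges can be checked with a few lines of python
(string_size is a hypothetical name; None stands for "null terminated");
the upper range maps 128 to 2^7 and 255 to 2^134, matching the table above:

```python
def string_size(size_byte: int):
    """Interpret the size byte; None means the string is null terminated."""
    if not 0 <= size_byte <= 255:
        raise ValueError("the size field is one byte")
    if size_byte == 0:
        return None
    if size_byte < 128:
        return size_byte                # actual size in bytes
    return 1 << (size_byte - 121)       # 128 -> 2**7, 255 -> 2**134


for b in (0, 1, 127, 128, 129, 255):
    print(b, string_size(b))
```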
. one reason unicode did it their way
(where null was reserved for end-of-string)
is that it was preferred by unix/c coders
because it saves space if using only ascii
(no matter how long your strings are,
the size field need be only one byte wide
-- the size of your terminating null character).
. but that takes a huge chunk of 4-byte space;
considering the high-order bytes can't be zero .
. so, when do you need to communicate with unix/c?
the only time is when it not only expects no nulls,
but also expects all ascii,
so just convert to ascii at those times;
for other communications, use sockets or temp files .
2.13: prefixes and suffixes:
. some words in the 3 or 4 byte range
had a base code for finding the base word
then the extension code would indicate
the form of the word .
. there is some main combination of prefixes and suffixes,
and then one code page can be adding the prefix word;
ie, stating which prefix combinations are being used .
. there might need to be 2 versions:
one is giving the main combinations,
(eg, -ing-ly is one combination)
the other version, for uncommon combinations,
is using word strings, eg:
anti.dis.establishment.arianism
might be 4 words .
the [all one word] codepage:
. one string code is for all one word:
if (understand) was not a common word,
it would be represented as (under, stand)
this would be independent of code page:
it is saying print the next 2 words without a space .
. another way is to use the backspace char:
every word implies adding a space afterward,
but if there's a backspace char, then it connects words
instead of separating them .
. yet another more generic function is to
have a container with 2 params:
it shows how many of the following words are contained,
that may be up to 5 bits (32 word max)
and it uses 3-bits for indicating
a style for the arrangement of the words:
{ no spaces between words
, nonbreaking spaces, dashes, underscores
, camelcasing, all initials capitalized
, underline
}.
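. a python sketch of packing that container's 2 params into one byte;
to reach the 32-word max with 5 bits it stores (count - 1),
which is an assumption, as are the style numbers and all names .
. render handles only the 3 simplest styles:

```python
from enum import IntEnum


class Style(IntEnum):
    NO_SPACES = 0
    NONBREAKING_SPACES = 1
    DASHES = 2
    UNDERSCORES = 3
    CAMEL_CASE = 4
    INITIALS_CAPITALIZED = 5
    UNDERLINE = 6


def pack_container(count: int, style: Style) -> int:
    """Pack a 5-bit word count and a 3-bit style into one byte."""
    if not 1 <= count <= 32:
        raise ValueError("a container holds 1 to 32 words")
    return ((count - 1) << 3) | int(style)


def unpack_container(byte: int):
    """Return (word count, style) from a packed container byte."""
    return (byte >> 3) + 1, Style(byte & 0b111)


def render(words, style: Style) -> str:
    """Join words for the styles that need only a joiner string."""
    joiners = {Style.NO_SPACES: "", Style.DASHES: "-", Style.UNDERSCORES: "_"}
    return joiners[style].join(words)


header = pack_container(2, Style.NO_SPACES)
count, style = unpack_container(header)
print(header, count, render(["under", "stand"], style))
```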
the hash table:
. to build the word code array,
look at all text in the library;
hash each word and place in hash table;
the hash table entry has a record of stats for the word:
how it is spelled (for the sake of collisions)
and how many times it was found .
. then find the 32768 most popular words
and assign them to the shorter, 2-byte codes .
. make a new hash table of (spelling, wordcode)
for turning words into wordcodes .
. make an array 0..32767 of (spelling, pre/postfix spellings)
for turning wordcodes into words .
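. those steps can be sketched in python, where Counter is the hash table of stats
(python's dict already keeps the spelling and resolves collisions);
lowercasing, the word pattern, and the name build_wordcodes are assumptions,
and the pre/postfix spellings are left out:

```python
import re
from collections import Counter


def build_wordcodes(texts, limit=32768):
    """Return (encode table, decode array) for the `limit` most frequent words."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    # sort by descending count, then spelling, so code assignment is repeatable
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    decode_array = [word for word, _ in ranked[:limit]]
    encode_table = {word: code for code, word in enumerate(decode_array)}
    return encode_table, decode_array


library = ["the cat and the hat", "the hat"]
encode_table, decode_array = build_wordcodes(library, limit=2)
print(encode_table, decode_array)
```

. the most popular word gets code 0, so the array index is the wordcode .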
portable cross-volume hardlinks
2012.2.26: adde/fs/volume id for cross-volume linking:
[2011.10.2: preface:
. addx is meant to be a portable environment
available for mac, linux, pc, ...;
and one problem to be faced
is finding a filesystem link that is portable:
the existing link formats are all incompatible with each other,
except where mac and linux use unix links;
but those are not generally useful;
eg, I can't put a unix link on a fat32 volume,
not even a symbolic (vs hard) link?! [12.2.26:
( a symbolic link is just a text file
that includes the pathname of the resource);
one problem with unix allowing a link on fat32
is that it can be seen by multiple unix versions
each with its own style of pathnames:
# mac unix: /Volumes/myDrive/
# linux unix: /media/myDrive/
but that problem would be true for any removable drive,
not just one formatted with fat32 .
. what's peculiar about fat32 is a competing link format .
. it appears the volume's file system
is responsible for supporting links,
when what is needed is to have a file type
associated with link data ).]
. Apple has a friendly link that works in {mac,fat32},
and it keeps pointing to its target
even if you move the target;
however, each such link weighs half a megabyte!
[2011.10.2: intro:
. for link targets to be recognized across all platforms,
the portable filesystem manager (adde)
must have its own way to identify
every possible link target,
and all the volumes they could be on .]
2012.2.26: converting other OS's links:
. when dealing with other links
(those created by {windows, linux, mac})
adde needs to convert these .
. other apps see the addx link as a text file,
but when asking adde to traverse the file system,
it will follow the link's directions .
12.2.27: tracking unwritable volumes:
. any time a new volume is mounted
adde tries to register it by adding the volume ID;
however, some volumes are not writable,
so they can't be registered in the usual way;
in that case,
adde needs a list of those volumes that were
found to be unregisterable .
. does each volume have some readable serial#?
. you can't depend on checksumming;
because, being unwritable doesn't mean it can't be
modified by another platform
(eg, a writable mac volume will be
unwritable on linux);
so, the only reliable way is to
keep a list of unregisterable volumes,
and ask user which one this is .
. the menu should be sorted by device;
for instance, if this is a mac volume on usb,
there's no use showing cd's on the list .
12.2.27: the 3 types of pointers:
. filenames are pointers to content units;
links are usually considered to be aliases,
where pointers are sharing the same target .
. there are subtle differences of intention
among the various uses of pointers:
# the absolute address:
. goto whatever's at this address
(the address includes the filename
but not the file being named);
if nothing's at this address,
then make a new file with this filename .
# the relative address:
. use the address relative to some ancestor folder
or relative to the current volume,
rather than at a specific full path .
# the object reference:
. stay connected to what's currently at this address
(at the time the link was created);
if the file was moved, then find that file again .
. adde is putting the file ID inside the file;
that makes it easy to stay linked to an object reference .
2012.2.26: types of link exceptions:
# broken link:
. the target file was moved or renamed;
# unmounted volume:
. the target volume is not currently available .
. adde has agreements with the user
for each type of link exception,
whether to report exceptions via mailbox,
or interrupt the user with an alert dialog .
2011.9.2: details:
. adde is assigning a unique ID to every
volume, folder, and file; [12.2.28:
within the addx filesystem only:
for external files the object linking is less robust .]
. since volumes can be shared among acct's,
the volume itself must have a file ID database;
and then each acct must use that to update
the ID database kept on the acct's volume .
. mac's .dmg (disk image) is a virtual volume:
the enclosing volume considers it to be one file,
but after mounting it, it's like an external drive;
so, when looking for volumes to register,
it needs to be on the look-out for mounted .dmg's .
2012.2.26:
. the volume ID has 2 parts:
# the name of the acct that created it;
# the date of this volume's registration .
. the user's ID could be something like their email address,
or a social networking acct address .
file ID uniqueness:
. if there is an endless stream of creations and deletions,
then a volume could run out of file-ID's;
whereas, if they were reused,
then we must make sure there are no clashes;
so, the file ID should include the creation date .
[12.2.28:
. the file ID in a volume database
is specific to that volume
so it need be nothing more than a timestamp,
but that's only true for an infinite precision timestamp;
because, if the timestamp changes only every second,
then the ID needs a serial number for the situation where
the user has asked for thousands of links,
and they were all generated within one second .
. alt'ly, we could keep track of the last timestamp used,
and then we'd have use of
every timestamp between then and now .
... so then it seems,
the simplest idea is to have a serial number
but one of infinite type, like dates are:
we never run out of file-ID's
because we can just keep adding digits .]
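. a minimal sketch of the timestamp-plus-serial idea above, in python;
the class name, the injectable clock, and the "seconds.serial" string format
are assumptions made for illustration, not part of adde's design .
. python integers have no fixed width,
so the serial can "just keep adding digits" .

```python
import time


class FileIdGenerator:
    """Hands out volume-local file IDs as 'seconds.serial' strings (hypothetical format)."""

    def __init__(self, clock=time.time):
        self._clock = clock        # injectable so the sketch can be tested
        self._last_second = None   # last whole-second timestamp used
        self._serial = 0           # counter within that second

    def next_id(self):
        second = int(self._clock())
        if self._last_second is not None and second <= self._last_second:
            # same second (or the clock stepped backwards): bump the serial
            self._serial += 1
        else:
            self._last_second = second
            self._serial = 0
        return f"{self._last_second}.{self._serial}"
```

. thousands of links requested within one second
then differ only in the serial part of the ID .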
[12.2.28: the addx file ID:
. this article has used the same term, file ID,
for both a volume database index,
and for the addx file ID tag inside every file .
. the addx internal file ID is formatted as
creation.date & subject name
but if these files are moved to a shared drive
then to make the ID unique,
the creator's name will need to be added .
. and so, in anticipation of this,
an object reference link should always include
the author's name like so:
. check the addx file for an author's name,
if it has none,
then the link's object identifier should include
the name of the person making the link
since it is their acct,
and the files in it are by default theirs .]
2012.2.26: volume db separate vs integrated:
. in building a volume's file ID database,
the user has the option of efficiency or less clutter:
. some are not happy about hidden system files
because when you bring your removable to another OS,
all the hidden files become visible .
. if hidden folders are considered clutter,
then adde could avoid such clutter by
building a 2nd partial folder system
instead of reusing the current one .
12.2.27: the acct db:
. each user's acct needs an acct database
that contains the list of familiar volume ID's .
. when a link is being resolved
it first has to translate the volume ID
into the string understood by the current OS .
. it has a list of mounted volumes
and expects each to have an adde system file,
containing the volume's ID
and the file ID database .
. it then collects these from each volume
into the acct database
in a table of (volumes currently mounted)
that provides the maps:
volume ID -> volume pathname,
and its inverse .
. any time adde makes a link across volumes
it uses this table to convert the volume's name
into the corresponding volume ID .
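. the table of (volumes currently mounted) could be sketched as
a pair of dictionaries kept in step, one for each direction of the map;
this is python, and the class name, method names, and the sample volume ID
are hypothetical, chosen only for illustration .

```python
class MountedVolumes:
    """Two-way map between volume IDs and the current OS's volume pathnames."""

    def __init__(self):
        self._path_by_id = {}
        self._id_by_path = {}

    def mount(self, volume_id, pathname):
        self.unmount(volume_id)  # drop any stale pathname for this volume
        self._path_by_id[volume_id] = pathname
        self._id_by_path[pathname] = volume_id

    def unmount(self, volume_id):
        pathname = self._path_by_id.pop(volume_id, None)
        if pathname is not None:
            self._id_by_path.pop(pathname, None)

    def path_of(self, volume_id):
        """Used when resolving a link; None means the volume is not mounted."""
        return self._path_by_id.get(volume_id)

    def id_of(self, pathname):
        """Used when making a link across volumes."""
        return self._id_by_path.get(pathname)


table = MountedVolumes()
table.mount("user@example.com/2012-02-26", "/Volumes/myDrive")
```

. on linux the same volume ID would be mounted under /media/myDrive instead,
and links that store the volume ID would resolve unchanged .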
12.2.27: sorting unmountables:
. a volume shared with other users
could have links to
volumes that are unknown to the user's acct
so the acct database has 3 lists:
# volumes known to user:
(mountable even if not currently mounted)
# unmountable volumes:
(user indicated these can't be found)
# volumes not asked about yet:
(we won't ask a user whether a volume is mountable
until they click on a link that targets that volume).
12.2.27: volume database has a list of target volumes:
. for each volume, self,
if any of the links on self are
pointing into a volume, v, other than self,
then v's volume ID should be copied into
self's list of target volumes .
. for each volume ID on this list,
there is volume meta info,
(such as a user hint about how to
locate and identify the volume;
or a picture of the removable media);
and during a mount, this volume meta info is
copied into the acct database to build the
table of (volumes currently mounted).
2012.2.26: version# simple links:
. the simplest capability doesn't include
dealing with broken links;
it is creating links and having an engine for
translating them into system-specific links .
. in version# simple links, the link has 3 parts:
# a text string indicating the pathname
(either absolute or relative);
# a volume ID .
[12.2.28:
. those 2 fields specify a file address
(ie, a pointer to a file with a given name)
but for an object reference
(where we are tracking a particular file)
there could also be a 3rd part:
# file ID:
. an object reference means that
when you get to this address,
make sure the file there is the one I had linked to .
. the file's data should include the indicated file ID .
. a file ID can also be used for
jumping into the middle of a file,
to point at a subfolder .
. if the volume and address are unspecified
then it means search for this file ID,
and return me a folder of links to any files containing it .]
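. a sketch of the link record described above, in python;
the field names and the kind() labels are assumptions made for illustration .
. it only classifies a link by which parts are present;
it does not resolve anything .

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Link:
    pathname: Optional[str] = None   # absolute or relative pathname
    volume_id: Optional[str] = None  # which volume the pathname is on
    file_id: Optional[str] = None    # optional 3rd part: the tracked file

    def kind(self):
        if self.pathname is None and self.volume_id is None:
            # no address at all: the file ID is a search request
            return "search" if self.file_id is not None else "invalid"
        if self.file_id is not None:
            # address plus file ID: check that the file found there is the one linked to
            return "object reference"
        return "address"
```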
2012.2.26: version# broken link finder:
[12.2.28: intro:
. in this advanced version,
we not only make a universal link,
but also repair links that are missing targets .
. of course, this concerns only links to
object references (particular content blocks)
rather than address references
which require only a particular pathname .]
. it might be the case that, after a file rename,
the only way to tell a file's identity
is by searching for the same attributes:
(size, modify date, checksum).
. if the file has been modified and moved or renamed
we might still be able to find the identity
by checking for partial checksums .
. this is possible for files like text,
where blank lines naturally define paragraphs .
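. a sketch of partial checksums for text files, in python;
the function names, the choice of sha256,
and the idea of scoring a candidate by the fraction of surviving paragraphs
are assumptions made for illustration, not adde's method .

```python
import hashlib
import re


def paragraph_checksums(text):
    """Return the set of checksums of the blank-line-separated paragraphs."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return {hashlib.sha256(p.encode("utf-8")).hexdigest() for p in paragraphs}


def similarity(old_sums, new_sums):
    """Fraction of the original paragraphs still present in the candidate."""
    if not old_sums:
        return 0.0
    return len(old_sums & new_sums) / len(old_sums)


original = "alpha\n\nbeta\n\ngamma"
edited = "alpha\n\nbeta changed\n\ngamma\n\ndelta"
score = similarity(paragraph_checksums(original), paragraph_checksums(edited))
```

. here 2 of the 3 original paragraphs survive the edit,
so the edited file remains a strong candidate for the link's target .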
. of course in the case of addx files,
we just look inside the file for the file ID
rather than relying on the file's name .
. we cannot assume that every file system will
divide the job into file pointers and content nodes;
therefore we can't depend on
hardlinks to provide a sort of file ID .
12.2.27: web: fat32 hardlink limitations:
. in fat32, the directory entries contain
file pointers rather than handles:
they tell you where on the disk the content is,
rather than where an immovable info node is
that will contain the current pointer;
so then compacting the drive moves the files
thus breaking any hardlinks you had .
. and if you delete a link's target file,
then your hardlinks are pointing at garbage .
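. here's a linux command that emulates a directory hardlink on FAT32
(it is a bind mount: it lives in the mount table, not on the volume,
so it does not survive an unmount or travel with the drive):
mount -o bind /origdir /newdir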
. chkdsk will "fix" hardlinks:
. it reports them as cross-links and repairs by
making copies for each alias to own .
. fat32 symbolic links (aka shortcuts)
predate POSIX-style symbolic links,
which Windows supports only in Vista+ .
. to support version# broken link finder,
the link's 3rd part, file ID, becomes mandatory .
. back at the volume where the target is,
the file ID locates meta info about that file:
(size, checksum, name
, current url (a cache built from folder list)
, pointer to parent folder
)
. and from there it has the full chain of folders
because each folder has a pointer to its parent folder
(2011: doing it that way saves a lot of
file pathname rewrites
when one of the top level folders gets moved ).
. if the folder system has not been modified recently
then this goes very quickly because
the file meta info includes the full pathname
but if things have changed, then that cache will be stale,
and it has to rebuild the cache by
following the chain of parent folders
and copying their name strings into a unix pathname .
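. the parent-pointer chain and its pathname cache could be sketched as follows, in python;
the Node class and its field names are hypothetical .
. moving a top-level folder changes one parent pointer;
the cached pathnames below it go stale and are rebuilt on demand .

```python
class Node:
    """A volume, folder, or file entry in the volume database."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent      # pointer to the parent folder (None at the top)
        self.cached_path = None   # last pathname built for this node


def rebuild_path(node):
    """Follow the chain of parents and copy their names into a unix pathname."""
    names = []
    current = node
    while current is not None:
        names.append(current.name)
        current = current.parent
    node.cached_path = "/" + "/".join(reversed(names))
    return node.cached_path


root = Node("myDrive")
docs = Node("docs", root)
note = Node("a.txt", docs)
first = rebuild_path(note)

archive = Node("archive", root)
docs.parent = archive             # the move: a single pointer update
second = rebuild_path(note)
```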
. if the link is dangling,
and the volume database is in a separate folder
rather than integrated with the volume's file system
then it needs to sync the database
with the current file system . [12.2.27:
. the database is only a partial copy of the file system
since the only files included are the aliased ones,
and the only folders included
are the ones that contain the aliased files .]
. to see if the missing file was just moved,
search everywhere starting on the expected volume;
if still not found,
then ask the user whether to search more
or just use a backup .
. if the missing file was both moved and renamed
then this could be a case where
the file has been repurposed,
and so the link should just be deleted .
. the user may know what to do,
but some file changes are made by programs .
[12.2.28:
. a (move, rename, modify) of a non-addx file
is usually indistinguishable from a {delete, create};
and, if the name is quite common,
then a (move, modify) is not always fixable either .
. if it was a text file, then adde would have done
some partial checksums on it,
and in that case, the file is likely findable
because a file modification doesn't usually affect
every single paragraph in the file .
. adde needs to run in the background
and watch the file system for changes .
. modifications can be quite frequent,
so, adde has to be watching first for
file renames and moves,
and check with the volume's file ID database
for whether that's an aliased file or not .
. it should then check for file changes
in order to promptly update partial checksums .]
Labels:
adde,
journaled fs
kindle++ (emulating a book)
2.23: adde/kindle++ (emulating a book):
. to be like a book
kindle needs a virtual sideview of the book
so you see an array of pages,
then you open the book at a page .
. cursoring down approximates the page,
and then brushing sideways provides
fine adjustment of the page selection .
. it can show recent and frequent pages as being wider,
and it can show page number and subtitles of current page
which would be even better than a book .
[2.28:
. you can have a toc not of the book's content
but of all your notes and starred pages .]
2012-02-27
efficiently panning large documents
2011.9.15: adde/gui/efficiently panning large documents:
. when a window is much smaller than
the document of a network graph,
do you have to keep the whole bit-image in memory?
[9.18:
. the ideal is having a small engine that is
generating a small data set as-needed,
expanding from drawing instructions
into bit maps,
rather than sending expanded pages to off-ram storage;
then again,
speedy solid-state drives are getting common,
and energy is getting precious ... .]
. for saving energy,
a network graph is stored as an image;
to save space and hard drive accesses,
it would be a list of nodes with attributes:
(inout edges, global position, ...).
. a search function given window area
would return any nodes within that region .
. I had this mem-saving scheme for drawing it:
rather than allow panning,
move to birdseye view (zoom out some),
then set your zoom-in box to go there;
the window sized image is kept in memory,
but few or no closeups are cached .
. this idea is not that important on a fast computer;
because, you can respond on the fly:
. whichever way the user is panning
find the sections about to be exposed,
and create the images just in time .
. for each section,
you need to know which nodes are in that section,
and then to draw the edges,
you need to know the nodes that connect to
your drawn nodes;
the edge path is a style of line
{linear, Bézier curve, ...}
between the global positions of 2 nodes
which is then mapped across any needed section
but drawn only on currently existing sections
(those expected to be viewed soon).
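. a sketch of the per-section node lookup, in python;
the square sections, the section size of 256,
and all the function names are assumptions made for illustration .
. the search returns the nodes of every section the window overlaps,
so it may include a few nodes just outside the window .

```python
from collections import defaultdict

SECTION = 256  # hypothetical section size, in document units


def section_of(x, y, size=SECTION):
    """Map a global position to its (column, row) section."""
    return (int(x // size), int(y // size))


def build_index(nodes, size=SECTION):
    """nodes maps a node name to its global (x, y) position."""
    index = defaultdict(list)
    for name, (x, y) in nodes.items():
        index[section_of(x, y, size)].append(name)
    return index


def nodes_in_window(index, left, top, right, bottom, size=SECTION):
    """Collect the nodes of every section that the window overlaps."""
    first_col, first_row = section_of(left, top, size)
    last_col, last_row = section_of(right, bottom, size)
    found = []
    for col in range(first_col, last_col + 1):
        for row in range(first_row, last_row + 1):
            found.extend(index.get((col, row), []))
    return found


index = build_index({"a": (10, 10), "b": (300, 10), "c": (1000, 1000)})
visible = nodes_in_window(index, 0, 0, 400, 100)
```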
9.16: edges as objects:
. to help speed things up
by quickly finding per-section edge lists,
you could have an edges list too;
each edge would know:
# the locations of its end-point nodes;
# the points that define how it curves;
# a list of the sections
that the edge's line runs through .
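. a sketch of the per-section edge lists, in python;
the names and the section size are hypothetical .
. instead of tracing the exact line,
it lists every section inside the bounding box of the edge's defining points;
that is a conservative superset,
since a straight line or Bézier curve stays within the convex hull of its points .

```python
from collections import defaultdict


def sections_for_edge(points, size=256):
    """points: the end-points plus any curve-defining points, as (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    cols = range(int(min(xs) // size), int(max(xs) // size) + 1)
    rows = range(int(min(ys) // size), int(max(ys) // size) + 1)
    return [(col, row) for col in cols for row in rows]


def build_edge_index(edges, size=256):
    """edges maps an edge name to its list of points; returns section -> edge names."""
    index = defaultdict(list)
    for edge_id, points in edges.items():
        for section in sections_for_edge(points, size):
            index[section].append(edge_id)
    return index


edge_index = build_edge_index({
    "e1": [(10, 10), (300, 10)],
    "e2": [(600, 600), (700, 700)],
})
```

. when a section is about to be exposed,
its entry in the index names the only edges that need drawing there .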
9.16: space walking:
. if you have nodes and edges in 3-d,
then you may want to follow an edge
and see things nearby along the way;
this can be done with some of the previous ideas:
. a line function is a series of global position points,
and for each point you figure the 3-d region,
then you know the nearby regions,
and finally,
you can search the edges list for a region,
to get a list of every edge intersecting that region .
9.16: web.adde/gui/terminology:
. how is the graphics industry using "sector"?
Portal rendering:
In computer-generated imagery
and real-time 3D computer graphics,
portal rendering is an algorithm
for visibility determination.
For example,
consider a 3D computer game environment,
which may contain many polygons,
only a few of which may be visible on screen
at a given time.
A portal system is based on using
the partitioning of space
to form generalizations about
the visibility of objects within those spaces.
Regions of map space are divided into
polygonal, generally convex, areas called sectors.
Adjacent sectors are linked to one another via
shared dividing polygons termed portals.
Approaches that pre-compute visibility for sectors
are referred to as PVS (potentially visible set) methods.
Labels:
adde,
algorithms,
gui