Americium Dream Documents: wordcode

2009-12-28

wordcode

9.27: todo.adde/wordcode/decomposing word`parts:

. teasing words apart could get done pretty fast,

once you have a list of basic parts,

and then have a list of

all words in your db that have those parts .

. it lists the words, you study them,

and it makes it easy for you to catch the exceptions

or find new patterns in words for more efficient coding .
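
. a minimal sketch of that listing step, assuming the "db" is just a list of words and that a part counts as present whenever it appears as a substring; the names words_by_part, WORDS and PARTS are made up for illustration:

```python
from collections import defaultdict

def words_by_part(words, parts):
    """Map each part to the sorted list of words that contain it as a substring."""
    index = defaultdict(list)
    for word in words:
        for part in parts:
            if part in word:
                index[part].append(word)
    return {part: sorted(found) for part, found in index.items()}

# hypothetical sample data, standing in for the real word db
WORDS = ["running", "runner", "rerun", "walking", "rewalk", "ring"]
PARTS = ["run", "walk", "ing", "re"]

index = words_by_part(WORDS, PARTS)
for part in PARTS:
    print(part, index.get(part, []))
```

. studying the list for "ing" would turn up "ring" next to "running" and "walking", which is the kind of exception a plain substring match cannot tell apart on its own .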

10.22: adda/unix/tools communicating with binary pipes:

. the unix way

is to have tools communicating with text pipes,

whereas, the goal of adda

is to have a comm'standard that's binary;

. unix is the primary target platform;

so, I'm wondering how to efficiently pack binary

into unix text strings

(where there can be no null's; ie, no bytes = 00) .

sockets:

. use of strings may be a requirement of

tool communications within a std unix shell;

but, for connecting tools within your own shell,

unix sockets can provide binary app-to-app pipes .

. that way,

you can have your app's talk to each other in binary

while exports to others can be done by

translating your binary to their {unicode, xml, ...} .
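
. a small sketch of such a binary pipe, using a connected socket pair inside one process for brevity (two real app's would each hold one end); the message layout and the hex export are assumptions chosen only for illustration:

```python
import socket
import struct

def recv_exact(sock, size):
    """Read exactly `size` bytes, since recv may return fewer."""
    chunks = []
    while size > 0:
        chunk = sock.recv(size)
        if not chunk:
            raise ConnectionError("peer closed before the full message arrived")
        chunks.append(chunk)
        size -= len(chunk)
    return b"".join(chunks)

# a connected pair of sockets; on unix these are AF_UNIX by default
app_a, app_b = socket.socketpair()

# hypothetical message layout: a 2-byte code followed by a 4-byte count
message = struct.pack("<HI", 0, 1)      # contains several zero bytes
app_a.sendall(message)
received = recv_exact(app_b, len(message))
code, count = struct.unpack("<HI", received)

# export for other tools as text: here, plain hexadecimal
exported = received.hex()

app_a.close()
app_b.close()
```

. the zero bytes pass through the socket untouched; only the exported form has to be null-free text .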

10.22: adda/unix/wrapping binary files in text:

. the new std is to use unicode,

and these values

can be reused for a binary std's wordcodes

(similar to the way chinese text

has a separate character for each word) .

. a more efficient way

is to think of each byte as being one digit of a number

(there are 255 non-zero values in a byte) .

. if practicality requires your number be in base 2**n

then a byte can support a number system of base 128:

(having 1..127 map to the same,

and zero is quickly flipped to be FF#16 (-1) )

. that still leaves each byte's other negative values

to mean something else;

eg, when finding a negative byte,

get its binary complement, n;

and if n is not 0,

then have the byte represent n+1 consecutive zeroes .
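
. one way to read that scheme, as a sketch: it assumes "binary complement" means the bitwise complement (so FF#16 gives 0, ie, a single zero digit), and that a run of zeroes is capped at 128 per byte; the function names are made up here:

```python
def to_digits(value):
    """Split a non-negative integer into base-128 digits, most significant first."""
    digits = [value % 128]
    value //= 128
    while value:
        digits.append(value % 128)
        value //= 128
    return digits[::-1]

def encode(digits):
    """Pack base-128 digits into bytes that never contain 00."""
    out = bytearray()
    i = 0
    while i < len(digits):
        d = digits[i]
        if d:
            out.append(d)                     # 1..127 map to themselves
            i += 1
        else:
            run = 1                           # count consecutive zeroes, up to 128
            while run < 128 and i + run < len(digits) and digits[i + run] == 0:
                run += 1
            out.append(0xFF ^ (run - 1))      # complement n = run - 1
            i += run
    return bytes(out)

def decode(data):
    """Unpack bytes produced by encode back into base-128 digits."""
    digits = []
    for b in data:
        if b == 0:
            raise ValueError("a zero byte is not allowed in this encoding")
        if b < 0x80:
            digits.append(b)
        else:
            digits.extend([0] * ((0xFF ^ b) + 1))   # n + 1 zeroes
    return digits

packed = encode(to_digits(128 * 128))         # digits 1, 0, 0
```

. here the number 128*128 has digits 1, 0, 0, and packs into two bytes: 01 and FE#16 (complement 1, so 2 zeroes) .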

[10.28:

. or more likely,

they could be reserved for indicating

the type or length of the next digit sequence;

eg, then your number stream could be variable-length

like unicode,

except it could have string descriptors,

where a negative would say that until the next descriptor,

the default number length would be 4 bytes instead of 1 .

(unlike unix where everything was byte-based,

this would be word-based,

so upon reading the next element of a file,

it uses these descriptors to find complete elements)

] .
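
. a sketch of how such descriptors might work; every detail of the layout is an assumption made for illustration: a descriptor byte is 80#16 + width (width in 1..126), FF#16 still stands for the digit zero, and each element is exactly `width` base-128 digits:

```python
ZERO = 0xFF  # stand-in for digit 0, so the stream never contains a 00 byte

def descriptor(width):
    """Hypothetical descriptor byte: 0x81..0xFE selects an element width of 1..126."""
    if not 1 <= width <= 126:
        raise ValueError("width must be in 1..126")
    return 0x80 + width

def write_elements(groups):
    """groups is a list of (width, values); each value is written as `width` digits."""
    out = bytearray()
    for width, values in groups:
        out.append(descriptor(width))
        for value in values:
            if not 0 <= value < 128 ** width:
                raise ValueError("value does not fit in the given width")
            digits = [(value // 128 ** k) % 128 for k in reversed(range(width))]
            out.extend(d or ZERO for d in digits)
    return bytes(out)

def read_elements(data):
    """Read whole elements, switching width whenever a descriptor appears."""
    width = 1                                # default until a descriptor says otherwise
    values = []
    i = 0
    while i < len(data):
        b = data[i]
        if 0x80 <= b < ZERO:                 # descriptor: applies until the next one
            width = b - 0x80
            i += 1
            continue
        chunk = data[i:i + width]
        if len(chunk) < width:
            raise ValueError("truncated element")
        value = 0
        for byte in chunk:
            value = value * 128 + (0 if byte == ZERO else byte)
        values.append(value)
        i += width
    return values

stream = write_elements([(1, [5, 0]), (4, [1000000, 0])])
```

. because digit bytes are only ever 1..127 or FF#16, a descriptor can be recognized at any element boundary without a separate length field .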

11.9: engl/word.ules:

A lexeme is an abstract unit of morphological analysis in linguistics,

that roughly corresponds to a set of forms taken by a single word.

For example, in the English language,

run, runs, ran and running are forms of the same lexeme, run .

Lexemes are often composed of smaller units

with individual meaning called morphemes .
