2012-05-01

converting attribute-text to html

4.29: adde/gui/converting attribute-text to html:
. a pyqt text object uses a system of fragments to express
each change in the text's combination of attributes
(color, bold shade, underline, italics, superscript, subscript);
but when converting to html,
you want to combine contiguous modes;
eg, say you have 3 words, 1,2,3,
with u=underline, and b=bold;
so, when pyqt gives you:
1(u), 2(b,u), 3(u)
your html writer should be converting that to:
u(1, b(2), 3) .

. the first job is to convert a pyqt fragment
into a text-record: ( text, list of attribute symbols ).
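
. a minimal sketch of that first job, assuming the fragment's format
behaves like a QTextCharFormat (only fontWeight, fontUnderline and
fontItalic are called); the symbols "b", "u", "i", the name text_record,
and the FakeFormat stand-in are hypothetical, chosen for illustration:

```python
# Hypothetical attribute symbols; the names are illustrative only.
BOLD, UNDERLINE, ITALIC = "b", "u", "i"


def text_record(text, fmt, bold_weight=75):
    """Build (text, [attribute symbols]) from a fragment's text and format.

    `fmt` is expected to behave like a QTextCharFormat: only fontWeight(),
    fontUnderline() and fontItalic() are called on it.
    bold_weight=75 matches QFont.Bold in Qt 4/5 (assumption; Qt 6 uses 700).
    """
    attribs = []
    if fmt.fontWeight() >= bold_weight:
        attribs.append(BOLD)
    if fmt.fontUnderline():
        attribs.append(UNDERLINE)
    if fmt.fontItalic():
        attribs.append(ITALIC)
    return (text, attribs)


class FakeFormat:
    """Stand-in for QTextCharFormat so the sketch runs without Qt installed."""

    def __init__(self, weight=50, underline=False, italic=False):
        self._weight, self._underline, self._italic = weight, underline, italic

    def fontWeight(self):
        return self._weight

    def fontUnderline(self):
        return self._underline

    def fontItalic(self):
        return self._italic


print(text_record("2", FakeFormat(weight=75, underline=True)))
# prints: ('2', ['b', 'u'])
```

. with pyqt itself, the text and format would come from
fragment.text() and fragment.charFormat() .
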
. the first idea was to build a tree directly from this,
but it seemed easier to just
traverse the list of text-records,
and adjust the list of attrib's,
so that they instead indicated starts and stops;
ie, the current attribute symbols will be starts,
and when we find the end of an attrib,
we add an attrib-end to the following text-record;
finally, add an empty text-record to the list
as the end-of-file sentinel;
and we can begin the next phase,
processing the text-records:

. curr-attrib is the list of currently active attributes;
start out with curr-attrib = empty;
here are the cases that need to be handled:

for each text-record:
  for attrib in list-of-supported-attribs:
    rec.truth = (attrib is in text-record's attrib list);
    cur.truth = (attrib is in curr-attrib);
    -- true = 1, and false = 0;
    -- so we make a number from truth, and case that:
    rec*2 + cur ?
    # 2+0 --[ in rec, not in cur ] :
        keep the start symbol in text-record;
        and add same to curr-attrib .
    # 2+1 --[ in rec, in cur (middle of attrib)] :
        remove attrib from text-record .
    # 0+0 --[ not in rec, not in cur ] :
        --[ nothing to do in this case ] pass .
    # 0+1 --[ not in rec, in cur (attrib has ended)]:
        add the attrib's end-symbol to text-record;
        and remove the attrib from curr-attrib .
    #.
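
. the case analysis above can be sketched in python as follows;
the SUPPORTED list, the "/" prefix used as the end-symbol,
and the function names are assumptions made for this sketch:

```python
SUPPORTED = ["b", "u", "i"]  # hypothetical list of supported attribute symbols


def end_symbol(attrib):
    """Hypothetical end marker: the attribute symbol prefixed with '/'."""
    return "/" + attrib


def mark_starts_and_stops(records):
    """Rewrite each record's attribute list to hold only starts and ends.

    `records` is a list of (text, [attribs]) tuples. A new list is returned,
    with an empty sentinel record appended so open attributes get closed.
    """
    current = []  # curr-attrib: attributes active before this record
    marked = []
    for text, attribs in records + [("", [])]:
        out = []
        for attrib in SUPPORTED:
            in_rec = attrib in attribs
            in_cur = attrib in current
            case = in_rec * 2 + in_cur  # True counts as 1, False as 0
            if case == 2:    # in rec, not in cur: the attribute starts here
                out.append(attrib)
                current.append(attrib)
            elif case == 3:  # in rec, in cur: middle of the attribute, drop it
                pass
            elif case == 1:  # not in rec, in cur: the attribute has ended
                out.append(end_symbol(attrib))
                # drop it from curr-attrib so the end is emitted only once
                current.remove(attrib)
            # case 0: not in rec, not in cur: nothing to do
        marked.append((text, out))
    return marked


records = [("1", ["u"]), ("2", ["b", "u"]), ("3", ["u"])]
print(mark_starts_and_stops(records))
# prints: [('1', ['u']), ('2', ['b']), ('3', ['/b']), ('', ['/u'])]
```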

. now we convert text records to html:
for each attrib in the text record's list of attrib's,
there is either no mention of that attrib,
or an attrib symbol indicating a start of attrib,
or an attrib-end symbol .
. the attrib-ends are converted first,
then the attribs, and finally the text .
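
. a sketch of that per-record conversion, assuming end symbols are
the attribute symbol prefixed with "/" and that each symbol maps to
an html tag of the same name (both are assumptions of this sketch);
it emits several ends in the order given and does not reorder them:

```python
from html import escape

TAGS = {"b": "b", "u": "u", "i": "i"}  # hypothetical symbol-to-tag mapping


def record_to_html(text, attribs):
    """Emit closing tags first, then opening tags, then the escaped text."""
    ends = [a[1:] for a in attribs if a.startswith("/")]
    starts = [a for a in attribs if not a.startswith("/")]
    parts = ["</%s>" % TAGS[a] for a in ends]
    parts += ["<%s>" % TAGS[a] for a in starts]
    parts.append(escape(text))
    return "".join(parts)


marked = [("1", ["u"]), ("2", ["b"]), ("3", ["/b"]), ("", ["/u"])]
print("".join(record_to_html(t, a) for t, a in marked))
# prints: <u>1<b>2</b>3</u>
```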

. if there is more than one attrib-end in a text-record
then you need to make sure the order entails a nesting
instead of an overlapping of modes
to suit html's tree-shape expectations .

5.1:
. end-set = (the attrib-end's in current text-record),
begin-set = (attrib's corresponding to our end-set)
. we need to traverse backwards in the list of text records
to find the first occurrence of each item in the begin-set;
then we can order those occurrences,
and apply the reverse of that order to our end-set .
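
. the backward search could look like this sketch; it assumes
end symbols are written as "/" plus the attribute symbol,
and that every end has a matching start in an earlier record
(order_ends is a hypothetical name):

```python
def order_ends(marked, index):
    """Order the ends in marked[index] so the latest-started closes first.

    `marked` is a list of (text, [symbols]) records where a start is the
    bare attribute symbol and an end is the symbol prefixed with '/'.
    """
    ends = [a[1:] for a in marked[index][1] if a.startswith("/")]
    start_pos = {}
    for attrib in ends:
        # walk backwards to the nearest record that starts this attribute
        for i in range(index - 1, -1, -1):
            attribs = marked[i][1]
            if attrib in attribs:
                # (record index, position in that record's list) orders starts
                start_pos[attrib] = (i, attribs.index(attrib))
                break
    # reverse of the start order: innermost attribute is closed first
    return sorted(ends, key=lambda a: start_pos[a], reverse=True)


marked = [("1", ["u"]), ("2", ["i"]), ("3", []), ("", ["/u", "/i"])]
print(order_ends(marked, 3))
# prints: ['i', 'u']
```
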
. from what we need to do at this stage,
it seems apparent that
instead of having a new set of symbols for the attrib-ends
we should use the same attrib symbols
but put them in separate {start, stop} lists;
thus, we should structure the text record as:
( text, start list, stop list)
then for an attrib's {starts, stops} we put the attrib symbol
within the text record's {start, stop} lists, respectively .
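
. a sketch of the restructured record, assuming a namedtuple named
TextRecord and attribute symbols that double as html tag names
(both hypothetical); the stop list is taken to be already ordered
innermost-first:

```python
from collections import namedtuple
from html import escape

# Hypothetical record type; the field names are illustrative.
TextRecord = namedtuple("TextRecord", ["text", "starts", "stops"])


def render(record):
    """Emit closing tags for stops, opening tags for starts, then the text."""
    closing = "".join("</%s>" % a for a in record.stops)
    opening = "".join("<%s>" % a for a in record.starts)
    return closing + opening + escape(record.text)


records = [
    TextRecord("1", ["u"], []),
    TextRecord("2", ["b"], []),
    TextRecord("3", [], ["b"]),
    TextRecord("", [], ["u"]),  # end-of-file sentinel
]
print("".join(render(r) for r in records))
# prints: <u>1<b>2</b>3</u>
```

. with the same symbol in both lists, no separate set of end symbols
has to be parsed when looking up where an attribute started .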