2012-02-28

portable cross-volume hardlinks

2012.2.26: adde/fs/volume id for cross-volume linking,
[2011.10.2: preface:

. addx is meant to be a portable environment
available for mac, linux, pc, ...;
and one problem to be faced
is a filesystem that has portable links:
they are all incompatible with each other,
except where mac and linux use unix links;
but those are not generally useful;
eg, I can't put a unix link on a fat32 volume,
not even a symbolic (vs hard) link?! [12.2.26:
( a symbolic link is just a text file
that includes the pathname of the resource);
one problem with unix allowing a link on fat32
is that it can be seen by multiple unix versions
each with its own style of pathnames:
# mac unix: /Volumes/myDrive/
# linux unix: /media/myDrive/
but that problem would be true for any removable drive,
not just one formatted with fat32 .
. what's peculiar about fat32 is a competing link format .
. it appears the volume's file system
is responsible for supporting links,
when what is needed is to have a file type
associated with link data ).]

. Apple has a friendly link that works in {mac,fat32},
and it keeps pointing to its target
even if you move the target;
however, each such link weighs half a megabyte! ]

[2011.10.2: intro:

. for link targets to be recognized across all platforms,
the portable filesystem manager (adde)
must have its own way to identify
every possible link target,
and all the volumes they could be on .]

2012.2.26: converting other OS's links:
. when dealing with other links
(those created by {windows, linux, mac})
adde needs to convert these .
. other apps see the addx link as a text file,
but when asking adde to traverse the file system,
it will follow the link's directions .
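. a minimal sketch of a link that other apps would see as a plain text file, assuming python; the header line, the field names, and the names Link, dump_link, parse_link are all made up for illustration, not adde's actual format:

```python
from dataclasses import dataclass
from typing import Optional

MAGIC = "addx-link 1"  # hypothetical header line, not a real adde format


@dataclass(frozen=True)
class Link:
    volume_id: str
    path: str
    file_id: Optional[str] = None


def dump_link(link: Link) -> str:
    """Render the link as the text any editor could open."""
    lines = [MAGIC, f"volume: {link.volume_id}", f"path: {link.path}"]
    if link.file_id is not None:
        lines.append(f"file-id: {link.file_id}")
    return "\n".join(lines) + "\n"


def parse_link(text: str) -> Link:
    """Read the text back; raise ValueError if it isn't a link file."""
    lines = text.splitlines()
    if not lines or lines[0] != MAGIC:
        raise ValueError("not an addx link file")
    fields = {}
    for line in lines[1:]:
        key, sep, value = line.partition(": ")
        if sep:
            fields[key] = value
    return Link(fields["volume"], fields["path"], fields.get("file-id"))
```

. only a traversal through adde would call something like parse_link and follow the result; everything else just sees text .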

12.2.27: tracking unwritable volumes:
. any time a new volume is mounted
adde tries to register it by adding the volume ID;
however, some volumes are not writable,
so they can't be registered in the usual way;
in that case,
adde needs a list of those volumes that were
found to be unregisterable .
. does each volume have some readable serial#?
. you can't depend on checksumming;
because, being unwritable doesn't mean it can't be
modified by another platform
(eg, a writable mac volume will be
unwritable on linux);
so, the only reliable way is to
keep a list of unregisterable volumes,
and ask user which one this is .
. the menu should be sorted by device;
for instance, if this is a mac volume on usb,
there's no use showing cd's on the list .
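. a rough sketch of that menu, assuming python and assuming each remembered volume carries a user-written label, a device kind, and a filesystem name (all three fields, and the names UnregisteredVolume and candidates, are hypothetical); here "sorted by device" is read as: show only the volumes on the same kind of device, in label order:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnregisteredVolume:
    label: str       # hint the user typed when the volume was first seen
    device: str      # e.g. "usb", "cd"
    filesystem: str  # e.g. "hfs+", "fat32"


def candidates(known, device, filesystem):
    """Volumes worth showing when an unwritable volume is mounted."""
    matching = [v for v in known
                if v.device == device and v.filesystem == filesystem]
    return sorted(matching, key=lambda v: v.label)
```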

12.2.27: the 3 types of pointers:

. filenames are pointers to content units;
links are usually considered to be aliases,
where pointers are sharing the same target .
. there are subtle differences of intention
among the various uses of pointers:

# the absolute address:
. goto whatever's at this address
(the address includes the filename
but not the file being named);
if nothing's at this address,
then make a new file with this filename .

# the relative address:
. use the address relative to some ancestor folder
or relative to the current volume,
rather than at a specific full path .

# the object reference:
. stay connected to what's currently at this address
(at the time the link was created);
if the file was moved, then find that file again .
. adde is putting the file ID inside the file;
that makes it easy to stay linked to an object reference .
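. the 3 intentions can be sketched as follows, assuming python and a toy volume that is just a dict from pathname to file ID (the names Kind and resolve are made up for this note):

```python
import posixpath
from enum import Enum


class Kind(Enum):
    ABSOLUTE = "absolute address"
    RELATIVE = "relative address"
    OBJECT = "object reference"


def resolve(kind, target, files, base="/"):
    """Return the pathname a pointer leads to right now.

    files maps pathname -> file ID (a stand-in for a real volume).
    """
    if kind is Kind.ABSOLUTE:
        # the address itself; it may name a file that doesn't exist yet
        return target
    if kind is Kind.RELATIVE:
        # the address is measured from some ancestor folder
        return posixpath.normpath(posixpath.join(base, target))
    # object reference: target is a file ID; find wherever it lives now
    for path, file_id in files.items():
        if file_id == target:
            return path
    return None
```

. after a move, the absolute address still names the old location, while the object reference follows the file ID to the new one .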

2012.2.26: types of link exceptions:
# broken link:
. the target file was moved or renamed;
# unmounted volume:
. the target volume is not currently available .
. adde has agreements with the user
for each type of link exception,
whether to report exceptions via mailbox,
or interrupt the user with an alert dialog .
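. a small sketch of those agreements, assuming python; the agreement table is just a dict from exception type to reporting channel, and the names LinkException, Channel, and report are hypothetical:

```python
from enum import Enum


class LinkException(Enum):
    BROKEN_LINK = "broken link"
    UNMOUNTED_VOLUME = "unmounted volume"


class Channel(Enum):
    MAILBOX = "mailbox"
    ALERT = "alert dialog"


def report(kind, detail, agreements, mailbox, show_alert):
    """Deliver a link exception the way the user agreed to for its type."""
    message = f"{kind.value}: {detail}"
    if agreements[kind] is Channel.ALERT:
        show_alert(message)      # interrupts the user
    else:
        mailbox.append(message)  # waits until the user checks mail
    return message
```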

2011.9.2: details:

. adde is assigning a unique ID to every
volume, folder, and file; [12.2.28:
within the addx filesystem only:
for external files the object linking is less robust .]
. since volumes can be shared among acct's,
the volume itself must have a file ID database;
and then each acct must use that to update
the ID database kept on the acct's volume .

. mac's .dmg (disk image) is a virtual volume:
the enclosing volume considers it to be one file,
but after mounting it, it's like an external drive;
so, when looking for volumes to register,
adde needs to be on the lookout for mounted .dmg's .

2012.2.26:
. the volume ID has 2 parts:
# the name of the acct that created it;
# the date of this volume's registration .
. the user's ID could be something like their email address,
or a social networking acct address .
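. a sketch of the 2-part volume ID, assuming python; the "/" separator, the ISO date format, and the class name VolumeID are assumptions, not something adde defines:

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class VolumeID:
    creator: str      # e.g. an email address
    registered: date  # date the volume was registered

    def __str__(self):
        return f"{self.creator}/{self.registered.isoformat()}"

    @classmethod
    def parse(cls, text):
        # split at the last "/" so the creator part may itself contain "/"
        creator, _, day = text.rpartition("/")
        return cls(creator, date.fromisoformat(day))
```

. if one acct registers 2 volumes on the same day, a bare date would clash; a finer timestamp would be one way around that .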

file ID uniqueness:
. if there is an endless stream of creations and deletions,
then a volume could run out of file-ID's;
whereas, if they were reused,
then we must make sure there are no clashes;
so, the file ID should include the creation date .
[12.2.28:
. the file ID in a volume database
is specific to that volume
so it need be nothing more than a timestamp,
but that's only true for an infinite precision timestamp;
because, if the timestamp changes only every second,
then the ID needs a serial number for the situation where
the user has asked for thousands of links,
and they were all generated within one second .
. alt'ly, we could keep track of the last timestamp used,
and then we'd have use of
every timestamp between then and now .
... so then it seems,
the simplest idea is to have a serial number
but one of infinite type, like dates are:
we never run out of file-ID's
because we can just keep adding digits .]
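. a sketch of the unbounded serial number, assuming python (whose integers have no fixed width, so the counter really can keep adding digits); the class name FileIDAllocator is made up, and the last issued value is assumed to be saved in the volume database:

```python
class FileIDAllocator:
    """Hands out file IDs for one volume database; never reuses one."""

    def __init__(self, last_issued=0):
        # assumption: this value is persisted along with the volume db
        self.last_issued = last_issued

    def next_id(self):
        self.last_issued += 1
        return str(self.last_issued)  # decimal text, as long as needed
```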

[12.2.28: the addx file ID:
. this article has used the same term, file ID,
for both a volume database index,
and for the addx file ID tag inside every file .
. the addx internal file ID is formatted as
creation.date & subject name
but if these files are moved to a shared drive
then to make the ID unique,
the creator's name will need to be added .
. and so, in anticipation of this,
an object reference link should always include
the author's name like so:
. check the addx file for an author's name,
if it has none,
then the link's object identifier should include
the name of the person making the link
since it is their acct,
and the files in it are by default theirs .]
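. the author fallback can be sketched like this, assuming python; the source gives the ID as "creation.date & subject name", so appending the name with another " & " is only a guess at the layout, and object_identifier is a hypothetical name:

```python
def object_identifier(creation_date, subject, author=None, link_maker=None):
    """Build the identifier an object-reference link would carry.

    Falls back to the link maker's name when the file names no author.
    """
    name = author or link_maker
    if not name:
        raise ValueError("need the file's author or the link maker's name")
    return f"{creation_date} & {subject} & {name}"
```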

2012.2.26: volume db separate vs integrated:
. in building a volume's file ID database,
the user has the option of efficiency or less clutter:
. some are not happy about hidden system files
because when you bring your removable to another OS,
all the hidden files become visible .
. if hidden folders are considered clutter,
then adde could avoid such clutter by
building a 2nd partial folder system
instead of reusing the current one .

12.2.27: the acct db:
. each user's acct needs an acct database
that contains the list of familiar volume ID's .
. when a link is being resolved
it first has to translate the volume ID
into the string understood by the current OS .
. it has a list of mounted volumes
and expects each to have an adde system file,
containing the volume's ID
and the file ID database .
. it then collects these from each volume
into the acct database
in a table of (volumes currently mounted)
that provides the maps:
volume ID -> volume pathname,
and its inverse .
. any time adde makes a link across volumes
it uses this table to convert the volume's name
into the corresponding volume ID .
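. a minimal sketch of the table of (volumes currently mounted), assuming python and unix-style pathnames; the class MountTable and its method names are made up for illustration:

```python
import posixpath


class MountTable:
    """Maps volume ID -> volume pathname, and the inverse."""

    def __init__(self):
        self._path_by_id = {}
        self._id_by_path = {}

    def mount(self, volume_id, volume_path):
        self._path_by_id[volume_id] = volume_path
        self._id_by_path[volume_path] = volume_id

    def unmount(self, volume_id):
        volume_path = self._path_by_id.pop(volume_id)
        del self._id_by_path[volume_path]

    def path_of(self, volume_id):
        return self._path_by_id[volume_id]   # KeyError if not mounted

    def id_of(self, volume_path):
        return self._id_by_path[volume_path]

    def resolve(self, volume_id, relative_path):
        """Turn (volume ID, path on that volume) into a local pathname."""
        return posixpath.join(self.path_of(volume_id), relative_path)
```

. the same link then resolves under /Volumes/myDrive/ on mac and under /media/myDrive/ on linux, since only the table differs .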

12.2.27: sorting unmountables:
. a volume shared with other users
could have links to
volumes that are unknown to the user's acct
so the acct database has 3 lists:
# volumes known to user:
(mountable even if not currently mounted)
# unmountable volumes:
(user indicated these can't be found)
# volumes not asked about yet:
(we won't ask a user whether a volume is mountable
until they click on a link that targets that volume).
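. the 3 lists can be sketched as one status per volume ID, assuming python; Status, VolumeLists, and the ask_user callback (which returns True when the user says the volume can be found) are hypothetical names:

```python
from enum import Enum


class Status(Enum):
    KNOWN = "known to user"
    UNMOUNTABLE = "unmountable"
    NOT_ASKED = "not asked about yet"


class VolumeLists:
    def __init__(self):
        self.status = {}

    def note_link_target(self, volume_id):
        """Record a volume seen as a link target, without bothering the user."""
        self.status.setdefault(volume_id, Status.NOT_ASKED)

    def on_click(self, volume_id, ask_user):
        """Ask about a volume only when a link to it is first clicked."""
        current = self.status.get(volume_id, Status.NOT_ASKED)
        if current is Status.NOT_ASKED:
            current = Status.KNOWN if ask_user(volume_id) else Status.UNMOUNTABLE
            self.status[volume_id] = current
        return current
```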

12.2.27: volume database has a list of target volumes:
. for each volume, self,
if any of the links on self are
pointing into a volume, v, other than self,
then v's volume ID should be copied into
self's list of target volumes .
. for each volume ID on this list,
there is volume meta info,
(such as a user hint about how to
locate and identify the volume;
or a picture of the removable media);
and during a mount, this volume meta info is
copied into the acct database to build the
table of (volumes currently mounted).

2012.2.26: version# simple links:

. the simplest capability doesn't include
dealing with broken links;
it is creating links and having an engine for
translating them into system-specific links .

. in version# simple links, the link has up to 3 parts:
# a text string indicating the pathname
(either absolute or relative);
# a volume ID .
[12.2.28:
. those 2 fields specify a file address
(ie, a pointer to a file with a given name)
but for an object reference
(where we are tracking a particular file)
there could also be a 3rd part:
# file ID:
. an object reference means that
when you get to this address,
make sure the file there is the one I had linked to .
. the file's data should include the indicated file ID .
. a file ID can also be used for
jumping into the middle of a file,
to point at a subfolder .
. if the volume and address are unspecified
then it means search for this file ID,
and return me a folder of links to any files containing it .]
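. a sketch of how the 3 parts decide what a link means, assuming python; the record Link and the helpers kind and still_matches are made-up names, and missing parts are modeled as None:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Link:
    volume_id: Optional[str]
    path: Optional[str]
    file_id: Optional[str] = None


def kind(link):
    """Classify a link by which of its parts are present."""
    if link.volume_id is None and link.path is None:
        if link.file_id is None:
            raise ValueError("a link needs an address or a file ID")
        return "search"            # look everywhere for this file ID
    if link.file_id is None:
        return "address"           # whatever has this name
    return "object reference"      # this name, and it must be this file


def still_matches(link, file_id_found_at_address):
    """For an object reference, check that the file there is the one linked to."""
    return link.file_id is None or link.file_id == file_id_found_at_address
```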

2012.2.26: version# broken link finder:
 
[12.2.28: intro:
. in this advanced version,
we not only make a universal link,
but also repair links that are missing targets .
. of course, this concerns only links to
object references (particular content blocks)
rather than address references
which require only a particular pathname .]

. it might be the case that, after a file rename,
the only way to tell a file's identity
is by searching for the same attributes:
(size, modify date, checksum).
. if the file has been modified and moved or renamed
we might still be able to find the identity
by checking for partial checksums .
. this is possible for files like text,
where blank lines naturally define paragraphs .
. of course in the case of addx files,
we just look inside the file for the file ID
rather than relying on the file's name .
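. a sketch of partial checksums for text, assuming python, sha-256 as the checksum, and blank lines as the paragraph breaks; the function names and the idea of scoring by the fraction of surviving paragraphs are assumptions for illustration:

```python
import hashlib
import re


def paragraph_checksums(text):
    """One checksum per paragraph, where blank lines separate paragraphs."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return {hashlib.sha256(p.encode("utf-8")).hexdigest() for p in paragraphs}


def similarity(old_sums, new_sums):
    """Fraction of the old paragraphs that survive unchanged in the new file."""
    if not old_sums:
        return 0.0
    return len(old_sums & new_sums) / len(old_sums)
```

. a file whose score stays high after a (move, rename, modify) is a plausible candidate for the missing target; where to put the threshold is a separate judgment call .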

. we cannot assume that every file system will
divide the job into file pointers and content nodes;
therefore we can't depend on
hardlinks to provide a sort of file ID .

12.2.27: web: fat32 hardlink limitations:
. in fat32, the directory entries contain
file pointers rather than handles:
they tell you where on the disk the content is,
rather than where an immovable info node is
that will contain the current pointer;
so then compacting the drive moves the files
thus breaking any hardlinks you had .
. and if you delete a link's target file,
then your hardlinks are pointing at garbage .
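. here's a linux command that imitates a directory hardlink on FAT32
(it's a bind mount: it lives in the kernel's mount table, not on the volume,
so it's gone after an unmount or reboot):
mount -o bind /origdir /newdir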
. chkdsk will "fix" hardlinks:
. it reports them as cross-links and repairs by
making copies for each alias to own .
. fat32's symbolic links (windows shortcuts, ie, .lnk files)
predate windows support for POSIX-style symbolic links,
which arrived only in Vista+, and only on NTFS volumes .

. to support version# broken link finder,
the link's 3rd part, file ID, becomes mandatory .

. back at the volume where the target is,
the file ID locates meta info about that file:
(size, checksum, name
, current url (a cache built from folder list)
, pointer to parent folder
)
. and from there it has the full chain of folders
because each folder has a pointer to its parent folder
(2011: doing it that way saves a lot of
file pathname rewrites
when one of the top level folders gets moved ).

. if the folder system has not been modified recently
then this goes very quickly because
the file meta info includes the full pathname
but if things have changed, then that cache will be stale,
and it has to rebuild the cache by
following the chain of parent folders
and copying their name strings into a unix pathname .
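. rebuilding the cached pathname from the chain of parent pointers can be sketched like so, assuming python and a toy database that maps each ID to (name, parent ID), with None as the parent of a top level folder; the function name pathname is made up:

```python
def pathname(node_id, nodes):
    """Follow parent pointers up to the top and join the names."""
    parts = []
    seen = set()
    while node_id is not None:
        if node_id in seen:
            raise ValueError("parent pointers form a cycle")
        seen.add(node_id)
        name, parent_id = nodes[node_id]
        parts.append(name)
        node_id = parent_id
    return "/" + "/".join(reversed(parts))
```

. renaming or moving a top level folder then touches one record, and every pathname below it comes out right on the next rebuild .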

. if the link is dangling,
and the volume database is in a separate folder
rather than integrated with the volume's file system
then it needs to sync the database
with the current file system . [12.2.27:
. the database is only a partial copy of the file system
since the only files included are the aliased ones,
and the only folders included
are the ones that contain the aliased files .]

. to see if the missing file was just moved,
search everywhere starting on the expected volume;
if still not found,
then ask the user whether to search more
or just use a backup .
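. a sketch of the search for a file that was merely moved or renamed, assuming python, sha-256 as the checksum, and that the volume database kept the target's (size, checksum); the names sha256_of and find_moved are hypothetical:

```python
import hashlib
import os


def sha256_of(path):
    """Checksum a file in chunks so large files don't fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def find_moved(root, size, checksum):
    """Walk everything under root; return paths whose content matches."""
    matches = []
    for folder, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            try:
                # compare the cheap attribute first, checksum only on a size match
                if os.path.getsize(path) == size and sha256_of(path) == checksum:
                    matches.append(path)
            except OSError:
                continue  # unreadable file; skip it
    return matches
```

. root would be the expected volume's mount point first; an empty result is the cue to ask the user whether to widen the search .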

. if the missing file was both moved and renamed
then this could be a case where
the file has been repurposed,
and so the link should just be deleted .
. the user may know what to do,
but some file changes are made by programs .
[12.2.28:
. a (move, rename, modify) of a non-addx file
is usually indistinguishable from a {delete, create};
and, if the name is quite common,
then a (move, modify) is not always fixable either .
. if it was a text file, then adde would have done
some partial checksums on it,
and in that case, the file is likely findable
because a file modification doesn't usually affect
every single paragraph in the file .

. adde needs to run in the background
and watch the file system for changes .
. modifications can be quite frequent,
so, adde has to be watching first for
file renames and moves,
and check with the volume's file ID database
for whether that's an aliased file or not .
. it should then check for file changes
in order to promptly update partial checksums .]
