2009-10-15

scm on google`code hosting

scm (source control mgt)
[10.15:
. I've been eagerly preparing to have my code hosted on Google`code,
reading up on svn,
but I was more impressed by Linus Torvalds's presentation of Git,
so I was quite interested in Google's adoption of Hg instead of Git . ]



6.7: look! google`code hosting now with hg!-- now supporting mercurial? why not git?
my summary of google-code-updates.blogspot.com:
. The primary reason
was to support google's large base of existing Subversion users
that want to use a distributed version control system
. For these users we felt that Mercurial had the lowest barrier to adoption
because of its similar command set,
great documentation (including a great online book),
and excellent tools such as TortoiseHg .
. Second,
in terms of implementation effort,
Mercurial has a clear advantage due to its efficient HTTP transport protocol.
In terms of features, Git is more powerful,
but this tends to be offset by it being more complicated to use.
[also:
. the Python repository is moving to hg
http://www.python.org/dev/peps/pep-0374/ (google is a champion of the Python.lang)
(
At PyCon 2009, a decision was made to go with Mercurial.
four important reasons:

* According to a small survey,
Python developers are more interested in Mercurial than in Bazaar or Git.
* Mercurial is written in Python,
which is congruent with the python-dev tendency to 'eat their own dogfood'.
[6.7.1924:
.. but the whole idea of python is boa constriction:
you start loose with scripting,
and then tighten things up with c code;
so depending how large the code base is,
you might say git jumped the gun,
but it's still the same dog food ]
* Mercurial is significantly faster than bzr
(it's slower than git, though by a much smaller difference).
* Mercurial is easier to learn for SVN users than bzr.

Agreeing on a common terminology is surprisingly difficult,
primarily because each VCS uses these terms when describing
subtly different tasks, objects, and concepts.
Where possible, we try to provide a generic definition of the concepts,
but you should consult the individual system's glossaries for details.
Here are some basic references for terminology,
from some of the standard web-based references on each VCS.
You can also refer to glossaries for each DVCS:
* Subversion : http://svnbook.red-bean.com/en/1.5/svn.basic.html
* Bazaar : http://bazaar-vcs.org/BzrGlossary
* Mercurial : http://www.selenic.com/mercurial/wiki/index.cgi/UnderstandingMercurial
* git : http://book.git-scm.com/1_the_git_object_model.html
) ]

background:
. Mercurial, like Git and Bazaar,
is a distributed version control system (DVCS)
that enables developers to work offline
and define more complex workflows
such as peer-to-peer pushing/pulling of code.
. the ease of cloning and merging of remote repositories
also makes it easier for outside contributors to contribute to projects .


Analysis of Git and Mercurial (summer 2008)

In traditional version control systems,
there is a central repository that maintains all history.
Clients must interact with this repository
to examine file history, look at other branches, or commit changes.
Typically, clients have a local copy of their current version
but no local storage of previous versions or alternate branches.
In contrast, Distributed Version Control Systems (DVCS)
do keep previous versions and alternate branches locally.
. individuals can share versions with each other
without having to share with everyone .
A push operation transfers some local information to a remote repository,
and a pull copies remote information to the local repository.
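
. a minimal sketch of push and pull, assuming a toy model where a repository is nothing more than a set of revision ids (the names `alice`, `bob`, `push` and `pull` are made up for illustration; neither Git nor Mercurial stores history this way):

```python
# Toy model only: a repository is a plain set of revision ids.

def pull(local, remote):
    """Copy revisions the remote has and the local lacks; return what was copied."""
    missing = remote - local
    local |= missing          # in-place update of the local set
    return missing

def push(local, remote):
    """Transfer revisions the local has and the remote lacks."""
    return pull(remote, local)

alice = {"r1", "r2", "a3"}    # shared history plus one private revision
bob = {"r1", "r2", "b3"}

pulled = pull(alice, bob)     # alice receives b3; bob is untouched
pushed = push(alice, bob)     # bob receives a3
```

After one pull and one push the two sets are equal, and at no point did either side need a third, central repository.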

Note that neither repository is necessarily
"authoritative" with respect to the other
. Both repositories may have some local history
that the other does not have yet.
One key feature of any DVCS system is to make it easy for repositories
to unambiguously describe the history they have
(and the history they are requesting).
Both Git and Mercurial do this by using SHA1 hashes
to identify data (files, trees, changesets, etc).
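
. to make the hashing idea concrete, here is a small sketch of how Git names file contents: a SHA1 over a short header plus the bytes of the file. The helper name `git_blob_id` is my own; Mercurial's hashing differs in detail and is not shown here.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """SHA1 of a 'blob <size>' header, a NUL byte, then the content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

empty_id = git_blob_id(b"")
same_a = git_blob_id(b"hello\n")
same_b = git_blob_id(b"hello\n")   # identical content, identical id
other = git_blob_id(b"hello!\n")   # any change gives a different id
```

Because the id depends only on the content, two repositories that have never talked to each other still agree on the name of the same data, which is what lets them describe what they have and what they are missing.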

. the terms client and server
apply only when pushing to or pulling from a central repository;
private exchanges are described as occurring between
local and remote repositories .

Git Advantages

* Client Storage Management.
Both Mercurial and Git allow users to selectively pull branches
. This provides an upfront mechanism for
narrowing the amount of history stored locally
. In addition, Git allows previously pulled branches to be discarded.
Git also allows old revision data to be pruned from the local repository
(while still keeping recent revision data on those branches).
With [present versions of] Mercurial,
if a branch is in the local repository,
then all of its revisions (back to the very initial commit)
must also be present,
and there is no way to prune branches other than by
creating a new repository and selectively pulling branches into it.

* Number of Parents.
Git supports an unlimited number of parent revisions during a merge.
Mercurial only allows two parents.
In order to achieve an N-way merge in Mercurial,
the user will have to perform N-1 two-way merges
. Although in many cases this is also the preferred way to merge N parents
regardless of DVCS,
with Git the user can perform an N-way merge in one step if so desired.
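
. a sketch of the "N-1 two-way merges" arithmetic, assuming a made-up `merge_two` that treats a revision as a bare set of changes (real merges work on file contents and can conflict):

```python
from functools import reduce

merge_count = 0

def merge_two(a, b):
    """Hypothetical two-way merge: a revision is just a set of change names."""
    global merge_count
    merge_count += 1
    return a | b

# N = 4 heads to bring together
heads = [{"fix-typo"}, {"add-tests"}, {"new-parser"}, {"docs"}]

# Folding the list pairwise performs exactly N-1 two-way merges.
merged = reduce(merge_two, heads)
```

With four heads the fold calls `merge_two` three times, which is the Mercurial-style route; a Git octopus merge would record the same result as one commit with four parents.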

Mercurial Advantages

. Git has a steeper learning curve than Mercurial:
Git has more commands and options,
while Mercurial's documentation tends to be more complete
and its terminology is closer to that of Subversion and CVS,
making it familiar to people migrating from those systems.
Mercurial is Python based,
and the official distribution runs cleanly under Windows
(as well as Linux, Mac OS X, etc).

* Maintenance
. Git requires periodic maintenance of repositories (e.g. git-gc);
Mercurial does not require such maintenance.
Note, however, that Mercurial is also a lot less sophisticated
with respect to managing the client's disk space.

Other Differences

* Rename/Copy Tracking
. Git does not explicitly track file renaming or copying.
Instead, commands such as git-log look for cases where
an identical file appears earlier in the repository history
and infer the rename or copy.
Mercurial takes the more familiar approach
of providing explicit rename and copy commands
and storing this as part of the history of a file.
Each approach has both advantages and disadvantages,
and it isn't clear that either approach is universally "better" than the other.
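
. a rough sketch of the "infer it afterwards" approach, covering only the identical-content case described above. The function name `infer_renames` and the two snapshots are invented for illustration, and this is not Git's actual algorithm.

```python
import hashlib

def infer_renames(old, new):
    """old/new map path -> bytes.

    Return {old_path: new_path} for files that vanished from `old`
    and reappear in `new` under another path with identical content.
    """
    def digest(data):
        return hashlib.sha1(data).hexdigest()

    removed = {digest(data): path for path, data in old.items() if path not in new}
    renames = {}
    for path, data in new.items():
        if path not in old and digest(data) in removed:
            renames[removed[digest(data)]] = path
    return renames

before = {"README": b"read me\n", "util.py": b"def f(): pass\n"}
after = {"README": b"read me\n", "helpers.py": b"def f(): pass\n"}
renames = infer_renames(before, after)
```

Nothing about the rename was recorded at commit time; it is reconstructed from the two snapshots. The Mercurial approach would instead store the `util.py` to `helpers.py` link in the history itself.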

* Architecture.
Git was originally a large number of shell scripts
and unix commands implemented in C.
Over time, a common library that is shared between commands has been developed,
and many of the commands have been built into the main git executable.
Mercurial is implemented mostly in Python
(with a small amount of C),
and has an extension API for Python plug-ins .

* Private History
. In Git, the default mode is for developers to have their own
local (and private) tags/branches/revisions,
and to exercise a lot of control over what becomes public .
. the Mercurial emphasis is the other way around:
its default push/pull behavior is to share all information
whereas extra steps are needed in order to share a subset .
( both systems are generally capable of supporting
either mode of operation ) .

* Branch Namespace
. In Git, each repository has its own branch namespace,
and users set up a mapping between local branchnames and remote ones.
With Mercurial, there is a single branch namespace
shared by all repositories.

* Rebasing.
( Mercurial only recently implemented rebasing )
Git allows you to take a local branch and
change its branch point to a more recent revision.
. while merging is the right thing to do from an SCM perspective
-- if the focus is on 'reproducibility of past states' --
sometimes the focus is instead
'authoring a clean software revision history' .
Rebasing is about safely merging every commit individually
and then deleting the old local-based versions
so they don't clutter the tree.
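
. a toy sketch of that replay, assuming a commit is just a label plus a dict of file changes, and ignoring conflicts entirely (`rebase` and `apply_all` are illustrative names, not any tool's API):

```python
def apply_all(commits):
    """Fold a list of (label, {path: text}) commits into a final file state."""
    state = {}
    for _label, changes in commits:
        state.update(changes)
    return state

def rebase(new_base, local_commits):
    """Replay local commits one by one on top of new_base.

    Each replayed commit is a rewritten copy (marked with a prime);
    the original local commits are not part of the result.
    """
    history = list(new_base)
    for label, changes in local_commits:
        history.append((label + "'", dict(changes)))
    return history

upstream = [("A", {"main.c": "v1"}), ("B", {"main.c": "v2"}), ("C", {"util.c": "v1"})]
local = [("x", {"doc.txt": "draft"}), ("y", {"doc.txt": "final"})]  # based on "A"

rebased = rebase(upstream, local)
labels = [label for label, _ in rebased]
final_state = apply_all(rebased)
```

The result is one straight line of history ending in the copies x' and y', with no merge commit, which is the "clean revision history" trade-off described above.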

Implementation

Data Storage
. Both Git and Mercurial internally work with very similar data:
revisions of files along with
a small amount of meta information (parents, author, etc).
They both have objects that represent a project-wide commit,
and these are also versioned.
They both have objects that associate a commit with a set of file versions.
In Git, this is a tree object
(with tree objects for directories
and references to file revisions as the leaves).
In Mercurial, there is a manifest
(a flat list mapping pathnames to file revision objects).
Aside from the manifest/tree difference, both are very similar in terms of
how objects are searched and walked.
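
. to picture the manifest/tree difference, here is a small sketch that turns a flat manifest-style mapping into nested tree-style dicts. The file revision ids `f1`..`f3` and the helper name are placeholders of mine.

```python
def manifest_to_tree(manifest):
    """Turn a flat {path: file_revision} mapping into nested dicts, one per directory."""
    root = {}
    for path, file_rev in manifest.items():
        *dirs, name = path.split("/")
        node = root
        for d in dirs:
            node = node.setdefault(d, {})   # descend, creating directories as needed
        node[name] = file_rev               # leaf: reference to a file revision
    return root

# Mercurial-style: one flat list of pathnames
manifest = {"README": "f1", "src/main.py": "f2", "src/lib/util.py": "f3"}

# Git-style: a tree per directory, file revisions at the leaves
tree = manifest_to_tree(manifest)
```

Both shapes carry the same information; only the walk differs (one lookup by full path versus one step per directory).
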
. Git uses a combination of
storing objects directly in the file system (indexed by SHA1 hash)
and packing multiple objects into larger compressed files;
. Mercurial uses a revlog structure
(a concatenation of revision diffs
with periodic snapshots of a complete revision).
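
. a toy version of the "diffs plus periodic snapshots" idea, to show why reading a revision never has to replay the whole history. This is only a sketch: the fixed `SNAPSHOT_EVERY` interval, the delta format and the class name are my assumptions, not Mercurial's actual revlog format.

```python
import difflib

SNAPSHOT_EVERY = 3  # hypothetical interval; a real revlog decides differently

def make_delta(old, new):
    """Describe `new` (a list of lines) as copy/insert operations against `old`."""
    ops = []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("insert", new[j1:j2]))
        # "delete" needs no op: those old lines are simply not copied
    return ops

def apply_delta(old, ops):
    """Rebuild the newer revision from the older one and a delta."""
    out = []
    for op in ops:
        if op[0] == "copy":
            out.extend(old[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out

class ToyRevlog:
    def __init__(self):
        self.entries = []  # each entry is ("snapshot", lines) or ("delta", ops)

    def add(self, lines):
        if len(self.entries) % SNAPSHOT_EVERY == 0:
            self.entries.append(("snapshot", list(lines)))
        else:
            previous = self.read(len(self.entries) - 1)
            self.entries.append(("delta", make_delta(previous, lines)))

    def read(self, rev):
        start = rev - rev % SNAPSHOT_EVERY   # nearest snapshot at or before rev
        lines = self.entries[start][1]
        for _kind, ops in self.entries[start + 1 : rev + 1]:
            lines = apply_delta(lines, ops)
        return list(lines)

log = ToyRevlog()
versions = [["a"], ["a", "b"], ["a", "b", "c"], ["a", "c"], ["x", "a", "c"]]
for version in versions:
    log.add(version)
kinds = [kind for kind, _ in log.entries]
```

Reading any revision starts at the closest earlier snapshot and applies at most `SNAPSHOT_EVERY - 1` deltas, so the cost of a read stays bounded however long the history grows.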

The only major difference for the data storage layer
is the implementation language.
If a significant amount of Git/Mercurial code is to be reused,
then a Git implementation would be in C,
and a Mercurial one would be in Python
(or perhaps C++ with SWIG bindings).

Mercurial Integration

Mercurial has very good support for HTTP-based,
stateless pushing and pulling of remote repositories,
and it keeps the number of round trips between client and server low .
This is a good match for Google's infrastructure,
so no modifications will be required on the client side.

Git Integration

Git includes support for HTTP pulls (and WebDAV pushes),
but the implementation assumes that the server knows nothing about Git.
It is designed such that you can have an
Apache simply serve the Git repository as static content.
This method requires numerous synchronous round trip requests,
and is unsuitable for use in Google Code (1).

Git also has a custom stateful protocol
that supports much faster exchanges of information,
but this is not a good match for Google infrastructure.
Specifically, it is very desirable to use a stateless HTTP protocol
since there is already significant infrastructure in place
to make such transactions reliable and performant.
Note:
There has been some discussion about improving HTTP support
since this analysis was done.


selected Comments

[10.15:
. some of these comments were collected with the idea in mind of
"(what would I want to work with on my desktop),
and "(wouldn't I rather just use git locally,
and then convert to svn to sync with google`code?) . ]

. git is the preferred DVCS of Google's Android developers
so don't think that Google is somehow "favoring" one over another .
. The point here is, for public deployment and use, what is a better bet.

The question that matters is:
potential new customers * lifetime customer value > implementation cost

Sourceforge supports cvs/subversion/bazaar/git/mercurial.
yet for some reason we have sourceforge, github, and bitbucket

Hg does not have a 'staging area' concept analogous to Git's index.
To start tracking a new file, you need to hg add it
. If you modify a tracked file, and aren't explicit about omitting it,
it will be part of the next commit .

It's an easy transition from SVN in this way,
but hg is generally a little more flexible when you want it to be.
For functionality like git add --interactive,
see the Darcs-inspired Record extension for hg,
which is included in the official distribution,
but needs to be enabled in config.
I do wish you could split hunks with it like git's add -i .
Of course the graphical tools allow selecting changes to commit.

HG's static-http is as slow as git's cloning over http
because git has a dumb webserver on the other end.
HG uses hg over a cgi to pipe their custom transport.
BUT... have you ever used hg for a longer time?
We had e.g. the problem that
older mercurials can't clone a new repo (over http)
because the internal serialization changed (addition of branches).
So old mercurials just exit with a backtrace,
and this incompatibility was neither documented nor fixed.

HG has changed their on-disk layout on every release
which requires a local reclone (with --pull) of
all local repositories to update.

The biggest problem with mercurial is their unusable branch support
and their unwillingness to fix long standing design errors.

- One repo per branch brings a quite high administration overhead
not to mention the resource wastage.
- Branch lookup in mercurial is painfully slow by design,
they tried to cure it with a branchcache,
but a cache is not the same as O(1) lookup as in git.

- Tag handling is horrible.
Mercurial saves its tags in the file .hgtags in the repository root.
To locate a tag mercurial checks out all heads of all branches,
to get their .hgtags file and merges them internally to just print the tags.
Another problem is that it's impossible to
cleanly remove a wrong tag from a repository.
...
That's because in mercurial history is sacred.
I am not sure whether it's objectively a wart though.

- There are not only branches,
but every branch can have multiple heads (unnamed branches)
which is a pain to mess with.

- Automatic creation of heads seems like a quite safe feature
in the beginning.
But new users tend to lose changes in unnamed heads
and for advanced users
the distinction between heads and branches is just annoying.
...
. every branch can have multiple heads (unnamed branches)
that's annoying, I agree with that,
however,
most people do not use branches or multiple heads for long lived branches,
so in reality this issue does not really come up.

Whenever people start talking about how
mercurial is easier to use on Windows than git,
auto-cr-lf conversion comes to my mind.
(last time I checked, maybe half a year ago)
compared to git's, auto-cr-lf conversion was almost non-existent
(seriously, checking for nul character
is enough to tell if file is binary or not?!),
and buggy, as those conversions didn't always fire
and I'd be looking at a diff as if my entire file has changed
(and things like that).
In git I didn't experience such problems,
and git has the excellent .gitattributes file,
where I can specify which files are what, exactly.
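
. to see what the complaint about the nul check is getting at, here is a sketch of such a heuristic and of a case where it goes wrong. The function names and the 8000-byte sample size are assumptions of mine for illustration, not code from either tool.

```python
def looks_binary(data: bytes, sample_size: int = 8000) -> bool:
    """Heuristic only: call it binary if a NUL byte shows up in the first sample."""
    return b"\0" in data[:sample_size]

def to_crlf(data: bytes) -> bytes:
    """Convert bare LF to CRLF, leaving existing CRLF pairs as they are."""
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")

def checkout_bytes(data: bytes) -> bytes:
    """Apply line-ending conversion only to files the heuristic calls text."""
    return data if looks_binary(data) else to_crlf(data)

plain = b"line one\nline two\n"
png_like = b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR"
# UTF-16 text is full of NUL bytes, so the heuristic calls it binary
utf16 = "line one\n".encode("utf-16-le")

results = {
    "plain": looks_binary(plain),
    "png_like": looks_binary(png_like),
    "utf16": looks_binary(utf16),
}
```

The UTF-16 file is text but gets classified as binary, so a guess based on content alone can be wrong in ways that an explicit per-file declaration (the .gitattributes idea) avoids.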


- Branch lookup in mercurial is painfully
slow by design,
they tried to cure it with a branch cache,
but a cache is not the same as O(1) lookup as in git.

. the mercurial way of doing things is one branch per repo
plus using bookmarks/hgtasks/localbranches extensions for local branches.
. if working on long lived branches
and you can not or do not want to clone on the remote server,
just share your long lived branches with others directly
using hg serve.

(Also, in many situations even "hg serve" is overkill,
you can just share your work with others using mq (a patch queue).
Worth noting that patches can be checked into the repository)



6.18: proj.addn/mac/svn clients:

. where are svn gui app's for mac?
http://subversion.tigris.org/links.html#clients. here is some mac code:
http://code.google.com/p/svnx/source/checkout
subversion operations from the Finder.
The goal of the SCPlugin project
is to integrate Subversion into the Mac OS X Finder.
The inspiration for this project came from the TortoiseSVN project.

Current Features:
* Support for Subversion.
* Access to commonly used source control operations via contextual menu
* Dynamic icon badging for files under version control.
Shows the status of your files visually.


Installation:

If you want to uninstall it, remove these:
/Library/Contextual Menu Items/SCFinderPlugin.plugin
/Library/Receipts/SCPlugin.pkg
and log out or restart.

We're still having a bit of trouble getting the plug-in initially configured.
If you notice that the badging isn't working,
open any Finder window in icon mode,
and execute the "Subversion -> Refresh Icons" command.
You should only have to do this once per log in (or Finder restart).

my details:

my mac's svn version is 1.4.4
. I'm betting that I have not used mac's svn for anything except
to bring src down from others' svn repositories .

7.8:
. to use this, choose {file,folder}`context.menu/more/subversion/...





bk"svn
http://svnbook.red-bean.com/
Subversion's web site--. I got my plugin from scplugin.tigris.org

Subversion, CVS, and many other version control systems
use a copy-modify-merge model as an alternative to locking.
In this model, each user's client contacts the project repository
and creates a personal working copy
— a local reflection of the repository's files and directories.
Users then work simultaneously and independently,
modifying their private copies.
Finally, the private copies are merged together into a new, final version.
The version control system often assists with the merging,
but ultimately, a human being is responsible for making it happen correctly.
eg,
When Harry attempts to save his changes to a file updated by Sally,
the repository informs him that his file A is out of date.
So Harry asks his client to merge any new changes from the repository
into his working copy of file A.
Chances are that Sally's changes don't overlap with his own;
once he has both sets of changes integrated,
he saves his working copy back to the repository.
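
. the Harry and Sally story can be sketched as a tiny simulation. This is a toy model under loud assumptions: files are lists of lines, the line count never changes, and the `Repo` class, its methods and the `OutOfDate` error are invented for illustration rather than taken from Subversion.

```python
class OutOfDate(Exception):
    pass

class Repo:
    def __init__(self, lines):
        self.revision = 1
        self.lines = list(lines)

    def checkout(self):
        """Hand out a private working copy that remembers its base revision."""
        return {"base": self.revision,
                "base_lines": list(self.lines),
                "lines": list(self.lines)}

    def commit(self, wc):
        if wc["base"] != self.revision:
            raise OutOfDate("working copy is out of date; update first")
        self.lines = list(wc["lines"])
        self.revision += 1
        wc["base"], wc["base_lines"] = self.revision, list(self.lines)

    def update(self, wc):
        """Line-by-line three-way merge of repository changes into the working copy."""
        merged = []
        for base, mine, theirs in zip(wc["base_lines"], wc["lines"], self.lines):
            if mine == base:
                merged.append(theirs)          # only the repository changed (or nobody)
            elif theirs == base or theirs == mine:
                merged.append(mine)            # only I changed, or we agree
            else:
                raise ValueError("conflict: a human has to resolve this")
        wc["lines"] = merged
        wc["base"], wc["base_lines"] = self.revision, list(self.lines)

repo = Repo(["title", "intro", "body"])
harry = repo.checkout()
sally = repo.checkout()

sally["lines"][0] = "Title"
repo.commit(sally)                 # Sally gets in first

harry["lines"][2] = "Body"
rejected = False
try:
    repo.commit(harry)             # Harry's copy is now out of date
except OutOfDate:
    rejected = True

repo.update(harry)                 # merge Sally's change into Harry's copy
repo.commit(harry)                 # now the commit goes through
```

The edits do not overlap, so the update merges cleanly; had both changed the same line differently, `update` would stop and leave the decision to a person, as the text says.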
[1418:
. from here I'm getting the impression that the reason svn is sufficient
is that it's meant for collaborations only by teams that are
led by mgt, or consensus, so that concurrent efforts are well orchestrated .
. the only time there would be a need for a merge
is when one person is doing the coding
while the other persons are doing only review and corrections,
or additions to documentation sections;
ie, being meant as additions, the merging should be a snap .
]