2017-03-01

git uses SHA-1, deprecated by NIST in 2011

2.25: news.cyb/dev/sec/git uses SHA-1, deprecated by NIST in 2011:
3.1: summary:
. git lets teams work on software concurrently;
it identifies every file and version by its SHA-1 hash,
which lets it tell when files of a version have been modified
and helps it merge versions of the software.
. SHA-1 has been cracked: an attacker can now craft
two different files that have the same SHA-1 hash,
so one can be swapped for the other
without the hash revealing the change.
. the leader of git would like to replace SHA-1
with a more secure hash using more bits,
but would like to use a truncated version of that hash
so that git would only have to store and compare
the same number of bits as SHA-1.
. the leader of git, Linus Torvalds,
assumes git is less vulnerable to the SHA-1 attack
because it hashes not just a file's data
but also a header recording its type and size;
he gives no proof other than an appeal to intuition:
can you imagine a way to add working malware to a file
while also keeping both the hash and the size the same?
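
. to make the size point concrete, here is a minimal Python sketch
of how git derives the id of a file ("blob") object:
it hashes a short header holding the object type and byte length,
then the data itself.
The function name git_blob_id is my own, not part of git.

```python
import hashlib

def git_blob_id(data: bytes) -> str:
    """Return the SHA-1 object id git assigns to a blob with this content."""
    # git hashes "<type> <length>\0" followed by the raw bytes,
    # so the size is part of what gets hashed.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()

if __name__ == "__main__":
    content = b"hello world\n"
    print("plain SHA-1:", hashlib.sha1(content).hexdigest())
    print("git blob id:", git_blob_id(content))
```

The second value should match what `git hash-object` reports
for a file with the same content;
it differs from the plain SHA-1 because of the header.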

SHA-1 cracked.
details:
SHA-1 has been broken in practice.
This industry cryptographic hash function standard
is used for digital signatures and file integrity verification,
and protects a wide spectrum of digital assets,
including credit card transactions, electronic documents,
open-source software repositories and software updates.
It is now practically possible to craft two colliding PDF files
and obtain a SHA-1 digital signature on the first PDF file
which can also be abused as a valid signature on the second PDF file.
git:
GIT strongly relies on SHA-1 for the identification
and integrity checking of all file objects and commits.
It is essentially possible to create two GIT repositories
with the same head commit hash and different contents,
say a benign source code and a backdoored one.
3.1:
git's leader Linus Torvalds 2017.2.23:

> Since we now have collisions in valid PDF files, collisions in valid git
> commit and tree objects are probably able to be constructed.

I haven't seen the attack yet,
but git doesn't actually just hash the data,
it does prepend a type/length field to it.
That usually tends to make collision attacks much harder,
because you either have to make the resulting size the same too,
or you have to be able to also edit
the size field in the header.

pdf's don't have that issue,
they have a fixed header and you can fairly arbitrarily
add silent data to the middle that just doesn't get shown.

So pdf's make for a much better attack vector,
exactly because they are a fairly opaque data format.
Git has opaque data in some places
(we hide things in commit objects intentionally, for example),
but by definition that opaque data is fairly secondary.

Put another way: I doubt the sky is falling for git
as a source control management tool.
Do we want to migrate to another hash? Yes.
Is it "game over" for SHA1 like people want to say?
Probably not.

I haven't seen the attack details, but I bet

(a) the fact that we have a separate size encoding
makes it much harder to do on git objects
in the first place

(b) we can probably easily add some extra sanity checks
to the opaque data we do have,
to make it much harder to do the hiding of random data
that these attacks pretty much always depend on.

I don't think you'd necessarily want to
change the size of the hash.
You can use a different hash
and just use the same 160 bits from it.
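
(my note, not part of Torvalds' message:
a minimal sketch of the truncation idea.
Torvalds does not name a replacement hash here;
SHA-256 and the function name truncated_id are my assumptions,
picked only for illustration.)

```python
import hashlib

def truncated_id(data: bytes, bits: int = 160) -> str:
    """Hash with SHA-256 (an assumed stand-in) and keep only the first `bits` bits."""
    if bits % 8 != 0 or not 0 < bits <= 256:
        raise ValueError("bits must be a multiple of 8 between 8 and 256")
    digest = hashlib.sha256(data).digest()
    # Keeping the leading 20 bytes gives an id the same width as SHA-1.
    return digest[: bits // 8].hex()

if __name__ == "__main__":
    print(truncated_id(b"example object"))                   # 40 hex characters
    print(hashlib.sha1(b"example object").hexdigest())       # also 40 hex characters
```

The ids keep the 40-hex-character width of SHA-1,
so anything that stores or compares them needs no more room,
but they are computed by a different function.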

Update from Torvalds 2017.2.25:

So in git, the hash is used for de-duplication and error detection,
and the "cryptographic" nature is mainly because
a cryptographic hash is really good at those things.
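
(my note, not part of Torvalds' message:
a toy sketch of what "de-duplication and error detection" by hash means.
This is not git's actual storage code;
the class ObjectStore and its methods are hypothetical,
and it hashes the raw data without git's type/length header.)

```python
import hashlib

class ObjectStore:
    """Toy content-addressed store: the key of an object is the hash of its data."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha1(data).hexdigest()
        # De-duplication: identical content maps to the same key,
        # so a second copy is never stored.
        self._objects.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        data = self._objects[key]
        # Error detection: re-hash on read and compare with the key.
        if hashlib.sha1(data).hexdigest() != key:
            raise ValueError("object is corrupt: " + key)
        return data

if __name__ == "__main__":
    store = ObjectStore()
    first = store.put(b"same content")
    second = store.put(b"same content")
    print(first == second)   # both puts return the same key
```

Neither use needs the hash to resist a deliberate attacker,
only to make accidental clashes vanishingly unlikely.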

I say "mainly", because yes, in git
we also end up using the SHA1 when we use
"real" cryptography for signing the resulting trees,
so the hash does end up being part of a
certain chain of trust.
So we do take advantage of some of the actual
security features of a good cryptographic hash,
and so breaking SHA1 does have real downsides for us.

Which gets us to ...

(2) Why is this particular attack
fairly easy to mitigate against
at least within the context of using SHA1 in git?

There's two parts to this one:
one is simply that the attack is not a pre-image attack,
but an identical-prefix collision attack.
That, in turn, has two big effects on mitigation:

(a) the attacker can't just generate any random collision,
but needs to be able to control and generate
both the "good" (not really) and the "bad" object.

(b) you can actually detect the signs of the attack
in both sides of the collision.