
Whatever happened to SHA-256 support in Git?

398 points | 16 hours ago | lwn.net
corbet15 hours ago

It's nice to see LWN on HN for the second time in one day, but please remember: it is only LWN subscribers that make this kind of writing possible. If you are enjoying it, please consider becoming a subscriber yourself — or, even better, getting your employer to subscribe.

O__________O14 hours ago

For ease of reference, here is the link to subscribe, which includes a description of the benefits:

https://lwn.net/subscribe/Info

And the Wikipedia page for LWN, if you’re not familiar with it:

https://en.m.wikipedia.org/wiki/LWN.net

williadc10 hours ago

Googlers can subscribe through work by visiting go/lwn and following the instructions.

jra_samba14 hours ago

Just want to second this! Please subscribe to LWN. I learn new things from LWN every week. It's really worth the money.

bloopernova8 hours ago

subbed, thanks for the reminder!

avar10 hours ago

I'm the person and Git developer (Ævar) quoted in the article. I didn't expect this to end up on LWN. I'm happy to answer any questions here that people might have.

I don't think the LWN article can be said to take anything out of context. But I think it's worth emphasizing that this is a thread on the Git ML in response to a user who's asking if Git/SHA-256 is something "that users should start changing over to[?]".

I stand by the comments that I think the current state of Git is that we shouldn't be recommending to users that they use SHA-256 repositories without explaining some major caveats, mainly to do with third party software support, particularly the lack of support from the big online "forges".

But I don't think there's any disagreement in the Git development community (and certainly not from me) that Git should be moving towards migrating away from SHA-1.

lalaland112510 hours ago

Have you considered moving over to a combined SHA-1, SHA-256 model where both hashes are calculated, with SHA-1 shown to the user and SHA-256 only used in the background to prevent collisions?

There is a compute cost for that, but it should be minimal relative to the security benefits?

avar7 hours ago

Someone probably brought it up at some point, I can't remember. But I'm not aware of any known scenario where the SHA1DC library Git uses doesn't give you the benefits of that and more.

"And more" because to detect a collision with a background SHA-256 you'll need both objects, whereas SHA1DC detects attempts to spoof SHA1 in a way that leads to collisions. So it won't pass along an object that collides with another one, even though it only has 1/2 objects.

That distinction is something that's generally considered important; e.g. there have been past exploits in Git where you could trick a client into doing something bad via e.g. a crafted .gitmodules file.

The fix has not only been to patch clients, but also to patch "git fsck" to detect and reject such bad contents, so that e.g. the forges can't be used to relay a repository exploit to users running older versions.

A viable hash collision exploit in the wild might likewise want to make use of such an attack scenario, so having servers capable of detecting collisions without having both sides is preferable to doing so by re-hashing with SHA-256.
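
As an aside, this is easy to see for yourself with the two public SHAttered PDFs; a minimal sketch, assuming a git built with the default SHA1DC backend (the exact error wording varies by version):

    # grab one of the two publicly known colliding PDFs
    curl -sO https://shattered.io/static/shattered-1.pdf

    # a SHA1DC-enabled git refuses to even hash it into an object,
    # aborting with a collision-attack error instead of storing it
    git hash-object shattered-1.pdf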

bostik3 hours ago

> SHA1DC detects attempts to spoof SHA1 in a way that leads to collisions

This is the kind of detail I would have loved to see quoted directly in the article. Sure enough, it's prominently displayed on the project's README::about section, but your very succinct explanation here made it clear in immediate context.

The idea of counter-cryptanalysis is eye-opening.

michaelt10 hours ago

Thanks for your work on Git!

> I'm happy to answer any questions here that people might have.

Is there any way to achieve a gradual, staged rollout of SHA256?

What's the impact of converting a repo to SHA256 - will old commit IDs become invalid? Would signed commits' signatures be invalidated?

avar10 hours ago

The answer is somewhat hand-wavy, because this code doesn't exist as anything except out-of-tree WIP code (and even then it's incomplete). But yes, the plan is definitely to support a gradual, hopefully mostly seamless rollout.

The design document for that is shipped as part of git.git, and available online. Here's the relevant part: https://git-scm.com/docs/hash-function-transition/#_translat...

Basically the idea is that you'd have, say, a SHA-256 local repository, and talk to a SHA-1 upstream server. Each time you'd "pull" or "push" we'd "rehash" the content (which we do anyway, even when using just one hash).

The interop-specific magic (covered in that documentation) is that we'd use a translation table, so you could e.g. "git show" on a SHA-1 object ID, and we'd be able to serve up the locally packed SHA-256 content as a result.

But the hard parts of this still need to be worked out, and problems shaken out. E.g. for hosting providers what you get when you "git clone" is an already-hashed *.pack file that's mostly served up as-is from disk. For simultaneously serving clients of both hash formats you'd essentially need to double your storage space.

There's also been past in-person developer meet-up discussion (the last one being before Covid, the next one in fall this year) about the gritty details of how such a translation table will function exactly.

E.g. if linux.git switches they'd probably want a "flag day" where they'd transition 100% to SHA-256, but many clients would still probably want the SHA-1<->SHA-256 translation table kept around for older commits, to e.g. look up hash references from something like the mailing list archive, or old comments in ticketing systems.

Currently the answer to how that'll work exactly is that we'll see when someone submits completed patches for that sort of functionality, and doubtless issues & edge cases will emerge that we didn't or couldn't expect until the rubber hits the road.

dwheeler7 hours ago

Has anyone considered doing "add16" on the first character of the sha-256 hash, e.g., so the SHA-256 hash 1d06... becomes hd06... ? Then you could see, from the first character, if it is SHA-1 or SHA-256. Having a clear distinction on the first character would make it clear which hash is being used (without needing lots of chars).

avar22 minutes ago

I think this was discussed at some point early on. There are many outstanding issues with Git's SHA-1<->SHA-256 interop and potential unknowns, but figuring out which format a given abbreviated hash is in isn't really one of them.

For any internal part of Git the hash type is already known, e.g. for the wire protocol dialog that "fetch" runs. Other commentators have pointed out that you can use the hash length to disambiguate the two, but the way it's implemented internally we could tell them apart even if we hypothetically had two hashes of the same length.

But that leaves abbreviated hashes, e.g. if you do "git show deadbeef" is that a SHA-1 or SHA-256 hash? We don't know.

It's foreseen that in those cases we'll look up both, and in the case of ambiguity do the same thing as e.g. "git show dead" does now (try it on a non-trivially sized repo). Then just as you can do e.g. "git show dead^{commit}" now to disambiguate, you'd use the ^{sha1} or ^{sha256} peel syntax (still unimplemented).

If we changed the hash format from [0-9a-f]{4,40} to something that didn't match the first part of that regex there are even more downstream systems that would need adjusting. A lot of programs that work with Git's output use some variant of that regex. For those that don't limit themselves to "40" things usually Just Work.

So that's basically the reason. It's also less future-proof, as you'll run out of magical prefix characters faster than you'll run out of hash names. Although I'm hoping to be dead way before that small space would be exhausted :)
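
(To make the ambiguity case concrete, here's roughly what that looks like today; output paraphrased, candidates elided:)

    $ git show dead
    error: short object ID dead is ambiguous
    hint: The candidates are:
    hint:   dead...  commit ...
    hint:   dead...  blob
    fatal: ambiguous argument 'dead': unknown revision or path not in the working tree.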

tux310 hours ago

Has there been any feedback/communication with forges happening, on or off-list?

I'm curious how closely (if at all) they've been following this effort

avar9 hours ago

The forges are keenly aware of this effort; e.g. the person who's by far done the most work on the SHA-256 transition (brian m. carlson) has, I believe, done so mostly or entirely on behalf of GitHub.

I myself do some work on upstream Git on behalf of GitLab, although none of it has been related to the SHA-256 transition.

As to why no big forge has SHA-256 support, I think it's a bit of a chicken & egg problem (and these comments are entirely my own, and not on behalf of anyone).

I think it's safe to say that all of the forges are expecting the transition, e.g. I don't think there's anyone creating CHAR(40) database tables for Git hashes anymore (or if they are, someone is planning to deal with it).

Another is that for a successful transition of anything except entirely new repositories (existing repository networks already use SHA-1) you really need the "git" client to play along; see my other comment discussing hash interop plans. Some of that same code then needs to run on the server side.

That code isn't part of git yet, and it's really needed for any sort of viable migration plan.

I mean, it's not really needed; at some point a lot of people reading this migrated from CVS and/or SVN to Git. But a full export/import with a lot of users is painful. We really want it to suck less for Git, to the point that it should Just Work for most or all users.

And a major one is the human factor. For things to happen in free software development someone needs to submit patches, and brian m. carlson has put a heroic amount of effort into the transition over the years. As he notes in the linked ML thread, he's had life reasons why he hasn't been able to work on it as actively recently as he did in the past.

rurban4 hours ago

brian moved from Texas to Canada, but mostly his employer, GitHub, is not prioritizing the remaining sha256 patches. Someone needs to finish up the transition patches, and forges need to double their disk space.

cerved2 hours ago

+1

Zamicol15 hours ago

This is one of the reasons why Go has its own versioning system. From a project's `go.sum`:

example.com/example v0.0.0-20171218180944-5ea4d0ddac55 h1:jbGlDKdzAZ92NzK65hUP98ri0/r50vVVvmZsFP/nIqo=

Where "h1" is an upgradeable hash (h1 is SHA-256). If there's ever a problem with h1, the hash can be simply upgraded.

Git's documentation describes how to sign a git commit:

$ git commit -a -S -m 'signed commit'

When signing a git commit using the built-in gpg support, the project is not rehashed with a secure hash function like SHA-256 or SHA3-256. Instead gpg signs the SHA-1 commit digest directly. It's not signing the result of a secure hash algorithm.

SHA-1 has been considered weak for a long time (about 17 years). Bruce Schneier warned in February 2005 that SHA-1 needed to be replaced. Git development didn't start until April 2005. Before git started development, SHA-1 was identified as needing deprecation.

brasic4 hours ago

> Instead gpg signs the SHA-1 commit digest directly

A minor correction: when signing a commit, gpg does not sign the SHA-1 digest of that commit. This is impossible since the signature becomes part of the commit header which is one of the inputs to the hash function that produces the oid.

Instead, GPG signs the serialized data (parents,headers,tree,message) which would otherwise be the input to SHA-1. Then the sig is inserted into the buffer at the end of the header and the string is digested to produce an oid.

Source: https://github.com/git/git/blob/39c15e485575089eb77c769f6da0...
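
You can see this on any repo with a signed commit (field values elided below); the gpgsig header sits inside the very buffer whose SHA-1 becomes the object ID:

    $ git cat-file commit HEAD
    tree <40-hex tree id>
    parent <40-hex parent id>
    author A U Thor <author@example.com> 1650000000 +0000
    committer A U Thor <author@example.com> 1650000000 +0000
    gpgsig -----BEGIN PGP SIGNATURE-----
     <base64 signature lines>
     -----END PGP SIGNATURE-----

    signed commit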

lewisl902910 hours ago

Also check out multihash from the IPFS folks: https://github.com/multiformats/multihash

It's a more robust, well-specified, interoperable version of this concept.

Though it's probably overkill if you control both the consumer and producer side (i.e. don't need the interoperability) and are just looking to make hash upgrades smoother; in that case a simple version prefix like Go's approach described above has lower overhead.

Groxx13 hours ago

There's no need to explicitly version your first version of this though. Those first-version values are easy to identify: they don't contain versioning information :)

E.g. say you have `5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8`. What version is that?

Well. It's exactly as long as a SHA1 hash. It doesn't start with "sha256:" or "md5:" or "h1:" or "rot13:". So it's SHA1. Easy and totally unambiguous.

Versioning can almost always begin with version 2.
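
A throwaway sketch of that rule (the prefixes and function name are illustrative, not any real tool's convention):

    #!/bin/sh
    # unprefixed 40-hex values are the implicit "version 1" (SHA-1);
    # every later version must announce itself with a prefix
    classify() {
        case "$1" in
            h1:*)     echo "go.sum h1 (SHA-256)" ;;
            sha256:*) echo "sha256" ;;
            *)        [ "${#1}" -eq 40 ] && echo "sha1 (implicit v1)" || echo "unknown" ;;
        esac
    }

    classify 5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8   # -> sha1 (implicit v1)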

arccy34 minutes ago

versioning also allows you to change the inputs in the future to include/exclude more info

inconsistencies in how data is presented (an optional version number) are a pain to deal with in code

morelisp13 hours ago

Me, sowing: "Each record begins with a 4 octet BE value indicating the record length."

Me, reaping: "Each record begins with a single byte indicating the record format version. In version 0, this is followed by a 3 octet BE value indicating the record length."

kazinator11 hours ago

That's not applicable to Groxx's example. The initial version uses only hexadecimal digits for the hash.

If you had: "each record begins with an 8 character record length, in hexadecimal, giving 32 bits", you have no problems. The new version has a 'V' character in byte 0, which is rejected as invalid by the old implementation.

morelisp11 hours ago

+1

Groxx13 hours ago

if you're storing the raw binary rather than hex or base64: yeah. there are often no illegal values, so there's no way to safely extend it, unless you can differentiate on length.

for those, you have to leave versioning room up-front. even 1 bit is enough, since a `1` can imply "following data describes the version", if a bit wasteful in the long run.

nh23423fefe12 hours ago

sow then reap

morelisp12 hours ago

+1

kelnos13 hours ago

I think the implications for Go are a bit different, though. It's a very simple matter to change the hash algorithm used for go.mod. Even if there was no hash version prefix, it's trivial to add one after the fact, though older tools would probably give a confusing error message without foreknowledge of the concept of an unrecognized hash algorithm. And adding a new hash algorithm is just a matter of writing a relatively small amount of code, and then probably waiting a few Go releases before making it the default and assuming most people will have it.

Git's entire foundation relies on SHA1 hashes. Each commit is its own hash, and contains a list of the hashes of all files that are a part of it. Branches have hashes, tags have hashes. Everything has a hash. A repository that uses a different hash algorithm is a completely different repository, even if the contents and commits are otherwise identical. You can't even store your code on someone else's server (well, aside from manually copying the repository data over, though that won't be too useful) unless that server has upgraded their git version.

samatman11 hours ago

The counterpoint: Fossil did it, it was easy, no big deal.

Well, Fossil's database is much better designed, you reply.

That it is!

jupp0r9 hours ago

I think the argument that gp is trying to make is that it's really hard for git to implement this in a backwards compatible way. You may be right (I don't know anything about Fossil, will take a look!) that Fossil allowed for this by making good design decisions in the past. This is not something that git maintainers can do right now without a time machine though. Old versions are in use out there and will need to keep working if the goal is to make the transition easier for users.

er4hn14 hours ago

Just to nit on your portion of signing: wouldn't you need to rehash all prior commits as well so that they used the better hash function? Otherwise someone could find a collision for a prior commit hashed with sha-1, slip that in, and the final commit being hashed with sha256 wouldn't matter.

This then makes the signing code use its own form of hashing that is different from the rest of git's commit hashing, and seems like a novel way to introduce tooling issues / bugs / etc.

chimeracoder14 hours ago

> and the final commit being hashed with sha256 wouldn't matter.

Git stores content, not diffs. So the signature verifies all content stored in that commit. It doesn't verify anything that came before it, unless those are specifically signed as well.

ElectricalUnion11 hours ago

> Git stores content, not diffs.

But the "contents" is just pointers to tree roots with a trusted hash. If the hash is no longer secure, you can't garantee that any such trees are your content, or safe.

Arnavion6 hours ago

The assumption in this context is that all those have been rehashed to SHA-256 too. The point was about whether that rehashing needed to be extended to previous commits.

kazinator11 hours ago

Whenever the word "upgrade" rears its head, beware.

The intent behind it is obsolescence and phasing out, resulting in an endless make-work treadmill for the users.

If there is ever a "problem with h1" and you neglect to upgrade your data right there and then, in five to ten years it will be unreadable.

arccy30 minutes ago

or you know, find the version that handles transition and run it to upgrade

stepping through required versions is a common operation

howinteresting11 hours ago

What in the world are you talking about? Generally, systems with upgradeable hashes will remain backwards-compatible with old ones forever.

bawolff13 hours ago

Versioning hashes is definitely not a new idea with go - just look at how unix stores password hashes.

barsonme11 hours ago

The author of the comment did not imply this.

harryvederci13 hours ago

Relevant quote from the Fossil website[0]:

"Fossil started out using 160-bit SHA-1 hashes to identify check-ins, just as in Git. That changed in early 2017 when news of the SHAttered attack broke, demonstrating that SHA-1 collisions were now practical to create. Two weeks later, the creator of Fossil delivered a new release allowing a clean migration to 256-bit SHA-3 with full backwards compatibility to old SHA-1 based repositories. [...] Meanwhile, the Git community took until August 2018 to publish their first plan for solving the same problem by moving to SHA-256, a variant of the older SHA-2 algorithm. As of this writing in February 2020, that plan hasn't been implemented, as far as this author is aware, but there is now a competing SHA-256 based plan which requires complete repository conversion from SHA-1 to SHA-256, breaking all public hashes in the repo."

[0]: https://fossil-scm.org/home/doc/trunk/www/fossil-v-git.wiki#...

ludwigvan12 hours ago

Migrations are easier when you are the only one using your software. :p

Joking aside, this is expected from a developer whose work is the recommended storage format for the Library of Congress.

YesThatTom213 hours ago

GitHub won’t feel any heat about this until Microsoft salespeople start demanding it.

I’ve added to my todo list a reminder to raise this issue with mine. In fact, I’m going to give them a deadline for when we will start evaluating competitors that do support SHA256.

I suspect that most people on HN do not interact with their MS account team. That relationship is probably managed by your CIO or IT department. They probably have monthly or quarterly “business review” meetings. You should get this issue on the agenda of that meeting.

lewisl902911 hours ago

Just the other day, I was actually forced to downgrade the file hash used in the product I'm working on to sha1 in order to interact with GitHub's APIs efficiently (to avoid having to download the entire file just to recompute a sha256 for matching).

Luckily I've versioned the internal hash so the upgrade path back to sha256 should be as smooth as the downgrade was. I'm still bitter about it though.

codazoda12 hours ago

Is there something special about GitHub on this? This seems like a Git issue and not a GitHub issue to me; unless I'm missing something.

lucb1e12 hours ago

They don't accept pushes of repositories in that format.

The article says "none of the Git hosting providers appear to be supporting SHA-256", and while GH is not mentioned by name (and I applaud them for indeed not strengthening this "git == github-the-brand" trap), I can't imagine GH was left out of scope when checking the major hosting providers.

evil-olive11 hours ago

as the article says, you can create a local git repository with SHA-256 hashes today, and it should work fine...but the moment you try to push your repo up to Github, you'll hit a brick wall.

Gitlab also appears to be lacking support [0], and the same with Gitea [1].

so it's a grey area where Git itself supports SHA-256-based repos, but without the major Git hosting services also supporting them, the support in core Git is somewhat useless.

0: https://gitlab.com/groups/gitlab-org/-/epics/794

1: https://github.com/go-gitea/gitea/issues/13794
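
for reference, trying this locally takes one flag (supported since Git 2.29; everything works offline until you try to push):

    git init --object-format=sha256 demo && cd demo
    git commit --allow-empty -m 'first'
    git rev-parse HEAD                  # 64 hex digits instead of 40
    git rev-parse --show-object-format  # prints "sha256"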

sdfhdhjdw312 hours ago

Thank you.

bradhe12 hours ago
girvo11 hours ago

You sound like someone who didn’t read the article.

Git basically supports it already. GitHub et al do not, and that is what is holding it back.

vulcan0112 hours ago

Git is not GitHub and GitHub is not Git. This article is about Git, the software, not GitHub, the Git hosting service.

sdfhdhjdw312 hours ago

Did you skip the bit that discusses hosting providers?

ElectricalUnion11 hours ago

Git supports it, GitHub doesn't. People use forges, therefore they are misled into believing Git doesn't support it.

chrisseaton11 hours ago

Forges?

beermonster2 hours ago

Forge is the general term for Git service providers such as GitLab, GitHub, SourceHut, Bitbucket et al.

lucb1e11 hours ago

+1

red_admiral1 hour ago

Is SHA-1 being used in a security-critical way in git, though? I tend to agree with the points made towards the end of the article.

If someone malicious can make commits to your repo, you have a much worse problem than a possible hash collision.

Most languages these days have a built-in hashtable type, and they use some kind of non-security-critical hash function (albeit usually with a random seed) that's a lot weaker than SHA-1. Often you only need something like 32-64 bits of output in the first place, as no one allocates more than 2^64 buckets as far as I know. From a security perspective that's even worse than MD5!

I'd argue (and so does the article, in part) that the use of SHA-1 in git is closer to the use of a hash in a hashtable than to the use of a cryptographic checksum - using SHA-1 for subresource integrity on the web is an obvious security risk (luckily, it's not allowed there) but I can't see an obvious attack on git's use that doesn't also assume you can do much worse things.

haileys47 minutes ago

> If someone malicious can make commits to your repo, you have a much worse problem than a possible hash collision.

This is exactly how GitHub works though: all repositories in a fork network share a single underlying repository, only the ref namespaces are separated.

simias15 hours ago

It's frankly amateurish for the git devs to delay this. The longer this lasts, the more painful it'll be when the switch finally takes place.

Linus shouldn't have used SHA-1 in the first place, it was already being deprecated by the time git got its original release. Then every time a new milestone is reached to break SHA-1 we see the same rationalization about how it's not a big deal and it's not a direct threat to git and blablabla.

It'll keep not mattering until it matters, and the longer they wait the more churn it'll create. Let's rip off the bandaid that's been hanging there for over 15 years now.

runeks15 hours ago

> Linus shouldn't have used SHA-1 in the first place, it was already being deprecated by the time git got its original release.

Using SHA-1 to begin with was fine. However, commit hashes should have been prepended with a version byte to make it easier to transition to the next hash algorithm.

This would mean an old Git client could report an error to the user of the nature "please upgrade your software to support cloning from this Git server" instead of failing with an error that's indistinguishable from "the Git server is broken" when trying to clone a Git repo using SHA-256.

jackweirdy14 hours ago

There’s already a version byte: if it’s [0-9a-f], that’s version 1 ;)

LeifCarrotson13 hours ago

That's a 4-bit nibble, the version byte is 0x00 to 0xFF.

Arnavion6 hours ago

They're talking about the hex representation. It doesn't make sense to think they were referring to the nibble as the version, given that all 16 values of that nibble are already in use.

simias15 hours ago

By the time Git was first released the first attacks on SHA-1 had already been published, but I agree with your general point about allowing for backward compatible updates.

layer812 hours ago

The problem is not a missing version byte. SHA-256 is trivially distinguishable from SHA-1 by hash length. The problem is that the length of a SHA-1 hash (20 bytes) is (or was) hardcoded in too many places.

wahern12 hours ago

Linus' original excuse for using SHA-1 was that Git hash trees and hash identifiers were never meant to be cryptographically secure. GnuPG signing support, the popular belief that Git trees had a strong security property, etc, came afterward, along with increasingly awkward excuse-making.

So strictly speaking Linus and subsequent maintainers weren't being amateurish in the beginning. (You didn't say that explicitly, but it would be a fair criticism given what was known about SHA-1 at the time, including by Linus--he knew and made a choice.) Rather, in the beginning it was naivety in believing that people wouldn't begin to depend on Git's apparent security properties.

avar8 hours ago

I don't know if/how this played into it, but if you check out the original version of Git whose commit date is April 7th, 2005 it uses OpenSSL for SHA-1.

The first OpenSSL release that has general SHA-256 support seems to have been 0.9.8, released on July 5th, 2005[1]; the code first appeared in OpenSSL's source tree in May of 2004.

Perhaps Linus has commented on it. I don't know, but I wouldn't be surprised if the actual reason is that Git was thrown together as a weekend project, that he vaguely knew SHA-256 was preferable, but his distro's OpenSSL didn't have it yet.

So the initial version used SHA-1 instead, and the rest is history...

1. https://marc.info/?l=openssl-users&m=135355590501495

jopsen10 hours ago

Yeah, in hindsight maybe he should have made his own 160-bit CRC variant :)

Honestly, I think it's fair to say that the hashes weren't meant to be a security feature.

But signed tags/commits/etc. probably need a better hash.

hinkley12 hours ago

I worked on code signing for civilian aviation years ago and there were people trying to pressure me into supporting MD5 and SHA-1 signatures. I told the first group to jump off a cliff, and the second group got a firm no. The first papers on theoretical SHA-1 attacks had already been published, we were still a couple years out from active use, and people were already beginning to talk about starting to organize the SHA-3 process.

Once a system expects to handle SHA-1, then you have to deal with old assets that have deprecated signatures, and that's a fight I 1) didn't want to have and 2) was fairly sure I wouldn't be around to win.

Git was still brand new, largely unproven at that point, and I don't understand why he picked SHA-1.

armada65115 hours ago

> Adding my own 0.02, what some of us are facing is resistance to adopting git in our or client organizations because of the presence of SHA-1. There are organizations where SHA-1 is blanket banned across the board - regardless of its use. [...] Getting around this blanket ban is a serious amount of work and I have very recently seen customers move to older much less functional (or useful) VCS platforms just because of SHA-1.

Seems like this company could just use the current SHA-256 support then? Especially if it's the type of company that does all its development in-house and there's no need for SHA-1 interoperability.

skissane15 hours ago

> > There are organizations where SHA-1 is blanket banned across the board - regardless of its use.

Reminds me of the time a security audit (which literally just involved running some scanning tool and dumping the results on us) complained that some code I had written was using MD5 - but in a use case in which we weren’t relying on it for any security purposes. I ended up replacing MD5 with CRC-32 - which is even weaker than MD5, but made the security scanning tool mark the issue as remediated. It was easier than trying to argue that it was a false positive.

bawolff14 hours ago

Honestly, this isn't a bad idea.

The big problem with using sha1/md5 in non-secure contexts is:

* Someone later might think it's secure and rely on that when extending the system.

* It can make it difficult for security people to audit code later, as you have to figure out if each usage is security critical.

Using a non-crypto hash makes both those concerns go away, since everyone knows crc32 is insecure. The alternative of using sha256 also works (performance-wise it is close enough, so why not just use the secure one and be done with it).

gorkish15 hours ago

> There are organizations where SHA-1 is blanket banned across the board - regardless of its use.

> I have very recently seen customers move to older much less functional (or useful) VCS platforms just because of SHA-1.

A company this dysfunctional has problems far beyond their choice of revision control system.

bostik14 hours ago

I can name a couple of industries where compliance (and their enforcement arm, security[0]) teams require N+1 different monitoring and enforcement agents on all systems because Compliance[TM]. Due to these agents the systems' idle load is approaching 1.00 - on a good day. On a less good day you need four cores to have one of them available for workload processing.

0: I use the word "security" only because the teams themselves are named like that. You can probably infer my opinion from the tone.

avar8 hours ago

In a past life I used to work for an anti-virus company who in addition to the Windows product sold the very portable virus-scanning engine for pretty much any other OS you could name. I worked in the *nix department, where we ported it to everything from Linux to the BSDs, HP/UX, Solaris & beyond, as well as more obscure setups like z/OS.

So, we sold people software that would run on some fridge-sized Sun machine running Solaris, to ensure that their Solaris machine wasn't about to get infected with the latest Windows virus.

The occasional support calls with technically minded *nix admins were amusing. We knew that what we were selling them was completely useless and made no secret of that fact; they likewise knew that the software they were running was useless to them. The one thing they cared about was that it didn't contribute to the load, and we did our best.

But some PHB somewhere in their organizations had decreed that all computers everywhere must have an anti-virus scanner, and if you're sufficiently motivated to buy something eventually someone will sell it to you, even while telling you that you don't need it :)

the_biot13 hours ago

I definitely see your point -- who hasn't seen or heard of companies ruined by officious rulemakers with no clue, or by rules meant to make something more secure that do the exact opposite, etc. I've seen my share.

But blanket-banning an obsolete and insecure hash algorithm isn't a bad thing; it's entirely reasonable. In this case, as the article makes clear, it's git that's at fault.

cratermoon15 hours ago

Except said company likely uses one of the Git forge providers, either in-house or as a SaaS, as the (oxymoronic for git) central repo. Until they support SHA-256, or the company goes with its own git repo solution that is set up for it, companies won't make the move.

wepple14 hours ago

Not just git forges, but probably the myriad other ancillary tools that assume SHA1.

ivoras15 hours ago

Is there an explanation of what would go wrong with the naive approach? E.g.:

- Change the binary file format in repos to support arbitrary hash algorithms, in a way which unambiguously makes old software fail.

- Increment the Git major version number to 3.0

- Make the new version support both the old version repos and the new ones. Make it a per-repo config item that allows/disallows old/new hash formats. In theory, there's nothing wrong with having objects hashed with mixed algorithms as long as the software knows how to deal with that.

- The old format will probably have to be supported forever because of Linux.

Most user-facing utilities don't care what the hash algo actually is; they just use the hash as an opaque string.

runeks14 hours ago

Releasing new software is the simple part. The problem is that versioning is lacking in the old software, and therefore it doesn’t know how to talk to the new software. So for the old software there’s no difference between “invalid data” and “I’m too old, please upgrade me”.

dingleberry42013 hours ago

> So for the old software there’s no difference between “invalid data” and “I’m too old, please upgrade me”.

And why is this an issue? Release the new version that can read new repo formats, but doesn't write them yet. Wait a year. Release new version that can write new repo formats and encourage users to upgrade.

Anyone who hasn't upgraded in the past year probably doesn't care about security and should be left behind. Besides, once they google the error message they'll figure it out soon enough. It's not like git is known for its great UX anyway.

kzrdude13 hours ago

All of what you wrote, except the version bump, is already implemented. It's the nicer features that are missing: the nice migration path.

kelnos13 hours ago

> In theory, there's nothing wrong with having objects hashed with mixed algorithms as long as the software knows how to deal with that.

That's an interesting idea, actually. I'm not sure they plan to support that, though? That would make things a lot easier on existing repositories; without support for mixed hashes, repos would have to have their history entirely rewritten, which would invalidate things like signed commits/tags.

rurban4 hours ago

No, study the transition document, please.

There is one hash version, plus a translation table for the other format. No history rewrite.

New repos will use the new hash. Old repos will eventually fully convert to the new hash, and all old-hash links after the transition period will become obsolete.

yjftsjthsd-h15 hours ago

> In his view, the only "defensible" reason to use SHA-1 at this point is interoperability with the Git forge providers.

Okay, but that's a pretty big reason! A git repo that can't be pushed to github/lab is... not always useless, but certainly extremely impaired.

kragen15 hours ago

In case anyone has forgotten, the process for pushing it to your own server is three shell commands. You run, on the server:

    git init --bare public_html/mything.git
    cd public_html/mything.git/hooks/
    mv post-update.sample post-update  # runs git update-server-info on push
(This assumes that your public_html directory exists and is mapped into webspace, as with the usual configuration of Apache, NCSA httpd, and CERN httpd. If you don't have an account on such a thing you can get such PHP shared hosting accounts with shell access anywhere in the world for a dollar or two a month.)

And then on your dev machine, it's precisely the same as for pushing to Gitlab or whatever, except that you use your own username instead of git@:

    git remote add someremotename user@myserver:public_html/mything.git
    git push -u someremotename master # assuming you want it to be your upstream
Then anyone can clone from your repo with a command like this:

    git clone https://myserver/~user/mything.git
They can also add the URL as a remote for pulls.

If you want them to be able to push, you'll need to give them an account on the same server and either set umasks and group ownerships and permissions appropriately or set a POSIX ACL. Alternatively they can do the same thing on their server and you can pull from it. There are reportedly permission bugs in recent versions of Git (the last five years) that prevent this from being safe with people you don't trust (https://www.spinics.net/lists/git/msg298544.html).

Of course source control is only part of the overall development project workflow, so for many purposes adding SHA-256 support to Gogs or Gitlab or Gitea or sr.ht is probably pretty important: you want a Wiki and CI integration and bug tracking and merge requests. But the git repo still works fine with a bog-standard ssh and HTTP server, though slightly less efficiently. It's easier than setting up a new repo on GitLab etc.

Running a git repack -an && git update-server-info in the repo on the server can help a lot with the efficiency, and for having a browseable tree on the server as well as a clonable repo I put this script at http://canonical.org/~kragen/sw/dev3.git/hooks/post-update:

    #!/bin/sh
    set -e

    echo -n 'updating... '
    git update-server-info
    echo 'done. going to dev3'
    cd /home/kragen/public_html/sw/dev3
    echo -n 'pulling... '
    env -u GIT_DIR git pull
    echo -n 'updating... '
    env -u GIT_DIR git update-server-info
    echo 'done.'
That's very far from being GitLab (contrast http://canonical.org/~kragen/sw/dev3 with any GitHub tree view), and it's potentially dangerously powerful: if you're doing this in a repo where you pull from other people, and the server is configured to run PHP files or server-side includes in your webspace (mine isn't!) or CGI scripts (mine is!), then just dropping a file in the repo can run programs on the server with your account privileges. This is great if that's what you want, and it's a hell of a lot better than updating your PHP site over FTP, but that code has full authority to, for example, rewrite your Git history.

In theory you can do other things from your post-update hook as well, like rebuild a Jekyll site, send a message on IRC or some other message queueing system, or fire off a CI build in a Docker container. (Some of these would run afoul of guardrails common in cheap PHP shared hosting providers and you'd have to upgrade to a US$5/month VPS.)

isomorphic14 hours ago

People also forget about Gitolite, which provides lightweight shared access control around Git+SSH+server-repos. For me it's a much simpler alternative than systems with a heavyweight web UI. Although to be honest I don't know whether Gitolite handles SHA256 hashes (I've never tested it).

https://gitolite.com

https://github.com/sitaramc/gitolite

kragen14 hours ago

I did forget about Gitolite! Thanks for the reminder! Do you have suggestions for what sorts of CI tooling and bug trackers people might want to use with it?

dikei8 hours ago

Most developers don't run their own server, and that's probably for the best.

donatj15 hours ago

Potentially stupid question, would it be reasonable to use SHA-256 truncated to the first 40 digits?

It seems like that could ease many of the migration problems, if it isn't itself a problem?

Zamicol14 hours ago

I don't believe the length is a major issue. It's "upgrading" references to a new hashing algorithm that's the issue.

If for some reason length was an issue, a base64 encoded 256 bit string, like a SHA-256 digest, is 43 characters. That too can be truncated to 40 characters, which has 238 bits of security. SHA-256 is not only a better hashing algorithm than SHA-1 but it could also result in higher effective security even when truncated.
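
(For concreteness, truncating the hex form to SHA-1's length keeps 160 of the 256 bits:)

    printf 'hello' | sha256sum | cut -c1-40
    # 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c   <- the full digest runs 24 chars longer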

stingraycharles15 hours ago

I found this, which says that the SHA algorithm allows for truncation: https://csrc.nist.gov/publications/detail/sp/800-107/rev-1/f...

Dylan1680714 hours ago

Not just allows, it becomes more secure when you truncate.

tatersolid14 hours ago

Truncated SHA-* hashes are more secure against length-extension attacks, but are very much less secure against collision and pre-image attacks (which are more important in most scenarios).

Dylan1680713 hours ago

But also, 256 is overkill for collisions and pre-image.

There's a point where truncating starts to make it weaker, but when you first start chopping off bytes the benefits outweigh the drawbacks.

neon_electro14 hours ago

Care to elaborate? This is not something I would've intuited.

bawolff14 hours ago

+1

jjtheblunt13 hours ago

that makes collisions more likely

pornel13 hours ago

Sigh, no it doesn't in any meaningful way.

160 bit output, without a cryptographic weakness, is good for about 30 trillion commits per second continuously for 1000 years.

For SHA the cryptographic strength isn't primarily from the length of the hash, but from the number of internal rounds (e.g. 160-bit SHA-1 with fewer rounds was badly broken much earlier, and 160-bit SHA-1 with more rounds would be safer).

Cryptographic hashes are designed to be safe to truncate and still have all the safety the truncated length can provide. It's basically a requirement for them being cryptographically strong. Even in the SHA-2 family, the SHA-224 and SHA-384 are just truncated versions of larger hashes.

dspillett13 hours ago

It makes random collisions more likely when comparing truncated SHA256 to pure SHA256, but given the collision and pre-image attacks shown so far, is truncated SHA256 still safer than SHA1 in that respect? I have seen an article that claimed so (sorry, I can't re-find it ATM, so I can't offer it for criticism; if anyone else has good information either way, please respond with relevant links). It is also immune to extension attacks, which is a significant advantage if that is part of your threat surface and SHA1 is used without other protective wrappers like HMAC.

bawolff12 hours ago

Truncated sha256 is safer than sha-1 (depending of course on how much you truncate it, but given the context let's assume truncating to the size of sha-1: 160 bits).

SHA-1 is quite broken at this point. SHA-256 is not. There aren't any practical non-generic attacks on full sha-256 and thus there wouldn't be any on the truncated version. The Wikipedia article goes into the different attacks on the two algorithms.

That said, if your concern is length extension attacks, I strongly recommend using sha-512/256 instead of trying to do your own custom thing.
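
(It's available in stock tooling, e.g. with OpenSSL 1.1.1 or later:)

    printf 'hello' | openssl dgst -sha512-256
    # 256-bit output with no length extension: the 512-bit internal state is truncated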

mjw100715 hours ago

> All that is left is the hard work of making the transition to a new hash easy for users — what could be thought of as "the other 90%" of the job.

If that was all that was left, we could at least be using sha256 for new repositories.

It seems to me the big missing piece is support in libgit2, which is at least showing signs of progress:

https://github.com/libgit2/libgit2/pull/6191

xyzzy_plugh12 hours ago

libgit2 isn't an official library, and even if it did support sha256 dependents would still need to update, so I really don't perceive this as a missing piece.

If everyone started using sha256 then all these problems would be addressed practically overnight.

jiggawatts12 hours ago

If you’re going to “fix” the hash algorithm, do it properly!

Sha256 can only be computed in a single sequential stream (thread) by definition.

For large files this is increasingly becoming a performance limitation.

A Merkle tree based on SHA512 would have significant benefits.

SHA512 is faster than SHA256 on modern CPUs because it processes 64 bits per internal register instead of 32 bits.

A tree-structured hash can be parallelised across all cores.

For repositories with files over 100MB in them on an SSD this would make a noticeable difference…

dchest11 hours ago

Most git objects are tiny files, so internal tree-based parallelization won't bring much compared to file parallelization (git is a hash tree itself, with variable-length leaves).

SHA256 is actually a lot faster on modern CPUs due to https://en.wikipedia.org/wiki/Intel_SHA_extensions (and similar on Arm), which are implemented for SHA-256 but not for SHA-512, e.g. openssl speed sha256 sha512 on M1:

  type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
  sha256           89474.97k   283341.15k   901724.41k  1730980.24k  2339109.86k
  sha512           66160.19k   262139.03k   365675.96k   487572.26k   545142.91k
jiggawatts11 hours ago

A fair point about the instruction sets, and it is also true that “most” files are small.

But again, due precisely to their size, large files take a disproportionate amount of time to process.

Don’t confuse the typical use-case with the fundamental concept: versioning.

Git could be a general purpose versioning system with many more use-cases, but limitations like this hold it back unnecessarily…

dchest2 hours ago

Hashing is not the only thing that stops git from being useful for large file versioning. For this purpose, splitting files into chunks using a rolling hash (similar to how git packs, rsync, tarsnap or IPFS) would work better. This again doesn't require "internal" tree hashing, since each chunk would be hashed separately.

akvadrako10 hours ago

Actually, SHA256 is faster since many common processors have special instructions to accelerate it.

TillE15 hours ago

Bjarmason has a good response about the practicalities of an attack; it explains why a "broken" hash is rarely a running-around-with-your-hair-on-fire level emergency. It would clearly be better to use a better hash, but is it actually urgent for anyone? Probably not.

mdavidn15 hours ago

I don’t depend on the collision resistance of SHA-1 for the security of my git repos because I don’t accept pushes from people I don’t trust. If I did, objects with hash collisions would not be transferred or (I hope) accepted. Am I missing something?

Granted, signed tags do depend on this collision resistance, but I don’t use that feature. Signing entire releases from a trusted repo seems like a better approach.

teraflop15 hours ago

It's not just the pushes themselves; anyone who can create commits or blobs that eventually get merged into your repository, directly or indirectly, can potentially engage in a collision attack.

Sure, if you use git with a very closed development model, this doesn't necessarily affect you much. But it's (potentially) a big problem for collaborative open-source projects, because it requires trust in every single contributor. And the trust requirement can't necessarily be mitigated using ordinary means like code reviews.

pornel13 hours ago

Collision isn't spooky action at a distance. Even if they tricked the victim into accepting a file they have a collision for, they still can't do anything nefarious. The attack requires an opportunity to replace the colliding file with its evil twin, and that requires write access to the victim's repository or tricking the victim into re-fetching their files from an attacker-controlled repository.

Besides, the known collision attack generates files with blocks of binary garbage, which makes it difficult to trick someone into accepting. It won't look like source code, and if someone accepts binary blobs of executable code, you don't need collisions to pwn them.

staticassertion3 hours ago

> Besides, the known collision attack generates files with blocks of binary garbage, which makes it difficult to trick someone into accepting. It won't look like source code, and if someone accepts binary blobs of executable code, you don't need collisions to pwn them.

IDK, I could see this happening in multiple ways.

1. Images / media artifacts stored for display purposes

2. Cached files - 'zero install' config for yarn comes to mind, where every dependency has its file cached in git.

Plus binary files aren't displayed in git diffs so it seems somewhat easy to sneak in.

Otherwise, yeah, agree. Most people don't rely on Git's security model, they rely on Github's.

altfredd3 hours ago

> binary files aren't displayed in git diffs

They are (albeit not as prominently as they should be). And you can add your own diff engines to show full diffs for different binary formats.

bradhe15 hours ago

Does the usage of SHA-1 in Git actually have security implications, though? It's basically only used to generate addresses for refs and hunks and all that.

blakesterz15 hours ago

The article does address that:

"Given the threat that the SHA-1 hash poses, one might think that there would be a stronger incentive for somebody to support this work. But, as Bjarmason continued, that incentive is not actually all that strong. The project adopted the SHA-1DC variant of SHA-1 for the 2.13 release in 2017, which makes the project more robust against the known SHA-1 collision attacks, so there does not appear to be any sort of imminent threat of this type of attack against Git. Even if creating a collision were feasible for an attacker, Bjarmason pointed out, that is only the first step in the development of a successful attack. Finding a collision of any type is hard; finding one that is still working code, that has the functionality the attacker is after, and that looks reasonable to both humans and compilers is quite a bit harder — if it is possible at all."

avar11 hours ago

(I'm the "Bjarmason" quoted in the article)

To elaborate a bit: one thing that makes a viable attack against Git especially hard is that, aside from the hash it's using, Git has a behavior of never replacing an already-hashed object[1].

So let's say I have a tool that can take a given file & SHA-1 pair and produce a collision; the next step is still quite hard. I could in this scenario produce a file with an exploit whose hash matches that of Linus's kernel/pid.c or whatever.

But how do I get that object to propagate among forks of linux.git to distribute my exploited code?

If I e.g. push it to a fork of linux.git on a hosting provider that Linus uses, the remote "git-index-pack" process will hash my colliding object, but before storing it it will check whether such an object ID already exists in its object store; if it does, it'll drop it on the floor. You don't need to store data you've already got in a content-addressable filesystem.

Which is not to say that a hash collision is a non-issue, and Git should certainly be migrating from SHA-1. There's no disagreement about that in the Git development community.

But how the software you're using could be exploited in the case of a hash collision matters for how much you should panic.

Also, the scenario above presupposes a preimage attack, which is a much worse attack on a hash function than a collision attack. Currently no viable preimage attack on SHA-1 exists, only a collision attack.

Which means that before any of the above I'd have to have produced a viable version of say kernel/pid.c that Linus was willing to merge, knowing that my evil twin of that version is something I intended to exploit people with.

Then I'd need to patiently wait for that version to make it into a release, knowing that even a one-byte change to the file would foil my plans...

1. On the topic of running with scissors: I wrote a patch to disable that collision check for an ex-employer. It helped in that I/O-bound setup, and we were confident the lessened security was a non-issue for us in that particular setup. The patch never made it into git's mainline, and won't apply anymore, but the embedded docs elaborate on the topic: https://lore.kernel.org/git/20181113201910.11518-1-avarab@gm...

rurban4 hours ago

And with the Linux workflow every reviewer usually adds his Signed-off-by tag (the -s flag), which changes the hash, so all the collision effort is wasted.

mvkg10 hours ago

Regarding the collision attack replacement check, do you know if that is carried over into other git implementations (e.g. libgit2)?

avar10 hours ago

I had to look, but in the case of libgit2 yes they have. Like git they have a way to select SHA-1 backends, and the default is the SHA1DC library.

But, even supposing a libgit2 that didn't use SHA1DC I think most users would be protected in practice if the "git" they use used SHA1DC. Hosting providers, local editors etc. use libgit2 for a lot of things, but I think in most cases (certainly in the case of the popular hosting providers) it's some version of "/usr/bin/git" that's handling your push, and actually propagating your objects.

For stopping a colliding hash it's enough that any part of the chain of propagation is able to stop it.

Salgat15 hours ago

From what I've heard it's as simple as injecting the necessary garbage into a comment to fit the required hash for modified code.

bawolff13 hours ago

There is a big difference between having two files with the same garbage comment but different content that have the same hash, and creating a new file that has a garbage comment and the same hash as some other file not chosen by the attacker (preimage vs. collision).

SHA-1 has a collision attack. We are far away from a preimage attack.

layer812 hours ago

There is a middle course: You could get a pull request accepted with good content, but including a sensible comment whose exact wording you can choose, so later you can replace the contents of that commit with malicious code and a garbage comment. Such a collision is easier to create than a preimage attack (because you have some control over the preimage), but harder than if you could choose the preimage arbitrarily (which wouldn’t be accepted in the pull request). I admit that I have no idea how to quantify the difference in difficulty.

prepend14 hours ago

> as simple as injecting the necessary garbage into a comment to fit the required hash for modified code.

This seems true yet there are no demos or documented attacks using this method.

I think practically speaking it’s kind of a pain to do.

jandrese15 hours ago

The comment full of random garbage will probably look weird to a human, but by the time a person is looking at the code it will probably be too late.

But you could also hide it as a fake lookup table or inline XPM or something like that.

Zamicol15 hours ago

This is concerning from a signing perspective.

Example: `git commit -a -S -m 'signed commit'` signs the SHA-1 hash directly.

Even if the SHA-1 digest were rehashed with a secure hashing algorithm like SHA-256, that would hide the fact that the reference is to an insecure hashing algorithm. The project itself needs to be rehashed with a secure hashing algorithm for signing to be secure.

Dylan1680714 hours ago

It's more complicated than that. If the most recent signatures are entirely based on SHA-256, and you trust those signatures sufficiently, then they act as protection for all ancestor commits. In that case a SHA1-based signature on an older commit isn't a big deal.

Zamicol14 hours ago

>then they act as protection for all ancestor commits

How does that work? My understanding was that a git gpg signature only signs the project at that commit state.

It says nothing about past (or future) commits outside of a digest reference to past commits, which if that digest wasn't upgraded, would be considered insecure.

Said another way: Git does not rehash past commits, or the present commit, when gpg signing. A commit itself only includes the SHA-1 digest of the previous commit.

layer812 hours ago

You are correct. In the AdES signature world, the solution is to have a cryptographic (signed) timestamp using a newer hash algorithm that rehashes all previous commits, and to include that timestamp into a new commit. When verifying the hashes of old commits, the software would verify that those are covered by an appropriate timestamp that proves that they were created before the old hash algorithm was considered too weak.

This is very similar to the following: Instead of rehashing, i.e. replacing old hashes with new hashes, add the new hashes alongside the old ones, and sign the new hashes, together with the time mark, by a trusted authority. The old hashes and signatures then remain valid indefinitely as long as the new hashes and signatures are verified successfully.

Dylan1680714 hours ago

If you convert a repo to SHA-256, then surely it will recalculate all the hashes back to the start, right? Otherwise that's not a conversion. And then new signatures will use a hash that's SHA-256 all the way down.

The old signatures will still be SHA-1. But if you try to replace any part of a commit, the SHA-256 won't match. So the combination of "the commit is an ancestor of multiple securely signed commits in this repo" and "the SHA1 on the signature matches" is enough to know you have the right data in most use cases.

mmastrac15 hours ago

If a repo accepts third-party contributions, you can create a split brain where half the people see one set of contents and the other half see a different set, even though the hashes are identical.

I don't know if this would survive additional commits on top as I'm not familiar enough with git's internals.

progval13 hours ago

It will survive until someone touches the affected blob; then everyone converges to whichever version that person has.

saghm15 hours ago

I don't think it does; sure, someone could potentially craft a malicious commit that causes a SHA1 collision in your repo, but I think if you are merging commits from malicious authors, you've got way bigger problems than that.

corbet15 hours ago

...and if you're merging commits from a developer who, unknown to either of you, had their laptop compromised and their repo corrupted? Remember that the compromise of kernel.org happened via a developer's laptop, and it was only the security of the hash chains that preserved confidence in the repositories stored there.

As noted in the article, an SHA-1 collision attack does not appear practical now, but that is a situation that can change.

shakna14 hours ago

GitHub actually makes pull requests available as an unlisted part of the original repository, under refs/pull/$PR/head and refs/pull/$PR/merge, which allows a malicious author to get their objects into your repository's namespace without your involvement.

Not to say that this attack is in any way practical yet; just that some providers don't require the maintainer's active involvement for someone to attempt it.
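You can see those refs on any public repository; the URL and PR number here are placeholders:

    # List the pull-request refs GitHub advertises for a repository:
    git ls-remote https://github.com/example/project.git 'refs/pull/*'

    # Fetch one pull request's head without merging anything:
    git fetch https://github.com/example/project.git refs/pull/123/head
    git show --stat FETCH_HEAD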

nonameiguess15 hours ago

It's difficult to exploit, but possible.

I think the actual issue here is environment accreditation not allowing the use of SHA-1 at all, but that is still rare. It'll become a much larger issue if a future FIPS standard ever disallows SHA-1, because that will impact a ton of environments: it would mean Git won't even work on your servers any more.

wepple14 hours ago

I was curious about the “sha1dc” implementation that Git uses, which reportedly helps protect against collision attacks.

Here’s the paper: https://marc-stevens.nl/research/papers/C13-S.pdf
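One way to see it in action: because of sha1dc, modern Git refuses to even hash the known colliding PDFs from shattered.io (the exact error wording varies by version; roughly):

    $ curl -sO https://shattered.io/static/shattered-1.pdf
    $ git hash-object shattered-1.pdf
    fatal: SHA-1 appears to be part of a collision attack: shattered-1.pdf

It detects the cryptanalytic pattern in a single input, without needing to see the other half of the colliding pair.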

rnhmjoj3 hours ago

I hope this doesn't turn into an IPv4/IPv6 situation, but it looks like it's headed directly that way. Without a proper transition mechanism, companies won't implement SHA-256 because there's no demand for it, and users can't switch until companies support SHA-256.

ainar-g15 hours ago

The article mentions that “none of the Git hosting providers appear to be supporting SHA-256”, but what about self-hosted solutions? In particular, sr.ht. Seems to be nothing[1] in their issue tracker.

[1]: https://todo.sr.ht/~sircmpwn/git.sr.ht?search=sha-256

ainar-g14 hours ago

Hah! I guess there should be one about smarter search as well, heh. Thanks!

hrgiger9 hours ago

The performance impact is probably small but still noticeable for a large project, maybe even on modern hardware; but wouldn't a more portable approach be better? From a performance perspective (not a security one), one might prefer a cheap calculation; ext4, for example, already pre-calculates CRC32C checksums. I just found out Google has published HighwayHash as an alternative, which it recommends and claims is faster.

https://github.com/google/highwayhash

janwas4 hours ago

Author here. HighwayHash is indeed reasonably fast for a PRF/MAC/fingerprint, but our security claims are not strong enough for it to serve as a cryptographic hash. No collision or preimage attacks are known to us, but it would not be appropriate to use HighwayHash in this context.

londons_explore6 hours ago

Normally migrations of this type are done in two phases:

* Add support for new hash

* Migrate all data to new hash, dropping support for clients that don't support it.

It appears we're in the waiting phase between the two bullet points. I imagine that could be many years, because many people don't update their git clients often.
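For what it's worth, the first phase has already shipped: since Git 2.29 you can create an (experimental, still poorly interoperable) SHA-256 repository. Output below is abbreviated and illustrative:

    $ git init --object-format=sha256 demo
    $ git -C demo commit --allow-empty -m 'first commit'
    $ git -C demo rev-parse HEAD
    6e8e662e...    (64 hex digits instead of SHA-1's 40)
    $ git -C demo config extensions.objectFormat
    sha256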

userbinator5 hours ago

MD5 is still strong against preimage attacks, although you can generate collisions in seconds now. Thought experiment: what if Git had used MD5?

> The news has been proclaimed loudly and often: the SHA-1 hash algorithm is terminally broken and should not be used in any situation where security matters.

There are organizations where SHA-1 is banned across the board, regardless of its use.

I think it's about time people understood the different types of attacks and what they mean for various uses of hash functions instead of blindly cargo-culting as the security (aka paranoia) industry seems to want to do. Then again, the people in those same organizations who promote this sort of anti-thinking would probably suddenly not care if you just renamed all occurrences of "SHA1" in everything they see to something else, because they are such incompetent idiots anyway --- true stories from experience...

kerblang13 hours ago

The part of software engineering they don't teach in college is migration. Some of the most creative work you'll do is figuring out how to get from X to Y without bringing everything crashing down around you (or at least only a couple things crashing down at a time).

iraqmtpizza4 hours ago

Curious why SHA-256 and not SHA-512 or SHA-512/256, which have higher throughput on 64-bit CPUs. Interoperability? Hardware acceleration?

hmsimha9 hours ago

Regarding the difficulty of generating a collision with working code, wouldn't it be as "trivial" as generating a SHA-1 collision in the first place?

Just have the malicious code in a file and append a multiline comment block, then have your collision generator insert random junk into that comment block

ed25519FUUU9 hours ago

Generating the “junk” is the extraordinarily challenging part, though it’s been proved possible.

hmsimha8 hours ago

Sure, I had the word "trivial" in quotes for that reason. I meant that if you're able to generate a collision, the rest of this (from the article) doesn't follow:

> Even if creating a collision were feasible for an attacker, Bjarmason pointed out, that is only the first step in the development of a successful attack. Finding a collision of any type is hard; finding one that is still working code, that has the functionality the attacker is after, and that looks reasonable to both humans and compilers is quite a bit harder — if it is possible at all.

O__________O14 hours ago

Anyone aware of any exploits tied to the SHA-1 weakness in the wild?

(I have seen proofs of concept [1], but never actually heard of an exploit in the wild using it; for example, against digital certificate signatures, email PGP/GPG signatures, software vendor signatures, software updates, ISO checksums, backup systems, deduplication systems, Git, etc.)

[1] https://shattered.io/
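The shattered.io proof of concept is easy to check locally: two PDFs that render differently but share a SHA-1. The SHA-1 digest below is the one published with the PoC; the (differing) SHA-256 digests are elided:

    $ curl -sO https://shattered.io/static/shattered-1.pdf
    $ curl -sO https://shattered.io/static/shattered-2.pdf
    $ sha1sum shattered-*.pdf
    38762cf7f55934b34d179ae6a4c80cadccbb7f0a  shattered-1.pdf
    38762cf7f55934b34d179ae6a4c80cadccbb7f0a  shattered-2.pdf
    $ sha256sum shattered-*.pdf
    <64-hex digest A>  shattered-1.pdf
    <64-hex digest B>  shattered-2.pdf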

password432114 hours ago

Applications of that collision:

https://twitter.com/rauchg/status/834770508633694208 > a SHA-1 "Pinata" [...] claimed

https://news.ycombinator.com/item?id=13723892 > Make your own colliding PDFs

https://news.ycombinator.com/item?id=13917990 > Collision Detection

bawolff13 hours ago

Most security-critical systems have switched to SHA-256 at this point, and making a fresh collision still costs tens of thousands of dollars, so people aren't really doing it for kicks. (That said, once you have one collision you can reuse it for free as long as you keep the same prefix, so the proof of concept can be repurposed, within certain constraints.)
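The reuse works because the colliding blocks bring SHA-1's internal state back into agreement, so appending any identical suffix to both files keeps the digests equal. Using the shattered.io PDFs as the colliding pair:

    echo 'any common suffix' > suffix
    cat shattered-1.pdf suffix > a.pdf
    cat shattered-2.pdf suffix > b.pdf
    sha1sum a.pdf b.pdf    # still prints the same digest for both files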

The most notable in-the-wild case I have ever heard of was when WebKit accidentally broke their SVN repo by checking in a collision.

However, you can look at the history of MD5, which had a similar flaw that was exploited by the Flame malware.

O__________O13 hours ago

Thanks, agree the Flame’s use of a collision attack was both comparable and notable:

https://en.m.wikipedia.org/wiki/Flame_(malware)

pornel13 hours ago

The worst thing about the SHA-1 collision is the tedium of explaining the difference between a collision attack and a preimage attack.

chmaynard14 hours ago

Previous articles on this topic:

A new hash algorithm for Git https://lwn.net/Articles/811068/

Updating the Git protocol for SHA-256 https://lwn.net/Articles/823352/

WorldMaker15 hours ago

It almost feels like, by the time Git finally transitions to SHA-256, some bitcoin miner somewhere will have found a preimage weakness in SHA-256.

le-mark14 hours ago

Addition modulo 2^32 paired with XOR is a motherfucker, i.e. a very difficult problem to analyze. That's not even considering the rotation of intermediate results.

jagger2715 hours ago

Thankfully, existing Bitcoin ASICs don't pose much of a threat, because they're only good for computing sha256(sha256(block header)).

If a practical pre-image attack on SHA-256 comes around we have bigger problems than git.
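For the curious, that nested construction is just two passes of SHA-256, the inner one kept in binary form; here with a stand-in for the real 80-byte block header:

    # Double SHA-256, as used in Bitcoin's proof of work:
    head -c 80 /dev/zero \
        | openssl dgst -sha256 -binary \
        | openssl dgst -sha256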

WorldMaker14 hours ago

Obviously the concern is not the ASICs themselves but the ASIC designers. (Using "miners" here in the colloquial sense of the human collectives/corporations backing the machines, rather than the specific sense of the raw machines themselves.)

Yes, a practical preimage weakness in SHA-256 is a nightmare scenario with huge implications for the rest of internet security beyond just Git. It's why I sometimes can't sleep at night, knowing how much energy Bitcoin spends daily on a continuous, massively distributed partial-preimage attack on SHA-256.

marktangotango14 hours ago

> how much energy bitcoin spends daily on a continuous massively distributed partial preimage attack on SHA-256.

I would not be concerned about this. The way the ASICs operate is that they discard the results. Also, the hashes are random strings that don't compress very well, so storing trillions upon trillions of them (for later analysis) is not practical.

WorldMaker13 hours ago

Again, the point of the fear is not the specifics of current operations (the ASIC details, which y'all are talking about as if all of the miners were using the same hardware), but future operations, and the fact that there is an enormous industrial preimage-attack effort at all. One that we can see in real time, in global energy-consumption graphs.

Maybe you find cold comfort in the fact that, because we can watch it in real time, if someone discovers a weakness we will also watch its repercussions and the subsequent horrifying fall in real time, too; I certainly don't.

phlogisticfugu7 hours ago

At the risk of sounding like a heretic: if backwards compatibility is so hard, why not break it? Replace git with git2 in all places (command line, URLs, protocol, etc.), provide a git1to2 migration utility, and you're golden.

sdfhdhjdw312 hours ago

> Even if creating a collision were feasible for an attacker, Bjarmason pointed out, that is only the first step in the development of a successful attack. Finding a collision of any type is hard; finding one that is still working code, that has the functionality the attacker is after, and that looks reasonable to both humans and compilers is quite a bit harder — if it is possible at all.

Sounds like there's money in this.

heynowheynow13 hours ago

It might be wiser to keep SHA-1 and layer SHA-2, SHA-3, etc., plus GPG, on top as overlays, for compatibility and simplicity reasons.

slim13 hours ago

Why didn't Linux migrate their repo to SHA-256? Is it that difficult to migrate a repo?

encryptluks214 hours ago

It seems like something more modern, like b3sum would be better... no? What about b2sum?

kortex14 hours ago

I love the performance of BLAKE3, but my understanding is that it's still a bit of the new kid on the block. BLAKE2 is a SHA-3 finalist, so it should be perfectly sufficient; plus it has variable digest sizes, is reasonably fast, and has other nice features.

Either way, anything relying on hashes for data integrity should at least be flexible enough to allow multiple hash algorithms. But with Git, it's going to be hard enough as it is to change to SHA-256, and I don't know how parametric the implementation will be.

notfed6 hours ago

Actually, BLAKE2 was not a SHA-3 finalist, BLAKE was.

Also, BLAKE3 is so much faster that it's probably worth waiting for it to get vetted, cryptanalysis-wise.

MrStonedOne14 hours ago
pmarreck9 hours ago

Couldn't there be a transition period where both are supported?

kazinator14 hours ago

> Given the threat that the SHA-1 hash poses

I give -3 flying ducks about this, and don't want the Git storage format to be diddled with in any way. Git in 2122 should read and write a git repo made in 2010.

Git is not a public crypto system.

If you think a commit is important and needs to be signed, you need to sign the files and add the signature to the commit.
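A rough sketch of that approach with a detached OpenPGP signature (file names here are examples); verification then doesn't depend on Git's hash function at all:

    # Sign the content itself, not the commit's SHA-1 identifier:
    gpg --armor --detach-sign release.tar
    git add release.tar release.tar.asc
    git commit -m 'Add release.tar with detached signature'

    # Later, anyone can verify independently of Git's object hashes:
    gpg --verify release.tar.asc release.tar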

kzrdude2 hours ago

Migrating to a non-broken hash is the better bet for a repository format that is meant to stay strong for a century.