The xz attack shell script

bandrami • 3 months ago

As a longtime autotools-hater I would say this justifies my position, but any build system complex enough to be multiplatform is going to be inscrutable enough to let somebody slip something in like this. But it really is a problem that so much software is built with what's essentially become a giant cargo cult-style shell script whose pieces no one person understands all of.

echoangle • 3 months ago

I think the common usage of bash and other languages with a dense and complicated syntax is the root problem here. If build scripts were written in python, you would have a hard time obfuscating backdoors, because everyone would see that it is weird code. Bash code is just assumed to be normal when you can’t read it.

kimixa • 3 months ago

I think the issue is that build systems are often build on a completely different language - python with some weird build framework is likely as inscrutable as bash and autotools to someone who doesn't use python.

You can write pretty straightforward readable bash, just as I'm sure you can write pretty gnarly python. Especially if you're intentionally trying to obfuscate.

sanderjd • 3 months ago

Man, this sounds right, but I dunno ... I feel like even "simple" shell scripts tend toward more inscrutability than all but the most questionable python code I've ever written. Just comparing my dotfiles - every bit of which I understood when I wrote them - to the gnarliest sections of a big python app I work on ... I just really feel like at least some of this inscrutability issue can truly be laid at the feet of shell / bash as a language.

+4

dboreham • 3 months ago

+2

bayindirh • 3 months ago

+3

clnhlzmn • 3 months ago

echoangle • 3 months ago

The problem is that obfuscated bash is considered normal. If unreadable bash was not allowed to be committed, it would be much harder to hide stuff like this. But unreadable bash code is not suspicious, because it is kind of expected. That’s the main problem in my opinion.

+1

kimixa • 3 months ago

+1

freedomben • 3 months ago

+1

owlbite • 3 months ago

+2

salawat • 3 months ago

Etheryte • 3 months ago

This feels a bit like saying you can run as fast as Usain Bolt. Theoretically many things are possible, but I don't think I've ever seen a readable bash script beyond a trivial oneliner and I've seen a lot of bash in my life. Or to maybe explain from a different perspective, ask a room full of developers to write a bash if-else without looking at the docs and you'll probably come back with more different options than developers in the room. Ask the same for a language such as Python and you'll mostly get one thing.

giantrobot • 3 months ago

We'll write build scripts in highly obfuscated Perl. There, now no one is happy.

deanishe • 3 months ago

Python makes it relatively hard to write inscrutable code, and more importantly, it's very non-idiomatic, and there would be pushback.

WTFing at shell scripts is normal.

cqqxo4zV46cp • 3 months ago

Yep. Python is my day job. If anyone on my team put something up that was as unreadable as a typical bash script, without really good justification, I’d immediately hit “request changes”.

This isn’t just because I’m more familiar with Python. I don’t even think it’s the main reason. It’s just that Python is more likely to be able to be read in a ‘natural language’ sort of way. It’s better at doing what it says on the tin. It’s more able to be read via pure intuition by a programmer that’s not familiar with Python specifically.

In bash land? “What the hell is [[?”

And yes, I could come up with 20 ways off the top of my head that Python is a far-from-perfect language.

And I’m not even saying that Python is the right tool for the job here. Maybe we’re better off with one of the many more modern attempts at a shell language. The main thing is that we should acknowledge that bash has overstayed its welcome in many of the areas in which it’s still used.

+1

egorfine • 3 months ago

cqqxo4zV46cp • 3 months ago

Yeah, nah. I’m all for “every tool has its place”, “religious wars over languages are silly”, etc. Don’t get me wrong. But anyone that claims that all but the 1% most simple bash scripts are “straightforward” and “readable” is, to be blunt, showing their age.

The reality is that we as a society have made meaningful progress as far as designing readable languages goes. And we LITERALLY have many orders of magnitude more resources to make that happen. That’s something to feel good about. It’s unreadable to continue to mischaracterise some sysadmin greybeard’s’ familiarity with bash as an indication that it is in any way intuitive or readable.

Like, sheesh, now we all sound like C developers trying to justify an absurdly footgun-laden standard library just because we happen to know the right secret incantations, or think that we know them, anyway. But now this is definitely becoming a religious war…

sunshowers • 3 months ago

It's rather easy to monkeypatch Python into doing spooky things. For something like this you really want a language that can't be monkeypatched, like I think Starlark.

echoangle • 3 months ago

Where are you going to hide your monkey patching though? As long as your code is public, stuff like this is always going to stand out, because no one writes weird magic one liners in python.

+2

zmmmmm • 3 months ago

cyanydeez • 3 months ago

This code was essentially monkey patched from a test script. Python automatically runs any code in a imported module, so not hard to see a chain of module imports that progressively modifies and deploys a similar structure.

+2

sunshowers • 3 months ago

DrFalkyn • 3 months ago

eval() is a big security hole

unhammer • 3 months ago

> In the beginning of Unix, m4 was created. This has made many people very angry and has been widely regarded as a bad move.

autoconf creates a shell script by preprocessing with m4. So you need to know not just the intricacies of shell scripting, but also of m4, with its arcane rules for escaping: https://mbreen.com/m4.html#quotes

If autoconf used m4 to generate python scripts, they would also look like https://pyobfusc.com/#winners

heavyset_go • 3 months ago

The things that Bash is good at can wind up obfuscated in Python code due to the verbosity and complexity that it translates to in Python.

Bash is great at dealing with files, text, running other programs, job handling, parallelism and IPC.

Those things in combination can end up being more complex in Python, which creates more opportunities for obfuscation.

AtlasBarfed • 3 months ago

Just wait for AI to start pumping spaghetti code everywhere.

That's AI phase one

Hey, I phase 2 is even better disguised exploit code hiding behind acres of seemingly plausible AI generated code

matheusmoreira • 3 months ago

Any language can become a turing tarpit if you try hard enough.

eru • 3 months ago

Some languages make you try harder, some less so.

And not all languages are Turing complete in the first place. Not even all useful languages.

DarkNova6 • 3 months ago

Yet it is python code which has the highest amount of security vulnerabilities found in public repos. And the most often times that pre-compiled code is commited as well.

vmfunction • 3 months ago

Cliffy is also pretty good: https://cliffy.io

And type safe.

livrem • 3 months ago

I see what everyone is saying about autotools, and I never envied those that maintained the config scripts, but as an end-user I miss the days when installing almost any software was as simple as ./configure && make && make install.

humanrebar • 3 months ago

Running that command is simple, sure. The development workflows when that doesn't work is full stop horrible.

Also the basic workflows for the alternative build systems have maybe ten more characters to type. It's not bad.

humanrebar • 3 months ago

Autotools need to go away, but most of the underlying use cases that cause build system complexity comes down to dependency management and detection. The utter lack of best practices for those workflows is the root cause of complexity. There is no way to have a mybuild.toml build config on the face of those challenges.

dpkirchner • 3 months ago

Agreed, and I think this leads to the question: how much risk do we face because we want to support such a wide range of platforms that a complex system is required?

And how did we get to the point that a complex system is required to build a compression library -- something that doesn't really have to do much more than math and memory allocation?

cesarb • 3 months ago

> And how did we get to the point that a complex system is required to build a compression library -- something that doesn't really have to do much more than math and memory allocation?

The project in question contained a compression library, but was not limited to it; it also contained a set of command line tools (the "xz" command and several others).

And a modern compression library needs more than just "math and memory allocation"; it also needs threads (to make use of all the available cores), which is historically not portable. You need to detect whether threads are available, and which threading library should be used (pthreads is not always the available option). And not only that, a modern compression library often needs hand-optimized assembly code, with several variants depending on the exact CPU type, the correct one possibly being known only at runtime (and it was exactly in the code to select the correct variant for the current CPU that this backdoor was hidden).

And that's before considering that this is a library. Building a dynamic library is something which has a lot of variation between operating systems. You have Windows with its DLLs, MacOS with its frameworks, modern Linux with its ELF stuff, and historically it was even worse (like old a.out-based Linux with its manually pre-allocated base address for every dynamic library in the whole system).

So yeah, if you restrict yourself to modern Linux and perhaps a couple of the BSDs, and require the correct CPU type to be selected at compilation time, you could get away with just a couple of pages of simple Makefile declarations. But once you start porting to a more diverse set of systems, you'll see it get more and more complicated. Add cross-compilation to the mix (a non-trivial amount of autotools complexity is there to make cross-compilation work well) and it gets even more complicated.

JonChesterfield • 3 months ago

Right there with you. It's really tempting to blame this entire thing existing on m4 but that's the trauma talking.

salawat • 3 months ago

Let me reword that for you:

>No one has any business saying they know what something does until they've actually read it.

Beneath the placid surface of abstraction is the den of the devil.

kjellsbells • 3 months ago

I'm glad to see I'm not a minority of one here. Looking at autoconf and friends I'm reminded of the dreck that used to fill OpenSSL's code simply because they were trying to account for every eventuality. Autotools feels like the same thing. You end up with a ton of hard to read code (autogenerated bash, not exactly poetry) and that feels very inimical to safety.

frankohn • 3 months ago

I do not agree with your generalisation, the Meson build system is well thought, it has a clear declarative syntax that let you just express what you want in a direct way.

The designer of Meson explicitly avoided making the language turing complete so for example you cannot define functions. In my experience this was an excellent decision to limit people tendency to write complex stuff and put the pressure on the Meson developer to implement themselves all the useful functionalities.

In my experience the Meson configuration are as simple as they can be and accommodate only a modicum of complexity to describe OS specific options or advanced compiler option one may need.

Please note that some projects' Meson file have been made complex because of the goal to match whatever the configure script was doing. I had in mind the crazy habits of autotools to check if the system has any possibly used function because some system may not have it.

humanrebar • 3 months ago

Meson is way better than autotools, but if you're too level to depend on python, you're probably needing to customize your build scripts in the ways you mention. I don't see meson being a silver bullet there.

Also, meson's build dependencies (muon, python) are a lot for some of these projects.

pknopf • 3 months ago

If a projects build system can't depend on python, that let's leave them in the dust, ffs..

humanrebar • 3 months ago

What if the project is python or a dependency of python?

Brian_K_White • 3 months ago

This just results in 500x worse build.sh on top of meson/ninja/whatever.

jongjong • 3 months ago

The devil is in the detail and nothing obscures details like complexity.

Same reason why I don't like TypeScript in its current form. It's not worth the extra complexity it brings.

shp0ngle • 3 months ago

Every time someone explains to me autotools, the individual pieces sort of make sense, yet the result is always this inscrutable unreadable mess.

I don't know why.

rgmerk • 3 months ago

Yeah.

I haven’t used it for some time but autoconf always seemed like a horrible hack that was impossible to debug if it didn’t work properly.

That was bad enough back in the days where one was mostly concerned with accidents, but in more modern times things that are impossible to debug are such tempting targets for mischief.

cyanydeez • 3 months ago

Just the fact that you have multiple platforms suggests few people will fully understand the entire complex.

jonhohle • 3 months ago

The issue was that built artifacts weren’t immutable during the test phase and/or the test phase wasn’t sandboxed from the built artifacts.

The last build system I worked on separated build and test as separate stages. That meant you got a lot of useless artifacts pushed to a development namespace on the distribution server, but it also meant later stages only needed read access to that server.

edflsafoiewq • 3 months ago

The malicious .o is extracted from data in binary test files, but the backdoor is inserted entirely in the build phase. Running any test phase is not necessary.

rafaelmn • 3 months ago

So if you ran build without test files it would fail. I get that this is hindsight thinking - but maybe removing all non essential files when packaging/building libraries reduces the surface area.

mittermayr • 3 months ago

As a developer, this amazes me, and it just shows what — to me — feels like a top-tier attack method, is probably only entry-to-mid level complexity for the folks working at that stage. Some of the things I see posted here on HN are well above this level, so I'd assume for the right kind of money (or other incentives), this is only the beginning of what's possible. And, if you think of ALL the packages and ALL the many millions of libraries on GitHub, this vector is SO EFFECTIVE, there will be hundreds of cases like it uncovered in the next few months, I am certain of it.

I worry about all the con/pro-sumer hardware makers, from Philips Hue to Alexas, from the SumUps to the camera makers, from Netgear to TP-Link. All their products are packed with open-source libraries. And I am 100% certain that most of their dev teams do not spend time scanning these for obscure injection vectors.

jimkoen • 3 months ago

> And I am 100% certain that most of their dev teams do not spend time scanning these for obscure injection vectors.

This rationale baffles me, it feels that the dependency-hell circlejerk crowd is working on making OSS maintainers look even more bad with this scenario.

Any given commercial operation that claims any credibility for itself does supply chain analysis before adopting a dependency. This is, among other things why ordinarily you'd pay RedHat to maintain a stable Linux Release for you and why projects such as FreeBSD severely limit the software they ship in the default install.

If you are affected by this mess, I'm sorry to say, but it's your fault. If you are worried about developers of software you use for free, as in free beer, going rogue, either put in incentives for them to not do that (i.e. pay them) or fork the project and implement your own security measures on top of what's already there.

If you're worried that you could encounter exploits from dependencies in commercial software you use, you should negotiate a contract that includes compensation from damages from supply chain attacks.

If you're unwilling to do that, sorry mate, you're just unprofessional.

Inb4: Yes, I am really trying to say that you should check the supply chain of even your most basic dependencies such as SSH.

finaard • 3 months ago

Unfortunately that's "industry standard" nowadays. I lost count how often I had that discussion over the last two decades.

Just look at stuff like pip, npm or pretty much any "modern" package manager in use by developers - they're all pretty much designed to pull in a shitload of arbitrary unaudited and in some causes unauditable dependencies.

And nobody wants to listen. That's why I prefer to work in heavily regulated areas nowadays - that way I can shorten that discussion with "yeah, but regulatory requirements don't let us do that, sorry"

The absolute basic should be having a local archive of dependencies which at least received a basic sanity check, and updates or additions to that should review changes being added. CI gets access to that cache, but by itself does not have network access to make sure no random crap gets pulled into the build. You'd be surprised how many popular build systems can't do that at all, or only with a lot of workarounds.

eternityforest • 3 months ago

Seems like a lot of this could be solved with a whitelist of trusted dependencies.

There are already lots of groups maintaining internal lists and analysis of dependencies they trust. If there was a platform for reporting safety, rather than reporting vulnerability, one could say "Only allow packages that someone from a fortune 500 company publish an analysis of".

pknopf • 3 months ago

Package managers that use git are less prone to this kinda of attack (goland, rust).

+2

steveklabnik • 3 months ago

jimkoen • 3 months ago

> they're all pretty much designed to pull in a shitload of arbitrary unaudited and in some causes unauditable dependencies.

No they're not. The dependency circle jerk went so far to prompt NPM to display all subsequent dependencies on each libraries page.

The issue lies with the industry as a whole exploiting the work of OSS developers for their own gain and having the audacity to complain when these volunteers won't additionally do a security audit for free.

Dunedan • 3 months ago

> Any given commercial operation that claims any credibility for itself does supply chain analysis before adopting a dependency. This is, among other things why ordinarily you'd pay RedHat to maintain a stable Linux Release for you and why projects such as FreeBSD severely limit the software they ship in the default install.

That sounds like you assume RedHat would've caught the vulnerability in xz-utils, before shipping it in the next release of RHEL. I'm not so sure about that, as there is only so much you can do in terms of supply chain analysis and such a sophisticated vulnerability can be pretty hard to spot. Also mind that it only got discovered by accident after all.

tcmart14 • 3 months ago

I don't know if RedHat would have caught it. But the benefit of Red Hat is, they would be the one to fall on the sword. Your product is built on RHEL. This happens. You get to shift blame to RHEL and RedHat would eat it. The positive is, after the dust has settled Red Hat could choose to sort of adopt the compromised piece (invest engineering effort and take it over) or take some stewardship (keeping an eye on it and maybe give a hand to whoever is maintaining it after).

+2

cqqxo4zV46cp • 3 months ago

jimkoen • 3 months ago

I said "ordinarily". I meant "this is what you'd expect from them by paying them". Obviously this is a big fauxpas on their end and I'd reconsider using their services after this scenario. After all, security hardening upstream packages is among the reasons you're supposed to use them.

rlpb • 3 months ago

I think it's more than an individual or an organisation. The industry as a whole has favoured not caring about dependencies or where they come from in order to increase velocity. Concerns about supply chain (before we even had the term) were dismissed as "unlikely and we won't be blamed because everyone's doing it").

The organisations that did have some measures were complained about loudly, and they diluted their requirements over time in order to avoid stagnation. Example: Debian used to have a "key must be signed by three other Debian developers" requirement. They had to relax the requirement in part because, from the perspective of the wider ecosystem, nobody else had these onerous requirements and so they seemed unreasonable (although Covid was the final straw). If we'd had an ecosystem-wide culture of "know your upstream maintainer", then this kind of expectation as a condition of maintainership would be normal, we'd have much better tooling to do it, and such requirements would not have seemed onerous to anyone. It's like there's an Overton Window of what is acceptable, that has perhaps shifted too far in favour of velocity and at the cost of security, and this kind of incident is needed to get people to sit up and take notice.

This incident provides the ecosystem as a whole the opportunity to consider slowing down in order to improve supply chain security. There's no silver bullet, but there are a variety of measures available to mitigate, such as trying to know the real world identity of maintainers, more cautious code review, banning practices such as binary blobs in source trees, better tooling to roll back, etc. All of these require slowing down velocity in some way. Change can only realistically happen by shifting the Overton Window across the ecosystem as a whole, with everyone accepting the hit to velocity. I think that an individual or organisation within the ecosystem isn't really in a position to stray too far from this Overton Window without becoming ineffective, because of the way that ecosystem elements all depend on each other.

> If you're unwilling to do that, sorry mate, you're just unprofessional.

There are no professionals doing what you suggest today, because if they did, they'd be out-competed on price immediately. It's too expensive and customers do not care.

begueradj • 3 months ago

>, is probably only entry-to-mid level complexity for the folks working at that stage.

On the contrary: the developers and maintainers who are more informed than us described it as highly sophisticated attack. I also read early InfoSec (information security) articles which were able to only describe a part of the code, not the whole strategy behind the attack because, again, the attack and code are sophisticated. You can also read early InfoSec articles which describe the attack in different ways simply because it was not that simple to understand. Then I read articles saying something like this: "Finally it seems it's an RCE attack".

Of course, now that even a scanner is developed to detect that vulnerability on your server, we can all claim: "Oh that was a so simple and stupid attack, how come no one detected it much earlier ?!"

bipson • 3 months ago

That's why I don't see e.g. TP-Link basing their router firmware on OpenWRT as a win, and why I want the "vanilla" upstream project (or something that tracks upstream by design) running on my devices.

Applies to all of my devices btw. I don't like Android having to use an old kernel, I didn't like MacOS running some ancient Darwin/BSD thing, etc. The required effort for backporting worries me.

Don't get me wrong, I'm not saying OSS has no vulns.

doubled112 • 3 months ago

More orgs directly contributing to upstream is best in my eyes too. I'm not against forking, but there are usually real benefits to running the latest version of the most popular one.

One opposite of this I've seen is Mikrotik's RouterOS. I'm under the understanding that they usually reimplement software and protocols rather than depending on an upstream.

I'd imagine that is what leads to issues such as missing UDP support in OpenVPN for 10 years, and I'm not sure it gives me the warmest fuzzy feeling about security. Pros and cons, I suppose. More secure because it's not the same target as everybody else. Less secure because there are fewer users and eyes looking at this thing.

philipwhiuk • 3 months ago

> there are usually real benefits to running the latest version of the most popular one.

Using the absolute latest version is acting as a beta tester for everyone else and this is not the first case where it means you get absolutely hosed.

fieldcny • 3 months ago

Any moderately well run shop will have a mechanism to get updates when a dependency of theirs has a security issues, depending on the line of business it may actually be required by a regulator or certification body (eg PCI etc)

We should probably be more afraid of the backdoors you can’t see in proprietary that would almost never be found.

0xdeadbeefbabe • 3 months ago

> there will be hundreds of cases like it uncovered in the next few months, I am certain of it.

"Given the activity over several weeks, the committer is either directly involved or there was some quite severe compromise of their system. Unfortunately the latter looks like the less likely explanation, given they communicated on various lists about the "fixes" mentioned above." (https://www.openwall.com/lists/oss-security/2024/03/29/4)

So, it's like story of those security researchers injecting bugs into the kernel https://thehackernews.com/2021/04/minnesota-university-apolo...

I'm saying this isn't that easy to pull off, and it's unlikely we'll see hundreds of similar cases.

diego_sandoval • 3 months ago

> this isn't that easy to pull off,

How much does it cost in money? 400k? 600k? 1M? That's peanuts for any three letter agency from the US, UK, China, etc.

mzs • 3 months ago

I haven't seen Thomas Roccia's infographic mentioned here yet: https://twitter.com/fr0gger_/status/1774342248437813525

ihsoy • 3 months ago

That seems to be largely grasped at straws and connecting dots without reason.

For example oss-fuzz was building xz by cloning the github repo directly. There was never a chance for oss-fuzz to discover the backdoor, because only the tarball had it, not the repo itself. So that oss-fuzz PR might genuinely just be a genuine thing unrelated to the backdoor.

sp332 • 3 months ago

The ifunc part looked more legitimate, it was used to switch between optimized implementations of CRC. So that part was in the git repo. https://github.com/google/oss-fuzz/pull/10667

_trampeltier • 3 months ago

Jia Tan asked distros to update quickly just before it became public. How possible is it, there is another account / person who learned earlyer from people around Andreas Freund, the backdoor would become public. How possible is it, there is still another insider around?

strunz • 3 months ago

That's probably because of this (as mentioned in the timeline):

>2024-02-29: On GitHub, @teknoraver https://github.com/systemd/systemd/pull/31550 to stop linking liblzma into libsystemd. It appears that this would have defeated the attack. https://doublepulsar.com/inside-the-failed-attempt-to-backdo... that knowing this was on the way may have accelerated the attacker’s schedule. It is unclear whether any earlier discussions exist that would have tipped them off.

supriyo-biswas • 3 months ago

rwmj did mention about an unintentional embargo break, so I wonder whether this is GitHub issue is actually it.

+1

teknoraver • 3 months ago

ptx • 3 months ago

There were also changes to systemd happening around that time which would have prevented the backdoor from working. See the timeline article by the same author linked in this one.

OneLeggedCat • 3 months ago

Right. Way too much coincidence. Jia Tan found out that it was about to become public and threw a Hail Mary. How did he find out?

rsc • 3 months ago

I think the RedHat Valgrind report on 2024-03-04 made the Jia Tan team panic, since the one public rwmj stack trace pointed the finger directly at the backdoor. All it would take is someone looking closely at that failure to expose the whole operation. They fixed it on 2024-03-09, but then two weeks later distros still had not updated to the new version, and every day is another day that someone might hit the Valgrind failure and dig. I think that's why the sockpuppets came back on 2024-03-25 begging Debian to update. And then on the Debian thread there was pushback because they weren't the maintainer (except probably they were), so once Debian was updated, Jia Tan had to be the account that asked Ubuntu to update.

+1

peteradio • 3 months ago

freedomben • 3 months ago

If the stakes weren't so high, this would be a damn fun game of murder mystery.

moopitydoop • 3 months ago

This just gives a partial high-level look at how the exploit gets planted in liblzma, it doesn't cover how the exploit works or its contents at all.

xomodo • 3 months ago

Thx. Timeline shows attack begun by adding ignore entry in .gitignore file. That is hard to detect nowadays.

hakdbha • 3 months ago

[dead]

klabb3 • 3 months ago

As a naive bystander, the thing that stands out most to me:

> Many of the files have been created by hand with a hex editor, thus there is no better "source code" than the files themselves.” This is a fact of life for parsing libraries like liblzma. The attacker looked like they were just adding a few new test files.

Yes, these files are scary, but I can see the reason. But at least can we keep them away from the build?

> Usually, the configure script and its support libraries are only added to the tarball distributions, not the source repository. The xz distribution works this way too.

Obligatory auto tools wtf aside, why on earth should the tarballs contain the test files at all? I mean, a malicious test could infect a developer machine, but if the tars are for building final artifacts for everyone else, then shouldn’t the policy be to only include what’s necessary? Especially if the test files are unauditable blobs.

finaard • 3 months ago

It's pretty common to run tests on CI after building to verify your particular setup doesn't break stuff.

Last time we were doing that we were preferring git upstream, though, and generated autocrap as needed - I never liked the idea of release tarballs containing stuff not in git.

klabb3 • 3 months ago

This strengthens the argument I’m making, no? You bring in the source repo when doing development and debugging. In either case - tarball or not - it doesn’t seem that difficult to nuke the test dir before building a release for distribution. Again, only really necessary if you have opaque blobs where fishy things can hide.

lathiat • 3 months ago

The distributions often run the same tests after it’s built to make sure it’s working correctly as built in the distribution environment. This can and does find real problems.

finaard • 3 months ago

We'd usually put the tests in a subpackage so it can be installed and run on devices later on.

wiml • 3 months ago

Because, despite containing some amount of generated autoconf code, they are still source tarballs. You want to be able to run the tests after compiling the code on the destination machine.

nhellman • 3 months ago

Besides for verifying that the compiled program works on the target, tests are also required to compile with PGO because you need to have a runtime example to optimize for.

sanxiyn • 3 months ago

> The first difference is that the script makes sure (very sure!) to exit if not being run on Linux.

The repeated check is indeed mysterious. My only hypothesis is that the attacker may have thought that it should look plausible as a test input to a compression library, hence repetition.

mercurialuser • 3 months ago

It can be to make space for script changes: you may overwrite the first bytes of the script.

Or just add some laziness.

Hakkin • 3 months ago

I also thought it was odd. There's also different random bytes (not random text, actual random bytes) prefixed to the start the scripts, but the bytes are prefixed by a hash symbol, which comments them out, so they don't affect the script. It seems intentional, but I can't think of why they would be there. I thought maybe xz would skip compression if the input was short/not complex enough or something, so they were added to pad the size, but removing them and re-compressing with xz seems to properly compress it, none of the original plaintext is in the compressed archive bytes.

One thing I noticed while trying to reproduce the exact bytes included in the .xz file committed to git is that the script's xz stream doesn't seem to be compressed by any of the default xz presets, I was only able to reproduce it by using `xz --lzma2=dict=65536 -c stream_2`. All the default numbered presents chose a different dictionary size. Another odd seemingly intentional choice, but again I don't understand the reasoning.

Hakkin • 3 months ago

Ah, I think I understand the random bytes at the start of the script now. They're prepended to make the partial output of the "corrupted" (pre-tr'd) test stream look like random data. Without those random bytes, you will see part of the start of the script if you observe the partially decompressed output before xz throws an error. They really thought quite carefully about hiding this well.

Still not sure about the repeated lines, though now I'm convinced there must be some reason for it.

fsniper • 3 months ago

Can it be to enlarge/or obfuscate parts of the compressed test file? Perhaps without the repetitions the compressed file has some strange binary triggering some security or antivirus software?

TheBlight • 3 months ago

Is it though? The attacker probably has Linux x86 target(s) in mind and IFUNC support isn't guaranteed to work with other platforms.

sanxiyn • 3 months ago

Checking for Linux makes sense. Doing the exact same check for Linux five times in a row is mysterious.

tamimio • 3 months ago

It would be an interesting plot twist if the whole thing was an AI hallucination.

ddalex • 3 months ago

How about an AI trying to make itself some spare CPU cycles available ?

neffy • 3 months ago

...or a stalking horse by somebody in Microsoft's marketing division.

dboreham • 3 months ago

Perhaps expected a very fast machine that might blast straight through the first few checks. Like how you needed two STOP statements in a Cray Fortran program in case it blew through the first one at 80MIPS.

ycombinatrix • 3 months ago

why doesn't IFUNC work on Linux ARM64?

bodyfour • 3 months ago

IFUNC is supported on several architectures, including ARM64.

The malicious code that the xz backdoor inserts into the library is a compiled x86_64 object file so it only is targeting one platform.

ycombinatrix • 3 months ago

yea i know the backdoor is AMD64 only. the parent comment said IFUNC isn't supported on ARM64 which is incorrect.

m3kw9 • 3 months ago

Maybe if run on non Linux it could be found out either by crashing or leaving some trace because of OS differences

codezero • 3 months ago

[flagged]

saagarjha • 3 months ago

Malware has no reason to edit what uname returns.

codezero • 3 months ago

Why not? Most malware utilizes uname to either whitelist or blacklist infection on a system, so an APT may benefit from lying.

+1

saagarjha • 3 months ago

ycombinatrix • 3 months ago

anyone know any examples of malware which have affected uname results?

codezero • 3 months ago

ChatGPT could not provide any, fwiw, but it's an enticing idea.

+1

Intralexical • 3 months ago

Solvency • 3 months ago

It's kind of tragically amusing how heinously complex and unnecessarily inscrutable modern technology is, and it's only getting worse. I think developers sadistically enjoy it.

Trufa • 3 months ago

This is probably the worst take ever.

It's pretty amazing how the tools keep up with the increasing complexity of the products we make.

And to be honest, in most cases just make it simpler, I think people just don't like to learn new stuff.

xyst • 3 months ago

Whoever hired these people to infiltrate this project spent a shit ton of hours building it in such a way to avoid detection this long. Fortunately, it was so complicated that they couldn't account for all of the factors.

This is why open source will always outperform closed source in terms of security. Sure it pointed a massive flaw in the supply chain. Sure it highlights how under appreciated the foundational elements of FOSS, leaving maintainers subject to manipulation.

But the same attack within a private company? Shit, probably wouldn't even need advanced obfuscation. With a large enough PR and looming deadlines, could easily sneak something like this with a minimal amount of effort into production systems. By the time company even realizes what happens, you are already flying off to a non-extradition country and selling the exfiltrated data on Tor (or dark web).

bsza • 3 months ago

I’m gonna cry survivorship bias here. How do we know how many similar attempts succeeded? How many of the ones discovered have been written off as honest mistakes? How can we know for sure that e.g. Heartbleed wasn’t put there by someone on purpose (and that that someone isn’t filthy rich now)?

When you get hired to a private company, they know who you are. That’s an immediate deterrent against trying anything funny. On Github, no one knows who you are. It might be harder to backdoor a project without getting noticed, but there is no risk to getting noticed. You can try as many times as you like. Jia Tan is still at large, and didn’t even have to plan their whole life around living in a non-extraditing country (if they aren’t in one already).

xyst • 3 months ago

https://en.wikipedia.org/wiki/Industrial_espionage (aka corporate espionage)

Happens all the time. Maybe it's a state actor. Maybe it's a disgruntled employee. It's just not in the same lens as you expect (software supply chain attack).

Apple has trouble keeping the lid on top secret projects. Leaks about designs happen all the time prior to scheduled debut at WWDC.

MS has had trouble in the past as well when it came to developing the Xbox (One?).

"Owners of China-Based Company Charged With Conspiracy to Send Trade Secrets Belonging to Leading U.S.-Based Electric Vehicle Company" - https://www.justice.gov/usao-edny/pr/owners-china-based-comp...

"Ex-Google engineer charged with stealing AI trade secrets while working with Chinese companies" - https://www.latimes.com/world-nation/story/2024-03-07/ex-goo...

"The US Hits Huawei With New Charges of Trade Secret Theft" - https://www.wired.com/story/us-hits-huawei-new-charges-trade...

"U.S. charges China-controlled company in trade secrets theft" - https://www.pbs.org/newshour/economy/u-s-charges-china-contr...

"Ex-Google and Uber engineer Anthony Levandowski charged with trade secret theft" - https://www.theverge.com/2019/8/27/20835368/google-uber-engi...

In the case of Levandowski, the dude didn't even bother with covering his tracks. Straight up just downloads trade secrets from source control and transfers them to personal computer - https://www.justice.gov/usao-ndca/press-release/file/1197991...

In this small sample of cases, were the exfiltration attempts as elaborate as the "xz attack"? Probably not, but all of these people were vetted by internal procedures and that did nothing to stop them from acting maliciously.

Forget back dooring the project when getting through the front door is so much easier! People are very relaxed in their walled off garden and cubicle :)

resolutebat • 3 months ago

Stealing trade secrets (read: copying source code) is a whole different ballgame from trying to inject backdoors into ubiquitous pieces of software.

bsza • 3 months ago

This kind of demonstrates my point. Every single one of these headlines indicates the bad actor has been "charged" (with serious consequences in the case of Huawei).

Has Jia Tan been "charged" with anything?

+1

nindalf • 3 months ago

yongjik • 3 months ago

I don't buy it. Most private companies would require a face-to-face meeting when you start. Even if you're fully remote, the expectation would be that at some point you would meet your coworkers at meetspace, most likely before you get to commit anything substantial. The ones that are worth penetrating will almost certainly require a background check.

And then, once you're in, you cannot just commit to your target project willy-nilly, as your manager and your manager's manager will have other priorities. A for-profit company's frequently dysfunctional management would actually work as deterrent here: you don't just need to justify your code, you will have to justify why you were working on it in the first place.

xyst • 3 months ago

face to face, background checks. they are all superficial.

A smooth talker can get you to relax your guard.

Identity can be faked, especially if you have a nation state backing you.

bigiain • 3 months ago

How much would you bet against the NSA having a team full of leetcode and interview experts, who's job is to apply at tech companies and perform excellently through the hiring process, so that the offensive team at NSA can infiltrate and work remotely without ever needing to meet there new "coworkers"?

I suspect a "professional job seeker" with the resources of the NSA behind them and who lands 1st and subsequent interviews dozens of times a year - would be _way_ better at landing interviews and jumping through stupid recruiting hoops that even the best senior or "10x engineers", who probably only interview a dozen or two times in their entire career.

breadwinner • 3 months ago

Today no one knows what Jia Tan looks like. That means Jia Tan come back as Gary Smith and work on the next exploit. If we knew what he looked like, we could have at least prevented him from coming back.

+1

not2b • 3 months ago

ThePowerOfFuet • 3 months ago

https://rigor-mortis.nmrc.org/@simplenomad/11218486968142017...

skrtskrt • 3 months ago

off topic, but how many actual non-extradition countries are there these days (for the US)?

Even countries we have strained relationships with will extradite as part of a negotiation when it's convenient for them politically.

Russia probably wouldn't have even kept Snowden if it wasn't state secrets he revealed. If it was just some random data breach they would have prisoner-swapped him for an oligarch caught money laundering elsewhere.

moopitydoop • 3 months ago

Our adversaries aren't going to extradite their citizens to the west. And obviously if it's a state-level actor they aren't going to extradite their own actors.

As someone old enough to remember the tail end of the early hacker eras (e.g. Mitnick), I don't think anyone SHOULD be extradited over this, in particular if they're not being charged with actually using the exploit. Prosecute them where they live. Should they be prosecuted 193 times over in every state on Earth? What's the nexus? Github? Every server that installed the compromised xz utils?

But you are right they will deport (not extradite) foreigners who are inconvenient to them or when it is politically expedient to do so, if the foreigners are a nuisance, or as part of a political negotiation or prisoner exchange.

The whole "extradition treaties" meme is a misconception. You will only get extradited if you flee to a country where you are a citizen (even dual citizen), or the ability to assert citizenship/nationality there. A fugitive fleeing to a country without an extradition treaty is subject to deportation. Every country on earth reserves the right to deny entry to or deport foreign fugitives. They might choose not to if someone is found to be a refugee, subject to the death penalty in a non-death-penalty state, etc.

EasyMark • 3 months ago

Russia and China would never extradite for this, they would hire the people involved if they aren't already in their employ, and I wouldn't blame them. I'm not even sure if they could be charged with more than misdemeanor charges anyway.

cesarb • 3 months ago

> off topic, but how many actual non-extradition countries are there these days (for the US)?

Several countries do not extradite their own citizens. For citizens of these countries, going back to their own home country would be enough.

EasyMark • 3 months ago

Could be interesting as a class project to investigate other small relatively innocuous but near ubiquitous projects that could be treated the same way and investigate whether something similar could be done or has been done already. Just making a list of them would be useful if nothing else.

kelseydh • 3 months ago

The lesson I take away from this incident is that we probably shouldn't be allowing anonymity for core contributers in critical open source projects. This attack worked and the attacker will likely get away with it free of consequence, because they were anonymous.

ulrikrasmussen • 3 months ago

No thanks.

That's not going to help, and will be fairly easy to circumvent for nation state actors or similar advanced persistent threats who will not have a problem adding an extra step of identity theft to their attack chain, or simply use an agent who can be protected if the backdoor is ever discovered.

On the other hand, the technical hoops required for something like that will likely cause a lot of damage to the whole open source community.

The solution here is learn from this attack and change practices to make a similar one more difficult to pull off:

1. Never allow files in release tar-balls which are not present in the repo.

2. As a consequence, all generated code should be checked in. Build scripts should re-generate all derived code and fail if the checked in code deviates from the generated.

3. No inscrutable data should be accessible by the release build process. This means that tests relying on binary data should be built completely separately from the release binaries.

lenerdenator • 3 months ago

It's easy to steal or craft an identity. Having a person adopt that identity and use it over multiple in-person meetings around the world over an extended period of time is not.

Part of the appeal of cyber operations for intelligence agencies is that there's basically no tradecraft involved. You park some hacker in front of a laptop within your territory (which also happens to have a constitution forbidding the extradition of citizens) and the hacker strikes at targets through obfuscated digital vectors of attack. They never go in public, they never get a photo taken of them, they never get trailed by counterintelligence.

If you start telling people who want to be FLOSS repo maintainers that they'll need to be at a few in-person meetings over a span of two or three years if they want the keys to the project, that hacker has a much harder job, because in-person social engineering is hard. It has to be the same person showing up, time after time, and that person has to be able to talk the language of someone intimately familiar with the technology while being someone they're not.

It's not a cure-all but for supply chain attacks, it makes the operation a lot riskier, resource-intense, and time-consuming.

jpc0 • 3 months ago

Many OSS contributors likely don't have "fly to distant country for mandatory meeting" money.

You are excluding a ton of contributors based on geography and income.

It's not common that I find this line actually decent but check your privilege with this kind of comment.

This is really a small step away from segregation.

xcrunner529 • 3 months ago

There are multiple ways to vet identity. Even just knowing location and such. This person used a VPN. A big red flag.

peteradio • 3 months ago

Stop trying to support such a variety of images too? Maybe?

rsc • 3 months ago

Two problems with this:

1. Many important contributors, especially in security, prefer to be pseudonymous for good reasons. Insisting on identity drives them away.

2. If a spy agency was behind this, as many people have speculated, those can all manufacture "real" identities anyway.

So you'd be excluding helpful people and not excluding the attackers.

hk__2 • 3 months ago

> The lesson I take away from this incident is that we probably shouldn't be allowing anonymity for core contributers in critical open source projects. This attack worked and the attacker will likely get away with it free of consequence, because they were anonymous.

This would be impossible to enforce, and might not be a good idea because it enables other ranges of attacks: if you know the identities of the maintainers of critical open source projects, it’s easier to put pressure on them.

akdev1l • 3 months ago

If this was a state-actor (which it definitely looks like it) then what validation are you going to do? They can probably manufacture legitimate papers for anything.

Driver’s license, SSN, national ID, passport, etc. If the government is in on it then there’s no limits.

The only way would be to require physical presence in a trusted location. (Hopefully in a jurisdiction that doesn’t belong to the attacker…)

xcrunner529 • 3 months ago

Many more chances to mess up.

Jonnax • 3 months ago

Who designates it as critical?

If someone makes a library and other people start using it, are they forced to reveal their identity?

Do the maintainers get paid?

tgv • 3 months ago

It might prevent attacks under different aliases, but a determined organization will be able to create a verified account, if only because nobody, certainly noy github, has the will and means to verify each account themselves.

damsalor • 3 months ago

The attack almost worked because of too few eyes

PHGamer • 3 months ago

if anyone has worked in any development. (closed or open) you know half the time developers are lazy and just approve PRs. Linus Torvalds is like the glimming exception where he will call out shit all day long.

dijit • 3 months ago

Second this.

And in the event someone is pedantic enough to actually care: that person will be considered a pariah that all development stifles due to.

Tensions with the team for nitpicking etc;

FD: I have a situation like this now, I am not the one being picky- one of the developers I hired is. I had to move him out of the team because unfortunately his nitpicky behaviour was not well regarded. (he also comes from eastern europe and has a very matter-of-fact way of giving feedback too which does not aid things).

Shocka1 • 3 months ago

This cracks me up. I had the privilege of working on a team where about 5 or 7 of us out of 10 were extremely picky about code and documentation, all different nationalities. We would sometimes argue about a word in a doc for hours and didn't get a whole lot done. Feelings were hurt a lot, but what we did get accomplished was very high quality. I've also worked on teams where we were all cowboys and would sling code just about as fast as we could type it.

Both can be fun, but as you probably already know there is a balance.

zadwang • 3 months ago

The set of unix utilities have been tested a long time. I just wish the kernel and key utilities keeps fixed and not changed. Unless absolutely necessary. Don’t fix it if it ain’t broken. The software empire seems out of control.

JonChesterfield • 3 months ago

It is all broken. The quick hack to make things kind of work today, compounded across fifty years and a million developers.

There are occasional shining lights of stuff done right but I think it's fair to say they're comprehensively outcompeted.

jcarrano • 3 months ago

What's the advantage of IFUNCS, over putting a function in the library that selects the implementation, either via function pointers or a switch/if? In particular given that they seem to be quite fragile and exploitable too.

I don't have much experience in low-level optimization, but would a modern CPU not be able to predict the path taken by a brach that tests the CPU features.

jcalvinowens • 3 months ago

> either via function pointers or a switch/if?

> but would a modern CPU not be able to predict the path taken by a brach that tests the CPU features.

That's true, but the CPU has finite branch predictor state, and now you've wasted some of it. Indirect calls hurt too, especially in you need retpolines.

This is a great read: https://www.agner.org/optimize/microarchitecture.pdf

The Linux kernel has interfaces for doing the same thing, more explicitly than ifunc:

https://docs.kernel.org/next/staging/static-keys.html

https://lwn.net/Articles/815908/

account42 • 3 months ago

The advantage is that it avoids an extra function pointer indirection when the function you are trying to swap out is already an exported library function. For purely internal functions like in this case the only advantage is that it allows this exploit.

xurukefi • 3 months ago

Since I'm a bit late to the party and feeling somewhat overwhelmed by the multitude of articles floating around, I wonder: Has there been any detailed analysis of the actual injected object file? Thus far, I haven't come across any, which strikes me as rather peculiar given that it's been a few days.

tithe • 3 months ago

Your best bet may be in the chat (from https://www.openwall.com/lists/oss-security/2024/03/30/26 ):

Matrix: #xz-backdoor-reversing:nil.im

IRC: #xz-backdoor-reversing on irc.oftc.net

Discord: https://discord.gg/XqTshWbR5F

lenerdenator • 3 months ago

I agree, I haven't seen anything about decompiling the object file.

If I had a project to develop a backdoor to keep persistent access to whatever machine I wanted, it would make sense that I would have a plug-in executable that I would use for multiple backdoors. That's just decent engineering.

Woodi • 3 months ago

Few notes:

- "auto" tools are installer tools and, after all, do good job. Maybe we can download any project that use it, run it once, parse results and put it in /etc ? - becouse it's already there. Then next step: build just-detection tools. Side note: just straight 'make all' will detect it too :) And it show additional next problem that auto's already solved - setting up all that inc/lib paths...

- m4 should be replaced and not by monstrosity like cmake

- GNU's came after MIT's and BSD's, evolved from them. But corporations adapted and assimilated. We need next evolution step: something that forces corporations to spend money on what they use, fund devs or participate in development. OSlicensecertification.orgs should help with that instead of helping corps to be predators becouse some outdated bullshit talks

- yea, auto* tools are way overcomplicated

- finally do that TTY 2.0 !

djmips • 3 months ago

So people are talking about the review process but what if the maintainer and the reviewers are all complicit?

k3vinw • 3 months ago

Ha. This backdoor belongs in the same museum as automake!

say_it_as_it_is • 3 months ago

Imagine paying for a security scanning service such as Snyk and finding that it never scanned source code for injection attacks. How many millions of dollars went down the drain?

LunicLynx • 3 months ago

Imagine this inside GitHub copilot, just because it has seen it enough times.

sureglymop • 3 months ago

This is why, in my opinion, if AI will bring exponentially more efficiency, it will also bring an exponential increase in security issues.

The most common vulnerabilities are due to how input is handled (or not). XSS vulns, SQL injections, other injections, etc.

Now, the AI agent itself also produces untrusted input. But it does so, in a way, as an extension of the user using it who is actually a trusted entity. Because of this, it is solely up to the user to validate everything coming out of the AI but most users will not do that thoroughly.

What you mention in your comment is a good example of that. If the AI reproduces malicious code e.g. inspired from this campaign, it is essentially an injection attack where the user misses it and didn't properly validate the untrusted input.

benlivengood • 3 months ago

The only upside of finding this attack (aside from preventing it from being rolled out more widely) is that it gives a public example of a complex APT supply-chain attack. Rest assured there are more out there, and the patterns used will probably be repeated elsewhere and easier to spot.

Obfuscated autoconf changes, multi-stage deployment, binary blobs (hello lib/firmware and friends, various other encoders/decoders, boot logos, etc), repo ownership changes, new-ish prolific committers in understaffed dependency libraries, magic values introduced without explanation.

MarkSweep • 3 months ago

Add to that list suppression of warnings from valgrind and address sanitizer without any justification. And no tracking issue to follow up on fixing the problem so the suppression can be removed.

Committing binary files to source control rather than including build commands to generate the files is a bit of a red flag.

saagarjha • 3 months ago

They're test cases for a compression library. Seems pretty reasonable.

hyperhopper • 3 months ago

Not at all. This would not pass a good code review. The test was for good stream, bad stream, good stream. The two good streams were only a few bites, why was the bad stream so large?

A good reviewer should have, even for a binary test case, asked the submitter to simplify it to the smallest or most basic binary required for the functionality.

Dunedan • 3 months ago

Adding a script to regenerate good and bad binary files might have worked for this use case pretty well too.

+1

saagarjha • 3 months ago

MuffinFlavored • 3 months ago

> The changes to build-to-host.m4 weren't in the source repo, so there was no commit. > > The attacker had permissions to create GitHub releases, so they simply added it to the GitHub release tarball.

What are some simple tweaks the "Debians" of the world can do to mitigate this kind of stuff?

Not trust "hand-curated-by-possibly-malicious-maintainers" GitHub release tarballs, only trust git commits?

Whitelist functions that are allowed to do IFUNC/ELF hooking to core OpenSSH functions?

ilc • 3 months ago

Read IOCCC entries.

Now realize, those are people having FUN. What is your chance of catching nation state level maliciousness in a codebase? Pretty low.

heeen2 • 3 months ago

remove test blobs before building

JonChesterfield • 3 months ago

What's a test binary? Bunch of bytes on disk. What's a source file? Bunch of bytes on disk.

+1

varjag • 3 months ago

+1

heeen2 • 3 months ago

cassianoleal • 3 months ago

> a complex APT supply-chain attack

What do you mean by APT? If you mean Debian's package manager, that's not what this attack was. This was done upstream and affected non-apt distros just as much.

It's true that upstream is part of apt's supply chain but focussing on apt is misleading.

edit: why the downvotes? I get from the responses that I was wrong but given how the exploit was initially found in a Debian system and a lot of people very quickly jumped on the “Debian patched a thing and broke security” bandwagon, I don’t think it was much of a leap to wonder if that’s what was meant.

Acronyms and initialisms are not the best way to convey specific information.

haimez • 3 months ago

It stands for “Advanced Persistent Threat” - https://en.m.wikipedia.org/wiki/Advanced_persistent_threat

danieldk • 3 months ago

I think Advanced Persistent Threat.

mttpgn • 3 months ago

Not apt the package manager-- it's an acronym for Advanced Persistent Threat

supriyo-biswas • 3 months ago

> What do you mean by APT

Advanced, persistent threat.

nequo • 3 months ago

Others have answered your question about APT but FWIW I don't understand the downvotes. You were respectful and simply sought to clear up a misunderstanding.

macintux • 3 months ago

I was heavily downvoted for supplying a link to a previous discussion on the same topic recently. I wouldn’t worry much about it.

fsflover • 3 months ago

> hello lib/firmware

How could it play any role here? It doesn't bring any dependencies, does it?

cletus • 3 months ago

My big takeaway is that modern build systems are fundamentally broken. Autotools, even Makefiles, are (or can be) incredibly obtuse, even unreadable. If I read this attack correctly, it relied on a payload in a test file (to obfuscate it). It shouldn't be possible to include test resources in a production build.

As an aside, C/C++'s header system with conditional inclusion is also fundamentally broken. Even templates are just text substitution with a thin veneer of typing.

I think about Google's build system, which is very much designed to avoid this kind of thing. The internal build tool is Blaze (Bazel is the open-source cousin). Many years ago, you could essentially write scripts in your BUILD files (called genrules) that were hugely problematic. There was no way to guarantee the output so they had to be constantly rebuilt. There was a long project to eliminate this kind of edge case.

Blaze (and Bazel) are built around declaring hermetic units to be built with explicit dependencies only. Nothing shipped to production is built locally. It's all built by the build servers (a system called Forge). These outputs are packaged into Midas packages ("MPMs"). You could absolutely reconstruct the source used to build a particular library, binary or package as well as the build toolchain and version used. And any build is completely deterministic and verifiable.

C/C++, Make, CMake, autotools, autoconf and all that tooling so common in Linux and its core dependencies absolutely needs to go.

ufmace • 3 months ago

You're not wrong, but IMO that isn't the real problem. The real problem is the combination of highly popular core open-source utilities with near-zero maintenance resources.

This was all possible because XZ, despite being effectively everywhere, has one actual part-time maintainer who had other things going on in his life. That also means that there's nowhere near enough resources to redo the build system to some more secure alternative. If they had, say, 10 enthusiastic and skilled volunteer contributors with plenty of free time, they could do that, but then a new person appearing out of nowhere with a few helpful commits would never have a chance at being made an official maintainer or sneaking sketchy tools and code past the other maintainers.

Not that I'm blaming XZ or the real maintainer. Clearly whoever was behind this was looking for the weakest link, and if it wasn't XZ, it would have been something else. The real problem is the culture.

So I guess what this really means is someone at a big corporation making Linux distros should audit their full dependency tree. Any tool in that tree that isn't actively maintained, say at least 3 long-time active contributors, they should take over one way or another - whether that's hiring the current maintainer as a full-time remote employee, offering to buy out the rights, or forking and running under their own team.

I'm not necessarily super thrilled with that, but I guess it's the world we live in now.

pch00 • 3 months ago

> So I guess what this really means is someone at a big corporation making Linux distros should audit their full dependency tree.

This is it precisely. When you're paying Redhat for an "enterprise" Linux then that guarantee should extend down their entire software stack. Just getting the odd backported patch and so-so email support no longer cuts it.

peteradio • 3 months ago

Why do utilities need constant fucking updates? Why do they need maintenance? New features are entirely optional and if what you want to do isn't supported aka xz java, uhh, get fucked or do it yourself.

Dunedan • 3 months ago

I believe software of a certain complexity can't be finished and will always need updates, even if you exclude new features, as the whole eco system of hardware and software is constantly evolving around it. Here are a few reasons why:

- bug fixes (as every non-trivial software has bugs)

- improved security (as the kernel adds security functionality (capability dropping, sandboxing, ...), software can utilize this functionality to reduce its attack surface)

- improvements of existing features (e.g. utilizing new CPU extensions or new algorithms for improved performance)

bhawks • 3 months ago

The overworked solo maintainer is a problem.

However I've seen way too many projects where individuals in the 'team' are able to carve out impenetrable fiefdoms where they can operate with wide latitude.

I could see a Jia Tan being able to pull this off in a team context as well - bigger teams might even be weaker. (Everyone welcome Jia - he's going to write test cases and optimize our build process so everyone can focus on $LAUNCH_DAY)

sanderjd • 3 months ago

> modern build systems > Autotools, even Makefiles > C/C++'s header system with conditional inclusion

Wouldn't it be more accurate to say something like "older build systems"? I don't think any of the things you listed are "modern". Which isn't a criticism of their legacy! They have been very useful for a long time, and that's to be applauded. But they have huge problems, which is a big part of why newer systems have been created.

FWIW, I have been using pants[0] (v2) for a little under a year. We chose it after also evaluating it and bazel (but not nix, for better or worse). I think it's really really great! Also painful in some ways (as is inevitably the case with any software). And of course it's nearly impossible to entirely stomp out "genrules" use cases. But it's much easier to get much closer to true hermeticity, and I'm a big fan of that.

0: https://www.pantsbuild.org/

justinpombrio • 3 months ago

To be clear about the history: Make is from 1976, and Autotools is from 1991. Had me a chuckle about these being called modern, they're (some of?) the earliest build systems.

https://en.wikipedia.org/wiki/Make_(software)

https://en.wikipedia.org/wiki/Autoconf

__MatrixMan__ • 3 months ago

If I were starting something from scratch I'd do Bazel for internal deps, Nix for external ones. If that's a tall order, so be it: we train people. Investments in determinism usually end up paying off.

threePointFive • 3 months ago

I'm not familiar with Bazel, but Nix in it's current form wouldn't have solved this attack. First of all, the standard mkDerivation function calls the same configure; make; make install process that made this attack possible. Nixpkgs regularly pulls in external resources (fetchUrl and friends) that are equally vulnerable to a poisoned release tarball. Checkout the comment on the current xz entry in nixpkgs https://github.com/NixOS/nixpkgs/blob/master/pkgs/tools/comp...

diamondburned • 3 months ago

If you're ever using Nix like you would Bazel, you likely would not want your derivation to be built via another Makefile. Indeed, that defeats the whole point of using Nix to fix this in the first place. As it is, mkDerivation + fetchurl is mostly used to build existing non-Nix software.

Nix at the very least provides first-class(-ish) build support for modern languages like Rust, Go and Python, but I don't think anyone has written an actual Nix builder for C/C++. A combo of Bazel + Nix is fairly common, though.

IMO, it's hard to say if Nix would "solve this attack", since we've only seen it being truly used on more modern things where the build complexity of the actual piece of code is not much more than the same few commands.

As for pulling down a poisoned tarball, I think the discussion here is rather about upstream projects using Nix to guarantee reproducible builds rather than Nix trying to fend off attacks downstream in Nixpkgs. In modern Nix (with Flakes), this would look something like being able to clone and run `nix build .#` and end up with the same output every time.

wocram • 3 months ago

This sounds nice on paper, but there's a lot of missing glue to be written between nix and bazel.

Ideally nix would move towards less imperative/genrule style package declarations and ultimately become more usable for internal builds.

peteradio • 3 months ago

Google gets to decide what platform they want to support for their backend. I wonder what their build code would look like if they tried to support the top-10 common OS from the last 30 years.

ants_everywhere • 3 months ago

> The internal build tool is Blaze (Bazel is the open-source cousin).

I was under the impression that Blaze is Google's Bazel deployment, i.e. that they're the same code. Is that not correct?

xen0 • 3 months ago

Mostly correct; there's some code that's Blaze only and some that's Bazel only, but the core is the same.

dhx • 3 months ago

Ugly and broken build systems are most of the reason why Gentoo's 'sandbox' feature and equivalent for other distributions exists.[1] These sandboxing features have mostly been used in the past to prevent an ugly shell script in the build process for libuselesscruft from doing something similar to "rm -rf" on the build system. More recently these sandboxing features are helpful in encouraging reproducible builds by alerting maintainers to build processes which try and obtain non-deterministic information from the operating system environment such as the operating system name and version, host name and current time stamp.

There are a few gaps I think xz-utils highlights:

- Repositories containing a mixture of source code, build scripts, test frameworks, static resources and documentation generation scripts are all considered to be a single security domain with no isolation between them. If you look to Gentoo's src_prepare function as an example, we perhaps should instead split this into build_src_prepare, doc_src_prepare, test_src_prepare and install_src_prepare instead. If source code is being built and binaries generated, the sandboxed build directory should perhaps not contain test files and documentation generation scripts. If the package is being installed with "make install" (or equivalent) then static resources (such as a timezone database) should be available to copy to /usr/, but build scripts used to generate the binaries or documentation do not need to be available to "make install" (or equivalent).

- Sandboxing used for package building hasn't traditionally been implemented for security reasons in the past. Sandboxing should perhaps be hardened further with modern and more complex approaches such as seccomp to further protect build systems from the likes of libbackdoored that are targeted towards package maintainers/Linux distribution build systems. As a further example to seccomp, Gentoo's 'sandbox' has Linux namespace isolation built in, but not yet enabled whilst it is tested.

- There is a lack of automated package management tools (including dashboards / automatic bug creation) for comparing source trees in Git to released tarballs and making discrepancies more visible and easier for maintainers to review.

- There is a lack of automatic package management tools (including dashboards / automatic bug creation) for detecting binary and high entropy files in source trees and confirming they are validly formatted (e.g. invalid tag in a TLV file format) and confirming that test and example files contain nothing-up-my-sleeve content.

There has already been an accelerated move in recent years towards modern and safer build systems (such as meson and cargo) as 80's/90's C libraries get ripped out and replaced with modern Rust libraries, or other safer options. This is a lot of work that will take many years though, and many old 80's/90's C libraries and build systems will be needed for many more years to come. And for this reason, sandboxing/isolation/safety of old build systems seemingly needs to be improved as a priority, noting that old build systems will take years or decades to replace.

[1] https://devmanual.gentoo.org/general-concepts/sandbox/index....

[2] https://devmanual.gentoo.org/ebuild-writing/functions/src_pr...

anthk • 3 months ago

Makefiles are not a Linux thing, but Unix.

And if you confuse C with C++ and Make with CMake/Autotools/Autoconf, you have a lot to learn. Look: simple and portable makefiles work:

git://bitreich.org/english_knight

dankwizard • 3 months ago

[flagged]

pizzafeelsright • 3 months ago

There's a lot of mind reading there.

frankohn • 3 months ago

[flagged]

afro88 • 3 months ago

Did you get an LLM to write this?

frankohn • 3 months ago

I see that my post was down voted because it was written by an LLM in a verbose and pretentious style. That's okay but the idea I expressed are my own, the LLM did only the wording.

I regret also that the discussion derailed to discuss about LLM generated contents. I will be more careful the next time.

menomatter • 3 months ago

curious how were you able to tell? as a non-native speaker, I too use LLM to improve my wordings/clarity but modify it to maintain my rough style

andersa • 3 months ago

It's the writing style. Humans simply don't write like that, unless they are writing a research paper.

laszlokorte • 3 months ago

Even as a non-native speaker it really stood out: "This situation is exacerbated by the fact that", "In contrast", "Given the outlined".

I would call this style something like fluid enumeration of facts, typical for GPT generated texts.

NorwegianDude • 3 months ago

It's too verbose and vauge, the information density is terrible. The only people I've seen who writes like that are those who need to reach a certain number of words while saying as little as possible.

PokemonNoGo • 3 months ago

A hacker news comment isn't a research paper. I scrolled past it after starting because of the AI tells. Organic communication will be hard soon.

zeehio • 3 months ago

When I listen to speech synthesis systems from the 80s, like the one from Stephen Hawking, I feel they are easy to understand but "robotic", they sound all the same in a "robotic" sense.

LLMs seem to suffer from that as well. The length of the answers has lower variance than the length from humans. The number of paragraphs, the length of each paragraph, the structure of the paragraphs... All that is kind of predictable, so when I read a text that does not have that structure I usually don't think it's written by an LLM. However if the text has that structure then I can suspect it may come from an LLM. I believe in some months we will all be used to the LLM style and we will get much better to identify it... And then chatbots will probably be changed to sound less robotic...

jiggawatts • 3 months ago

It's actually very easy to convince an LLM to output perfectly natural text, it's just that their default mode is that verbose corporate boilerplate speak.

You can request responses in some specific style by the name of a popular author, or just provide a bunch of examples for it to copy.

I like responses in the style of an "Encyclopaedia Dramatica article", which tends to be both hilarious and also a bit less guarded and hence more honest.

dzdt • 3 months ago

For me "eschewing unnecessary complexity" stands out as a phrase that I've heard in the context of joking (as it is itself an unnecessarily complex wording) but which didn't fit well here as it didn't seem any joking was intended.

whimsicalism • 3 months ago

it’s poorly written - overly discursive. stuff like “the fact that” which are filler phrases

frankohn • 3 months ago

Yes, I told ChatGPT4 what I wanted to say and I left him the wording. I don't really like the style he adopted but, in some way, it is better than my approximative, rough, English.

ARandomerDude • 3 months ago

I like your words more. Rough English is fine and adds a human touch. ChatGPT just makes me wonder if some shill is bot posting on HN.

Besides, as a native speaker, ChatGPT reads to me like a 13 year old desperately trying to sound like an adult.

dboreham • 3 months ago

The "Turing13 Test".

afro88 • 3 months ago

In some ways it is better, but mostly it comes across like a bot. Which makes me not trust the words. And in that way it is a net loss.

Your english is good enough. Keep writing the comments yourself!

drewg123 • 3 months ago

Or cmake, or any other modern build system aside from autotools.

Heck, even just a Makefile and code with a few ifdefs is enough for fairly simple code.

A lot of the complexity of autoconf / libtool is totally unneeded. I was very glad when I ripped out libtool from a project at an older employer in the early 2000s. Even when building libraries on Windows, Mac OSX, AIX, Linux, FreeBSD & Solaris, libtool was not worth the complexity.

anthk • 3 months ago

git://bitreich.org/english_knight

This. Simple and usable.