
Hermit is a hermetic and reproducible sandbox for running programs

146 points | 15 hours | github.com
an-unknown 10 hours ago

It seems like this tool creates an environment that is neither fully deterministic nor fully reproducible. Hermit seems to only intercept and modify syscalls, but syscalls are not the only source of non-determinism and randomness. For example, the layout of environment variables in memory, which depends on both their content and their order, also causes non-determinism. CPU instructions like RDTSC, RDRAND, RDSEED and similar also introduce randomness. It seems like Hermit ignores some of these sources of randomness, but I can't test it, because it doesn't build on a current Arch system with the Rust toolchain from the repo.

At least it seems Hermit masks RDRAND and RDSEED via CPUID, but not every program is written to support ancient architectures which didn't support these instructions and therefore not every program tests availability via CPUID.

In addition, even if all of this were deterministic, the CPU flags that the CPU manual leaves "undefined" after certain instructions can differ slightly between microarchitectures. A "normal" program should not be influenced by this, but it is still a source of non-reproducibility. This might be relevant for certain rare compiler bugs.

eatonphil 14 hours ago

It's a really interesting project but it hasn't worked for non-trivial programs for me. I tried to use it on my Raft implementation. Hermit crashed with obscure (to me) error messages.

As others have commented here before, it admittedly doesn't seem to be actively maintained.

> Just to let you know we’re not actively working on Hermit in the team

https://github.com/facebookexperimental/hermit/issues/34#iss...

flurie 11 hours ago

That's been my experience as well. It lacks support for certain clone(2) flags like CLONE_VFORK[1], which limits the set of non-trivial programs it can run, and since running non-trivial programs is most of the point, I haven't revisited it since it was first announced.

[1] https://github.com/facebookexperimental/hermit/blob/bd3153b4...

yjftsjthsd-h 15 hours ago

I'm curious what the performance impact is like; I assume there has to be some slowdown because of the interception of system calls?

hiatus 14 hours ago

It uses Reverie under the hood, which itself relies on ptrace (at least for the current, sole implementation).

> Since ptrace adds significant overhead when the guest has a syscall-heavy workload, Reverie will add similarly-significant overhead. The slowdown depends on how many syscalls are being performed and are intercepted by the tool.

> The primary way you can improve performance with the current implementation is to implement the subscriptions callback, specifying a minimal set of syscalls that are actually required by your tool.

https://github.com/facebookexperimental/reverie
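
To make the "minimal set of syscalls" advice concrete, here is a toy Python model. It is not Reverie's API (Reverie is a Rust library), and the trace, the syscall counts, and the `intercepted` helper are all made up for illustration. It only counts how often a tracer would stop the guest; it does not measure the actual cost of each stop.

```python
from typing import Iterable, Set


def intercepted(trace: Iterable[str], subscribed: Set[str]) -> int:
    """Count how many syscalls in the trace the tracer would stop on.

    Hypothetical helper: a stop happens only for subscribed syscalls.
    """
    return sum(1 for name in trace if name in subscribed)


# Made-up syscall trace of an I/O-heavy guest program (assumption).
trace = (
    ["read"] * 1000
    + ["write"] * 1000
    + ["clock_gettime"] * 10
    + ["getrandom"] * 2
)

# Subscribing to everything means every syscall pays the tracing cost.
everything = set(trace)
# A tool that only needs to control time and randomness could subscribe
# to far fewer syscalls (assumed set, for illustration only).
minimal = {"clock_gettime", "getrandom"}

print(intercepted(trace, everything))  # 2012
print(intercepted(trace, minimal))     # 12
```

In this made-up trace the narrower subscription cuts the number of stops from 2012 to 12, which is the kind of reduction the quoted advice is aiming at.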

mananaysiempre 13 hours ago

Tangent: running old OSes (with no virtio support) under QEMU on Linux has the peculiar property that I/O-heavy portions such as installation can run faster under TCG (JIT) than under KVM (hardware virtualization), presumably due to all the trapping. It’s a toss-up when those also include CPU-heavy parts (decompression).

TillE 14 hours ago

> all thread executions are serialized so that there is effectively only one CPU

This definitely isn't intended for general-purpose sandboxing. It's an interesting tool for analysis and debugging.

yjftsjthsd-h 14 hours ago

Ah, I had missed that it effectively limits you to one CPU. Although I already would not use it for anything but testing, on account of it intentionally de-randomizing things - I suspect, for instance, that it's unsafe to run any sort of cryptography that would create keys under this.
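
A minimal Python sketch of that worry, not using Hermit itself: `random.Random(seed)` stands in (as an assumption) for an entropy source that a sandbox has made deterministic, and `generate_key` is a hypothetical helper, not a real key-generation routine.

```python
import random


def generate_key(entropy_source: random.Random, n_bytes: int = 16) -> bytes:
    """Hypothetical key generator that draws its bytes from the given source."""
    return bytes(entropy_source.getrandbits(8) for _ in range(n_bytes))


# Two "runs" whose entropy source starts from the same state, which is
# the situation a deterministic sandbox creates on purpose.
key_run_1 = generate_key(random.Random(1234))
key_run_2 = generate_key(random.Random(1234))

# Same starting state in, same "secret" out.
print(key_run_1 == key_run_2)  # True
```

Anyone who can reproduce the starting state can reproduce the key, so keys created this way should be treated as test fixtures only.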

nicoty 10 hours ago

It sounds similar to that Antithesis testing service that was on the front page recently. That also claimed to be able to run programs deterministically. I wonder if the two projects are related at all.

wwilson 10 hours ago

Our projects have some features in common, but are pretty much unrelated. Hermit is a deterministic userland, whereas we enforce reproducibility at the hypervisor level and with the right device drivers can support any OS.

The most interesting part of Antithesis (to me) isn’t even the perfect reproducibility, but the autonomous state space exploration that finds the bugs in the first place. AFAIK Hermit doesn’t do that, though you might be able to get somewhere by running your program plus a conventional fuzzer under Hermit together?

Disclosure: I am one of the co-founders of Antithesis.

tony-allan 7 hours ago

"Hermit is no longer under active development within Meta and is in maintenance mode. There is a long tail of unsupported system calls that may cause your program to fail while running under Hermit. Unfortunately, we (the team behind this project) don't have the resources to triage issues, fix major bugs, or add features at this point in time."

debacle 13 hours ago

What's the difference between this and a container?

quadrature 12 hours ago

Hermit executes your program deterministically. This means that it accounts for sources of non-determinism like thread scheduling. The idea is that you will be able to investigate executions in a fully reproducible manner.
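
As a rough illustration of what deterministic thread scheduling buys you, here is a toy Python sketch. It is not how Hermit is implemented; it assumes cooperative "threads" modelled as generators and a scheduler that runs one of them at a time, picking the next one with a seeded PRNG. The names `worker` and `run_serialized` are hypothetical.

```python
import random
from typing import Generator, List


def worker(name: str, steps: int, trace: List[str]) -> Generator[None, None, None]:
    """A cooperative 'thread': records one event per step, then yields."""
    for i in range(steps):
        trace.append(f"{name}{i}")
        yield  # hand control back to the scheduler


def run_serialized(seed: int) -> List[str]:
    """Run three workers one at a time; a seeded PRNG picks who goes next."""
    rng = random.Random(seed)
    trace: List[str] = []
    runnable = [worker(name, 3, trace) for name in ("A", "B", "C")]
    while runnable:
        task = rng.choice(runnable)
        try:
            next(task)  # run the chosen worker up to its next yield
        except StopIteration:
            runnable.remove(task)  # this worker has finished
    return trace


# The same seed replays exactly the same interleaving.
print(run_serialized(7) == run_serialized(7))  # True
```

A different seed may produce a different interleaving, but any interleaving that exposes a bug can be replayed by reusing its seed, which is what makes the execution investigable.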