Somewhat off topic, but CERN has a fantastic science museum attached to it that I had the privilege of visiting last summer. There is of course Tim Berners-Lee's NeXT workstation, but also so much more. It is also the only science museum I've visited that addresses topics in cyberinfrastructure such as serving out massive amounts of data. (I personally get interested when I see old StorageTek tapes lol.) The more traditional science displays are also great. Check it out if you are ever in the Geneva area. It is an easy bus ride to get out there.
This is fascinating. How are they managing or even taking backup for this gigantic storage?
What is Rucio?
Rucio enables centralized management of large volumes of data backed by many heterogeneous storage backends.
Data is physically distributed over a large number of storage servers, potentially each relying on different storage technologies (SSD/Disk/Tape/Object storage) and, frequently, managed by different teams of system administrators.
Rucio builds on top of this heterogeneous infrastructure and provides an interface which allows users to interact with the storage backends in a unified way. The smallest operational unit in Rucio is a file. Rucio enables users to upload, download, and declaratively manage groups of such files.
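To make "declaratively manage groups of files" a bit more concrete, here is a minimal toy sketch in Python. It is not Rucio's API: `StorageElement`, `ReplicationRule`, and `plan_transfers` are hypothetical names invented for illustration. The idea is only that the user states a goal ("two disk copies of every file in this dataset") and the system works out which transfers are missing.

```python
from dataclasses import dataclass, field


@dataclass
class StorageElement:
    """Hypothetical storage endpoint; `kind` is e.g. 'disk' or 'tape'."""
    name: str
    kind: str
    files: set = field(default_factory=set)


@dataclass(frozen=True)
class ReplicationRule:
    """Declarative intent: keep `copies` replicas of each file on `kind` storage."""
    dataset: tuple
    copies: int
    kind: str


def plan_transfers(rule, storage):
    """Return the (file, destination) pairs needed to satisfy the rule."""
    # Only endpoints of the requested kind count; sort for a stable plan.
    candidates = sorted((s for s in storage if s.kind == rule.kind),
                        key=lambda s: s.name)
    transfers = []
    for f in rule.dataset:
        holders = [s for s in candidates if f in s.files]
        spare = [s for s in candidates if f not in s.files]
        missing = max(rule.copies - len(holders), 0)
        for dest in spare[:missing]:
            transfers.append((f, dest.name))
    return transfers


if __name__ == "__main__":
    storage = [
        StorageElement("disk_a", "disk", {"f1"}),
        StorageElement("disk_b", "disk"),
        StorageElement("tape_c", "tape", {"f1", "f2"}),
    ]
    rule = ReplicationRule(dataset=("f1", "f2"), copies=2, kind="disk")
    for f, dest in plan_transfers(rule, storage):
        print(f"copy {f} -> {dest}")
```

The user never says "copy f1 to disk_b"; they only state the desired end state, and the planner derives the work. A real system would also have to deal with quotas, failures, and rule lifetimes, none of which this sketch attempts.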
They use a distributed data management tool called RUCIO: https://rucio.cern.ch to distribute data on the grid.
For experiment data, there is a layer on top of all of this that distributes datasets across the computing grid. That system has a way to handle replication at the dataset level.
Tape and off-site replicas at globally distributed data centres for science. Of the 1EB, a huge amount is probably in automated recall and replication, with "users" running staged processing of the data at different sites, ultimately reducing the data to a "manageable" GB-TB level for scientists to do science.
Yup, lots of tape for stuff in cold storage, and then some subset of that on disk spread out over several sites.
It's kinda interesting to watch anything by Alberto Pace, the head of storage at CERN to get an understanding of the challenges and constraints: https://www.youtube.com/watch?v=ym2am-FumXQ
I was basically on the helpdesk for the system for a few years so had to spend a fair amount of time helping people replicate data from one place to another, or from tape onto disk.
IIRC I had issues with inotify when I was editing files on a remote machine via SSHFS, when these files were being used inside a Docker container. inotify inside the container did not trigger the notifications, whereas it did, when editing a file with an editor directly on that host.
I think this was related to FUSE, that Docker just didn't get notified.
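If change notifications do not make it through a remote or FUSE mount, one common workaround is to poll instead of relying on inotify. Below is a minimal sketch in Python using only the standard library; `snapshot` and `diff` are hypothetical helper names, and comparing modification time plus size is an assumption that can miss edits which change neither.

```python
import os


def snapshot(root):
    """Map each file path under `root` to (mtime_ns, size)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished between listing and stat
            state[path] = (st.st_mtime_ns, st.st_size)
    return state


def diff(old, new):
    """Return (added, removed, modified) as sorted lists of paths."""
    added = sorted(p for p in new if p not in old)
    removed = sorted(p for p in old if p not in new)
    modified = sorted(p for p in new if p in old and new[p] != old[p])
    return added, removed, modified
```

A watcher would call `snapshot` in a loop with a sleep in between and react to whatever `diff` reports. It costs more I/O than inotify, but it works on any filesystem that answers `stat` calls.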
The inotify signals might work if you mount a whole directory with -v rather than a single file.
does modern fuse still context switch too much or does it now use io_uring or similar?
FUSE over io_uring is still WIP: https://lwn.net/Articles/988186/
FUSE Passthrough landed in kernel 6.9, which also reduces context switching in some cases: https://www.phoronix.com/news/Linux-6.9-FUSE-Passthrough . The benchmarks in this article are pretty damning for regular FUSE.
FUSE Passthrough is only useful for filesystems that wrap an existing filesystem, such as union mounts. Otherwise, you don't have an open file to hand over.
yeah but still not great for metadata operations, no?
i remember it was really not great for large sets of search paths because it defeated the kernel's built-in metadata caches with excessive context switching?
Last I read about FUSE, adding a 128KB read-ahead buffer drastically reduced context switching.
1EB with only 30k users, that's a wild TB-per-user ratio. My frame of reference: the largest storage platform I've ever worked on was a combined ~60PB (give or take) and that had hundreds of millions of users.
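For a sense of scale, the ratio can be worked out in a few lines of Python. The 1 EB, 30k users, and ~60 PB figures come from the comment above; "hundreds of millions" is not a number, so the 300 million used here is purely an assumed placeholder, and decimal units (1 EB = 1,000,000 TB) are assumed as well.

```python
TB_PER_PB = 1_000
TB_PER_EB = 1_000_000


def tb_per_user(total_tb, users):
    """Average storage per user in TB."""
    return total_tb / users


# Figures from the comment: 1 EB shared by about 30k users.
cern = tb_per_user(1 * TB_PER_EB, 30_000)

# ~60 PB platform; 300 million users is an assumed stand-in
# for "hundreds of millions".
consumer = tb_per_user(60 * TB_PER_PB, 300_000_000)

if __name__ == "__main__":
    print(f"CERN-like:     {cern:.1f} TB per user")
    print(f"Consumer-like: {consumer * 1_000_000:.0f} MB per user")
    print(f"Ratio:         {cern / consumer:,.0f}x")
```

Under those assumptions that is roughly 33 TB per user against a couple of hundred MB per user, so a gap of about five orders of magnitude.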
Most humans don't handle sensor and simulation data for a living, though. CERN just so happens to employ thousands who do that for a living.
When experiments are running the sensors generate about 1PB of data per second. They have to do multiple (I think four?) layers of filtering, including hardware level to get to actual manageable numbers.
It depends on the experiment. We call it the trigger system, and it varies according to each experiment's requirements and physics of interest. For example, LHCb now runs its full trigger system in software (no hardware FPGA triggering), mainly using GPUs. That would be hard to achieve with the harsher conditions and requirements of CMS and ATLAS.
But yes at LHCb we discard about 97% of the data generated during collisions.
Disclaimer: I work on LHCb trigger system
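The effect of cascaded filtering on data rate is simple multiplication, which a short Python sketch can illustrate. `surviving_rate` is a hypothetical helper, and pairing the ~1 PB/s figure with the 97% discard figure is only for illustration: the two numbers come from different comments and may not describe the same experiment. The three-stage fractions are invented.

```python
def surviving_rate(input_rate, keep_fractions):
    """Apply each stage's keep fraction in turn and return the final rate."""
    rate = input_rate
    for keep in keep_fractions:
        if not 0.0 <= keep <= 1.0:
            raise ValueError(f"keep fraction out of range: {keep}")
        rate *= keep
    return rate


if __name__ == "__main__":
    raw_gb_per_s = 1_000_000  # ~1 PB/s expressed in GB/s (decimal units)

    # One stage that discards 97% keeps 3%.
    single = surviving_rate(raw_gb_per_s, [0.03])
    print(f"single stage: {single:,.0f} GB/s")

    # Invented three-stage cascade, each stage keeping a fraction of the last.
    cascade = surviving_rate(raw_gb_per_s, [0.1, 0.1, 0.01])
    print(f"three stages: {cascade:,.0f} GB/s")
```

Even after discarding 97%, roughly 30 TB/s would remain under these assumptions, which is why further reduction stages matter so much.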
My frame of reference; the largest storage platform I've ever worked on was a combined ~tens of EB (give or take) and that had over a billion users.
That's the scale of the universe, compared to data generated by humans
Re: https://news.ycombinator.com/item?id=41716523,
> over the years what discoveries have been made at CERN that have had practical social and economic benefits to humanity as a whole?
Some responders to the question believe I was criticizing a supposed wastefulness of the research. Not knowing the benefits of the discoveries in high energy physics, ie the stuff the accelerators are actually built to discover, doesn't mean I was criticizing it.
Responses referenced the contributions the development of the infrastructure supporting the basic research itself have made, which is fine, but not the benefits of high energy physics discoveries.
So to rephrase the question - What are the practical social and economic benefits to society that the discoveries in high-energy particle physics at institutions like CERN have made over the years?
This is not just in relation to CERN, but worldwide, such as those experiments which create pools of water deep underground to study cosmic rays etc.
You're probably getting replies like that because it's a bit of an odd question. Academic research isn't really done to achieve a particular purpose or goal. The piratical benefit literally is academic.
It's also one of the first questions from people that very much are criticizing, so even if it was a sincere question it will be lumped together. Not recognizing or addressing this when posing the question does nothing to prevent the lumping.
The piratical benefit may be particle cannons? Yarrgh!
Most of the magic is in https://eos-web.web.cern.ch/eos-web/ apparently
.ch as in Switzerland. Does .cn (China) prohibit open source?
"Confoederatio Helvetica" if anyone wonders why Switzerland uses the CH TLD.
The things you can build when everyone is a rockstar :D
I'm convinced CERN could greatly benefit from "middle out".
i mean, they also have one of the largest ceph deployments. anything is scalable with no budget.
slide 22 states that the cost is 1 CHF/TB/month (on 10+2 erasure coded disks), though it would be interesting to do a breakdown of costs (development, hardware, maintenance, datacenter, servicing, management, etc..)
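As a rough back-of-the-envelope in Python: a 10+2 erasure-coded layout stores 12 chunks for every 10 chunks of data, and the quoted price can be scaled to 1 EB. Whether the 1 CHF/TB/month applies to usable or raw capacity is not stated in the comment, so treating it as usable capacity here is an assumption, as are the helper names and decimal units.

```python
def erasure_overhead(data_chunks, parity_chunks):
    """Raw bytes stored per usable byte for a data+parity layout."""
    return (data_chunks + parity_chunks) / data_chunks


def monthly_cost_chf(usable_tb, chf_per_tb_month):
    """Monthly cost, assuming the price is quoted per usable TB."""
    return usable_tb * chf_per_tb_month


if __name__ == "__main__":
    overhead = erasure_overhead(10, 2)
    usable_tb = 1_000_000  # 1 EB in TB, decimal units

    print(f"raw/usable ratio:   {overhead:.2f}")
    print(f"storage efficiency: {1 / overhead:.1%}")
    print(f"raw capacity:       {usable_tb * overhead:,.0f} TB")
    print(f"monthly cost:       {monthly_cost_chf(usable_tb, 1.0):,.0f} CHF")
```

For comparison, three-way replication would be `erasure_overhead(1, 2)`, i.e. three raw bytes per usable byte, while also surviving the loss of any two copies. That difference in overhead is the usual argument for erasure coding at this scale.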
1 CHF/TB/month is a bit expensive for storage at that scale, so it would definitely be interesting to see what they're spending the money on and what they are (and aren't) counting in that price.
Tape backup, accessibility, networking, availability... At 1CHF/TB that's a lot better than my local university still charging >100x that for such services internally
Economies of scale in storage are significant. Also, I don't know why you put up with your university charging 100x that when you can store things on AWS for $5-10/TB/month (or less). That comes with all the guarantees (or more) of durability and availability you get from the university.
No budget often tags along with no accountability
They probably consume Panasas, IBM, DDN, and BeeGFS gear and licensing too.
Nope.
Most internal data is spread between Ceph and home-made distributed storage system named EOS (https://indico.cern.ch/event/138478/contributions/149912/att...) running over commodity hardware.
The only commercially backed storage system is the long-term tape storage system. Even so, it has a home-made overlay API on top of it to interface with the rest of the systems.
Good god no. Nowhere near anything so crass. CEPH and EOS all the way
People here keep claiming “Anything is possible with unlimited budget”.
CERN's budget is 1.4 billion Euro, with 50 million Euro for all IT infrastructure.
https://cds.cern.ch/record/2888205/files/English.pdf#page18
It’s not the money, it’s the people. Update: Added source.
That kind of place can draw a certain kind of employee. This finding is hard to transfer to commercial projects. Sure employees will always claim to be really motivated, especially in the marketing material, but are they we-are-nerds-working-on-the-bleeding-edge-of-human-knowledge-motivated?
Probably not, but there is surely some manager out there who made themselves believe they can motivate their employees to show the same devotion for the self-made hardships of some mostly pointless SaaS product. If you want to grab that kind of spirit, what you do needs to fundamentally make sense beyond just making somebody money.
That's exactly how we were able to go to the moon 55 years ago, and why it's complicated today. It took a lot of money, of course, but mostly it took a lot of highly skilled, motivated, devoted people working toward an ultimate common goal. Money would not have been sufficient by itself.
Since then, a LOT of the smart motivated people have been lured into either banking or adtech. The pay is good and the technical problems can be pretty interesting but the end result lacks that "wow factor".
I also read that nowadays we are more risk averse and many people/manager/companies are mostly administrators of status quo. Pair that with lack of vision and public engagement for current challenges to humanity.
In other words, if you permit, pure capitalism isn't a sufficiently good motive to get something significant done. But of course most of us don't work towards an ultimate common goal – and neither did most people in those times. One wonders if there is enough meaning left these days to go 'round and ensure most of us feel passionate about the stuff we (have to) do. Maybe we really need a god or war or common enemy to unite all strands into a strong rope.
Also, CERN does not have a profit motive.
How much good work have the people reading this thread had to trash because it didn't align with Q3 OKRs? How much time and energy did they put into garbage solutions because they had to hit a KPI by the last day of June?
> Also, CERN does not have a profit motive.
This is a great point. We work with CERN on a project, and we're all volunteers, but we work on something we need, and contribute back to it wholeheartedly.
At the end of the day, CERN wants to give away project governance to someone else in the group, because they don't want to be the BDFL of anything they help creating. It allows them to pursue new and shiny things and build them.
There's an air consisting of "it's done when it's done", and "we need this, so we should build this without watering it down", so projects move at a steady pace, but the code and product is always as high quality as possible.
CERN buddy of mine suggested that exposing a colony of physicists to elevated ambient levels of helium would trigger excessive infrastructure building behavior.
That’s a great observation, and I think generally correct, but there are private companies where that sort of motivation exists, for basically the same reason
Then they get bought by some megacorp which kills the motivation.
Or they are the megacorp that killed it (Google, Xerox?)
People get this very wrong: CERN is extremely underfunded. People really don't understand how expensive running the accelerators is, and most of the budget goes to that. In recent years they even had to run for fewer months than expected because they couldn't afford the rising energy prices.
The buildings are old, the offices suck, you don't even get free coffee and they pay less than the norm in Switzerland. But they have some of the top minds working on very specific and interesting systems, dealing with problems you'd never encounter anywhere else.
I would like to yap more about the current management and their push/reliance on enterprise solutions but to cut it short I really do think cern is a net contributor to open science and they deserve more funding.
Also, the in-kind contributions from hundreds of institutes around the world. Much can, and has, been said about physicist code, but CERN is the center of a massive community of “pre-dropout” geniuses. I can’t count the number of former students that later joined Google and the likes. Many are frequenting HN.
CERN was a good example of how much can be done with how little when you have the right people.
For a long time, the entire Linux distribution (Scientific Linux) used for ~15K collaborators, the infra and the grid computing was managed by a team of around 4-5 people.
The teams managing the network access (LanDB), the distributed computing system, the scientific framework (ROOT) and the storage are also small, dedicated skilled teams.
And the result speaks for itself.
Unfortunately, most of that went to shit quite recently when they replaced the previous head of IT by a Microsoft fanboy/girl coming from outside of the scientific environment. The first thing he/she did was to force Microsoft bloatware everywhere to replace existing working OSS solutions.
I think the majority of the Scientific Linux software came from Fedora/Red Hat and the Linux kernel. Planning and managing the CERN computing infrastructure is a lot of work, then updating and releasing a famous distro on top of that was impressive.
> Unfortunately, most of that went to shit quite recently when they replaced the previous head of IT by a Microsoft fanboy(girl?) coming from outside of the scientific environment.
Painful to read so I did a short check. From a news post I don’t want to link here, but easily found searching “CERN, the famous scientific lab where the web was born, tells us why it's ditching Microsoft and helping others do the same”, direction taken in 2019 seemed quite the opposite. I am not sure how current head of IT at CERN, Enrica Porcari, fits in to the story. Insider info will be appreciated.
> direction taken in 2019 seemed quite the opposite
The head of IT changed in 2021 if it answers your question.
Considering how massively in bed with the U.S. government and other governments that Microsoft is, and said government has been known for keeping tabs even on allies(1), I'm sure that certain parties have a keen interest in keeping up with what's going on at CERN that's not just scientific curiosity. Strangely these Microsoft evangelists manage to pop up in organizations all the time to reverse any open source initiatives. Could just be a coincidence though.
1. https://www.washingtonpost.com/graphics/2020/world/national-...
Joachim Mnich, Director for Research and Computing and her boss [4], holds the position also since January 2021 [1]. Mike Lamont, Director for Accelerators and Technology, also got the job at the same time [2]. Finally Fabiola Gianotti, Director-General, in 2019 extended her tenure for a second term “to start on 1st January 2021” [3].
So in 2019 the initiative to remove Microsoft began. With the renewal and promotions taking effect, it stopped. Interesting. Feeling a strong Microsoft US vs Munich DE vibe. With a twitch of IT.
1. https://home.cern/about/who-we-are/our-people/biographies/jo...
2. https://home.cern/about/who-we-are/our-people/biographies/mi...
3. https://home.cern/about/who-we-are/our-people/biographies/fa...
4. https://german-dac.web.cern.ch/sites/default/files/2022.01%2...
Scientific Linux was originally a product of Fermilab, with contributions from CERN.
> CERN's budget is 1.4 billion Euro
Kind of weird that a company like Uber has a valuation of $150 billion Euro.
Most of the people who make CERN work aren't working for CERN. The IT department is under CERN, but there are many thousands of "users" who don't get paid by CERN at all. Quite a lot of the fabrication and most of the physics analysis is done by national labs and universities around the world.
At the experiment level, the CERN budget is paid mostly by contributions from the institutions that are part of the experiment. I am talking about operation and R&D, and this also includes personnel contributions to different aspects. There is also service work that each user must do besides doing physics. I, for example, work on the software development stack alongside my current physics analysis. Some of my colleagues work on hardware.
Then there are country-level contributions that pay for CERN infrastructure and maintenance (and inter-experiment stuff) and direct employees' salaries.
The important point here is that (I believe) the 1.4 billion above doesn't account for all the work done directly by institutes. Institutes pay CERN, but they also channel government grants to fund a huge amount of work directly.
Most of the people I know who "worked at" CERN never got a pay check that said CERN on it.
Apples to oranges. Budget is per year, valuation is total.
A better comparison would be Uber's revenue of $37 billion in 2023.
I don't see why it's Apples to oranges. Uber could pay for 150 CERN-years.
Ok, maybe it's 75 CERN years or maybe even 10. The point still stands.
PS: Sorry if you got tired, but I'm tired of people explaining what valuation isn't when we're just talking orders of magnitude.
it's only useful for getting loans that you'll pay back with a bigger loan. it's how rich people are always cash-poor but wealthy and live wealthy lifestyles.
How many people order a meal (often out of laziness) per day vs thinking about and searching the mysteries of the universe? Economically it makes sense that Uber generates a lot more cash.
I think you misinterpreted that there shall be a correlation between _valuation_ and _earnings_. Uber's _first_ ever profitable year was 2023, after 15 years in business [1]. Uber may be generating cash, but it has also been losing cash a lot faster than it was generating it. Taking 2023 as a reference (~2 billion), it needs another 5 such years just to recover its losses from 2022 (9 billion). I understand the economics behind it, but its valuation is way out of touch with reality.
[1] https://www.theverge.com/2024/2/8/24065999/uber-earnings-pro...
That being said, though, members contribute more than money. A lot of the work done at CERN is not done on CERN budgets, but on the budgets of member institutes.
Good hiring managers can find the hidden gems. These are typically people who don't have the resume to join FAANG immediately, due to lacking the pedigree, but who have lots of potential. Also these same people typically don't last long because they do eventually move on.
Also it helps that Europe is so behind in tech that if you want to do some cutting edge tech you are almost forced to join a public institution because private ones are not doing anything exciting.
> Also it helps that Europe is so behind in tech that if you want to do some cutting edge tech you are almost forced to join a public institution because private ones are not doing anything exciting.
This is genuinely cringeworthy. Do you think that companies in the EU all use COBOL on mainframes and nothing newer than 10 years old is allowed? Airlines and banks here(!) are rewriting their apps to be Kubernetes native... And have been doing so for years. Amadeus (top 2 airline booking software in the world) were a top Kubernetes contributor already a decade ago.
The tech problems being solved at Criteo, Revolut, Thales, BackMarket, Airbus, Amadeus (to name a few fun ones off the top of my head) are no less challenging and bleeding edge than... "the Uber of X" app number 831813 in the US. Or fucking Juicero or Theranos or any of the other scams.
Because doing the millionth CRUD in USA is very exciting?
One wonders if things win because they really are better, or because there's sufficient financial momentum behind them. I have worked in the public sector for some years, and I don't think Europe is behind, just that the budgets are a lot smaller. If you want to capture a lot of people in an ecosystem or walled garden, you're going to need money, and lots of it. For all that's good and bad about it, most of that excess is concentrated in the US, in a few hotspots. No need to get distracted and put a flag on somebody like a Zuckerberg or Jobs or Gates though.
> and I don't think Europe is behind, just that the budgets are a lot smaller. If you want to capture a lot of people in an ecosystem or walled garden, you're going to need money, and lots of it
And the initial market you have is quite a bit smaller. Germany is the biggest EU country by population at 84 million, compared to 333 million in the US. Moving into another EU country means translating into a different language, verifying what laws apply to you, how taxes work, etc. Sometimes it's easy (just a translation), sometimes you might have to redo everything almost from scratch (e.g. Doctolib, which schedules healthcare appointments, hosts online meetings with doctors, and can be used to share test results and prescriptions; each new country they enter will have a lot of regulations on healthcare data that will need to be applied).
But it's mostly the budgets.
Yes, but that still covers infrastructure (cables) and a lot of equipment for the experiments including but not limited to massive storage and tape backup, distributed local compute, and local cluster management all with users busy trying to pummel it with the latest and greatest ideas of how they can use it faster and better... Not to mention specialist software and licences. 50M doesn't go that far when you factor all of this in
https://en.wikipedia.org/wiki/CERN#Scientific_achievements
Here's a couple, in case you don't want to read the page:
- CERN pioneered the introduction of TCP/IP for its intranet, beginning in 1984
- CERN has developed a number of policies and official documents that enable and promote open science
- The CERN Science Gateway, opened in October 2023, is CERN's latest facility for science outreach and education
I purposefully picked items that weren't directly particle physics related.
Just tacking some detail onto "promote open science".
CERN was/is a large early user and supporter of the open source KiCAD electronics CAD tooling. The downstream impact of improved accessibility to solid ECAD tooling has been a large contributing factor to the growing ecosystem of open electronics.
A lot of really impressive test and measurement equipment to support their research is developed in the open (see https://ohwr.org/project). People on HN are probably most likely to have heard of the White Rabbit timing project, but there's fantastic voltmeter designs, a lot of FPGA projects for carriers, gateware, fun RF designs.
They also use the expensive big ECAD tools for the super complex stuff.
But no secret - they are one of the reasons why Kicad isn't an ugly duckling anymore.
They have their own page for it: https://kt.cern/
There's a lot of use for the acceleration and sensor knowledge in the medical sector. Technology first developed for high-energy research can be used to improve CT scans[1], better cancer treatment[2] and so on. This goes way back.
[1]: https://home.cern/news/news/knowledge-sharing/spectral-imagi...
[2]: https://kt.cern/success-stories/pioneering-new-cancer-radiot...
Inventing WWW is arguably the single greatest economic development in the history of mankind.
But if Berners-Lee hadn't started the WWW, someone else probably would have within a few years: the hard part was the development of the internet, i.e., a flexible low-cost wide-area network where anyone could start a new service (look in /etc/services to see all the services that people have defined over the years) without the need to get permission from anyone.
IIRC the first WWW server went live in 1990. By then there was already WAIS, Archie and Veronica (search engines for anonymous-FTP sites). In 1991, the first Gopher server went live. Gopher grew rapidly till the late 1990s.
The US government's Advanced Research Projects Agency started funding research into "packet-switched networks" in 1960 which would eventually lead to the internet, which went live in 1969 (under the name ARPAnet, but only a pedant would say that ARPAnet is not the early version of the internet). Then the USG continued to fund the internet every year till it no longer needed funding in the early 1990s.
So, CERN and Berners-Lee (mostly the latter because no one at CERN other than Berners-Lee cared much about the WWW in its early days before it became a big hit) get some credit for the WWW, but in my reckoning it is a small amount of credit.
But if…
But wasn’t.
A lot of the benefit has come from learning expertise in applications.
Tons of the data science tools have roots in CERN. Tons of interesting statistical methods, tons of experience R&D with superconductors and all manners of sensors.
Tons of math, computation techniques, modeling, etc. would not be here without CERN.
It would be sort of silly to expect that any of their actual discoveries or tests of the SM would have any actual application, but the ancillary benefits are there.
Which tons? And why would it be silly? If actual new particles or physical phenomena were found the applications would be trillions
a particle that requires 30 km particle accelerator to produce isn't going to have that many applications on earth
> practical social and economic benefits to humanity as a whole?
Why does it have to be practical? Scientific discovery is a perfectly valid end in its own even if it only ever means that we understand the universe better
The fact that almost always scientific discovery turns out to have practical purposes in the long run (centuries, not decades) is an added bonus.
It's not like it's a huge expense either. If Switzerland decided to, it could cover the yearly budget of CERN by itself at the cost of a fraction of a percent of its GDP.
There's a number of points to unpack here.
High energy physics research has contributed some technology with social and economic benefits. Some of that has been direct results coming from pure research into fundamental properties of matter and electromagnetic radiation, some are indirect results that came about because when you build an institute like CERN, it spontaneously generates advances in other areas that solve more general problems (this is known as the "collect a bunch of smart people in a single place, with a lot of resources, to solve a unique problem" strategy). But no, most of the research, pure or applied, has not really had direct practical social and economic benefits to humanity as a whole.
That's entirely missing the point. We, as a society, have decided that we will balance our economic productivity into several different areas- welfare, infrastructure, military, industry, science/research, technology. We believe that investments in areas of research which have no direct benefit still can have positive outcomes- partly through fundamental discoveries, but also enriching us as a species. We also believe these investments will ensure that we have the freedom to be productive in the future.
A cynic might even say that CERN has played a critical role in keeping people from working on military applications, or working for the enemy.
If your criticism (it's hard not to read your comment as an implicit criticism) is that we should invest the results of our productivity more directly into areas which maximize social and economic benefits- sure, this is argued about all the time. The SSC was cancelled, at least partly because people failed to see the value in having a world-class HEP facility in the US.
No, but had cynicism? Of the member states, the highest cost per taxpayer is still less than a bag of peanuts each year, and most people will throw that at the TV over whatever upsets them without thinking. It's collective science, not big pharma, which soaks up taxpayer money and then sells the discoveries back to you with a 1000x markup. And yes, CERN has played an important part in the scientific conversation about where we are in the universe and what it looks like. If you don't think that's important, I think flat earth cults are working just as hard to derail conversations they don't want to join in good faith...