GitHub degraded performance – resolved

101 points | 2 years | githubstatus.com
bob1029 2 years ago

This one is pretty nasty. Getting tired of the disengagement this causes for the team. It's basically a lost day of productivity even if GH goes down for only 30 minutes. Yes, we can continue coding locally, but issues & PRs are a huge part of our daily process.

When I get back from vacation we are moving our shit to the enterprise plan. $21/user-month is really not that big of a deal when you are running basically your entire business through the product.

I do agree that it's ridiculous to assume that we can manage Github's software better than their own engineers, but at the same time our infrastructure has proven itself to be extremely reliable over the last 4-5 years. Even hosting GH enterprise on public AWS/Azure is more ideal in my eyes now, because I can control the physical region and tenancy. There is an Azure datacenter within 100 miles of many of our home offices and I can ensure that our Github stack spins up there. Minimizing the amount of internet you have to transit to get to your applications can sidestep a lot of this stormy public cloud/internet weather bullshit.

andrewstuart2 2 years ago

I think you will quickly find if you're just deploying GH Enterprise on premises that it is not at all what you get from the GH Cloud offering. GHE has its own product roadmap that is quite a bit behind the cloud product, and in many cases (IMO) unacceptably so. It still doesn't support cache for runners, last I checked, though I've since moved on from the org that required me to work with GHE. I'm back to my happy place with self-hosted GitLab, and a little bit of GitHub cloud.

bob1029 2 years ago

> GHE has its own product roadmap that is quite a bit behind the cloud product

This is exactly what we want though. We don't need the new fancy shit on a regular cadence. Issues, Code, PRs and 1 line checkbuild scripts are all we care about. Everything else is built into our software.

andrewstuart2 2 years ago

What I mean is that it's quite clearly not the same product as cloud. It does have roughly the same road map as cloud, just pretty far behind and/or cherry-picking some features in a different order. My experience with enterprise software is that it doesn't matter what the roadmap is as long as you get to choose if/when the benefits outweigh the risks for an update. And you usually want certain releases to get backported security updates for that same reason. You don't have to take new features and their associated bugs but you do want, and get, security fixes. That's a separate thing, because this would be the case if GHE was just "run your own copy of .com."

What is really rough about GHE is that you can't choose a lot of the features or IMO baseline requirements like caching that you've probably come to expect from github.com, and may have been around for years. At least not until they can get GHE to parity with .com.

eyegor 2 years ago

I'd recommend checking out gitlab for on-prem hosting in an enterprise environment. Works great, integrates with AD/ldap and most of the features are available on the free tier if you want to test it. Practically a drop-in replacement for github.

onionisafruit 2 years ago

> I do agree that it's ridiculous to assume that we can manage Github's software better than their own engineers

That by itself would be ridiculous, but there's more to it than that. Your GHE server won't have new code deployed to it hundreds of times per day the way github.com does. You probably won't be the target of ddos attacks either.

Very few github.com outages are the result of maintenance errors.

cmckn 2 years ago

> It's basically a lost day of productivity even if GH goes down for only 30 minutes. Yes, we can continue coding locally, but issues & PRs are a huge part of our daily process.

I know outages are frustrating, but how does 30 minutes before 10am ruin an entire day? Maybe you’re just being hyperbolic, but people take coffee breaks longer than that.

bob1029 2 years ago

> how does 30 minutes before 10am ruin an entire day?

Not everyone is on pacific time. This impacted us right in the middle of a standup call and disrupted our planning for the day.

Also, the problem is more that you don't know how long the outage is going to last at first, so you start finding other ways to occupy your time. Through the lens of hindsight, yes we are certainly being hyperbolic in those cases where it was only 30 minutes.

jimmaswell 2 years ago

> so you start finding other ways to occupy your time

Ok, pick a ticket and do some work on it locally, when that's done do the same with another. I can go a full day without interacting with github because I'm working on a local branch. Make a note of what branches you need to push later. I can't possibly imagine throwing my arms up and saying the day is wasted because I have to work locally. It's completely unbelievable.
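For the "make a note of what branches you need to push later" step, here is a minimal Python sketch. It assumes you feed it the output of `git for-each-ref` using the tab-separated format string below; the function names and the format layout are illustrative choices, not anything from the thread.

```python
import subprocess

# Tab-separated fields: branch name, upstream ref, tracking status.
# %(upstream:track) prints e.g. "[ahead 2]", "[ahead 1, behind 3]",
# "[gone]", or nothing when the branch is in sync.
FORMAT = "%(refname:short)\t%(upstream)\t%(upstream:track)"


def branches_to_push(for_each_ref_output: str) -> list[str]:
    """Return local branches that have unpushed work.

    A branch counts as pending when it has no upstream at all, when its
    upstream has disappeared ("gone"), or when it is ahead of its upstream.
    """
    pending = []
    for line in for_each_ref_output.splitlines():
        if not line.strip():
            continue
        name, upstream, track = line.split("\t", 2)
        if not upstream or "gone" in track or "ahead" in track:
            pending.append(name)
    return pending


def list_pending() -> list[str]:
    """Run git in the current repository (assumes git is on PATH)."""
    out = subprocess.run(
        ["git", "for-each-ref", f"--format={FORMAT}", "refs/heads"],
        capture_output=True, text=True, check=True,
    ).stdout
    return branches_to_push(out)
```

Calling `list_pending()` once the outage is over gives the list of branches that still need a `git push`.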

nonbirithm 2 years ago

> It's basically a lost day of productivity even if GH goes down for only 30 minutes.

Are we entering an era where if we don't have hundreds of thousands of servers running 24/7 to host our services, with all the resource consumption and environmental implications that result, that we will no longer be able to remain productive as a society? Is this gradually becoming a new baseline for humanity from which we cannot reasonably downsize?

ygjb 2 years ago

Yes.

It's worth noting that you can take almost any software from before the late 90s to early 2000s, depending on the vendor, that is still available, and with a layer of emulation get it running in minutes.

The vast majority of software that is being built today for end users simply will not function in a short time frame because of aggressively built in dependencies on cloud based services, often with those dependencies designed to encourage customer lock-in and prevent piracy by forcing users to have active accounts and shift core logic from endpoints to cloud services.

Even moving past licensing servers and account capabilities, tools like Grammarly ship much of their analysis to the cloud, same for most translation services. Many modern text to speech services are cloud based as well (just look at how useless a modern cell phone becomes when you are without a data connection, for example).

I don't know what the statistics would look like, but I shudder to think how much of the world economy would grind to a halt if Amazon or another significant cloud provider had a sustained, multi-region outage (say 24-48 hours).

It's a god-damn mess, and we did it to ourselves.

bob1029 2 years ago

> I shudder to think how much of the world economy would grind to a halt if Amazon or another significant cloud provider had a sustained, multi-region outage (say 24-48 hours).

You can rest at ease. Nothing that is mission-critical for the world's financial infrastructure is hosted within one of these sorts of facilities. Facebook and Netflix might go down for days, but your Amex will still work at any merchant with a functioning internet connection.

I have been inside of [financial services organization]'s datacenter in [some midwestern state] which was purpose-built for the IT load. The strategy is "failure is not an option". It's essentially 1 gigantic, redundant life-support system for one of the more sensitive computers on the planet. Amazon and Microsoft cannot afford to go to these lengths for the market they serve.

ygjb 2 years ago

I wish I had your confidence in the financial system, but it's not the financial services I am concerned about.

Financial services are important for moving money around, and processing electronic payments, but it doesn't matter how effectively you can process a wire if there are significant supply chain disruptions, and systems failures that take down the platforms that major retailers and distributors use for logistics.

Even if ATMs and bank networks remain up, what about the encashment and physical security services that those institutions rely on to move around actual physical money?

The economy is more than just financial services, and all of those financial services are just proxies for the real world goods that people need to survive for more than a few days in most urban centres.

bob1029 2 years ago

> Are we entering an era

We've been in this era for about 4 decades now. There are mainframes which do payment processing that, if they were to fail, would cause substantial harm to the global economy almost instantly.

sudhirj 2 years ago

Utility technology becomes foundational very quickly. Despite it being very new, humanity is already fundamentally reliant on global supply chains, oil, electricity, networks, satellites and many other technologies that we cannot downsize. We could collectively plan and execute decades-long exit plans, like we do for oil, but outages will bring daily life to a halt.

lmm 2 years ago

This is like being shocked that so many companies now rely on having an electricity supply and can't work during power outages.

jrochkind1 2 years ago

Yes. You're just noticing?

teh_klev 2 years ago

> There is an Azure datacenter within 100 miles of many of our home offices and I can ensure that our Github stack spins up there. Minimizing the amount of internet you have to transit to get to your applications can sidestep a lot of this stormy public cloud/internet weather bullshit.

You've no guarantees that your local'ish data centre is going to be hop-wise, route-wise and peering-wise any better than a DC 1500 miles away, in relation to your home or office ISP.

bob1029 2 years ago

> You've no guarantees that your local'ish data centre is going to be hop-wise, route-wise and peering-wise any better than a DC 1500 miles away, in relation to your home or office ISP.

You're correct. In fact, as I type this reply my cloudflare diagnostics are indicating I am talking to a datacenter 200 miles further away than would otherwise be ideal. That said, it's still within an extremely reasonable distance. This is a "risk" I am willing to take. It's certainly a better starting point than guaranteed 70ms minimums.
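To put rough numbers on the distance argument, here is a small Python sketch of the propagation-delay floor for a given straight-line distance. It assumes light in fiber travels at about two-thirds of its vacuum speed and that the cable follows the straight line, which real routes do not. It is a lower bound only and says nothing about hops, peering, or queuing, which is the point raised in the parent comment.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458   # speed of light in vacuum
FIBER_FACTOR = 2 / 3                # assumption: typical slowdown in optical fiber
KM_PER_MILE = 1.609344


def min_rtt_ms(distance_miles: float) -> float:
    """Lower bound on round-trip time over fiber for a straight-line distance."""
    distance_km = distance_miles * KM_PER_MILE
    one_way_seconds = distance_km / (SPEED_OF_LIGHT_KM_S * FIBER_FACTOR)
    return 2 * one_way_seconds * 1000


if __name__ == "__main__":
    for miles in (100, 300, 1500):
        print(f"{miles:>5} miles: at least {min_rtt_ms(miles):.1f} ms round trip")
```

Under these assumptions the floor is roughly 1.6 ms for 100 miles and roughly 24 ms for 1500 miles; anything above that in a real measurement comes from routing, equipment, and congestion rather than distance.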

saghm 2 years ago

After the first paragraph, I was somewhat expecting this to be about wanting to move off of github due to the issues, so I was kind of surprised to see that you instead decided to start paying them! I don't think you're wrong to decide that or anything; it's just interesting to see that negative experience can drive free users to become paid users when naively I would expect the opposite.

bob1029 2 years ago

It's even more nuanced than that. We already pay them money and are looking to pay more.

lima 2 years ago

> I do agree that it's ridiculous to assume that we can manage Github's software better than their own engineers

Why? It's totally reasonable that your own GHE instance will have better uptime.

Running GitHub.com is much, much harder than a private instance (DB scale-out, load, ...).

Our in-house Gerrit infra and CI has had a significantly better uptime than GitHub over the past year, but we have hundreds, not 60 million users and exabytes of storage :-)

cyberpunk 2 years ago

I would strongly recommend (having used both extensively) going all in on gitlab instead if you have to do a migration anyway.

scaryclam 2 years ago

I don't buy this at all. Even if GH is down for the entire day, how the heck does NO-ONE on your team know what they're doing? Was no-one working on something already? Did nobody pay any attention during planning? Do you not have anything you can do from memory from your backlog?!

If you can't work for a day just because Github is down, then there are bigger problems in your process than github being down. I'm sorry if that sounds harsh, but you're either being hyperbolic or you have some real issues to fix in your team or organisation.

rightbyte 2 years ago

> I do agree that it's ridiculous to assume that we can manage Github's software better than their own engineers

Not really. You can mess with stuff when it suits you with a risk for downtime. Hosting yourself has the same advantage as disabling auto-updates - you are in control of when to break stuff.

wdb 2 years ago

Personally, I had more issues with all the Google Cloud issues that happened this week. It's difficult to debug issues when Cloud Logging stops working. Most worrisome was that Google reported issues in US regions while we only have EU regions.

blitzar 2 years ago

On the plus side, you have a process and in theory it works, or you would be looking elsewhere.

You might have lost a day today, but how many days have you gained thanks to these tools the last month?

dabeeeenster 2 years ago

How does moving to the Enterprise plan help?

onionisafruit 2 years ago

Presumably GP means they will deploy their own GitHub Enterprise server instead of using github.com.

sudhirj 2 years ago

GH Enterprise is run on your own servers, so you'd theoretically run it right in the office. It may not move the needle on actual downtime, but there's some control in the downtime - if the LAN is out no one can work anyway, and upgrades will only happen on your business's lean days, be tested out with the teams that have an appetite for experimentation, etc.

intunderflow 2 years ago

When GitHub is down we legally don't have to do any work, it's in the constitution I swear

m_a_g 2 years ago

Best comment in this thread by far

funOtter 2 years ago

I was just doing some development on my GitHub Actions ... can only assume it was my fault.

roland35 2 years ago

Should have probably left out that sudo rm -rf from your Action

mabbo 2 years ago

In the middle of onboarding with a new company. We're at a critical point of training that requires GitHub.

Oh boy, this is going to be a fun day.

awestroke 2 years ago

The incident lasted less than an hour. How does this affect your whole day?

mabbo 2 years ago

It didn't. But at the time, the trainer was very, very worried.

tyingq 2 years ago

Maybe you could sign up for a GitHub enterprise trial? At least the first few screens seem to be working.

mot2ba 2 years ago

This incident has been marked as resolved. You can check how often notable GitHub incidents capture the audience's attention on HN [0] [1]:

[0]https://news.ycombinator.com/from?site=githubstatus.com

[1]https://hn.algolia.com/?dateRange=pastYear&page=0&prefix=fal...

jrochkind1 2 years ago

Looks like about once every two months?

Which is actually more than I expected, and seems like kind of too much.

contingencies 2 years ago

They were down at least a solid hour last week and didn't even post to their status page. I put in a query and got no response. They then unilaterally closed the ticket and asked me how my support experience was. Time to move to Gitlab. https://news.ycombinator.com/item?id=28874751

jrochkind1 2 years ago

Hmm, how do I get the Github Actions CI to run on all the already existing PR's for which it never ran? Anyone know?

mdaniel 2 years ago
jrochkind1 2 years ago

Not sure. There was no error reported, just a "waiting" message, with no "run" created.

I'm not sure I have any IDs to give to those API calls. Would have preferred something from the web UI.

I just went ahead and created new commits for them all. (Create an empty commit, or --amend to re-commit the last commit with a new timestamp).

Would prefer if there were an easier way to do it, but fighting with http API's I am not familiar with when they aren't immediately apparent and I'm not sure they'll work at all was not that easier way.
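The empty-commit and amend workarounds described above can be sketched in Python as follows. The helper names, the default remote `origin`, and the commit message are assumptions for illustration; the git flags themselves (`--allow-empty`, `--amend --no-edit`, `--force-with-lease`) are standard. The sketch only builds the command lists, and a separate helper runs them, so the plan can be inspected before anything touches a repository.

```python
import subprocess


def retrigger_commands(branch: str, remote: str = "origin",
                       amend: bool = False) -> list[list[str]]:
    """Build the git commands that push a fresh commit to a PR branch.

    amend=False adds an empty commit on top of the branch.
    amend=True rewrites the last commit (new committer timestamp, new hash),
    which requires a force push.
    """
    if amend:
        return [
            ["git", "checkout", branch],
            ["git", "commit", "--amend", "--no-edit"],
            ["git", "push", "--force-with-lease", remote, branch],
        ]
    return [
        ["git", "checkout", branch],
        ["git", "commit", "--allow-empty", "-m", "Re-trigger CI"],
        ["git", "push", remote, branch],
    ]


def run_all(commands: list[list[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

For several PRs, loop over the branch names and call `run_all(retrigger_commands(name))` for each. The empty-commit variant leaves history intact, while the amend variant keeps the history clean at the cost of a force push.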

maherbeg 2 years ago

I'm surprised they don't have a couple of separate clusters that they roll things out to and monitor. Seems like you could have a very stable "high paying customers" cluster that is at the very end of your deployment cycle, after a ton of canary checks along the way have passed.

intunderflow 2 years ago

With Actions they do this, if you're on GitHub Enterprise and run an action it picks machines out of a special pool set aside just for enterprise customers.

judge2020 2 years ago

GitHub Universe is only a week away, could be related to a bad deploy for some feature updates to-be-revealed then.

blondin 2 years ago

i wonder if github ever considered a hybrid approach? an on-prem enterprise solution that syncs with github cloud.

jacobrussell 2 years ago

Good time for a lunch break I guess

encryptluks2 2 years ago

Just noticed GitHub is down for me. Can't access repos :(