Sadly, we have all seen these promises: "X makes Y much easier". But you cannot make complex things easy without removing lots of functionality.
Kubernetes is actually pretty easy for the basics, despite what people say; I got a fairly simple cluster running with little pain. But it doesn't take long before you want multiple clusters, or VLAN peering, or customised DNS, or... and that is when it becomes complex.
What will fly.io do? Probably what everyone else does: start simple, become popular, and then cave in to the deluge of feature requests until they end up with an Azure/AWS/GCP clone. If it stays simple, lots of people will say you will outgrow it quickly and need something else; if you increase functionality, you lose your USP of making infrastructure easy.
I think perhaps the abstractions are the problem. If you are abstracting at the same level as everyone else (e.g. Docker images, orchestration, etc.), then I don't understand how it can ever work differently.
To make my point: the very first comment below (above?) is about container format, a really fundamental thing that noobs are not likely to know about; they will just immediately hit some kind of error.
You definitely can't make complex things simple just by removing features.
What we did, instead, was build low-level primitives, then build opinionated PaaS-like magic on top of those.
If you're running a Phoenix app, `fly launch` gets you going, then `fly deploy` gets you updated.
If you want to skip the PaaS layer and do something more intense, you can use our Machines API (or use Terraform to power it) and run basically anything you want: https://fly.io/docs/reference/machines/
We are very, very different from k8s. In some ways, we're lower level with more powerful primitives.
We probably won't build an AWS clone. I don't think devs want that. I also don't think devs want a constrained PaaS that only works for specific kinds of apps.
I think what devs want is an easy way to deploy a boring app right now, and underlying infrastructure they can use to solve bigger problems over time.
> What will fly.io do? Probably what everyone else does, starts simple, becomes popular and then caves
No, mrkurt will not cave, I can guarantee you that. Fly will be a platform that says no to feature requests that don't make sense for their customer base.
I have no affiliation with Fly, other than I've used it on and off since the beginning of the platform's existence. They're a veteran team that knows how to build platforms. I definitely trust them to go in the right direction with their roadmap, and all my new projects go on Fly.
You should look at the Multi-region PostgreSQL feature they did, not their best work imo.
Hey! I love this feature. What don't you like?
> No, mrkurt will not cave, I can guarantee you that.
Fly is not your typical startup with dreams of becoming the next big corp monster.
They are just a bunch of talented people with a vision having fun making cool stuff.
Not to break your rose-tinted glasses, I think fly.io is pretty cool too. But it's a for-profit company with outside investment, coming from investors who expect a return on their investment, one way or another, and they likely have some influence on how the company will act.
The same has been said for every company that has ever taken outside investment. "But no, Heroku/Figma/GitHub/X are different, they really do care about their users and would never sell/go public/Y", and then a couple of years later we end up in the same position.
It might not even be up to the "bunch of talented people" in the end what they have to do to survive or to grow. But grow they have to, unless investors are fine with getting their ROI over 10-50 years rather than 1-10 years. And growing usually comes with some pain.
This is a 10-50 year company. You're not wrong, though, we are either building something big or we're building something that will fail. Our bet is that a developer first public cloud is important and needs to exist.
Which means, there will be one of three outcomes:
1. We are correct, and manage to build the right thing. We'll get to work on this forever.
2. We are correct, but not the right group to build it. We fail.
3. We are incorrect, and the world doesn't need a public cloud for devs. We fail, and I become a carpenter.
We have the same incentives as our investors. That doesn't mean it'll work. It does mean that we all believe that we're building a product for developers.
We're pretty good at surviving, so far. And there are early signs that we're good at growing. There's reason to be hopeful. :)
Have other people invested money in them? If that's the case, sooner or later they don't call the shots, but rather who owns the capital and wants it to grow.
I think if you look at the companies you like, you'll notice that it's not the investors who call the shots. There are a handful of very large, very profitable, developer focused companies that investors love because they remain developer focused.
There are also companies that never figured it out because "developer focused" is not the right business model for them. Those are, I think, the companies that make us all feel burned.
Heroku is one of those companies where "developer focus" is not the right business model. Salesforce has a model, it's working very well, Heroku's doesn't fit.
> it doesn't take long before you want multiple clusters, or vlan peering, or customising the DNS
Then, Fly is not for such an application. Just not yet. I mean, we wouldn't buy a snowboard and complain we couldn't go skiing. Different tools.
The point really is: for the kind of thing Fly is capable of (and, to an extent, other NewCloud services like railway.app, render.com, replit.com, convex.dev, workers.dev, deno.com, pages.dev, vercel.com, temporal.io etc), you're better off NOT using AWS/GCP/Azure. I certainly have found it to be true for whatever toy apps I build.
> Sadly, we have all seen these promises, "X makes Y much easier" but you cannot make complex things easy without removing lots of functionality.
There's certainly a limit, but it makes me so sad that developers see the current state of orchestration and say “welp, it's a complex problem, guess this is as good as it gets” (not you specifically, but it's a common sentiment on HN.)
Sure, there will always be use cases that require getting down to a lower level, but there's definitely space for reducing complexity for common use cases.
It sounds like you’re commenting in general terms without having looked at what Fly.io is actually doing. Yes, choosing the right abstractions is the problem, and what makes Fly.io really interesting is that they chose different ones. It’s really well explained in their docs and blog posts.
> "X makes Y much easier" but you cannot make complex things easy without removing lots of functionality.
The question is whether customers need or want that "functionality".
Spent last night migrating a Discord bot from Heroku to Fly as a result of the upcoming closure of the Heroku free dynos. Overall fairly painless, though I opted to provide my own images rather than use a source-to-image pipeline and discovered one little quirk: images require a Docker V2 formatted manifest. I use Buildah and Podman in my workflows which default to the OCI format. This was simple enough to solve once I figured it out, but I only found one or two forum posts on it and spent a lot of time trying to figure out why the deployment couldn't find an image that I manually pushed to registry.fly.io.
For others, you can adjust the format with `podman build --format=docker` (or set `BUILDAH_FORMAT=docker`) or possibly you can push it separately?
Wonder why they don't use OCI format.
Yeah, I use Buildah for creating my images, so the environment variable or specifying `commit --format docker` would work. Before I did that, I just re-pushed the image itself using the v2s2 format, so:
```sh
# podman inspect quay.io/my/image:latest | jq '.[0].ManifestType'
"application/vnd.oci.image.manifest.v1+json"
# podman push --format v2s2 quay.io/my/image:latest registry.fly.io/my-app:latest
```
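For what it's worth, the two formats differ mainly in the manifest's `mediaType`; here's a quick sketch (plain Python, nothing Fly-specific — the media-type strings are the registered ones) for classifying a manifest if you're staring at one wondering why a push or deploy was rejected:

```python
import json

# Registered media types that distinguish the two image manifest formats.
DOCKER_V2 = "application/vnd.docker.distribution.manifest.v2+json"
OCI_V1 = "application/vnd.oci.image.manifest.v1+json"

def manifest_format(manifest: dict) -> str:
    """Classify a parsed image manifest by its mediaType field."""
    media_type = manifest.get("mediaType", "")
    if media_type == DOCKER_V2:
        return "docker"
    if media_type == OCI_V1:
        return "oci"
    return "unknown"

# An OCI-built image (the Buildah/Podman default) is what tripped up the deploy:
print(manifest_format({"mediaType": OCI_V1}))  # oci
```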
There are a bunch of rough edges.
Your server doesn't have a static IP for outgoing requests, so to use it with RDS, you can't just open up a port on the RDS side. (They want you to set up your own proxy) https://community.fly.io/t/set-get-outgoing-ip-address-for-w...
The NFS kernel module isn't installed, so you can't use EFS. (They suggest some 3rd party userland tool)
They expect you to set up VPN access with Wireguard for any connections to your containers. You can't just SCP your files to a volume. It's so much more hassle than connecting with kubectl cp or scp, especially if you're hoping to script things.
All that said, I'm happy to see competition in the "we'll run your docker image" space.
This all seems fair. Just a note: we don't generally expect our users to "set up" WireGuard; we bake it into `flyctl`, our CLI, which automatically brings up WireGuard sessions as needed. A month or so ago I merged a PR to add sftp support to our SSH server; sftp probably works now with "native" WireGuard, and when I get unburied I'll work out an sftp client interface for `flyctl` itself.
It's a rough edge, to be sure! I just wouldn't want to leave it as "we think the status quo is the right way to handle getting files to and from instances".
I like Fly but can't migrate my apps to them because they don't support cron jobs - they've been promising it for more than a year now: https://community.fly.io/t/recurring-scheduled-tasks-like-cr...
I run cron jobs on fly, using a process. Eg with a Rails app running Puma:

```toml
[processes]
web = "bundle exec puma -C config/puma.rb"
worker = "config/cron_entrypoint.sh"
```

And the shell script looks like this:

```sh
#!/bin/sh
printenv | grep -v "no_proxy" >> /etc/environment
cron -f
```

Although I would say that cron isn't a great solution for containerized apps on most platforms; it seems like scheduled processes need a rethink for today's infra.
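If you'd rather not ship cron in the image at all, the same effect can come from a tiny in-process scheduler. A minimal sketch (plain Python, not anything Fly provides) of the cron-style interval alignment such a loop needs:

```python
def next_run(now: float, interval: float) -> float:
    """Next tick aligned to the interval boundary, like cron's minute grid."""
    return now - (now % interval) + interval

# With a 60-second interval, a task that last checked at t=125s
# should fire next at t=180s (the next minute boundary).
assert next_run(125, 60) == 180
assert next_run(180, 60) == 240  # exactly on a boundary -> the next one
```

A worker process would then sleep until `next_run(time.time(), interval)` and invoke the task, which keeps the schedule drift-free even if a task run is slow.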
We're building serverless scheduled processes, queues, and event-driven systems at https://www.inngest.com/. It's early days, and we agree — they've needed a refresh to adapt to modern practices.
"Although everybody talks about how Kubernetes, for example, is complex and hard, most companies out there use it"
I am very skeptical of this claim.
Me too, though "Most of the Fortune 500" or similar is probably true. It tends to find its way in somewhere if the org is large enough.
I've been using Fly for 2 years now. Overall I'm very happy and I recommend it to everyone.
If you just want to run a container in multiple regions with anycast, Fly is really the best option out there IMHO. Nothing comes close.
There are some rough edges for certain use cases but they keep polishing the service and the DX keeps getting better.
Personally the only features I'm missing today are:
- PG backups/snapshots. AFAIK these are coming in the form of virtual disk snapshots.
- Scale apps from zero to, say, 100 VMs like Cloud Run does. There's some autoscaling right now and the machines API, but it still needs more polish, especially for certain use cases like concurrent CPU-heavy tasks (video encoding, etc). AFAIK some form of this is also coming in the next months.
I skimmed through their docs but couldn't find whether they:
* support for arbitrary apps, not only HTTP/web apps like Heroku; let's say I want to deploy a SIP app
* support for HTTP/2 and potentially HTTP/3
If they do support these two, I would say it's enough to be considered a Heroku killer.
You can route arbitrary TCP or UDP services through our load balancing layer just fine. I'm not sure what SIP actually needs, but it might work. We don't currently have a way to route TCP connections directly to individual VMs, so stuff like WebRTC doesn't work.
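For reference, raw TCP routing of this kind is expressed as a `[[services]]` entry in `fly.toml`. A sketch, assuming your app listens on SIP's default port 5060 internally (the ports here are illustrative; check the current docs for exact syntax):

```toml
[[services]]
  internal_port = 5060   # what your app binds to inside the VM
  protocol = "tcp"       # "udp" is also accepted

  [[services.ports]]
    port = 5060          # the public anycast port
    handlers = []        # empty = raw passthrough, no TLS/HTTP termination
```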
We do support HTTP/2.
I absolutely love Fly. Cannot recommend them enough for the right use cases.
Could someone elaborate on the right "use cases"?
Just curious about other people's definitions, as I imagined Fly.io as a more self-served BaaS (mostly for web applications)
My guess would be something like:
- startups, small teams, no devops.
- horizontally scalable monoliths.
- no complex infra needs.
Haven't used fly.io, but I'm strongly considering migrating over from Heroku, and the above is roughly what I would say is a good fit for Heroku, so hopefully it's the same on fly.io.
I haven’t done it myself yet (soon), but they seem great for even multi service apps. All apps in an org are automatically connected to each other via a wireguard network. Internal communication and service discovery is provided by special internal dns resolution.
It feels like the simplicity of Heroku deployments with the power and security of an AWS VPC.
The big win is web applications that are read heavy and have geographically dispersed users.
Coming from a devops background, I kind of wish they didn't support arbitrary docker containers, as I found the experience fairly painful (going through one of their guides), and soured me from using Fly at all. Maybe it was the app's fault, but troubleshooting was not intuitive at all, especially on the networking side.
You'd prefer if they only supported buildpacks/nixpacks? I can see the appeal from a simplicity standpoint, but supporting containers expands the use cases significantly.
The thing about all these tools is that starting out with a simple VPS is much more cost effective. https://contabo.com has 4 cores for 5€, and you can scale up to 10 cores and 60 GB RAM for 27€ a month. And if you need dedicated cores, 12 cores with 96 GB is 130€, vs 8 dedicated cores with 64 GB RAM for 550€.
Choose a docker image and just docker-compose up your application.
If you outgrow that, you might as well switch to kubernetes and aws/gcp/azure.
If you don’t need edge compute (which you do if you have customers dispersed geographically) then what you say is true.
But if you do no amount of Kubernetes on the old school cloud providers is going to get you there. You will encounter the hard problems fly solves for.
Is this contabo reliable? Their prices are extremely low (way lower than Hetzner's).
How are they offering such low prices? Overprovisioning users?
One possibility is that they have fewer regions than something like AWS, so they can put their data centers somewhere where they get favorable electricity/cooling costs.
It seems Fly.io originally focused on Elixir, but has identified the product-market need for app server + Postgres.
We didn't know about Fly.io when we chose GCP for this setup.
The initial setup on GCP was exceedingly painful. We used CloudRun for our app server, with the value prop being that "it just works". It didn't. Our container failed to start with zero logs from our servers. Stackdriver was of no help. Eventually we found a Stackoverflow thread revealing that CloudRun didn't like Docker images built from Macs. As always, GCP's official docs and resources are incoherent. GCP docs address a hundred things you don't care about, and the signal-to-noise percentage is in the low teens, if we're being generous. We had to chase down half a dozen bureaucratic things to get our CloudRun app to see and talk to CloudSQL. Apparently with Fly.io, you just run a command to provision Postgres, and pass in an environment variable to your app.
We consoled ourselves that GCP was difficult to set up but would now be set-and-forget. This is also a lie. This week we saw elevated and unexplained 5xx errors. First, CloudRun randomly disconnected from CloudSQL. While AWS measures reliability in terms of 9s, GCP DevRel's response to this bug was that it's a distributed system, and therefore a reliably human-reproducible 1%+ error rate is just acceptable. Yesterday we saw botnet traffic scanning for vulnerabilities on our app. That happens to anyone on the web and isn't inherently GCP's fault. We have GCP's Cloud Load Balancer set up, but it's not very smart. We were able to manually block specific IP addresses, but it's nowhere near as meaningful as Cloudflare. Not a fan of Cloudflare the company, but their products address a need. The botnet somehow knocked over our "Serverless VPC connector" to CloudSQL. That connector is basically a proxy server you are forced to set up because CloudRun can't actually talk to CloudSQL directly. All the auto-scaling claims of GCP's serverless are diminished if you're forced to introduce a single point of failure like this into the loop. The serverless VPC connector also requires a minimum of 2 VMs, so CloudRun's scale-to-zero is no more.
Our experience with GCP is constantly having to come up with workarounds and address their accidental complexity. This should not be the customer's problem. For example, CloudSQL doesn't have an interface to query your databases. If you use a private IP for security, you can't even use GCP's command line tools to access it. We found out that GCE VMs are automatically networked to talk to CloudSQL, so we ended up creating a "bastion" GCE VM instance and setting up Postgres CLI tools in order to do ad-hoc queries of our DB state. For this we just needed the cheapest VM, but GCP makes even that difficult. As for Stackdriver, it still has an annoyingly painful UI.
If I remember correctly Fly weren't focused on Elixir at first, and instead were just trying to be a better Heroku, but then one of their developers realised Elixir would be a good fit for one of the components they were writing, and off the back of that realised "Heroku, but with a private network to allow comms between nodes" was a really good fit for Elixir applications.
Yes, the platform came before the Elixir love; it just happened that Elixir is the development ecosystem that best takes advantage of a distributed platform with private networking, and that Phoenix LiveView really screams when you can run it within 10s of milliseconds of all your users.
I had a good experience doing the test app with fly, I’d definitely consider it for something internet routable and non-complex like a bot or something. Very excited for when I can run more complicated workloads such as workloads started with Docker Compose, without having to implement all the Docker Compose functionality myself.
Docker containers are a good deployment unit. At this point, producing them and publishing them to a container registry is pretty straightforward. That's not something that requires devops people. Any reasonably senior backend developer ought to know how to do this or can learn how to do it in a few hours.
The rest is just deploying and running containers. There are lots of ways to do that. I loved using Google Cloud Run a few years ago. Stupidly easy to get started with and flexible enough for many things. With some service discovery on top, it's perfect for a lot of stuff. Add some managed middleware & databases to the mix and you essentially have a close to zero ops CI/CD capable environment. No devops needed for this either. When I did this for the first time, I was up and running with our dockerized app in about 15 minutes. Most of that was just waiting for builds to finish.
I'm currently CTO of a company, and in past projects I've gotten sidetracked with enough lengthy and super expensive devops-type stuff that I'm purposely avoiding going near certain things: not because I can't do them but because I don't think they're worth spending any time on for us right now. So, no terraform, no kubernetes, no microservices. I just don't have the time or patience for that stuff. We run a monolith. So, there's not a lot I actually need from my infrastructure. I need it to be fast, secure, and resilient and able to run my monolith. But I don't need things like service discovery, complicated network setup (a bog-standard VPC is fine), and all the other stuff that devops people obsess about.
We use a load-balancer, I clicked one together in the Google UI. It's fine. Ten minute job. Doesn't need terraform scripting. We have two of them. And we have a couple buckets and our monolith behind that. I could grab the gcloud command that recreates this thing and put it somewhere. But I have more urgent things to do.
For deployment we use simple gcloud commands from github actions to update vms with new instance templates to tell them to run the latest container that our build produced. We started with cloud run but our monolith has a few worker threads that we don't want killed so we moved it to proper vms. Very easy to do in Google Cloud.
Our deploy command does a rolling restart. We have health checks, logging, monitoring, alerting, etc. Could be better but it works. Initial provisioning of the environment was manual and we scripted together all the commands that are part of our deploy process for automation. We added a managed redis, database, and elasticsearch to this. None of that was particularly hard or worth automating to me. Yes, it's bit of a snowflake. But not that complicated and I documented it. So, we can do it again in a few hours if we ever need to.
The dirty little secret of a lot of devops is that it's a lot of over-engineered YAGNI stuff that is super labor intensive to set up and maintain, and you end up using it a lot less often than people think.
This is why freelance devops engineers are so in demand: this stuff just requires a lot of manual work! Companies need these people full time and usually more than just one. The devops alone can add up to hundreds of thousands of dollars/euros per year.
It's a lot of manual work that probably should be automated. However, hiring a lot of people at great expense to automate things that are cheap and not that complicated is not always the best use of resources. I've seen companies that spend an order of magnitude more on devops salaries than on the actual hosting bills. If you think about it, that's kind of weird to be spending so much for so little gains. And most of these companies are not particularly big or experience enormous scaling issues.
Fly made infrastructure really hard for me because their hypervisors cpuid emulation broke my program.
Yes, fly employees, I will file a bug somewhere - or email me.
Does fly recommend any good tools for measuring client latency (which is their big selling point)? I know they offer graphs for server latency.
Ah. Full circle back to heroku.
Without PITR for postges I don’t think so.
There is nothing easy about them; cheaper, yes (only for the lower instances), but not easier. Until they have first-class, evergreen support for Heroku buildpacks, they are a subpar replacement.
They are also missing a proper managed database with point-in-time backups through the web UI like those offered by most proper PaaS services.
We aren't going to support Heroku buildpacks. We are working to make more frameworks easy to launch with minimal configuration, which means hiring people to work specifically on those frameworks + docs + builds.
We did bake nixpacks into our CLI recently, they seem better for our particular environment than buildpacks. Railway.app did a great job with these: https://nixpacks.com/docs/getting-started
We're working on managed databases, but we're not doing them like Heroku did. We just launched a preview of managed Redis with Upstash: https://fly.io/docs/reference/redis/
This seems like the future of managed databases on a platform like ours. There are companies that build very good managed database services. We're getting to the size where these people will work with us. Getting well managed DBs onto the platform is basically what I'm spending all my time on these days.
Incidentally, we're a lot cheaper than Heroku because we run our own infrastructure.
It's not Fly's fault that Heroku decided to remove their free tier. They do have buildpacks for the most popular stacks. Eg: you can deploy a Node app with just `fly launch`.
As for the managed DBs, one of their founders was from Compose, so yeah, they know how these things work. But AFAIK Fly doesn't have much interest in DBs; their focus is really on VMs.
> point-in-time backups through the web UI
Is `fly postgres create --name restoredDb --snapshot-id backupId` so hard that it's a deal breaker?
> support for Heroku buildpacks
I haven't tried it but there's some buildpack support: https://fly.io/docs/reference/configuration/#builder
It's not a managed database, if things break at 3am, you have to fix it yourself. The buildpacks are an afterthought using third-parties. A real evergreen solution will track the official Heroku solution (since fly.io doesn't even bother to document their buildpacks) to the nearest day. The whole point of the P in PaaS is that somebody else does the DevOps. If I have to do so much DevOps, then there's no point in adopting Fly. I am getting the impression that users here only care about their shiny blogposts and content marketing, versus the actual experience as a business user.
I'm going to make a point here that might be contentious: you are asking for Heroku. Heroku is going away because it no longer works as a business.
We are not Heroku. It is ok for you to not like what we're doing. We're building something different. We've never even _said_ we were a Heroku alternative, we just liked their UX for deploying apps and decided to roll with something similar.
I also agree about the importance of fully managed databases!
We shipped "automated" Postgres because we couldn't get any fully managed DB providers to pay attention to us when we were small. I expect we'll have an option running on Fly infrastructure in the next six months.
Our Redis is fully managed, so you can get an idea of how it might play out: https://fly.io/docs/reference/redis/