
Store SQLite in Cloudflare Durable Objects

244 points | 2 years ago | ma.rkusa.st
eropple 2 years ago

I read through this entire post going "oh, this is cursed", slowly growing in volume the entire time. As is proper for something this clever.

I've been doing a bit of prodding at Durable Objects lately, in part inspired by the same note that the OP quoted about implementing datastores atop the framework, and this is the sort of environment that I could see being really, really useful at scale in the future. I'm anxious about using it without a public spec and some portability, but the concept is just tremendously cool.

juancampa 2 years ago

What do you mean by "cursed"?

eropple 2 years ago

Substitute "this is a terrible idea and I love it" if you want. ;)

k__ 2 years ago

My guess in this context: "A hacky solution that is probably unstable."

samwillis 2 years ago

This is super exciting!

I see there is concern about Asyncify. I believe in the browser there is a way around this using Atomics and a shared buffer (Absurd SQL[0] uses it), but I suspect this isn't possible in Cloudflare Workers since it works across the worker/main thread boundary? Is it possible to launch another worker from a Cloudflare Worker, in the same process, like a web worker?

I am somewhat hoping that Cloudflare have something up their sleeve to make these sorts of projects easier, maybe even some sort of distributed block store.

0: https://github.com/jlongster/absurd-sql
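
For anyone curious what that Atomics workaround looks like: below is a rough sketch of the pattern absurd-sql uses in browsers, where a worker blocks synchronously with Atomics.wait while an async context services the read and wakes it up. The buffer layout and `fetchBlock` are hypothetical, and (per the reply below) this isn't an option in Workers.

```typescript
// Hypothetical shared layout: bytes [0..3] = ready flag, [4..7] = length, [8..] = payload.
declare function fetchBlock(offset: number): Promise<Uint8Array>; // stand-in for async storage I/O

// Synchronous side (inside a web worker): block until the async side fills the buffer.
function readBlockSync(offset: number, sab: SharedArrayBuffer): Uint8Array {
  const flag = new Int32Array(sab, 0, 1);
  Atomics.store(flag, 0, 0);
  postMessage({ type: "read", offset, sab }); // ask the async side for data
  Atomics.wait(flag, 0, 0);                   // sleep until flag becomes non-zero
  const len = new Int32Array(sab, 4, 1)[0];
  return new Uint8Array(sab, 8, len).slice(); // copy the payload out
}

// Asynchronous side (e.g. main thread): do the real I/O, then wake the blocked caller.
async function handleRead(msg: { offset: number; sab: SharedArrayBuffer }): Promise<void> {
  const data = await fetchBlock(msg.offset);
  new Uint8Array(msg.sab, 8, data.length).set(data);
  new Int32Array(msg.sab, 4, 1)[0] = data.length;
  const flag = new Int32Array(msg.sab, 0, 1);
  Atomics.store(flag, 0, 1);
  Atomics.notify(flag, 0);
}
```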

kentonv 2 years ago

Indeed, Cloudflare Workers does not and likely will not support threads with shared memory, due to Spectre concerns. Asyncify is the way to go here.

samwillis 2 years ago

That's a pity. Is that due to the architecture of sharing a V8 runtime with other customers' workers? How is this different from V8 in Chrome and shared buffers?

kentonv 2 years ago

Chrome is all-in on strict process isolation as their Spectre defense. Given the web platform that they need to support, they don't really have another choice -- precise timers and threads, as well as many other ad hoc timing mechanisms, were already part of the platform before Spectre hit. Using processes for everything is pretty costly, though, to the point that they have to make some compromises which are open to attack: https://www.spookjs.com/

In Cloudflare Workers, process isolation for every worker isn't practical -- if we had to do that then we'd only be able to offer real "edge compute" to a small number of big enterprises with deep pockets, rather than at prices affordable to everyone. Our fundamental difficulty is that we need a single machine to be able to support thousands of running applications where each app may only get a small trickle of traffic, so that we can deploy every application to every one of our edge locations. It's like if you always had 10,000 tabs open in Chrome.

Instead our strategy is to stack a lot of different mitigations that make attacks slower, to the point where they are too slow to be remotely useful[0]. One part of the strategy is a novel defense we designed with researchers at TU Graz called Dynamic Process Isolation[1], in which we use hardware performance counters to detect execution patterns indicative of an attack and move those workers into private processes. For that strategy to work, though, we first need to slow down attacks enough to give ourselves a chance to detect the patterns -- and that requires disallowing precise timers or anything that could be used as a precise timer, such as multiple threads with shared memory. Luckily, we were thinking about timing attacks from the very start of the project (even before Spectre was known), so we were able to avoid ever putting these into the platform in the first place.

In general I think threads are not as important on servers because distributing work across machines is more powerful anyway. We're trying to create a platform where that is really easy.

[0] https://blog.cloudflare.com/mitigating-spectre-and-other-sec...

[1] https://blog.cloudflare.com/spectre-research-with-tu-graz/

psanford 2 years ago

I built something very similar for using DynamoDB as a VFS backing store for sqlite: https://github.com/psanford/donutdb
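
donutdb itself is written in Go, but the core idea is easy to sketch: a SQLite VFS whose "disk" is a KV table holding fixed-size pages. Here's a minimal TypeScript illustration of that page mapping using the AWS SDK v3 document client; the table name and key schema are made up, and a real VFS also needs locking and journal handling.

```typescript
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, GetCommand, PutCommand } from "@aws-sdk/lib-dynamodb";

// Hypothetical table "sqlite-pages", keyed by (db, page), one SQLite page per item.
const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));

async function readPage(db: string, page: number): Promise<Uint8Array | undefined> {
  const res = await ddb.send(new GetCommand({ TableName: "sqlite-pages", Key: { db, page } }));
  return res.Item?.data as Uint8Array | undefined; // raw page bytes, if the page exists
}

async function writePage(db: string, page: number, data: Uint8Array): Promise<void> {
  await ddb.send(new PutCommand({ TableName: "sqlite-pages", Item: { db, page, data } }));
}
```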

Glench 2 years ago

Whenever SQLite comes up, I always have to link to this post about using SQLite in production: https://blog.wesleyac.com/posts/consider-sqlite :)

I'm running https://extensionpay.com off SQLite and a $5/month DigitalOcean box and it's serving around 3 million requests a month with absolutely no issues and seriously low CPU load. I'm kind of astonished, frankly.

nicoburns 2 years ago

> 3 million requests a month with absolutely no issues and seriously low CPU load

That's only about 1 request per second on average, so I can't say I'm surprised. Even if traffic is spiky, that leaves you with a lot of headroom!

Glench 2 years ago

Yeah, thanks for putting that in perspective! ExtensionPay is still relatively new, too, so I'm glad there's plenty of headroom :)

ryanianian 2 years ago

A more useful metric for the parent comment to include would be the peak throughput in ops/time versus CPU or IO load.

bborud 2 years ago

When I develop server software I always include support for SQLite and PostgreSQL so I can run the server entirely stand-alone if I want to. Whether to use the embedded database or connect to PostgreSQL is a command-line switch.

I have been using SQLite in an IoT application where a long-running test instance has accumulated 65 million rows in the main data table in a 3.6 GB database file... and the response time running queries against the database isn't bad. I'm starting to question if I actually need PostgreSQL for a lot of situations.
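
A minimal sketch of that command-line switch, assuming Node with better-sqlite3 and pg (the flag name and the tiny `Store` interface are invented; note the two engines also differ in placeholder syntax, `?` vs `$1`):

```typescript
import Database from "better-sqlite3"; // embedded, stand-alone option
import { Pool } from "pg";             // client/server option

interface Store {
  all(sql: string, params?: unknown[]): Promise<unknown[]>;
}

function openStore(argv: string[]): Store {
  const pgUrl = argv.find((a) => a.startsWith("--postgres="))?.slice("--postgres=".length);
  if (pgUrl) {
    const pool = new Pool({ connectionString: pgUrl });
    return { all: async (sql, params = []) => (await pool.query(sql, params)).rows };
  }
  const db = new Database("app.db"); // no database server needed in this mode
  return { all: async (sql, params = []) => db.prepare(sql).all(...params) };
}

// `node server.js` runs stand-alone on SQLite;
// `node server.js --postgres=postgres://...` talks to PostgreSQL instead.
const store = openStore(process.argv.slice(2));
```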

achillean 2 years ago

We've been using it for https://internetdb.shodan.io and it's been doing a few thousand lookups/second without problems.

rexreed 2 years ago

Do you have any tips for load-balancing or horizontal scaling of SQLite if you have to add a 2nd server for whatever reason?

otoolep 2 years ago

https://github.com/rqlite/rqlite provides a form of horizontal scaling for reads, but it's not a drop-in replacement for SQLite.

https://github.com/rqlite/rqlite/blob/master/DOC/READ_ONLY_N...

(I'm the author of rqlite)

Glench 2 years ago

I don't.

My current understanding is that even though there are sharding / replication projects out there for SQLite, it's usually a better idea to just go with a typical database server when you need many machines. But a lot of sites never get to that point, especially given how fast servers and SSDs are these days, so SQLite seems like it can get you further than you might realize.

bob1029 2 years ago

> But a lot of sites never get to that point, especially given how fast servers and SSDs are these days, so SQLite seems like it can get you further than you might realize.

SQLite on top of NVMe with WAL mode enabled is about as fast as it gets. We've been doing this in production for years now.

We don't have a scenario where we would find ourselves writing more than 2 gigabytes per second to DB/disk, nor do we ever think we would encounter one with our product, so we have committed ourselves to single instance SQLite architecture. Saves a lot of time and frustration when you can accept your circumstances and go all-in on simpler solutions.
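
For reference, turning WAL on is a one-line pragma per database. A sketch with better-sqlite3 (the `synchronous = NORMAL` pairing is a common choice with WAL, not something the comment above prescribes):

```typescript
import Database from "better-sqlite3";

const db = new Database("app.db");
db.pragma("journal_mode = WAL");   // persists in the file; readers stop blocking the writer
db.pragma("synchronous = NORMAL"); // fsync at checkpoints instead of at every commit
```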

Glench 2 years ago

Cool, what does your setup/workload look like?

eatonphil 2 years ago

I just learned about ZeeSQL yesterday, which is SQLite + Redis. It's proprietary, but if I understand it correctly (the marketing is terribly unspecific about their primary goal) it lets you scale SQLite across all your Redis nodes.

https://zeesql.com/

Zababa 2 years ago

I'll be honest, I would consider running SQLite in production, at least for my personal stuff, but the part that scares me is that there's no "managed SQLite" offering and I would have to manage it myself. Litestream seems like a great option for people like me, but I'd like to hear some experience reports on it, especially on more complex cases, and how it fails.

Another problem (which Litestream might solve) is that SQLite is not compatible with Heroku, which I use since I don't have much ops/sysadmin experience. I feel like people saying that SQLite makes things easier often already have some experience in sysadmin/ops, which isn't my case.

This is not a criticism of SQLite in any way, just an expression of my own limitations and why it makes it harder for me to consider SQLite. I'll gladly take any learning resources about that.

benbjohnson 2 years ago

Litestream author here. I'm working on a serverless option that'll run as a managed service later this year. I think there's a huge opportunity to get a simple option for folks running on ephemeral platforms like Heroku, Google Cloud Run, & Cloudflare Workers.

Glench 2 years ago

Ooh hi Ben, thanks so much for Litestream! Can you say more about the service you're describing and how it will work?

Zababa 2 years ago

That's nice to know, I'll keep an eye out for when it comes out.

Glench 2 years ago

The conservative approach — I like it! Yeah it's not for everyone but it's a real delight if your needs align with what SQLite offers.

> there's no "managed SQLite" offering

It's just a file :) You can copy it to a remote server or use Litestream or whatever you want to do with a file.

> SQLite is not compatible with Heroku

Yeah, that's pretty annoying. Apparently fly.io is similar to Heroku but lets you use persistent disks — I've never used it though.

I'd love to see more "managed app server + SQLite" options, since a lot of the time the "managed database server" option from cloud hosts is kind of expensive. Send emails to Heroku and DigitalOcean and Google Cloud Platform. Tell the world!

Zababa 2 years ago

I see how SQLite being just a file makes it easy to do lots of things from a developer point of view. However, I'm looking at this from a "business owner" point of view, at which point it's not just a file, it's my clients' data. That's a bit scary for me, so I have a bias for "reassuring stuff" like managed databases with automated backups. This is why I'm looking for more info on "the hard parts" of managing SQLite. I'm conscious that this bias might not be based on reality, especially since I don't know MySQL or PostgreSQL well, in particular how they store data in files. But as a relatively young developer, I've heard a lot that "data should go in the database" and that databases are big scary things that should be managed by people with lots of experience and knowledge in that role.

I'll take a look at fly.io, thanks.

Glench 2 years ago

Well, maybe to you a "managed database server" feels more reassuring, which is fine, but I run my business off SQLite, which I've found insanely simple to set up and maintain, so that is more reassuring to me.

And just an aside: I think if you looked into it a bit you'd probably find that while there is a lot of nuance to database fine-tuning, you can get most of it without needing to be an expert.

gen220 2 years ago

If you want an even-earlier-published article to point to as well, this has been mine: https://crawshaw.io/blog/one-process-programming-notes

There's also the venerable "Taco Bell Programming": http://widgetsandshit.com/teddziuba/2010/10/taco-bell-progra....

It all boils down to Unix principles, I guess: use the simplest (but not simplistic) tools for the job.

eatonphil 2 years ago

"always have to link" to a post that was written less than a month ago?

Glench 2 years ago

Yes indeed! Even though it's only been a month I keep posting about it in HN threads about SQLite and people keep going "wow I never thought about that!"

_wldu 2 years ago

If you set up SQLite properly (Remote Clients <-> RESTful API <-> DB) and use WAL mode, you can run a DB-backed website fairly cheaply. It will be performant, reliable, and simple to maintain as well.
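
A toy sketch of that Remote Clients <-> RESTful API <-> DB shape, assuming Node with Express and better-sqlite3 (the `items` schema is invented):

```typescript
import express from "express";
import Database from "better-sqlite3";

const db = new Database("app.db");
db.pragma("journal_mode = WAL"); // WAL mode, as recommended above

const app = express();

// Remote clients never touch the SQLite file directly; they only see this API.
app.get("/items/:id", (req, res) => {
  const row = db.prepare("SELECT * FROM items WHERE id = ?").get(req.params.id);
  row ? res.json(row) : res.status(404).end();
});

app.listen(3000);
```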

hu3 2 years ago

Hi. What tool do you use to access the production database remotely? Say you need to browse some data or perform queries. Is there something like pgAdmin?

I'm interested in SQLite and I'd like to know more.

Glench 2 years ago

Copying and pasting my answer from another subthread:

I just scp the whole database to my local machine and use "DB Browser for SQLite" (a Mac app) :)

Would love to hear if there are other server-based db browsers for SQLite, though.
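
One caveat with scp-ing a live WAL-mode database: recent commits may still be sitting in the -wal file, so a copy of just the .db file can be stale or torn. SQLite's online backup API sidesteps this; a sketch with better-sqlite3 (paths hypothetical):

```typescript
import Database from "better-sqlite3";

// Take a consistent snapshot first, then scp/download the snapshot file.
const db = new Database("/srv/app/app.db");
db.backup("/tmp/app-snapshot.db") // SQLite's online backup API, safe on a live database
  .then(() => db.close());
```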

eatonphil 2 years ago

scp is how I implemented querying a remote SQLite database under the hood in my data IDE. Now that I think of it, though, I should probably add some clarification to the docs or the UI that you're querying a local copy of the database and that your edits won't affect the original.

https://github.com/multiprocessio/datastation

jprd 2 years ago

CloudBeaver works great for me, though I'm not doing anything intense.

https://github.com/dbeaver/cloudbeaver/

tmp_anon_22 2 years ago

It's SQLite, so you would just copy the file and download it to a local machine for analysis with any number of tools. Presumably your setup would already be doing this as a form of backing up the data periodically.

irq-1 2 years ago

You can use the command line for SQLite on the server. There are lots of GUI tools if you copy the file locally. Here's one I use: https://sqlitestudio.pl

justinclift 2 years ago

> Is there something like pgAdmin?

A fairly well used GUI is https://sqlitebrowser.org (a project I help out personally).

shedam 2 years ago

You can do that with Adminer: https://www.adminer.org/

buryat 2 years ago

How do you get replication?

Glench 2 years ago

I don’t. I have the one server with streaming DB backups. Although apparently Litestream might include a way to do live SQLite replication soon.

nerdyadventurer 2 years ago

How do you do encryption, authentication and authorization?

Glench 2 years ago

I'm not sure what that has to do with my choice of database. I just do it the way every web app does those things.

logankeenan 2 years ago

How do you manage backups of your database and how often?

Glench 2 years ago

I do streaming updates with Litestream: https://litestream.io/

I recently set it up after putting it off for a while and it took like 5 minutes. It was great. And the backups are currently costing me 0 cents per month on Backblaze.

quadrature 2 years ago

Along these lines, if you're looking for a way to sync your SQLite DB to an S3-like object store, there is https://litestream.io/. It creates a snapshot of your DB on S3 and then handles checkpointing of the WAL.

Glench 2 years ago

Yes, love Litestream! It felt like the missing piece for replacing Postgres or MySQL with SQLite.

wirelesspotat 2 years ago

Does anyone know if it's possible to use Litestream with SQLite running on Cloudflare Durable Objects?

I suppose you'd have to compile Litestream to WASM, or have Litestream running on a different machine and reading SQLite from the Durable Object?

benbjohnson 2 years ago

Litestream author here. I haven’t looked at Durable Objects yet so I don’t think Litestream would currently work with it. I may add support in the future but I’m adding separate serverless functionality to be released later this year. That’ll allow Litestream to run on Cloudflare Workers or GCR or other ephemeral platforms like those.

Glench 2 years ago

Haha check the original post, my friend

WrtCdEvrydy 2 years ago

That is something I was looking for.

hinkley 2 years ago

I really think I want a "JSON" format for SQLite that is a strict subset of the SQLite file format, something that can be generated fairly trivially by other applications and read by SQLite. There are apps that produce JSON that would be better served by something I can run search queries on; jq is nice, but it's not SQLite.

Is there a protobuf implementation of the SQLite file format out there somewhere?
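
Half of that wish already exists: SQLite's built-in JSON functions can run search queries over raw JSON text. A small sketch (better-sqlite3 assumed; the data is invented); it isn't the interchange file format being asked for, but it covers the "query JSON without jq" part:

```typescript
import Database from "better-sqlite3";

const db = new Database(":memory:");
db.exec("CREATE TABLE docs (body TEXT)");
db.prepare("INSERT INTO docs VALUES (?)").run(JSON.stringify({ user: "ada", score: 42 }));
db.prepare("INSERT INTO docs VALUES (?)").run(JSON.stringify({ user: "bob", score: 7 }));

// json_extract digs into the stored JSON text; no preprocessing step needed.
const rows = db
  .prepare("SELECT json_extract(body, '$.user') AS user FROM docs WHERE json_extract(body, '$.score') > ?")
  .all(40);
console.log(rows); // [ { user: 'ada' } ]
```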

rileymat2 2 years ago

I'm having trouble seeing the mapping; I presume this touches on the ORM impedance problem? Unless SQLite dictates the JSON format, but then it probably won't be the JSON you'd like to deal with.

pphysch 2 years ago

PostgreSQL has excellent JSON support today, including advanced queries, indexing, and constraints. There are many cheap/free managed offerings that could rival SQLite in operational simplicity.

If you want a SQL interface to JSON with room to grow, that's your best bet IMO.
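
For a concrete taste of that route, here's a sketch using the pg client (connection string and table are hypothetical): a jsonb column plus a GIN index gives you indexed containment queries.

```typescript
import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

async function demo(): Promise<void> {
  await pool.query("CREATE TABLE IF NOT EXISTS docs (body jsonb)");
  await pool.query("CREATE INDEX IF NOT EXISTS docs_gin ON docs USING gin (body)");

  // @> is jsonb containment; the GIN index keeps it fast as the table grows.
  const { rows } = await pool.query("SELECT body FROM docs WHERE body @> $1", [
    { user: "ada" }, // pg serializes plain objects to JSON for jsonb parameters
  ]);
  console.log(rows);
}
```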

ignoramous 2 years ago

> There are apps that produce JSON that would be better served by something I can run search queries on...

https://duckdb.org/ ?

mathgladiator 2 years ago

I actually think this is onto something that I'm exploring in a different way. Instead of a massive database, what if we had a key-value store mapping keys to tiny databases?

This is, to some degree, what I'm building over at http://www.adama-lang.org/ without a full SQL engine. Each document has tables, and the tables can be indexed. I have yet to find a use case (in my domain) which requires joins. HOWEVER, I've had a ton of fun building it and I'm getting ready to start making games.

I do believe it would be amazing to have a per-key log service where a reducer like sqlite/adama could come in and collapse the log into a single file.

The closest I see is from the Boki paper ( https://www.cs.utexas.edu/~zjia/boki-sosp21.pdf ) which was presented at SOSP21.
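
A toy illustration of the "keys mapped to tiny databases" idea, with better-sqlite3 standing in for the per-key engine (file layout and schema are invented):

```typescript
import Database from "better-sqlite3";

type DB = InstanceType<typeof Database>;
const dbs = new Map<string, DB>();

// Each key lazily gets its own small SQLite file instead of being
// a row in one massive shared database.
function dbFor(key: string): DB {
  let db = dbs.get(key);
  if (!db) {
    db = new Database(`data/${key}.db`);
    db.exec("CREATE TABLE IF NOT EXISTS events (at INTEGER, payload TEXT)");
    dbs.set(key, db);
  }
  return db;
}

dbFor("game-42").prepare("INSERT INTO events VALUES (?, ?)").run(Date.now(), "{}");
```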

dantodor 2 years ago

I see the bundle size after `npm install -S @rkusa/wasm-sqlite` is a little above 4 MB. To my current knowledge, the limit is 1 MB for Workers. A few months ago I asked one of the companies I work with, which uses CF extensively, for a limit increase; it took them almost a month and we got 2 MB. So my question is: what limit increase should one ask for to run this? What was your experience with that? Other than that, great job, and thanks a lot for sharing with the community. Until now I was trying to get DuckDB to run in a Durable Object (they also have a WASM bundle); now I have an excellent alternative. Thanks again!

vdm 2 years ago

SQLite VFS on browser IndexedDB: https://jlongster.com/future-sql-web

liveoneggs 2 years ago

WebSQL could have been...