Back

SQLite-based databases on the Postgres protocol

165 points9 hoursblog.chiselstrike.com
maxmcd5 hours ago

Wild stuff

- Fork of SQLite with new features https://github.com/libsql/libsql

- And you can embed it in your application and it will really talk to your SQLite/libsql database over the network (subject of this blogpost): https://github.com/libsql/sqld

- Oh, and if you're wondering about backup to S3, they have that too: https://github.com/libsql/bottomless

- Uh, sqld can integrated with this https://github.com/losfair/mvsqlite, so now your SQLite is backed by FoundationDB!?

--

- Meanwhile Litestream exists https://github.com/benbjohnson/litestream/

- Ben is developing https://github.com/superfly/litefs at Fly so that you can talk to your SQLite through a FUSE filesystem. (and work has stopped on the streaming replication effort https://github.com/benbjohnson/litestream/issues/8)

- And, of course, SQLite has announced a new backend that hopes to support concurrent writes and streaming replication: https://sqlite.org/hctree/doc/hctree/doc/hctree/index.html

What a time for SQLite

Presumably all of these things provide a wildly different experience in terms of consistency, availability, latency, complexity, and concurrent use. Would be so nice to read a lengthy blogpost comparing all or some of these with their pros and cons.

maxmcd5 hours ago

Questions for libsql:

- Looks like bottomless does automatic restoring of the database? Litestream seems to avoid this, I assume because of concerns about accidentally creating multiple writers on a rolling-deploy (or similar mistake). Any concerns about this possible footgun?

- Bottomless is a sqlite extension, not a separate process? Pros and cons compared to Litestream?

- Similar questions for sqld. How does sqld handle the lifecycle of deploys and multiple readers/writers talking to the database? Anything a new user should be concerned about?

psarna5 hours ago

First of all, let me start by reiterating the first sentence from the bottomless repo - it's work in heavy progress (and we'll move it under the sqld/ repo soon, to keep everything in one place). Now, answers:

> - Looks like bottomless does automatic restoring of the database? Litestream seems to avoid this, I assume because of concerns about accidentally creating multiple writers on a rolling-deploy (or similar mistake). Any concerns about this possible footgun?

It's a valid concern, but what always happens on boot is starting a new generation (a generation is basically a snapshot of the main file + its continuously replicated WAL), distinguished by a uuid v7, the timestamped one. So even if a collision happens, it would be recoverable - e.g. one of the stray generations should be deleted.

> - Bottomless is a sqlite extension, not a separate process? Pros and cons compared to litestream?

The only con I see is that a bug in the extension could interfere with the database. As for pros: way less maintenance work, because everything is already embedded, we're also hooked into the database via a virtual WAL interface (libSQL-only), so we have full control over when to replicate, without having to observe the .wal file and reason about it from a separate process.

> - Similar questions for sqld. How does sqld handle the lifecycle of deploys and multiple readers/writers talking to the database? Anything a new user should be concerned about?

There are going to be multiple flavors of sqld, but the rough idea would be to trust the users to only launch a single primary. In the current state of the code, replicas contact the primary in order to register themselves, so the model is centralized. Once we build something on top of a consensus algorithm, leader election will be pushed to the algorithm itself.

maxmcd5 hours ago

Very cool, thank you for the insights

aaviator425 hours ago

I wrote a small PHP library that gives you a key-value storage interface to SQlite files: https://github.com/aaviator42/StorX

I've been dogfooding for a while by using it in my side projects.

And there's a basic API too, to use it over a network: https://github.com/aaviator42/StorX-API

vdm59 minutes ago

ICYMI, similar concept for python https://dataset.readthedocs.io/

houqp4 hours ago

bottomless looks really nice, thanks for sharing!

houqp4 hours ago

Very cool and well executed project. Love the sprinkle of Rust in all the other companion projects as well :)

The ROAPI(https://github.com/roapi/roapi) project I built also happened to support a similar feature set, i.e. to expose sqlite through a variety of remote query interfaces including pg wire protocols, rest apis and graphqls.

Thaxll6 hours ago

Basically re-inventing MySQL / PG but worse. Next step, we don't have auth over the network, let's bake in RBAC into SQLite.

vidarh5 hours ago

From what else they're doing, it appears the point is to be able to use sqlite in serverless setups where MySQL/Postgres would be way too heavy to deploy on a per-customer or per-function basis.

matesz5 hours ago

Honestly I don't get the point of edge. Do people really care whether their website loads in 50ms vs 300ms?

mirzap5 hours ago

Of course they care. It directly impacts on conversion rates. I'm still stunned with simple fact that most people think 100ms doesn't make a difference (for their business).

https://www.globaldots.com/resources/blog/how-website-speed-...

+1
billythemaniam5 hours ago
matesz5 hours ago

Can't they just cache let's say these 1000 marketing, sales and customer support pages into cdn then?

vidarh5 hours ago

To take a different tack then the other two (at time of writing) replies to this: It's not just at the edge. Being able to add databases with wild abandon and just provide a key to an S3 compatible bucket to replicate it to, and not have to worry about any server setup or replication is appealing whether or not your setup is otherwise centralised. For some setups you do want to share data across user accounts, but for others, isolation along customer or individual user boundaries is not just fine but an advantage (no accidentally writing a query that returns too much data). And then, it's a plus if having a million databases and selectively spread them out over your servers is trivial.

VWWHFSfQ5 hours ago

One request being 50ms or 300ms probably doesn't really matter.

But 10,000 requests _per second_ being 50 or 300ms matters a lot

maxmcd5 hours ago

Sure it's on a bit of a hype wave at the moment, but yes, most internet users are mobile users that don't live in major metropolitan areas with robust internet infrastructure? I think at least having the option to serve some things at the edge can dramatically improve the browsing experience for those users (and many others!).

matesz5 hours ago

I never thought people take restricting access from within database seriously. Like row security policies [1] in postgres. But now as I look at it, it must be taken seriously, just because those features exist.

Is anybody here using it in production with success?

[1] https://www.postgresql.org/docs/current/ddl-rowsecurity.html

sally_glance5 hours ago

Jep, my company uses Postgres RLS for a pretty big project for a client in the finance sector. It's the foundation of our multi-tenant setup there. We connect with a single DB user, but set a session variable at the start of each transaction. This session variable is then used by the RLS policies to filter the data.

Works like a charm, you basically get to build your app like it was dealing with a single tenant DB. Just make sure it's as hard as humanly possible for application developers to forget setting that tenant ID when they query... In our case we have a custom JPA base repository which takes care of that, and some PMD rules to catch rogue repos/queries.

aobdev5 hours ago

If I understand correctly, it seems to be a cornerstone of Supabase’s Authorization features. https://supabase.com/docs/guides/auth/row-level-security

pphysch5 hours ago

> But now as I look at it, it must be taken seriously, just because those features exist.

You can do pretty much everything within Postgres, from ETL to Authz to serving HTML templates and GIS apps. However, that doesn't mean you should, or that anyone is seriously using it in production on a large scale (after evaluating alternatives).

giraffe_lady4 hours ago

Yes I've used it a bunch in production it's great. It is highly valued in some PII-conscious fields for compliance reasons that I don't fully understand, but becoming competent with postgres RLS has been a huge benefit for my career.

The main practical downside is that it is more precise and rigorous than your app- or business-level auth logic is likely to be, and has no real escape hatches. If you're trying to drop it on an existing system you're going to spend a lot of time going back and forth with whoever to shake out edge case ambiguities in your rbac policy that weren't obvious or relevant before.

ronanyeah3 hours ago

I've been using render.com to host Rust+sqlite stacks recently, and I'd like to leverage this.

Should my Rust server be running sqld, and I would add a passthrough endpoint through its existing http api to the sqlite?

Or alternatively, should I be using sqld locally to access my sqlite instance through SSH or something?

cutler6 hours ago

Excuse my ignorance but isn't the whole point of SQLite that it's embedded, not clint-server?

bob10296 hours ago

For us, a major point of SQLite is that we can execute queries in ~100uS when deployed on top of NVMe storage. Everything being in the same process is 50% of the value proposition for our product. Once you involve a network and another computer, it gets a lot slower unless you are doing RDMA-style shenanigans with very expensive hardware.

The other 50% of the value proposition is simplified operations in a B2B setting.

MuffinFlavored5 hours ago

> For us, a major point of SQLite is that we can execute queries in ~100uS when deployed on top of NVMe storage.

Postgres can't achieve these query times?

irq-12 hours ago

A Dev once told me that running SQL Server and IIS (Microsoft webserver) on the same machine would allow shared memory. Just pass a pointer from db to web; no copying of the data. Postgres doesn't do this AFAIK.

bob10295 hours ago

How long does it take to round trip through the network stack of 2 machines in a typical datacenter?

+1
MuffinFlavored4 hours ago
Kinrany6 hours ago

Ideally you want both an embedded database, a library for manipulating requests understood by the database, and a few libraries for exposing the same interface over different networks.

bawolff4 hours ago

I don't get why you would want this. It seems like its taking away all the key advantages of the sqlite approach and leaving only the disadvantages.

chasil6 hours ago

This is quite an accomplishment. It is unfortunate that it could not be done in the main project.

drej6 hours ago

Note that this already exists on top of SQLite proper - authored by Ben Johnson (Litestream, Fly.io etc.) - https://github.com/benbjohnson/postlite

maxmcd6 hours ago

I think they are quite different it seems. Postlite expects you to connect over the postgres wire protocal. Sqld is compiled into your application so your application behaves like it's talking to an embedded sqlite, the calls are then made over the network (using one of three available transports) before being returning to your application.

MuffinFlavored5 hours ago

Dumb question, with all of this newfangled WASM stuff, why couldn't we also bake the Postgres server (and then client) into the code? I know the WASM runtime would need to expose the low-level ability to do filesystem operations, mmap()ing, network sockets, etc.

Existenceblinks6 hours ago

Enlighten me, if it needs a host different network location than the app, what is the advantage over postgres, is it because it's easier to "setup" and "maintain?

glommer3 hours ago

* extremely easy to get started. * unmatched testability, since you can now run the same code in CI and production and pass .db files to your tests. * extremely cheap and lightweight replication.

Existenceblinks3 hours ago

What's kind of codes that run on CI AND production? You mean passing a prod-clone.db to test suits instead of fixtures?

vidarh5 hours ago

Given their webassembly sqlite function support and other changes, the advantage would seem to be that you're most of the way there to being able to provide "serverless" databases. Have a proxy handle auth, spin up an instance if it's not running/shut it down after a while, and add syncing of changes to/from object storage, and you're there.

Existenceblinks4 hours ago

> syncing of changes to/from object storage

So you are suggesting to have both the "serverless" db and the storage? If so, you now have 3 problems.

vidarh4 hours ago

I'm describing capabilities that already exist. Reliably streaming sqlite data to object storage is not a new thing and supported by multiple separate implementations at this point.

+1
Existenceblinks4 hours ago
jimperio5 hours ago

Interesting, hadn't thought about it like that but it makes sense. Used SQLite for a desktop app but never thought it would make any sense on the server.

_4482 hours ago

A noob question, how does one specify unique constraint on a column in SQlite?

philipwhiuk4 hours ago

"libSQL" is a silly name. It's not a library is it?

I wish people would at least vaguely try to be helpful naming products.

zffr4 hours ago

From the SQLite homepage:

"SQLite is a C-language library that implements a small, fast, self-contained, high-reliability, full-featured, SQL database engine."

Wouldn't a fork also be a library?

glommer4 hours ago

of course it is a library. An embedded database is essentially a library that you add to your application.

That you can build stuff around it, doesn't invalidate the fact that the core of it is a library.

Spivak4 hours ago

It seems like it’s a library you’re meant to embed in an application and that sqld is just one application built on libSQL.

It does seem like it makes more sense to call it something that alludes that it’s postgres

newaccount20215 hours ago

[dead]