Wild stuff
- Fork of SQLite with new features https://github.com/libsql/libsql
- And you can embed it in your application and it will really talk to your SQLite/libsql database over the network (subject of this blogpost): https://github.com/libsql/sqld
- Oh, and if you're wondering about backup to S3, they have that too: https://github.com/libsql/bottomless
- Uh, sqld can be integrated with https://github.com/losfair/mvsqlite, so now your SQLite is backed by FoundationDB!?
--
- Meanwhile Litestream exists https://github.com/benbjohnson/litestream/
- Ben is developing https://github.com/superfly/litefs at Fly so that you can talk to your SQLite through a FUSE filesystem. (and work has stopped on the streaming replication effort https://github.com/benbjohnson/litestream/issues/8)
- And, of course, SQLite has announced a new backend that hopes to support concurrent writes and streaming replication: https://sqlite.org/hctree/doc/hctree/doc/hctree/index.html
What a time for SQLite
Presumably all of these things provide a wildly different experience in terms of consistency, availability, latency, complexity, and concurrent use. Would be so nice to read a lengthy blogpost comparing all or some of these with their pros and cons.
Very cool and well executed project. Love the sprinkle of Rust in all the other companion projects as well :)
The ROAPI (https://github.com/roapi/roapi) project I built happens to support a similar feature set, i.e. exposing SQLite through a variety of remote query interfaces, including the Postgres wire protocol, REST APIs, and GraphQL.
Basically re-inventing MySQL / PG but worse. Next step, we don't have auth over the network, let's bake in RBAC into SQLite.
From what else they're doing, it appears the point is to be able to use sqlite in serverless setups where MySQL/Postgres would be way too heavy to deploy on a per-customer or per-function basis.
Honestly I don't get the point of edge. Do people really care whether their website loads in 50ms vs 300ms?
Of course they care. It directly impacts conversion rates. I'm still stunned by the simple fact that most people think 100ms doesn't make a difference (for their business).
https://www.globaldots.com/resources/blog/how-website-speed-...
> There is even such a thing as too fast. If a user doesn't notice the page updated due to speed, that is also a poor experience.
If you're relying on your app's execution time to demonstrate this to a user, then you have poor UX. Instant changes and updates have been the norm in mobile apps forever and optimistic UI is becoming pretty standard on the web too, see Linear for example.
Can't they just cache, say, those 1000 marketing, sales, and customer support pages in a CDN then?
To take a different tack from the other two (at time of writing) replies to this: it's not just at the edge. Being able to add databases with wild abandon, just provide a key to an S3-compatible bucket to replicate to, and not have to worry about any server setup or replication is appealing whether or not your setup is otherwise centralised. For some setups you do want to share data across user accounts, but for others, isolation along customer or individual user boundaries is not just fine but an advantage (no accidentally writing a query that returns too much data). And then it's a plus that having a million databases and selectively spreading them out over your servers is trivial.
One request being 50ms or 300ms probably doesn't really matter.
But 10,000 requests _per second_ being 50 or 300ms matters a lot
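A quick back-of-the-envelope makes that concrete (Little's law: in-flight requests ≈ arrival rate × latency — the numbers here are just the ones from the comment above):

```python
# Little's law: concurrent in-flight requests = arrival rate x latency.
# Same 10,000 req/s load, two different per-request latencies.
rate = 10_000  # requests per second

for latency_s in (0.050, 0.300):
    in_flight = rate * latency_s
    print(f"{latency_s * 1000:.0f} ms -> ~{in_flight:.0f} concurrent requests")
```

At 50ms you hold ~500 requests in flight; at 300ms, ~3000 — six times the concurrency (connections, memory, worker slots) for the same throughput.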
Sure, it's on a bit of a hype wave at the moment, but yes: most internet users are mobile users who don't live in major metropolitan areas with robust internet infrastructure. I think at least having the option to serve some things at the edge can dramatically improve the browsing experience for those users (and many others!).
I never thought people took restricting access from within the database seriously, like row security policies [1] in Postgres. But now as I look at it, it must be taken seriously, simply because those features exist.
Is anybody here using it in production with success?
[1] https://www.postgresql.org/docs/current/ddl-rowsecurity.html
Yep, my company uses Postgres RLS for a pretty big project for a client in the finance sector. It's the foundation of our multi-tenant setup there. We connect with a single DB user, but set a session variable at the start of each transaction. This session variable is then used by the RLS policies to filter the data.
Works like a charm, you basically get to build your app like it was dealing with a single tenant DB. Just make sure it's as hard as humanly possible for application developers to forget setting that tenant ID when they query... In our case we have a custom JPA base repository which takes care of that, and some PMD rules to catch rogue repos/queries.
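A minimal sketch of that setup (table, policy, and variable names are illustrative, not the actual schema):

```sql
-- Hypothetical tenant-scoped table; enable row-level security on it.
ALTER TABLE invoices ENABLE ROW LEVEL SECURITY;

-- Rows are only visible/writable when tenant_id matches the session
-- variable that the application sets at the start of each transaction.
CREATE POLICY tenant_isolation ON invoices
    USING (tenant_id = current_setting('app.tenant_id')::int);

-- Application side, once per transaction:
-- SET LOCAL app.tenant_id = '42';
```

`SET LOCAL` scopes the variable to the transaction, so a pooled connection can't leak one tenant's ID into another tenant's queries.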
If I understand correctly, it seems to be a cornerstone of Supabase’s Authorization features. https://supabase.com/docs/guides/auth/row-level-security
> But now as I look at it, it must be taken seriously, just because those features exist.
You can do pretty much everything within Postgres, from ETL to Authz to serving HTML templates and GIS apps. However, that doesn't mean you should, or that anyone is seriously using it in production on a large scale (after evaluating alternatives).
Yes I've used it a bunch in production it's great. It is highly valued in some PII-conscious fields for compliance reasons that I don't fully understand, but becoming competent with postgres RLS has been a huge benefit for my career.
The main practical downside is that it is more precise and rigorous than your app- or business-level auth logic is likely to be, and has no real escape hatches. If you're trying to drop it on an existing system you're going to spend a lot of time going back and forth with whoever to shake out edge case ambiguities in your rbac policy that weren't obvious or relevant before.
I've been using render.com to host Rust+sqlite stacks recently, and I'd like to leverage this.
Should my Rust server be running sqld, and I would add a passthrough endpoint through its existing http api to the sqlite?
Or alternatively, should I be using sqld locally to access my sqlite instance through SSH or something?
Excuse my ignorance, but isn't the whole point of SQLite that it's embedded, not client-server?
For us, a major point of SQLite is that we can execute queries in ~100uS when deployed on top of NVMe storage. Everything being in the same process is 50% of the value proposition for our product. Once you involve a network and another computer, it gets a lot slower unless you are doing RDMA-style shenanigans with very expensive hardware.
The other 50% of the value proposition is simplified operations in a B2B setting.
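A rough illustration of the in-process point, using Python's stdlib sqlite3 (a sketch, not a benchmark — absolute numbers depend entirely on hardware and language runtime):

```python
import sqlite3
import time

# In-process SQLite: a query is a function call into a library,
# with no network hop and no second process involved.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
conn.execute("INSERT INTO kv VALUES ('a', '1')")

start = time.perf_counter()
row = conn.execute("SELECT v FROM kv WHERE k = 'a'").fetchone()
elapsed_us = (time.perf_counter() - start) * 1e6
print(row[0], f"{elapsed_us:.0f} us")
```

Any client-server database pays at least one network (or loopback) round trip plus a context switch per query on top of this.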
> For us, a major point of SQLite is that we can execute queries in ~100uS when deployed on top of NVMe storage.
Postgres can't achieve these query times?
A Dev once told me that running SQL Server and IIS (Microsoft webserver) on the same machine would allow shared memory. Just pass a pointer from db to web; no copying of the data. Postgres doesn't do this AFAIK.
How long does it take to round trip through the network stack of 2 machines in a typical datacenter?
Ok - How many microseconds does it take to round trip SELECT 1 to localhost on a typical cloud VM?
You are still using the network stack and involving multiple processes.
Ideally you want both an embedded database, a library for manipulating requests understood by the database, and a few libraries for exposing the same interface over different networks.
I don't get why you would want this. It seems like its taking away all the key advantages of the sqlite approach and leaving only the disadvantages.
This is quite an accomplishment. It is unfortunate that it could not be done in the main project.
Note that this already exists on top of SQLite proper - authored by Ben Johnson (Litestream, Fly.io etc.) - https://github.com/benbjohnson/postlite
I think they are quite different. Postlite expects you to connect over the Postgres wire protocol. Sqld is compiled into your application, so your application behaves like it's talking to an embedded SQLite; the calls are then made over the network (using one of three available transports) before the results are returned to your application.
Dumb question, with all of this newfangled WASM stuff, why couldn't we also bake the Postgres server (and then client) into the code? I know the WASM runtime would need to expose the low-level ability to do filesystem operations, mmap()ing, network sockets, etc.
Enlighten me: if it needs to be hosted at a different network location than the app, what is the advantage over Postgres? Is it because it's easier to "setup" and "maintain"?
* extremely easy to get started
* unmatched testability, since you can now run the same code in CI and production and pass .db files to your tests
* extremely cheap and lightweight replication
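To sketch the testability point: the same query function takes a path to any .db file, whether that's a production database or a throwaway file built inside a test (the function and schema here are made up for illustration):

```python
import os
import sqlite3
import tempfile

# Application code: only needs a path to a SQLite file, so the exact
# same function runs against production data, a fixture .db, or CI.
def count_users(db_path):
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
    finally:
        conn.close()

# Test setup: build a tiny throwaway database file and pass it in.
fd, path = tempfile.mkstemp(suffix=".db")
os.close(fd)
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")
conn.execute("INSERT INTO users VALUES (1), (2)")
conn.commit()
conn.close()

result = count_users(path)
print(result)  # 2
os.remove(path)
```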
What kind of code runs in CI AND production? You mean passing a prod-clone.db to test suites instead of fixtures?
Given their webassembly sqlite function support and other changes, the advantage would seem to be that you're most of the way there to being able to provide "serverless" databases. Have a proxy handle auth, spin up an instance if it's not running/shut it down after a while, and add syncing of changes to/from object storage, and you're there.
> syncing of changes to/from object storage
So you are suggesting to have both the "serverless" db and the storage? If so, you now have 3 problems.
I'm describing capabilities that already exist. Reliably streaming sqlite data to object storage is not a new thing and supported by multiple separate implementations at this point.
Did not at any point suggest multiple sources of data.
EDIT: While there are things in their repos that suggests they might be thinking about moving towards allowing multiple writers, what's currently there suggests a single current active instance of each database, with the WAL being sync'ed to object storage so that in the case of failure and/or when doing a cold-start, the database can be brought back from object storage.
Interesting, hadn't thought about it like that but it makes sense. Used SQLite for a desktop app but never thought it would make any sense on the server.
A noob question: how does one specify a unique constraint on a column in SQLite?
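For reference, SQLite supports UNIQUE either inline in the column definition or as a separate unique index. A minimal demonstration via Python's stdlib sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Option 1: declare the constraint inline on the column.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE)")

# Option 2, equivalent for an existing table:
#   CREATE UNIQUE INDEX idx_users_email ON users(email);

conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")
try:
    conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")
    violated = False
except sqlite3.IntegrityError:  # "UNIQUE constraint failed"
    violated = True
print(violated)  # True
```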
"libSQL" is a silly name. It's not a library is it?
I wish people would at least vaguely try to be helpful naming products.
From the SQLite homepage:
"SQLite is a C-language library that implements a small, fast, self-contained, high-reliability, full-featured, SQL database engine."
Wouldn't a fork also be a library?
of course it is a library. An embedded database is essentially a library that you add to your application.
That you can build stuff around it, doesn't invalidate the fact that the core of it is a library.
It seems like it’s a library you’re meant to embed in an application and that sqld is just one application built on libSQL.
It does seem like it makes more sense to call it something that alludes to it being Postgres-compatible.
Questions for libsql:
- Looks like bottomless does automatic restoring of the database? Litestream seems to avoid this, I assume because of concerns about accidentally creating multiple writers on a rolling-deploy (or similar mistake). Any concerns about this possible footgun?
- Bottomless is a sqlite extension, not a separate process? Pros and cons compared to Litestream?
- Similar questions for sqld. How does sqld handle the lifecycle of deploys and multiple readers/writers talking to the database? Anything a new user should be concerned about?
First of all, let me start by reiterating the first sentence from the bottomless repo - it's work in heavy progress (and we'll move it under the sqld/ repo soon, to keep everything in one place). Now, answers:
> - Looks like bottomless does automatic restoring of the database? Litestream seems to avoid this, I assume because of concerns about accidentally creating multiple writers on a rolling-deploy (or similar mistake). Any concerns about this possible footgun?
It's a valid concern, but what always happens on boot is starting a new generation (a generation is basically a snapshot of the main file + its continuously replicated WAL), distinguished by a uuid v7, the timestamped one. So even if a collision happens, it would be recoverable - e.g. one of the stray generations should be deleted.
> - Bottomless is a sqlite extension, not a separate process? Pros and cons compared to litestream?
The only con I see is that a bug in the extension could interfere with the database. As for pros: way less maintenance work, because everything is already embedded. We're also hooked into the database via a virtual WAL interface (libSQL-only), so we have full control over when to replicate, without having to observe the .wal file and reason about it from a separate process.
> - Similar questions for sqld. How does sqld handle the lifecycle of deploys and multiple readers/writers talking to the database? Anything a new user should be concerned about?
There are going to be multiple flavors of sqld, but the rough idea would be to trust the users to only launch a single primary. In the current state of the code, replicas contact the primary in order to register themselves, so the model is centralized. Once we build something on top of a consensus algorithm, leader election will be pushed to the algorithm itself.
Very cool, thank you for the insights
I wrote a small PHP library that gives you a key-value storage interface to SQLite files: https://github.com/aaviator42/StorX
I've been dogfooding for a while by using it in my side projects.
And there's a basic API too, to use it over a network: https://github.com/aaviator42/StorX-API
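For readers curious about the general shape of such a wrapper, here's a minimal key-value-over-SQLite sketch (in Python rather than PHP, and not StorX's actual API — the class and method names are illustrative):

```python
import sqlite3

# A tiny key-value store backed by a single SQLite table.
# Illustrative only; StorX's real (PHP) interface differs.
class KV:
    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")

    def set(self, key, value):
        # Upsert: insert, or overwrite the value on key collision.
        self.conn.execute(
            "INSERT INTO kv VALUES (?, ?) "
            "ON CONFLICT(k) DO UPDATE SET v = excluded.v",
            (key, value))
        self.conn.commit()

    def get(self, key, default=None):
        row = self.conn.execute(
            "SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        return row[0] if row else default

store = KV(":memory:")
store.set("greeting", "hello")
print(store.get("greeting"))  # hello
```

The nice property is the one the parent comment relies on: the "server" is just a file, so exposing it over a network API is a thin layer on top.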
ICYMI, similar concept for python https://dataset.readthedocs.io/
bottomless looks really nice, thanks for sharing!