The Great Migration from MongoDB to PostgreSQL

345 points | 26 days | infisical.com
bit_flipper25 days ago

I've run Postgres at large scale (dozens of machines) at multiple companies. I've also run MongoDB at large scale at multiple companies. I like both generally. I don't really care about data modelling differences - you can build the same applications with approximately the same schema with both if you know what you're doing.

I don't understand how folks seemingly ignore Postgres' non-existent out of the box HA and horizontal scaling support. For small scale projects that don't care about these things, fair enough! But in my experience every Postgres installation is a snowflake with cobbled together extensions, other third party software, and home-rolled scripts to make up for this gap. These third party pieces of software are often buggy, half-maintained, and under-documented. This is exacerbated by Postgres' major version file format changes making upgrades extremely painful.

As far as I can tell, there is no interest in making these features work well in Postgres core because all of the contributors' companies make their money selling solutions for HA/sharding. This is an area where MySQL is so significantly better than Postgres (because so many large Internet companies use MySQL) that it surprises me people aren't more unhappy with the state of things. I don't really want to run another Postgres cluster myself again. For a single node thing where I don't care about HA/scaling I do quite like it, though.

bognition25 days ago

You'll never see true support for horizontal scalability in Postgres because doing so would require a fundamental shift in what Postgres is and the guarantees it provides. Postgres is available and consistent. It cannot truly be partitionable without impacting availability or consistency.

When an application grows to such a scale that you need a partitionable datastore it's not something you can just turn on. If you've been expecting consistency and availability, there will be parts of your application that will break when those guarantees are changed.

When you hit the point that you need horizontally scalable databases you must update the application. This is one of the reasons that NewSQL databases like CockroachDB and Vitess are so popular. They expose themselves as a SQL database but make you deal with the availability/consistency problems on day 1, so as your application scales you don't need to change anything.

Context: I've built applications and managed databases on tens of thousands of machines for a public SaaS company.

BiteCode_dev25 days ago

Because vertical scaling can take you so far these days that 99% of companies will never, ever reach the scale where they need more. There are just few incentives.

Especially since:

- Servers will keep getting better and cheaper with time.

- Data is not only in postgres; you probably have redis, clickhouse and others, so the load is spread out. In fact you may have several dedicated postgres instances, like one for GIS tasks.

- Those hacky extensions are damn amazing. No product in the world is that versatile.

- Postgres has much better support from legacy frameworks like django/ror/laravel than nosql alternatives. People shit on ORMs, but they enable a huge, well-integrated plugin ecosystem that makes you super productive, and PG is happily and transparently handling all that.

- If by some miracle you actually reach the point you need this, you'll have plenty of money to pay for commercial HA/sharding, or migrate. So why think about it now?

riku_iki25 days ago

> vertical scaling can take you so far these days that 99% of companies will never, ever reach the scale where they need more

It's less about the scale and more about HA and service interruption: your service will be down if the server dies.

hahn-kev24 days ago

Never heard of docker/k8s?

riku_iki24 days ago

I don't think these two words will buy you HA automagically. You will need 3 layers of various open source components on top, and I am not sure if they will improve or reduce HA at the end.

amluto25 days ago

> This is an area where MySQL is so significantly better than Postgres (because so many large Internet companies use MySQL) that it surprises me people aren't more unhappy with the state of things.

I’m not sure precisely what you mean by “HA”, but, in my experience, out-of-the-box support for the most basic replication setup in MySQL is pretty bad. Just to rattle off a few examples:

Adding a replica involves using mysqldump, which is, to put it charitably, not a very good program. And the tools that consume its output are even worse!

There is nothing that ships with MySQL that can help verify that a replica is in sync with its primary.

Want to use GTID (which is the recommended mode and is more or less mandatory for a reasonable HA setup)? Prepare for poor docs. Also prepare for the complete inability of anyone’s managed offering to sync to an existing replica set via mysqldump’s output. RDS will reject the output due to a rather fundamental permission issue, and the recommended (documented!) workaround is simply incorrect. It’s not clear that RDS can do it right. At least Azure sort of documents that one can manually read and modify the mysqldump output and then issue a manual API call (involving the directives that you manually removed from the dump) to set the GTID state.

Want point-in-time recovery? While the replication protocol supports it, there is no first-party tooling. Even just archiving the replication logs is barely supported. Postgres makes it a bit awkward, but at least the mechanisms are supported out of the box.

But maybe the new-ish cluster support actually works well once it’s set up as long as you don’t try to add managed RDS-style nodes?

evanelias25 days ago

> Adding a replica involves using mysqldump

That's one path, but it is not the only way, and never has been.

MySQL 8.0.17 (released nearly 5 years ago!) added support for physical (binary) copy using the CLONE plugin. And MySQL Shell added logical dump/reload capabilities in 8.0.21, nearly 4 years ago.

Third-party solutions for both physical and logical copy have long been available, e.g. xtrabackup and mydumper, respectively.

And there was always the "shut down the server and copy the files" offline approach in a pinch.

amluto25 days ago

CLONE is indeed nifty. But why is it a plugin? And why don’t any of the major hosted services support it? (Or do they? The ones I checked don’t document any support.)

I wouldn’t call xtrabackup or mydumper an out-of-the-box solution.

evanelias25 days ago

What's wrong with CLONE being a MySQL plugin? I mean a good chunk of this page is people praising Postgres for its plugins.

As for support in hosted cloud providers, that's a question for the cloud providers, no one else can answer this. But my best guess would be because they want you to use their in-house data management offerings, snapshot functionality, etc instead of porting MySQL's solution into the security restrictions of their managed environment.

Yes, xtrabackup and mydumper are third-party tools, as I noted. If you needed something out-of-the-box prior to CLONE, the paid MySQL Enterprise Edition has always included a first-party solution (MySQL Enterprise Backup, often abbreviated as MEB). Meanwhile Community Edition users often gravitated to Percona's xtrabackup instead as a similar FOSS equivalent, despite not being a first-party / out-of-the-box tool.

mixmastamyk25 days ago

Citus is open source and well financed. This comment may have made sense a few years ago, but no longer.

evanelias25 days ago

By "well financed" you mean "owned by Microsoft"?

That situation raises a separate set of concerns, especially in the context of Microsoft's main database cash cow being SQL Server, not Postgres/Citus.

sgent24 days ago

How is that different than owned by Oracle?

evanelias24 days ago

Yep, exactly. Apologies, my previous comment was semi-sarcastic but in retrospect that was way too vague :)

On average, HN leans anti-MySQL, with concerns about Oracle ownership frequently cited in these discussions (mixed in with some historic distrust of MySQL problems that were solved long ago). But I rarely see the same sentiment being expressed about Citus, despite some obvious similarities to their ownership situation.

Personally I don't necessarily think the ownership is a huge problem/risk in either case, but I can understand why others feel differently.

mixmastamyk25 days ago

I'm as skeptical of MS as anyone. However it is licensed GNU AGPL, so not particularly worried.

switch00725 days ago

I guess some people really, really dislike Oracle (understandably).

And MariaDB is lagging behind, less and less compatible with MySQL etc leading to various projects dropping support for it - notably Azure. I wouldn't pick it for a new project.

jmuguy25 days ago

This depends on what level you consider HA and horizontal scaling to be required. I could make the same argument, based on my personal experience, that postgis ought to be included out of the box. Of course, I'll assume most people don't need it :)

klysm25 days ago

I feel like I have read this exact comment before verbatim

zulban25 days ago

If you like this kind of thing, I also migrated from mongodb to postgresql and wrote about it here: https://blog.stuartspence.ca/2023-05-goodbye-mongo.html

My post is more technical, with examples and graphs, and less business-y.

bberrry24 days ago

I like your custom prompt. I'll be trying it out for a while!

jpgvm25 days ago

I have done this migration twice and rethinkdb to PostgreSQL once. At this point I think document DBs are as good as dead for new projects. They will live for a really long time still but are in contraction and rent seeking mode now. Expect MongoDB licensing and hosting to increase in price and languish in terms of feature development from here on out.

Jolter25 days ago

Maybe you’re just experiencing the trough of disillusionment, from the Gartner curve? I think it’s fair to call document databases a mature technology now, rather than “dead”.

Semionilo25 days ago

If your normal DB can handle documents like a doc-focused DB, but better, then it might no longer be worth having as a category, only as a feature.

aembleton23 days ago

Why did you move from RethinkDB to Postgres? I've very little experience with Rethink and have only played around with it a few years ago with some small project but found it to be really interesting. I'm just curious about what issues you experienced with it.

Beefin25 days ago

mongodb is a fortune 100 company and blows through expectations every quarter. doc dbs are far from dead.

unmole25 days ago

> mongodb is a fortune 100 company

Excuse me? MongoDB Inc. posted a mere $1.68 billion in revenue (with a $177 million loss). Coca-Cola posted $45 billion in revenue and ranks at #100. Forget Fortune 100, MongoDB isn't even in the reckoning for Fortune 500.

forgotmyinfo25 days ago

Huh, I didn't realize database performance correlates so strongly with stock price. Are there Prometheus metrics for this?

jpgvm25 days ago

I didn't say it wouldn't make a ton of money; there is a huge number of MongoDB instances out there in the wild in the Fortune 500 due to the MEAN stack days (just like RoR before it). I fully expect the company to keep blowing away expectations by increasing how much revenue they extract from that existing customer base. That said, they -still- aren't profitable. They might be profitable next quarter (in fact I expect them to be, given the current environment), but to only reach profitability now, after their glory growth days are behind them, seems... well, pretty garbage. If anything it reminds me of GoPro, except instead of cheap Chinese clones it's PostgreSQL and JSONB that are coming for their lunch.

zachmu25 days ago

PostgreSQL really is eating the database world.

Although in this case, the authors originally chose an architecture that was poorly suited to their data model. They had relational data and put it in a non-relational store. This was obviously always going to cause problems.

SOLAR_FIELDS25 days ago

Over the years I’ve learned two things when it comes to picking a database when starting a new project

1 - most data is inherently relational

2 - Postgres is pretty good at basically any problem you throw at it, pretty scalably. It will fall over at bonkers amount of scale in some cases but you probably won’t have bonkers amount of scale

Thus, for any new project the default data store is Postgres. You have to specifically convince me that Postgres is not good enough for this specific use case if you want me on board for anything else as a data store.

zachmu25 days ago

It's a pretty good default stance, yeah.

We have been trying to convince people to use our new database [1] for several years and it's an uphill battle, because Postgres really is the best choice for most people. They really have to need our unique feature (version control) to even consider it over Postgres, and I don't blame them.

[1] https://github.com/dolthub/dolt

chrisjc25 days ago

In my experience, it's always come down to either using a specific database type for the wrong job, or using a database type the wrong way.

- Treating an OLAP like an OLTP.

- Using a document store for relational data bc managing relations in SQL is tedious and slows development/progress/features.

- Using OLTP for analytics.

- and so on...

And the usual reaction ensues. Conversations begin about migrating to another product while not really paying attention to interaction patterns and the intent/reason for the various workloads.

Of course such migrations only lead to either...

- Place-shifting the issues, or

- Resolving the issues that led to the migration while creating new ones that didn't exist before. Workloads that were suited to the existing DB are now mismatched to the new one.

However, Postgres seems to be a pretty safe bet when such migrations are undertaken due to how versatile it is. Personally, I don't care which database is used (within reason of course). Just use the right database type for the job.

namaria25 days ago

> As part of that stack, we chose MongoDB + Mongoose ORM because the combination presented least overhead and allowed us to ship quality features quickly. As Sir Tony Hoare states, “premature optimization is the root of all evil,” and there was certainly no need for further optimization at the time.

I find it interesting that one sentence claims they made an optimal choice for feature delivery speed, and the next one that they rationalized it as "non optimal as prescribed by Hoare". Never mind the fact that, three sentences down from the original quote, Hoare said "It is often a mistake to make a priori judgments about what parts of a program are really critical, since the universal experience of programmers who have been using measurement tools has been that their intuitive guesses fail."

dangtony9825 days ago

The nature of the use-case evolved and, at one point, the MongoDB + Mongoose ORM combination was in fact optimal judging by various aspects, from tooling familiarity to the primary use-case being the managed cloud service. As the situation evolved, it was no longer optimal and was superseded by various stack-wide adjustments.

Regarding tooling familiarity: In the earlier stages of a company, I'd argue it to be important to pick what you can ship with fastest. You don't want to prematurely, for example, set up a complex microservices architecture and deployment with K8s for an idea that is in its infancy and that might not even get that much traffic to begin with. Instead, deploying a monolithic app to Heroku might be the optimal choice (YC has a more extreme term for this they call "duct tape engineering"). As the company grows and there is clearer product market fit, it makes sense to further optimize the stack and perform adjustments as needed.

Regarding the use-case changing, it wasn't entirely obvious in the beginning that a lot of users would self-host the product, so it was designed more so with the cloud product / managed service in mind; in hindsight things seem more obvious but not in the moment. As it became clearer that more and more people were going to self-host this product, the team shipped more features to accommodate a simpler self-hosting experience and, as part of that, the PostgreSQL migration.

p-e-w26 days ago

At some point during the past 10 years or so, the world realized that document databases are actually a bad idea, and relational databases can do the same job better with just a few easy-to-implement QoL improvements (such as JSON operators in SQL).

Meanwhile MongoDB's creators thought this was a great time to make Mongo non-free software, presumably with the goal of making a quick buck from cloud operators.

The result is pretty much the landscape we are seeing today, with Postgres reigning supreme, and even SQLite being experimented with in server setups, while Mongo is on life support and MySQL not far behind.

vmatsiiako26 days ago

I wouldn't say "Mongo is on life support" – it's actually a very successful business growing 30% YoY on a massive scale. Yet, I agree that the license switch has definitely damaged their long-term ecosystem.

smt8826 days ago

They grew 30% if you look at revenue, but that's not how people determine whether a business is healthy.

EBITDA is a better metric, and that number was getting dramatically worse every year until 2023.

But even in 2023, EBITDA was -$202M. I don't see it ever approaching $0.

So it's still a shit business on top of a shit product that no one ever really needed.

Joel_Mckay25 days ago

I think anyone who administers mongodb would agree it has a list of very dangerous behaviors for novices.

Try to upgrade a running instance node between some versions, and one may thrash object keys or worse. Best to dump and re-load your data every major upgrade to recover all set properties... and there still may be slight differences in the query language later. Wait, your data sets are 1.5TiB per node.... that is a lot of downtime...

The cursor, projection, and re-format queries can be very good at reducing reporting traffic. However, the 3 page long syntax can be unreadable to the uninitiated. The json essentially jams a bunch of equivalent SQL queries in 1 transaction, but you still have to be careful as there is no real guarantee of ordering without auto-indices in clusters.

mongodb is not as clean as SQL, but works for storing constantly changing metadata, e.g. if you find your SQL schema ends up with tuples that have a lot of poorly designed null entries and table edits, or simply implements an object storage class abstraction in massive metadata and object-class catch-all tables.

And yes, the mongodb new licensing model telegraphs an unfavorable policy posture. However, I do understand not wanting fortune 500 companies exploiting your team's time for free.

Profit is not an indication of quality, but rather fiscal utility. ;-)

DrNash1713 days ago

what is mongo's new licensing model?

dangtony9825 days ago

I'm actually very bullish on MongoDB.

While the article and significant sentiment in threads suggest a push off MongoDB toward PostgreSQL, I do think MongoDB has its own place in the stack and that it won't be obsoleted. I've personally had many pleasant experiences working with it in past projects.

Regarding business metrics, I may have a slight bias coming from the startup world but we often value revenue the most, especially in the earlier stages of a company; we can reduce costs and perform various optimizations in the future but what stays at the foundation is whether or not there is strong product market fit such that more customers keep coming in and coming back to use the underlying product — I would say MongoDB's current growth trajectory is in line with that.

I'm hopeful overall that they can turn profitable (maybe not this year or next but eventually so).

rrr_oh_man25 days ago

I was totally expecting someone to come up with some vanity metric here, but a negative EBITDA of several HUNDRED MILLION? Jesus. How did that happen. B2B business should be a symphony of ka-chings for them.

otabdeveloper425 days ago

MongoDB is great if what you need is a replicated log or queue. It's what Kafka should have been.

basil-rash25 days ago

Why do you choose EBITDA? Last I heard it was “utter nonsense”, at least according to Warren Buffett.

https://m.youtube.com/watch?v=tvnKylAyLbQ

airstrike25 days ago

Buffett's criticism is that EBITDA is just accounting and he cares about cash flow, specifically also after paying for CAPEX (which is Investing Cash Flow so comes after Operating Cash Flow)

EBITDA is used in the industry because it is a proxy for operating cash flow. Sometimes you don't have all the available data needed to get to OCF, or you're looking at company guidance (for future EBITDA values) or analyst estimates. It's easier to keep the conversation at the EBITDA level because it requires fewer assumptions. Generally the revenue line is ~easy to estimate because you can conceptualize how to go from the current number of customers to some future number of customers, how many dollars per customer and so on and so forth.

Then as you work your way to EBIT (Operating Income) you still have to assume some gross margin, R&D expense, etc. These are pretty tangible. It should be pretty easy to get to estimated EBIT from what the company discloses in guidance or what analysts forecast. Since D&A is pretty linear over time, people generally assume it just remains constant as a % of revenue, so now you have EBITDA which is very much like cash flow

EBITDA is similar to cash flow because it adds back to EBIT the non-cash expense that is D&A. The reason it's good to look at it before interest and taxes is because you're also thinking about how much cash the whole enterprise generates, not how much cash goes to equity holders at the end (which is often called "Free" Cash Flow because it's not tied up with commitments to others)

Coming back to Buffett, in the industries he tends to pick stocks from, CAPEX is a major thing. Companies need to build factories, buy equipment, etc. So if you just look at future EBITDA without accounting for future CAPEX needs, you're fooling yourself.

Truth be told, in those industries everyone also looks at "EBITDA minus Capex". Maybe they do so now that he's bemoaned everyone for not doing it in the first place, but IMHO his criticism largely doesn't apply among valuation professionals. Maybe it does for stock traders, but not for valuation purposes like in an M&A context
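
To put the arithmetic above in concrete terms, here is a minimal sketch. The two companies and all figures are hypothetical, made up purely for illustration, and the function names are my own.

```python
def ebitda(ebit, d_and_a):
    """EBITDA adds the non-cash D&A expense back to EBIT (operating income)."""
    return ebit + d_and_a


def ebitda_minus_capex(ebit, d_and_a, capex):
    """Capex-adjusted figure, as looked at in capital-intensive industries."""
    return ebitda(ebit, d_and_a) - capex


# Hypothetical figures in $M: same EBIT and D&A, very different capex needs.
software_co = {"ebit": 80, "d_and_a": 20, "capex": 5}
factory_co = {"ebit": 80, "d_and_a": 20, "capex": 70}

for name, co in (("software_co", software_co), ("factory_co", factory_co)):
    print(name, ebitda(co["ebit"], co["d_and_a"]), ebitda_minus_capex(**co))
```

Both made-up companies report the same EBITDA, but once capex is subtracted the capital-heavy one is left with far less, which is the gap the capex criticism is about.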

jpgvm25 days ago

Buffett's point is that it can be gamed to look good. In this case even if they are gaming it to look better than it is.... well it's still awful.

tgv25 days ago

Document databases aren't an inherently bad idea. They have their uses. I'm using mongodb because it fits my use case: the app is very front-end oriented, and the data the user needs on a specific page fits neatly in one document. There are (quite a lot of) related documents, but they are only needed in specific parts of the app, and that's when those are retrieved. Pretty simple.

I could argue that in-server SQLite is a bad idea: if you ever need to share the database or distribute tasks, you're fucked. But for some use cases it just works.

"The world" hasn't realized shit. It jumps from fad to fad, fueled by hyped promises and lack of experience.

eddd-ddde25 days ago

What about a Postgres database where you store your documents in a key value table of json objects?

What mongodb benefits would you be missing?

tgv25 days ago

What benefit does switching to postgresql bring? When mongodb stops being maintained, I'll consider postgres, but until then, I now only see downsides.

eddd-ddde25 days ago

Lots of useful features?

You can join data when needed, proper transactions, schemas where you need schemas.

For example you start with a basic (key, json) table. Once you begin to stabilise your schema, add columns generated from your json objects, now you can get proper joins, indexes, validations, on said columns.
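
A rough sketch of that progression, using SQLite from Python's standard library as a stand-in so it runs without a server (this assumes a SQLite build with JSON functions and generated columns, which recent Python builds normally have). In Postgres the analogous pieces would be a jsonb column and a `GENERATED ALWAYS AS (...) STORED` column. The table and field names here are made up for illustration.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Step 1: a plain (key, json) table, used like a document store.
conn.execute("CREATE TABLE docs (key TEXT PRIMARY KEY, body TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO docs (key, body) VALUES (?, ?)",
    [
        ("u1", json.dumps({"email": "a@example.com", "team": "core"})),
        ("u2", json.dumps({"email": "b@example.com", "team": "infra"})),
    ],
)

# Step 2: once the shape stabilises, expose a JSON field as a generated column.
# (SQLite only allows VIRTUAL generated columns to be added via ALTER TABLE.)
conn.execute(
    "ALTER TABLE docs ADD COLUMN team TEXT "
    "GENERATED ALWAYS AS (json_extract(body, '$.team')) VIRTUAL"
)

# Step 3: the generated column can be indexed and joined like any other column.
conn.execute("CREATE INDEX docs_team_idx ON docs (team)")
conn.execute("CREATE TABLE teams (name TEXT PRIMARY KEY, owner TEXT)")
conn.execute("INSERT INTO teams VALUES ('core', 'alice'), ('infra', 'bob')")

rows = conn.execute(
    "SELECT d.key, t.owner FROM docs d JOIN teams t ON t.name = d.team ORDER BY d.key"
).fetchall()
print(rows)
```

The documents themselves never change shape; only the columns derived from them do, so the relational parts can be added incrementally.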

MrBuddyCasino25 days ago

I’m not sure that is the case. What I’m seeing is that people use the cloud-native document stores such as DynamoDB instead of Mongo. The only thing zoomers hate more than traditional SQL databases is XML.

cwbriscoe25 days ago

Doesn't everybody hate XML?

arusahni25 days ago

This seems to gloss over the actual rewrite. How did you ensure queries were equivalent? How did you configure the product to be able to read from and write to both databases? Did migrating reveal any bugs? Were you largely able to port your queries 1:1?

Semionilo25 days ago

You can write logging for both drivers dumping results.

Then checking which query doesn't return the same result set.
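
A minimal sketch of that idea, assuming each backend's query has been wrapped in a function that returns a list of dict rows. The function names and the wrapping are my own assumptions, not anything from the article.

```python
from collections import Counter
import json


def canonical(row):
    """Serialize a row (dict) to a stable string so rows can be compared and counted."""
    return json.dumps(row, sort_keys=True, default=str)


def diff_results(mongo_rows, pg_rows):
    """Compare two result sets as multisets, ignoring row order.

    Returns (only_in_mongo, only_in_pg) as sorted lists of canonical row strings.
    """
    left = Counter(canonical(r) for r in mongo_rows)
    right = Counter(canonical(r) for r in pg_rows)
    return sorted((left - right).elements()), sorted((right - left).elements())


def check_query(name, run_mongo, run_pg, log=print):
    """Run the same logical query against both backends and log any mismatch."""
    only_mongo, only_pg = diff_results(run_mongo(), run_pg())
    if only_mongo or only_pg:
        log(f"MISMATCH {name}: only_in_mongo={only_mongo} only_in_pg={only_pg}")
        return False
    return True
```

Comparing as multisets means ordering differences are ignored; if a query has a meaningful ORDER BY, the two lists would need to be compared in sequence instead.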

Cyberdog25 days ago

Hopefully they had comprehensive test coverage. That would have helped greatly in that regard.

KingOfCoders25 days ago

Some years ago we had so many problems with MongoDB that we also ditched it for Postgres. Never been happier.

Not the reason, but very annoying: Why doesn't (perhaps this has changed?) MongoDB use SQL but Javascript with endless $s? It was always a pain writing ad-hoc queries.

cj25 days ago

> Why doesn't MongoDB use SQL

Perhaps not the right answer, but SQL literally stands for Structured Query Language. I suppose you'd need a USQL?

KingOfCoders25 days ago

Google says:

'The original full name of SQL was SEQUEL, which stood for "Structured English Query Language". A language which has been structured in English or lets say English-Like Query language which follows the syntax structure of English.'

iambateman25 days ago

I use MySQL for everything - always have.

Can someone hit me with a few reasons why you would use Postgres over MySQL? I don’t have any familial affinity to any database, but I’m not sure what the benefits to Postgres are relative to MySQL.

phamilton25 days ago

Some things that are nice in Postgres and missing in MySQL:

- Create constraints as NOT VALID and later VALIDATE them. This allows you to create them without expensive locks.

- explain (analyze, buffers). I miss this so much.

- Row level security.

- TOAST simplicity for variable text fields. MySQL has so many caveats around row size and what features are allowed and when. Postgres just simplifies it all.

- Rich extension ecosystem. Whether it's full text search or vector data, extensions are pretty simple to use (even in managed environments, a wide range of extensions are available).

Is that (and more) enough for me to migrate a large MySQL to postgres? No. But I would bias towards postgres for new projects.

dvnguyen25 days ago

How about HA and horizontal scaling? I’ve heard that MySQL excels in that area.

phamilton25 days ago

I mostly have used AWS Aurora there, which is significantly better than vanilla MySQL or Postgres and both are similar enough.

In Aurora, Postgres has Aurora Limitless (in preview) which looks pretty fantastic.

As far as running yourself, Postgres actually has some advantages.

Supporting both streaming replication and logical replication is nice. Streaming replication makes large DDL have much less impact on replica lag than logical replication. As an example, if building a large index takes 10 minutes then you will see a 10 minute lag with logical replication since it has to run the same index build job on the replica once finished on the primary. Whereas streaming replication will replicate as the index is built.

Postgres 16 added bidirectional logical replication, which allows very simple multi-writer configurations. Expect more improvements here in the future.

The gap really has closed pretty dramatically between MySQL and Postgres in the past 5 years or so.

wlll25 days ago

I do scaling and performance work, mostly with Rails apps, but a significant amount of the work is database level and not language specific. I've used both postgres and MySQL (and a few other databases) going back to 2000.

The best thing I can hear from a company when I start is "We use Postgres". If they're using postgres then I know there's likely a far smoother path to performance than with MySQL. It has better tooling, better features, better metadata.

Pengtuzi25 days ago

Two paragraphs about yourself and ending with a very vague sentence that “answers” their question. 10/10

wlll24 days ago

It's context and weight for what would otherwise be just another opinion.

derekperkins15 days ago

Postgres does several things better than MySQL, but tooling isn't among them.

eknkc25 days ago

Last time I used MySQL I had some delete triggers to clean up some stuff.

Apparently MySQL does not run delete triggers in case the rows are deleted due to a foreign key cascade.

Every time I used its slightly advanced features, I ran into such problems. With PostgreSQL I do not need to think about whether this would work.

andybak25 days ago

PostGIS was way ahead of the MySQL equivalent last time I checked.

The plugin ecosystem is pretty astonishing. Foreign Data Wrappers... I'm not hands on so much any more but there were a lot of things back when I was.

dpcx25 days ago

I'll give you the one that matters to me: in MySQL, you can't do DDL statements (create table, alter table, etc) inside a transaction. MySQL will implicitly commit after each DDL statement.

kstrauser25 days ago

Aww, yes. It’s so nice being able to wrap giant schema and data migrations in a transaction so the whole process is atomic.
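
For anyone who hasn't seen this in action, here is a small sketch of an all-or-nothing migration. It uses SQLite from Python's standard library as a stand-in because SQLite also supports transactional DDL and needs no server; against Postgres the same BEGIN / DDL / ROLLBACK pattern applies. The tables and the simulated failure are made up for illustration.

```python
import sqlite3

# isolation_level=None puts the sqlite3 module in autocommit mode,
# so BEGIN / COMMIT / ROLLBACK below are issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('ada')")


def table_names():
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [r[0] for r in rows]


def migrate(fail):
    """Run schema and data changes atomically; undo all of them on any error."""
    conn.execute("BEGIN")
    try:
        conn.execute("CREATE TABLE emails (user_id INTEGER, email TEXT)")
        conn.execute("ALTER TABLE users ADD COLUMN active INTEGER DEFAULT 1")
        if fail:
            raise RuntimeError("simulated failure halfway through the migration")
        conn.execute("INSERT INTO emails SELECT id, name || '@example.com' FROM users")
        conn.execute("COMMIT")
        return True
    except Exception:
        # The CREATE TABLE and ALTER TABLE above are rolled back too.
        conn.execute("ROLLBACK")
        return False
```

After a failed run the schema is exactly what it was before, so the migration can simply be fixed and re-run.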

tianzhou25 days ago

Architecture-wise, Postgres is more extensible. Another reason is licensing. Both contribute to a viral ecosystem.

https://www.bytebase.com/blog/postgres-vs-mysql/

p4ul25 days ago

This is exactly what I would argue. PostgreSQL makes it straightforward to create extremely powerful extensions.

PostGIS is one such extension, and I would argue that if your use case involves geospatial data, then PostGIS alone is enough of a reason to use PostgreSQL!

gkbrk25 days ago

Not worrying about Oracle suing you is a good reason to use PostgreSQL or MariaDB.

kqr25 days ago

Back when I selected which database to get more fluent in, one of the concerns I had about MySQL was that it was described as playing fast and loose with type casts, at least compared to Postgres.

derekperkins15 days ago

That's thrown around by people who have been around a long time, but hasn't been true by default for a decade, since 5.7 was released.

teunispeters25 days ago

Stability, reliability, consistency. If you do anything with Unicode you're also much better in PostgreSQL. Faster indexes, smaller base install (*), and much more complete SQL language support.

(*) note: when PHP was taking off, MySQL had a smaller install base. This has long since changed - PostgreSQL hasn't grown much over the years, and MySQL has, at least since the last time I worked on both circa 2015-ish.

majewsky22 days ago

Friendly nitpick, because I had a double-take when reading the second paragraph: Be careful to differentiate between "install base" and "base install". In both cases, you are referring to "installed size of the database in its base configuration". But "install base" commonly means "number of installations". So I was very confused when the second paragraph was implying (using the standard meaning for "install base") that the number of Postgres installations had not grown over the years.

teunispeters15 days ago

Fair. I was referring to the space a minimal installation takes.

anticodon25 days ago

PostgreSQL supports more SQL features and data types out of the box. Also, it looks like MySQL development has stalled after the purchase by Oracle. PostgreSQL has exciting new features in every release; I can't remember when anything significant last happened in the MySQL world. It's been frozen for like a decade now. There are some new releases, but you won't find anything exciting in the change log.

callalex25 days ago

I know this isn’t a technical reason, but my main reason is “eww, gross, it smells like Oracle here.” I’ve been around long enough to know that even being in the same zip code as Oracle is a bad idea.

megaman82125 days ago

I use MySQL mostly, but I would love to have a few features from the Postgres world; namely, the better full-text search, key value store, queue and vector search. A lot of projects I have never reach the scale where I need these to be separate data products so the perfectly fine Postgres versions would suffice.

baur25 days ago

CrateDB might be a good fit for full text and vector search (it’s a SQL database but has dedicated clauses for FT and VS).

Curious how you use PG for key/value and queue - do you use regular tables or some specific extensions?

I can imagine kv being a table with primary key on “key” and for queue a table with generated timestamp, indexed by this column and peek/add utilising that index.
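
A minimal sketch of the two tables imagined above, not anyone's actual setup. It uses Python's built-in sqlite3 as a stand-in so it runs without a server; the table names (kv, queue) and helper functions are hypothetical. In PostgreSQL the upsert would be written as INSERT ... ON CONFLICT (key) DO UPDATE, and a queue with several concurrent consumers would likely also want row locking such as SELECT ... FOR UPDATE SKIP LOCKED, which this sketch does not model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Key/value: a plain table with the primary key on "key".
    CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT NOT NULL);

    -- Queue: a table with a generated timestamp, indexed for peek/pop.
    CREATE TABLE queue (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        enqueued_at REAL NOT NULL DEFAULT (julianday('now')),
        payload TEXT NOT NULL
    );
    CREATE INDEX queue_enqueued_at_idx ON queue (enqueued_at, id);
""")


def kv_put(key, value):
    # SQLite shorthand for an upsert; overwrites the row if the key exists.
    conn.execute("INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)", (key, value))


def kv_get(key):
    row = conn.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
    return row[0] if row else None


def queue_add(payload):
    # enqueued_at is filled in by the column default.
    conn.execute("INSERT INTO queue (payload) VALUES (?)", (payload,))


def queue_peek():
    # Oldest item first; id breaks ties between equal timestamps.
    row = conn.execute(
        "SELECT payload FROM queue ORDER BY enqueued_at, id LIMIT 1"
    ).fetchone()
    return row[0] if row else None


def queue_pop():
    row = conn.execute(
        "SELECT id, payload FROM queue ORDER BY enqueued_at, id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    conn.execute("DELETE FROM queue WHERE id = ?", (row[0],))
    return row[1]
```

The index on (enqueued_at, id) is what lets peek and pop find the oldest row without scanning the whole table.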

kcartlidge25 days ago

I've used (and introduced) MongoDB in production. Though I much prefer PostgreSQL, SQLite, MySQL, or SQL Server, for some use cases a document database is fine.

However as I discovered myself, once you realise you need to use Mongoose with it you should usually take that as a prompt to consider going relational.

Don't get me wrong, Mongoose is a good package. But the things it solves could likely be better fixed by moving away from MongoDB - I'd go so far as to say that in most cases Mongoose should only be added to help an existing project; if you need it from the start you probably should go relational.

(YMMV and there will be exceptions.)

endisneigh25 days ago

The takeaway is that this has less to do with technical reasons and more to do with licensing, which is fair.

The other takeaway is that the huge gains they saw from query optimizations with joins after switching show that their data wasn’t properly modeled for use with a key value store, which probably added to the case for switching to the right type of store to begin with.

ako25 days ago

And that’s the core problem with key value stores, your data needs usually grow beyond key value scenarios. In the beginning it might fit, but then you add more pages with different needs, reporting/dashboards with different needs, APIs with different needs, and ETL processes with different needs. Trying to force everything into key value is short term thinking.

endisneigh25 days ago

What you’re describing isn’t an inherent problem with a key value store. Forcing everything to OLTP SQL is also short term thinking. Pick the right tool for the job.

It’s funny you mention growth as the limiting factor for key value stores, if anything that’s the one area where they’re objectively superior. Which is why most search databases and caches are key value stores.

ako25 days ago

Not if you use an RDBMS to store your key/values. Then you can do arbitrary queries with joins over multiple tables, add views, stored procedures, etc.

paulmd25 days ago

In this world you either evolve to Elastic/Cassandra/etc or return to Postgres :crab:

You really gotta search your soul and ask yourself: is your cool idea really a database, or just a plugin for Postgres? And the answer is often the latter.

winrid25 days ago

The problem is that postgres is not a replacement. The JSON operations are not atomic which is terrible. If you wanna use PG that's cool, but I'd suggest just avoiding JSON.

radiospiel25 days ago

> The JSON operations are not atomic

This is the first time I've heard this. What exactly is not atomic, and is there a resource with more details?

adastral25 days ago

Not sure if this is what the above comment means by "atomic", but a shortcoming of Postgres' JSON support is that it will have to rewrite an entire JSON object every time a part of it gets updated, no matter how many keys the update really affected. E.g. if I update an integer in a 100MB JSON object, Postgres will write ~100MB (plus WAL, TOAST overhead, etc.), not just a few bytes. I imagine this can be a no-go for certain use cases.

jeltz25 days ago

A JSON object which is 100 MB after compression is a quite huge thing.

mulmboy25 days ago

How are JSON operations not atomic? Genuinely curious

bandrami25 days ago

I think the sense is not "atomic by field" or whatever you'd call that. If you're going to ignore the fact that PG is an actual ORDB and just store gigantic blobs of JSON in it, it will write the entire object whenever you update part of it, because the whole point is you're supposed to store it as a multi-field record.

adhamsalama25 days ago

I pushed the company I work for to use Postgres instead of Mongo in all new services and refactors. Now I'm leaving it to join a new company, where they use Mongo. Sigh. I'll have to do this all over again.

winrid26 days ago

I don't doubt there are real benefits for them to switch but BTW this is not true:

> Difficulty configuring database transactions: With MongoDB, setting up transactions was not trivial because it required running MongoDB in cluster mode with various configuration overhead; this made it extremely difficult, for instance, for customers to run a simple POC of Infisical because it required a production setup of MongoDB.

You can run a single instance in clustered mode: just a single-instance replica set. You get the oplog, transactions, etc. No advanced configuration required.

vasco25 days ago

And this has been possible for many years.

blue_pants25 days ago

Are there any disadvantages to using transactions with a single-instance replica set?

ammo166225 days ago

We have run a single-instance replica set for three years in a small application for its transaction support.

The only difference we noticed is the additional replica set setting for a new instance. We found no other disadvantages.

thih925 days ago

Loosely relevant, “the marketing behind mongodb”

https://news.ycombinator.com/item?id=15124306

CSMastermind25 days ago

Man, that thread feels prophetic. Since 2017 whenever this comment was written:

> 100% of my friends who have used Mongo/similar NoSQL have given up and had a nasty rewrite back to pgSQL...

I've led three migrations at different companies from Mongo to Postgres, spending months of my life and millions of dollars.

There really should be a sociology study of why so many people/companies used Mongo even well after the downsides were clear.

jacobyoder25 days ago

"this hurt in particular because our data was very much relational."

Umm... why would you not choose a relational engine to start with? This isn't said with the benefit of hindsight. I worked with Lotus 123 back in the early 90s. I get that there's value in document databases, but even then, there were limitations, and the need for the ability to have structured/related data in an engine that easily allowed for ad-hoc queries out of the box was apparent.

I watched the entire nosql/mongo movement arise and evolve over time, and it rarely made sense to me. Even when mongo made sense for a problem... it only makes sense for that problem, and not as the primary basis for your entire application. Relational/SQL, with nosql/mongo/etc as secondary/ancillary data store has always made the most sense (imo).

JSON columns in major databases now tend to provide a pragmatic "good balance for many use cases". Justifying something other than that as a starting point is possible, but I've rarely seen cases where it makes sense to avoid a decent SQL engine as the core data store for your business apps.

CSMastermind25 days ago

> Umm... why would you not choose a relational engine to start with?

There's typically two answers to this question:

When they first started things were changing so quickly they didn't want to commit to a schema and felt like doing real data modeling would slow their velocity down so they dumped everything into Mongo and called it a day.

Or they had someone on the team who insisted that Nosql was the way to go. They'll say something like, "I worked on xyz team at Amazon and we just used DocumentDB for everything and never had a problem. Those horror stories about Mongo are all from people who didn't use it right."

jacobyoder15 days ago

Interestingly... I've never encountered the first scenario as a justification. I know it must exist, as I've read people describe it.

Every time I've been in a group where nosql was desired, it was because they saw it at a conference, or watched a youtube video, or read some blog extolling nosql virtues, and... yeah - that second option. They'd either never actually used nosql, or didn't realize what they were getting into, possibly because even if they'd used it, the larger architecture had SQL in it someplace, just outside what they could see.

fl0ki25 days ago

I inherited a troubled project using MongoDB and learned a lot about what is, and isn't, really a problem with MongoDB [for this kind of project].

BSON is a terrible format, but I genuinely like that I can use the same schema all the way through the JSON APIs to the database. If they have to deviate in future for some reason, you can deal with it then.

If you are clever with generics, you can compose type-agnostic DB code with type-safe in-memory data structures, without needing any schema generation or ORM. It's a natural fit for data-oriented programming in my workhorses Go and Rust. Unfortunately, the Rust library is very poorly designed in many ways, making many basic things extremely inefficient, and that's not what you want in Rust. The Go library has really slow reflection-based (de)serialization, but other than that, it's very flexible and robust.

If you keep your queries simple, it's easy to mock out and combine with techniques like snapshot testing. Since BSON has a lossless representation in the form of extJSON, you can have human-readable snapshots, diff them easily in review, etc.

Doing bulk writes is easy and efficient, and doing sessions/transactions is easy, but you should basically never combine the two. The server can have its own configured timeout for a session, and if your bulk write exceeds that, it fails and retries, fails again, etc. This is a really serious design flaw that shouldn't have made it past review. If they wanted to enforce a maximum bulk write size, that should be known up-front, not depend on how many seconds it happened to take each time.

Writing data to a replica set is extremely slow, and every index you add makes it slower. This compounds further with the above.

I have concluded that MongoDB only makes sense for read-mostly workloads. If you're going to do your own in-memory data structures and validation anyway, writing out small infrequent changes, MongoDB doesn't do much for you but it also doesn't get in your way very much.

akie25 days ago

Did anyone here ever see a migration _from_ PostgreSQL/MySQL _to_ MongoDB? I've only ever seen startups pick Mongo, then regret it a few months or years down the line and migrate to a relational database. Did anyone ever see the opposite?

djbusby25 days ago

I watched a team go from a sloppy MySQL that they never tried tuning to Mongo (cause it's web-scale). It was another thing that almost killed the company. Now they are on PostgreSQL.

hibikir25 days ago

I've seen a couple of successful ones, but there were extremely good reasons: Most of the time we don't need scaling, but when your target Mongo cluster has a couple hundred machines, and your queries really are very simple, NoSQL is not a crazy idea. Same with DynamoDB: Sometimes it really is a very good choice, but you need a use case of the right shape. If you have an extremely good use case for your table to be in 5+ regions, the dataset has billions of rows and you have no hot keys, maybe running your own replication system really is more effort than letting AWS do it all for you.

It's just far more common for some tech lead in a small company to imagine that they are going to be bigger than Netflix than for someone to start conservatively with an RDBMS and realize that suddenly they have hundreds of millions of users and some kinds of data where the relational bits don't matter.

harry_ord25 days ago

I work with the aftermath of going from postgres to couchdb. The migration was done years before I joined. There are still customer records showing postgres ids.

It didn't seem like it made things better since there were issues with dates and lost data. Working with it now isn't fun either.

willio5825 days ago

No, but I am currently in a startup that mainly uses DynamoDB (similar to Mongo) with support from Elastic for more search-based queries. We've found DynamoDB to be great and we don't feel hindered by the lack of relational-ness.

I will say in case anyone doesn't understand how this might be possible, relational querying is possible in document databases, but they essentially require just different ideas to achieve. In Dynamo, you might need to create secondary indices for example.

Personally I love Postgres and would not mind if we had gone that direction but the more I use DynamoDB the less I feel like Postgres is the "only way"

rglover25 days ago

> Missing out on relational features: With MongoDB, we lost out on many nice features from the relational world like CASCADE which, when specified, deletes all referenced resources across other tables whenever a target resource is deleted; this hurt in particular because our data was very much relational.

I'd be very curious what their data model was in relation to this problem. I wonder if denormalization of the data would have solved the problem without a need for a full database hop.
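
For readers unfamiliar with the CASCADE behavior quoted above, here is a small sketch of it. It is not the article's schema: the projects and secrets tables are hypothetical, and Python's built-in sqlite3 stands in for PostgreSQL so the example runs on its own. SQLite needs foreign key enforcement switched on per connection; PostgreSQL enforces foreign keys without any such switch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite-only step: foreign key enforcement is off by default.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE secrets (
        id INTEGER PRIMARY KEY,
        -- Deleting a project deletes every secret that references it.
        project_id INTEGER NOT NULL REFERENCES projects (id) ON DELETE CASCADE,
        name TEXT NOT NULL
    );
    INSERT INTO projects (id, name) VALUES (1, 'demo'), (2, 'other');
    INSERT INTO secrets (project_id, name)
        VALUES (1, 'API_KEY'), (1, 'DB_URL'), (2, 'TOKEN');
""")


def count_secrets(project_id):
    row = conn.execute(
        "SELECT COUNT(*) FROM secrets WHERE project_id = ?", (project_id,)
    ).fetchone()
    return row[0]


def delete_project(project_id):
    # One statement; the database removes the dependent rows itself.
    conn.execute("DELETE FROM projects WHERE id = ?", (project_id,))
```

Calling delete_project(1) removes both secrets of project 1 and leaves project 2's secret alone. In a document store without this feature, the application has to find and delete the dependent documents itself.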

bsaul25 days ago

Is it still difficult to create a cluster of pg dbs, either for redundancy or speed?

Last time I advocated for using pg vs mongodb, the person replied that mongodb clustering was super easy.

baq25 days ago

Mongo absolutely can be the right answer if you need to horizontally scale to tens or hundreds of TBs. Anything below that I’d rather have a small cluster of big Postgres instances. There’s value in a SQL RDBMS which you just don’t get anywhere else.

Ozzie_osman25 days ago

Read replicas are pretty easy on postgres using replication. That said, you need to be careful about replica lag.

If you want to distribute your writes, that's a little trickier. There are options like Citus and such. But still not natively supported.

KingOfCoders25 days ago

Second that. A client had PG replica lag, was doing backups from the replica, and found out while restoring a backup that some hours of data were missing.

jononor24 days ago

Ouch? Are there standard tools to monitor that? Any best practices to avoid it? I am considering exactly that kind of setup...

5Qn8mNbc2FNCiVV23 days ago

I don't know about there being a standard tool but you can track the LSNs together with a timestamp on the write node and from there calculate the replica lag by checking what LSN the replica is at
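
A rough sketch of that LSN-plus-timestamp idea, under stated assumptions. It does not talk to a database: the LSN strings are passed in, and in practice they could come from pg_current_wal_lsn() on the write node and pg_last_wal_replay_lsn() on the replica (PostgreSQL 10 or later). The LagTracker class and its method names are hypothetical. The estimate is conservative: it measures time since the newest primary sample that the replica has already reached.

```python
import bisect


def parse_lsn(text):
    """Convert a textual LSN ('hi/lo', both hexadecimal) to a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


class LagTracker:
    def __init__(self):
        self._lsns = []   # primary LSN samples, ascending
        self._times = []  # matching sample timestamps (seconds)

    def record_primary(self, lsn_text, timestamp):
        """Store one (LSN, time) sample from the write node, in time order."""
        self._lsns.append(parse_lsn(lsn_text))
        self._times.append(timestamp)

    def lag_seconds(self, replica_lsn_text, now):
        """Estimate how far behind the replica is, in seconds."""
        replica_lsn = parse_lsn(replica_lsn_text)
        # Count of samples whose LSN the replica has already reached.
        reached = bisect.bisect_right(self._lsns, replica_lsn)
        if reached == len(self._lsns):
            return 0.0  # caught up with the newest sample we have
        if reached == 0:
            return None  # behind our oldest sample; lag is unknown
        return now - self._times[reached - 1]
```

Alerting when the estimate crosses a threshold, and checking it before taking a backup from the replica, would address the scenario described earlier in this thread.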

globular-toast25 days ago

I started my career doing web development using PHP and MySQL. Then pivoted to something completely different for about 6-7 years as I got sick of spending all my time making stuff work in IE6. I heard about the whole nosql thing, didn't understand it but thought it sounded silly. Then came back to web dev 5 years ago and found postgres was all the hype and nobody talked about nosql any more. Glad I skipped it!

pmarreck25 days ago

Could have told most people this back when everyone was going to NoSQL.

"You'll be back". Relational capability is just too useful beyond a certain point

XCSme21 days ago

I currently use MariaDB for my self-hosted application. Would I gain big advantages switching to PostgreSQL? My main reason for going with MySQL/MariaDB is the availability and ease of installation on most hosting providers.

derekperkins15 days ago

No. In almost no circumstances will it be worthwhile to switch either direction. While people on HN love to nit-pick on the margins, the core functionality for both is 95% overlapped.

jrochkind125 days ago

> we felt that it already delivered significant benefits by virtue of not being MongoDB

Well then, tell us how you really feel.

rullopat25 days ago

That's what happens when you choose the next product / framework / methodology because of fashion or because FAANG is giving you the stuff for free, without thinking if it really fits your use case and that 99.9999% of the companies will not have the problems Google, Facebook, etc. have.

h1fra25 days ago

Funnily enough, the one feature I don't recommend in Postgres is Cascade. Unless you have a very small controlled set of FK it's not a good idea. Batch delete is less problematic. I wish Postgres would do something about it, like elasticsearch deleteByQuery that can run in the background.

Cyberdog25 days ago

What is the point of using a query builder if you’re only going to support a single RDBMS? Why not write straight SQL and avoid the unnecessary abstraction layer? Is the query builder really going to be easier to learn to use (particularly for non-trivial queries) than SQL?

callalex25 days ago

If your DB schema is expressed as an object/class it means your IDE will have robust autocompletion. It’s just a convenience to avoid having to look up the schema for whatever tables you’re interacting with. It helps cut down on trivial bugs during the writing phase, before you even get to testing.

fakedang25 days ago

Can somebody please explain to me the use cases in which someone would use Postgres vs MySQL vs something like Cassandra or Convex? I know the latter two have their own types of SQL implementations, so why aren't they used as much, even if they are NoSQL databases?

j4525 days ago

It always struck me as odd to see the hard work put into NoSQL databases to make them relational.

Somehow, the work to do this seemed less work than learning SQL.

agentultra25 days ago

Hoo boy. I did this once. Only it was CouchDB. And they didn’t use any schema. Data had accumulated in it for years. Then a customer said they wanted their data exported to PowerBI.

I used a library that generated a type that could parse any JSON document you threw at it by feeding it examples. So I wrote some tools to scan through the database, generate parse test cases, etc.

I then wrote a library to version and migrate records.

With this I wrote some tools to start extracting meaningful bits out of the big-blob type. I could scan the database, download examples that didn’t work, make test cases, fix those, ask the experts in the system, etc.

Then eventually it started spitting out structured data into Postgres and from there into PowerBI.

Decent article.

CoastalCoder25 days ago

Obligatory parody link: https://youtu.be/HdnDXsqiPYo

anticristi25 days ago

:)))

I find the super-plain voice of the annoyed character further amplifies the humor.

WuxiFingerHold25 days ago

Talking to AI 12 years ago :-)

Semionilo25 days ago

Generally speaking: don't use mongodb.

It was super shitty when it came out and using postgres gives you a better and faster solution which you can also use in other projects.

I'm always impressed when I see mongodb still making money.

b9b10eb73624 days ago

While I'd agree with the general stance of avoiding MongoDB for any new project, I find the statement that postgres always gives a better and faster solution dubious. They don't really solve the same problems. If you happen to really need horizontal scaling, actual HA (not failovers) or documents with many field-level atomic operations, MongoDB might still be a better fit than postgres. For on-premise hosting, not having any HA out-of-the-box can be a major pain point.

Semionilo23 days ago

Sharding does exist with postgres.

What type of HA can mongo do that postgres can't?

onetimeuse9230425 days ago

Ah, the old "I used a wrong product for my problem and now I complain the product is bad because it does not suit my case." defence.

MongoDB is a document database. It is not supposed to be good at relations. Also lacking support with cloud providers and lacking experience with MongoDB is not MongoDB's problem, it is your poor decisionmaking. If you value those things, you should have taken it into account when you were choosing MongoDB in the first place.

Now, I am not a big lover of MongoDB. Some years ago I was forced to use it and I had a very low opinion of it as a quite immature product. It is still far behind in maturity compared to something like Oracle Database or PostgreSQL, but in the meantime I learned to appreciate some of the things MongoDB is good at.

I also admit that MongoDB's transactions are a total joke. It should be prominently placed in the documentation that you are using them at your own risk. I don't use MongoDB transactions anymore because there are better ways to architect your application with MongoDB without using transactions.

I like MongoDB for the ease of use when rapidly prototyping things. I like the concept of a no-schema, document database and some of the additional features it provides to deal with the document. I like its reactive java driver which is a breeze to use to construct data pipelines quickly. I like change streams.

In the end, I think it is good to have a selection of tools that are good at doing different things.

It is our responsibility to choose the right tool for the job. If you chose poorly, don't try to fault the tool for it.

dangtony9825 days ago

Author of the article here!

The article itself does not "complain the product is bad" but some other comments in this thread would certainly suggest so. Instead, the article says that the use-case evolved, states reasons for why MongoDB was no longer suitable (while at some point it was) and why PostgreSQL was chosen instead, and discusses the migration process involved in the transition and the results.

Lacking support with cloud providers and lacking experience with MongoDB are indeed NOT MongoDB's problem. However, they are a problem for users of the platform who are trying to self-host it with MongoDB as a database dependency, and that justifies the PostgreSQL migration, with PostgreSQL being a better candidate for this use-case.

To be clear, the article is not saying that MongoDB is bad and no fingers are being pointed. I would in fact say that there is an array of use-cases where MongoDB is an excellent choice (e.g. I resonate with the point on rapid prototyping).

Please don't skew the words.

finaard25 days ago

To me the article read more that PostgreSQL might have been the better choice from the beginning - but due to lack of experience in the team MongoDB was "good enough" at that point.

tkellogg25 days ago

I recall when my team chose MongoDB ~2011, Postgres & friends didn't have JSON columns, so there was a lot of extra data modelling that probably was unnecessary.

The biggest use case for MongoDB was for huMongous data. Obvs MongoDB was a good fit, because of the name.

wlll25 days ago

https://www.postgresql.org/about/featurematrix/#json

You're right, 9.4/9.5 were when JSONB was introduced and expanded (released 2014/2016).

That said, it's a big decision to go with two very very different technologies (relational vs Document store) just for a column type, storing JSON must have been a pretty major product feature?

lloydatkinson25 days ago

I have noticed that comments on here have recently been getting more and more like this. An article I posted a few weeks ago got the same treatment even though I reiterated the same points multiple times, as I anticipated these kinds of comments from people who didn’t read it fully.

thih925 days ago

To be fair, Mongodb’s pr team did their best to present it as the solution to all problems.

> In 2012, 10gen’s VP of Corporate Strategy Matt Asay argued “there will remain a relatively small sphere of applications unsuitable for MongoDB … the majority of application software that developers write will be in use cases that are better fits for MongoDB and other NoSQL technology … Those functions that really help a company innovate and grow revenue [will be NoSQL].” He would note that we were living in the post-transactional future.

https://news.ycombinator.com/item?id=15124306

gwbas1c25 days ago

They were doing that back in 2010. I went to a few conferences where they promoted MongoDB as an all-purpose database that was the "next thing" compared to SQL.

Once everyone realized its shortcomings as a general-purpose database, there was a gradual "oh shit this isn't working" migration back to SQL-based databases.

I think the big problem is that the MongoDB programming model and APIs are very nice; we really need a database that is still relational "under the hood" but has an API that's more like MongoDB.

bashinator25 days ago

I believe one of the job interview problems at Mongo for software engineers, was to design a relational layer on top of the document store.

onetimeuse9230425 days ago

To be fair, almost all product PR teams do the same. I just ignore most of what they say in favour of my own opinion that I try to build based on reading actual technical documentation, experimenting with the product, etc.

Anybody who makes tech decisions based solely on what PR teams say is naive and incompetent at best.

zzzeek25 days ago

MongoDB came into the community like a ton of bricks, taking over conferences, flooding the zone with mugs and t shirts, and for a time there, you couldn't view the front page of hacker news without at least two MongoDB posts. The death of SQL, now known to be an inferior relic of the past, was a regular topic of discussion. There's a reason why the "Web Scale" meme is so famous, because that's what actually happened for awhile there. SQL was an inferior, leaky abstraction, and ACID was not "web scale". it became all about the CAP theorem (which I hardly ever see anyone writing about these days).

only a year or two later, when teams that went all in on MongoDB at the behest of MongoDB's marketing department started realizing they'd been sold a bill of goods, did the slow and arduous march back to what continued to be the best generalized solution for 95% of data problems, the ACID compliant relational database, begin to occur. given that, this blog post seems really behind the times.

thih925 days ago

> To be fair, almost all product PR teams do the same.

No, mongodb’s PR approach and results were largely uncommon at the time. They didn’t advertise directly but instead targeted and amplified dev communities - as detailed in the article linked earlier.

The result was that people not associated with mongodb talked about mongodb at various dev conferences and in blog posts. People didn’t want to listen to PR teams then too, but they followed their peers.

taffer25 days ago

As a counterexample, Azure Cosmos or Amazon DynamoDb are marketed as specific solutions for specific problems. Other RDBMS like SqlServer or Postgres are marketed as general purpose solutions because they are in fact general purpose solutions. Mongodb, on the other hand, is a specialized "document database" marketed as a replacement for relational databases.

benterix25 days ago

> To be fair, almost all product PR teams do the same.

As it is their "duty" to twist the truth in this way, it is our role as the users of these technologies to present things as they are.

swasheck25 days ago

mongo was absolutely one of the most egregious (daresay dishonest?) purveyors of such myth. it was quite off-putting for me and my team.

liquidgecka25 days ago

In 2010-2011 time frame the mongo team sponsored an effort inside of Twitter to replace MySQL with mongo. There was a group of us that worked to migrate the tweet store to Cassandra that were able to talk with leadership and get that initiative killed. Turns out migrating HIGHLY structured data into mongo was never a good idea, especially at that scale.

pydry25 days ago

The issue isn't that mongodb isn't suitable for some problems. I've used it before and it has not caused issues. It's that the problems it is suitable for are a subset of those that postgres is suitable for.

I can just as easily rapidly prototype in postgres. If I want to store schemaless JSON in postgres I can easily do that.

>In the end, I think it is good to have a selection of tools that are good at doing different things.

In the end I think every tool needs a niche - something it is good at that its competing tools are not good at.

Mongo doesn't have that.

CuriouslyC25 days ago

If mongo is just a document database, there's really no reason to use it over elastic. The query story is slightly nicer with mongo, but we're not doing relational algebra here, right? Elastic crushes mongo at literally everything else.

LtWorf25 days ago

ES had a bug where a syntax error in a query would send it in a weird state that would then give wrong answers to all queries until the process got rebooted. Which of course isn't fast.

I reported the issue, and thankfully I changed job and never had to deal with ES again. After a few years they contacted me to ask if the issue was still there, I said I had no idea.

I have no idea if the issue is still there.

finikytou25 days ago

I guess elastic is a heavier setup for a POC.

CuriouslyC25 days ago

That might have been true in the past but it's not hard to PoC with elastic using docker compose.

synthc25 days ago

Elastic is much more of a pita to maintain and monitor than Mongo

taffer25 days ago

So what is a good use case for Mongodb? I have never seen an application where a "document database" would have been a good choice. In every project where I have seen Mongodb, it has turned out to be the wrong choice.

Even for prototyping there are many other good choices since RDBMS like Postgres have implemented JSON support. Mongodb looks like a solution in search of a problem.

joneholland25 days ago

Applications that are essentially form wizards are a great fit for a document database.

Think application forms etc.

wlll25 days ago

Is there a benefit of Mongo for that over just Postgres with JSONB columns? You're still storing JSON in Postgres but you'd get the relational aspects too for things like users having many forms, billing and account relationships etc.

unclebucknasty25 days ago

No benefits that I can think of at the database layer. Postgres's addition of JSONB columns represents the best of both worlds. Funny to think that all of that noise about nosql replacing rdbms was essentially nullified by the addition of one column type.

Some people do like the MongoDB API, however.

joneholland25 days ago

Easier horizontal scaling and organizational inertia would be the main reasons to use mongo over a jsonb column. I wouldn’t introduce it to a psql shop if they are already great at running psql.

forgotmyinfo25 days ago

Sorry, 100 times out of 100, "rapid prototyping" means "we built this with the wrong database, now we're stuck with it". If it isn't obvious how to store your data up front, then you either aren't planning your software well, or you don't actually know what you're building.

wlll25 days ago

I was also a bit confused by that part. What is it about Postgres that prevents rapid prototyping? It's not like the schema is set in stone, or even particularly hard to change once you've created it.

benterix25 days ago

Well, yes and no. If you already have your database running in production with many transactions and live connections 24/7, changes to the schema might not be hard per se but always needs careful planning and execution. Additionally, in Postgres a change like adding a column locks the whole table (although I hear this is going to change) so writes are off for a short time. If this becomes unacceptable, you go for a blue/green deployment which has its own gotchas.

So, while I agree with your main point - there is nothing in Postgres that prevents rapid prototyping, and I would choose it over Mongo any moment - I understand why some people might prefer the more "dirty" approach.

wlll25 days ago

True enough, and the solutions for lighter-weight schema changes have evolved and weren't always so good, but early stage startups often don't really have the sort of data-weight issues that make this hard, unless they're starting with large data sets already. Even at the scale of small hundreds of millions of rows per table (like the company I'm in charge of the database for) it's not much of an issue.

I had to look it up, apparently adding a column with a non-null default was "fixed" in PG 11 (2018), but with a null default it had been fast for a while:

https://www.depesz.com/2018/04/04/waiting-for-postgresql-11-...
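For reference, the kind of change being discussed looks like the sketch below. It uses SQLite through Python's `sqlite3` purely so it can run without a server; the `ALTER TABLE ... ADD COLUMN ... NOT NULL DEFAULT ...` statement has the same shape in Postgres, but nothing here measures or demonstrates Postgres locking behavior. The `forms` table and `status` column are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE forms (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
conn.executemany("INSERT INTO forms (title) VALUES (?)", [("a",), ("b",)])

# Add a column with a non-null default to a table that already has rows.
# The statement shape is the same in Postgres; the engine differs.
conn.execute("ALTER TABLE forms ADD COLUMN status TEXT NOT NULL DEFAULT 'draft'")

# Existing rows read back the default without having been updated one by one.
rows = conn.execute("SELECT id, title, status FROM forms ORDER BY id").fetchall()
print(rows)
```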

JJMcJ25 days ago

> MongoDB is a document database.

Thank you. A one sentence clarification of the issues in Mongo vs RDBMS.

benterix25 days ago

It's a bit more nuanced than that. PostgreSQL supports JSON/JSONB data types so if you need, you can use it in a way similar to MongoDB, with less structure and the remaining caveats, but with transactions and all other goodies working out of the box and being able to use standard RDBMS features if needed.

teaearlgraycold25 days ago

> I like MongoDB for the ease of use when rapidly prototyping things.

I think this is similar to when people say writing tests speed them up through TDD, then others don’t write any tests at all. What I take away is you can do it either way as long as you’re bought in and are familiar with the process.

For me it’s no trouble at all to define my schema in advance. And Postgres provides me with the json column escape hatch. Combined with the massive benefit of the data validity guarantees later in the app’s lifecycle, I pick Postgres every time.

jeff-davis25 days ago

"It is our responsibility to chose the right tool for the job."

That perspective doesn't work well for database products, in my opinion. There is huge pressure for databases to evolve with your business and applications and to adapt to whatever you throw at them.

Swapping out a database product is less like changing tools and more like changing a foundation. You can't do it every time a new problem arises.

That's not to say you can't use a few different products if that makes sense. But that has its complications.

99990000099925 days ago

>As part of that stack, we chose MongoDB + Mongoose ORM because the combination presented least overhead and allowed us to ship quality features quickly. As Sir Tony Hoare states, “premature optimization is the root of all evil,” and there was certainly no need for further optimization at the time.

Looks like the tool did a great job while they were getting started. Mongo is very easy to hack something together with; later on, it looks like they just needed to migrate to something more stable.

swasheck25 days ago

yeah ... in the early days of mongo it was pitched as a replacement for relational because a) it had faster writes without those pesky joins and constraints, and b) because it's too hard to model data, so chances are it's just unstructured. the onslaught of "relational is dead" and "joins are annoying" was tremendous

99990000099925 days ago

Plus there's a good chance they didn't know exactly what they needed to build when starting out.

jimbokun25 days ago

Postgres and MySQL are easy to hack something together with, too.

99990000099925 days ago

That's assuming you're experienced with them. If you know next to nothing about databases, Mongo is a much better choice. In this case it looks like when they got started the team just didn't have all that much experience with databases.

forgotmyinfo25 days ago

Nonsense. Choosing the correct database isn't "premature optimization", it's the bare minimum of being a competent programmer. Why do we never have time to do it right, but we always have time to do it over?

99990000099925 days ago

It's generally faster to get started with Mongo.

If you're talking about a startup, you might not even get to the let's do it over stage. You might have six months to just hack something out so you can raise more funding, and in that case Mongo is a great choice.

The needs of your business might also just evolve over time.

unclebucknasty25 days ago

To be fair, it burst onto the scene back in the "all you need is NoSQL...RDBMS is dead" era.

So it became one of those software dogma things and I remember having "debates" right here on HN, including around the necessity of transactions.

So it's kind of ironic to see people now chastising those who fell prey to the dogma. Full circle.

throw_m23933925 days ago

> MongoDB is a document database.

So can PostgreSQL be with JSON types and queries.

renegade-otter25 days ago

The irony of it all is that MongoDB actually matured and you do not have to jump through hoops to have, say, transactions.

I am ALL for Postgres, I even wrote a post about the importance of being fluent in relational databases: https://renegadeotter.com/2023/11/12/your-database-skills-ar...

What grinds my gears is the whiplash caused by this frantic stampede from one hype to the next.

I am just waiting for the AI hype to blow over and for everyone to rediscover Hadoop, or something.

thiht25 days ago

Postgres is not hype though. There's very few legitimate use cases for not using a RDBMS as a main data store, and Postgres happens to be the most popular nowadays, for good reasons.

jimbokun25 days ago

In my opinion the use case is when your data no longer fits on a single, potentially very large, server. Automatic distribution and replication and rebalancing of data is a tricky problem and something like Cassandra handles those very well.

Or Cockroach DB or similar, if you still need relational capabilities.

circusfly25 days ago

I've nothing against Postgres but MySQL is a wonderful option that is likely still in far higher and heavier use, when considering open source databases.

kstrauser25 days ago

In my mind, MySQL no longer exists. It’s a weird Oracle thing that I wouldn’t touch with a lineman’s pole. MariaDB is what happened after MySQL disappeared.

LtWorf25 days ago

I think it gained popularity over a decade ago, when it was faster because it was unsafe.

In the end postgres is the better one. With mysql the defaults are bad.

wlll25 days ago

I wrote this in another comment, but it's relevant here I think:

" I do scaling and performance work, mostly with Rails apps, but a significant amount of the work is database level and not language specific. I've used both postgres and MySQL (and a few other databases) going back to 2000.

The best thing I can hear from a company when I start is "We use Postgres". If they're using postgres then I know there's likely a far smoother path to performance than with MySQL. It has better tooling, better features, better metadata. "

Right now I would not choose MySQL over Postgres at all, ever. I can't think of a single way it is materially better.

oldsecondhand25 days ago

In my experience MySql is more popular. Worked at several companies that used MySql, none that used Postgres.

Kids at universities still use XAMPP to learn databases.

lutoma25 days ago

In my experience MySQL is still commonly used in the PHP world, but everything else is mostly Postgres

bradleyjg25 days ago

The number one reason being that it’s free. In a purely technical comparison it is not at the top of the list.

simonw25 days ago

What's top of the list technically?

darby_eight25 days ago

> There's very few legitimate use cases for not using a RDBMS as a main data store

I don't think that's true—all the old cases that were legitimate before are still legitimate. The price of both persistent storage and memory has simply come way down to the point where many computational workloads are viable under vertically scalable databases again. I suspect the pendulum will swing the other way one day yet again (and of course there is a rich ecosystem of ways to query horizontally, if not transactionally, with SQL that will likely temper this swing even further).

wlll25 days ago

The price of persistent storage and memory was the same for Mongo as it was for Postgres back when the NoSQL movement happened. Mongo wasn't made of magic scaling fairies and still needed resources. As soon as you cranked up the safety of Mongo to a nearly acceptable level, its performance fell through the floor. The only reason people thought it performed amazingly was Mongo's rather deceptive marketing.

I use the term "nearly acceptable" because for a long time Mongo's data consistency was absolutely crap (https://jepsen.io/analyses)

Personally I think people used Mongo because it was new, shiny and exciting and it promised performance (though it didn't really deliver for most people).

bradleyjg25 days ago

In addition to changes in ephemeral and persistent memory, the other big difference between now and nosql’s heyday is improvements in distributed relational transactional databases (newsql).

We haven’t exactly circled the square on CAP but we’ve certainly bent it.

sph25 days ago

Postgres is no hype, it was already in plateau of productivity when MongoDB came to the web scale scene.

Hadoop, just like Mongo, had its hyped time in the sun, but RDBMSs are far more advanced and versatile than either of them ever was.

renegade-otter25 days ago

I am not calling Postgres hype - it should have never been NOT hype. It's a reasonable default for most problems.

Now all I read about is how Postgres is awesome, as if it's this great new thing. I guess that makes sense, as the new generation of engineers is rediscovering stable, reliable, lean technologies after a decade of excesses with "exotic" tech.

For grey beards, it's all very odd. Like, "where have you all been?"

arp24225 days ago

I remember Robert C. Martin ("Uncle Bob") going on about how No-SQL will replace literally all SQL and that there is literally not a single use case for relational data and SQL. I wonder if he ever came back on that.

Now, my opinion of Martin in general is not especially high and he's a bit of a controversial figure, but it wasn't just the kids. And Martin is also all about reliable software, so that makes it even more surprising.

forgotmyinfo25 days ago

That's as ridiculous as saying "integer division will be useless in 20 years". Sometimes relational databases really are the optimal solution to certain problems. I really wish CS had better authority figures, although we shouldn't need them in the first place.

tomnipotent25 days ago

Postgres as we know it has only existed for about a decade, since the post-9.x era in 2010-2014 when many of its lauded features were added. Replication, heap-only tuples, extensions, foreign data wrappers, JSON, leveraging multiple indexes, parallel index scans, reindex concurrently, JIT compilation, declarative partitioning, stored procedures, and major WAL improvements are all "recent".

I love Postgres and it's been my go-to default since 2014 and you can pry it from my cold dead hands, but it's more contemporary with Perl 6 than Perl 5 if we're talking grey beards.

sph25 days ago

Excited, vocal users make the most noise.

Somehow the entire MongoDB era passed me by and I never used it once. I used to use MySQL in the 2000s, and switched to PostgreSQL in the 2010s.

LtWorf25 days ago

They used it at my 1st job, a startup.

They had an ORM which made everything much slower, and the most common operation they did was to add items into lists, which causes the whole document to be resized and requires it to be moved.

It could have been done with 1 sql server, but we needed 4 really heavy mongodb servers instead. And of course our code was buggy so the schema was all over the place.

jimbokun25 days ago

My company considered it, but went with Cassandra instead.

forgotmyinfo25 days ago

This is what happens when you hire 20-somethings who don't know how computers actually work to build your entire business. You either learn about relational databases in school, or through experience. Relational algebra has been around for half a century, at least. If someone doesn't know about its utility, it's a failure of the education system, or a failure of corporate incentives.

benterix25 days ago

> For grey beards, it's all very odd. Like, "where have you all been?"

To be frank, PostgreSQL has evolved a lot since the late nineties. There was a time when people preferred MySQL over it, as it seemed to work faster, certain things were easier and so on.

Shorel25 days ago

There was a time PostgreSQL did not run under Windows, and that is IMO what gave MySQL the market share edge.

Without these windowless years, PostgreSQL would be the most used RDBMS right now.

jimbokun25 days ago

MySQL was more popular for a while, but from a technical standpoint Postgres was always ahead.

fijiaarone25 days ago

PostgreSQL didn’t exist in the 1990s. It was called Ingres. Postgres started as a personal project and was first released in 1999, but was unusable. Around 2004 the project started getting popular and hipsters started screaming that MySQL didn’t have stored procedures and wasn’t fully ACID. ACID became a buzzword for PG fanboys, even though data corruption and memory leaks plagued the system for the first decade. It became stable around 2008 as long as you didn’t mind restarting your database every few days. PostgreSQL didn’t really become a viable option until around 2010.

rendall25 days ago

...and, continuing the cycle next season, devs will (re)discover NoSQL and document DBMS. I do wonder if this phenomenon of trendy tech is somehow integral to advancing the state of the industry, the SWE equivalent of early adopters. "Rediscovering the old" is a side effect, as old tech is used and reinterpreted in new ways. Kind of neat, really.

Retric25 days ago

There’s an entire industry of people selling knowledge of the “next great thing/the one true way," but they need a new thing every few years as people learned whatever the old thing was.

nemo44x25 days ago

MongoDB is more popular than it ever has been. You can look at their quarterly filings and learn this. The NoSQL space is larger than it ever has been and is the fastest growing segment in the database world.

bdzr25 days ago

> The irony of it all is that MongoDB actually matured and you do not have to jump through hoops to have, say, transactions.

Are transactions still limited to a single document in Mongo?

throwitaway112325 days ago

Mongo has had multi-document transactions since 2018 (version 4.0), and joins (the $lookup aggregation pipeline stage) since 2015 (version 3.2) [1].

[1] https://www.mongodb.com/evolved#mdbfourzero

debarshri25 days ago

Generally, with these migrations, the biggest problem in my experience is that the operational learnings are not transferable.

You often want to build expertise and achieve operational excellence with the tech that you start out with.

renegade-otter25 days ago

Back in the day we used to say "you choose your database once". You can swap in/out a lot of things, but your data store is really the core.

It is shockingly hard to change your storage engine once you are serving customers, so you really want to get that right.

cpursley25 days ago

Exactly, languages and frameworks come and go - but get the data layer right and you’re set.

cpursley25 days ago

The one unsolicited piece of tech advice I always give is “Just use Postgres”.

Postgres Is Enough: https://gist.github.com/cpursley/c8fb81fe8a7e5df038158bdfe0f...

There’s some Mongo/json alternatives in the list if you really need unstructured data.

And there’s a huge plug-in ecosystem as well for just about anything you could imagine.

kqr25 days ago

But isn't there a point before which "just use sqlite" is more appropriate advice?

lysecret25 days ago

I love sqlite as much as the next guy but as soon as you are working with multiple write processes (which you quickly will) you will have to move off. So, why not just start with Postgres?
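A small sketch of the single-writer behavior behind that point, using Python's built-in `sqlite3`. Two connections to the same file stand in for two processes; the first takes the write lock with `BEGIN IMMEDIATE`, and the second is refused while it is held. The function name and the `events` table are hypothetical, and `timeout=0` is chosen only so the second writer fails at once instead of waiting.

```python
import os
import sqlite3
import tempfile


def second_writer_blocked(db_path):
    """Return True if a second connection cannot begin a write while the first holds one."""
    # isolation_level=None: transactions are controlled explicitly with BEGIN/COMMIT.
    # timeout=0: raise immediately rather than waiting for the lock to clear.
    first = sqlite3.connect(db_path, isolation_level=None, timeout=0)
    second = sqlite3.connect(db_path, isolation_level=None, timeout=0)
    try:
        first.execute(
            "CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, note TEXT)"
        )
        first.execute("BEGIN IMMEDIATE")  # first connection takes the write lock
        first.execute("INSERT INTO events (note) VALUES ('from first')")
        try:
            second.execute("BEGIN IMMEDIATE")  # second writer asks for the same lock
        except sqlite3.OperationalError:
            blocked = True  # typically reported as "database is locked"
        else:
            second.execute("ROLLBACK")
            blocked = False
        first.execute("COMMIT")
        # Once the first writer has committed, the second can write normally.
        second.execute("INSERT INTO events (note) VALUES ('from second')")
        return blocked
    finally:
        first.close()
        second.close()


with tempfile.TemporaryDirectory() as tmp:
    print(second_writer_blocked(os.path.join(tmp, "demo.db")))
```

In practice a non-zero timeout lets writers queue briefly instead of failing, which is often enough for light workloads; the sketch only shows that writes are serialized per database file.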

jimbokun25 days ago

SQLite is a replacement for structured data stored in a file. If your data store is a big XML or JSON or CSV, SQLite may be a better solution. For example, I’m using it for ETL tasks and it’s working well for this use case.

If you need a networked database supporting many users, probably better off with Postgres.

WuxiFingerHold25 days ago

For server applications I don't think so (despite the recent hype driven by businesses marketing the cloud services). Managing multiple writes and durability needs the same or even more effort and attention than running PG or MySQL. And on top of it SQLite is missing many often crucial features. For me, those are the type system and functions.

cpursley25 days ago

Sure, if you simply need a dumb store just for storing basic types that’s simple to operate. But SQLite is not at feature parity with Postgres.

__s25 days ago

sqlite is great for what it is

But I recently ended up using sqlite on a site, & it's clear it makes a lot of trade offs to keep itself minimalist. I wish I had just used postgres

That minimalism can be appropriate if you want something that works with a single file to track state & no external process

tristan95725 days ago

Have you tried using something like Turso? Curious how that experience would be for you.

(I do not work at Turso)

__s25 days ago

Looking over it, I don't see why you'd use a managed sqlite service. Unless you started with sqlite & now you're stuck with its quirks

Don't know, I lean pretty minimalist on things, like I'm using `hyper` directly on another website I work on, rather than picking axum or warp

madsbuch25 days ago

I am curious on this. Why do you think SQLite is simpler than Postgres? and in what setting is that the case?

Sqlite comes with the headache of managing an attached volume.

J_Shelby_J25 days ago

Building an mvp or a single container app.

I’m actually trying to get Postgres to work like this right now.

SQLite being embedded is simpler to use and more performant in this environment as there is no IPC overhead.

glimshe25 days ago

Friends don't let friends use MongoDB. In my previous job, I got a big promotion in part for getting rid of MongoDB and virtually eliminating our database problems.

pydry25 days ago

Most of the time I've tried to convince people not to move away from Mongo, I failed.

My only success was pushing back on switching from postgres to azure cosmos after Microsoft salespeople convinced some of our managers that we were making a mistake by not using their hot new toy.

This industry...

antfarm25 days ago

I remember well how in the early 2000s everybody wanted to get away from their relational systems and NoSQL (Not only SQL) databases were the latest fashion. Looks like we have come full cycle.

Dudelander25 days ago

A fad that set back the industry by a decade.

trynumber925 days ago

Not sure on that, it made some of the SQL solutions innovate quite a bit.

forgotmyinfo25 days ago

What did relational databases need from all the document DBs du jour? Broken transactions? Lack of ACID guarantees? Devil-may-care schemas?

rsanek25 days ago

yep, the last one -- Postgres' support for json / jsonb is way better now.

jpalomaki25 days ago

Relational databases have also evolved. With JSON types you can now easily mix relations and documents.

Aloisius25 days ago

Couldn’t you do that with the XML type? I believe that was added to Postgres before MongoDB existed.

EMM_38625 days ago

> With MongoDB, we lost out on many nice features from the relational world ...

> ... this hurt in particular because our data was very much relational.

That is my best attempt at the summary.

The write-up is good, but after you've been around a long (long) time ... you know what this is likely going to involve.

Glad relational SQL is still going strong 50 years later.