
Show HN: Hydra (YC W22) – Serverless Analytics on Postgres

34 points | 6 hours | hydra.so

Hi HN, Hydra cofounders (Joe and JD) here (https://www.hydra.so/)! We enable realtime analytics on Postgres without requiring an external analytics database.

Traditionally, this was infeasible: Postgres is a rowstore database, and it can be 1000X slower at analytical processing than a columnstore database.

(A quick refresher for anyone interested: a rowstore stores each table row's values together, one row after another. That makes inserting or updating a record efficient, but filtering and aggregating data inefficient. At most businesses, analytical reporting scans large volumes of events, traces, and time-series data. As the volume grows, the rowstore's inefficiency compounds, i.e., it doesn't scale for analytics. In contrast, a columnstore stores all the values of each column in sequence, so a query reads only the columns it needs.)
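
To make the row-vs-column difference concrete, here is a toy sketch in Python. It is not Hydra's storage format; the table, its columns, and the "values read" counter are made up for illustration. Both layouts hold the same three rows, and the functions count how many stored values each layout touches to sum a single column.

```python
# Toy illustration (not Hydra's storage format): the same rows laid out
# row-wise and column-wise, and the work needed to aggregate one column.

rows = [
    # (event_id, user_id, event_type, amount)
    (1, 101, "click", 0.0),
    (2, 102, "purchase", 25.0),
    (3, 101, "purchase", 40.0),
]

# Columnstore: each column's values are stored contiguously.
columns = {
    "event_id": [r[0] for r in rows],
    "user_id": [r[1] for r in rows],
    "event_type": [r[2] for r in rows],
    "amount": [r[3] for r in rows],
}

def total_amount_rowstore(rows):
    # Every full row is read even though only `amount` is needed.
    values_read = 0
    total = 0.0
    for row in rows:
        values_read += len(row)
        total += row[3]
    return total, values_read

def total_amount_columnstore(columns):
    # Only the `amount` column is read.
    amounts = columns["amount"]
    return sum(amounts), len(amounts)

print(total_amount_rowstore(rows))        # (65.0, 12)
print(total_amount_columnstore(columns))  # (65.0, 3)
```

The gap (12 values vs 3 here) grows with the number of columns in the table, which is why wide event tables are a good fit for a columnstore.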

For decades, businesses had to work around the rowstore's and columnstore's different strengths by maintaining two separate systems. This led to large gaps in functionality, in SQL syntax, and in the background knowledge engineers needed. For example, here are the gaps between Redshift (a popular columnstore) and Postgres (rowstore) features: (https://docs.aws.amazon.com/redshift/latest/dg/c_unsupported...).

We think there’s a better, simpler way: unify the rowstore and columnstore, keep the data in one place, and stop paying for and managing an external analytics database. With Hydra, events, traces, time-series data, user sessions, clickstream, IoT telemetry, etc. are accessible as a columnstore right alongside your standard rowstore tables.

Our solution: Hydra separates compute from storage to bring an analytics columnstore, with serverless processing and automatic caching, to your Postgres database.

The term "serverless" can be a bit confusing, because a server always exists. Here it means compute is ephemeral: it is spun up and down automatically. The database automatically provisions and isolates dedicated compute resources for each query process. Serverless differs from managed compute, where the user explicitly allocates and scales CPU and memory on an ongoing basis and may overpay during idle time.
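The "overpay during idle time" point is easiest to see with a little arithmetic. The sketch below uses hypothetical prices and a hypothetical workload, not Hydra's pricing, and assumes serverless compute is billed at the same hourly rate but only while queries run. Real serverless pricing often carries a per-second premium, so the break-even point depends on how busy the instance actually is.

```python
# Hypothetical rate, chosen only to make the arithmetic concrete.
HOURLY_RATE = 2.00  # $ per hour of compute (assumed, same hardware both ways)

def provisioned_cost(hours_in_month):
    # Managed compute: billed for every hour, busy or idle.
    return HOURLY_RATE * hours_in_month

def serverless_cost(query_seconds):
    # Serverless (as assumed here): billed only for seconds a query is running.
    return HOURLY_RATE * sum(query_seconds) / 3600

# 1,000 analytics queries of 36 s each in a 720-hour month.
print(provisioned_cost(720))         # 1440.0
print(serverless_cost([36] * 1000))  # 20.0
```

In this made-up workload, compute is busy for only 10 of 720 hours, so paying per query is far cheaper; a database that runs heavy queries around the clock would see much less difference.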

How is serverless useful? It's important that every analytics query gets its own resources per process. The major hurdles with running analytics on Postgres are 1) rowstore performance and 2) resource contention. #2 is very often overlooked, but in practice analytics queries tend to hog resources (RAM and CPU) from Postgres transactional work. So a single moderately expensive analytics query can slow down the entire database. That's why serverless is important: it guarantees that expensive queries are isolated and run on dedicated resources per process.
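A minimal sketch of the isolation idea, assuming nothing about how Hydra actually provisions compute: each expensive "query" (here a stand-in Python program, since we can't run a SQL plan without a database) runs in its own short-lived OS process. The `run_isolated` helper and `ANALYTICS_QUERY` are hypothetical names for illustration only.

```python
import subprocess
import sys

# Hypothetical stand-in for an expensive analytics query: a CPU-heavy
# aggregation written as a tiny Python program.
ANALYTICS_QUERY = "print(sum(i * i for i in range(1_000_000)))"

def run_isolated(program: str, timeout_s: float = 30.0) -> str:
    """Run `program` in its own short-lived interpreter process.

    The child has its own memory and is scheduled separately by the OS.
    When it exits (or is killed by the timeout), its resources are
    released, and the parent process, standing in for transactional
    work, never shares a heap with it.
    """
    result = subprocess.run(
        [sys.executable, "-c", program],
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=True,
    )
    return result.stdout.strip()

if __name__ == "__main__":
    print(run_isolated(ANALYTICS_QUERY))
```

A process per query only gives memory isolation on the same host; to stop analytics from competing for CPU with transactional work, the work also has to run on separate machines or under CPU limits, which is what provisioning dedicated compute per query is meant to achieve.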

Why is Hydra so fast at analytics? (https://tinyurl.com/hydraDBMS) 1) columnstore by default; 2) metadata for efficient file skipping and retrieval; 3) parallel, vectorized execution; 4) automatic caching.
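Point 2 (metadata for file skipping) is commonly implemented with per-chunk min/max statistics, sometimes called zone maps. The sketch below shows that general technique; whether Hydra stores exactly these statistics is an assumption, and the `Chunk` class and sample data are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    # Per-chunk metadata kept alongside the column data.
    min_ts: int
    max_ts: int
    values: list  # (ts, amount) pairs stored in this chunk

def sum_amount_between(chunks, lo, hi):
    """Sum `amount` for rows with lo <= ts <= hi, skipping any chunk whose
    [min_ts, max_ts] range cannot overlap the filter."""
    total = 0
    scanned = 0
    for chunk in chunks:
        if chunk.max_ts < lo or chunk.min_ts > hi:
            continue  # metadata proves no row matches: skip without reading
        scanned += 1
        total += sum(amount for ts, amount in chunk.values if lo <= ts <= hi)
    return total, scanned

chunks = [
    Chunk(0, 99, [(10, 5), (50, 7)]),
    Chunk(100, 199, [(120, 3), (180, 4)]),
    Chunk(200, 299, [(250, 9)]),
]
print(sum_amount_between(chunks, 110, 190))  # (7, 1)
```

Only one of the three chunks is read. Skipping works best when data is roughly sorted by the filtered column, which is typical for time-ordered events.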

What’s the killer feature? Hydra can quickly join columnstore tables with standard row tables within Postgres using direct SQL.
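Conceptually, such a join pairs a wide, column-oriented event table with a small, row-oriented lookup table. The Python sketch below mimics a hash join between the two layouts; the tables, columns, and data are hypothetical, and inside Postgres you would write the SQL shown in the comment rather than code like this.

```python
# Hypothetical data. Events are stored column-wise; user profiles row-wise.
# Roughly equivalent SQL:
#   SELECT u.plan, count(*) FROM events e JOIN users u ON u.id = e.user_id
#   WHERE e.event_type = 'signup' GROUP BY u.plan;
events = {
    "user_id": [1, 2, 1, 3],
    "event_type": ["signup", "signup", "purchase", "click"],
}
users = [
    {"id": 1, "plan": "pro"},
    {"id": 2, "plan": "free"},
    {"id": 3, "plan": "free"},
]

def count_events_by_plan(events, users, event_type):
    # Build a hash table on the small row table, keyed by the join column.
    plan_by_id = {u["id"]: u["plan"] for u in users}
    counts = {}
    # Probe using only the two event columns the query needs.
    for uid, etype in zip(events["user_id"], events["event_type"]):
        if etype == event_type and uid in plan_by_id:
            plan = plan_by_id[uid]
            counts[plan] = counts.get(plan, 0) + 1
    return counts

print(count_events_by_plan(events, users, "signup"))  # {'pro': 1, 'free': 1}
```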

Example: “Segment events as a table.” Instead of dumping Segment event data into an S3 bucket or an external analytics database, use Hydra to store events (clicks, signups, purchases) and join them with user profile data within Postgres. Know your users in realtime: answers to questions like “what events predict churn?” or “which user will likely convert?” are immediately actionable.
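As a rough illustration of the "what events predict churn?" question, here is a sketch that computes, for each event type, the share of users who performed it and later churned. The event names, users, and churn flags are hypothetical. This is a simple correlation, not a predictive model; a real analysis would control for other factors.

```python
from collections import defaultdict

# Hypothetical data: which users performed which events, and who churned.
user_events = [
    (1, "viewed_pricing"), (1, "contacted_support"),
    (2, "viewed_pricing"),
    (3, "contacted_support"), (3, "exported_data"),
    (4, "exported_data"),
]
churned = {1: True, 2: False, 3: True, 4: False}

def churn_rate_by_event(user_events, churned):
    """For each event type, the fraction of users who did it and churned."""
    users_by_event = defaultdict(set)
    for user_id, event in user_events:
        users_by_event[event].add(user_id)  # count each user once per event
    return {
        event: sum(churned[u] for u in users) / len(users)
        for event, users in users_by_event.items()
    }

print(churn_rate_by_event(user_events, churned))
# {'viewed_pricing': 0.5, 'contacted_support': 1.0, 'exported_data': 0.5}
```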

Thanks for reading! We would love to hear your feedback, and if you'd like to try Hydra now, we offer a $300 credit and a 14-day free trial per account. We're excited to see how bringing the columnstore and rowstore side by side can help your project.

cultofmetatron 4 hours ago

my team is currently looking into offloading some of our analytics data into a columnar database next year. hydra and clickhouse were the top ones on the list. would love a breakdown of how the two compare.

coatue 4 hours ago

[Joe, Hydra cofounder] Hey, that's really great; I love hearing that. Hydra is a columnar database with an integrated Postgres rowstore. Analytics aren't always best served by a columnstore alone: we've heard from users that their analytics workloads also benefit from fast lookups on row tables, not just scans over large tables. Our goal for Hydra is to enable realtime analytics on Postgres without requiring an external analytics database. This makes it possible to join rowstore and columnstore data in Postgres with direct SQL. Other analytics databases typically rely on ETL pipelines to move data out of Postgres, which, depending on your scale, can become expensive and introduce delay.

cultofmetatron 3 hours ago

from what you wrote above, it seems like a great value add for greenfield projects.

we currently use aws aurora. how easy would it be to simply sql dump and load into hydra and how well would it serve as a drop in replacement?

coatue 3 hours ago

Close to a drop-in replacement, since Aurora bills itself as Postgres-compatible. Any data you load into Hydra will automatically be converted into the columnstore! We're happy to help out, so feel free to DM me directly.
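For the "SQL dump and load" route, a one-off migration with the standard Postgres client tools might look like the sketch below. The connection URLs and dump file name are placeholders. It assumes Hydra's columnstore default is implemented through Postgres's default table access method, as the "heap" wording later in this thread suggests. Under that assumption, `--no-table-access-method` (available in `pg_restore` from PostgreSQL 15) lets restored tables pick up the target's default instead of the `heap` recorded in the dump. Aurora-specific extensions or roles may still need manual cleanup.

```python
import subprocess

def dump_and_restore_commands(source_url, target_url, dump_file="aurora.dump"):
    """Build pg_dump / pg_restore argument lists for a one-off migration.

    --format=custom writes an archive that pg_restore can read.
    --no-owner / --no-privileges skip ownership and GRANTs, which often
    reference roles (e.g. rds_superuser) that don't exist on the target.
    --no-table-access-method (PostgreSQL 15+) drops the dump's per-table
    access method so tables use the target database's default.
    """
    dump = [
        "pg_dump", "--format=custom", "--no-owner", "--no-privileges",
        f"--file={dump_file}", f"--dbname={source_url}",
    ]
    restore = [
        "pg_restore", "--no-owner", "--no-privileges",
        "--no-table-access-method",
        f"--dbname={target_url}", dump_file,
    ]
    return dump, restore

def migrate(source_url, target_url):
    # Runs the dump, then the restore; raises CalledProcessError on failure.
    for cmd in dump_and_restore_commands(source_url, target_url):
        subprocess.run(cmd, check=True)
```

Use a `pg_dump` binary at least as new as the Aurora server version, and test the restore on a copy before pointing production traffic at it.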

pikdum 3 hours ago

I feel like my ideal would be something more hybrid. It's pretty rare that I have a table that I decide upfront should be columnar. It's a lot more common that I want occasional analytics-like queries on my regular tables to not take forever.

coatue 3 hours ago

[Joe, Hydra cofounder] That's good feedback. It's easy to change the default table type to rowstore "heap" (https://docs.hydra.so/guides/analytics#switching-the-default...).

We initially set the rowstore as the default, but people wouldn't create columnstore tables and were confused about why performance wasn't improving. So we figured this was cleaner, but you always have the option to switch the default table type back.
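In stock Postgres, the default table type is controlled by the `default_table_access_method` setting, and a single table can override it with `CREATE TABLE ... USING <method>`. The helper below only renders that SQL so you can run it with `psql` or any driver. It assumes Hydra's default switch maps onto this setting and that its columnstore access method is named `columnar`; both are assumptions, and the function names are hypothetical.

```python
def quote_ident(name: str) -> str:
    # Postgres identifier quoting: wrap in double quotes, double embedded quotes.
    return '"' + name.replace('"', '""') + '"'

def quote_literal(value: str) -> str:
    # Postgres string literal: wrap in single quotes, double embedded quotes.
    return "'" + value.replace("'", "''") + "'"

def set_default_table_type(database: str, access_method: str) -> str:
    # Takes effect for new sessions on this database; existing tables keep
    # their type. Use plain `SET default_table_access_method = ...` for the
    # current session only.
    return (f"ALTER DATABASE {quote_ident(database)} "
            f"SET default_table_access_method = {quote_literal(access_method)};")

def create_table(name: str, columns_sql: str, access_method: str) -> str:
    # Per-table override, regardless of the database default.
    return (f"CREATE TABLE {quote_ident(name)} ({columns_sql}) "
            f"USING {quote_ident(access_method)};")

print(set_default_table_type("app", "heap"))
print(create_table("events", "user_id bigint, event_type text", "columnar"))
```

This supports the hybrid setup described above: keep `heap` as the default for ordinary tables and opt individual event tables into the columnstore.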

switchbak 2 hours ago

Ory Hydra is a relatively high-profile project with a name collision, FYI.

VWWHFSfQ 2 hours ago

there are a million open source products called hydra. I don't think any of them can really claim it exclusively

fourseventy 3 hours ago

The homepage of this website does a bad job of explaining wtf Hydra actually does. Is it a database? Some type of serverless architecture? Ok, analytics, but analytics about what, postgres performance? Does 'analytics' mean that it's for OLAP queries?

coatue 3 hours ago

[Joe, Hydra cofounder] Hydra is a fast analytics db on Postgres. It's a database with both a rowstore and a columnstore. Analytics can mean reporting, metrics, or customer-facing dashboards. Sounds like we should spend some time making analytics templates.

switchbak 2 hours ago

I've run through the docs and it's really unclear how the compute model works. "Serverless" is nice, but how exactly is that managed?

mritchie712 3 hours ago

is this using pg_duck?

coatue 3 hours ago

[Joe, Hydra cofounder] Hey there, yes - we codeveloped pg_duckdb and it's what Hydra is built on top of!