Dataflow, a self-hosted Observable notebook editor

205 points · 3 years ago · observablehq.com
simonw · 3 years ago

This project looks fantastic.

I adore Observable notebooks, but the one thing that makes me hesitate in using them for everything is that the editor component itself is closed-source and only available on https://observablehq.com/

They're great open source ecosystem supporters - they released their runtime, their parser, their standard library and all sorts of other stuff through https://github.com/observablehq - but the editor itself is their proprietary sauce.

I totally support their decision on this - it's what they're building their business around, and I want them to be successful. But as a user it does give me pause.

This project from Alex Garcia looks like a fix for exactly that. Having more than one editor for their notebook format (and an open source option at that) resolves my hesitancy about leaning hard into their ecosystem.

I don't even see it as a competitor to ObservableHQ - the hosted Observable editor has collaboration features that don't even make sense for a locally running version.

Plus, Dataflow has some great ideas of its own - in particular the live file attachments thing.

edtechdev · 3 years ago

Yeah the lack of open source prevented me from committing to observable, too, so I look forward to trying dataflow out.

Just in case this is of interest to others, some other open source browser-based computational notebook tools include:

* Starboard https://starboard.gg/
* And of course there's always Jupyter, but it requires a server component

And this isn't the same thing, more of a javascript playground (open source alternative to codepen and the like), but see also Slingcode: https://slingcode.net/

kragen · 3 years ago

Thank you for the awesome recommendations! Note that Dataflow isn't open source yet, though.

kragen · 3 years ago

I agree that the proprietary editor has been a showstopper for what is otherwise a very appealing advance in programming environments; many of us are old enough to have learned the hard way not to base our careers on proprietary infrastructure.

But it's not clear that this new project is a fix for that: there's no license file or licensing notice on https://github.com/asg017/dataflow and no mention of licensing in https://observablehq.com/@asg017/introducing-dataflow. So, unfortunately, under Berne Convention copyright laws (which is most of them nowadays) the software is by default restricted by copyright, and looking at it may put you at legal risk, because access plus substantial similarity is deemed to prove copying.

Now, possibly that isn't their intention; https://github.com/asg017/dataflow/blob/main/package.json does say "license": "ISC", so maybe it's just an oversight. But I'd like to see a much clearer and more unambiguous statement of their intent to irrevocably commit Dataflow to an open source license before touching it.
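The situation described here, a `"license"` field in package.json but no LICENSE file or licensing notice, is easy to check for in a local checkout. A minimal sketch, assuming a top-level scan is enough; the `license_signals` name and the set of file names it looks for are hypothetical choices, and this is a quick heuristic, not a legal test.

```python
import json
from pathlib import Path


def license_signals(repo: Path) -> dict:
    """Report the licensing signals found at the top level of a local checkout.

    Hypothetical helper: it only checks for LICENSE/LICENCE/COPYING files and the
    "license" field of package.json. It is a quick heuristic, not legal advice.
    """
    declared = None
    pkg = repo / "package.json"
    if pkg.is_file():
        declared = json.loads(pkg.read_text(encoding="utf-8")).get("license")
    license_files = sorted(
        p.name
        for p in repo.iterdir()
        if p.is_file() and p.name.upper().startswith(("LICENSE", "LICENCE", "COPYING"))
    )
    return {
        "package_json_license": declared,
        "license_files": license_files,
        # A manifest field with no accompanying license text: the ambiguous case.
        "ambiguous": declared is not None and not license_files,
    }
```

Adding an explicit LICENSE file, as happened later in this thread, flips `ambiguous` to `False`.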

alexgarcia-xyz · 3 years ago

hey author here, you're right, completely missed adding a permissive license. Just added an explicit MIT license with the latest version, thanks for bringing it up!

kragen · 3 years ago

That's wonderful! A significant step toward fully automated luxury gay space communism! Thank you!

kragen · 3 years ago

It's disappointing to see an expression of gratitude for the expeditious resolution of a licensing hiccup downvoted to -4. I'm left to speculate on why so many people responded by downvoting it as if it were spam.

Possibility 1: they are opposed to full automation?

Possibility 2: they don't think software licensing is a potential obstacle to full automation or sufficiently equitable access to the resulting abundance?

Possibility 3: they just have no idea what fully automated luxury gay space communism is, and lack the curiosity to look it up, and so they react without thinking?

Possibility 4: they hate luxury or gay people, so they object on principle to the vision of the future embodied in the phrase?

Possibility 5: they don't think human-factors improvements in software development environments rise to the level of importance implied by my thanks? (But if so, why are they reading this thread at all?)

I'm curious what could possibly motivate this kind of astoundingly hostile reaction.

thirtyseven · 3 years ago

I know "dataflow" is kind of a generic name, but the authors might want to consider that there is already a 7-year-old Google Cloud product for running data pipelines called Dataflow.

rectang · 3 years ago

And an entire programming discipline.

https://en.wikipedia.org/wiki/Dataflow_programming

taftster · 3 years ago

Came here to post the same comment. Exactly right. There are lots of projects that use the term "dataflow".

To add to this, the name of this product is confusing given the context and use case shown. I assume "dataflow" to the author means the ability to watch data being rendered on a page?

To "big data" folks (like myself), the term "dataflow" tends to represent the routing and processing of data streams along an information pipeline. Not anything to do with a visual representation of a dynamic notebook.
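For contrast, here is the pipeline sense of "dataflow" described above: records stream through a chain of stages, each consuming the previous stage's output. A minimal Python sketch using generators, so each record flows through all stages one at a time instead of being materialized between steps; the stage names and sample records are hypothetical, and real pipeline systems add distribution, batching, and fault tolerance on top.

```python
from typing import Iterable, Iterator


def parse(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    # Source stage: turn raw "name,value" lines into records.
    for line in lines:
        name, value = line.strip().split(",")
        yield name, int(value)


def keep_positive(records: Iterable[tuple[str, int]]) -> Iterator[tuple[str, int]]:
    # Filter stage: drop records that fail a predicate.
    for name, value in records:
        if value > 0:
            yield name, value


def totals(records: Iterable[tuple[str, int]]) -> dict[str, int]:
    # Sink stage: aggregate the stream into a final result.
    out: dict[str, int] = {}
    for name, value in records:
        out[name] = out.get(name, 0) + value
    return out


raw = ["a,3", "b,-1", "a,4", "c,2"]
print(totals(keep_positive(parse(raw))))  # {'a': 7, 'c': 2}
```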

dataflow · 3 years ago

> I know "dataflow" is kind of a generic name

Well that's a bummer. And here I thought I was being very unique :-)

axiosgunnar · 3 years ago

I know this sounds very Reddit-like, but

> created: October 15, 2012

nice :D

kragen · 3 years ago

They also named their whole platform "Observable", as in, extends java.util.Observable, and the equivalent in any other popular OO language. To disambiguate I normally call it "ObservableHQ", which is their domain name; I don't know what to do to disambiguate "Dataflow". "Asg017dataflow"? "Garcia Dataflow"? "Observable Dataflow"? "Issue 9 Dataflow"?

Still, much more significant is the fact that it seems to be such terrific software that this is a discussion worth having, because it's going to be very influential.

keeganj · 3 years ago

I'm not a data scientist, but I've been interested in the idea of a "code notebook" ever since Jupyter hit it big. I write mostly in JS/TS for application logic, so this looks like it could be really useful.

Related, does anyone have any recommendations of a (Postgres) SQL "notebook"? I don't really need any visualizations, more just a markdown integrated doc that allows me to lay out the different queries I use to answer a question.

javierluraschi · 3 years ago

For viz/DS/ML/AI with JS/TS, the options are either observablehq or an IDE with custom extensions; this project looks relevant if you are already into observablehq.

Shameless plug, we are building a few tools for JS to narrow down this gap as well:
- https://hal9.ai (Drag&Drop / IDE)
- https://marketplace.visualstudio.com/items?itemName=Hal9.hal... (VSCode extension)
- https://observablehq.com/@javierluraschi/running-nodejs-in-o... (ObservableHQ extension)

Would love to chat if you are interested in providing feedback; I'm at javier at hal9.ai. Cheers.

natrys · 3 years ago

Emacs and Org-mode have great integration with multiple SQL implementations, including Postgres (via org-babel). Org-mode tables are pretty neat, and you can have query results populated directly into tables. Read this blog post if you are interested:

https://fluca1978.github.io/2021/01/18/PostgreSQLLiteratePro...

Hasnep · 3 years ago

Rmarkdown notebooks can contain SQL chunks, so you'd only need to use R to configure the connection. [1]

[1] https://bookdown.org/yihui/rmarkdown/language-engines.html#s...

keeganj · 3 years ago

I didn't know you could write SQL directly in Rmarkdown like this, very interesting. Thanks!

pbowyer · 3 years ago

Same. When I've read the docs, I've always got the impression that only R was supported.

PuddleOfSausage · 3 years ago

There are loads more supported via knitr. Scroll to the top of that linked page in this thread for the list.

amcaskill · 3 years ago

I am working on a SQL-in-markdown reporting tool called evidence.

It feels like a markdown doc that runs SQL.

https://evidence.dev/
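The "markdown doc that runs SQL" idea fits in a few lines: pull out the fenced `sql` blocks and run them in order against one connection. A minimal Python sketch of the concept only, not how evidence is implemented; `sqlite3` stands in for Postgres so it runs anywhere, `run_sql_blocks` is a hypothetical name, and each block is assumed to hold exactly one statement, since sqlite3's `execute` accepts only one per call.

```python
import re
import sqlite3

FENCE = "`" * 3  # built programmatically so this example nests cleanly in Markdown
SQL_BLOCK = re.compile(FENCE + r"sql\n(.*?)" + FENCE, re.DOTALL)


def run_sql_blocks(markdown: str, conn: sqlite3.Connection) -> list[list[tuple]]:
    """Run every fenced sql block in a Markdown string, in document order."""
    results = []
    for query in SQL_BLOCK.findall(markdown):
        # Statements share one connection, so later blocks see earlier tables.
        results.append(conn.execute(query).fetchall())
    return results


doc = "\n".join([
    "# Signups by plan",
    FENCE + "sql",
    "CREATE TABLE signups (plan TEXT, n INTEGER)",
    FENCE,
    FENCE + "sql",
    "INSERT INTO signups VALUES ('free', 10), ('pro', 3)",
    FENCE,
    "Which plan is most popular?",
    FENCE + "sql",
    "SELECT plan, n FROM signups ORDER BY n DESC",
    FENCE,
])

conn = sqlite3.connect(":memory:")
for rows in run_sql_blocks(doc, conn):
    print(rows)
```

The prose between blocks is ignored here; a real tool would also render it and the query results back into the page.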

keeganj · 3 years ago

This is almost exactly what I was imagining. Just subscribed to updates, very interested to see what this becomes!

simonw · 3 years ago

Weirdly my Django SQL Dashboard project may fit the bill a bit here: you can build up a "dashboard" (which is a tiny bit notebook-like if you squint at it the right way) with multiple SQL queries on it, and save that either as a bookmark or as a "saved dashboard" with a URL.

https://django-sql-dashboard.datasette.io/

In my own work I've been using it for the kind of things that I would normally use a Jupyter notebook for - gathering together research on problems I'm trying to solve.

keeganj · 3 years ago

Interesting take, I'm not deep in the python ecosystem, but this looks like it's lightweight enough to function as a refreshable notebook. Will give this a try, thanks!

qbasic_forever · 3 years ago

I like the ipython-sql magic in Jupyter: https://github.com/catherinedevlin/ipython-sql Depending on what you're doing you might be able to get away entirely with just using it and some basic queries, i.e. no python glue code in the notebook at all. But worst case you might need a cell to open up the DB connection and make the magic aware of it, then you can execute clean and simple SQL queries in cells using the magic.

thejosh · 3 years ago

Yeah ipython-sql is great and works well, and can use an environment variable for the connection string.

okennedy · 3 years ago

It's based on Spark rather than PostgreSQL directly, but I'm part of an effort to build a workflow system disguised as a notebook called Vizier [1]. SQL is a first-class primitive in Vizier, and the notebook plays nice with Postgres (you can load from and unload to Postgres using Spark's native data loader).

[1] https://vizierdb.info

RocketSyntax · 3 years ago

Lots of jupyter magic `%` commands for that already https://www.datacamp.com/community/tutorials/sql-interface-w...

sixdimensional · 3 years ago

Apache Zeppelin is one open source option - https://zeppelin.apache.org.

gradys · 3 years ago

Maybe just a Python notebook with a Postgres client library and some helper functions to keep the amount of Python in the main body to a minimum?
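The helper-function approach can be as small as one function. A minimal sketch, assuming any DB-API 2.0 driver: `sqlite3` is used here so the example runs without a server; with Postgres you would pass a `psycopg2` connection instead and write `%s` placeholders rather than `?`. The `query` helper name is hypothetical.

```python
import sqlite3
from typing import Any, Sequence


def query(conn, sql: str, params: Sequence[Any] = ()) -> list[dict[str, Any]]:
    """Run one row-returning query and return rows as dicts keyed by column name.

    Works with any DB-API 2.0 connection: cursor.description lists the result
    columns, with each column's name as the first item of its entry.
    """
    cur = conn.cursor()
    try:
        cur.execute(sql, params)
        cols = [d[0] for d in cur.description]
        return [dict(zip(cols, row)) for row in cur.fetchall()]
    finally:
        cur.close()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.5), (2, 20.0)])
print(query(conn, "SELECT id, total FROM orders WHERE total > ?", (10,)))
# [{'id': 2, 'total': 20.0}]
```

Each notebook cell can then be a single `query(...)` call with the SQL front and center.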

shapiromatron · 3 years ago

re: sql notebook, this came up a few months ago and worked great when I played around with it: https://blog.jupyter.org/an-sql-solution-for-jupyter-ef4a00a.... It's just a different kernel you can install to an existing jupyter instance.

robertlacok · 3 years ago

Deepnote has native Postgres cells :) you can mix them with Python too.

Disclaimer - I work there :)

Siira · 3 years ago

org-babel should fit the bill.

d--b · 3 years ago

I am also working on an alternative: https://www.jigdev.com

It’s the same idea except that cells are spread out on a 2d canvas with tabs similar to excel.

mistidoi · 3 years ago

As a total Observable/Bostock stan who works with HIPAA protected data, I love this.

Galanwe · 3 years ago

Does anyone know of a good reusable jupyter front-end?

I have a farm of jupyter kernels that I can run on demand, and would like to integrate a UI for these kernels on my React website.

I've had a look at the default Jupyter UI, but it uses Lumino components, which are basically not compatible with React.

Also had a look at nteract components, but their projects seem dead.

Anyone working on something similar? Clean react components to act as UI for the jupyter protocol.
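Whatever UI library you pick, each cell run boils down to sending the kernel an `execute_request` message as defined in the Jupyter messaging protocol, then rendering the replies whose `parent_header` matches that request's header. A minimal Python sketch of building the message; the protocol version string is an assumption (check what your kernels report), `execute_request` as a function name is hypothetical, and the transport (ZeroMQ directly, or a Jupyter Server websocket) is left out. The same dict translates directly into a TypeScript object for a React client.

```python
import json
import uuid
from datetime import datetime, timezone

PROTOCOL_VERSION = "5.3"  # assumption: the messaging-spec version your kernels speak


def execute_request(code: str, session: str, username: str = "web") -> dict:
    """Build a Jupyter 'execute_request' message, sent on the shell channel."""
    return {
        "header": {
            "msg_id": uuid.uuid4().hex,  # replies echo this header as parent_header
            "session": session,
            "username": username,
            "date": datetime.now(timezone.utc).isoformat(),
            "msg_type": "execute_request",
            "version": PROTOCOL_VERSION,
        },
        "parent_header": {},
        "metadata": {},
        "content": {
            "code": code,
            "silent": False,
            "store_history": True,
            "user_expressions": {},
            "allow_stdin": False,
            "stop_on_error": True,
        },
        "buffers": [],
    }


msg = execute_request("1 + 1", session=uuid.uuid4().hex)
print(json.dumps(msg["content"]))
```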

nautilus12 · 3 years ago

I see all these notebook products and I honestly don't know how any of them plan to compete with AWS... nobody wants self-hosted anymore; everyone just wants to pay AWS or Databricks for it.

Can other people chime in? Maybe I'm just working at the wrong place.

simonw · 3 years ago

https://observablehq.com/ is a cloud hosted platform already.

This thing - Dataflow - is an open source run-on-your-own-machine alternative to the official Observable hosted solution, taking advantage of the fact that Observable itself is JavaScript code with some special sauce that's available as open source runtime/parser libraries.

qbasic_forever · 3 years ago

It's running on localhost here, and I presume that's the intended use case for this feature. Localhost is critical for development: imagine if VS Code didn't work unless you were connected to github.com. This fixes that issue with Observable notebooks, so now you can run and develop your notebook locally without depending directly on the internet or their cloud service.

nautilus12 · 3 years ago

I'm increasingly seeing companies embrace "don't develop locally". Personally, the idea of having to sign into the AWS console to develop makes me cringe, but I'm seeing more people just be OK with it.

simonw · 3 years ago

Having been responsible for the shared local development environment system at a 100+ engineer company I can tell you exactly why: the amount of time and money wasted fixing individual developer environments is astronomical.

If someone's environment isn't working, having a button they can click to get a brand new working one in the cloud is an enormous time-saver.

nautilus12 · 3 years ago

I guess it's comparable to having a corporate uniform or getting to wear what you want

FormFollowsFunc · 3 years ago

I've been looking for something like this for data vis exploration. Compared to Observable accessing local data files is more convenient. Currently I use a Jupyter notebook along with Pandas and Matplotlib. I'm not a huge fan of Matplotlib so I would prefer to use Plot or Vega Lite API and Pandas could be replaced with Danfo.js or Arquero.

RocketSyntax · 3 years ago

Help me understand what the page being rendered is doing. Is that like an interactive app you are serving for user input?

qbasic_forever · 3 years ago

It's an observable notebook: https://observablehq.com/ Basically a notebook where you write JS code and see the results immediately rendered in the notebook. In this case it's being served locally instead of requiring you to use their service website. If you've ever used Jupyter or IPython this is very similar (code notebooks) but with some interesting changes in philosophy and more of a Javascript implementation instead of python.

What might be tripping you up is that in this demo the observable notebook isn't showing the code cells, only the outputs. The code is in the editor on the left and the output on the right is the result of running the code as an observable notebook. In some ways it is like a simple interactive web app.
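Much of the "change in philosophy" is reactivity: an Observable cell declares which other cells it reads, and editing a cell re-runs only the cells downstream of it, in dependency order, regardless of where they sit on the page. A toy Python sketch of that idea using the standard library's `graphlib` (Python 3.9+); the `Notebook` class and its `define` method are hypothetical and are not the API of Observable's open source runtime, which is JavaScript.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable


class Notebook:
    """Toy reactive notebook: redefining a cell re-runs everything downstream of it."""

    def __init__(self) -> None:
        self.cells: dict[str, tuple[list[str], Callable[..., Any]]] = {}
        self.values: dict[str, Any] = {}

    def define(self, name: str, inputs: list[str], fn: Callable[..., Any]) -> None:
        self.cells[name] = (inputs, fn)
        self._recompute(self._downstream(name))

    def _downstream(self, name: str) -> set[str]:
        # The redefined cell plus every cell that (transitively) reads it.
        dirty, frontier = {name}, [name]
        while frontier:
            current = frontier.pop()
            for other, (inputs, _) in self.cells.items():
                if current in inputs and other not in dirty:
                    dirty.add(other)
                    frontier.append(other)
        return dirty

    def _recompute(self, dirty: set[str]) -> None:
        # Evaluate dirty cells in dependency order; skip cells whose inputs are undefined.
        graph = {n: set(self.cells[n][0]) for n in self.cells}
        for n in TopologicalSorter(graph).static_order():
            if n in dirty and n in self.cells:
                inputs, fn = self.cells[n]
                if all(i in self.values for i in inputs):
                    self.values[n] = fn(*(self.values[i] for i in inputs))


nb = Notebook()
nb.define("x", [], lambda: 2)
nb.define("y", ["x"], lambda x: x * 10)
nb.define("z", ["x", "y"], lambda x, y: x + y)
print(nb.values["z"])  # 22
nb.define("x", [], lambda: 5)  # editing x re-runs y and z
print(nb.values["z"])  # 55
```

Cells can be defined in any order here too: a cell whose inputs don't exist yet simply waits until they are defined. A dependency cycle makes `static_order()` raise `graphlib.CycleError`.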

Isthatablackgsd · 3 years ago

Is that a similar concept to Overleaf for LaTeX?

kragen · 3 years ago

LaTeX doesn't really support interactive data visualizations or reactive re-rendering, and it's a pretty difficult environment to do things like read a CSV data file and do a linear regression in. Observablehq is closer to Jupyter, Excel, R Studio, Octave, or Tk than LaTeX.
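To make the comparison concrete, here is the example task mentioned above, reading a CSV and fitting a line, as the couple of cells it would take in a notebook. A minimal Python sketch using only the standard library (ordinary least squares via the closed-form slope and intercept); `fit_line` is a hypothetical name and the inline CSV data is made up for illustration.

```python
import csv
import io


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data; in a notebook this would come from a file attachment.
CSV_TEXT = "x,y\n1,3\n2,5\n3,7\n4,9\n"
rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
xs = [float(r["x"]) for r in rows]
ys = [float(r["y"]) for r in rows]
print(fit_line(xs, ys))  # (2.0, 1.0)
```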

RocketSyntax · 3 years ago

omg. i had no idea observable was js-focused. i always thought it was another R/python competitor.

chrisweekly · 3 years ago

OK! I can't put off creating an observablehq acct any longer.

... Done. Stoked to dive in this weekend!

lejohnq · 3 years ago

This is pretty awesome. Feels like a streamlit for the javascript world.

whoevercares · 3 years ago

How does this relate to data flow, or is it just a brand name?