LinkedDataHub: The Knowledge Graph Notebook

103 points · 17 hours ago · github.com
jitl · 11 hours ago

What is the use-case for this software? From the README:

> We are building LinkedDataHub primarily for:

> researchers who need an RDF-native notebook that can consume and collect Linked Data and SPARQL documents and follows the FAIR principles

I would be interested in reading a user story of a few paragraphs about how this works. I don't know anyone working with RDF or SPARQL documents, but I'm curious about these technologies. Graphs are cool, and SPARQL has a certain appeal. Who is using these things already day-to-day?

> developers who are looking for a declarative full stack framework for Knowledge Graph application development, with out-of-the-box UI and API

I work on an application (https://notion.so) that would be better with more Knowledge Graph, but I don't need a framework. I'm curious which application developers approach the knowledge graph space looking for a "full stack framework". I presume most commercial developers would prefer to use their existing application tooling. Maybe academic researchers writing software for their lab?

> What makes LinkedDataHub unique is its completely data-driven architecture: applications and documents are defined as data, managed using a single generic HTTP API and presented using declarative technologies. The default application structure and user interface are provided, but they can be completely overridden and customized. Unless a custom server-side processing is required, no imperative code such as Java or JavaScript needs to be involved at all.

This kind of flexibility is intrinsically appealing to programmers, but the resulting user experience leaves a lot to be desired. Usually it's better to build a good product first, and then to extract the framework bits once they've proved productive. Otherwise you may end up with a framework that can do anything, but in a way nobody wants.

ta238911 · 15 hours ago

To my ears, "knowledge graph" sounds a bit grandiloquent. I don't have a definition, but I know that knowledge, as it is embodied in people, is quite a subtle thing: hard to formalize and, to be honest, relatively rare.

Why can't we just call these things fact databases?

Addendum: knowledge evokes a lot of other associations as well, for example that what we are able to know changes over time; that time has a certain underlying grid into which factual stories appear and later disappear.

hobofan · 14 hours ago

> Why can't we just call these things fact databases?

Because (in theory) they are much much more than that.

In practice, the semantic web/data space has a problem of building complicated standards on top of complicated standards (as well as a Java implementation monoculture, which doesn't help). That also makes it hard to formalize all the non-trivial statements that are part of our knowledge.

And yes, there are subtle aspects to knowledge that usually aren't easily captured in manually formalized knowledge graphs, but that's where pairing knowledge graphs with ML-based methods (e.g. vector search) can really shine.
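A minimal sketch of that pairing, assuming an embedding model is available; the `embed` function below is a random-vector stand-in for a real model, and the toy graph is invented for the example:

    # Sketch: resolve free text to a graph entity via vector search,
    # then answer from the graph's curated facts.
    import numpy as np

    # Toy knowledge graph: subject -> (predicate, object) facts.
    graph = {
        "Marie Curie": [("field", "physics"), ("born_in", "Warsaw")],
        "Niels Bohr": [("field", "physics"), ("born_in", "Copenhagen")],
    }

    def embed(text: str) -> np.ndarray:
        """Stand-in embedding; in practice use a real text-embedding model."""
        rng = np.random.default_rng(abs(hash(text)) % 2**32)
        return rng.standard_normal(64)

    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    entity_vecs = {name: embed(name) for name in graph}  # index once

    def link_and_lookup(query: str):
        """Nearest entity by cosine similarity, plus its graph facts."""
        q = embed(query)
        best = max(entity_vecs, key=lambda n: cosine(q, entity_vecs[n]))
        return best, graph[best]

    # With a real model this would resolve to "Marie Curie".
    print(link_and_lookup("curie, marie"))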

pphysch · 14 hours ago

> Why can't we just call these things fact databases?

Companies that want to reinvent/repackage and sell boring RDBMS tech

drpyser22 · 12 hours ago

It's not RDBMS though. It's RDF: triple stores and graph-like data models.

pphysch · 11 hours ago

Those are easy to implement on top of an RDBMS. Query performance is a different matter, which can only be evaluated case by case, but you can go a long way with good indexes.

A few companies need real-time analytics on really big graphs. Most don't, and shouldn't waste their time with fancy Google-scale databases.
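For illustration, a minimal triple store on a stock RDBMS (SQLite here); the schema and index names are made up for the sketch:

    # A triples table plus covering indexes for the common access patterns.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
    conn.execute("CREATE INDEX spo ON triples (s, p, o)")
    conn.execute("CREATE INDEX pos ON triples (p, o, s)")
    conn.execute("CREATE INDEX osp ON triples (o, s, p)")

    conn.executemany(
        "INSERT INTO triples VALUES (?, ?, ?)",
        [
            ("alice", "knows", "bob"),
            ("bob", "knows", "carol"),
            ("carol", "worksFor", "acme"),
        ],
    )

    # "Who does alice know?" -- a single indexed lookup.
    print(conn.execute(
        "SELECT o FROM triples WHERE s = ? AND p = ?", ("alice", "knows")
    ).fetchall())

    # A two-hop traversal is just a self-join; deeper paths mean more joins,
    # which is where dedicated graph engines start to earn their keep.
    print(conn.execute(
        """SELECT t2.o FROM triples t1
           JOIN triples t2 ON t1.o = t2.s
           WHERE t1.s = ? AND t1.p = ? AND t2.p = ?""",
        ("alice", "knows", "knows"),
    ).fetchall())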

nl · 10 hours ago

A Knowledge Graph is the data in the database, not the tech.

You can absolutely implement this in an RDBMS. There are some advantages to a proper graph database though.

But SPARQL is a dead end - I don't think anyone is really using it in practice outside a few public demonstration apps. To a large extent this is true of RDF too: triples are useful, RDF gets in the way.

ta988 · 8 hours ago

Wikidata is a good example of something that works. But I agree there isn't much else.

afandian · 2 hours ago

Except the software it's powered by, Blazegraph, is deprecated (afaict the devs were poached by AWS to work on Neptune).

https://phabricator.wikimedia.org/T206560

tokinonagare · 16 hours ago

The list of dependencies is amazingly long for a product which seems to be a harder-to-use TiddlyWiki, or a Neo4j UI for the graph viz part. It's crazy that the SemWeb community still hasn't given up, given how much effort has been poured into it for so few results.

mark_l_watson · 15 hours ago

I access SPARQL endpoints from inside programs written (usually) in Common Lisp, Python, and Clojure.
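For example, the in-program version of that looks roughly like this in Python, against the public Wikidata endpoint (any SPARQL endpoint that can return JSON works the same way):

    # Query a public SPARQL endpoint with plain requests.
    import requests

    ENDPOINT = "https://query.wikidata.org/sparql"
    # Ten humans (wd:Q5) and their birthplaces.
    QUERY = """
    SELECT ?person ?personLabel ?placeLabel WHERE {
      ?person wdt:P31 wd:Q5 ;
              wdt:P19 ?place .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
    LIMIT 10
    """

    resp = requests.get(
        ENDPOINT,
        params={"query": QUERY, "format": "json"},
        headers={"User-Agent": "sparql-example/0.1"},  # Wikidata wants a UA
        timeout=30,
    )
    resp.raise_for_status()
    for row in resp.json()["results"]["bindings"]:
        print(row["personLabel"]["value"], "-", row["placeLabel"]["value"])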

LinkedDataHub looks cool enough for non-tech users, but I prefer working inside a repl/Slime/etc. interactive programming environment.

Also, Google, Facebook, most banks, etc., etc., use Knowledge Graphs - pretty solid technology.

PaulHoule · 15 hours ago

This package was designed to solve more problems than it creates

https://github.com/paulhoule/gastrodon

Overall I think of graph visualization as a problem; in particular, there are some people who just don't see that hairballs are incomprehensible:

https://cambridge-intelligence.com/how-to-fix-hairballs/

teruakohatu · 14 hours ago

Large graphs (just about anything larger than a karate club social network [1]) can't usually be visualized in a useful manner. There are exceptions, but in real-world applications the visualizations are more useful as pretty art than as aids to understanding.

Statistical summary plots are more useful.

Maybe one day someone will figure something out, but much as scatter plots fall over when you plot vast amounts of raw data, so do graph plots.

[1] https://en.m.wikipedia.org/wiki/Zachary%27s_karate_club
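To make the "summary plots" point concrete, here's the kind of summary that stays readable where a drawing wouldn't, using the karate club graph from [1] via networkx:

    # Summarize a graph numerically instead of drawing the hairball.
    import collections
    import networkx as nx

    G = nx.karate_club_graph()
    degrees = [d for _, d in G.degree()]
    hist = collections.Counter(degrees)

    print(f"{G.number_of_nodes()} nodes, {G.number_of_edges()} edges")
    print(f"mean degree: {sum(degrees) / len(degrees):.2f}")
    for deg in sorted(hist):                      # text-mode degree histogram
        print(f"degree {deg:2d}: {'#' * hist[deg]}")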

PaulHoule · 14 hours ago

My answer to it is that graphs need to be manually curated. For example, a UML diagram of all the database tables in the system I am working on now would have to be printed out on a wall to make any sense, but if I picked out just the tables involved in new user registration, that would be useful.
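That "pick out the relevant slice" move is cheap to do programmatically; a sketch with networkx, with table names invented for the example:

    # Curate by extraction: show only the neighborhood of the node you care about.
    import networkx as nx

    schema = nx.Graph()
    schema.add_edges_from([
        ("users", "sessions"), ("users", "profiles"), ("users", "audit_log"),
        ("orders", "users"), ("orders", "invoices"), ("invoices", "payments"),
    ])

    # Everything within one hop of `users` -- the registration-relevant slice.
    registration_view = nx.ego_graph(schema, "users", radius=1)
    print(sorted(registration_view.nodes()))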

I went to an exhibit of this guy's works

https://en.wikipedia.org/wiki/Mark_Lombardi

and saw a series of drafts he'd made where he had drawn many different versions of a conspiracy social network and gradually went from a hairball to something that looked meaningful.

In terms of turning this into a tool there's the interesting problem that there is a graph that comes in from the outside world (and could be regenerated) and also data that represents the curation of the graph (Do I show this? What color is this line? What position does this node get displayed at?). You've got to be able to edit one independently of the other, and deal with things sometimes getting out of sync, to have a tool that advances the state of the art.
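One way to picture that source-vs-curation split, as a rough sketch (the class and field names are invented, not from any existing tool):

    # The regenerable graph and the hand-edited curation layer kept separate,
    # with an explicit reconcile step for when they drift out of sync.
    from dataclasses import dataclass, field

    @dataclass
    class NodeStyle:                 # curation: purely presentational decisions
        visible: bool = True
        color: str = "gray"
        pos: tuple = (0.0, 0.0)

    @dataclass
    class CuratedGraph:
        edges: set = field(default_factory=set)    # from the outside world
        styles: dict = field(default_factory=dict) # node id -> NodeStyle

        def reload(self, new_edges):
            """Swap in a regenerated source graph; report curation now orphaned."""
            self.edges = set(new_edges)
            live = {n for e in self.edges for n in e}
            orphaned = set(self.styles) - live
            for n in orphaned:
                del self.styles[n]   # or keep them around and flag for review
            return orphaned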

thyrsus · 9 hours ago

My first intuition of a knowledge graph would be an IDE. If that's not right, how am I wrong? If it is a typical use, what IDE(like) examples are there? Org-mode is a tree instead of a general graph, but general graphs can be traversed as (sets of) trees. Is the tree discipline somehow important to understanding code?
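On the side point that general graphs can be traversed as trees: a BFS tree rooted at any node gives one such tree view, e.g. with networkx:

    # A cyclic graph viewed through one of its spanning trees.
    import networkx as nx

    G = nx.cycle_graph(5)               # contains a cycle, so not itself a tree
    tree = nx.bfs_tree(G, source=0)     # BFS spanning tree rooted at node 0
    print(sorted(tree.edges()))         # the cycle-closing edge is gone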

squarecog · 12 hours ago

LinkedDataHub, an "RDF-native notebook", is not to be confused with LinkedIn DataHub, which is a metadata store/crawler/UI for your data systems: https://datahubproject.io/.

ta988 · 8 hours ago

Graphs are great for querying: drawing a query can really help explain what you really want. But for results visualization, as soon as you reach a hundred-ish nodes it becomes unbearable. There are tricks used by crime analysis software, for example, where results are grouped into aggregate nodes, which can make it easier, but that's only good when you don't have too many node types.
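That grouping trick is essentially a quotient graph: collapse all nodes of a type into one aggregate node. A small networkx sketch, with node names and types invented:

    # Collapse nodes by type so the result view has one node per type.
    import networkx as nx

    G = nx.Graph()
    G.add_edges_from([("p1", "acct1"), ("p2", "acct1"), ("p2", "acct2")])
    node_type = {"p1": "person", "p2": "person",
                 "acct1": "account", "acct2": "account"}

    # Partition the nodes by type; each block becomes one quotient node.
    blocks = [{n for n in G if node_type[n] == t}
              for t in sorted(set(node_type.values()))]
    Q = nx.quotient_graph(G, blocks)
    print([sorted(block) for block in Q.nodes()])   # two aggregate nodes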

schemathings · 7 hours ago

Bummer, the demo app at https://kg.opendatahub.bz.it/ seems to be broken. The concept sounds like something I could use.

altilunium · 8 hours ago

I wish the installation process were easier.

For now, I use either Obsidian or graf [1] to manage my own knowledge graph.

[1] https://github.com/altilunium/graf

Devasta · 13 hours ago

It's honestly fantastic to see web pages that are using XSLT; is this the most advanced app out there using it these days?

jitl · 11 hours ago

What's good about XSLT? Is its ecosystem substantially better than alternative options like simple string templating a la https://pkg.go.dev/html/template?

stonogo · 3 hours ago

XSLT is wildly more than a templating engine. It can be (and has been) used to e.g. specify a protocol and generate software based on it. See XCB for an example. With a sufficiently large corpus you can run queries on XML and generate arbitrary media.

As with most overbearingly flexible technology, it's an incredible pain in the ass to use efficiently, and XSLT processors tend to be plagued with complexity and concomitant performance problems.
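For a taste of the spec-to-code use (in the spirit of XCB, though its real pipeline is far bigger), run from Python via lxml; the toy protocol and stylesheet are invented:

    # XSLT as a code generator: an XML "protocol" spec in, function stubs out.
    from lxml import etree

    spec = etree.fromstring(
        "<protocol>"
        "<request name='CreateWindow'/>"
        "<request name='MapWindow'/>"
        "</protocol>"
    )
    stylesheet = etree.fromstring("""\
    <xsl:stylesheet version="1.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="text"/>
      <xsl:template match="/protocol">
        <xsl:for-each select="request">
          <xsl:text>def </xsl:text>
          <xsl:value-of select="@name"/>
          <xsl:text>(): ...&#10;</xsl:text>
        </xsl:for-each>
      </xsl:template>
    </xsl:stylesheet>""")

    generate = etree.XSLT(stylesheet)
    print(str(generate(spec)))   # prints two function stubs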

nl · 10 hours ago

> What's good about XSLT?

Nothing. It was a bad choice in its heyday (I worked on some projects way back then).

> Is its ecosystem substantially better than alternative options

No.

kkfx · 14 hours ago

Mh... I'm an org-roam (org-mode/Emacs) user, which has a similar feature, and... I find such visualization honestly eye candy and useless.

Network analysis of note links is fascinating, but it must be actionable in some way; just having a UI means nothing. Also, most note-taking tools miserably fail to really offer "easy atomic notes that can be combined (transcluded) and split as the user wishes". Some try structured ways (SPARQL/fixed formats and the like), others offer a loose feature set to make anything possible, but a real solution is still decades of development away IMO.

So far the best (which means least worst) way I've found to really analyze my notes is using org-mode drawers, with templates to help with consistency, queried via org-ql. That essentially means key-value structured tagging of notes, so I can see them in a timeline, or see all notes about a URL, an author, a subject, a topic, ... Unfortunately it's a manual, tedious process, and at runtime it's not that fast nor flexible.

Long story short: vast approaches like Wikidata, classic library cataloguing techniques & tools, and modern/old note tools all work to a certain extent and fail thereafter.