LinkedDataHub: The Knowledge Graph Notebook

ta238911 • 3 years ago

In my ears, knowledge graph sounds a bit grandiloquent. I do not have a definition, but I know that when talking about knowledge as it is embodied in people, it's quite a subtle thing, hard to formalize and to be honest, something relatively rare.

Why can we just call these things fact databases?

Add. Knowledge evokes a lot of other associations as well, for example that what we are able to know changes over time. That a time has a certain underlying grid, into which certain factual stories appear and later disappear.

hobofan • 3 years ago

> Why can we just call these things fact databases?

Because (in theory) they are much much more than that.

In practice the semantic web/data space has a problem of building complicated standard on top of complicated standard (as well as having a Java implementation monoculture, which doesn't help that). That also makes it hard to formalize all the non-trivial statements that are part of our knowledge.

And yes, there are subtle aspects to knowledge, that is usually not capturable easily in manually formalized knowledge graphs, but that's where pairing knowledge graphs with ML-based methods (e.g. vector search) can really shine.

pphysch • 3 years ago

> Why can we just call these things fact databases?

Companies that want to reinvent/repackage and sell boring RDBMS tech

nl • 3 years ago

A Knowledge Graph is the data in the database, not the tech.

You can absolutely implement this in a RDBMS. There are some advantages to a proper graph database though.

But SPARQL is a dead-end - I don't think anyone is really using that in practice outside a dew public demonstration apps. To a large extent this is true of RDF too: triples are useful, RDF gets in the way.

namedgraph • 3 years ago

That's just bullshit. Stop spreading FUD.

We participated in a huge RfP for a pharma company which planned RDF KG infrastructure for the next couple of years with 500 billion triple capabilities.

Biomedical, finance, defence, automotive -- all of those industries are using RDF/SPARQL. Just because your problems are not big or complex enough doesn't mean this tech is not used. It takes a certain organization size for Knowledge Graphs to make sense and pay off, that's why most industry users are Fortune 500-level companies.

ta988 • 3 years ago

Wikidata is a good example of something that works. But I agree there isn't much else.

afandian • 3 years ago

drpyser22 • 3 years ago

Its not rdbms though. It's RDF, triple store, graph-like data models.

pphysch • 3 years ago

Those are easy to implement on top of RDBMS. Query performance is a different thing, which can only be evaluated on a case-by-case basis, but you can go a long way with good indexes.

A few companies need real time analytics on really big graphs. Most don't and shouldn't waste their time with fancy Google-scale databases.

namedgraph • 3 years ago

Google started calling it Knowledge Graph 10 years ago: https://en.wikipedia.org/wiki/Google_Knowledge_Graph Then everyone else followed.

jitl • 3 years ago

What is the use-case for this software? From the README:

> We are building LinkedDataHub primarily for:

> researchers who need an RDF-native notebook that can consume and collect Linked Data and SPARQL documents and follows the FAIR principles

I would be interested in reading a user story of a few paragraphs about how this works. I don't know anyone working with RDF or SPARQL documents, but I'm curious about these technologies. Graphs are cool, and SPARQL has a certain appeal. Who is using these things already day-to-day?

> developers who are looking for a declarative full stack framework for Knowledge Graph application development, with out-of-the-box UI and API

I work on an application (https://notion.so) that would be better with more Knowledge Graph, but I don't need a framework. I'm curious what application developers approach the knowledge graph space looking for a "full stack framework". I presume most commercial developers would prefer to use their existing application tooling. Maybe academic researchers writing software for their lab?

>What makes LinkedDataHub unique is its completely data-driven architecture: applications and documents are defined as data, managed using a single generic HTTP API and presented using declarative technologies. The default application structure and user interface are provided, but they can be completely overridden and customized. Unless a custom server-side processing is required, no imperative code such as Java or JavaScript needs to be involved at all.

This kind of flexibility is intrinsically appealing to programmers, but the resulting user experience leaves a lot to be desired. Usually it's better to build a good product first, and then to extract the framework bits once they've proved productive. Otherwise you may end up with a framework that can do anything, but in a way nobody wants.

capevace • 3 years ago

So I don’t personally have many use cases for RDF-type data, but I plan on implementing RDF data endpoints in a music library app I’m building.

I suppose RDF thrives in the academic space, whereas userspace suffers from a chicken and egg type problem. There aren’t many common services available that have public RDF endpoints, so few applications using them get built.

Edit: I suppose that’s what LinkedDatHub provides then, a way for researchers to build API-transformers into their graph, so they can then use it with SPARQL.

namedgraph • 3 years ago

It doesn't have many ETL features, but it does support CSV import.

What kind of data are you looking to transform?

capevace • 3 years ago

Basically I’m combining your local music library with data from streaming services so you have unified playlists across them.

So basically I’m having to match a bunch of resources like songs together from different APIs.

namedgraph • 3 years ago

LinkedDataHub was extracted from the common code from a number of Linked Data projects that we have done in different domains.

It can be used as a framework but it's a standalone application as well, because it provides the default built-in ontologies as well UI for Linked Data and SPARQL consumption.

Disclaimer: I'm the main developer.

jitl • 3 years ago

What were the projects you did? I think concrete examples would help me understand the software more.

namedgraph • 3 years ago

https://atomgraph.com/cases/

jitl • 3 years ago

Thanks!

tokinonagare • 3 years ago

The list of dependencies is amazingly long for a product which seems to be a harder to use TiddlyWiki, or Neo4j UI for the graph viz part. It's crazy the SemWeb community still haven't give up given how much effort have been poured into it for so few results.

mark_l_watson • 3 years ago

I access SPARQL endpoints from inside programs written (usually) in Common Lisp, Python, and Clojure.

LinkedDataHub looks cool enough for non-tech users, but I prefer working inside a repl/Slime/etc. interactive programming environment.

Also, Google, Facebook, most banks, etc., etc., use Knowledge Graphs - pretty solid technology.

namedgraph • 3 years ago

You do realize this is an open-source project? And you are comparing with a product by Neo4J who got $300M VC investment?

The enterprise Knowledge Graphs (yes, it's the same SemWeb tech stack in principle) in Fortune 500 sized companies have in-house platforms that present the graph to the end-users, with entity browsers, analytics, dashboards, structured content etc. LinkedDataHub is an attempt to bootstrap an open-source, standards-driven version of that.

PaulHoule • 3 years ago

This package was designed to solve more problems than it creates

https://github.com/paulhoule/gastrodon

Overall I think of graph visualization as a problem, in particularly there are some people who just don't see that hairballs are incomprehensible

https://cambridge-intelligence.com/how-to-fix-hairballs/

teruakohatu • 3 years ago

Large graphs (just about anything larger than a karate club social network [1]) can't usually be visualized in a useful manner. There are exceptions, but in real world applications they are more useful as pretty art than helping with understanding.

Statistical summary plots are more useful.

Maybe one day someone will figure something out, but much like scatter plots fall over when you plot vast amounts of raw data, so do plotting graphs.

[1] https://en.m.wikipedia.org/wiki/Zachary%27s_karate_club

PaulHoule • 3 years ago

My answer to it is that graphs need to be manually curated. For example, a UML diagram for all the database tables on the system I am working on now would have to be printed out on a wall to make any sense, but if I picked out the tables involved in a new user registration that would be useful.

I went to an exhibit of this guy's works

https://en.wikipedia.org/wiki/Mark_Lombardi

and saw a series of drafts he'd made where he had drawn many different versions of a conspiracy social network and gradually went from a hairball to something that looked meaningful.

In terms of turning this into a tool there's the interesting problem that there is a graph that comes in from the outside world (and could be regenerated) and also data that represents the curation of the graph (Do I show this? What color is this line? What position does this node get displayed at?) You've got to be able to edit one independently of the other and deal with things sometimes getting out of sync to have a tool that advances over the state of the art.

kkfx • 3 years ago

Mh... I'm an org-roam (org-mode/Emacs) user, witch have a similar feature and... I find such visualization honestly sugar-eye and useless.

Network analysis of notes links is fascinating, but must be actionable in some way, just having a UI means nothing. Also most noting tools miserably fails to really offer "easy atomic notes that can be combined (transcluded) and splitted as the user wish", some try structured ways (SPARQL/fixed formats alike) others try to offer some loose feature set to make anything possible but a real solution is still decades of development away IMO.

So far the best, witch means least worse, way I found to really analyze my notes is using org-mode drawers with relevant templates help for consistency to be queried via org-ql, witch means essentially key-value structured tagging of notes so I can see them in a timeline, I can see all notes about a URL, an author, a subject, a topic, ... unfortunately is a manual tedious process and at runtime is not that fast nor flexible.

Long story shorts vast approaches like Wikidata, classic libraries cataloguing techniques & tools, modern/old notes and relevant tools all work to a certain extent and fails thereafter.

namedgraph • 3 years ago

Graph layout is just one of multiple layout modes. See here for more screenshots: https://atomgraph.github.io/LinkedDataHub/

thyrsus • 3 years ago

My first intuition of a knowledge graph would be an IDE. If that's not right, how am I wrong? If it is a typical use, what IDE(like) examples are there? Org-mode is a tree instead of a general graph, but general graphs can be traversed as (sets of) trees. Is the tree discipline somehow important to understanding code?

squarecog • 3 years ago

LinkedDataHub, a "RDF-native notebook", is not to be confused with LinkedIn DataHub, which is a metadata store/crawler/ui for your data systems: https://datahubproject.io/.

altilunium • 3 years ago

I wish the installation process can be easier.

For now, i use either obsidian or graf[1] to manage my own knowledge graph.

[1] https://github.com/altilunium/graf

ta988 • 3 years ago

Graphs are great for querying drawing a query can really help explaining what you really want. But for results visualization as soon as you reach the hundredish nodes it becomes unbearable. There are tricks used by crime analyzis software for example where results are grouped in different nodes that can make it easier, but that's only good for when you don't have too many node types.

namedgraph • 3 years ago

At least in LinkedDataHub, the graph layout is only one of layout modes, together with lists, tables, charts, maps etc.

Check the GH page for more screenshots: https://atomgraph.github.io/LinkedDataHub/

Devasta • 3 years ago

Its honestly fantastic to see web pages that are using XSLT, is this the most advanced app out there using it these days?

jitl • 3 years ago

What's good about XSLT? Is its ecosystem substantially better than alternative options like simple string templating a la https://pkg.go.dev/html/template?

stonogo • 3 years ago

XSLT is wildly more than a templating engine. It can (and has) been used to e.g. specify a protocol and generate software based on it. See XCB for an example. With a sufficiently large corpus you can run queries on XML and generate arbitrary media.

As with most overbearingly flexible technology, it's an incredible pain in the ass to use efficiently, and XSLT processors tend to be plagued with complexity and concomitant performance problems.

namedgraph • 3 years ago

No complaints about the Saxon processors here (we're using Saxon-HE server-side and Saxon-JS client-side). The XSLT standards are excellent, as is the quality of Saxon implementations.

nl • 3 years ago

> What's good about XSLT?

Nothing. It was a bad choice in its heyday (I worked on some projects way back then).

> Is its ecosystem substantially better than alternative options

No.

namedgraph • 3 years ago

Absolutely. XSLT is a data transformation technology, not a template language.

jitl • 3 years ago

Can you describe a task that XSLT makes substantially easier to build/more correct/faster to execute? Saying “yep its good” gives an opinion but after looking at XSLT docs I am not “getting it”. Why do I want this? I transform data all the time with a bash script, is XSLT like bash?

namedgraph • 3 years ago

I use it in ETL whenever I have an XML source, then I use XSLT to lift it to RDF (either RDF/XML or TriX). I use it for the UI where I'm transforming RDF/XML to HTML. I'm also using for the interactive parts instead of Javascript (or React or Svelte etc.), but that's the interactive XSLT extension that goes beyond the standard.

XML to XML, XML to RDF, JSON to XML and XML to JSON, XML to text -- XSLT can be used for all kinds of transformations. 3.0 also supports streaming transforms, which is very useful for large input files.

XSLT is a declarative DSL made specifically for the XML data model. It does limited things such as navigating the XML tree but does them really well. It lifts the abstraction level so you can focus on the transformation. You can transform XML with bash or a general purpose language like Java, but it will never be so concise or effective.

namedgraph • 3 years ago

It certainly has one of the largest interactive XSLT codebases that use Saxon-JS: https://www.saxonica.com/saxon-js/

namedgraph • 3 years ago

Just FYI, there are more screenshots on this GH page: https://atomgraph.github.io/LinkedDataHub/

schemathings • 3 years ago

Bummer, the demo app at https://kg.opendatahub.bz.it/ seems to be broken. The concept sounds like something I could use.

namedgraph • 3 years ago

The endpoint served by our partners went down.

Can you ping me at martynas@atomgraph.com? Would be appreciated.

xcombelle • 3 years ago

no https://query.wikidata.org ?