JJ: JSON Stream Editor

167 points | 1 day | github.com
j1elo 8 hours ago

I'll take the chance to bring attention to the maintenance issues that 'jq' has been having in recent years [1]; there hasn't been a new release since 2018, which IMO wouldn't necessarily be a bad thing if not for the fact that the main branch has been collecting improvements and bug fixes [2] since then.

A group of motivated users are currently talking about what direction to take; a fork is being considered in order to unlock new development and bug fixes [3]. Maybe someone reading this is able and willing to join their efforts.

[1]: https://github.com/stedolan/jq/issues/2305

[2]: https://github.com/stedolan/jq/pull/1697

[3]: https://github.com/stedolan/jq/issues/2550

mi_lk 55 minutes ago

Looks like it's because @stedolan has gone silent and hasn't delegated the right GitHub repo access to the existing maintainers.

He seems to be working at Jane Street though, so if anyone is able to reach him please help the jq community :)

https://signals-threads.simplecast.com/episodes/memory-manag...

capableweb 8 hours ago

What exactly is missing/broken in jq right now which warrants a fork? I've been using jq daily for years, and I can't remember the last time I hit a bug (must have been many years ago), and I can't recall any features I felt were missing in the years I've been using it.

For me it's kind of done. It could be faster, but when speed matters I tend to program a solution myself instead; otherwise I feel like it's Done Enough.

j1elo 5 hours ago

I wouldn't say I need the program to grow with more features, but at the bare minimum they should have been more diligent with cutting releases after accepting bug fixes, instead of letting those contributions languish on the main development branch, out of reach of users.

I mean, it would be understandable if the maintainers didn't have the time to keep working on it at all, but clearly the review work was done to accept some patches, so why not make point releases to let the fixed code reach users via their distribution's channels?

Calzifer 6 hours ago

What I miss from jq, and what is implemented but unreleased, is platform-independent line delimiters.

jq on Windows produces \r\n-terminated lines, which can be annoying when used with Cygwin / MSYS2 / WSL. The '--binary' option to not convert line delimiters is one of those pending improvements.

https://github.com/stedolan/jq/commit/0dab2b18d73e561f511801...
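
A rough sketch of how I spot and strip them in the meantime (file names are just examples):

    # 'od -c' shows a \r before each \n if the Windows build added them
    jq -n '"test"' | od -c

    # workaround until a release with --binary is available
    jq -c . input.json | tr -d '\r' > output.json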

goranmoomin 7 hours ago

> What exactly is missing/broken in jq right now which warrants a fork

AFAIK there are quite a few bug fixes and features that have accumulated on the unreleased main branch, or have been opened as PRs but never merged.

IIRC I hit one of the bugs while trying to check whether an input document is valid JSON.

I should try checking out what's happening with the fork. I've never opened a PR or anything, but I've read the source while trying to understand the jq language conceptually, and I'd say it's quite elegant :)

strunz 2 hours ago

The README for jj points out that it is dramatically faster than jq. Presumably some of those improvements would help with this.

jjoonathan 2 hours ago

> It could be faster

A decaffeinated sloth could be faster.

harisamin 2 hours ago

A while ago I wrote jlq, a utility explicitly for querying/filtering JSONL/JSON log files. It's powered by SQLite. A nice advantage is that it can persist results to a SQLite database for later inspection or to pass around. Hope it helps someone :)

https://github.com/hamin/jlq

qhwudbebd 10 hours ago

Interesting! I tend to use gron to bring JSON into (and out of) the line-based bailiwick of sed and awk where I'm most comfortable, rather than a custom query language like jq that I'd use much more rarely. But I guess that's at the opposite extreme of (in)efficiency from both this and the original jq.

There might be a nice 'edit just this path in-place in gron-style' recipe to be had out of jj/jq + gron together...
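
Something like this, perhaps (untested sketch; the file and key names are placeholders):

    # flatten to assignments, edit one path with sed, then rebuild the JSON
    gron config.json \
      | sed 's/^json\.server\.port = .*/json.server.port = 8080;/' \
      | gron --ungron > config.new.json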

robertlagrant 8 hours ago

Just looked up gron - thanks. This looks useful.

qhwudbebd 10 hours ago

Are there any gron-like tools for xml? I'm aware it's a harder problem (and an increasingly rare problem) but perhaps someone has tackled it nonetheless?

bandie91 9 hours ago

xml2 [1] turns XML into line-based output, and 2xml reverses it.

[1] https://github.com/clone/xml2
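
From memory, the output looks roughly like this (attributes get an @-prefixed path component):

    $ echo '<book><title lang="en">SICP</title></book>' | xml2
    /book/title/@lang=en
    /book/title=SICP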

cristoperb 14 hours ago

I like jq, but jj is so fast it is my go-to for pretty-printing large JSON blobs. Its parsing engine is available as a standalone Go module, and I've used it in a few projects where I needed faster parsing than encoding/json:

https://github.com/tidwall/gjson

pcthrowaway 14 hours ago

I don't think I've ever been limited by jq's speed, but good to know there are alternatives if it ever becomes a bottleneck.

Other than that I can't think of a reason to use this over jq; the query language is perhaps a bit more forgiving in some ways, but not as expressive as jq (and I've spent ~8 years getting pretty familiar with jq's quirks)

lathiat 13 hours ago

The limiting speed factor of jq for me is, by far, figuring out how to write the expression I need to parse a fairly small amount of data. I do a bunch of support analysis and often write a one-liner to put into a shell script to extract some bit of JSON to re-use later in the script. Often this is going to be used only once, by me or a customer, to run some task.

Followed closely by figuring out the path to the area of data I'm interested in. "gron" has been a real time saver there - it converts the JSON into single lines of key/value - so you can use grep to find the full path for any string.
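
For example (document and field names made up):

    $ gron payload.json | grep -F '"Smith"'
    json.users[3].name.last = "Smith";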

Switching to a GUI to browse the JSON that would let you copy the path to the current value would probably also help there, but I'm usually in the terminal doing a bunch of different tasks looking through all manner of command outputs, logs, etc :)

Relatedly, my primary use of ChatGPT has been asking it to write jq queries for me; it's not too bad at getting close. Its biggest blind spot seems to be key names with a dash, which you have to write as .["key-name"].
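
i.e. the quoted-key forms (toy input):

    $ echo '{"key-name": 1}' | jq '.["key-name"]'
    1
    $ echo '{"key-name": 1}' | jq '."key-name"'
    1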

pdimitar 8 hours ago

> Switching to a GUI to browse the JSON that would let you copy the path to the current value would probably also help there

Try https://jless.io/ then.

Simran-B 9 hours ago

I agree that figuring out non-trivial jq expressions takes a lot of time, often accompanied by a consultation of the somewhat lacking docs and some additional googling.

Nonetheless, jq is pretty slow at processing data. For example, converting a 1 GB JSON array of objects to JSON Lines takes ages, if it works at all. Using the streaming features helps, but they are hard to comprehend. That gets memory consumption under control and doesn't take super long, but still way too long for such a trivial task IMO.
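
For reference, the two approaches I mean (flags from memory, so double-check them):

    # straightforward, but loads the whole 1 GB array into memory first:
    jq -c '.[]' big.json > big.jsonl

    # streaming idiom: keeps memory low, but is much harder to comprehend:
    jq -cn --stream 'fromstream(1 | truncate_stream(inputs))' big.json > big.jsonl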

bobnamob 7 hours ago

I'm far more likely to parse JSON into a Clojure REPL session and go from there these days. Learning jq for the odd JSON manipulation I need to do seems like overkill.

Dobbs 8 hours ago

> Switching to a GUI to browse the JSON that would let you copy the path to the current value would probably also help there

I use an app called OK JSON on the Mac for this. It's okay.

AeroNotix 7 hours ago

emacs has a command to get the current path at point.

pdimitar 5 hours ago

Which one is it exactly, please? I'd like to use it.

maleldil 8 hours ago

Am I correct in understanding that this can only manipulate (get or set) values at a JSON path? That is, is it not a replacement for jq?

For example, I frequently use jq for queries like this:

    jq '.data | map(select(.age <= 25))' input.json

Or this:

    jq '.data | map(.country) | sort[]' input.json | uniq -c

Is it possible to do something similar with this tool?
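
From a quick skim of the gjson docs (the path engine jj builds on), the first one might map onto a query path, assuming jj passes it through (untested guess):

    jj -i input.json 'data.#(age<=25)#'

The second one (map, sort, count) looks like it would still need jq or the shell.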

This is not a slight at jj. Even if it's more limited than jq, it's still of great value if it means it's faster or more ergonomic for a subset of cases. I'm just trying to understand how it fits in my toolbox.

Willuminaughty 1 hour ago

Hey there,

Just wanted to drop a quick note to say how much I'm loving jj. This tool is seriously a game-changer for dealing with JSON from the command line. It's super easy to use and the syntax is a no-brainer.

The fact that jj is a single binary with no dependencies is just the cherry on top. It's so handy to be able to take it with me wherever I go and plug it into whatever I'm working on.

And props to you for the docs - they're really well put together and made it a breeze to get up and running.

Keep up the awesome work! Can't wait to see where you take jj next.

Cheers

wvh 5 hours ago

I've been using the gjson (get) and sjson (set) libraries this is based on for many years in Go code to avoid deserialising JSON responses. Those libraries act on a byte array and can extract only the value(s) you want without creating structs and other objects all over the place, giving you a speed bump and fewer allocations if all you need is a simple value. It's been working well.

This program could be an alternative to jq for simple uses.

BiteCode_dev 1 day ago

For those wondering, the README states it's a lot faster than JQ, which may be the selling point.

nigeltao 13 hours ago

jj is faster than jq.

However, jsonptr is even faster and also runs in a self-imposed SECCOMP_MODE_STRICT sandbox (very secure; also implies no dynamically allocated memory).

  $ time cat citylots.json | jq -cM .features[10000].properties.LOT_NUM
  "091"
  real  0m4.844s
  
  $ time cat citylots.json | jj -r features.10000.properties.LOT_NUM
  "091"
  real  0m0.210s

  $ time cat citylots.json | jsonptr -q=/features/10000/properties/LOT_NUM
  "091"
  real  0m0.040s
jsonptr's query format is RFC 6901 (JSON Pointer). More details are at https://nigeltao.github.io/blog/2020/jsonptr.html
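
One wrinkle worth noting about the query format (toy example):

  # per RFC 6901, "~" in a key is escaped as "~0" and "/" as "~1",
  # so a key literally named "a/b" is addressed like this:
  echo '{"a/b": 42}' | jsonptr -q=/a~1b
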
zokier 6 hours ago

Looks neat. One suggestion: add better build instructions to the wuffs README / getting-started guide. I jumped in and tried to build it using the "build-all.sh" script that seemed convenient, but gave up (for now) after the nth build failure due to yet another missing dependency. It's extra painful because build-all.sh is slow, so maybe also consider some proper build automation tool (seeing this is a Google project, maybe Bazel?).

rektide 13 hours ago

Presumably the memory footprint is often far less too.

Rygian 7 hours ago

This behaviour looks confusing to me:

$ echo '{"name":{"first":"Tom","middle":"null","last":"Smith"}}' | jj name.middle

null

$ echo '{"name":{"first":"Tom","last":"Smith"}}' | jj name.middle

null

It can be avoided with the '-r' option, which should be the default but is not.

planede 6 hours ago

I don't get this behavior for your second command; it just seems to return an empty string.

edit:

There are three cases to cover:

1. The value at the path exists and is not null.

2. The value at the path exists and is null.

3. The value at the path doesn't exist.

jj seems to potentially confuse 1 and 2 without the -r flag: "middle": "null" and "middle": null, more specifically. It probably confuses "middle": "" and a missing value as well; that's 1 and 3.

notorandit 6 hours ago

Interesting. How often do you manipulate a 1+MB JSON file? Maybe I am wrong, but going from 0.01s to 0.001s doesn't motivate me to switch to jj.

untech 4 hours ago

Datasets are often stored in (sometimes gzipped) jsonlines format in my field (NLP). The file size could reach 100s of GBs.
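
For a sense of scale, a typical single pass over such a file looks something like this (field name made up), which is where per-record tool speed starts to matter:

    zcat corpus.jsonl.gz | jq -c 'select(.lang == "en")' > en.jsonl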

notorandit 55 minutes ago

100s of GBs?

In those cases, querying un-indexed files seems quite a thinko. Even if you can fit it all in RAM.

If you only scan that monstrous file sequentially, then you don't need either jq or jj or any other "powerful" tool. Just read/write it sequentially.

If you need to make complex scans and queries, I suspect a database is better suited.

asadm 13 hours ago

I wish this existed when I was trying to look at a 20G Firebase database JSON dump.

vmfunction 10 hours ago

That is what gets me: why did the file get to 20G? At that point, just ship a SQLite file.

capableweb 8 hours ago

Does it matter why? Sometimes files get big, and you don't control the generation, or trying to change the generation is a bigger task than just dealing with a "big" (I'd argue 20GB isn't that big anyway) file with standard tools.

notorandit 54 minutes ago

Nope, it matters a lot! Unstructured, unindexed files usually get that big as the result of some design flaw.

Self-Perfection 12 hours ago

I would like to see a comparison with jshon. Jshon is way faster than jq and has been available in distro repositories for many years.

Alifatisk 10 hours ago

Cool, didn’t know about jshon, how’s the query language?

Self-Perfection 9 hours ago

Almost non-existent. A couple of excerpts from the man page:

  {"a":1,"b":[true,false,null,"str"],"c":{"d":4,"e":5}}
  jshon [actions] < sample.json
  jshon -e c -> {"d":4,"e":5}
  jshon -e c -e d -u -p -e e -u -> 4 5
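
For comparison, the rough jq equivalents would be:

  jq -c .c < sample.json -> {"d":4,"e":5}
  jq '.c.d, .c.e' < sample.json -> 4 5
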
Yet this covers like ~50% of possible use cases for jq.

listenallyall 9 hours ago

Is this the SAX of JSON?