I'll take this chance to bring attention to the maintenance issues 'jq' has been having in recent years [1]: there hasn't been a new release since 2018, which IMO wouldn't necessarily be a bad thing if not for the fact that the main branch has been accumulating improvements and bug fixes [2] ever since.
A group of motivated users is currently discussing what direction to take; a fork is being considered in order to unlock new development and bug fixes [3]. Maybe someone reading this is able and willing to join their efforts.
[1]: https://github.com/stedolan/jq/issues/2305
A while ago I wrote jlq, a utility explicitly for querying/filtering jsonl/JSON log files. It's powered by SQLite. A nice advantage is that it can persist results to a SQLite database for later inspection or to pass around. Hope it helps someone :)
Interesting! I tend to use gron to bring JSON into (and out of) the line-based bailiwick of sed and awk where I'm most comfortable, rather than a custom query language like jq that I'd use much more rarely. But I guess that's at the opposite extreme of (in)efficiency than both this and the original jq.
There might be a nice 'edit just this path in-place in gron-style' recipe to be had out of jj/jq + gron together...
Just looked up gron - thanks. This looks useful.
Are there any gron-like tools for xml? I'm aware it's a harder problem (and an increasingly rare problem) but perhaps someone has tackled it nonetheless?
xml2 [1] turns XML into line-based output, and 2xml reverses it.
I like jq, but jj is so fast it is my go-to for pretty printing large json blobs. Its parsing engine is available as a standalone go module, and I've used it in a few projects where I needed faster parsing than encoding/json:
I don't think I've ever been limited by jq's speed, but good to know there are alternatives if it ever becomes a bottleneck.
Other than that, I can't think of a reason to use this over jq; the query language is perhaps a bit more forgiving in some ways, but not as expressive as jq's (and I've spent ~8 years getting pretty familiar with jq's quirks).
The limiting speed factor of jq for me is, by far, figuring out how to write the expression I need to parse a fairly small amount of data. I do a bunch of support analysis, and often I'm writing a one-liner to put into a shell script to extract some bit of JSON for re-use later in the script. Often it will be used only once, by me or a customer, to run some task.
Followed closely by figuring out the path to the area of data I'm interested in. "gron" has been a real time saver there - it converts the json into single lines of key/value - so you can use grep and find the full path for any string.
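To illustrate what gron does, here is a minimal Python sketch of the flattening (real gron also quotes keys that need it and can reverse the transformation; this toy version skips both):

```python
import json

def gronify(value, path="json"):
    """Flatten a parsed JSON value into gron-style assignment lines."""
    lines = []
    if isinstance(value, dict):
        lines.append(f"{path} = {{}};")
        for key, child in value.items():
            lines.extend(gronify(child, f"{path}.{key}"))
    elif isinstance(value, list):
        lines.append(f"{path} = [];")
        for i, child in enumerate(value):
            lines.extend(gronify(child, f"{path}[{i}]"))
    else:
        lines.append(f"{path} = {json.dumps(value)};")
    return lines

doc = json.loads('{"name": {"first": "Tom", "last": "Smith"}, "tags": ["a", "b"]}')
for line in gronify(doc):
    print(line)
```

Each leaf value ends up on one greppable line carrying its full path, which is exactly why piping this through grep recovers the path for any string you can see in the document.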
Switching to a GUI to browse the JSON that would let you copy the path to the current value would probably also help there, but I'm usually in the terminal doing a bunch of different tasks, looking through all manner of command outputs, logs, etc :)
Relatedly, my primary use of ChatGPT has been asking it to write jq queries for me; it's not too bad at getting close. Its biggest blind spot seems to be keys containing a dash, which you have to write as ["key-name"].
> Switching to a GUI to browse the JSON that would let you copy the path to the current value would probably also help there
Try https://jless.io/ then.
I agree that figuring out non-trivial jq expressions takes a lot of time, often accompanied by a consultation of the somewhat lacking docs and some additional googling.
Nonetheless, it is pretty slow at processing data. For example, converting a 1 GB JSON array of objects to JSON Lines takes ages, if it works at all. Using the streaming features helps, but they are hard to comprehend: they get memory consumption under control and don't take super long, but still way too long for such a trivial task IMO.
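To make the complaint concrete, the naive version of that conversion is a few lines that have to materialize the entire array in memory first, which is precisely what hurts at the 1 GB scale; a minimal Python sketch (jq's non-streaming equivalent would be roughly `jq -c '.[]'`):

```python
import json

def array_to_jsonl(text):
    """Convert a JSON array of objects to JSON Lines.
    Naive sketch: json.loads builds the whole array in memory;
    a streaming parser avoids that, at the cost of complexity."""
    rows = json.loads(text)
    return "\n".join(json.dumps(row, separators=(",", ":")) for row in rows)

print(array_to_jsonl('[{"a": 1}, {"b": 2}]'))
```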
I'm far more likely to parse JSON into a Clojure REPL session and go from there these days. Learning jq for the odd JSON manipulation I need to do seems like overkill.
> Switching to a GUI to browse the JSON that would let you copy the path to the current value would probably also help there
I use an app called OK JSON on the Mac for this. It's okay.
Emacs has a command to get the current path at point.
Which one is it exactly, please? I'd like to use it.
Am I correct in understanding that this can only manipulate (get or set) values at a JSON path? That is, is it not a replacement for jq?
For example, I frequently use jq for queries like this:
jq '.data | map(select(.age <= 25))' input.json
Or this: jq '.data | map(.country) | sort[]' input.json | uniq -c
Is it possible to do something similar with this tool?

This is not a slight at jj. Even if it's more limited than jq, it's still of great value if it means it's faster or more ergonomic for a subset of cases. I'm just trying to understand how it fits in my toolbox.
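For reference, the two jq queries above compute something like this Python sketch (sample data invented for illustration):

```python
import json

doc = json.loads("""{"data": [
    {"age": 22, "country": "NZ"},
    {"age": 30, "country": "AU"},
    {"age": 25, "country": "NZ"}]}""")

# .data | map(select(.age <= 25))
young = [row for row in doc["data"] if row["age"] <= 25]

# .data | map(.country) | sort[]   (the counting via `uniq -c` happens in the shell)
countries = sorted(row["country"] for row in doc["data"])

print(young)      # [{'age': 22, 'country': 'NZ'}, {'age': 25, 'country': 'NZ'}]
print(countries)  # ['AU', 'NZ', 'NZ']
```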
Hey there,
Just wanted to drop a quick note to say how much I'm loving jj. This tool is seriously a game-changer for dealing with JSON from the command line. It's super easy to use and the syntax is a no-brainer.
The fact that jj is a single binary with no dependencies is just the cherry on top. It's so handy to be able to take it with me wherever I go and plug it into whatever I'm working on.
And props to you for the docs - they're really well put together and made it a breeze to get up and running.
Keep up the awesome work! Can't wait to see where you take jj next.
Cheers
I've been using the gjson (get) and sjson (set) libraries this is based on for many years in Go code to avoid deserialising JSON responses. Those libraries act on a byte array and can get only the value(s) you want without creating structs and other objects all over the place, giving you a speed bump and fewer allocations if all you need is a simple value. It's been working well.
This program could be an alternative to jq for simple uses.
For those wondering, the README states it's a lot faster than jq, which may be the selling point.
jj is faster than jq.
However, jsonptr is even faster and also runs in a self-imposed SECCOMP_MODE_STRICT sandbox (very secure; also implies no dynamically allocated memory).
$ time cat citylots.json | jq -cM .features[10000].properties.LOT_NUM
"091"
real 0m4.844s
$ time cat citylots.json | jj -r features.10000.properties.LOT_NUM
"091"
real 0m0.210s
$ time cat citylots.json | jsonptr -q=/features/10000/properties/LOT_NUM
"091"
real 0m0.040s
jsonptr's query format is RFC 6901 (JSON Pointer). More details are at
https://nigeltao.github.io/blog/2020/jsonptr.html

Looks neat. One suggestion: add better build instructions to the wuffs README/getting started guide. I jumped in and tried to build it using the "build-all.sh" script, which seemed convenient, but gave up (for now) after the nth build failure due to yet another missing dependency. It's extra painful because build-all.sh is slow, so maybe also consider a proper build automation tool (seeing as this is a Google project, maybe Bazel?).
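Going back to jsonptr's query format: resolving an RFC 6901 pointer against an already-parsed document is short enough to sketch in Python (toy data, minimal escape handling, no error checking):

```python
import json

def resolve_pointer(doc, pointer):
    """Resolve an RFC 6901 JSON Pointer against a parsed JSON document.
    Minimal sketch: handles the ~1/~0 escapes and array indices,
    with no error handling for malformed pointers."""
    if pointer == "":
        return doc  # the empty pointer refers to the whole document
    for token in pointer.split("/")[1:]:
        # Per RFC 6901, decode ~1 to '/' first, then ~0 to '~'.
        token = token.replace("~1", "/").replace("~0", "~")
        doc = doc[int(token)] if isinstance(doc, list) else doc[token]
    return doc

doc = json.loads('{"features": [{"properties": {"LOT_NUM": "091"}}]}')
print(resolve_pointer(doc, "/features/0/properties/LOT_NUM"))  # 091
```

jsonptr's actual implementation streams tokens through a sandboxed Wuffs parser rather than building the whole tree, which is where the speed and the small memory footprint come from.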
Presumably the memory footprint is often far less too.
This behaviour looks confusing to me:
$ echo '{"name":{"first":"Tom","middle":"null","last":"Smith"}}' | jj name.middle
null
$ echo '{"name":{"first":"Tom","last":"Smith"}}' | jj name.middle
null
It can be avoided with the option '-r', which should be the default but is not.
I don't get this behavior for your second command, it just seems to return an empty string.
edit:
There are three cases to cover:
1. The value at the path exists and is not null.
2. The value at the path exists and is null.
3. The value at the path doesn't exist.
jj seems to potentially confuse 1 and 2 without the -r flag: "middle": "null" and "middle": null, more specifically. It probably confuses "middle": "" with a missing value as well; that's 1 and 3.
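To make the three cases concrete, a small Python sketch using a sentinel so that a missing key (case 3) stays distinguishable from JSON null (case 2):

```python
import json

MISSING = object()  # sentinel distinct from None (which represents JSON null)

def get_path(doc, *keys):
    """Walk nested dict keys, returning MISSING when a key is absent.
    Keeps the three cases apart: present, present-and-null, absent."""
    for key in keys:
        if not isinstance(doc, dict) or key not in doc:
            return MISSING
        doc = doc[key]
    return doc

doc = json.loads('{"name": {"first": "Tom", "middle": "null", "last": "Smith"}}')
print(get_path(doc, "name", "middle"))             # null (the string)
print(get_path(doc, "name", "middle") is None)     # False: it's the string "null"
print(get_path(doc, "name", "suffix") is MISSING)  # True: the key is absent
```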
Interesting. How often do you manipulate a 1+ MB JSON file? Maybe I'm wrong, but going from 0.01s to 0.001s doesn't motivate me to switch to jj.
Datasets are often stored in (sometimes gzipped) jsonlines format in my field (NLP). The file size could reach 100s of GBs.
100s of GBs?
In those cases, querying un-indexed files seems quite a thinko. Even if you can fit it all in RAM.
If you only scan that monstrous file sequentially, then you don't need either jq or jj or any other "powerful" tool. Just read/write it sequentially.
If you need to make complex scans and queries, I suspect a database is better suited.
I wish this existed when I was trying to look at 20G of firebase database JSON dump.
that is what gets me: why did the file get to 20 GB? At that point, just ship a SQLite file.
Does it matter why? Sometimes files get big, and you don't control the generation, or changing the generation is a bigger task than just dealing with a "big" (I'd argue 20 GB isn't that big anyway) file with standard tools.
Nope, it matters a lot! Unstructured, unindexed files usually get that big as the result of some design flaw.
I would like to see a comparison with jshon. Jshon is way faster than jq and has been available in distro repositories for many years.
Cool, didn't know about jshon. How's the query language?
Almost non-existent. A couple of excerpts from the man page:
{"a":1,"b":[true,false,null,"str"],"c":{"d":4,"e":5}}
jshon [actions] < sample.json
jshon -e c -> {"d":4,"e":5}
jshon -e c -e d -u -p -e e -u -> 4 5
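As I read the man page, jshon works through a stack of selections: -e pushes a child element, -p pops back to the parent, -u prints the current value unquoted. A toy Python simulation of the stack mechanics in the second invocation (my interpretation, not jshon's code; real jshon also echoes extracted values by default):

```python
import json

def jshon(doc, actions):
    """Toy model of jshon's selection stack: ('e', key) pushes a child,
    ('p',) pops back to the parent, ('u',) emits the current value."""
    stack = [doc]
    out = []
    for action in actions:
        if action[0] == "e":
            stack.append(stack[-1][action[1]])
        elif action[0] == "p":
            stack.pop()
        elif action[0] == "u":
            out.append(str(stack[-1]))
    return out

doc = json.loads('{"a":1,"b":[true,false,null,"str"],"c":{"d":4,"e":5}}')
# jshon -e c -e d -u -p -e e -u  ->  4 5
print(jshon(doc, [("e", "c"), ("e", "d"), ("u",), ("p",), ("e", "e"), ("u",)]))
```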
Yet this covers like ~50% of possible use cases for jq.

Is this the SAX of JSON?
Looks like it's because @stedolan went silent without delegating the right GitHub repo access to the existing maintainers.
He seems to be working at Jane Street though, so if anyone is able to reach him please help the jq community :)
https://signals-threads.simplecast.com/episodes/memory-manag...
What exactly is missing/broken in jq right now that warrants a fork? I've been using jq daily for years, and I can't remember the last time I hit a bug (it must have been many years ago), and I can't recall any features I've felt were missing in all the years I've been using it.
For me it's kind of done. It could be faster, but then I tend to program a solution myself instead; otherwise, I feel like it's Done Enough.
I wouldn't say I need the program to grow more features, but at the bare minimum they should have been more diligent about cutting releases after accepting bug fixes, instead of letting those contributions languish on the main development branch, out of reach for users.
I mean, it would be understandable if the maintainers didn't have the time to keep working on it at all, but clearly the review work was done to accept some patches, so why not make point releases to let the fixed code reach users via their distribution's channels?
What I miss from jq, and what is implemented but unreleased, is platform-independent line delimiters.
jq on Windows produces \r\n-terminated lines, which can be annoying when used with Cygwin / MSYS2 / WSL. The '--binary' option to skip the line-delimiter conversion is one of those pending improvements.
https://github.com/stedolan/jq/commit/0dab2b18d73e561f511801...
> What exactly is missing/broken in jq right now which warrants a fork
AFAIK quite a few bug fixes and features have accumulated on the unreleased main branch, or were opened as PRs but never merged.
IIRC I hit one of the bugs while trying to check whether an input document is valid JSON.
I should check out what's happening with the fork. I've never opened a PR or anything, but I've read the source while trying to understand the jq language conceptually, and I'd say it's quite elegant :)
The README for jj points out that it is dramatically faster than jq. Presumably some of those improvements would help this.
> It could be faster
A decaffeinated sloth could be faster.