Gojq: Pure Go Implementation of Jq

simonw • 3 years ago

"gojq does not keep the order of object keys" is a bit disappointing.

I care about key order purely for cosmetic reasons: when I'm designing JSON APIs I like to put things like the "id" key first in an object layout, and when I'm manipulating JSON using jq or similar I like to maintain those aesthetic choices.

I know it's bad to write code that depends on key order, but it's important to me as a way of keeping JSON as human-readable as possible.

After all, human readability is one of the big benefits of JSON over various other binary formats.

haasted • 3 years ago

I bet it's an artifact of Go having a randomized iteration order over maps [0]. Getting a deterministic ordering requires extra work.

[0] https://stackoverflow.com/questions/9619479/go-what-determin...

simonw • 3 years ago

I used to have the exact same problem with Python, until Python 3.7 made preserving insertion order a feature of the language: https://softwaremaniacs.org/blog/2020/02/05/dicts-ordered/

c2h5oh • 3 years ago

Go actually went in the other direction for a bunch of reasons (e.g. hash-collision DoS) and made key order quasi-random when iterating. Small maps used to maintain order, but a change was made to randomize that so people didn't rely on it and get stung when their maps got larger: https://github.com/golang/go/issues/6719

hoppla • 3 years ago

This burnt me when I wrote an algorithm. I depended on the order of keys in dicts, as it allowed me to reference a value both by index and by key.

I wrote the code in Python 3.7+, and ended up spending a good amount of time debugging it when I ran it in an earlier Python version.

vips7L • 3 years ago

Does Go not have more than one Map implementation in the standard library?

esprehn • 3 years ago

It does not. Maps are not even a real interface you can implement, it's compiler magic encoded in the language spec: https://dave.cheney.net/2018/05/29/how-the-go-runtime-implem...

This is all fallout of not having generics.

ZephyrBlu • 3 years ago

I would have never expected that a language (Especially a compiled one) does that kind of fuckery behind the scenes.

Someone • 3 years ago

No, it isn’t. “gojq does not keep the order of object keys” isn’t about ordering keys consistently across runs, it’s about keeping them in the order of the input file.

akpa1 • 3 years ago

Which it can't do because, as mentioned, Go randomly iterates over maps. That's the data structure that most would use to load arbitrary input files into the program.

Someone • 3 years ago

If you have a hammer in your toolbox, it doesn’t mean you have to use it in every job. If Go’s maps don’t do what you want, pick a different data structure.

This is like claiming that, because updating native integers isn’t guaranteed to be atomic in a language, you can’t do multi-threaded programming.
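
As a rough sketch of what "a different data structure" could look like, assuming only the standard library's `encoding/json`: walk the decoder's token stream and store the members in a slice instead of a map. The `pair` type and `orderedPairs` function are hypothetical names for this example, not anything from gojq, and only the top-level object is handled.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// pair holds one object member; a slice of pairs keeps the input order.
type pair struct {
	Key   string
	Value json.RawMessage
}

// orderedPairs reads a top-level JSON object and returns its members in
// the order they appear in the input.
func orderedPairs(input string) ([]pair, error) {
	dec := json.NewDecoder(strings.NewReader(input))
	tok, err := dec.Token()
	if err != nil {
		return nil, err
	}
	if d, ok := tok.(json.Delim); !ok || d != '{' {
		return nil, fmt.Errorf("expected a JSON object, got %v", tok)
	}
	var pairs []pair
	for dec.More() {
		keyTok, err := dec.Token()
		if err != nil {
			return nil, err
		}
		key, ok := keyTok.(string)
		if !ok {
			return nil, fmt.Errorf("expected a string key, got %v", keyTok)
		}
		// Keep the value as raw bytes so nested content is left untouched.
		var value json.RawMessage
		if err := dec.Decode(&value); err != nil {
			return nil, err
		}
		pairs = append(pairs, pair{Key: key, Value: value})
	}
	return pairs, nil
}

func main() {
	pairs, err := orderedPairs(`{"id": 7, "name": "x", "active": true}`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, p := range pairs {
		fmt.Printf("%s = %s\n", p.Key, p.Value)
	}
}
```

The trade-off is that lookups by key become a linear scan unless an index map is kept alongside the slice.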

rfiat • 3 years ago

GP is correct, per the README:

> gojq does not keep the order of object keys. I understand this might cause problems for some scripts but basically, we should not rely on the order of object keys. Due to this limitation, gojq does not have keys_unsorted function and --sort-keys (-S) option. I would implement when ordered map is implemented in the standard library of Go but I'm less motivated.

And later in the same file:

  gojq does not support some functions intentionally;
  <snip>
  --sort-keys, -S (sorts by default because map[string]interface{} does not keep the order),

zxcvbn4038 • 3 years ago

Yeah, this is a deal breaker. While technically the key order doesn’t matter, in the real world it really does matter. People have to read this stuff. People have to be able to differentiate between actual changes and stuff moving around just because. Luckily it’s a solved problem and you can write marshalers that preserve order, but it’s extra work and generally specific to an encoding format. It would be nice to have ordered maps in the base library as an option.

41b696ef1113 • 3 years ago

It does not even provide a `--sort-keys` option. That's like 90% of the reason I ever lean on jq - to standardize API output for my human brain.

latchkey • 3 years ago

Best 3rd party Map library that I've found. https://github.com/cornelk/hashmap

silverwind • 3 years ago

Agree, this is deterring me from this tool. Many languages/tools nowadays guarantee object key order which is convenient in many ways.

lapser • 3 years ago

For what it's worth, JSON Objects are not guaranteed to be ordered. Maps in many different languages are implemented without an order.

cerved • 3 years ago

IMO it's not about code, it's about a predictable and consistent layout so that you can easily diff.

Lurkars • 3 years ago

This. For one project, I even wrote a tool to reorder keys into a specific order. And of course there is no technical reason for this. But I used JSON here for its human readability, so that non-technical people have the best chance to understand and change the data. And for that, starting with id and name on top is more important than starting with a huge array of data.

xorcist • 3 years ago

Ordered keys in JSON are not only for cosmetic reasons. If this ever touches disk you want the ability to diff the files or stash them in git without the whole file changing with every update.

hyperpallium2 • 3 years ago

jq used to do this, but changed to preserve key order.

fwip • 3 years ago

Not implementing key-sorting is a curious decision:

> gojq does not keep the order of object keys. I understand this might cause problems for some scripts but basically, we should not rely on the order of object keys. Due to this limitation, gojq does not have keys_unsorted function and --sort-keys (-S) option. I would implement when ordered map is implemented in the standard library of Go but I'm less motivated.

I feel like --sort-keys is most useful when it is producing output for tools that do not understand JSON - for example, generating diffs or hashes of the JSON string. There is value in the output formatting being deterministic for a given input.
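
A small sketch of that use case in Go, relying on the fact that `encoding/json` writes map keys in sorted order when marshaling: two documents that differ only in key order hash to the same value. `stableHash` is a hypothetical helper name, and this is plain key sorting, not a full canonicalization scheme.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// stableHash re-encodes a JSON document with sorted object keys and
// returns the hex-encoded SHA-256 of the re-encoded bytes.
func stableHash(doc []byte) (string, error) {
	dec := json.NewDecoder(bytes.NewReader(doc))
	dec.UseNumber() // keep numbers as written instead of converting to float64
	var v interface{}
	if err := dec.Decode(&v); err != nil {
		return "", err
	}
	out, err := json.Marshal(v) // map keys are emitted in sorted order
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(out)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	a, errA := stableHash([]byte(`{"id": 1, "name": "x"}`))
	b, errB := stableHash([]byte(`{"name": "x", "id": 1}`))
	if errA != nil || errB != nil {
		fmt.Println("error:", errA, errB)
		return
	}
	fmt.Println(a == b) // prints true
}
```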

eropple • 3 years ago

I agree with you that there's value to sorted keys from a presentational standpoint (we are not beep-boop robots, humans have to read this stuff too), but now there also exists a JSON canonicalization RFC that tools can/should follow (with all the usual caveats about canonicalization being fraught): https://www.rfc-editor.org/rfc/rfc8785

mdaniel • 3 years ago

I guess "Informational" is better than /dev/null, but unless everyone adopts it doesn't that run the risk of it just being My Favorite Canonicalization™?

Either way, I'm guessing if the gojq author has that much heartburn about implementing --sort-keys, --canonical is just absolutely off the table :-(

tialaramex • 3 years ago

> unless everyone adopts it doesn't that run the risk of it just being My Favorite Canonicalization

That's true regardless. The IETF has no enforcement arm. Even if people expend the effort to agree a standard, and make whatever signs and follow the rituals, if nobody implements it then de facto that isn't the standard after all.

fwip • 3 years ago

Thank you for letting me know! I hadn't thought to look.

krab • 3 years ago

You can always add this feature but it's problematic to remove it.

renewiltord • 3 years ago

Could pipe through gron and sort to re-sort.

Someone • 3 years ago

That helps when you want to sort by key, but not when you want to keep the order of object keys as in the input file.

pbsds • 3 years ago

Gron has the same issue, as it too is written in go and randomizes key order.

lapser • 3 years ago

I have actually fully replaced my jq installation with gojq (including an `ln -s gojq jq`) for a few years, and no script has broken so far. I'm super impressed by the jq compatibility.

If you are going down this route, do be careful with performance. I don't know which is more performant, as I've never really had to work with large data sets, but I can't help feeling jq will be faster than gojq in such cases. I have no benchmarks backing this up, but who knows, maybe someone will benchmark both.

One of my favourite features is the fact that error messages are actually legible, unlike jq.

brundolf • 3 years ago

It's very possible it could be faster; jq seems to actually be fairly unoptimized. This implementation in OCaml was featured on HN a while back and it trashes the original jq in performance: https://github.com/davesnx/query-json

After seeing that one I did my own (less-complete) version in Rust and managed to squeeze out even more performance in the operations it supports: https://github.com/brundonsmith/jqr

oever • 3 years ago

Working with large json files is hard to parallelize. Just filtering the objects in a root array can take very long. jqr and gojq both die with OOM when running on large files like

https://dumps.wikimedia.org/wikidatawiki/entities/latest-all...

A fast tool to split a JSON file like that into a format with one JSON object per line would already help a lot.

brundolf • 3 years ago

Mine can :)

cerved • 3 years ago

jq is particularly bad at large stream processing

aarchi • 3 years ago

To see if gojq works even with complex jq programs, I tested it on my wsjq[0] Whitespace language interpreter, which uses most of the advanced jq features. It impressively appears to support the full jq language, though I uncovered a bug[1] in gojq.

gojq's arbitrary-precision integer support will be useful (jq just uses 64-bit floating-point), though I suspect it will have performance regressions, since it uses math/big, instead of GMP.

[0]: https://github.com/andrewarchi/wsjq

[1]: https://github.com/itchyny/gojq/issues/186
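
To make the precision point concrete, here is a small Go sketch comparing float64 with `math/big` on one integer just above 2^53. It only illustrates the two number representations; it is not how jq or gojq parse numbers internally, and the helper names are made up for this example.

```go
package main

import (
	"fmt"
	"math/big"
	"strconv"
)

// viaFloat64 parses a decimal integer as a float64 and prints it back
// without a fractional part.
func viaFloat64(text string) (string, error) {
	f, err := strconv.ParseFloat(text, 64)
	if err != nil {
		return "", err
	}
	return strconv.FormatFloat(f, 'f', 0, 64), nil
}

// viaBigInt parses the same text as an arbitrary-precision integer.
func viaBigInt(text string) (string, error) {
	n, ok := new(big.Int).SetString(text, 10)
	if !ok {
		return "", fmt.Errorf("not a decimal integer: %q", text)
	}
	return n.String(), nil
}

func main() {
	// 2^53 + 1: the smallest positive integer float64 cannot represent.
	const text = "9007199254740993"
	f, _ := viaFloat64(text)
	b, _ := viaBigInt(text)
	fmt.Println("float64:", f) // prints float64: 9007199254740992
	fmt.Println("big.Int:", b) // prints big.Int: 9007199254740993
}
```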

hyperpallium2 • 3 years ago

jq uses bison (gnu's yacc), which is a nightmare for error diagnosis. Additionally, the founder (though brilliant - or maybe because brilliant) wouldn't accept improvements in error reporting.

edsiper2 • 3 years ago

Naming is hard, but please, do not repeat the mistake of many OSS projects in the last 20 years of prefixing the project name with the stack/environment involved.

A "trending" language can catch attention now, but tomorrow? Maybe. So the value proposition, starting from the name, should be different (if you want adoption).

For my use case, for a rewrite of jq I would expect one thing only: higher performance... show the numbers ;)

moharoune • 3 years ago

I'd also expect higher performance from a rewrite of jq or, for that matter, of any other tool that works as expected and has been in use for a long time.

jeffbee • 3 years ago

Last time I profiled jq in my particular use case - querying large GeoJSON files - I discovered it spent practically all of its CPU in assert, and it went a lot faster when built with -DNDEBUG, but since I could not rule out that some of its asserts have side effects I went back to the upstream package.

I think beating the performance of jq would be very easy for anyone who set out with that as a goal. It also has its own internal strtod and dtoa which are easily beaten by ryu or C++'s from/to_chars, so I would start there after dumping the weird asserts.

cube2222 • 3 years ago

This looks quite cool! I'm not sure though why I would use this over the original jq. However, I can definitely see the value in embedding this into my own applications, to provide jq scripting inside of them.

Shameless plug: As I'm not a fan of the jq syntax, I've created jql[0] as an alternative to it. It's also written in Go and presents a lispy continuation-based query language (it sounds much scarier than it really is!). This way it has much less special syntax than jq and is - at least to me - much easier to compose for common day-to-day JSON manipulation (which is the use case it has been created for; there are definitely many features of jq that aren't covered by it).

It might seem dead, as it hasn't seen any commit in ages, but to me it's just finished, I still use it regularly instead of jq on my local dev machine. Check it out if you're not a fan of the jq syntax.

[0]: https://github.com/cube2222/jql

laqq3 • 3 years ago

One reason to prefer gojq is that gojq’s author is one of the most knowledgeable people about the original jq (as seen in GitHub PRs and issues), and his gojq fixes many long-standing issues in jq.

Plus, for my use cases, gojq runtime performance beats jq by a fair margin.

spullara • 3 years ago

i neither know nor care what language the original jq was implemented in.

brundolf • 3 years ago

I can think of two reasons it matters here:

- Can be used as a library in Go projects

- Memory-safe (could be relevant when processing foreign data, esp as a part of some automated process)

donio • 3 years ago

Yep, Benthos is an example of a cool project that uses gojq for its jq syntax support.

hyperpallium2 • 3 years ago

FYNI jq was originally implemented in Haskell.

okasaki • 3 years ago

Why use a special syntax that's hard to remember when you can just use Python?

I wrote a jq-like that accepts Python syntax called pq: https://github.com/dvolk/pq

So you can write stuff like:

    $ echo '{ "US": 3, "China": 12, "UK": 1 }' | pq -c "sum(data.values())"
    16

nh23423fefe • 3 years ago

i deny the premise obviously

jbirer • 3 years ago

Go is not the choice I would make when writing a parser for JSON, good luck though.

honkler • 3 years ago

why?

EdwardDiego • 3 years ago

RIIR? I mean, RIIG? It would be fun to do so :)

But it seems that being able to embed it into Go applications is a nice positive.