jq turns large JSON number literals into the maximum double #1120

Closed
nh2 opened this issue Mar 25, 2016 · 12 comments

@nh2

nh2 commented Mar 25, 2016

Run echo '{"num": 1e1000}' | jq '.text = "hello"'

Expected output:

{
  "text": "hello",
  "num": 1e100
}

Actual output:

{
  "text": "hello",
  "num": 1.7976931348623157e+308
}

Exponential notation for numbers is allowed in JSON (see http://stackoverflow.com/questions/19554972/json-standard-floating-point-numbers).

The spec (http://tools.ietf.org/html/rfc7159.html#section-6) says:

This specification allows implementations to set limits on the range and precision of numbers accepted.

So technically, jq is complying with the spec.

However, when using jq as a general-purpose JSON processing tool, this is extremely unexpected; in my case, it has just caused a couple thousand dollars' worth of debugging time.

It would be great if jq could leave exponential number literals untouched.

I understand that this may be hard to do in the general case, since jq can also do calculations, and it isn't really defined in which number type those should happen.

It may therefore be a better approach if jq could warn if it performs a conversion like the above, or fail with an error code.
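
For context, the surprising value in the output above is DBL_MAX, the largest finite IEEE 754 double; a minimal standalone C check (not jq's code, just an illustration) confirms it:

#include <float.h>
#include <stdio.h>

int main(void) {
    /* DBL_MAX is the largest finite double; 17 significant digits print it exactly */
    printf("%.17g\n", DBL_MAX);  /* prints 1.7976931348623157e+308 */
    return 0;
}

That is why literals beyond the double range show up as 1.7976931348623157e+308 in the output above.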

@wtlangford
Contributor

This does come up every so often. And you're right: while it is spec-compliant, it is often surprising. I'll take a look at the JSON parser and see if I can get a warning in there. I think errors are a bit too strict, especially since they could break otherwise fine programs. I'm a bit busy, so it may be a while. If you're interested in giving it a shot yourself, feel free to make a pull request. :)

@wtlangford
Contributor

I found a minute and took a look into that code and remembered that the number parsing is... intense. I think it's the same code that gcc uses, actually.

I'll play with it a bit, but I'm not too positive that anything I write related to that will actually function as intended. 😦

I assume this is why we haven't added a warning yet.

@nh2
Author

nh2 commented Mar 25, 2016

Would a simpler approach work: parse the number as-is, format it back to a string the way jq would print it, and check whether the result matches the original string?

@wtlangford
Contributor

That would be prone to false positives, due to float rounding and various other formatting quirks.

@nh2
Author

nh2 commented Mar 25, 2016

Right, it would be stricter; e.g. it would also complain when 1e10 becomes 10000000000. But a "very strict" mode would already go some way from the processing-tool perspective (if you want to ensure that jq only changes whitespace and syntax, does your transformation, and leaves all other numbers verbatim). No question, though, that avoiding such false positives would be better.
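
For illustration, a minimal C sketch of that round-trip check (hypothetical, not taken from jq or any branch): parse the literal into a double, format the double back, and compare with the original token. As discussed above, 1e10 trips it even though its value is preserved exactly:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical "very strict" check: does the literal survive a
 * parse-and-reformat round trip through a double? */
static int literal_changed(const char *token) {
    char buf[64];
    double d = strtod(token, NULL);
    snprintf(buf, sizeof buf, "%.17g", d);  /* 17 significant digits round-trip any double */
    return strcmp(buf, token) != 0;
}

int main(void) {
    printf("1e1000 changed: %d\n", literal_changed("1e1000")); /* 1: real overflow */
    printf("1e10   changed: %d\n", literal_changed("1e10"));   /* 1: false positive (exact value, different spelling) */
    return 0;
}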

@wtlangford
Contributor

Looks like I made enough sense out of that code to be able to detect the over/underflows during the parse phase for both JSON and jq programs. It doesn't detect over/underflows during calculations, so 1e308+1e308 won't give a warning.

I've pushed a branch with that feature to https://github.com/wtlangford/jq/tree/detect-over-underflow-during-parse. @nicowilliams can you bang on this and see if I've broken something subtle? I had to undefine a preprocessor macro, so all bets are off.

@nh2 feel free to test it as well. Currently it just prints warnings to stderr (feel free to complain about the wording; I'm not happy with the strings), but there might be a better way/place to output that. My basic test cases for it were 1e1000 and 1e-1000.
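
For anyone curious what over/underflow detection during the parse can look like, here is a rough sketch built on the C library's strtod, which sets errno to ERANGE for both of those test cases. It is only an illustration of the idea, not the code in the branch above:

#include <errno.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

/* Illustration only: strtod flags both directions of range error,
 * so a parser wrapping it can warn on stderr. */
static double parse_with_warning(const char *token) {
    errno = 0;
    double d = strtod(token, NULL);
    if (errno == ERANGE) {
        if (fabs(d) == HUGE_VAL)
            fprintf(stderr, "warning: %s exceeds the double range\n", token);
        else
            fprintf(stderr, "warning: %s underflows the double range\n", token);
    }
    return d;
}

int main(void) {
    parse_with_warning("1e1000");   /* overflow warning  */
    parse_with_warning("1e-1000");  /* underflow warning */
    return 0;
}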

@queenvictoria

queenvictoria commented Jan 21, 2023

I've come across a strange one that might be related to this issue. If a JSON number is over 10^16, difficult-to-understand results start appearing:

# Good
$ RES='{"user_id": 1000000000000003}'
$ echo $RES | jq
{
  "user_id": 1000000000000003
}

# Bad and hard to understand
$ RES='{"user_id": 10000000000000003}'
$ echo $RES | jq
{
  "user_id": 10000000000000004
}

# Weird but somewhat understandable
$ RES='{"user_id": 10000000000000001}'
$ echo $RES | jq
{
  "user_id": 1e+16
}

I had a look at the PR, and I think it is trying to send a warning to the user if this kind of thing is happening under the hood. I've tried different kinds of output like ...|tostring, --ascii-output, --raw-output, to no avail. Is there something that can be done?

@wader
Member

wader commented Jan 21, 2023

What version of jq is this? I think master preserves big numbers better.

The examples can be reproduced using JavaScript as well:

$ echo 'console.log({"a": 10000000000000003, "b": 1e+16})' | node
{ a: 10000000000000004, b: 10000000000000000 }

The rounding is probably because JavaScript (and JSON implementations like jq) use IEEE 754 binary64 ("double") for all numbers. The use of scientific notation is indeed a bit confusing; I'm not sure what the criteria for it is.
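
To see where the 10^16 cutoff above comes from (a standalone C check, nothing jq-specific): binary64 doubles have a 53-bit significand, so not every integer above 2^53 = 9007199254740992 is representable, and larger literals get rounded to the nearest double:

#include <stdio.h>

int main(void) {
    double exact_limit = 9007199254740992.0;   /* 2^53: every integer up to here is exact */
    double user_id     = 10000000000000003.0;  /* above 2^53: rounds to the nearest double */

    printf("%.17g\n", exact_limit);  /* 9007199254740992  */
    printf("%.17g\n", user_id);      /* 10000000000000004 */
    return 0;
}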

@queenvictoria

queenvictoria commented Jan 21, 2023

Hey @wader

$ jq --version
jq-1.6

So (reading your note and the thread above) it's the underlying floating-point representation doing this rather than jq itself, so probably nothing to be done about it? I'll figure out how to run master and give it a shot.

Yep, so I built it and that fixes it -- thank you!

$  /usr/local/bin/jq --version
jq-1.6-159-gcff5336-dirty

$ RES='{"user_id": 10000000000000003}'
$ echo $RES | /usr/local/bin/jq '(.user_id)'

@wader
Member

wader commented Jan 23, 2023

👍 If I remember correctly, the number should be preserved as long as you don't do any arithmetic operations. I think #1752 is the PR where it was added, if you want the details.

@emanuele6
Member

jq 1.7 was released with the fix. Closing.

@Jamim

Jamim commented Nov 8, 2023

jq works much better now 👏🏻, but the issue has reached a new level 🚀

~ $ echo '{"num": 1e100000000}' | jq '.text = "hello"' # this is fine
{
  "num": 1E+100000000,
  "text": "hello"
}
~ $ echo '{"num": 1e1000000000}' | jq '.text = "hello"' # this is not
{
  "num": 1.7976931348623157e+308,
  "text": "hello"
}
