Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider removing -0/--nul-output before releasing 1.7 #2683

Closed
nicowilliams opened this issue Jul 9, 2023 · 7 comments · Fixed by #2684
Closed

Consider removing -0/--nul-output before releasing 1.7 #2683

nicowilliams opened this issue Jul 9, 2023 · 7 comments · Fixed by #2684
Milestone

Comments

@nicowilliams
Copy link
Contributor

See #1271. @vdukhovni believes we should remove it, and explains why in the comments.

@nicowilliams nicowilliams changed the title Consider removing -0/--nul-output Consider removing -0/--nul-output before releasing 1.7 Jul 9, 2023
@vdukhovni
Copy link

I think we should also add a couple of sentences of disclaimers about safety with the -r option. It is the jq program's responsibility to make sure that the result is syntactically (and perhaps semantically) appropriate for its downstream consumer.

@nicowilliams
Copy link
Contributor Author

One question I have is whether having a -0 is bad. You make a good argument that it's not necessary.

@nicowilliams
Copy link
Contributor Author

Let's recap current behavior:

  • jq -r includes a newline after each raw output:
    jq -nr '"foo"'|od -c
    0000000   f   o   o  \n
    0000004
    
  • jq -0 includes a NUL after each raw output:
    jq -n0 '"foo"'|od -c
    0000000   f   o   o  \0
    0000004
    
  • but fortunately we have -j which causes jq to not output a pesky newline, and then I can output a NUL -or whatever other separator- at will:
    jq -nrj '"foo\u0000"'|od -c
    0000000   f   o   o  \0
    0000004
    

I.e., we just don't need -0/--nul-output.

So I'm in favor of removing -0/--nul-output.

@vdukhovni
Copy link

Nice, and -j subsumes -r, so for example:

$ jq -nj '
  def nulterm: select(contains("\u0000")|not) | . + "\u0000";
  ("foo", "o\u0000ps", "bar") | nulterm' | cat -v; echo
foo^@bar^@

@itchyny
Copy link
Contributor

itchyny commented Jul 9, 2023

Can we just halt the program with -0 when the top level value is a string because that's the only unsafe case?

@vdukhovni
Copy link

$ cat nulterm.jq 
def nultermparanoid:
    if type != "string"
    then "string values required\n" | halt_error(1)
    elif contains("\u0000")
    then "internal NUL bytes invalid\n" | halt_error(1)
    else . + "\u0000"
    end;
("foo", "bar", .) | nultermparanoid

With that:

$ printf '"foo\0bar"\n' | jq -jf nulterm.jq > /dev/null
internal NUL bytes invalid

$ printf '"baz"\n' | jq -jf nulterm.jq  | cat -v; echo
foo^@bar^@baz^@

@svdb0
Copy link

svdb0 commented Jul 10, 2023

Repeating what I said in #2659, if --nul-output is retained, I suggest renaming it --raw-output0, for the following reasons:

  • It is intuitive; the added 0 suggests 'as --raw-output but with null bytes'.
  • Some standard tools use a similar naming: find -print0, rsync --from0, xz --files0, du --files0-from, wc --files0-from.
  • --nul-output suggests a symmetry with --null-input (-n), but they are completely different.
  • If Feature request: support for streaming input delimited by null characters #2659 is accepted, you'll have matching --raw-input0 and --raw-output0.

There is another option though: you could have a format filter @null similar to @json, @sh, @csv, etc.
It is after all just another way to format your output.
Or (and?), more general: @delimited("\u0000").

I also like the idea suggested here, of having jq raise an error when the value to be output contains the terminator character.
In particular when it is treated as just another output format filter (@null/@delimited("\u0000")), because in this case it is just another instance of 'the value cannot be encoded in the output format', which could happen for other formats too (depending on the format, and even more so if jq were ever to support arbitrary byte sequences).
Because almost always, when you use some character(s) as a delimiter, you do not intend them to occur in the fields themselves.
And if you really do mean to do that, there's still (. + "\u0000"), but then it's a concious choice. Better to have the default be the more secure option.

I agree with @vdukhovni's comments in #1271 that if you're about to output a value containing your seperator, something else is probably wrong (e.g. proper input validation).
But the fact is that people will make mistakes, and many may not even be aware of the issues, and for such cases, raising an error will add a welcome extra layer of protection.

@itchyny itchyny added this to the 1.7 release milestone Jul 24, 2023
itchyny added a commit to itchyny/jq that referenced this issue Jul 26, 2023
The option naming --nul-output was confusing, especially when we have a
similar option for input stream in the future (--nul-input vs --null-input).
Based on the observation of other command line tools, we rename the option
to --raw-output0. We also drop the short option -0 to avoid confusion when
on introducing the NUL-delimited input option.

Unlike the other command line tools outputting file names with NUL delimiter,
jq deals with JSON, and its strings may contain NUL character. To protect
users from the risk of injection attacks, we abort the program and print an
error before outputting a string includeing NUL character. Closes jqlang#2683.
itchyny added a commit to itchyny/jq that referenced this issue Jul 26, 2023
The option naming --nul-output was confusing, especially when we have a
similar option for input stream in the future (--nul-input vs --null-input).
Based on the observation of other command line tools, we rename the option
to --raw-output0. We also drop the short option -0 to avoid confusion on
introducing the NUL-delimited input option.

Unlike the other command line tools outputting file names with NUL delimiter,
jq deals with JSON, and its strings may contain NUL character. To protect
users from the risk of injection attacks, we abort the program and print an
error message before outputting strings including NUL character. Closes jqlang#2683.
nicowilliams pushed a commit that referenced this issue Jul 27, 2023
The option naming --nul-output was confusing, especially when we have a
similar option for input stream in the future (--nul-input vs --null-input).
Based on the observation of other command line tools, we rename the option
to --raw-output0. We also drop the short option -0 to avoid confusion on
introducing the NUL-delimited input option.

Unlike the other command line tools outputting file names with NUL delimiter,
jq deals with JSON, and its strings may contain NUL character. To protect
users from the risk of injection attacks, we abort the program and print an
error message before outputting strings including NUL character. Closes #2683.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants