-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Java: Add query to detect special characters in string literals #19875
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@@ -0,0 +1 @@ | |||
Violations of Best Practice/SpecialCharactersInLiterals/NonExplicitControlAndWhitespaceCharsInLiterals.ql |
Check warning
Code scanning / CodeQL
Query test without inline test expectations Warning test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you consider an inline expectations test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I quickly turned it into an inline expectations test, but it failed, I think due to the fact that the strings contain multiple special characters. So then I reverted it, and kept it as it is.
e1b966e
to
9db9200
Compare
9db9200
to
eeaaa87
Compare
eeaaa87
to
834cb1b
Compare
* correctness | ||
* maintainability | ||
* readability |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just checking: you believe this matches both the correctness
and readability
categories?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think readability
is definitely okay. correctness
might/could be challenged. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it can belong to both. But between maintainability
and reliability
, you think it's better to choose maintainability
, because readability
is definitely an issue whereas correctness
is only occasionally an issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't given it this much thought. I simply thought maintainability goes together with readability. I think we could remove the correctness
tag, and then it fully matches the guidelines, of having one top level tag and one subcategory from that tag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eh, I think it's fine as it is actually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I just read the description of this PR and it says that we shouldn't use subcategories from different top-level categories. I will discuss the issue on that PR. Maybe wait until that discussion is complete to see if we should remove correctness
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need to remove correctness
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I already removed it. I think it's okay if we keep the tags to a minimum for the time being.
@@ -0,0 +1 @@ | |||
Violations of Best Practice/SpecialCharactersInLiterals/NonExplicitControlAndWhitespaceCharsInLiterals.ql |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you consider an inline expectations test?
- [Unicode Control Characters](https://www.unicode.org/charts/PDF/U0000.pdf). | ||
- [Unicode C0 control codes](https://en.wikipedia.org/wiki/C0_and_C1_control_codes). | ||
- [Unicode characters with property "WSpace=yes" or "White_Space=yes"](https://en.wikipedia.org/wiki/Unicode_character_property#Whitespace). | ||
- [Java String Literals](https://docs.oracle.com/javase/tutorial/java/data/characters.html). | ||
- [Java Class Charset](https://docs.oracle.com/javase/8/docs/api///?java/nio/charset/Charset.html). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit pedantic, but these don't follow the style guide for references. Examples:
- Java API Specification: [Pattern.MULTILINE](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/regex/Pattern.html#MULTILINE)
- Wikipedia: [Template method pattern](https://en.wikipedia.org/wiki/Template_method_pattern).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good—only trivial comments 👍
String foo3Spacebar = "foo bar"; // COMPLIANT | ||
``` | ||
|
||
## Implementation Notes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
## Implementation Notes | |
## Implementation notes |
|
||
## Recommendation | ||
|
||
To avoid issues, use the encoded versions of control characters (e.g., ASCII `\n`, `\t`, or Unicode `U+000D`, `U+0009`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid issues, use the encoded versions of control characters (e.g., ASCII `\n`, `\t`, or Unicode `U+000D`, `U+0009`). | |
To avoid issues, use the encoded versions of control characters (e.g. ASCII `\n`, `\t`, or Unicode `U+000D`, `U+0009`). |
@@ -0,0 +1,111 @@ | |||
## Overview | |||
|
|||
This query detects non-explicit control and whitespace characters in java literals. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This query detects non-explicit control and whitespace characters in java literals. | |
This query detects non-explicit control and whitespace characters in Java literals. |
## Recommendation | ||
|
||
To avoid issues, use the encoded versions of control characters (e.g., ASCII `\n`, `\t`, or Unicode `U+000D`, `U+0009`). | ||
This makes the literals (e.g., string literals) more readable, and also helps to make the surrounding code less error-prone and more maintainable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes the literals (e.g., string literals) more readable, and also helps to make the surrounding code less error-prone and more maintainable. | |
This makes the literals (e.g. string literals) more readable, and also helps to make the surrounding code less error-prone and more maintainable. |
|
||
## Example | ||
|
||
The following examples illustrate `NON_COMPLIANT` and `COMPLIANT` code: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are "NON_COMPLIANT
" and "COMPLIANT
" styled in monospace here (and in the headers) for a specific reason? Otherwise I'd just style them normally, i.e. "Non compliant" and "Compliant". Or use the standard query language format of "GOOD" and "BAD". Otherwise, just ignore this 👍
|
||
## Implementation Notes | ||
|
||
This query detects java literals that contain reserved control characters and/or non-printable whitespace characters, such as: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This query detects java literals that contain reserved control characters and/or non-printable whitespace characters, such as: | |
This query detects Java literals that contain reserved control characters and/or non-printable whitespace characters, such as: |
| `\u001F` | 31 | unit separator | | ||
| `\u007F` | 127 | delete | | ||
|
||
- Zero-width Unicode characters (e.g., zero-width space, zero-width joiner), such as: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Zero-width Unicode characters (e.g., zero-width space, zero-width joiner), such as: | |
- Zero-width Unicode characters (e.g. zero-width space, zero-width joiner), such as: |
This query detects java literals that contain reserved control characters and/or non-printable whitespace characters, such as: | ||
|
||
- Decimal and hexidecimal representations of ASCII control characters (code points 0-8, 11, 14-31, and 127). | ||
- Invisible characters (e.g., zero-width space, zero-width joiner). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Invisible characters (e.g., zero-width space, zero-width joiner). | |
- Invisible characters (e.g. zero-width space, zero-width joiner). |
The following list outlines the _**explicit exclusions from query scope**_: | ||
|
||
- any number of simple space characters (`U+0020`, ASCII 32). | ||
- an escape character sequence (e.g., `\t`), or the Unicode equivalent (e.g., `\u0009`), for printable whitespace characters: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- an escape character sequence (e.g., `\t`), or the Unicode equivalent (e.g., `\u0009`), for printable whitespace characters: | |
- an escape character sequence (e.g. `\t`), or the Unicode equivalent (e.g. `\u0009`), for printable whitespace characters: |
This pull request introduces a new query to detect non-explicit control and whitespace characters in Java literals, improving code readability and reducing potential bugs caused by invisible or hard-to-recognize characters.
Manually testing autofix works, the special characters are replaced with
\u0...
.