-
Notifications
You must be signed in to change notification settings - Fork 26.4k
fix xstrdup leak in parse_short_opt #1954
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Welcome to GitGitGadgetHi @brandb97, and welcome to GitGitGadget, the GitHub App to send patch series to the Git mailing list from GitHub Pull Requests. Please make sure that either:
You can CC potential reviewers by adding a footer to the PR description with the following syntax:
NOTE: DO NOT copy/paste your CC list from a previous GGG PR's description, Also, it is a good idea to review the commit messages one last time, as the Git project expects them in a quite specific form:
It is in general a good idea to await the automated test ("Checks") in this Pull Request before contributing the patches, e.g. to avoid trivial issues such as unportable code. Contributing the patchesBefore you can contribute the patches, your GitHub username needs to be added to the list of permitted users. Any already-permitted user can do that, by adding a comment to your PR of the form Both the person who commented An alternative is the channel
Once on the list of permitted usernames, you can contribute the patches to the Git mailing list by adding a PR comment If you want to see what email(s) would be sent for a After you submit, GitGitGadget will respond with another comment that contains the link to the cover letter mail in the Git mailing list archive. Please make sure to monitor the discussion in that thread and to address comments and suggestions (while the comments and suggestions will be mirrored into the PR by GitGitGadget, you will still want to reply via mail). If you do not want to subscribe to the Git mailing list just to be able to respond to a mail, you can download the mbox from the Git mailing list archive (click the curl -g --user "<EMailAddress>:<Password>" \
--url "imaps://imap.gmail.com/INBOX" -T /path/to/raw.txt To iterate on your change, i.e. send a revised patch or patch series, you will first want to (force-)push to the same branch. You probably also want to modify your Pull Request description (or title). It is a good idea to summarize the revision by adding something like this to the cover letter (read: by editing the first comment on the PR, i.e. the PR description):
To send a new iteration, just add another PR comment with the contents: Need help?New contributors who want advice are encouraged to join [email protected], where volunteers who regularly contribute to Git are willing to answer newbie questions, give advice, or otherwise provide mentoring to interested contributors. You must join in order to post or view messages, but anyone can join. You may also be able to find help in real time in the developer IRC channel, |
/allow |
User brandb97 is now allowed to use GitGitGadget. WARNING: brandb97 has no public email address set on GitHub; GitGitGadget needs an email address to Cc: you on your contribution, so that you receive any feedback on the Git mailing list. Go to https://github.com/settings/profile to make your preferred email public to let GitGitGadget know which email address to use. |
dc9af6e
to
7e31fe9
Compare
There are issues in commit ae3f4e4: |
There are issues in commit 7e31fe9: |
There are issues in commit ae3f4e4: |
There are issues in commit 7e31fe9: |
There are issues in commit b80d17b: |
There are issues in commit ae3f4e4: |
There are issues in commit 7e31fe9: |
There are issues in commit b80d17b: |
There are issues in commit 5f0d8e8: |
There are issues in commit ae3f4e4: |
There are issues in commit 7e31fe9: |
There are issues in commit b80d17b: |
There are issues in commit 5f0d8e8: |
25ce96a
to
bdb7c60
Compare
There are issues in commit ae3f4e4: |
There are issues in commit 7e31fe9: |
There are issues in commit b80d17b: |
There are issues in commit 5f0d8e8: |
There are issues in commit ae3f4e4: |
There are issues in commit 7e31fe9: |
There are issues in commit b80d17b: |
There are issues in commit 5f0d8e8: |
c936abc
to
a9cbca6
Compare
/submit |
Submitted as [email protected] To fetch this version into
To fetch this version to local tag
|
Makefile
Outdated
@@ -822,6 +822,7 @@ TEST_BUILTINS_OBJS += test-online-cpus.o | |||
TEST_BUILTINS_OBJS += test-pack-mtimes.o |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Patrick Steinhardt wrote (reply to this):
On Wed, May 07, 2025 at 02:33:21AM +0000, Lidong Yan via GitGitGadget wrote:
> From: Lidong Yan <[email protected]>
This commit is missing an explanation of the problem. When does it
trigger? Which subsystem triggers it? Why didn't we observe the issue
beforehand? And given that the solution seems to be a bit more complex
it should also explain how you're fixing the issue.
Furthermore, the subject of such a commit should be prefixed with the
subsystem in which you're fixing the issue. So in your case,
"parse-options:".
I'd recommend reading through "Documentation/MyFirstContribution.adoc",
which explains all of this.
> diff --git a/parse-options.c b/parse-options.c
> index a9a39ecaef6c..4279dfe4d313 100644
> --- a/parse-options.c
> +++ b/parse-options.c
> @@ -638,6 +638,16 @@ static int has_subcommands(const struct option *options)
> return 0;
> }
>
> +static void set_strdup_fn(struct parse_opt_ctx_t *ctx, const struct option *options) {
Style: the opening brace should be on its own line.
> + for (; options->type != OPTION_END; options++)
> + ;
> + if (options->value && options->strdup_fn) {
> + ctx->unknown_opts = options->value;
> + ctx->strdup_fn = options->strdup_fn;
> + return;
> + }
It's not really clear what this is doing. Is the intent to only change
this this value for the last option? If so, why?
> @@ -981,7 +992,11 @@ enum parse_opt_result parse_options_step(struct parse_opt_ctx_t *ctx,
> *
> * This is leaky, too bad.
> */
> - ctx->argv[0] = xstrdup(ctx->opt - 1);
> + if (ctx->unknown_opts && ctx->strdup_fn) {
> + ctx->argv[0] = ctx->strdup_fn(ctx->unknown_opts, ctx->opt - 1);
> + } else {
> + ctx->argv[0] = xstrdup(ctx->opt - 1);
> + }
> *(char *)ctx->argv[0] = '-';
> goto unknown;
> case PARSE_OPT_NON_OPTION:
Hm. I'm at a loss what we're solving here. All of this really should be
explained in the commit message to give context to the reviewer.
> diff --git a/parse-options.h b/parse-options.h
> index 91c3e3c29b3d..af06a09abb8e 100644
> --- a/parse-options.h
> +++ b/parse-options.h
> @@ -388,6 +391,12 @@ static char *parse_options_noop_ignored_value MAYBE_UNUSED;
> }
> #define OPT_SUBCOMMAND(l, v, fn) OPT_SUBCOMMAND_F((l), (v), (fn), 0)
>
> +#define OPT_UNKNOWN(v, fn) { \
> + .type = OPTION_END, \
> + .value = (v), \
> + .strdup_fn = (fn), \
> +}
> +
> /*
> * parse_options() will filter out the processed options and leave the
> * non-option arguments in argv[]. argv0 is assumed program name and
> @@ -496,6 +505,9 @@ struct parse_opt_ctx_t {
> const char *prefix;
> const char **alias_groups; /* must be in groups of 3 elements! */
> struct parse_opt_cmdmode_list *cmdmode_list;
> +
> + void *unknown_opts;
> + parse_opt_strdup_fn *strdup_fn;
> };
>
> void parse_options_start(struct parse_opt_ctx_t *ctx,
Okay. So the intent seems to be that we start storing any options that
we don't understand?
> diff --git a/t/helper/test-free-unknown-options.c b/t/helper/test-free-unknown-options.c
> new file mode 100644
> index 000000000000..9d658115ba8f
> --- /dev/null
> +++ b/t/helper/test-free-unknown-options.c
> @@ -0,0 +1,35 @@
> +#include "git-compat-util.h"
> +#include "parse-options.h"
> +#include "setup.h"
> +#include "strvec.h"
> +
> +static const char *const free_unknown_options_usage[] = {
> + "test-tool free-unknown-options",
> + NULL
> +};
Can't we expand `test-tool parse-options` instead of introducing a new
helper?
> +int cmd__free_unknown_options(int argc, const char **argv) {
> + struct strvec *unknown_opts = xmalloc(sizeof(struct strvec));
> + strvec_init(unknown_opts);
> + const char *prefix = setup_git_directory();
> +
> + bool a, b;
> + struct option options[] = {
> + OPT_BOOL('a', "test-a", &a, N_("option a, only for test use")),
> + OPT_BOOL('b', "test-b", &b, N_("option b, only for test use")),
> + OPT_UNKNOWN(unknown_opts, (parse_opt_strdup_fn *)&strvec_push),
> + };
> +
> + parse_options(argc, argv, prefix, options,
> + free_unknown_options_usage, PARSE_OPT_KEEP_UNKNOWN_OPT);
> +
> + printf("a = %s\n", a? "true": "false");
> + printf("b = %s\n", b? "true": "false");
> +
> + int i;
> + for (i = 0; i < unknown_opts->nr; i++) {
> + printf("free unknown option: %s\n", unknown_opts->v[i]);
> + }
> + strvec_clear(unknown_opts);
> + free(unknown_opts);
> +}
> \ No newline at end of file
Missing newline.
> diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
> index 6d62a5b53d95..33fa7828b9f6 100644
> --- a/t/helper/test-tool.h
> +++ b/t/helper/test-tool.h
> @@ -44,6 +44,7 @@ int cmd__parse_options(int argc, const char **argv);
> int cmd__parse_options_flags(int argc, const char **argv);
> int cmd__parse_pathspec_file(int argc, const char** argv);
> int cmd__parse_subcommand(int argc, const char **argv);
> +int cmd__free_unknown_options(int argc, const char **argv);
> int cmd__partial_clone(int argc, const char **argv);
> int cmd__path_utils(int argc, const char **argv);
> int cmd__path_walk(int argc, const char **argv);
> diff --git a/t/t0040-parse-options.sh b/t/t0040-parse-options.sh
> index ca55ea8228c3..773f54103fd3 100755
> --- a/t/t0040-parse-options.sh
> +++ b/t/t0040-parse-options.sh
> @@ -822,4 +822,18 @@ test_expect_success 'u16 limits range' '
> test_grep "value 65536 for option .u16. not in range \[0,65535\]" err
> '
>
> +cat >expect <<\EOF
> +a = true
> +b = true
> +free unknown option: -c
> +free unknown option: -d
> +EOF
This should be done inside `test_expect_success` itself.
> +test_expect_success 'free unknown options' '
> + test-tool free-unknown-options -ac -bd \
> + >output 2>output.err &&
> + test_cmp expect output &&
> + test_must_be_empty output.err
> +'
> +
> test_done
Okay, so the memory leak seems to happen when handling unknown options
indeed, as I started to expect after the middle of this patch. In any
case, this commit really needs to be polished to have a proper commit
message that sets the stage for the reviewer.
Patrick
User |
parse-options.c
Outdated
@@ -638,6 +638,18 @@ static int has_subcommands(const struct option *options) | |||
return 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Patrick Steinhardt wrote (reply to this):
On Wed, May 07, 2025 at 02:33:22AM +0000, Lidong Yan via GitGitGadget wrote:
> From: Lidong Yan <[email protected]>
>
> Signed-off-by: Lidong Yan <[email protected]>
The same is true here -- this needs to have a proper commit subject and
message.
> diff --git a/parse-options.c b/parse-options.c
> index 4279dfe4d313..12721b83000a 100644
> --- a/parse-options.c
> +++ b/parse-options.c
> @@ -638,7 +638,9 @@ static int has_subcommands(const struct option *options)
> return 0;
> }
>
> -static void set_strdup_fn(struct parse_opt_ctx_t *ctx, const struct option *options) {
> +static void set_strdup_fn(struct parse_opt_ctx_t *ctx,
> + const struct option *options)
> +{
> for (; options->type != OPTION_END; options++)
> ;
> if (options->value && options->strdup_fn) {
Please refrain from doing code style cleanups of code you have just
introduced in the preceding commit.
Patrick
t/helper/test-free-unknown-options.c
Outdated
#include "test-tool.h" | ||
|
||
static const char *const free_unknown_options_usage[] = { | ||
"test-tool free-unknown-options", NULL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Patrick Steinhardt wrote (reply to this):
On Wed, May 07, 2025 at 02:33:23AM +0000, Lidong Yan via GitGitGadget wrote:
> From: Lidong Yan <[email protected]>
>
> Signed-off-by: Lidong Yan <[email protected]>
This looks like another fixup-style commit. I assume that all of this
really should be a single commit, as the latter two commits don't seem
to do anything new compared to the first commit.
Patrick
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, lidongyan wrote (reply to this):
Thank you for your suggestion. I will squash all the commits and include a detailed description in the next submission.
In short, calling parse_options may cause a memory leak when encountering unknown short options. To fix the leak, parse_options can use a user-provided allocation function, so that the user can later free the memory and avoid the leak. I also defined a macro called OPT_UNKNOWN to make it easier for users to pass in their own allocation function.
On the other hand, I think a new file should be added under t/helper/. The reason is that this new file not only serves as a test, but also provides a simple example of using OPT_UNKNOWN, which others can refer to when modifying their own option definitions to avoid memory leaks.
> 2025年5月7日 15:54,Patrick Steinhardt <[email protected]> 写道:
>
> On Wed, May 07, 2025 at 02:33:23AM +0000, Lidong Yan via GitGitGadget wrote:
>> From: Lidong Yan <[email protected]>
>>
>> Signed-off-by: Lidong Yan <[email protected]>
>
> This looks like another fixup-style commit. I assume that all of this
> really should be a single commit, as the latter two commits don't seem
> to do anything new compared to the first commit.
>
> Patrick
>
a9cbca6
to
998ce1e
Compare
There are issues in commit 998ce1e: |
c456910
to
e7b4465
Compare
/submit |
Submitted as [email protected] To fetch this version into
To fetch this version to local tag
|
On the Git mailing list, Patrick Steinhardt wrote (reply to this): On Wed, May 07, 2025 at 01:24:53PM +0000, Lidong Yan via GitGitGadget wrote:
> From: Lidong Yan <[email protected]>
>
> Most Git commands use parse_options to parse options, but parse_options
> causes a memory leak when it encounters unknown short options. The leak
> occurs at line 984 in parse-options.c, where git allocates some spaces
Again, no need to quote the exact line number.
> to store the unknown short options and will never release those spaces.
>
> To address this issue, users can be allowed to provide a custom function
> to allocate memory for unknown options. For example, users can use
> strvec_push to allocate memory for unknown options and later call
> strvec_clear at the end of the Git command to release the memory, thereby
> avoiding the memory leak.
So does that mean that _every_ user of the "parse-options" interfaces
now has to explicitly plug this memory leak when facing unknown options?
That sounds rather undesirable, as there are so many users out there.
> This commit allows users to provide a custom allocation function for
> unknown options via the strdup_fn field in the last struct option. For
> convenience, this commit also introduces a new macro, OPT_UNKNOWN, which
> behaves like OPT_END but takes an additional argument for the allocation
> function. parse_options could get the custom allocation function in struct
> parse_opt_ctx_t by using set_strdup_fn. A simple example to use
> OPT_UNKNOWN is put into t/helper/test-free-unknown-options.c
Hm. Is there any other usecase for the `strdup_fn` field that you can
think about in the future? Otherwise it feels a bit overengineered from
my perspective.
I unfortuntaely don't think the proposed solution works well. To really
plug the memory leak everywhere, we would have to convert every single
user of `parse_options()` to pass `OPTION_UNKNOWN()` instead of
`OPTION_END()`, pass a strvec and clear that strvec at the end. It just
doesn't quite feel like it's worth it to plug a single, bounded memory
leak.
The most important problem that we're facing is that the allocated
string has a lifetime longer than `parse_options()`, because it really
only becomes relevant in the case where `PARSE_OPT_KEEP_UNKNOWN_OPT` is
enabled. If so, the _caller_ of `parse_options()` will want to process
the modified `argv`, and thus they have to be the ones responsible for
freeing potentially-allocated memory once it's unused.
But that'd require us to keep some state. You solve this via the extra
strvec, but that doesn't really look like a great solution to me. An
alternative would be to expose the `parse_options_ctx` and let the
caller eventually call `parse_options_clear()` or something like that.
This would also require us to modify every caller, but the benefit would
be that we now have a single place where we can always allocate memory
in without having to worry about its lifetime.
All of that starts to become kind of involved though. So unless others
disagree with my analysis I just don't think this edge case really is
worth the complexity.
Thanks!
Patrick |
On the Git mailing list, lidongyan wrote (reply to this): 2025年5月9日 14:19,Patrick Steinhardt <[email protected]> 写道:
> So does that mean that _every_ user of the "parse-options" interfaces
> now has to explicitly plug this memory leak when facing unknown options?
> That sounds rather undesirable, as there are so many users out there.
Since the lifetime of `argv` last until the program terminates, a memory leak
can only occur if parse_option is called multiple times and at least two of
those calls use the `PARSE_OPT_KEEP_UNKNOWN` flag. In the other
words, the memory leak only occurs when the statement `ctx->argue[0] = xstrdup`
overwrites the result of a previous `xstrdup` call.
> Hm. Is there any other usecase for the `strdup_fn` field that you can
> think about in the future? Otherwise it feels a bit overengineered from
> my perspective.
I think a simple approach is to add a marker to the string allocated by `xstrdup`
, and before the next potential leaking `ctx->argv[0] = xstrdup`, check whether the
string needs to be freed. Like we could allocate one more byte in the end of the
string to store the marker.
|
On the Git mailing list, Phillip Wood wrote (reply to this): On 09/05/2025 07:19, Patrick Steinhardt wrote:
> > All of that starts to become kind of involved though. So unless others
> disagree with my analysis I just don't think this edge case really is
> worth the complexity.
I agree with this. We seem to be copying the string because argv is a
const char** but would it really be so bad to modify the string that
gets passed to us? We know that any bundled options that come before
an unknown option cannot have a value or else what we see as the
unknown option would be part of that value. So I think we could do
diff --git a/parse-options.c b/parse-options.c
index 35fbb3b0d6..9e6e46da27 100644
--- a/parse-options.c
+++ b/parse-options.c
@@ -924,12 +924,12 @@ enum parse_opt_result parse_options_step(struct parse_opt_ctx_t *ctx,
if (internal_help && *ctx->opt == 'h')
goto show_usage;
- /* fake a short option thing to hide the fact that we may have
+ /* move a short option thing to hide the fact that we may have
* started to parse aggregated stuff
- *
- * This is leaky, too bad.
*/
- ctx->argv[0] = xstrdup(ctx->opt - 1);
+ MOVE_ARRAY((char *)arg, ctx->opt - 1,
+ strlen(ctx->opt) + 2);
+ ctx->argv[0] = arg;
*(char *)ctx->argv[0] = '-';
goto unknown;
case PARSE_OPT_NON_OPTION:
Best Wishes
Phillip
|
User |
d451623
to
c35722e
Compare
On the Git mailing list, lidongyan wrote (reply to this): On 09/05/2025 21:08,Phillip Wood <[email protected]> write:
> diff --git a/parse-options.c b/parse-options.c
> index 35fbb3b0d6..9e6e46da27 100644
> --- a/parse-options.c
> +++ b/parse-options.c
> @@ -924,12 +924,12 @@ enum parse_opt_result parse_options_step(struct parse_opt_ctx_t *ctx,
> if (internal_help && *ctx->opt == 'h')
> goto show_usage;
>
> - /* fake a short option thing to hide the fact that we may have
> + /* move a short option thing to hide the fact that we may have
> * started to parse aggregated stuff
> - *
> - * This is leaky, too bad.
> */
> - ctx->argv[0] = xstrdup(ctx->opt - 1);
> + MOVE_ARRAY((char *)arg, ctx->opt - 1,
> + strlen(ctx->opt) + 2);
> + ctx->argv[0] = arg;
> *(char *)ctx->argv[0] = '-';
> goto unknown;
> case PARSE_OPT_NON_OPTION:
I’m not sure why git used to use `xtrdup` here instead of modifying `arg` directly.
If modifying `arg` is safe, then this is indeed a good solution. |
call parse_options twice and occasionally meet unknown option in the same place would cause memory leak, since the second `xstrdup` in `parse_options_step` would make the first `xstrdup` unreachable. One solution is allocate one more magic byte for the unknown option to indicate that argv[?] stores a heap allocated arg. Assume for all arg, `*((char *)arg - 1)` is valid, we could put a magic number before the unknown option. Next time calling "xstrdup" will check the magic number first, and frees the previous heap allocated memory. Signed-off-by: Lidong Yan <[email protected]>
c35722e
to
475f7b5
Compare
/submit |
Submitted as [email protected] To fetch this version into
To fetch this version to local tag
|
Pass a user defined strdup-like function in parse_opt_ctx to avoid memory leak.
cc: Patrick Steinhardt [email protected]
cc: Phillip Wood [email protected]