-
Notifications
You must be signed in to change notification settings - Fork 26.4k
Fix buffer underflow in xdl_build_script #1976
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
The loop in xdl_build_script used `i1 >= 0 || i2 >= 0`, causing `i1` (or `i2`) to reach 0 and then access `rchg1[i1-1]` (or `rchg2[i2-1]`), which underflows the buffer. This commit adds explicit `i1 > 0` and `i2 > 0` checks around those array accesses to prevent invalid negative indexing. Signed-off-by: Alex Guo <[email protected]>
/submit |
Submitted as [email protected] To fetch this version into
To fetch this version to local tag
|
On the Git mailing list, René Scharfe wrote (reply to this): Am 23.05.25 um 22:51 schrieb Alex via GitGitGadget:
> From: jinyaoguo <[email protected]>
>
> The loop in xdl_build_script used `i1 >= 0 || i2 >= 0`, causing
> `i1` (or `i2`) to reach 0 and then access `rchg1[i1-1]` (or
> `rchg2[i2-1]`), which underflows the buffer.
> This commit adds explicit `i1 > 0` and `i2 > 0` checks around
> those array accesses to prevent invalid negative indexing.
xdl_prepare_ctx() in xdiff/xprepare.c allocates an extra entry at both
ends for rchg arrays, so an index of -1 should be within the bounds.
i1 and i2 are decreased in lockstep, though, so one of them can become
smaller than -1 if nrec is different between the files. And that's how
this code run can indeed run off into the weeds.
Curiously, AddressSanitizer doesn't report anything, but if I add the
following line after the outer for, I can trigger it to report a
heap-buffer-overflow with e.g., git show 8613c2bb6c:
if (i1 < 0 || i2 < 0) fprintf(stderr, "Oops: %ld %ld\n", i1, i2);
>
> Signed-off-by: Alex Guo <[email protected]>
> ---
> Fix buffer underflow in xdl_build_script
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1976%2Fmugitya03%2Fbuf-1-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1976/mugitya03/buf-1-v1
> Pull-Request: https://github.com/git/git/pull/1976
>
> xdiff/xdiffi.c | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
> index 5a96e36dfbe..2e983965328 100644
> --- a/xdiff/xdiffi.c
> +++ b/xdiff/xdiffi.c
> @@ -951,9 +951,10 @@ int xdl_build_script(xdfenv_t *xe, xdchange_t **xscr) {
> * Trivial. Collects "groups" of changes and creates an edit script.
> */
> for (i1 = xe->xdf1.nrec, i2 = xe->xdf2.nrec; i1 >= 0 || i2 >= 0; i1--, i2--)
Should the || be a && instead? From a birds-eye view I would assume we
can stop scanning for changes when we exhaust (reach the top) of either
side. We just have to make sure everything from the other side is
accounted for in the last added change.
> - if (rchg1[i1 - 1] || rchg2[i2 - 1]) {
> - for (l1 = i1; rchg1[i1 - 1]; i1--);
> - for (l2 = i2; rchg2[i2 - 1]; i2--);
> + if ((i1 > 0 && rchg1[i1 - 1]) ||
> + (i2 > 0 && rchg2[i2 - 1])) {
> + for (l1 = i1; i1 > 0 && rchg1[i1 - 1]; i1--);
> + for (l2 = i2; i2 > 0 && rchg2[i2 - 1]; i2--);
Nit: The indentation of that line is off.
>
> if (!(xch = xdl_add_change(cscr, i1, i2, l1 - i1, l2 - i2))) {
> xdl_free_script(cscr);
>
> base-commit: 8613c2bb6cd16ef530dc5dd74d3b818a1ccbf1c0
|
User |
On the Git mailing list, René Scharfe wrote (reply to this): Am 24.05.25 um 07:57 schrieb René Scharfe:
> Am 23.05.25 um 22:51 schrieb Alex via GitGitGadget:
>> From: jinyaoguo <[email protected]>
>>
>> The loop in xdl_build_script used `i1 >= 0 || i2 >= 0`, causing
>> `i1` (or `i2`) to reach 0 and then access `rchg1[i1-1]` (or
>> `rchg2[i2-1]`), which underflows the buffer.
>> This commit adds explicit `i1 > 0` and `i2 > 0` checks around
>> those array accesses to prevent invalid negative indexing.
>
> xdl_prepare_ctx() in xdiff/xprepare.c allocates an extra entry at both
> ends for rchg arrays, so an index of -1 should be within the bounds.
>
> i1 and i2 are decreased in lockstep, though, so one of them can become
> smaller than -1 if nrec is different between the files. And that's how
> this code run can indeed run off into the weeds.
Actually no, i1 can't seem to reach 0 without i2 also being 0 and vice
versa. Or can it? It makes sense that we reach the start of both
buffers at the same time if we walk backwards from the end, don't
misstep and have consistent rchg array contents, but I'm not sure.
Are you able to demonstrate any out-of-bounds access with e.g.,
Valgrind, AddressSanitizer or an assertion?
> Curiously, AddressSanitizer doesn't report anything, but if I add the
> following line after the outer for, I can trigger it to report a
> heap-buffer-overflow with e.g., git show 8613c2bb6c:
>
> if (i1 < 0 || i2 < 0) fprintf(stderr, "Oops: %ld %ld\n", i1, i2);
That's because I forgot to add braces. D'oh! I can't trigger any
out-of-bounds access or that Oops with them properly in place. So I
let myself get fooled by a daring coding style. :-|
>
>>
>> Signed-off-by: Alex Guo <[email protected]>
>> ---
>> Fix buffer underflow in xdl_build_script
>>
>> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1976%2Fmugitya03%2Fbuf-1-v1
>> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1976/mugitya03/buf-1-v1
>> Pull-Request: https://github.com/git/git/pull/1976
>>
>> xdiff/xdiffi.c | 7 ++++---
>> 1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
>> index 5a96e36dfbe..2e983965328 100644
>> --- a/xdiff/xdiffi.c
>> +++ b/xdiff/xdiffi.c
>> @@ -951,9 +951,10 @@ int xdl_build_script(xdfenv_t *xe, xdchange_t **xscr) {
>> * Trivial. Collects "groups" of changes and creates an edit script.
Trivial for Davide perhaps (libxdiff author), but not my mushy brain..
>> */
>> for (i1 = xe->xdf1.nrec, i2 = xe->xdf2.nrec; i1 >= 0 || i2 >= 0; i1--, i2--)
>
> Should the || be a && instead? From a birds-eye view I would assume we
> can stop scanning for changes when we exhaust (reach the top) of either
> side. We just have to make sure everything from the other side is
> accounted for in the last added change.
>
>> - if (rchg1[i1 - 1] || rchg2[i2 - 1]) {
>> - for (l1 = i1; rchg1[i1 - 1]; i1--);
>> - for (l2 = i2; rchg2[i2 - 1]; i2--);
>> + if ((i1 > 0 && rchg1[i1 - 1]) ||
>> + (i2 > 0 && rchg2[i2 - 1])) {
>> + for (l1 = i1; i1 > 0 && rchg1[i1 - 1]; i1--);
>> + for (l2 = i2; i2 > 0 && rchg2[i2 - 1]; i2--);
>
> Nit: The indentation of that line is off.
>
>>
>> if (!(xch = xdl_add_change(cscr, i1, i2, l1 - i1, l2 - i2))) {
>> xdl_free_script(cscr);
>>
>> base-commit: 8613c2bb6cd16ef530dc5dd74d3b818a1ccbf1c0
>
>
|
On the Git mailing list, Phillip Wood wrote (reply to this): On 24/05/2025 10:08, René Scharfe wrote:
> Am 24.05.25 um 07:57 schrieb René Scharfe:
>> Am 23.05.25 um 22:51 schrieb Alex via GitGitGadget:
>>> From: jinyaoguo <[email protected]>
>>>
>>> The loop in xdl_build_script used `i1 >= 0 || i2 >= 0`, causing
>>> `i1` (or `i2`) to reach 0 and then access `rchg1[i1-1]` (or
>>> `rchg2[i2-1]`), which underflows the buffer.
>>> This commit adds explicit `i1 > 0` and `i2 > 0` checks around
>>> those array accesses to prevent invalid negative indexing.
>>
>> xdl_prepare_ctx() in xdiff/xprepare.c allocates an extra entry at both
>> ends for rchg arrays, so an index of -1 should be within the bounds.
and rchg[-1] == 0 so i1 and i2 can never drop below -1
>> i1 and i2 are decreased in lockstep, though, so one of them can become
>> smaller than -1 if nrec is different between the files. And that's how
>> this code run can indeed run off into the weeds.
> > Actually no, i1 can't seem to reach 0 without i2 also being 0 and vice
> versa. Or can it? It makes sense that we reach the start of both
> buffers at the same time if we walk backwards from the end, don't
> misstep and have consistent rchg array contents, but I'm not sure.
The code looks like
for (i1 = xe->xdf1.nrec, i2 = xe->xdf2.nrec; i1 >= 0 || i2 >= 0; i1--, i2--)
if (rchg1[i1 - 1] || rchg2[i2 - 1]) {
for (l1 = i1; rchg1[i1 - 1]; i1--);
for (l2 = i2; rchg2[i2 - 1]; i2--);
I think I've convinced myself that it is safe assuming there are an equal number of unchanged lines in rchg1 and rchg2 and rchg1[-1] == rchg2[-1] == 0. Each iteration consumes any changed lines in the preimage and the postimage plus a single context line from each (apart from the final iteration when the context lines may have been exhausted). At the start of the last iteration there are three possibilities
- i1 == -1 && i2 >= 0 => there are insertions as the start of the file.
As the context lines in the preimage have been exhausted all the
remaining rchg2 elements represent added lines and are consumed by
for (l2 = i2; rchg2[i2 - 1]; i2--) so at the end of the loop body
i2 == 0 and the outer loop will exit.
- i1 >= 0 && i2 == -1 => there are deletions at the start of the file.
As the context lines in the postimage have been exhausted all the
remaining rchg1 elements represent deleted lines and are consumed by
for (l1 = i1; rchg1[i1 - 1]; i1--) so at the end of the loop body
i1 == 0 and the outer loop will exit.
- i1 >= 0 && i2 >= 0 => the first line is unchanged or there are
insertions and deletions at the beginning of the file. At the end of
the loop body i1 == 0 && i2 == 0 and the outer loop will exit.
We could add
if (i1 < -1 || i2 < -1)
BUG("mismatched context line count");
before "if (rchg1[i1 - 1] || rchg2[i2 - 1])" inside the loop if we're worried about bugs that break the assumption that there are equal numbers of context lines on each side. Any such bug would generate invalid diffs. I don't know how likely that is to happen in practice.
Best Wishes
Phillip
> > Are you able to demonstrate any out-of-bounds access with e.g.,
> Valgrind, AddressSanitizer or an assertion?
> >> Curiously, AddressSanitizer doesn't report anything, but if I add the
>> following line after the outer for, I can trigger it to report a
>> heap-buffer-overflow with e.g., git show 8613c2bb6c:
>>
>> if (i1 < 0 || i2 < 0) fprintf(stderr, "Oops: %ld %ld\n", i1, i2);
> > That's because I forgot to add braces. D'oh! I can't trigger any
> out-of-bounds access or that Oops with them properly in place. So I
> let myself get fooled by a daring coding style. :-|
> >>
>>>
>>> Signed-off-by: Alex Guo <[email protected]>
>>> ---
>>> Fix buffer underflow in xdl_build_script
>>>
>>> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1976%2Fmugitya03%2Fbuf-1-v1
>>> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1976/mugitya03/buf-1-v1
>>> Pull-Request: https://github.com/git/git/pull/1976
>>>
>>> xdiff/xdiffi.c | 7 ++++---
>>> 1 file changed, 4 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
>>> index 5a96e36dfbe..2e983965328 100644
>>> --- a/xdiff/xdiffi.c
>>> +++ b/xdiff/xdiffi.c
>>> @@ -951,9 +951,10 @@ int xdl_build_script(xdfenv_t *xe, xdchange_t **xscr) {
>>> * Trivial. Collects "groups" of changes and creates an edit script.
> > Trivial for Davide perhaps (libxdiff author), but not my mushy brain..
> >>> */
>>> for (i1 = xe->xdf1.nrec, i2 = xe->xdf2.nrec; i1 >= 0 || i2 >= 0; i1--, i2--)
>>
>> Should the || be a && instead? From a birds-eye view I would assume we
>> can stop scanning for changes when we exhaust (reach the top) of either
>> side. We just have to make sure everything from the other side is
>> accounted for in the last added change.
>>
>>> - if (rchg1[i1 - 1] || rchg2[i2 - 1]) {
>>> - for (l1 = i1; rchg1[i1 - 1]; i1--);
>>> - for (l2 = i2; rchg2[i2 - 1]; i2--);
>>> + if ((i1 > 0 && rchg1[i1 - 1]) ||
>>> + (i2 > 0 && rchg2[i2 - 1])) {
>>> + for (l1 = i1; i1 > 0 && rchg1[i1 - 1]; i1--);
>>> + for (l2 = i2; i2 > 0 && rchg2[i2 - 1]; i2--);
>>
>> Nit: The indentation of that line is off.
>>
>>> >>> if (!(xch = xdl_add_change(cscr, i1, i2, l1 - i1, l2 - i2))) {
>>> xdl_free_script(cscr);
>>>
>>> base-commit: 8613c2bb6cd16ef530dc5dd74d3b818a1ccbf1c0 |
User |
On the Git mailing list, Phillip Wood wrote (reply to this): On 24/05/2025 14:38, Phillip Wood wrote:
> On 24/05/2025 10:08, René Scharfe wrote:
>> Am 24.05.25 um 07:57 schrieb René Scharfe:
>>> Am 23.05.25 um 22:51 schrieb Alex via GitGitGadget:
>>>> From: jinyaoguo <[email protected]>
>>>>
>>>> The loop in xdl_build_script used `i1 >= 0 || i2 >= 0`, causing
>>>> `i1` (or `i2`) to reach 0 and then access `rchg1[i1-1]` (or
>>>> `rchg2[i2-1]`), which underflows the buffer.
>>>> This commit adds explicit `i1 > 0` and `i2 > 0` checks around
>>>> those array accesses to prevent invalid negative indexing.
>>>
>>> xdl_prepare_ctx() in xdiff/xprepare.c allocates an extra entry at both
>>> ends for rchg arrays, so an index of -1 should be within the bounds.
> > and rchg[-1] == 0 so i1 and i2 can never drop below -1
> >>> i1 and i2 are decreased in lockstep, though, so one of them can become
>>> smaller than -1 if nrec is different between the files. And that's how
>>> this code run can indeed run off into the weeds.
>>
>> Actually no, i1 can't seem to reach 0 without i2 also being 0 and vice
>> versa. Or can it? It makes sense that we reach the start of both
>> buffers at the same time if we walk backwards from the end, don't
>> misstep and have consistent rchg array contents, but I'm not sure.
> The code looks like
> > for (i1 = xe->xdf1.nrec, i2 = xe->xdf2.nrec; i1 >= 0 || i2 >= 0; > i1--, i2--)
> if (rchg1[i1 - 1] || rchg2[i2 - 1]) {
> for (l1 = i1; rchg1[i1 - 1]; i1--);
> for (l2 = i2; rchg2[i2 - 1]; i2--);
> > I think I've convinced myself that it is safe assuming there are an > equal number of unchanged lines in rchg1 and rchg2 and rchg1[-1] == > rchg2[-1] == 0. Each iteration consumes any changed lines in the > preimage and the postimage plus a single context line from each (apart > from the final iteration when the context lines may have been > exhausted). At the start of the last iteration there are three > possibilities
> > - i1 == -1 && i2 >= 0 => there are insertions as the start of the file.
Sorry that should be i1 == 0 && i2 >= 0
> As the context lines in the preimage have been exhausted all the
> remaining rchg2 elements represent added lines and are consumed by
> for (l2 = i2; rchg2[i2 - 1]; i2--) so at the end of the loop body
> i2 == 0 and the outer loop will exit.
> > - i1 >= 0 && i2 == -1 => there are deletions at the start of the file.
This one should be i1 >= 0 && i2 == 0
> As the context lines in the postimage have been exhausted all the
> remaining rchg1 elements represent deleted lines and are consumed by
> for (l1 = i1; rchg1[i1 - 1]; i1--) so at the end of the loop body
> i1 == 0 and the outer loop will exit.
> > - i1 >= 0 && i2 >= 0 => the first line is unchanged or there are
> insertions and deletions at the beginning of the file. At the end of
> the loop body i1 == 0 && i2 == 0 and the outer loop will exit.
> > We could add
> > if (i1 < -1 || i2 < -1)
> BUG("mismatched context line count");
> > before "if (rchg1[i1 - 1] || rchg2[i2 - 1])" inside the loop if we're > worried about bugs that break the assumption that there are equal > numbers of context lines on each side. Any such bug would generate > invalid diffs. I don't know how likely that is to happen in practice.
> > Best Wishes
> > Phillip
> >>
>> Are you able to demonstrate any out-of-bounds access with e.g.,
>> Valgrind, AddressSanitizer or an assertion?
>>
>>> Curiously, AddressSanitizer doesn't report anything, but if I add the
>>> following line after the outer for, I can trigger it to report a
>>> heap-buffer-overflow with e.g., git show 8613c2bb6c:
>>>
>>> if (i1 < 0 || i2 < 0) fprintf(stderr, "Oops: %ld %ld\n", i1, i2);
>>
>> That's because I forgot to add braces. D'oh! I can't trigger any
>> out-of-bounds access or that Oops with them properly in place. So I
>> let myself get fooled by a daring coding style. :-|
>>
>>>
>>>>
>>>> Signed-off-by: Alex Guo <[email protected]>
>>>> ---
>>>> Fix buffer underflow in xdl_build_script
>>>>
>>>> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr- >>>> git-1976%2Fmugitya03%2Fbuf-1-v1
>>>> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr- >>>> git-1976/mugitya03/buf-1-v1
>>>> Pull-Request: https://github.com/git/git/pull/1976
>>>>
>>>> xdiff/xdiffi.c | 7 ++++---
>>>> 1 file changed, 4 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
>>>> index 5a96e36dfbe..2e983965328 100644
>>>> --- a/xdiff/xdiffi.c
>>>> +++ b/xdiff/xdiffi.c
>>>> @@ -951,9 +951,10 @@ int xdl_build_script(xdfenv_t *xe, xdchange_t >>>> **xscr) {
>>>> * Trivial. Collects "groups" of changes and creates an edit >>>> script.
>>
>> Trivial for Davide perhaps (libxdiff author), but not my mushy brain..
>>
>>>> */
>>>> for (i1 = xe->xdf1.nrec, i2 = xe->xdf2.nrec; i1 >= 0 || i2 >= >>>> 0; i1--, i2--)
>>>
>>> Should the || be a && instead? From a birds-eye view I would assume we
>>> can stop scanning for changes when we exhaust (reach the top) of either
>>> side. We just have to make sure everything from the other side is
>>> accounted for in the last added change.
>>>
>>>> - if (rchg1[i1 - 1] || rchg2[i2 - 1]) {
>>>> - for (l1 = i1; rchg1[i1 - 1]; i1--);
>>>> - for (l2 = i2; rchg2[i2 - 1]; i2--);
>>>> + if ((i1 > 0 && rchg1[i1 - 1]) ||
>>>> + (i2 > 0 && rchg2[i2 - 1])) {
>>>> + for (l1 = i1; i1 > 0 && rchg1[i1 - 1]; i1--);
>>>> + for (l2 = i2; i2 > 0 && rchg2[i2 - 1]; i2--);
>>>
>>> Nit: The indentation of that line is off.
>>>
>>>> if (!(xch = xdl_add_change(cscr, i1, i2, l1 - i1, l2 - >>>> i2))) {
>>>> xdl_free_script(cscr);
>>>>
>>>> base-commit: 8613c2bb6cd16ef530dc5dd74d3b818a1ccbf1c0
> |
cc: René Scharfe [email protected]
cc: Phillip Wood [email protected]