Skip to content

Fix buffer underflow in xdl_build_script #1976

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

mugitya03
Copy link

@mugitya03 mugitya03 commented May 23, 2025

cc: René Scharfe [email protected]
cc: Phillip Wood [email protected]

The loop in xdl_build_script used `i1 >= 0 || i2 >= 0`, causing
`i1` (or `i2`) to reach 0 and then access `rchg1[i1-1]` (or
`rchg2[i2-1]`), which underflows the buffer.
This commit adds explicit `i1 > 0` and `i2 > 0` checks around
those array accesses to prevent invalid negative indexing.

Signed-off-by: Alex Guo <[email protected]>
@mugitya03
Copy link
Author

/submit

Copy link

Submitted as [email protected]

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-git-1976/mugitya03/buf-1-v1

To fetch this version to local tag pr-git-1976/mugitya03/buf-1-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-git-1976/mugitya03/buf-1-v1

Copy link

On the Git mailing list, René Scharfe wrote (reply to this):

Am 23.05.25 um 22:51 schrieb Alex via GitGitGadget:
> From: jinyaoguo <[email protected]>
> 
> The loop in xdl_build_script used `i1 >= 0 || i2 >= 0`, causing
> `i1` (or `i2`) to reach 0 and then access `rchg1[i1-1]` (or
> `rchg2[i2-1]`), which underflows the buffer.
> This commit adds explicit `i1 > 0` and `i2 > 0` checks around
> those array accesses to prevent invalid negative indexing.

xdl_prepare_ctx() in xdiff/xprepare.c allocates an extra entry at both
ends for rchg arrays, so an index of -1 should be within the bounds.  

i1 and i2 are decreased in lockstep, though, so one of them can become
smaller than -1 if nrec is different between the files.  And that's how
this code run can indeed run off into the weeds.

Curiously, AddressSanitizer doesn't report anything, but if I add the
following line after the outer for, I can trigger it to report a
heap-buffer-overflow with e.g., git show 8613c2bb6c:

	if (i1 < 0 || i2 < 0) fprintf(stderr, "Oops: %ld %ld\n", i1, i2);

> 
> Signed-off-by: Alex Guo <[email protected]>
> ---
>     Fix buffer underflow in xdl_build_script
> 
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1976%2Fmugitya03%2Fbuf-1-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1976/mugitya03/buf-1-v1
> Pull-Request: https://github.com/git/git/pull/1976
> 
>  xdiff/xdiffi.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
> index 5a96e36dfbe..2e983965328 100644
> --- a/xdiff/xdiffi.c
> +++ b/xdiff/xdiffi.c
> @@ -951,9 +951,10 @@ int xdl_build_script(xdfenv_t *xe, xdchange_t **xscr) {
>  	 * Trivial. Collects "groups" of changes and creates an edit script.
>  	 */
>  	for (i1 = xe->xdf1.nrec, i2 = xe->xdf2.nrec; i1 >= 0 || i2 >= 0; i1--, i2--)

Should the || be a && instead?  From a birds-eye view I would assume we
can stop scanning for changes when we exhaust (reach the top) of either
side.  We just have to make sure everything from the other side is
accounted for in the last added change.

> -		if (rchg1[i1 - 1] || rchg2[i2 - 1]) {
> -			for (l1 = i1; rchg1[i1 - 1]; i1--);
> -			for (l2 = i2; rchg2[i2 - 1]; i2--);
> +		if ((i1 > 0 && rchg1[i1 - 1]) ||
> +			(i2 > 0 && rchg2[i2 - 1])) {
> +			for (l1 = i1; i1 > 0 && rchg1[i1 - 1]; i1--);
> +            for (l2 = i2; i2 > 0 && rchg2[i2 - 1]; i2--);

Nit: The indentation of that line is off.

>  
>  			if (!(xch = xdl_add_change(cscr, i1, i2, l1 - i1, l2 - i2))) {
>  				xdl_free_script(cscr);
> 
> base-commit: 8613c2bb6cd16ef530dc5dd74d3b818a1ccbf1c0

Copy link

User René Scharfe <[email protected]> has been added to the cc: list.

Copy link

On the Git mailing list, René Scharfe wrote (reply to this):

Am 24.05.25 um 07:57 schrieb René Scharfe:
> Am 23.05.25 um 22:51 schrieb Alex via GitGitGadget:
>> From: jinyaoguo <[email protected]>
>>
>> The loop in xdl_build_script used `i1 >= 0 || i2 >= 0`, causing
>> `i1` (or `i2`) to reach 0 and then access `rchg1[i1-1]` (or
>> `rchg2[i2-1]`), which underflows the buffer.
>> This commit adds explicit `i1 > 0` and `i2 > 0` checks around
>> those array accesses to prevent invalid negative indexing.
> 
> xdl_prepare_ctx() in xdiff/xprepare.c allocates an extra entry at both
> ends for rchg arrays, so an index of -1 should be within the bounds.  
> 
> i1 and i2 are decreased in lockstep, though, so one of them can become
> smaller than -1 if nrec is different between the files.  And that's how
> this code run can indeed run off into the weeds.

Actually no, i1 can't seem to reach 0 without i2 also being 0 and vice
versa.  Or can it?  It makes sense that we reach the start of both
buffers at the same time if we walk backwards from the end, don't
misstep and have consistent rchg array contents, but I'm not sure.

Are you able to demonstrate any out-of-bounds access with e.g.,
Valgrind, AddressSanitizer or an assertion?

> Curiously, AddressSanitizer doesn't report anything, but if I add the
> following line after the outer for, I can trigger it to report a
> heap-buffer-overflow with e.g., git show 8613c2bb6c:
> 
> 	if (i1 < 0 || i2 < 0) fprintf(stderr, "Oops: %ld %ld\n", i1, i2);

That's because I forgot to add braces.  D'oh!  I can't trigger any
out-of-bounds access or that Oops with them properly in place.  So I
let myself get fooled by a daring coding style. :-|

> 
>>
>> Signed-off-by: Alex Guo <[email protected]>
>> ---
>>     Fix buffer underflow in xdl_build_script
>>
>> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1976%2Fmugitya03%2Fbuf-1-v1
>> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1976/mugitya03/buf-1-v1
>> Pull-Request: https://github.com/git/git/pull/1976
>>
>>  xdiff/xdiffi.c | 7 ++++---
>>  1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
>> index 5a96e36dfbe..2e983965328 100644
>> --- a/xdiff/xdiffi.c
>> +++ b/xdiff/xdiffi.c
>> @@ -951,9 +951,10 @@ int xdl_build_script(xdfenv_t *xe, xdchange_t **xscr) {
>>  	 * Trivial. Collects "groups" of changes and creates an edit script.

Trivial for Davide perhaps (libxdiff author), but not my mushy brain..

>>  	 */
>>  	for (i1 = xe->xdf1.nrec, i2 = xe->xdf2.nrec; i1 >= 0 || i2 >= 0; i1--, i2--)
> 
> Should the || be a && instead?  From a birds-eye view I would assume we
> can stop scanning for changes when we exhaust (reach the top) of either
> side.  We just have to make sure everything from the other side is
> accounted for in the last added change.
> 
>> -		if (rchg1[i1 - 1] || rchg2[i2 - 1]) {
>> -			for (l1 = i1; rchg1[i1 - 1]; i1--);
>> -			for (l2 = i2; rchg2[i2 - 1]; i2--);
>> +		if ((i1 > 0 && rchg1[i1 - 1]) ||
>> +			(i2 > 0 && rchg2[i2 - 1])) {
>> +			for (l1 = i1; i1 > 0 && rchg1[i1 - 1]; i1--);
>> +            for (l2 = i2; i2 > 0 && rchg2[i2 - 1]; i2--);
> 
> Nit: The indentation of that line is off.
> 
>>  
>>  			if (!(xch = xdl_add_change(cscr, i1, i2, l1 - i1, l2 - i2))) {
>>  				xdl_free_script(cscr);
>>
>> base-commit: 8613c2bb6cd16ef530dc5dd74d3b818a1ccbf1c0
> 
> 

Copy link

On the Git mailing list, Phillip Wood wrote (reply to this):

On 24/05/2025 10:08, René Scharfe wrote:
> Am 24.05.25 um 07:57 schrieb René Scharfe:
>> Am 23.05.25 um 22:51 schrieb Alex via GitGitGadget:
>>> From: jinyaoguo <[email protected]>
>>>
>>> The loop in xdl_build_script used `i1 >= 0 || i2 >= 0`, causing
>>> `i1` (or `i2`) to reach 0 and then access `rchg1[i1-1]` (or
>>> `rchg2[i2-1]`), which underflows the buffer.
>>> This commit adds explicit `i1 > 0` and `i2 > 0` checks around
>>> those array accesses to prevent invalid negative indexing.
>>
>> xdl_prepare_ctx() in xdiff/xprepare.c allocates an extra entry at both
>> ends for rchg arrays, so an index of -1 should be within the bounds.

and rchg[-1] == 0 so i1 and i2 can never drop below -1

>> i1 and i2 are decreased in lockstep, though, so one of them can become
>> smaller than -1 if nrec is different between the files.  And that's how
>> this code run can indeed run off into the weeds.
> > Actually no, i1 can't seem to reach 0 without i2 also being 0 and vice
> versa.  Or can it?  It makes sense that we reach the start of both
> buffers at the same time if we walk backwards from the end, don't
> misstep and have consistent rchg array contents, but I'm not sure.
The code looks like

	for (i1 = xe->xdf1.nrec, i2 = xe->xdf2.nrec; i1 >= 0 || i2 >= 0; i1--, i2--)
		if (rchg1[i1 - 1] || rchg2[i2 - 1]) {
			for (l1 = i1; rchg1[i1 - 1]; i1--);
			for (l2 = i2; rchg2[i2 - 1]; i2--);

I think I've convinced myself that it is safe assuming there are an equal number of unchanged lines in rchg1 and rchg2 and rchg1[-1] == rchg2[-1] == 0. Each iteration consumes any changed lines in the preimage and the postimage plus a single context line from each (apart from the final iteration when the context lines may have been exhausted). At the start of the last iteration there are three possibilities

 - i1 == -1 && i2 >= 0 => there are insertions as the start of the file.
   As the context lines in the preimage have been exhausted all the
   remaining rchg2 elements represent added lines and are consumed by
   for (l2 = i2; rchg2[i2 - 1]; i2--) so at the end of the loop body
   i2 == 0 and the outer loop will exit.

 - i1 >= 0 && i2 == -1 => there are deletions at the start of the file.
   As the context lines in the postimage have been exhausted all the
   remaining rchg1 elements represent deleted lines and are consumed by
   for (l1 = i1; rchg1[i1 - 1]; i1--) so at the end of the loop body
   i1 == 0 and the outer loop will exit.

 - i1 >= 0 && i2 >= 0 => the first line is unchanged or there are
   insertions and deletions at the beginning of the file. At the end of
   the loop body i1 == 0 && i2 == 0 and the outer loop will exit.

We could add

    if (i1 < -1 || i2 < -1)
        BUG("mismatched context line count");

before "if (rchg1[i1 - 1] || rchg2[i2 - 1])" inside the loop if we're worried about bugs that break the assumption that there are equal numbers of context lines on each side. Any such bug would generate invalid diffs. I don't know how likely that is to happen in practice.

Best Wishes

Phillip

> > Are you able to demonstrate any out-of-bounds access with e.g.,
> Valgrind, AddressSanitizer or an assertion?
> >> Curiously, AddressSanitizer doesn't report anything, but if I add the
>> following line after the outer for, I can trigger it to report a
>> heap-buffer-overflow with e.g., git show 8613c2bb6c:
>>
>> 	if (i1 < 0 || i2 < 0) fprintf(stderr, "Oops: %ld %ld\n", i1, i2);
> > That's because I forgot to add braces.  D'oh!  I can't trigger any
> out-of-bounds access or that Oops with them properly in place.  So I
> let myself get fooled by a daring coding style. :-|
> >>
>>>
>>> Signed-off-by: Alex Guo <[email protected]>
>>> ---
>>>      Fix buffer underflow in xdl_build_script
>>>
>>> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1976%2Fmugitya03%2Fbuf-1-v1
>>> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1976/mugitya03/buf-1-v1
>>> Pull-Request: https://github.com/git/git/pull/1976
>>>
>>>   xdiff/xdiffi.c | 7 ++++---
>>>   1 file changed, 4 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
>>> index 5a96e36dfbe..2e983965328 100644
>>> --- a/xdiff/xdiffi.c
>>> +++ b/xdiff/xdiffi.c
>>> @@ -951,9 +951,10 @@ int xdl_build_script(xdfenv_t *xe, xdchange_t **xscr) {
>>>   	 * Trivial. Collects "groups" of changes and creates an edit script.
> > Trivial for Davide perhaps (libxdiff author), but not my mushy brain..
> >>>   	 */
>>>   	for (i1 = xe->xdf1.nrec, i2 = xe->xdf2.nrec; i1 >= 0 || i2 >= 0; i1--, i2--)
>>
>> Should the || be a && instead?  From a birds-eye view I would assume we
>> can stop scanning for changes when we exhaust (reach the top) of either
>> side.  We just have to make sure everything from the other side is
>> accounted for in the last added change.
>>
>>> -		if (rchg1[i1 - 1] || rchg2[i2 - 1]) {
>>> -			for (l1 = i1; rchg1[i1 - 1]; i1--);
>>> -			for (l2 = i2; rchg2[i2 - 1]; i2--);
>>> +		if ((i1 > 0 && rchg1[i1 - 1]) ||
>>> +			(i2 > 0 && rchg2[i2 - 1])) {
>>> +			for (l1 = i1; i1 > 0 && rchg1[i1 - 1]; i1--);
>>> +            for (l2 = i2; i2 > 0 && rchg2[i2 - 1]; i2--);
>>
>> Nit: The indentation of that line is off.
>>
>>>   >>>   			if (!(xch = xdl_add_change(cscr, i1, i2, l1 - i1, l2 - i2))) {
>>>   				xdl_free_script(cscr);
>>>
>>> base-commit: 8613c2bb6cd16ef530dc5dd74d3b818a1ccbf1c0

Copy link

User Phillip Wood <[email protected]> has been added to the cc: list.

Copy link

On the Git mailing list, Phillip Wood wrote (reply to this):

On 24/05/2025 14:38, Phillip Wood wrote:
> On 24/05/2025 10:08, René Scharfe wrote:
>> Am 24.05.25 um 07:57 schrieb René Scharfe:
>>> Am 23.05.25 um 22:51 schrieb Alex via GitGitGadget:
>>>> From: jinyaoguo <[email protected]>
>>>>
>>>> The loop in xdl_build_script used `i1 >= 0 || i2 >= 0`, causing
>>>> `i1` (or `i2`) to reach 0 and then access `rchg1[i1-1]` (or
>>>> `rchg2[i2-1]`), which underflows the buffer.
>>>> This commit adds explicit `i1 > 0` and `i2 > 0` checks around
>>>> those array accesses to prevent invalid negative indexing.
>>>
>>> xdl_prepare_ctx() in xdiff/xprepare.c allocates an extra entry at both
>>> ends for rchg arrays, so an index of -1 should be within the bounds.
> > and rchg[-1] == 0 so i1 and i2 can never drop below -1
> >>> i1 and i2 are decreased in lockstep, though, so one of them can become
>>> smaller than -1 if nrec is different between the files.  And that's how
>>> this code run can indeed run off into the weeds.
>>
>> Actually no, i1 can't seem to reach 0 without i2 also being 0 and vice
>> versa.  Or can it?  It makes sense that we reach the start of both
>> buffers at the same time if we walk backwards from the end, don't
>> misstep and have consistent rchg array contents, but I'm not sure.
> The code looks like
> >      for (i1 = xe->xdf1.nrec, i2 = xe->xdf2.nrec; i1 >= 0 || i2 >= 0; > i1--, i2--)
>          if (rchg1[i1 - 1] || rchg2[i2 - 1]) {
>              for (l1 = i1; rchg1[i1 - 1]; i1--);
>              for (l2 = i2; rchg2[i2 - 1]; i2--);
> > I think I've convinced myself that it is safe assuming there are an > equal number of unchanged lines in rchg1 and rchg2 and rchg1[-1] == > rchg2[-1] == 0. Each iteration consumes any changed lines in the > preimage and the postimage plus a single context line from each (apart > from the final iteration when the context lines may have been > exhausted). At the start of the last iteration there are three > possibilities
> >   - i1 == -1 && i2 >= 0 => there are insertions as the start of the file.

Sorry that should be i1 == 0 && i2 >= 0

>     As the context lines in the preimage have been exhausted all the
>     remaining rchg2 elements represent added lines and are consumed by
>     for (l2 = i2; rchg2[i2 - 1]; i2--) so at the end of the loop body
>     i2 == 0 and the outer loop will exit.
> >   - i1 >= 0 && i2 == -1 => there are deletions at the start of the file.

This one should be i1 >= 0 && i2 == 0

>     As the context lines in the postimage have been exhausted all the
>     remaining rchg1 elements represent deleted lines and are consumed by
>     for (l1 = i1; rchg1[i1 - 1]; i1--) so at the end of the loop body
>     i1 == 0 and the outer loop will exit.
> >   - i1 >= 0 && i2 >= 0 => the first line is unchanged or there are
>     insertions and deletions at the beginning of the file. At the end of
>     the loop body i1 == 0 && i2 == 0 and the outer loop will exit.
> > We could add
> >      if (i1 < -1 || i2 < -1)
>          BUG("mismatched context line count");
> > before "if (rchg1[i1 - 1] || rchg2[i2 - 1])" inside the loop if we're > worried about bugs that break the assumption that there are equal > numbers of context lines on each side. Any such bug would generate > invalid diffs. I don't know how likely that is to happen in practice.
> > Best Wishes
> > Phillip
> >>
>> Are you able to demonstrate any out-of-bounds access with e.g.,
>> Valgrind, AddressSanitizer or an assertion?
>>
>>> Curiously, AddressSanitizer doesn't report anything, but if I add the
>>> following line after the outer for, I can trigger it to report a
>>> heap-buffer-overflow with e.g., git show 8613c2bb6c:
>>>
>>>     if (i1 < 0 || i2 < 0) fprintf(stderr, "Oops: %ld %ld\n", i1, i2);
>>
>> That's because I forgot to add braces.  D'oh!  I can't trigger any
>> out-of-bounds access or that Oops with them properly in place.  So I
>> let myself get fooled by a daring coding style. :-|
>>
>>>
>>>>
>>>> Signed-off-by: Alex Guo <[email protected]>
>>>> ---
>>>>      Fix buffer underflow in xdl_build_script
>>>>
>>>> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr- >>>> git-1976%2Fmugitya03%2Fbuf-1-v1
>>>> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr- >>>> git-1976/mugitya03/buf-1-v1
>>>> Pull-Request: https://github.com/git/git/pull/1976
>>>>
>>>>   xdiff/xdiffi.c | 7 ++++---
>>>>   1 file changed, 4 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
>>>> index 5a96e36dfbe..2e983965328 100644
>>>> --- a/xdiff/xdiffi.c
>>>> +++ b/xdiff/xdiffi.c
>>>> @@ -951,9 +951,10 @@ int xdl_build_script(xdfenv_t *xe, xdchange_t >>>> **xscr) {
>>>>        * Trivial. Collects "groups" of changes and creates an edit >>>> script.
>>
>> Trivial for Davide perhaps (libxdiff author), but not my mushy brain..
>>
>>>>        */
>>>>       for (i1 = xe->xdf1.nrec, i2 = xe->xdf2.nrec; i1 >= 0 || i2 >= >>>> 0; i1--, i2--)
>>>
>>> Should the || be a && instead?  From a birds-eye view I would assume we
>>> can stop scanning for changes when we exhaust (reach the top) of either
>>> side.  We just have to make sure everything from the other side is
>>> accounted for in the last added change.
>>>
>>>> -        if (rchg1[i1 - 1] || rchg2[i2 - 1]) {
>>>> -            for (l1 = i1; rchg1[i1 - 1]; i1--);
>>>> -            for (l2 = i2; rchg2[i2 - 1]; i2--);
>>>> +        if ((i1 > 0 && rchg1[i1 - 1]) ||
>>>> +            (i2 > 0 && rchg2[i2 - 1])) {
>>>> +            for (l1 = i1; i1 > 0 && rchg1[i1 - 1]; i1--);
>>>> +            for (l2 = i2; i2 > 0 && rchg2[i2 - 1]; i2--);
>>>
>>> Nit: The indentation of that line is off.
>>>
>>>>               if (!(xch = xdl_add_change(cscr, i1, i2, l1 - i1, l2 - >>>> i2))) {
>>>>                   xdl_free_script(cscr);
>>>>
>>>> base-commit: 8613c2bb6cd16ef530dc5dd74d3b818a1ccbf1c0
> 

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants