Skip to content

RFC: Accelerate xdiff and begin its rustification #1980

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

ezekielnewren
Copy link

This series accelerates xdiff by 5-19%.

It also introduces Rust as a hard dependency.

This is just the beginning of many patches that I have to convert portions of, maybe eventually all of, xdiff to Rust. While working on that conversion, I found several ways to clarify the code, along with some optimizations.

So...

This obviously raises the question of whether we are ready to accept a hard dependency on Rust. Previous discussions on the mailing list and at Git Merge 2024 have not answered that question. If not now, will we be willing to accept such a hard dependency later? And what route do we want to take to get there?

About the optimizations in this series:

xdiff currently uses DJB2a for hashing (even though it is not explicitly named as such). This is an older hashing algorithm, and modern alternatives are superior. I chose xxhash because it’s faster, more collision resistant, and designed to be a standard. Other hash algorithms like aHash, MurMurHash, SipHash, and Fnv1a were considered, but my local testing made me feel like xxhash was the best choice for usage in xdiff.

In support of switching to xxhash, parsing and hashing were split into separate steps. And it turns out that memchr() is faster for parsing than character-by-character iteration.

My brother (Elijah, cc’ed) has been guiding and reviewing the development of converting xdiff to Rust.

cc: Elijah Newren [email protected]

Copy link

Welcome to GitGitGadget

Hi @ezekielnewren, and welcome to GitGitGadget, the GitHub App to send patch series to the Git mailing list from GitHub Pull Requests.

Please make sure that either:

  • Your Pull Request has a good description, if it consists of multiple commits, as it will be used as cover letter.
  • Your Pull Request description is empty, if it consists of a single commit, as the commit message should be descriptive enough by itself.

You can CC potential reviewers by adding a footer to the PR description with the following syntax:

CC: Revi Ewer <[email protected]>, Ill Takalook <[email protected]>

NOTE: DO NOT copy/paste your CC list from a previous GGG PR's description,
because it will result in a malformed CC list on the mailing list. See
example.

Also, it is a good idea to review the commit messages one last time, as the Git project expects them in a quite specific form:

  • the lines should not exceed 76 columns,
  • the first line should be like a header and typically start with a prefix like "tests:" or "revisions:" to state which subsystem the change is about, and
  • the commit messages' body should be describing the "why?" of the change.
  • Finally, the commit messages should end in a Signed-off-by: line matching the commits' author.

It is in general a good idea to await the automated test ("Checks") in this Pull Request before contributing the patches, e.g. to avoid trivial issues such as unportable code.

Contributing the patches

Before you can contribute the patches, your GitHub username needs to be added to the list of permitted users. Any already-permitted user can do that, by adding a comment to your PR of the form /allow. A good way to find other contributors is to locate recent pull requests where someone has been /allowed:

Both the person who commented /allow and the PR author are able to /allow you.

An alternative is the channel #git-devel on the Libera Chat IRC network:

<newcontributor> I've just created my first PR, could someone please /allow me? https://github.com/gitgitgadget/git/pull/12345
<veteran> newcontributor: it is done
<newcontributor> thanks!

Once on the list of permitted usernames, you can contribute the patches to the Git mailing list by adding a PR comment /submit.

If you want to see what email(s) would be sent for a /submit request, add a PR comment /preview to have the email(s) sent to you. You must have a public GitHub email address for this. Note that any reviewers CC'd via the list in the PR description will not actually be sent emails.

After you submit, GitGitGadget will respond with another comment that contains the link to the cover letter mail in the Git mailing list archive. Please make sure to monitor the discussion in that thread and to address comments and suggestions (while the comments and suggestions will be mirrored into the PR by GitGitGadget, you will still want to reply via mail).

If you do not want to subscribe to the Git mailing list just to be able to respond to a mail, you can download the mbox from the Git mailing list archive (click the (raw) link), then import it into your mail program. If you use GMail, you can do this via:

curl -g --user "<EMailAddress>:<Password>" \
    --url "imaps://imap.gmail.com/INBOX" -T /path/to/raw.txt

To iterate on your change, i.e. send a revised patch or patch series, you will first want to (force-)push to the same branch. You probably also want to modify your Pull Request description (or title). It is a good idea to summarize the revision by adding something like this to the cover letter (read: by editing the first comment on the PR, i.e. the PR description):

Changes since v1:
- Fixed a typo in the commit message (found by ...)
- Added a code comment to ... as suggested by ...
...

To send a new iteration, just add another PR comment with the contents: /submit.

Need help?

New contributors who want advice are encouraged to join [email protected], where volunteers who regularly contribute to Git are willing to answer newbie questions, give advice, or otherwise provide mentoring to interested contributors. You must join in order to post or view messages, but anyone can join.

You may also be able to find help in real time in the developer IRC channel, #git-devel on Libera Chat. Remember that IRC does not support offline messaging, so if you send someone a private message and log out, they cannot respond to you. The scrollback of #git-devel is archived, though.

Copy link

There are issues in commit 2178ac1:
xdiff: conditionally use Rust's implementation of xxhash
Lines in the body of the commit messages should be wrapped between 60 and 76 characters.
Indented lines, and lines without whitespace, are exempt

1 similar comment
Copy link

There are issues in commit 2178ac1:
xdiff: conditionally use Rust's implementation of xxhash
Lines in the body of the commit messages should be wrapped between 60 and 76 characters.
Indented lines, and lines without whitespace, are exempt

Copy link

There are issues in commit 67f6c92:
Random stab in the dark with Rust building in ci
Commit checks stopped - the message is too short
Commit not signed off

@newren newren force-pushed the xdiff_rust_speedup branch 2 times, most recently from 8b3cc99 to 39211e9 Compare May 30, 2025 21:12
Copy link

There are issues in commit d647348:
xdiff: conditionally use Rust's implementation of xxhash
Lines in the body of the commit messages should be wrapped between 60 and 76 characters.
Indented lines, and lines without whitespace, are exempt

@newren newren force-pushed the xdiff_rust_speedup branch from 39211e9 to 94ccfab Compare May 30, 2025 22:07
Copy link

There are issues in commit d647348:
xdiff: conditionally use Rust's implementation of xxhash
Lines in the body of the commit messages should be wrapped between 60 and 76 characters.
Indented lines, and lines without whitespace, are exempt

@newren newren force-pushed the xdiff_rust_speedup branch from 94ccfab to d1acb81 Compare May 30, 2025 23:39
Copy link

There are issues in commit d647348:
xdiff: conditionally use Rust's implementation of xxhash
Lines in the body of the commit messages should be wrapped between 60 and 76 characters.
Indented lines, and lines without whitespace, are exempt

Upcoming patches will accelerate and simplify xdiff, while also
porting parts of it to Rust. In preparation, add some stubs and setup
the Rust build. For now, it is easier to let cargo build rust and
have make or meson merely link against the static library that cargo
builds. In line with ongoing libification efforts, use multiple
crates to allow more modularity on the Rust side. xdiff is the crate
that this series will focus on, but we also introduce the interop
crate for future patch series.

In order to facilitate interoperability between C and Rust, introduce
C definitions for Rust primitive types in git-compat-util.h.

Signed-off-by: Ezekiel Newren <[email protected]>
Move xdl_prepare_env() later in the file to avoid the need
for forward declarations.

Signed-off-by: Ezekiel Newren <[email protected]>
xrecord_t.next, xdfile_t.hbits, xdfile_t.rhash are initialized,
but never used for anything by the code. Remove them.

Signed-off-by: Ezekiel Newren <[email protected]>
A few commits ago, we added definitions for Rust primitive types,
to facilitate interoperability between C and Rust. Switch a
few variables to use these types. Which, for now, will
require adding some casts.

Signed-off-by: Ezekiel Newren <[email protected]>
We want to use xxhash for faster hashing. To facilitate that
and to simplify the code. Separate the concerns of parsing
and hashing into discrete steps. This makes swapping the hash
function much easier. Since xdl_hash_record() both parses and
hashses lines, this requires some slight code restructuring.

Signed-off-by: Ezekiel Newren <[email protected]>
When no whitespace flags are present use xxhash, for faster
hashing, otherwise use DJB2a (which is what xdiff has been
using all along).

The benchmark below compares my series with version v2.49.0
(built in build_release/ and build_v2.49.0/ respectively),
running log commands on linux kernel with 3 different machines.

$ BASE=/path/to/git/root

// laptop
// CPU: 6-core Intel Core i7-8750H (-MT MCP-) speed/min/max: 726/800/4100 MHz
$ hyperfine --warmup 3 -L exe $BASE/build_release/git,$BASE/build_v2.49.0/git '{exe} log --oneline --shortstat v6.8..v6.9 >/dev/null'
Benchmark 1: /home/ezekiel/development/work/git/build_release/git log --oneline --shortstat v6.8..v6.9 >/dev/null
  Time (mean ± σ):     10.419 s ±  0.166 s    [User: 10.097 s, System: 0.284 s]
  Range (min … max):   10.215 s … 10.680 s    10 runs

Benchmark 2: /home/ezekiel/development/work/git/build_v2.49.0/git log --oneline --shortstat v6.8..v6.9 >/dev/null
  Time (mean ± σ):     10.980 s ±  0.137 s    [User: 10.633 s, System: 0.308 s]
  Range (min … max):   10.791 s … 11.178 s    10 runs

Summary
  /home/ezekiel/development/work/git/build_release/git log --oneline --shortstat v6.8..v6.9 >/dev/null ran
    1.05 ± 0.02 times faster than /home/ezekiel/development/work/git/build_v2.49.0/git log --oneline --shortstat v6.8..v6.9 >/dev/null

// desktop
// CPU: 8-core Intel Core i7-9700 (-MCP-) speed/min/max: 800/800/4700 MHz
$ hyperfine --warmup 3 -L exe $BASE/build_release/git,$BASE/build_v2.49.0/git '{exe} log --oneline --shortstat v6.8..v6.9 >/dev/null'
Benchmark 1: /home/steamuser/dev/git/build_release/git log --oneline --shortstat v6.8..v6.9 >/dev/null
  Time (mean ± σ):      6.823 s ±  0.020 s    [User: 6.624 s, System: 0.180 s]
  Range (min … max):    6.801 s …  6.858 s    10 runs

Benchmark 2: /home/steamuser/dev/git/build_v2.49.0/git log --oneline --shortstat v6.8..v6.9 >/dev/null
  Time (mean ± σ):      8.151 s ±  0.024 s    [User: 7.928 s, System: 0.198 s]
  Range (min … max):    8.105 s …  8.184 s    10 runs

Summary
  /home/steamuser/dev/git/build_release/git log --oneline --shortstat v6.8..v6.9 >/dev/null ran
    1.19 ± 0.01 times faster than /home/steamuser/dev/git/build_v2.49.0/git log --oneline --shortstat v6.8..v6.9 >/dev/null

// router
// CPU: dual core Intel Celeron 3965U (-MCP-) speed/min/max: 1300/400/2200 MHz
$ hyperfine --warmup 3 -L exe $BASE/build_release/git,$BASE/build_v2.49.0/git '{exe} log --oneline --shortstat v6.8..v6.9 >/dev/null'
Benchmark 1: /home/metal/dev/git/build_release/git log --oneline --shortstat v6.8..v6.9 >/dev/null
  Time (mean ± σ):     21.209 s ±  0.054 s    [User: 20.341 s, System: 0.605 s]
  Range (min … max):   21.135 s … 21.309 s    10 runs

Benchmark 2: /home/metal/dev/git/build_v2.49.0/git log --oneline --shortstat v6.8..v6.9 >/dev/null
  Time (mean ± σ):     23.683 s ±  0.060 s    [User: 22.735 s, System: 0.672 s]
  Range (min … max):   23.566 s … 23.751 s    10 runs

Summary
  /home/metal/dev/git/build_release/git log --oneline --shortstat v6.8..v6.9 >/dev/null ran
    1.12 ± 0.00 times faster than /home/metal/dev/git/build_v2.49.0/git log --oneline --shortstat v6.8..v6.9 >/dev/null

Signed-off-by: Ezekiel Newren <[email protected]>
Copy link

There are issues in commit 9c59e82:
xdiff: conditionally use Rust's implementation of xxhash
Lines in the body of the commit messages should be wrapped between 60 and 76 characters.
Indented lines, and lines without whitespace, are exempt

@dscho
Copy link
Member

dscho commented Jun 1, 2025

/allow

Copy link

User ezekielnewren is now allowed to use GitGitGadget.

WARNING: ezekielnewren has no public email address set on GitHub; GitGitGadget needs an email address to Cc: you on your contribution, so that you receive any feedback on the Git mailing list. Go to https://github.com/settings/profile to make your preferred email public to let GitGitGadget know which email address to use.

Copy link

There are issues in commit 9c59e82:
xdiff: conditionally use Rust's implementation of xxhash
Lines in the body of the commit messages should be wrapped between 60 and 76 characters.
Indented lines, and lines without whitespace, are exempt

Copy link

There are issues in commit 10a0eb8:
meson.build: make rust the first dependency for the libgit target
Commit checks stopped - the message is too short
Commit not signed off

Copy link

There are issues in commit 9c59e82:
xdiff: conditionally use Rust's implementation of xxhash
Lines in the body of the commit messages should be wrapped between 60 and 76 characters.
Indented lines, and lines without whitespace, are exempt

Copy link

There are issues in commit 10a0eb8:
meson.build: make rust the first dependency for the libgit target
Commit checks stopped - the message is too short
Commit not signed off

Copy link

There are issues in commit 8116ad9:
meson.build: link libdl
Commit checks stopped - the message is too short
Commit not signed off

Copy link

There are issues in commit 9c59e82:
xdiff: conditionally use Rust's implementation of xxhash
Lines in the body of the commit messages should be wrapped between 60 and 76 characters.
Indented lines, and lines without whitespace, are exempt

Copy link

There are issues in commit 10a0eb8:
meson.build: make rust the first dependency for the libgit target
Commit checks stopped - the message is too short
Commit not signed off

Copy link

There are issues in commit 8116ad9:
meson.build: link libdl
Commit checks stopped - the message is too short
Commit not signed off

Copy link

There are issues in commit c28361b:
.github/workflows/main.yml: use github actions to setup rust
Commit checks stopped - the message is too short
Commit not signed off

Copy link

There are issues in commit 9c59e82:
xdiff: conditionally use Rust's implementation of xxhash
Lines in the body of the commit messages should be wrapped between 60 and 76 characters.
Indented lines, and lines without whitespace, are exempt

Copy link

There are issues in commit 10a0eb8:
meson.build: make rust the first dependency for the libgit target
Commit checks stopped - the message is too short
Commit not signed off

Copy link

There are issues in commit 8116ad9:
meson.build: link libdl
Commit checks stopped - the message is too short
Commit not signed off

Copy link

There are issues in commit c28361b:
.github/workflows/main.yml: use github actions to setup rust
Commit checks stopped - the message is too short
Commit not signed off

Copy link

There are issues in commit 564a178:
Revert "meson.build: link libdl"
Commit not signed off

Copy link

There are issues in commit f359002:
Revert "meson.build: make rust the first dependency for the libgit target"
Commit not signed off

newren and others added 3 commits June 5, 2025 12:40
...plus various build fixups.

DO NOT SUBMIT.  Split this into separate patches, squashing the fixups
into earlier commits.
Copy link

There are issues in commit 9c59e82:
xdiff: conditionally use Rust's implementation of xxhash
Lines in the body of the commit messages should be wrapped between 60 and 76 characters.
Indented lines, and lines without whitespace, are exempt

Copy link

There are issues in commit bc1ae01:
Random stab in the dark with Rust building in ci
Commit not signed off

Copy link

There are issues in commit 939b2ca:
install rust with ci/install-dependencies.sh
Commit checks stopped - the message is too short
Commit not signed off

Copy link

There are issues in commit 5951b2b:
new commit
Commit checks stopped - the message is too short
Commit not signed off

Copy link

There are issues in commit 9c59e82:
xdiff: conditionally use Rust's implementation of xxhash
Lines in the body of the commit messages should be wrapped between 60 and 76 characters.
Indented lines, and lines without whitespace, are exempt

Copy link

There are issues in commit bc1ae01:
Random stab in the dark with Rust building in ci
Commit not signed off

Copy link

There are issues in commit 939b2ca:
install rust with ci/install-dependencies.sh
Commit checks stopped - the message is too short
Commit not signed off

Copy link

There are issues in commit 5951b2b:
new commit
Commit checks stopped - the message is too short
Commit not signed off

Copy link

There are issues in commit 2bfe6b6:
use full path for cargo --version
Commit checks stopped - the message is too short
Commit not signed off

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants