-
Notifications
You must be signed in to change notification settings - Fork 26.4k
RFC: Accelerate xdiff and begin its rustification #1980
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Welcome to GitGitGadgetHi @ezekielnewren, and welcome to GitGitGadget, the GitHub App to send patch series to the Git mailing list from GitHub Pull Requests. Please make sure that either:
You can CC potential reviewers by adding a footer to the PR description with the following syntax:
NOTE: DO NOT copy/paste your CC list from a previous GGG PR's description, Also, it is a good idea to review the commit messages one last time, as the Git project expects them in a quite specific form:
It is in general a good idea to await the automated test ("Checks") in this Pull Request before contributing the patches, e.g. to avoid trivial issues such as unportable code. Contributing the patchesBefore you can contribute the patches, your GitHub username needs to be added to the list of permitted users. Any already-permitted user can do that, by adding a comment to your PR of the form Both the person who commented An alternative is the channel
Once on the list of permitted usernames, you can contribute the patches to the Git mailing list by adding a PR comment If you want to see what email(s) would be sent for a After you submit, GitGitGadget will respond with another comment that contains the link to the cover letter mail in the Git mailing list archive. Please make sure to monitor the discussion in that thread and to address comments and suggestions (while the comments and suggestions will be mirrored into the PR by GitGitGadget, you will still want to reply via mail). If you do not want to subscribe to the Git mailing list just to be able to respond to a mail, you can download the mbox from the Git mailing list archive (click the curl -g --user "<EMailAddress>:<Password>" \
--url "imaps://imap.gmail.com/INBOX" -T /path/to/raw.txt To iterate on your change, i.e. send a revised patch or patch series, you will first want to (force-)push to the same branch. You probably also want to modify your Pull Request description (or title). It is a good idea to summarize the revision by adding something like this to the cover letter (read: by editing the first comment on the PR, i.e. the PR description):
To send a new iteration, just add another PR comment with the contents: Need help?New contributors who want advice are encouraged to join [email protected], where volunteers who regularly contribute to Git are willing to answer newbie questions, give advice, or otherwise provide mentoring to interested contributors. You must join in order to post or view messages, but anyone can join. You may also be able to find help in real time in the developer IRC channel, |
There are issues in commit 2178ac1: |
1 similar comment
There are issues in commit 2178ac1: |
There are issues in commit 67f6c92: |
8b3cc99
to
39211e9
Compare
There are issues in commit d647348: |
39211e9
to
94ccfab
Compare
There are issues in commit d647348: |
94ccfab
to
d1acb81
Compare
There are issues in commit d647348: |
Upcoming patches will accelerate and simplify xdiff, while also porting parts of it to Rust. In preparation, add some stubs and setup the Rust build. For now, it is easier to let cargo build rust and have make or meson merely link against the static library that cargo builds. In line with ongoing libification efforts, use multiple crates to allow more modularity on the Rust side. xdiff is the crate that this series will focus on, but we also introduce the interop crate for future patch series. In order to facilitate interoperability between C and Rust, introduce C definitions for Rust primitive types in git-compat-util.h. Signed-off-by: Ezekiel Newren <[email protected]>
Move xdl_prepare_env() later in the file to avoid the need for forward declarations. Signed-off-by: Ezekiel Newren <[email protected]>
xrecord_t.next, xdfile_t.hbits, xdfile_t.rhash are initialized, but never used for anything by the code. Remove them. Signed-off-by: Ezekiel Newren <[email protected]>
A few commits ago, we added definitions for Rust primitive types, to facilitate interoperability between C and Rust. Switch a few variables to use these types. Which, for now, will require adding some casts. Signed-off-by: Ezekiel Newren <[email protected]>
We want to use xxhash for faster hashing. To facilitate that and to simplify the code. Separate the concerns of parsing and hashing into discrete steps. This makes swapping the hash function much easier. Since xdl_hash_record() both parses and hashses lines, this requires some slight code restructuring. Signed-off-by: Ezekiel Newren <[email protected]>
When no whitespace flags are present use xxhash, for faster hashing, otherwise use DJB2a (which is what xdiff has been using all along). The benchmark below compares my series with version v2.49.0 (built in build_release/ and build_v2.49.0/ respectively), running log commands on linux kernel with 3 different machines. $ BASE=/path/to/git/root // laptop // CPU: 6-core Intel Core i7-8750H (-MT MCP-) speed/min/max: 726/800/4100 MHz $ hyperfine --warmup 3 -L exe $BASE/build_release/git,$BASE/build_v2.49.0/git '{exe} log --oneline --shortstat v6.8..v6.9 >/dev/null' Benchmark 1: /home/ezekiel/development/work/git/build_release/git log --oneline --shortstat v6.8..v6.9 >/dev/null Time (mean ± σ): 10.419 s ± 0.166 s [User: 10.097 s, System: 0.284 s] Range (min … max): 10.215 s … 10.680 s 10 runs Benchmark 2: /home/ezekiel/development/work/git/build_v2.49.0/git log --oneline --shortstat v6.8..v6.9 >/dev/null Time (mean ± σ): 10.980 s ± 0.137 s [User: 10.633 s, System: 0.308 s] Range (min … max): 10.791 s … 11.178 s 10 runs Summary /home/ezekiel/development/work/git/build_release/git log --oneline --shortstat v6.8..v6.9 >/dev/null ran 1.05 ± 0.02 times faster than /home/ezekiel/development/work/git/build_v2.49.0/git log --oneline --shortstat v6.8..v6.9 >/dev/null // desktop // CPU: 8-core Intel Core i7-9700 (-MCP-) speed/min/max: 800/800/4700 MHz $ hyperfine --warmup 3 -L exe $BASE/build_release/git,$BASE/build_v2.49.0/git '{exe} log --oneline --shortstat v6.8..v6.9 >/dev/null' Benchmark 1: /home/steamuser/dev/git/build_release/git log --oneline --shortstat v6.8..v6.9 >/dev/null Time (mean ± σ): 6.823 s ± 0.020 s [User: 6.624 s, System: 0.180 s] Range (min … max): 6.801 s … 6.858 s 10 runs Benchmark 2: /home/steamuser/dev/git/build_v2.49.0/git log --oneline --shortstat v6.8..v6.9 >/dev/null Time (mean ± σ): 8.151 s ± 0.024 s [User: 7.928 s, System: 0.198 s] Range (min … max): 8.105 s … 8.184 s 10 runs Summary /home/steamuser/dev/git/build_release/git log --oneline --shortstat v6.8..v6.9 >/dev/null ran 1.19 ± 0.01 times faster than /home/steamuser/dev/git/build_v2.49.0/git log --oneline --shortstat v6.8..v6.9 >/dev/null // router // CPU: dual core Intel Celeron 3965U (-MCP-) speed/min/max: 1300/400/2200 MHz $ hyperfine --warmup 3 -L exe $BASE/build_release/git,$BASE/build_v2.49.0/git '{exe} log --oneline --shortstat v6.8..v6.9 >/dev/null' Benchmark 1: /home/metal/dev/git/build_release/git log --oneline --shortstat v6.8..v6.9 >/dev/null Time (mean ± σ): 21.209 s ± 0.054 s [User: 20.341 s, System: 0.605 s] Range (min … max): 21.135 s … 21.309 s 10 runs Benchmark 2: /home/metal/dev/git/build_v2.49.0/git log --oneline --shortstat v6.8..v6.9 >/dev/null Time (mean ± σ): 23.683 s ± 0.060 s [User: 22.735 s, System: 0.672 s] Range (min … max): 23.566 s … 23.751 s 10 runs Summary /home/metal/dev/git/build_release/git log --oneline --shortstat v6.8..v6.9 >/dev/null ran 1.12 ± 0.00 times faster than /home/metal/dev/git/build_v2.49.0/git log --oneline --shortstat v6.8..v6.9 >/dev/null Signed-off-by: Ezekiel Newren <[email protected]>
d1acb81
to
9af7ba0
Compare
There are issues in commit 9c59e82: |
/allow |
User ezekielnewren is now allowed to use GitGitGadget. WARNING: ezekielnewren has no public email address set on GitHub; GitGitGadget needs an email address to Cc: you on your contribution, so that you receive any feedback on the Git mailing list. Go to https://github.com/settings/profile to make your preferred email public to let GitGitGadget know which email address to use. |
There are issues in commit 9c59e82: |
There are issues in commit 10a0eb8: |
There are issues in commit 9c59e82: |
There are issues in commit 10a0eb8: |
There are issues in commit 8116ad9: |
There are issues in commit 9c59e82: |
There are issues in commit 10a0eb8: |
There are issues in commit 8116ad9: |
There are issues in commit c28361b: |
There are issues in commit 9c59e82: |
There are issues in commit 10a0eb8: |
There are issues in commit 8116ad9: |
There are issues in commit c28361b: |
There are issues in commit 564a178: |
There are issues in commit f359002: |
...plus various build fixups. DO NOT SUBMIT. Split this into separate patches, squashing the fixups into earlier commits.
f359002
to
5951b2b
Compare
There are issues in commit 9c59e82: |
There are issues in commit bc1ae01: |
There are issues in commit 939b2ca: |
There are issues in commit 5951b2b: |
There are issues in commit 9c59e82: |
There are issues in commit bc1ae01: |
There are issues in commit 939b2ca: |
There are issues in commit 5951b2b: |
There are issues in commit 2bfe6b6: |
This series accelerates xdiff by 5-19%.
It also introduces Rust as a hard dependency.
This is just the beginning of many patches that I have to convert portions of, maybe eventually all of, xdiff to Rust. While working on that conversion, I found several ways to clarify the code, along with some optimizations.
So...
This obviously raises the question of whether we are ready to accept a hard dependency on Rust. Previous discussions on the mailing list and at Git Merge 2024 have not answered that question. If not now, will we be willing to accept such a hard dependency later? And what route do we want to take to get there?
About the optimizations in this series:
xdiff currently uses DJB2a for hashing (even though it is not explicitly named as such). This is an older hashing algorithm, and modern alternatives are superior. I chose xxhash because it’s faster, more collision resistant, and designed to be a standard. Other hash algorithms like aHash, MurMurHash, SipHash, and Fnv1a were considered, but my local testing made me feel like xxhash was the best choice for usage in xdiff.
In support of switching to xxhash, parsing and hashing were split into separate steps. And it turns out that memchr() is faster for parsing than character-by-character iteration.
My brother (Elijah, cc’ed) has been guiding and reviewing the development of converting xdiff to Rust.
cc: Elijah Newren [email protected]