-
Notifications
You must be signed in to change notification settings - Fork 25.2k
Comparing changes
Open a pull request
base repository: git/git
base: 066124da88a6d43d125b30a1bc8a66c2d8ef6423
head repository: git/git
compare: 43f70eaea0e3fa9d98c895e9341674a67262b657
- 13 commits
- 14 files changed
- 1 contributor
Commits on Mar 4, 2024
-
reftable/pq: use
size_t
to track iterator indexThe reftable priority queue is used by the merged iterator to yield records from its sub-iterators in the expected order. Each entry has a record corresponding to such a sub-iterator as well as an index that indicates which sub-iterator the record belongs to. But while the sub-iterators are tracked with a `size_t`, we store the index as an `int` in the entry. Fix this and use `size_t` consistently. Signed-off-by: Patrick Steinhardt <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for 5c11529 - Browse repository at this point
Copy the full SHA 5c11529View commit details -
reftable/merged: make
merged_iter
structure privateThe `merged_iter` structure is not used anywhere outside of "merged.c", but is declared in its header. Move it into the code file so that it is clear that its implementation details are never exposed to anything. Signed-off-by: Patrick Steinhardt <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for 48929d2 - Browse repository at this point
Copy the full SHA 48929d2View commit details -
reftable/merged: advance subiter on subsequent iteration
When advancing the merged iterator, we pop the topmost entry from its priority queue and then advance the sub-iterator that the entry belongs to, adding the result as a new entry. This is quite sensible in the case where the merged iterator is used to actually iterate through records. But the merged iterator is also used when we look up a single record, only, so advancing the sub-iterator is wasted effort because we would never even look at the result. Instead of immediately advancing the sub-iterator, we can also defer this to the next iteration of the merged iterator by storing the intent-to-advance. This results in a small speedup when reading many records. The following benchmark creates 10000 refs, which will also end up with many ref lookups: Benchmark 1: update-ref: create many refs (revision = HEAD~) Time (mean ± σ): 337.2 ms ± 7.3 ms [User: 200.1 ms, System: 136.9 ms] Range (min … max): 329.3 ms … 373.2 ms 100 runs Benchmark 2: update-ref: create many refs (revision = HEAD) Time (mean ± σ): 332.5 ms ± 5.9 ms [User: 197.2 ms, System: 135.1 ms] Range (min … max): 327.6 ms … 359.8 ms 100 runs Summary update-ref: create many refs (revision = HEAD) ran 1.01 ± 0.03 times faster than update-ref: create many refs (revision = HEAD~) While this speedup alone isn't really worth it, this refactoring will also allow two additional optimizations in subsequent patches. First, it will allow us to special-case when there is only a single sub-iter left to circumvent the priority queue altogether. And second, it makes it easier to avoid copying records to the caller. Signed-off-by: Patrick Steinhardt <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for aad8ad6 - Browse repository at this point
Copy the full SHA aad8ad6View commit details -
reftable/merged: make subiters own their records
For each subiterator, the merged table needs to track their current record. This record is owned by the priority queue though instead of by the merged iterator. This is not optimal performance-wise. For one, we need to move around records whenever we add or remove a record from the priority queue. Thus, the bigger the entries the more bytes we need to copy around. And compared to pointers, a reftable record is rather on the bigger side. The other issue is that this makes it harder to reuse the records. Refactor the code so that the merged iterator tracks ownership of the records per-subiter. Instead of having records in the priority queue, we can now use mere pointers to the per-subiter records. This also allows us to swap records between the caller and the per-subiter record instead of doing an actual copy via `reftable_record_copy_from()`, which removes the need to release the caller-provided record. This results in a noticeable speedup when iterating through many refs. The following benchmark iterates through 1 million refs: Benchmark 1: show-ref: single matching ref (revision = HEAD~) Time (mean ± σ): 145.5 ms ± 4.5 ms [User: 142.5 ms, System: 2.8 ms] Range (min … max): 141.3 ms … 177.0 ms 1000 runs Benchmark 2: show-ref: single matching ref (revision = HEAD) Time (mean ± σ): 139.0 ms ± 4.7 ms [User: 136.1 ms, System: 2.8 ms] Range (min … max): 134.2 ms … 182.2 ms 1000 runs Summary show-ref: single matching ref (revision = HEAD) ran 1.05 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~) This refactoring also allows a subsequent refactoring where we start reusing memory allocated by the reftable records because we do not need to release the caller-provided record anymore. Signed-off-by: Patrick Steinhardt <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for bb2d6be - Browse repository at this point
Copy the full SHA bb2d6beView commit details -
reftable/merged: remove unnecessary null check for subiters
Whenever we advance a subiter we first call `iterator_is_null()`. This is not needed though because we only ever advance subiters which have entries in the priority queue, and we do not end entries to the priority queue when the subiter has been exhausted. Drop the check as well as the now-unused function. This results in a surprisingly big speedup: Benchmark 1: show-ref: single matching ref (revision = HEAD~) Time (mean ± σ): 138.1 ms ± 4.4 ms [User: 135.1 ms, System: 2.8 ms] Range (min … max): 133.4 ms … 167.3 ms 1000 runs Benchmark 2: show-ref: single matching ref (revision = HEAD) Time (mean ± σ): 134.4 ms ± 4.2 ms [User: 131.5 ms, System: 2.8 ms] Range (min … max): 130.0 ms … 164.0 ms 1000 runs Summary show-ref: single matching ref (revision = HEAD) ran 1.03 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~) Signed-off-by: Patrick Steinhardt <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for 2d71a1d - Browse repository at this point
Copy the full SHA 2d71a1dView commit details -
reftable/merged: handle subiter cleanup on close only
When advancing one of the subiters fails we immediately release resources associated with that subiter. This is not necessary though as we will release these resources when closing the merged iterator anyway. Drop the logic and only release resources when the merged iterator is done. This is a mere cleanup that should help reduce the cognitive load when reading through the code. Signed-off-by: Patrick Steinhardt <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for 3b6dd6a - Browse repository at this point
Copy the full SHA 3b6dd6aView commit details -
reftable/merged: circumvent pqueue with single subiter
The merged iterator uses a priority queue to order records so that we can yielid them in the expected order. This priority queue of course comes with some overhead as we need to add, compare and remove entries in that priority queue. In the general case, that overhead cannot really be avoided. But when we have a single subiter left then there is no need to use the priority queue anymore because the order is exactly the same as what that subiter would return. While having a single subiter may sound like an edge case, it happens more frequently than one might think. In the most common scenario, you can expect a repository to have a single large table that contains most of the records and then a set of smaller tables which contain later additions to the reftable stack. In this case it is quite likely that we exhaust subiters of those smaller stacks before exhausting the large table. Special-case this and return records directly from the remaining subiter. This results in a sizeable speedup when iterating over 1m refs in a repository with a single table: Benchmark 1: show-ref: single matching ref (revision = HEAD~) Time (mean ± σ): 135.4 ms ± 4.4 ms [User: 132.5 ms, System: 2.8 ms] Range (min … max): 131.0 ms … 166.3 ms 1000 runs Benchmark 2: show-ref: single matching ref (revision = HEAD) Time (mean ± σ): 126.3 ms ± 3.9 ms [User: 123.3 ms, System: 2.8 ms] Range (min … max): 122.7 ms … 157.0 ms 1000 runs Summary show-ref: single matching ref (revision = HEAD) ran 1.07 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~) Signed-off-by: Patrick Steinhardt <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for f8c1a8e - Browse repository at this point
Copy the full SHA f8c1a8eView commit details -
reftable/merged: avoid duplicate pqueue emptiness check
When calling `merged_iter_next_void()` we first check whether the iter has been exhausted already. We already perform this check two levels down the stack in `merged_iter_next_entry()` though, which makes this check redundant. Now if this check was there to accelerate the common case it might have made sense to keep it. But the iterator being exhausted is rather the uncommon case because you can expect most reftable stacks to contain more than two refs. Simplify the code by removing the check. As `merged_iter_next_void()` is basically empty except for calling `merged_iter_next()` now, merge these two functions. This also results in a tiny speedup when iterating over many refs: Benchmark 1: show-ref: single matching ref (revision = HEAD~) Time (mean ± σ): 125.6 ms ± 3.8 ms [User: 122.7 ms, System: 2.8 ms] Range (min … max): 122.4 ms … 153.4 ms 1000 runs Benchmark 2: show-ref: single matching ref (revision = HEAD) Time (mean ± σ): 124.0 ms ± 3.9 ms [User: 121.1 ms, System: 2.8 ms] Range (min … max): 120.1 ms … 156.4 ms 1000 runs Summary show-ref: single matching ref (revision = HEAD) ran 1.01 ± 0.04 times faster than show-ref: single matching ref (revision = HEAD~) Signed-off-by: Patrick Steinhardt <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for 080f8c4 - Browse repository at this point
Copy the full SHA 080f8c4View commit details -
reftable/record: reuse refname when decoding
When decoding a reftable record we will first release the user-provided record and then decode the new record into it. This is quite inefficient as we basically need to reallocate at least the refname every time. Refactor the function to start tracking the refname capacity. Like this, we can stow away the refname, release, restore and then grow the refname to the required number of bytes via `REFTABLE_ALLOC_GROW()`. This refactoring is safe to do because all functions that assigning to the refname will first call `reftable_ref_record_release()`, which will zero out the complete record after releasing memory. This change results in a nice speedup when iterating over 1 million refs: Benchmark 1: show-ref: single matching ref (revision = HEAD~) Time (mean ± σ): 124.0 ms ± 3.9 ms [User: 121.1 ms, System: 2.7 ms] Range (min … max): 120.4 ms … 152.7 ms 1000 runs Benchmark 2: show-ref: single matching ref (revision = HEAD) Time (mean ± σ): 114.4 ms ± 3.7 ms [User: 111.5 ms, System: 2.7 ms] Range (min … max): 111.0 ms … 152.1 ms 1000 runs Summary show-ref: single matching ref (revision = HEAD) ran 1.08 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~) Furthermore, with this change we now perform a mostly constant number of allocations when iterating. Before this change: HEAP SUMMARY: in use at exit: 13,603 bytes in 125 blocks total heap usage: 1,006,620 allocs, 1,006,495 frees, 25,398,363 bytes allocated After this change: HEAP SUMMARY: in use at exit: 13,603 bytes in 125 blocks total heap usage: 6,623 allocs, 6,498 frees, 509,592 bytes allocated Signed-off-by: Patrick Steinhardt <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for 71d9a2e - Browse repository at this point
Copy the full SHA 71d9a2eView commit details -
reftable/record: reuse refname when copying
Do the same optimization as in the preceding commit, but this time for `reftable_record_copy()`. While not as noticeable, it still results in a small speedup when iterating over 1 million refs: Benchmark 1: show-ref: single matching ref (revision = HEAD~) Time (mean ± σ): 114.0 ms ± 3.8 ms [User: 111.1 ms, System: 2.7 ms] Range (min … max): 110.9 ms … 144.3 ms 1000 runs Benchmark 2: show-ref: single matching ref (revision = HEAD) Time (mean ± σ): 112.5 ms ± 3.7 ms [User: 109.5 ms, System: 2.8 ms] Range (min … max): 109.2 ms … 140.7 ms 1000 runs Summary show-ref: single matching ref (revision = HEAD) ran 1.01 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~) Signed-off-by: Patrick Steinhardt <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for 6620f91 - Browse repository at this point
Copy the full SHA 6620f91View commit details -
reftable/record: decode keys in place
When reading a record from a block, we need to decode the record's key. As reftable keys are prefix-compressed, meaning they reuse a prefix from the preceding record's key, this is a bit more involved than just having to copy the relevant bytes: we need to figure out the prefix and suffix lengths, copy the prefix from the preceding record and finally copy the suffix from the current record. This is done by passing three buffers to `reftable_decode_key()`: one buffer that holds the result, one buffer that holds the last key, and one buffer that points to the current record. The final key is then assembled by calling `strbuf_add()` twice to copy over the prefix and suffix. Performing two memory copies is inefficient though. And we can indeed do better by decoding keys in place. Instead of providing two buffers, the caller may only call a single buffer that is already pre-populated with the last key. Like this, we only have to call `strbuf_setlen()` to trim the record to its prefix and then `strbuf_add()` to add the suffix. This refactoring leads to a noticeable performance bump when iterating over 1 million refs: Benchmark 1: show-ref: single matching ref (revision = HEAD~) Time (mean ± σ): 112.2 ms ± 3.9 ms [User: 109.3 ms, System: 2.8 ms] Range (min … max): 109.2 ms … 149.6 ms 1000 runs Benchmark 2: show-ref: single matching ref (revision = HEAD) Time (mean ± σ): 106.0 ms ± 3.5 ms [User: 103.2 ms, System: 2.7 ms] Range (min … max): 103.2 ms … 133.7 ms 1000 runs Summary show-ref: single matching ref (revision = HEAD) ran 1.06 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~) Signed-off-by: Patrick Steinhardt <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for daf4f43 - Browse repository at this point
Copy the full SHA daf4f43View commit details -
reftable: allow inlining of a few functions
We have a few functions which are basically just accessors to structures. As those functions are executed inside the hot loop when iterating through many refs, the fact that they cannot be inlined is costing us some performance. Move the function definitions into their respective headers so that they can be inlined. This results in a performance improvement when iterating over 1 million refs: Benchmark 1: show-ref: single matching ref (revision = HEAD~) Time (mean ± σ): 105.9 ms ± 3.6 ms [User: 103.0 ms, System: 2.8 ms] Range (min … max): 103.1 ms … 133.4 ms 1000 runs Benchmark 2: show-ref: single matching ref (revision = HEAD) Time (mean ± σ): 100.7 ms ± 3.4 ms [User: 97.8 ms, System: 2.8 ms] Range (min … max): 97.8 ms … 124.0 ms 1000 runs Summary show-ref: single matching ref (revision = HEAD) ran 1.05 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~) Signed-off-by: Patrick Steinhardt <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for f1bf54a - Browse repository at this point
Copy the full SHA f1bf54aView commit details -
refs/reftable: precompute prefix length
We're recomputing the prefix length on every iteration of the ref iterator. Precompute it for another speedup when iterating over 1 million refs: Benchmark 1: show-ref: single matching ref (revision = HEAD~) Time (mean ± σ): 100.3 ms ± 3.7 ms [User: 97.3 ms, System: 2.8 ms] Range (min … max): 97.5 ms … 139.7 ms 1000 runs Benchmark 2: show-ref: single matching ref (revision = HEAD) Time (mean ± σ): 95.8 ms ± 3.4 ms [User: 92.9 ms, System: 2.8 ms] Range (min … max): 93.0 ms … 121.9 ms 1000 runs Summary show-ref: single matching ref (revision = HEAD) ran 1.05 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~) Signed-off-by: Patrick Steinhardt <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for 43f70ea - Browse repository at this point
Copy the full SHA 43f70eaView commit details
This comparison is taking too long to generate.
Unfortunately it looks like we can’t render this comparison for you right now. It might be too big, or there might be something weird with your repository.
You can try running this command locally to see the comparison on your machine:
git diff 066124da88a6d43d125b30a1bc8a66c2d8ef6423...43f70eaea0e3fa9d98c895e9341674a67262b657