
Comparing changes

base repository: git/git
base: 066124da88a6d43d125b30a1bc8a66c2d8ef6423
head repository: git/git
compare: 43f70eaea0e3fa9d98c895e9341674a67262b657
  • 13 commits
  • 14 files changed
  • 1 contributor

Commits on Mar 4, 2024

  1. reftable/pq: use size_t to track iterator index

    The reftable priority queue is used by the merged iterator to yield
    records from its sub-iterators in the expected order. Each entry has a
    record corresponding to such a sub-iterator as well as an index that
    indicates which sub-iterator the record belongs to. But while the
    sub-iterators are tracked with a `size_t`, we store the index as an
    `int` in the entry.
    
    Fix this and use `size_t` consistently.
    
    Signed-off-by: Patrick Steinhardt <[email protected]>
    Signed-off-by: Junio C Hamano <[email protected]>
    pks-t authored and gitster committed Mar 4, 2024 (commit 5c11529)
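
    As a rough illustration of the type change (hypothetical names, not the
    actual reftable structures), the queue entry's index now uses the same
    unsigned type as the sub-iterator count:

      /* Sketch only; hypothetical names rather than the real reftable types. */
      #include <stddef.h>

      struct sketch_record {
          char *refname;
      };

      struct sketch_pq_entry {
          size_t index;              /* was `int`; now matches how sub-iterators are counted */
          struct sketch_record rec;  /* record yielded by sub-iterator `index` */
      };
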
  2. reftable/merged: make merged_iter structure private

    The `merged_iter` structure is not used anywhere outside of "merged.c",
    but is declared in its header. Move it into the code file so that it is
    clear that its implementation details are never exposed to anything.
    
    Signed-off-by: Patrick Steinhardt <[email protected]>
    Signed-off-by: Junio C Hamano <[email protected]>
    pks-t authored and gitster committed Mar 4, 2024 (commit 48929d2)
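
    A minimal sketch of the pattern applied here, with hypothetical file and
    struct names: the header only forward-declares the type, while the full
    definition stays in the code file.

      /* sketch-merged.h -- hypothetical header: only a forward declaration is exposed. */
      struct sketch_merged_iter;
      void sketch_merged_iter_close(struct sketch_merged_iter *it);

      /* sketch-merged.c -- the definition is private to the code file. */
      #include <stddef.h>
      struct sketch_merged_iter {
          size_t subiters_len;
          /* ... further implementation details ... */
      };
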
  3. reftable/merged: advance subiter on subsequent iteration

    When advancing the merged iterator, we pop the topmost entry from its
    priority queue and then advance the sub-iterator that the entry belongs
    to, adding the result as a new entry. This is quite sensible in the case
    where the merged iterator is used to actually iterate through records.
    But the merged iterator is also used when we look up only a single
    record, so advancing the sub-iterator is wasted effort because we would
    never even look at the result.
    
    Instead of immediately advancing the sub-iterator, we can also defer
    this to the next iteration of the merged iterator by storing the
    intent-to-advance. This results in a small speedup when reading many
    records. The following benchmark creates 10000 refs, which will also end
    up with many ref lookups:
    
        Benchmark 1: update-ref: create many refs (revision = HEAD~)
          Time (mean ± σ):     337.2 ms ±   7.3 ms    [User: 200.1 ms, System: 136.9 ms]
          Range (min … max):   329.3 ms … 373.2 ms    100 runs
    
        Benchmark 2: update-ref: create many refs (revision = HEAD)
          Time (mean ± σ):     332.5 ms ±   5.9 ms    [User: 197.2 ms, System: 135.1 ms]
          Range (min … max):   327.6 ms … 359.8 ms    100 runs
    
        Summary
          update-ref: create many refs (revision = HEAD) ran
            1.01 ± 0.03 times faster than update-ref: create many refs (revision = HEAD~)
    
    While this speedup alone isn't really worth it, this refactoring will
    also allow two additional optimizations in subsequent patches. First, it
    will allow us to special-case when there is only a single sub-iter left
    to circumvent the priority queue altogether. And second, it makes it
    easier to avoid copying records to the caller.
    
    Signed-off-by: Patrick Steinhardt <[email protected]>
    Signed-off-by: Junio C Hamano <[email protected]>
    pks-t authored and gitster committed Mar 4, 2024 (commit aad8ad6)
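
    The sketch below illustrates the deferred advance with hypothetical names
    and simplified signatures (not the actual reftable API): rather than
    advancing the popped sub-iterator immediately, the merged iterator
    remembers its index and only advances it at the start of the next call.

      #include <stddef.h>

      /* Sketch only; types and helpers are hypothetical. */
      struct sketch_record { char *refname; };

      struct sketch_pq_entry {
          size_t index;
          struct sketch_record rec;
      };

      struct sketch_pq {
          struct sketch_pq_entry *entries;
          size_t len;
      };

      struct sketch_merged_iter {
          struct sketch_pq pq;
          size_t advance_index;      /* sub-iterator to advance lazily; (size_t)-1 means none */
      };

      /* Assumed helpers, declared but not implemented in this sketch: advancing a
       * sub-iterator re-adds its next record to the queue, if it has one. */
      int sketch_advance_subiter(struct sketch_merged_iter *mi, size_t idx);
      struct sketch_pq_entry sketch_pq_remove(struct sketch_pq *pq);

      static int sketch_merged_iter_next(struct sketch_merged_iter *mi,
                                         struct sketch_record *out)
      {
          struct sketch_pq_entry entry;

          /* Only now advance the sub-iterator whose record was handed out last
           * time around; a single lookup never pays this cost. */
          if (mi->advance_index != (size_t)-1) {
              int err = sketch_advance_subiter(mi, mi->advance_index);
              if (err)
                  return err;
              mi->advance_index = (size_t)-1;
          }

          if (!mi->pq.len)
              return 1; /* exhausted */

          entry = sketch_pq_remove(&mi->pq);
          *out = entry.rec;                /* ownership details elided here */
          mi->advance_index = entry.index; /* remember the intent to advance */
          return 0;
      }
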
  4. reftable/merged: make subiters own their records

    For each subiterator, the merged table needs to track its current
    record. This record is owned by the priority queue, though, instead of
    by the merged iterator. This is not optimal performance-wise.
    
    For one, we need to move around records whenever we add or remove a
    record from the priority queue. Thus, the bigger the entries the more
    bytes we need to copy around. And compared to pointers, a reftable
    record is rather on the bigger side. The other issue is that this makes
    it harder to reuse the records.
    
    Refactor the code so that the merged iterator tracks ownership of the
    records per-subiter. Instead of having records in the priority queue, we
    can now use mere pointers to the per-subiter records. This also allows
    us to swap records between the caller and the per-subiter record instead
    of doing an actual copy via `reftable_record_copy_from()`, which removes
    the need to release the caller-provided record.
    
    This results in a noticeable speedup when iterating through many refs.
    The following benchmark iterates through 1 million refs:
    
      Benchmark 1: show-ref: single matching ref (revision = HEAD~)
        Time (mean ± σ):     145.5 ms ±   4.5 ms    [User: 142.5 ms, System: 2.8 ms]
        Range (min … max):   141.3 ms … 177.0 ms    1000 runs
    
      Benchmark 2: show-ref: single matching ref (revision = HEAD)
        Time (mean ± σ):     139.0 ms ±   4.7 ms    [User: 136.1 ms, System: 2.8 ms]
        Range (min … max):   134.2 ms … 182.2 ms    1000 runs
    
      Summary
        show-ref: single matching ref (revision = HEAD) ran
          1.05 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~)
    
    This refactoring also allows a subsequent refactoring where we start
    reusing memory allocated by the reftable records because we do not need
    to release the caller-provided record anymore.
    
    Signed-off-by: Patrick Steinhardt <[email protected]>
    Signed-off-by: Junio C Hamano <[email protected]>
    pks-t authored and gitster committed Mar 4, 2024 (commit bb2d6be)
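
    A rough sketch of the ownership change (hypothetical names; the real code
    uses the reftable record and priority-queue types): each sub-iterator owns
    one record, the queue entry only points at it, and handing a result to the
    caller becomes a swap rather than a deep copy.

      #include <stddef.h>

      /* Sketch only; hypothetical names. */
      struct sketch_record {
          char *refname;
          size_t refname_cap;
      };

      struct sketch_subiter {
          struct sketch_record rec;  /* owned here, one record per sub-iterator */
          /* ... the underlying table iterator ... */
      };

      struct sketch_pq_entry {
          size_t index;
          struct sketch_record *rec; /* points at subiters[index].rec; no copying */
      };

      /* Hand the popped record to the caller by swapping contents, so the
       * caller's old buffers can later be reused by the sub-iterator. */
      static void sketch_hand_out_record(struct sketch_record *caller,
                                         struct sketch_record *subiter_rec)
      {
          struct sketch_record tmp = *caller;
          *caller = *subiter_rec;
          *subiter_rec = tmp;
      }
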
  5. reftable/merged: remove unnecessary null check for subiters

    Whenever we advance a subiter we first call `iterator_is_null()`. This
    is not needed though because we only ever advance subiters which have
    entries in the priority queue, and we do not add entries to the priority
    queue when the subiter has been exhausted.
    
    Drop the check as well as the now-unused function. This results in a
    surprisingly big speedup:
    
        Benchmark 1: show-ref: single matching ref (revision = HEAD~)
          Time (mean ± σ):     138.1 ms ±   4.4 ms    [User: 135.1 ms, System: 2.8 ms]
          Range (min … max):   133.4 ms … 167.3 ms    1000 runs
    
        Benchmark 2: show-ref: single matching ref (revision = HEAD)
          Time (mean ± σ):     134.4 ms ±   4.2 ms    [User: 131.5 ms, System: 2.8 ms]
          Range (min … max):   130.0 ms … 164.0 ms    1000 runs
    
        Summary
          show-ref: single matching ref (revision = HEAD) ran
            1.03 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~)
    
    Signed-off-by: Patrick Steinhardt <[email protected]>
    Signed-off-by: Junio C Hamano <[email protected]>
    pks-t authored and gitster committed Mar 4, 2024 (commit 2d71a1d)
  6. reftable/merged: handle subiter cleanup on close only

    When advancing one of the subiters fails we immediately release
    resources associated with that subiter. This is not necessary though as
    we will release these resources when closing the merged iterator anyway.
    
    Drop the logic and only release resources when the merged iterator is
    done. This is a mere cleanup that should help reduce the cognitive load
    when reading through the code.
    
    Signed-off-by: Patrick Steinhardt <[email protected]>
    Signed-off-by: Junio C Hamano <[email protected]>
    pks-t authored and gitster committed Mar 4, 2024 (commit 3b6dd6a)
  7. reftable/merged: circumvent pqueue with single subiter

    The merged iterator uses a priority queue to order records so that we
    can yield them in the expected order. This priority queue of course
    comes with some overhead as we need to add, compare and remove entries
    in that priority queue.
    
    In the general case, that overhead cannot really be avoided. But when we
    have a single subiter left then there is no need to use the priority
    queue anymore because the order is exactly the same as what that subiter
    would return.
    
    While having a single subiter may sound like an edge case, it happens
    more frequently than one might think. In the most common scenario, you
    can expect a repository to have a single large table that contains most
    of the records and then a set of smaller tables which contain later
    additions to the reftable stack. In this case it is quite likely that we
    exhaust subiters of those smaller tables before exhausting the large
    table.
    
    Special-case this and return records directly from the remaining
    subiter. This results in a sizeable speedup when iterating over 1m refs
    in a repository with a single table:
    
      Benchmark 1: show-ref: single matching ref (revision = HEAD~)
        Time (mean ± σ):     135.4 ms ±   4.4 ms    [User: 132.5 ms, System: 2.8 ms]
        Range (min … max):   131.0 ms … 166.3 ms    1000 runs
    
      Benchmark 2: show-ref: single matching ref (revision = HEAD)
        Time (mean ± σ):     126.3 ms ±   3.9 ms    [User: 123.3 ms, System: 2.8 ms]
        Range (min … max):   122.7 ms … 157.0 ms    1000 runs
    
      Summary
        show-ref: single matching ref (revision = HEAD) ran
          1.07 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~)
    
    Signed-off-by: Patrick Steinhardt <[email protected]>
    Signed-off-by: Junio C Hamano <[email protected]>
    pks-t authored and gitster committed Mar 4, 2024 (commit f8c1a8e)
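
    The special case might look roughly like the sketch below (hypothetical
    names; assumed helpers are only declared): when the queue is empty but one
    sub-iterator still has a pending advance, that sub-iterator is the only
    live one, and its records can be returned directly.

      #include <stddef.h>

      /* Sketch only; hypothetical names. */
      struct sketch_record;

      struct sketch_pq { size_t len; };

      struct sketch_merged_iter {
          struct sketch_pq pq;
          size_t advance_index;      /* (size_t)-1 when no advance is pending */
      };

      /* Assumed helpers, not implemented in this sketch. */
      int sketch_subiter_next(struct sketch_merged_iter *mi, size_t idx,
                              struct sketch_record *out);
      int sketch_next_via_pq(struct sketch_merged_iter *mi, struct sketch_record *out);

      static int sketch_merged_iter_next(struct sketch_merged_iter *mi,
                                         struct sketch_record *out)
      {
          /*
           * An empty queue with a pending advance means the sub-iterator at
           * `advance_index` is the only one left with records. Its order is
           * already the merged order, so skip the add/compare/remove overhead
           * of the priority queue and read from it directly.
           */
          if (!mi->pq.len && mi->advance_index != (size_t)-1)
              return sketch_subiter_next(mi, mi->advance_index, out);

          return sketch_next_via_pq(mi, out);
      }
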
  8. reftable/merged: avoid duplicate pqueue emptiness check

    When calling `merged_iter_next_void()` we first check whether the iter
    has been exhausted already. We already perform this check two levels
    down the stack in `merged_iter_next_entry()` though, which makes this
    check redundant.
    
    Now if this check was there to accelerate the common case it might have
    made sense to keep it. But the iterator being exhausted is rather the
    uncommon case because you can expect most reftable stacks to contain
    more than two refs.
    
    Simplify the code by removing the check. As `merged_iter_next_void()` is
    basically empty except for calling `merged_iter_next()` now, merge these
    two functions. This also results in a tiny speedup when iterating over
    many refs:
    
        Benchmark 1: show-ref: single matching ref (revision = HEAD~)
          Time (mean ± σ):     125.6 ms ±   3.8 ms    [User: 122.7 ms, System: 2.8 ms]
          Range (min … max):   122.4 ms … 153.4 ms    1000 runs
    
        Benchmark 2: show-ref: single matching ref (revision = HEAD)
          Time (mean ± σ):     124.0 ms ±   3.9 ms    [User: 121.1 ms, System: 2.8 ms]
          Range (min … max):   120.1 ms … 156.4 ms    1000 runs
    
        Summary
          show-ref: single matching ref (revision = HEAD) ran
            1.01 ± 0.04 times faster than show-ref: single matching ref (revision = HEAD~)
    
    Signed-off-by: Patrick Steinhardt <[email protected]>
    Signed-off-by: Junio C Hamano <[email protected]>
    pks-t authored and gitster committed Mar 4, 2024 (commit 080f8c4)
  9. reftable/record: reuse refname when decoding

    When decoding a reftable record we will first release the user-provided
    record and then decode the new record into it. This is quite inefficient
    as we basically need to reallocate at least the refname every time.
    
    Refactor the function to start tracking the refname capacity. Like this,
    we can stow away the refname, release, restore and then grow the refname
    to the required number of bytes via `REFTABLE_ALLOC_GROW()`.
    
    This refactoring is safe to do because all functions that assign to
    the refname will first call `reftable_ref_record_release()`, which will
    zero out the complete record after releasing memory.
    
    This change results in a nice speedup when iterating over 1 million
    refs:
    
      Benchmark 1: show-ref: single matching ref (revision = HEAD~)
        Time (mean ± σ):     124.0 ms ±   3.9 ms    [User: 121.1 ms, System: 2.7 ms]
        Range (min … max):   120.4 ms … 152.7 ms    1000 runs
    
      Benchmark 2: show-ref: single matching ref (revision = HEAD)
        Time (mean ± σ):     114.4 ms ±   3.7 ms    [User: 111.5 ms, System: 2.7 ms]
        Range (min … max):   111.0 ms … 152.1 ms    1000 runs
    
      Summary
        show-ref: single matching ref (revision = HEAD) ran
          1.08 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~)
    
    Furthermore, with this change we now perform a mostly constant number of
    allocations when iterating. Before this change:
    
      HEAP SUMMARY:
          in use at exit: 13,603 bytes in 125 blocks
        total heap usage: 1,006,620 allocs, 1,006,495 frees, 25,398,363 bytes allocated
    
    After this change:
    
      HEAP SUMMARY:
          in use at exit: 13,603 bytes in 125 blocks
        total heap usage: 6,623 allocs, 6,498 frees, 509,592 bytes allocated
    
    Signed-off-by: Patrick Steinhardt <[email protected]>
    Signed-off-by: Junio C Hamano <[email protected]>
    pks-t authored and gitster committed Mar 4, 2024 (commit 71d9a2e)
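
    A sketch of the stow/release/restore/grow pattern with hypothetical names
    (the real code works on `reftable_ref_record` and grows the buffer via
    `REFTABLE_ALLOC_GROW()`; this sketch uses plain realloc and omits error
    handling):

      #include <stdlib.h>
      #include <string.h>

      /* Sketch only; hypothetical record type and release function. */
      struct sketch_ref_record {
          char *refname;
          size_t refname_cap;   /* newly tracked capacity of the refname buffer */
          /* ... value, update_index, ... */
      };

      /* Assumed to free all owned members and zero out the record. */
      void sketch_ref_record_release(struct sketch_ref_record *rec);

      static void sketch_decode_refname(struct sketch_ref_record *rec,
                                        const char *name, size_t name_len)
      {
          /* Stow the refname allocation away so releasing the record does not
           * free it ... */
          char *refname = rec->refname;
          size_t refname_cap = rec->refname_cap;
          rec->refname = NULL;
          rec->refname_cap = 0;

          sketch_ref_record_release(rec);

          /* ... then restore it and grow it only when it is too small, instead
           * of reallocating from scratch for every decoded record. */
          rec->refname = refname;
          rec->refname_cap = refname_cap;
          if (rec->refname_cap < name_len + 1) {
              rec->refname_cap = name_len + 1;
              rec->refname = realloc(rec->refname, rec->refname_cap); /* error handling omitted */
          }

          memcpy(rec->refname, name, name_len);
          rec->refname[name_len] = '\0';
      }
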
  10. reftable/record: reuse refname when copying

    Do the same optimization as in the preceding commit, but this time for
    `reftable_record_copy()`. While not as noticeable, it still results in a
    small speedup when iterating over 1 million refs:
    
      Benchmark 1: show-ref: single matching ref (revision = HEAD~)
        Time (mean ± σ):     114.0 ms ±   3.8 ms    [User: 111.1 ms, System: 2.7 ms]
        Range (min … max):   110.9 ms … 144.3 ms    1000 runs
    
      Benchmark 2: show-ref: single matching ref (revision = HEAD)
        Time (mean ± σ):     112.5 ms ±   3.7 ms    [User: 109.5 ms, System: 2.8 ms]
        Range (min … max):   109.2 ms … 140.7 ms    1000 runs
    
      Summary
        show-ref: single matching ref (revision = HEAD) ran
          1.01 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~)
    
    Signed-off-by: Patrick Steinhardt <[email protected]>
    Signed-off-by: Junio C Hamano <[email protected]>
    pks-t authored and gitster committed Mar 4, 2024 (commit 6620f91)
  11. reftable/record: decode keys in place

    When reading a record from a block, we need to decode the record's key.
    As reftable keys are prefix-compressed, meaning they reuse a prefix from
    the preceding record's key, this is a bit more involved than just having
    to copy the relevant bytes: we need to figure out the prefix and suffix
    lengths, copy the prefix from the preceding record and finally copy the
    suffix from the current record.
    
    This is done by passing three buffers to `reftable_decode_key()`: one
    buffer that holds the result, one buffer that holds the last key, and
    one buffer that points to the current record. The final key is then
    assembled by calling `strbuf_add()` twice to copy over the prefix and
    suffix.
    
    Performing two memory copies is inefficient though. And we can indeed do
    better by decoding keys in place. Instead of providing two buffers, the
    caller now passes a single buffer that is already pre-populated with
    the last key. Like this, we only have to call `strbuf_setlen()` to trim
    the buffer to the prefix and then `strbuf_add()` to add the suffix.
    
    This refactoring leads to a noticeable performance bump when iterating
    over 1 million refs:
    
      Benchmark 1: show-ref: single matching ref (revision = HEAD~)
        Time (mean ± σ):     112.2 ms ±   3.9 ms    [User: 109.3 ms, System: 2.8 ms]
        Range (min … max):   109.2 ms … 149.6 ms    1000 runs
    
      Benchmark 2: show-ref: single matching ref (revision = HEAD)
        Time (mean ± σ):     106.0 ms ±   3.5 ms    [User: 103.2 ms, System: 2.7 ms]
        Range (min … max):   103.2 ms … 133.7 ms    1000 runs
    
      Summary
        show-ref: single matching ref (revision = HEAD) ran
          1.06 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~)
    
    Signed-off-by: Patrick Steinhardt <[email protected]>
    Signed-off-by: Junio C Hamano <[email protected]>
    pks-t authored and gitster committed Mar 4, 2024 (commit daf4f43)
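
    In terms of git's `strbuf` API (which the message above references), the
    in-place assembly boils down to something like the following sketch; the
    function name and the surrounding prefix/suffix decoding are hypothetical:

      #include "git-compat-util.h"
      #include "strbuf.h"   /* git's strbuf: strbuf_setlen(), strbuf_add() */

      /*
       * Sketch only: `key` already holds the previous record's key on entry,
       * and `prefix_len` must not exceed its current length.
       */
      static void sketch_decode_key_in_place(struct strbuf *key, size_t prefix_len,
                                             const char *suffix, size_t suffix_len)
      {
          /* Trim the previous key down to the shared prefix ... */
          strbuf_setlen(key, prefix_len);
          /* ... and append only the suffix bytes of the current record. */
          strbuf_add(key, suffix, suffix_len);
      }
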
  12. reftable: allow inlining of a few functions

    We have a few functions which are basically just accessors to
    structures. As those functions are executed inside the hot loop when
    iterating through many refs, the fact that they cannot be inlined is
    costing us some performance.
    
    Move the function definitions into their respective headers so that they
    can be inlined. This results in a performance improvement when iterating
    over 1 million refs:
    
        Benchmark 1: show-ref: single matching ref (revision = HEAD~)
          Time (mean ± σ):     105.9 ms ±   3.6 ms    [User: 103.0 ms, System: 2.8 ms]
          Range (min … max):   103.1 ms … 133.4 ms    1000 runs
    
        Benchmark 2: show-ref: single matching ref (revision = HEAD)
          Time (mean ± σ):     100.7 ms ±   3.4 ms    [User: 97.8 ms, System: 2.8 ms]
          Range (min … max):    97.8 ms … 124.0 ms    1000 runs
    
        Summary
          show-ref: single matching ref (revision = HEAD) ran
            1.05 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~)
    
    Signed-off-by: Patrick Steinhardt <[email protected]>
    Signed-off-by: Junio C Hamano <[email protected]>
    pks-t authored and gitster committed Mar 4, 2024 (commit f1bf54a)
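
    A sketch of the pattern with hypothetical names: defining a small accessor
    as `static inline` in its header lets the compiler inline it at the call
    sites inside the hot iteration loop.

      /* sketch-record.h -- hypothetical header */
      #include <stdint.h>

      struct sketch_record {
          uint8_t type;
          /* ... */
      };

      /* Previously a plain function in sketch-record.c, which the compiler
       * could not inline across translation units. */
      static inline uint8_t sketch_record_type(const struct sketch_record *rec)
      {
          return rec->type;
      }
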
  13. refs/reftable: precompute prefix length

    We're recomputing the prefix length on every iteration of the ref
    iterator. Precompute it for another speedup when iterating over 1
    million refs:
    
        Benchmark 1: show-ref: single matching ref (revision = HEAD~)
          Time (mean ± σ):     100.3 ms ±   3.7 ms    [User: 97.3 ms, System: 2.8 ms]
          Range (min … max):    97.5 ms … 139.7 ms    1000 runs
    
        Benchmark 2: show-ref: single matching ref (revision = HEAD)
          Time (mean ± σ):      95.8 ms ±   3.4 ms    [User: 92.9 ms, System: 2.8 ms]
          Range (min … max):    93.0 ms … 121.9 ms    1000 runs
    
        Summary
          show-ref: single matching ref (revision = HEAD) ran
            1.05 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~)
    
    Signed-off-by: Patrick Steinhardt <[email protected]>
    Signed-off-by: Junio C Hamano <[email protected]>
    pks-t authored and gitster committed Mar 4, 2024 (commit 43f70ea)
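
    Illustrated with hypothetical names (the real change lives in the reftable
    ref iterator): the prefix length is computed once when the iterator is set
    up, rather than via `strlen()` for every yielded ref.

      #include <stddef.h>
      #include <string.h>

      /* Sketch only; hypothetical names. */
      struct sketch_ref_iterator {
          const char *prefix;
          size_t prefix_len;   /* computed once, not on every iteration */
      };

      static void sketch_iterator_init(struct sketch_ref_iterator *it,
                                       const char *prefix)
      {
          it->prefix = prefix;
          it->prefix_len = prefix ? strlen(prefix) : 0;
      }

      /* The per-ref check in the iteration loop now reuses the cached length. */
      static int sketch_ref_has_prefix(const struct sketch_ref_iterator *it,
                                       const char *refname)
      {
          return !it->prefix_len || !strncmp(refname, it->prefix, it->prefix_len);
      }
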