Skip to content
Permalink

Comparing changes

Choose two branches to see what’s changed or to start a new pull request. If you need to, you can also or learn more about diff comparisons.

Open a pull request

Create a new pull request by comparing changes across two branches. If you need to, you can also . Learn more about diff comparisons here.
base repository: git/git
base: 3256584c36f649abb2af58e7b190d3cf674ba56e
Choose a base ref
...
head repository: git/git
compare: 9ccf3e9b22b6843892319b189fd7aed37c451420
Choose a head ref
  • 17 commits
  • 26 files changed
  • 1 contributor

Commits on Mar 12, 2024

  1. config: forbid newline as core.commentChar

    Since we usually look for a comment char while parsing line-oriented
    files, setting core.commentChar to a single newline can confuse our code
    quite a bit. For example, using it with "git commit" causes us to fail
    to recognize any of the template as comments, including it in the config
    message. Which kind of makes sense, since the template content is on its
    own line (so no line can "start" with a newline). In other spots I would
    not be surprised if you can create more mischief (e.g., violating loop
    assumptions) but I didn't dig into it.
    
    Since comment characters are a local preference, to some degree this is
    a case of "if it hurts, don't do it". But given that this would be a
    silly and pointless thing to do, and that it makes it harder to reason
    about code parsing comment lines, let's just forbid it.
    
    There are other cases that are perhaps questionable (e.g., setting the
    comment char to a single space), but they seem to behave reasonably (at
    least a simple "git commit" will correctly identify and strip the
    template lines). So I haven't worried about going on a hunt for every
    stupid thing a user might do to themselves, and just focused on the most
    confusing case.
    
    Signed-off-by: Jeff King <[email protected]>
    Signed-off-by: Junio C Hamano <[email protected]>
    peff authored and gitster committed Mar 12, 2024
    Configuration menu
    Copy the full SHA
    727565e View commit details
    Browse the repository at this point in the history
  2. strbuf: simplify comment-handling in add_lines() helper

    In strbuf_add_commented_lines(), we prepare two strings with potential
    prefixes: one with just the comment char, and one with an additional
    space. In the add_lines() helper, we use the one without the extra space
    for blank lines or lines starting with a tab.
    
    While passing in two separate prefixes to the helper is very flexible,
    it's more flexibility than we actually use (or are likely to use, since
    the rules inside add_lines() only make sense if "prefix2" is a variant
    of "prefix1" without the extra space). And setting up the two strings
    makes refactoring in strbuf_add_commented_lines() awkward.
    
    Instead, let's pass in a single string, and just let add_lines() add the
    extra space to the result as appropriate.
    
    We do still need to pass in a flag to trigger this behavior. The helper
    is shared by strbuf_add_lines(), which passes in a NULL "prefix2" to
    inhibit this extra handling.
    
    Signed-off-by: Jeff King <[email protected]>
    Signed-off-by: Junio C Hamano <[email protected]>
    peff authored and gitster committed Mar 12, 2024
    Configuration menu
    Copy the full SHA
    db7f930 View commit details
    Browse the repository at this point in the history
  3. strbuf: avoid static variables in strbuf_add_commented_lines()

    In strbuf_add_commented_lines(), we have to convert the single-byte
    comment_line_char into a string to pass to add_lines(). We cache the
    created string using a static-local variable. But this makes the
    function non-reentrant, and it's doubtful that this provides any real
    performance benefit given that we know the string always contains a
    single character.
    
    So let's just create it from scratch each time, and to give the compiler
    the maximal opportunity to make it fast we'll ditch the over-complicated
    xsnprintf() and just assign directly into the array.
    
    Signed-off-by: Jeff King <[email protected]>
    Signed-off-by: Junio C Hamano <[email protected]>
    peff authored and gitster committed Mar 12, 2024
    Configuration menu
    Copy the full SHA
    3b45450 View commit details
    Browse the repository at this point in the history
  4. commit: refactor base-case of adjust_comment_line_char()

    When core.commentChar is set to "auto", we check a set of candidate
    characters against the proposed buffer to see which if any can be used
    without ambiguity. But before we do that, we optimize for the common
    case that the default "#" is fine by just seeing if it is present in the
    buffer at all.
    
    The way we do this is a bit subtle, though: we assign the candidate
    character to comment_line_char preemptively, then check if it works, and
    return if it does. The subtle part is that sometimes setting
    comment_line_char is important (after we return, the important outcome
    is the fact that we have set the variable) and sometimes it is useless
    (if our optimization fails, we go on to do the more careful checks and
    eventually assign something else instead).
    
    To make it more clear what is happening (and to make further refactoring
    of comment_line_char easier), let's check our candidate character
    directly, and then assign as part of returning if it worked out.
    
    Signed-off-by: Jeff King <[email protected]>
    Signed-off-by: Junio C Hamano <[email protected]>
    peff authored and gitster committed Mar 12, 2024
    Configuration menu
    Copy the full SHA
    1751e58 View commit details
    Browse the repository at this point in the history
  5. strbuf: avoid shadowing global comment_line_char name

    Several comment-related strbuf functions take a comment_line_char
    parameter. There's also a global comment_line_char variable, which is
    closely related (most callers pass it in as this parameter). Let's avoid
    shadowing the global name. This makes it more obvious that we're not
    using the global value, and it will be especially helpful as we refactor
    the global in future patches (in particular, any macro trickery wouldn't
    work because the preprocessor doesn't respect scope).
    
    We'll use "comment_prefix". That should be descriptive enough, and as a
    bonus is more neutral with respect to the "char" type (since we'll
    eventually swap it out for a string).
    
    Signed-off-by: Jeff King <[email protected]>
    Signed-off-by: Junio C Hamano <[email protected]>
    peff authored and gitster committed Mar 12, 2024
    Configuration menu
    Copy the full SHA
    2786d05 View commit details
    Browse the repository at this point in the history
  6. environment: store comment_line_char as a string

    We'd like to eventually support multi-byte comment prefixes, but the
    comment_line_char variable is referenced in many spots, making the
    transition difficult.
    
    Let's start by storing the character in a NUL-terminated string. That
    will let us switch code over incrementally to the string format, and we
    can easily support the existing code with a macro wrapper (since we'll
    continue to allow only a single-byte prefix, this will behave
    identically).
    
    Once all references to the "char" variable have been converted, we can
    drop it and enable longer strings.
    
    We'll still have to touch all of the spots that create or set the
    variable in this patch, but there are only a few (reading the config,
    and the "auto" character selector).
    
    Signed-off-by: Jeff King <[email protected]>
    Signed-off-by: Junio C Hamano <[email protected]>
    peff authored and gitster committed Mar 12, 2024
    Configuration menu
    Copy the full SHA
    72a7d5d View commit details
    Browse the repository at this point in the history
  7. strbuf: accept a comment string for strbuf_stripspace()

    As part of our transition to multi-byte comment characters, let's take a
    NUL-terminated string pointer for strbuf_stripspace(), rather than a
    single character. We can continue to support its feature of ignoring
    comments by accepting a NULL pointer (as opposed to the current behavior
    of a NUL byte).
    
    All of the callers have to be adjusted, but they can all just pass
    comment_line_str (or NULL).
    
    Inside the function we detect comments by comparing the first byte of a
    line to the comment character. We'll adjust that to use starts_with(),
    which will match multiple bytes (though for now, of course, we still
    only allow a single byte, so it's academic).
    
    Signed-off-by: Jeff King <[email protected]>
    Signed-off-by: Junio C Hamano <[email protected]>
    peff authored and gitster committed Mar 12, 2024
    Configuration menu
    Copy the full SHA
    2982b65 View commit details
    Browse the repository at this point in the history
  8. strbuf: accept a comment string for strbuf_commented_addf()

    As part of our transition to multi-byte comment characters, let's take a
    NUL-terminated string pointer for strbuf_commented_addf() rather than a
    single character.
    
    All of the callers have to be adjusted, but they can just pass
    comment_line_str rather than comment_line_char.
    
    Note that we rely on strbuf_add_commented_lines() under the hood, so
    we'll cheat a bit to squeeze our string into a single character (for now
    the two are equivalent, and we'll address this TODO in the next patch).
    
    Signed-off-by: Jeff King <[email protected]>
    Signed-off-by: Junio C Hamano <[email protected]>
    peff authored and gitster committed Mar 12, 2024
    Configuration menu
    Copy the full SHA
    3a35d96 View commit details
    Browse the repository at this point in the history
  9. strbuf: accept a comment string for strbuf_add_commented_lines()

    As part of our transition to multi-byte comment characters, let's take a
    NUL-terminated string pointer for strbuf_add_commented_lines() rather
    than a single character.
    
    All of the callers have to be adjusted; most can just pass
    comment_line_str rather than comment_line_char.
    
    And now our "cheat" in strbuf_commented_addf() can go away, as we can
    take the full string from it.
    
    Signed-off-by: Jeff King <[email protected]>
    Signed-off-by: Junio C Hamano <[email protected]>
    peff authored and gitster committed Mar 12, 2024
    Configuration menu
    Copy the full SHA
    a1bb146 View commit details
    Browse the repository at this point in the history
  10. prefer comment_line_str to comment_line_char for printing

    As part of our transition to multi-byte comment characters, we should
    use the string variable rather than the historical character variable.
    All of the sites adjusted here are just swapping out "%c" for "%s" in
    format strings, or strbuf_addch() for strbuf_addstr(). The type system
    and printf-attribute give the compiler enough information to make sure
    our formats and variable changes all match (especially important for
    cases where the format string is defined far away from its use, like
    prepare_to_commit() in commit.c).
    
    Signed-off-by: Jeff King <[email protected]>
    Signed-off-by: Junio C Hamano <[email protected]>
    peff authored and gitster committed Mar 12, 2024
    Configuration menu
    Copy the full SHA
    f99e1d9 View commit details
    Browse the repository at this point in the history
  11. find multi-byte comment chars in NUL-terminated strings

    Several parts of the code need to identify lines that begin with the
    comment character, and do so with a simple byte equality check. As part
    of the transition to handling multi-byte characters, we need to match
    all of the bytes. For cases where we are looking in a NUL-terminated
    string, we can just use starts_with(), which checks all of the
    characters in comment_line_str.
    
    Note that we can drop the "line.len" check in wt-status.c's
    read_rebase_todolist(). The starts_with() function handles the case of
    an empty haystack buffer (it will always return false for a non-empty
    prefix).
    
    Signed-off-by: Jeff King <[email protected]>
    Signed-off-by: Junio C Hamano <[email protected]>
    peff authored and gitster committed Mar 12, 2024
    Configuration menu
    Copy the full SHA
    600559b View commit details
    Browse the repository at this point in the history
  12. find multi-byte comment chars in unterminated buffers

    As with the previous patch, we need to swap out single-byte matching for
    something like starts_with() to match all bytes of a multi-byte comment
    character. But for cases where the buffer is not NUL-terminated (and we
    instead have an explicit size or end pointer), it's not safe to use
    starts_with(), as it might walk off the end of the buffer.
    
    Let's introduce a new starts_with_mem() that does the same thing but
    also accepts the length of the "haystack" str and makes sure not to walk
    past it.
    
    Note that in most cases the existing code did not need a length check at
    all, since it was written in a way that knew we had at least one byte
    available (and that was all we checked). So I had to read each one to
    find the appropriate bounds. The one exception is sequencer.c's
    add_commented_lines(), where we can actually get rid of the length
    check. Just like starts_with(), our starts_with_mem() handles an empty
    haystack variable by not matching (assuming a non-empty prefix).
    
    A few notes on the implementation of starts_with_mem():
    
      - it would be equally correct to take an "end" pointer (and indeed,
        many of the callers have this and have to subtract to come up with
        the length). I think taking a ptr/size combo is a more usual
        interface for our codebase, though, and has the added benefit that
        the function signature makes it harder to mix up the three
        parameters.
    
      - we could obviously build starts_with() on top of this by passing
        strlen(str) as the length. But it's possible that starts_with() is a
        relatively hot code path, and it should not pay that penalty (it can
        generally return an answer proportional to the size of the prefix,
        not the whole string).
    
      - it naively feels like xstrncmpz() should be able to do the same
        thing, but that's not quite true. If you pass the length of the
        haystack buffer, then strncmp() finds that a shorter prefix string
        is "less than" than the haystack, even if the haystack starts with
        the prefix. If you pass the length of the prefix, then you risk
        reading past the end of the haystack if it is shorter than the
        prefix. So I think we really do need a new function.
    
    Signed-off-by: Jeff King <[email protected]>
    Signed-off-by: Junio C Hamano <[email protected]>
    peff authored and gitster committed Mar 12, 2024
    Configuration menu
    Copy the full SHA
    2ec225d View commit details
    Browse the repository at this point in the history
  13. sequencer: handle multi-byte comment characters when writing todo list

    We already match multi-byte comment characters in parse_insn_line(),
    thanks to the previous commit, yielding a TODO_COMMENT entry. But in
    todo_list_to_strbuf(), we may call command_to_char() to convert that
    back into something we can output.
    
    We can't just return comment_line_char anymore, since it may require
    multiple bytes. Instead, we'll return "0" for this case, which is the
    same thing we'd return for a command which does not have a single-letter
    abbreviation (e.g., "revert" or "noop"). There is only a single caller
    of command_to_char(), and upon seeing "0" it falls back to outputting
    the full name via command_to_string(). So we can handle TODO_COMMENT
    there, returning the full string.
    
    Note that there are many other callers of command_to_string(), which
    will now behave differently if they pass TODO_COMMENT. But we would not
    expect that to happen; prior to this commit, the function just calls
    die() in this case. And looking at those callers, that makes sense;
    e.g., do_pick_commit() will only be called when servicing a pick
    command, and should never be called for a comment in the first place.
    
    Signed-off-by: Jeff King <[email protected]>
    Signed-off-by: Junio C Hamano <[email protected]>
    peff authored and gitster committed Mar 12, 2024
    Configuration menu
    Copy the full SHA
    7eb35e0 View commit details
    Browse the repository at this point in the history
  14. wt-status: drop custom comment-char stringification

    In wt_longstatus_print_tracking() we may conditionally show a comment
    prefix based on the wt_status->display_comment_prefix flag. We handle
    that by creating a local "comment_line_string" that is either the empty
    string or the comment character followed by a space.
    
    For a single-byte comment, the maximum length of this string is 2 (plus
    a NUL byte). But to handle multi-byte comment characters, it can be
    arbitrarily large. One way to handle this is to just call
    xstrfmt("%s ", comment_line_str), and then free it when we're done.
    
    But we can simplify things further by just conditionally switching
    between our prefix string and an empty string when formatting. We
    couldn't just do that with the previous code, because the comment
    character was a single byte. There's no way to have a "%c" format switch
    between some character and "no character at all". Whereas with "%s" you
    can switch between some string and the empty string. So now that we have
    a comment string and not a comment char, we can just use it directly
    when formatting. Do note that we have to also conditionally add the
    trailing space at the same time.
    
    Signed-off-by: Jeff King <[email protected]>
    Signed-off-by: Junio C Hamano <[email protected]>
    peff authored and gitster committed Mar 12, 2024
    Configuration menu
    Copy the full SHA
    78275b0 View commit details
    Browse the repository at this point in the history
  15. environment: drop comment_line_char compatibility macro

    There is no longer any code which references the single-byte
    comment_line_char. Let's drop it, clearing the way for true multi-byte
    entries in comment_line_str.
    
    It's possible there are topics in flight that have added new references
    to comment_line_char. But we would prefer to fail compilation (and then
    fix it) upon merging with this, rather than have them quietly ignore the
    bytes after the first.
    
    Signed-off-by: Jeff King <[email protected]>
    Signed-off-by: Junio C Hamano <[email protected]>
    peff authored and gitster committed Mar 12, 2024
    Configuration menu
    Copy the full SHA
    103d563 View commit details
    Browse the repository at this point in the history
  16. config: allow multi-byte core.commentChar

    Now that all of the code handles multi-byte comment characters, it's
    safe to allow users to set them.
    
    There is one special case I kept: we still will not allow an empty
    string for the commentChar. While it might make sense in some contexts
    (e.g., output where you don't want any comment prefix), there are plenty
    where it will behave badly (e.g., all of our starts_with() checks will
    indicate that every line is a comment!). It might be reasonable to
    assign some meaningful semantics, but it would probably involve checking
    how each site behaves. In the interim let's forbid it and we can loosen
    things later.
    
    Likewise, the "commentChar cannot be a newline" rule is now extended to
    "it cannot contain a newline" (for the same reason: it can confuse our
    parsing loops).
    
    Since comment_line_str is used in many parts of the code, it's hard to
    cover all possibilities with tests. We can convert the existing
    double-semicolon prefix test to show that "git status" works. And we'll
    give it a more challenging case in t7507, where we confirm that
    git-commit strips out the commit template along with any --verbose text
    when reading the edited commit message back in. That covers the basics,
    though it's possible there could be issues in more exotic spots (e.g.,
    the sequencer todo list uses its own code).
    
    Signed-off-by: Jeff King <[email protected]>
    Signed-off-by: Junio C Hamano <[email protected]>
    peff authored and gitster committed Mar 12, 2024
    Configuration menu
    Copy the full SHA
    8b31147 View commit details
    Browse the repository at this point in the history

Commits on Mar 27, 2024

  1. config: add core.commentString

    The core.commentChar code recently learned to accept more than a
    single ASCII character. But using it is annoying with multiple versions
    of Git, since older ones will reject it outright:
    
        $ git.v2.44.0 -c core.commentchar=foo stripspace -s
        error: core.commentChar should only be one ASCII character
        fatal: unable to parse 'core.commentchar' from command-line config
    
    Let's add an alias core.commentString. That's arguably a better name
    anyway, since we now can handle strings, and it makes it possible to
    have a config that works reasonably with both old and new versions of
    Git (see the example in the documentation).
    
    This is strictly an alias, so there's not much point in adding duplicate
    tests; I added a single one to t0030 that exercises the alias code.
    
    Note also that the error messages for invalid values will now show the
    variable the config parser handed us, and thus will be normalized to
    lowercase (rather than camelcase). A few tests in t0030 are adjusted to
    match.
    
    Signed-off-by: Jeff King <[email protected]>
    Signed-off-by: Junio C Hamano <[email protected]>
    peff authored and gitster committed Mar 27, 2024
    Configuration menu
    Copy the full SHA
    9ccf3e9 View commit details
    Browse the repository at this point in the history