Skip to content

Commit 2ec225d

Browse files
peffgitster
authored andcommitted
find multi-byte comment chars in unterminated buffers
As with the previous patch, we need to swap out single-byte matching for something like starts_with() to match all bytes of a multi-byte comment character. But for cases where the buffer is not NUL-terminated (and we instead have an explicit size or end pointer), it's not safe to use starts_with(), as it might walk off the end of the buffer. Let's introduce a new starts_with_mem() that does the same thing but also accepts the length of the "haystack" str and makes sure not to walk past it. Note that in most cases the existing code did not need a length check at all, since it was written in a way that knew we had at least one byte available (and that was all we checked). So I had to read each one to find the appropriate bounds. The one exception is sequencer.c's add_commented_lines(), where we can actually get rid of the length check. Just like starts_with(), our starts_with_mem() handles an empty haystack variable by not matching (assuming a non-empty prefix). A few notes on the implementation of starts_with_mem(): - it would be equally correct to take an "end" pointer (and indeed, many of the callers have this and have to subtract to come up with the length). I think taking a ptr/size combo is a more usual interface for our codebase, though, and has the added benefit that the function signature makes it harder to mix up the three parameters. - we could obviously build starts_with() on top of this by passing strlen(str) as the length. But it's possible that starts_with() is a relatively hot code path, and it should not pay that penalty (it can generally return an answer proportional to the size of the prefix, not the whole string). - it naively feels like xstrncmpz() should be able to do the same thing, but that's not quite true. If you pass the length of the haystack buffer, then strncmp() finds that a shorter prefix string is "less than" than the haystack, even if the haystack starts with the prefix. If you pass the length of the prefix, then you risk reading past the end of the haystack if it is shorter than the prefix. So I think we really do need a new function. Signed-off-by: Jeff King <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
1 parent 600559b commit 2ec225d

File tree

5 files changed

+18
-5
lines changed

5 files changed

+18
-5
lines changed

commit.c

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1796,7 +1796,8 @@ size_t ignored_log_message_bytes(const char *buf, size_t len)
17961796
else
17971797
next_line++;
17981798

1799-
if (buf[bol] == comment_line_char || buf[bol] == '\n') {
1799+
if (starts_with_mem(buf + bol, cutoff - bol, comment_line_str) ||
1800+
buf[bol] == '\n') {
18001801
/* is this the first of the run of comments? */
18011802
if (!boc)
18021803
boc = bol;

sequencer.c

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1840,7 +1840,7 @@ static int is_fixup_flag(enum todo_command command, unsigned flag)
18401840
static void add_commented_lines(struct strbuf *buf, const void *str, size_t len)
18411841
{
18421842
const char *s = str;
1843-
while (len > 0 && s[0] == comment_line_char) {
1843+
while (starts_with_mem(s, len, comment_line_str)) {
18441844
size_t count;
18451845
const char *n = memchr(s, '\n', len);
18461846
if (!n)
@@ -2562,7 +2562,7 @@ static int parse_insn_line(struct repository *r, struct todo_item *item,
25622562
/* left-trim */
25632563
bol += strspn(bol, " \t");
25642564

2565-
if (bol == eol || *bol == '\r' || *bol == comment_line_char) {
2565+
if (bol == eol || *bol == '\r' || starts_with_mem(bol, eol - bol, comment_line_str)) {
25662566
item->command = TODO_COMMENT;
25672567
item->commit = NULL;
25682568
item->arg_offset = bol - buf;

strbuf.c

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -24,6 +24,17 @@ int istarts_with(const char *str, const char *prefix)
2424
return 0;
2525
}
2626

27+
int starts_with_mem(const char *str, size_t len, const char *prefix)
28+
{
29+
const char *end = str + len;
30+
for (; ; str++, prefix++) {
31+
if (!*prefix)
32+
return 1;
33+
else if (str == end || *str != *prefix)
34+
return 0;
35+
}
36+
}
37+
2738
int skip_to_optional_arg_default(const char *str, const char *prefix,
2839
const char **arg, const char *def)
2940
{

strbuf.h

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -673,6 +673,7 @@ char *xstrfmt(const char *fmt, ...);
673673

674674
int starts_with(const char *str, const char *prefix);
675675
int istarts_with(const char *str, const char *prefix);
676+
int starts_with_mem(const char *str, size_t len, const char *prefix);
676677

677678
/*
678679
* If the string "str" is the same as the string in "prefix", then the "arg"

trailer.c

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -882,7 +882,7 @@ static size_t find_trailer_block_start(const char *buf, size_t len)
882882

883883
/* The first paragraph is the title and cannot be trailers */
884884
for (s = buf; s < buf + len; s = next_line(s)) {
885-
if (s[0] == comment_line_char)
885+
if (starts_with_mem(s, buf + len - s, comment_line_str))
886886
continue;
887887
if (is_blank_line(s))
888888
break;
@@ -902,7 +902,7 @@ static size_t find_trailer_block_start(const char *buf, size_t len)
902902
const char **p;
903903
ssize_t separator_pos;
904904

905-
if (bol[0] == comment_line_char) {
905+
if (starts_with_mem(bol, buf + len - bol, comment_line_str)) {
906906
non_trailer_lines += possible_continuation_lines;
907907
possible_continuation_lines = 0;
908908
continue;

0 commit comments

Comments
 (0)