-
Notifications
You must be signed in to change notification settings - Fork 26.4k
daemon: explicitly allow EINTR during poll() #2002
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
5af3b11
to
a450bdb
Compare
/submit |
Submitted as [email protected] To fetch this version into
To fetch this version to local tag
|
On the Git mailing list, Carlo Marcelo Arenas Belón wrote (reply to this), regarding a450bdb (outdated): This will need the following fixup on top:
---- >8 ----
diff --git a/daemon.c b/daemon.c
index 542e638223..f5f0c426c3 100644
--- a/daemon.c
+++ b/daemon.c
@@ -912,7 +912,7 @@ static void handle(int incoming, struct sockaddr *addr, socklen_t addrlen)
add_child(&cld, addr, addrlen);
}
-static void child_handler(int signo)
+static void child_handler(int signo UNUSED)
{
/*
* Empty handler because systemcalls should get interrupted |
User |
@@ -250,6 +250,13 @@ char *gitdirname(char *); | |||
#define NAME_MAX 255 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
"Carlo Marcelo Arenas Belón via GitGitGadget"
<[email protected]> writes:
> From: =?UTF-8?q?Carlo=20Marcelo=20Arenas=20Bel=C3=B3n?= <[email protected]>
>
> Systems without SA_RESTART where using custom CFLAGS instead of
> the standard header file.
This is not a sentence, isn't it? "where" -> "are"??? I dunno.
> Consolidate that, so it will be easier to use in a future commit.
>
> Signed-off-by: Carlo Marcelo Arenas Belón <[email protected]>
> ---
> compat/posix.h | 7 +++++++
> config.mak.uname | 3 ---
> 2 files changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/compat/posix.h b/compat/posix.h
> index 067a00f33b83..2612a8515897 100644
> --- a/compat/posix.h
> +++ b/compat/posix.h
> @@ -250,6 +250,13 @@ char *gitdirname(char *);
> #define NAME_MAX 255
> #endif
>
> +/* On most systems <signal.h> would have given us this, but
> + * not on some systems (e.g. NonStop, QNX).
> + */
> +#ifndef SA_RESTART
> +#define SA_RESTART 0 /* disabled for sigaction() */
> +#endif
So not just on QNX and NonStop, we have SA_RESTART defined
everywhere. I do not know offhand what the ramifications of this
change is, but we seem to use the symbol in places outside #ifdef
meaning that anybody other than QNX and NonStop that lack SA_RESTART
wouldn't have been able to build Git before this change, and now
they would be. The resulting binary may not work for them at all
yet but that is not any worse than before.
> typedef uintmax_t timestamp_t;
> #define PRItime PRIuMAX
> #define parse_timestamp strtoumax
> diff --git a/config.mak.uname b/config.mak.uname
> index b1c5c4d5e8ed..52160ef5cb07 100644
> --- a/config.mak.uname
> +++ b/config.mak.uname
> @@ -654,8 +654,6 @@ ifeq ($(uname_S),NONSTOP_KERNEL)
> FREAD_READS_DIRECTORIES = UnfortunatelyYes
>
> # Not detected (nor checked for) by './configure'.
> - # We don't have SA_RESTART on NonStop, unfortunalety.
> - COMPAT_CFLAGS += -DSA_RESTART=0
> # Apparently needed in compat/fnmatch/fnmatch.c.
> COMPAT_CFLAGS += -DHAVE_STRING_H=1
> NO_ST_BLOCKS_IN_STRUCT_STAT = YesPlease
> @@ -782,7 +780,6 @@ ifeq ($(uname_S),MINGW)
> endif
> endif
> ifeq ($(uname_S),QNX)
> - COMPAT_CFLAGS += -DSA_RESTART=0
> EXPAT_NEEDS_XMLPARSE_H = YesPlease
> HAVE_STRINGS_H = YesPlease
> NEEDS_SOCKET = YesPlease
On the Git mailing list, Junio C Hamano wrote (reply to this), regarding a450bdb (outdated): "Carlo Marcelo Arenas Belón via GitGitGadget"
<[email protected]> writes:
> -static void child_handler(int signo UNUSED)
> +static void child_handler(int signo)
> {
> /*
> - * Otherwise empty handler because systemcalls will get interrupted
> - * upon signal receipt
> + * Empty handler because systemcalls should get interrupted
> + * upon signal receipt.
> */
> +#ifdef NO_SIGINTERRUPT
> + /* SysV needs the handler to be rearmed */
> + signal(signo, child_handler);
> +#endif
> }
On NO_SIGINTERRUPT systems, signo is UNUSED, isn't it?
Can we abstract this a bit better? #if/#else/#endif sprinkled
everywhere is simply an eyesore.
> static int set_reuse_addr(int sockfd)
> @@ -1118,8 +1122,10 @@ static void socksetup(struct string_list *listen_addr, int listen_port, struct s
>
> static int service_loop(struct socketlist *socklist)
> {
> - struct pollfd *pfd;
> +#ifndef NO_SIGINTERRUPT
> struct sigaction sa;
> +#endif
> + struct pollfd *pfd;
> CALLOC_ARRAY(pfd, socklist->nr);
>
> @@ -1128,14 +1134,22 @@ static int service_loop(struct socketlist *socklist)
> pfd[i].events = POLLIN;
> }
>
> +#ifdef NO_SIGINTERRUPT
> + signal(SIGCHLD, child_handler);
> +#else
> sigemptyset(&sa.sa_mask);
> sa.sa_flags = SA_NOCLDSTOP | SA_RESTART;
> sa.sa_handler = child_handler;
> sigaction(SIGCHLD, &sa, NULL);
> +#endif
>
> for (;;) {
> check_dead_children();
>
> +#ifndef NO_SIGINTERRUPT
> + sa.sa_flags &= ~SA_RESTART;
> + sigaction(SIGCHLD, &sa, NULL);
> +#endif
> if (poll(pfd, socklist->nr, -1) < 0) {
> if (errno != EINTR) {
> logerror("Poll failed, resuming: %s",
> @@ -1144,6 +1158,10 @@ static int service_loop(struct socketlist *socklist)
> }
> continue;
> }
> +#ifndef NO_SIGINTERRUPT
> + sa.sa_flags |= SA_RESTART;
> + sigaction(SIGCHLD, &sa, NULL);
> +#endif |
This patch series was integrated into seen via 8fdef1e. |
User |
9d68ce8
to
b737e03
Compare
/submit |
Submitted as [email protected] To fetch this version into
To fetch this version to local tag
|
On the Git mailing list, Phillip Wood wrote (reply to this): On 25/06/2025 08:35, Carlo Marcelo Arenas Belón via GitGitGadget wrote:
> This series addresses and ambiguity that is at least visible in OpenBSD,
> where zombie proceses would only be cleared after a new connection is
> received.
There is still a race where a child that exits after it has been checked in check_dead_children() but before we call poll() will not be collected until a new connection is received or a child exits while we're polling. If we used the self-pipe trick described on the select(2) man page [1] we would avoid that race and would not need to mess with SA_RESTART and so would not need to introduce USE_NON_POSIX_SIGNAL.
Best Wishes
Phillip
[1] https://www.man7.org/linux/man-pages/man2/select.2.html
> The underlying problem is that when this code was originally introduced,
> SA_RESTART was not widely implemented, and the signal() call usually
> implemented SysV like semantics, at least until it started being
> reimplemented by calling sigaction() internally.
> > Changes since v1
> > * Almost all references to siginterrupt has been removed and a better named
> variable used instead
> * Changes had been anstracted to minimize ifdefs and their introduction
> staged more naturally
> > Carlo Marcelo Arenas Belón (3):
> compat/posix.h: track SA_RESTART fallback
> daemon: use sigaction() to install child_handler()
> daemon: explicitly allow EINTR during poll()
> > Makefile | 6 +++++
> compat/mingw-posix.h | 1 -
> compat/posix.h | 8 +++++++
> config.mak.uname | 7 +++---
> configure.ac | 17 +++++++++++++++
> daemon.c | 52 +++++++++++++++++++++++++++++++++++++++-----
> meson.build | 4 ++++
> 7 files changed, 85 insertions(+), 10 deletions(-)
> > > base-commit: cb3b40381e1d5ee32dde96521ad7cfd68eb308a6
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-2002%2Fcarenas%2Fsiginterrupt-v2
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-2002/carenas/siginterrupt-v2
> Pull-Request: https://github.com/git/git/pull/2002
> > Range-diff vs v1:
> > 1: 2b5a58e53ac ! 1: e82b7425bbc compat/posix.h: track SA_RESTART fallback
> @@ Metadata
> ## Commit message ##
> compat/posix.h: track SA_RESTART fallback
> > - Systems without SA_RESTART where using custom CFLAGS instead of
> - the standard header file.
> + Systems without SA_RESTART are using custom CFLAGS or headers
> + instead of the standard header file.
> > - Consolidate that, so it will be easier to use in a future commit.
> + Correct that, and invent a Makefile variable to track the
> + exceptions which will become handy in the next commits.
> > Signed-off-by: Carlo Marcelo Arenas Belón <[email protected]>
> > + ## Makefile ##
> +@@ Makefile: include shared.mak
> + # when attempting to read from an fopen'ed directory (or even to fopen
> + # it at all).
> + #
> ++# Define USE_NON_POSIX_SIGNAL if don't have support for SA_RESTART or
> ++# prefer to use ANSI C signal() over POSIX sigaction()
> ++#
> + # Define OPEN_RETURNS_EINTR if your open() system call may return EINTR
> + # when a signal is received (as opposed to restarting).
> + #
> +@@ Makefile: ifdef FREAD_READS_DIRECTORIES
> + COMPAT_CFLAGS += -DFREAD_READS_DIRECTORIES
> + COMPAT_OBJS += compat/fopen.o
> + endif
> ++ifdef USE_NON_POSIX_SIGNAL
> ++ COMPAT_CFLAGS += -DUSE_NON_POSIX_SIGNAL
> ++endif
> + ifdef OPEN_RETURNS_EINTR
> + COMPAT_CFLAGS += -DOPEN_RETURNS_EINTR
> + endif
> +
> + ## compat/mingw-posix.h ##
> +@@ compat/mingw-posix.h: struct sigaction {
> + sig_handler_t sa_handler;
> + unsigned sa_flags;
> + };
> +-#define SA_RESTART 0
> +
> + struct itimerval {
> + struct timeval it_value, it_interval;
> +
> ## compat/posix.h ##
> @@ compat/posix.h: char *gitdirname(char *);
> #define NAME_MAX 255
> #endif
> > -+/* On most systems <signal.h> would have given us this, but
> ++/*
> ++ * On most systems <signal.h> would have given us this, but
> + * not on some systems (e.g. NonStop, QNX).
> + */
> +#ifndef SA_RESTART
> -+#define SA_RESTART 0 /* disabled for sigaction() */
> ++# define SA_RESTART 0 /* disabled for sigaction() */
> +#endif
> +
> typedef uintmax_t timestamp_t;
> @@ compat/posix.h: char *gitdirname(char *);
> #define parse_timestamp strtoumax
> > ## config.mak.uname ##
> +@@ config.mak.uname: ifeq ($(uname_S),Windows)
> + NO_STRTOUMAX = YesPlease
> + NO_MKDTEMP = YesPlease
> + NO_INTTYPES_H = YesPlease
> ++ USE_NON_POSIX_SIGNAL = YesPlease
> + CSPRNG_METHOD = rtlgenrandom
> + # VS2015 with UCRT claims that snprintf and friends are C99 compliant,
> + # so we don't need this:
> @@ config.mak.uname: ifeq ($(uname_S),NONSTOP_KERNEL)
> FREAD_READS_DIRECTORIES = UnfortunatelyYes
> > @@ config.mak.uname: ifeq ($(uname_S),NONSTOP_KERNEL)
> # Apparently needed in compat/fnmatch/fnmatch.c.
> COMPAT_CFLAGS += -DHAVE_STRING_H=1
> NO_ST_BLOCKS_IN_STRUCT_STAT = YesPlease
> +@@ config.mak.uname: ifeq ($(uname_S),NONSTOP_KERNEL)
> + NO_MMAP = YesPlease
> + NO_POLL = YesPlease
> + NO_INTPTR_T = UnfortunatelyYes
> ++ USE_NON_POSIX_SIGNAL = UnfortunatelyYes
> + CSPRNG_METHOD = openssl
> + SANE_TOOL_PATH = /usr/coreutils/bin:/usr/local/bin
> + SHELL_PATH = /usr/coreutils/bin/bash
> +@@ config.mak.uname: ifeq ($(uname_S),MINGW)
> + NEEDS_LIBICONV = YesPlease
> + NO_STRTOUMAX = YesPlease
> + NO_MKDTEMP = YesPlease
> ++ USE_NON_POSIX_SIGNAL = YesPlease
> + NO_SVN_TESTS = YesPlease
> +
> + # The builtin FSMonitor requires Named Pipes and Threads on Windows.
> @@ config.mak.uname: ifeq ($(uname_S),MINGW)
> endif
> endif
> @@ config.mak.uname: ifeq ($(uname_S),MINGW)
> EXPAT_NEEDS_XMLPARSE_H = YesPlease
> HAVE_STRINGS_H = YesPlease
> NEEDS_SOCKET = YesPlease
> +@@ config.mak.uname: ifeq ($(uname_S),QNX)
> + NO_PTHREADS = YesPlease
> + NO_STRCASESTR = YesPlease
> + NO_STRLCPY = YesPlease
> ++ USE_NON_POSIX_SIGNAL = UnfortunatelyYes
> + endif
> +
> + ## configure.ac ##
> +@@ configure.ac: fi
> + GIT_CONF_SUBST([ICONV_OMITS_BOM])
> + fi
> +
> ++# Define USE_NON_POSIX_SIGNAL if don't have support for SA_RESTART or
> ++# prefer using ANSI C signal() over POSIX sigaction()
> ++
> ++AC_CACHE_CHECK([whether SA_RESTART is supported], [ac_cv_siginterrupt], [
> ++ AC_COMPILE_IFELSE(
> ++ [AC_LANG_PROGRAM([#include <signal.h>], [[
> ++ #ifdef SA_RESTART
> ++ #endif
> ++ siginterrupt(SIGCHLD, 1)
> ++ ]])],[ac_cv_siginterrupt=yes],[
> ++ ac_cv_siginterrupt=no
> ++ USE_NON_POSIX_SIGNAL=UnfortunatelyYes
> ++ ]
> ++ )
> ++])
> ++GIT_CONF_SUBST([USE_NON_POSIX_SIGNAL])
> ++
> + ## Checks for typedefs, structures, and compiler characteristics.
> + AC_MSG_NOTICE([CHECKS for typedefs, structures, and compiler characteristics])
> + #
> +
> + ## meson.build ##
> +@@ meson.build: else
> + build_options_config.set('NO_EXPAT', '1')
> + endif
> +
> ++if compiler.get_define('SA_RESTART', prefix: '#include <signal.h>') == ''
> ++ libgit_c_args += '-DUSE_NON_POSIX_SIGNAL'
> ++endif
> ++
> + if not compiler.has_header('sys/select.h')
> + libgit_c_args += '-DNO_SYS_SELECT_H'
> + endif
> 2: 2e8c4643a60 ! 2: 05d945aa1e5 daemon: use sigaction() to install child_handler()
> @@ Commit message
> In a future change, the flags used for processing SIGCHLD will need to
> be updated, which is only possible by using sigaction().
> > - Replace the call, which hs the added benefit of using BSD semantics
> - reliably and therefore not needing the rearming call.
> + Factor out the call to set the signal handler and use sigaction instead
> + of signal for the systems that allow that, which has the added benefit
> + of using BSD semantics reliably and therefore not needing the rearming
> + call.
> > Signed-off-by: Carlo Marcelo Arenas Belón <[email protected]>
> > ## daemon.c ##
> -@@ daemon.c: static void child_handler(int signo UNUSED)
> +@@ daemon.c: static void handle(int incoming, struct sockaddr *addr, socklen_t addrlen)
> + add_child(&cld, addr, addrlen);
> + }
> +
> +-static void child_handler(int signo UNUSED)
> ++static void child_handler(int signo MAYBE_UNUSED)
> + {
> /*
> - * Otherwise empty handler because systemcalls will get interrupted
> - * upon signal receipt
> +- * Otherwise empty handler because systemcalls will get interrupted
> +- * upon signal receipt
> - * SysV needs the handler to be rearmed
> ++ * Otherwise empty handler because systemcalls should get interrupted
> ++ * upon signal receipt.
> */
> - signal(SIGCHLD, child_handler);
> ++#ifdef USE_NON_POSIX_SIGNAL
> ++ /*
> ++ * SysV needs the handler to be rearmed, but this is known
> ++ * to trigger infinite recursion crashes at least in AIX.
> ++ */
> ++ signal(signo, child_handler);
> ++#endif
> }
> > static int set_reuse_addr(int sockfd)
> @@ daemon.c: static void socksetup(struct string_list *listen_addr, int listen_port, struct s
> + }
> + }
> +
> ++#ifndef USE_NON_POSIX_SIGNAL
> ++
> ++static void set_signal_handler(struct sigaction *psa)
> ++{
> ++ sigemptyset(&psa->sa_mask);
> ++ psa->sa_flags = SA_NOCLDSTOP | SA_RESTART;
> ++ psa->sa_handler = child_handler;
> ++ sigaction(SIGCHLD, psa, NULL);
> ++}
> ++
> ++#else
> ++
> ++static void set_signal_handler(struct sigaction *psa UNUSED)
> ++{
> ++ signal(SIGCHLD, child_handler);
> ++}
> ++
> static int service_loop(struct socketlist *socklist)
> {
> - struct pollfd *pfd;
> + struct sigaction sa;
> + struct pollfd *pfd;
> > CALLOC_ARRAY(pfd, socklist->nr);
> -
> @@ daemon.c: static int service_loop(struct socketlist *socklist)
> pfd[i].events = POLLIN;
> }
> > - signal(SIGCHLD, child_handler);
> -+ sigemptyset(&sa.sa_mask);
> -+ sa.sa_flags = SA_NOCLDSTOP | SA_RESTART;
> -+ sa.sa_handler = child_handler;
> -+ sigaction(SIGCHLD, &sa, NULL);
> ++ set_signal_handler(&sa);
> > for (;;) {
> check_dead_children();
> 3: a450bdb0066 ! 3: b737e0389df daemon: explicitly allow EINTR during poll()
> @@ Commit message
> might not return with -1 and set errno to EINTR when a signal is
> received.
> > - Since the logic to reap zombie childs relies om those interruptions
> + Since the logic to reap zombie childs relies on those interruptions
> make sure to explicitly disable SA_RESTART around this function.
> > - Add a Makefile flag for portability to systems that don't have the
> - functionality to change those flags or where it is not needed.
> -
> Signed-off-by: Carlo Marcelo Arenas Belón <[email protected]>
> > - ## Makefile ##
> -@@ Makefile: include shared.mak
> - # Define NO_PREAD if you have a problem with pread() system call (e.g.
> - # cygwin1.dll before v1.5.22).
> - #
> -+# Define NO_SIGINTERRUPT if you don't have siginterrupt() or SA_RESTART
> -+# or if your signal(SIGCHLD) implementation doesn't set SA_RESTART.
> -+#
> - # Define NO_SETITIMER if you don't have setitimer()
> - #
> - # Define NO_STRUCT_ITIMERVAL if you don't have struct itimerval
> -@@ Makefile: ifdef NO_PREAD
> - COMPAT_CFLAGS += -DNO_PREAD
> - COMPAT_OBJS += compat/pread.o
> - endif
> -+ifdef NO_SIGINTERRUPT
> -+ COMPAT_CFLAGS += -DNO_SIGINTERRUPT
> -+endif
> - ifdef NO_FAST_WORKING_DIRECTORY
> - BASIC_CFLAGS += -DNO_FAST_WORKING_DIRECTORY
> - endif
> -
> - ## config.mak.uname ##
> -@@ config.mak.uname: ifeq ($(uname_S),Windows)
> - NO_STRTOUMAX = YesPlease
> - NO_MKDTEMP = YesPlease
> - NO_INTTYPES_H = YesPlease
> -+ NO_SIGINTERRUPT = YesPlease
> - CSPRNG_METHOD = rtlgenrandom
> - # VS2015 with UCRT claims that snprintf and friends are C99 compliant,
> - # so we don't need this:
> -@@ config.mak.uname: ifeq ($(uname_S),NONSTOP_KERNEL)
> - NO_PREAD = YesPlease
> - NO_MMAP = YesPlease
> - NO_POLL = YesPlease
> -+ NO_SIGINTERRUPT = UnfortunatelyYes
> - NO_INTPTR_T = UnfortunatelyYes
> - CSPRNG_METHOD = openssl
> - SANE_TOOL_PATH = /usr/coreutils/bin:/usr/local/bin
> -@@ config.mak.uname: ifeq ($(uname_S),MINGW)
> - NEEDS_LIBICONV = YesPlease
> - NO_STRTOUMAX = YesPlease
> - NO_MKDTEMP = YesPlease
> -+ NO_SIGINTERRUPT = YesPlease
> - NO_SVN_TESTS = YesPlease
> -
> - # The builtin FSMonitor requires Named Pipes and Threads on Windows.
> -@@ config.mak.uname: ifeq ($(uname_S),QNX)
> - NO_PTHREADS = YesPlease
> - NO_STRCASESTR = YesPlease
> - NO_STRLCPY = YesPlease
> -+ NO_SIGINTERRUPT = UnfortunatelyYes
> - endif
> -
> - ## configure.ac ##
> -@@ configure.ac: GIT_CHECK_FUNC(getdelim,
> - [HAVE_GETDELIM=])
> - GIT_CONF_SUBST([HAVE_GETDELIM])
> - #
> -+# Define NO_SIGINTERRUPT if you don't have siginterrupt.
> -+GIT_CHECK_FUNC(siginterrupt,
> -+[NO_SIGINTERRUPT=],
> -+[NO_SIGINTERRUPT=YesPlease])
> -+GIT_CONF_SUBST([NO_SIGINTERRUPT])
> - #
> - # Define NO_MMAP if you want to avoid mmap.
> - #
> -
> ## daemon.c ##
> -@@ daemon.c: static void handle(int incoming, struct sockaddr *addr, socklen_t addrlen)
> - add_child(&cld, addr, addrlen);
> +@@ daemon.c: static void set_signal_handler(struct sigaction *psa)
> + sigaction(SIGCHLD, psa, NULL);
> }
> > --static void child_handler(int signo UNUSED)
> -+static void child_handler(int signo)
> - {
> - /*
> -- * Otherwise empty handler because systemcalls will get interrupted
> -- * upon signal receipt
> -+ * Empty handler because systemcalls should get interrupted
> -+ * upon signal receipt.
> - */
> -+#ifdef NO_SIGINTERRUPT
> -+ /* SysV needs the handler to be rearmed */
> -+ signal(signo, child_handler);
> -+#endif
> ++static void set_sa_restart(struct sigaction *psa, int enable)
> ++{
> ++ if (enable)
> ++ psa->sa_flags |= SA_RESTART;
> ++ else
> ++ psa->sa_flags &= ~SA_RESTART;
> ++ sigaction(SIGCHLD, psa, NULL);
> ++}
> ++
> + #else
> +
> + static void set_signal_handler(struct sigaction *psa UNUSED)
> +@@ daemon.c: static void set_signal_handler(struct sigaction *psa UNUSED)
> + signal(SIGCHLD, child_handler);
> }
> > - static int set_reuse_addr(int sockfd)
> -@@ daemon.c: static void socksetup(struct string_list *listen_addr, int listen_port, struct s
> -
> ++static void set_sa_restart(struct sigaction *psa UNUSED, int enable UNUSED)
> ++{
> ++}
> ++
> ++#endif
> ++
> static int service_loop(struct socketlist *socklist)
> {
> -- struct pollfd *pfd;
> -+#ifndef NO_SIGINTERRUPT
> struct sigaction sa;
> -+#endif
> -+ struct pollfd *pfd;
> -
> - CALLOC_ARRAY(pfd, socklist->nr);
> -
> @@ daemon.c: static int service_loop(struct socketlist *socklist)
> - pfd[i].events = POLLIN;
> - }
> -
> -+#ifdef NO_SIGINTERRUPT
> -+ signal(SIGCHLD, child_handler);
> -+#else
> - sigemptyset(&sa.sa_mask);
> - sa.sa_flags = SA_NOCLDSTOP | SA_RESTART;
> - sa.sa_handler = child_handler;
> - sigaction(SIGCHLD, &sa, NULL);
> -+#endif
> -
> for (;;) {
> check_dead_children();
> > -+#ifndef NO_SIGINTERRUPT
> -+ sa.sa_flags &= ~SA_RESTART;
> -+ sigaction(SIGCHLD, &sa, NULL);
> -+#endif
> ++ set_sa_restart(&sa, 0);
> if (poll(pfd, socklist->nr, -1) < 0) {
> if (errno != EINTR) {
> logerror("Poll failed, resuming: %s",
> @@ daemon.c: static int service_loop(struct socketlist *socklist)
> }
> continue;
> }
> -+#ifndef NO_SIGINTERRUPT
> -+ sa.sa_flags |= SA_RESTART;
> -+ sigaction(SIGCHLD, &sa, NULL);
> -+#endif
> ++ set_sa_restart(&sa, 1);
> > for (size_t i = 0; i < socklist->nr; i++) {
> if (pfd[i].revents & POLLIN) {
> -
> - ## meson.build ##
> -@@ meson.build: checkfuncs = {
> - 'setenv' : ['setenv.c'],
> - 'mkdtemp' : ['mkdtemp.c'],
> - 'initgroups' : [],
> -+ 'siginterrupt' : [],
> - 'strtoumax' : ['strtoumax.c', 'strtoimax.c'],
> - 'pread' : ['pread.c'],
> - }
> |
User |
daemon.c
Outdated
@@ -912,14 +912,19 @@ static void handle(int incoming, struct sockaddr *addr, socklen_t addrlen) | |||
add_child(&cld, addr, addrlen); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
"Carlo Marcelo Arenas Belón via GitGitGadget"
<[email protected]> writes:
> From: =?UTF-8?q?Carlo=20Marcelo=20Arenas=20Bel=C3=B3n?= <[email protected]>
>
> In a future change, the flags used for processing SIGCHLD will need to
> be updated, which is only possible by using sigaction().
>
> Factor out the call to set the signal handler and use sigaction instead
> of signal for the systems that allow that, which has the added benefit
> of using BSD semantics reliably and therefore not needing the rearming
> call.
>
> Signed-off-by: Carlo Marcelo Arenas Belón <[email protected]>
> ---
> daemon.c | 35 +++++++++++++++++++++++++++++------
> 1 file changed, 29 insertions(+), 6 deletions(-)
>
> diff --git a/daemon.c b/daemon.c
> index d1be61fd5789..8133bd902157 100644
> --- a/daemon.c
> +++ b/daemon.c
> @@ -912,14 +912,19 @@ static void handle(int incoming, struct sockaddr *addr, socklen_t addrlen)
> add_child(&cld, addr, addrlen);
> }
>
> -static void child_handler(int signo UNUSED)
> +static void child_handler(int signo MAYBE_UNUSED)
> {
> /*
> - * Otherwise empty handler because systemcalls will get interrupted
> - * upon signal receipt
> - * SysV needs the handler to be rearmed
> + * Otherwise empty handler because systemcalls should get interrupted
> + * upon signal receipt.
> */
> - signal(SIGCHLD, child_handler);
> +#ifdef USE_NON_POSIX_SIGNAL
> + /*
> + * SysV needs the handler to be rearmed, but this is known
> + * to trigger infinite recursion crashes at least in AIX.
> + */
> + signal(signo, child_handler);
> +#endif
> }
>
> static int set_reuse_addr(int sockfd)
> @@ -1118,8 +1123,26 @@ static void socksetup(struct string_list *listen_addr, int listen_port, struct s
> }
> }
>
> +#ifndef USE_NON_POSIX_SIGNAL
Does this #ifndef block with #else lack its #endif probably before
the service_loop() is defined ...
> +static void set_signal_handler(struct sigaction *psa)
> +{
> + sigemptyset(&psa->sa_mask);
> + psa->sa_flags = SA_NOCLDSTOP | SA_RESTART;
> + psa->sa_handler = child_handler;
> + sigaction(SIGCHLD, psa, NULL);
> +}
> +
> +#else
> +
> +static void set_signal_handler(struct sigaction *psa UNUSED)
> +{
> + signal(SIGCHLD, child_handler);
> +}
... around here?
I am wondering if it would make it even cleaner to also define
rearm_signal_handler() in this "if sigaction() can be used, do this,
otherwise do that" block, and move the whole thing a bit earlier in
this file. That way, the primary code paths do not have to see much
of the #ifdef conditionals. With IPV6 related #ifdef noise already
contaminating the file, it may not be a huge deal, but these crufts
tend to build up unless we tightly control them with discipline.
> static int service_loop(struct socketlist *socklist)
> {
> + struct sigaction sa;
> struct pollfd *pfd;
>
> CALLOC_ARRAY(pfd, socklist->nr);
> @@ -1129,7 +1152,7 @@ static int service_loop(struct socketlist *socklist)
> pfd[i].events = POLLIN;
> }
>
> - signal(SIGCHLD, child_handler);
> + set_signal_handler(&sa);
>
> for (;;) {
> check_dead_children();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
Junio C Hamano <[email protected]> writes:
> Does this #ifndef block with #else lack its #endif probably before
> the service_loop() is defined ...
> ...
> ... around here?
>
> I am wondering if it would make it even cleaner to also define
> rearm_signal_handler() in this "if sigaction() can be used, do this,
> otherwise do that" block, and move the whole thing a bit earlier in
> this file. That way, the primary code paths do not have to see much
> of the #ifdef conditionals. With IPV6 related #ifdef noise already
> contaminating the file, it may not be a huge deal, but these crufts
> tend to build up unless we tightly control them with discipline.
Applying this on top of your series would illustrate what I meant.
I didn't spend enough braincycles on the naming issues for the
conditional compilation so I left it as USE_NON_POSIX_SIGNAL even
though I know it has to be fixed before we move forward.
Thanks.
daemon.c | 81 +++++++++++++++++++++++++++++++++-------------------------------
1 file changed, 42 insertions(+), 39 deletions(-)
diff --git c/daemon.c w/daemon.c
index 01337fcfed..8a371518b8 100644
--- c/daemon.c
+++ w/daemon.c
@@ -912,19 +912,54 @@ static void handle(int incoming, struct sockaddr *addr, socklen_t addrlen)
add_child(&cld, addr, addrlen);
}
-static void child_handler(int signo MAYBE_UNUSED)
+#ifndef USE_NON_POSIX_SIGNAL
+
+static void set_signal_handler(struct sigaction *psa, void (*child_handler)(int))
+{
+ sigemptyset(&psa->sa_mask);
+ psa->sa_flags = SA_NOCLDSTOP | SA_RESTART;
+ psa->sa_handler = child_handler;
+ sigaction(SIGCHLD, psa, NULL);
+}
+
+static void rearm_signal_handler(int signo UNUSED, void (*child_handler)(int) UNUSED)
+{
+}
+
+static void set_sa_restart(struct sigaction *psa, int enable)
+{
+ if (enable)
+ psa->sa_flags |= SA_RESTART;
+ else
+ psa->sa_flags &= ~SA_RESTART;
+ sigaction(SIGCHLD, psa, NULL);
+}
+
+#else
+
+static void set_signal_handler(struct sigaction *psa UNUSED, void (*child_handler)(int))
+{
+ signal(SIGCHLD, child_handler);
+}
+
+static void rearm_signal_handler(int signo, void (*child_handler)(int))
{
- /*
- * Otherwise empty handler because systemcalls should get interrupted
- * upon signal receipt.
- */
-#ifdef USE_NON_POSIX_SIGNAL
/*
* SysV needs the handler to be rearmed, but this is known
* to trigger infinite recursion crashes at least in AIX.
*/
signal(signo, child_handler);
+}
+
+static void set_sa_restart(struct sigaction *psa UNUSED, int enable UNUSED)
+{
+}
+
#endif
+
+static void child_handler(int signo)
+{
+ rearm_signal_handler(signo, child_handler);
}
static int set_reuse_addr(int sockfd)
@@ -1123,38 +1158,6 @@ static void socksetup(struct string_list *listen_addr, int listen_port, struct s
}
}
-#ifndef USE_NON_POSIX_SIGNAL
-
-static void set_signal_handler(struct sigaction *psa)
-{
- sigemptyset(&psa->sa_mask);
- psa->sa_flags = SA_NOCLDSTOP | SA_RESTART;
- psa->sa_handler = child_handler;
- sigaction(SIGCHLD, psa, NULL);
-}
-
-static void set_sa_restart(struct sigaction *psa, int enable)
-{
- if (enable)
- psa->sa_flags |= SA_RESTART;
- else
- psa->sa_flags &= ~SA_RESTART;
- sigaction(SIGCHLD, psa, NULL);
-}
-
-#else
-
-static void set_signal_handler(struct sigaction *psa UNUSED)
-{
- signal(SIGCHLD, child_handler);
-}
-
-static void set_sa_restart(struct sigaction *psa UNUSED, int enable UNUSED)
-{
-}
-
-#endif
-
static int service_loop(struct socketlist *socklist)
{
struct sigaction sa;
@@ -1167,7 +1170,7 @@ static int service_loop(struct socketlist *socklist)
pfd[i].events = POLLIN;
}
- set_signal_handler(&sa);
+ set_signal_handler(&sa, child_handler);
for (;;) {
check_dead_children();
On the Git mailing list, Junio C Hamano wrote (reply to this): "Carlo Marcelo Arenas Belón via GitGitGadget"
<[email protected]> writes:
> +@@ Makefile: include shared.mak
> + # when attempting to read from an fopen'ed directory (or even to fopen
> + # it at all).
> + #
> ++# Define USE_NON_POSIX_SIGNAL if don't have support for SA_RESTART or
> ++# prefer to use ANSI C signal() over POSIX sigaction()
> ++#
> ...
> ++ifdef USE_NON_POSIX_SIGNAL
> ++ COMPAT_CFLAGS += -DUSE_NON_POSIX_SIGNAL
> ++endif
The new symbol sounds like "POSIX does not have signal(2) but on
this platform we have a usable signal(2), so we use it here", but I
do not think that it is what we want to say (as POSIX inherits this
from ANSI C anyway). More importantly, this "USE_X" sounds as if we
allow builders to set it and magically we stop using sigaction(2),
which is not what is going on. We have tons of calls to both
signal(2) and sigaction(2), and we turn calls to signal(2) we have
in daemon.c to sigaction(2) but on some platforms their sigaction(2)
cannot do what we ask it to do, so we are stuck with signal(2) on
these platforms only for these calls in daemon.c. It may be obvious
to those who develop and review this series, but not for anybody else.
Isn't the situation more like:
We use sigaction(2) everywhere and have been happy with it in
our code, but this topic discovered that on some platforms,
their sigaction(2) does not do XYZ that everybody else's
sigaction(2) does, so on them we need to fall back on the plain
old signal(2) on selected code paths that we need XYZ out of the
signal handling interface.
What is this XYZ that describes the characteristics of
signal/sigaction implementation on these platforms? A name
constructed more like SIGACTION_LACKS_XYZ (hence we have to resort
to signal), possibly with a more appropriate verb than "lack", would
be less confusing.
I think the topic is moving in the right direction with cleaner code
than the previous round. Thanks for investing time in it. |
On the Git mailing list, Junio C Hamano wrote (reply to this): Phillip Wood <[email protected]> writes:
> On 25/06/2025 08:35, Carlo Marcelo Arenas Belón via GitGitGadget wrote:
>> This series addresses and ambiguity that is at least visible in OpenBSD,
>> where zombie proceses would only be cleared after a new connection is
>> received.
>
> There is still a race where a child that exits after it has been
> checked in check_dead_children() but before we call poll() will not be
> collected until a new connection is received or a child exits while
> we're polling. If we used the self-pipe trick described on the
> select(2) man page [1] we would avoid that race and would not need to
> mess with SA_RESTART and so would not need to introduce
> USE_NON_POSIX_SIGNAL.
>
> Best Wishes
>
> Phillip
>
> [1] https://www.man7.org/linux/man-pages/man2/select.2.html
The principle should apply equally to poll-based service loop, I
presume.
Thanks. |
On the Git mailing list, Phillip Wood wrote (reply to this): On 25/06/2025 17:24, Junio C Hamano wrote:
> Phillip Wood <[email protected]> writes:
> >> On 25/06/2025 08:35, Carlo Marcelo Arenas Belón via GitGitGadget wrote:
>>> This series addresses and ambiguity that is at least visible in OpenBSD,
>>> where zombie proceses would only be cleared after a new connection is
>>> received.
>>
>> There is still a race where a child that exits after it has been
>> checked in check_dead_children() but before we call poll() will not be
>> collected until a new connection is received or a child exits while
>> we're polling. If we used the self-pipe trick described on the
>> select(2) man page [1] we would avoid that race and would not need to
>> mess with SA_RESTART and so would not need to introduce
>> USE_NON_POSIX_SIGNAL.
>>
>> Best Wishes
>>
>> Phillip
>>
>> [1] https://www.man7.org/linux/man-pages/man2/select.2.html
> > The principle should apply equally to poll-based service loop, I
> presume.
Yes, you create a pipe, add the read end to the set of file descriptors monitored by poll() and write to the other end of the pipe when a signal is received.
Thanks
Phillip
> Thanks.
|
This patch series was integrated into seen via e56a707. |
@@ -37,6 +37,9 @@ include shared.mak | |||
# when attempting to read from an fopen'ed directory (or even to fopen |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
"Carlo Marcelo Arenas Belón via GitGitGadget"
<[email protected]> writes:
> +# Define USE_NON_POSIX_SIGNAL if don't have support for SA_RESTART or
> +# prefer to use ANSI C signal() over POSIX sigaction()
> +#
I may or may not have mentioned this, but this is not helpful enough
for folks, as there is no clue for users to decide if they "prefer".
We need to state how they need to decide between the use of
"signal()" and "sigaction()" in the affected codepath, especially
when they have both.
If their system lacks sigaction() at all, the decision may be
trivial, but the conditional compilation you are trying to achieve
in daemon.c is _not_ like "my system has both, so I can use either
one and the resulting binary works just fine". Rather, "Even though
my platform has both, the sigaction() my system has is not quite
right in such and such way and I have to use signal(), unlike
everybody else, unfortunately".
It may cause us to rethink the name USE_NON_POSIX_SIGNAL; I though
I've already discussed this point in my response to the cover
letter.
Thanks.
Systems without SA_RESTART are using custom CFLAGS or headers instead of the standard header file. Correct that, and invent a Makefile variable to track the exceptions which will become handy in the next commits. Signed-off-by: Carlo Marcelo Arenas Belón <[email protected]>
A future change will start using sigaction to setup a SIGCHLD signal handler. The current code uses signal() which returns SIG_ERR (but doesn't seem to set errno) so instruct sigaction() to do the same. Signed-off-by: Carlo Marcelo Arenas Belón <[email protected]>
In a future change, the flags used for processing SIGCHLD will need to be updated, which is only possible by using sigaction(). Replace signal() with an equivalent invocation of sigaction(), which has the added benefit of using BSD semantics reliably and therefore not needing the rearming call in the signal handler. Signed-off-by: Carlo Marcelo Arenas Belón <[email protected]>
If the setup for the SIGCHLD signal handler sets SA_RESTART, poll() might not return with -1 and set errno to EINTR when a signal is received. Since the logic to reap zombie childs relies on those interruptions make sure to explicitly disable SA_RESTART around this function. Signed-off-by: Carlo Marcelo Arenas Belón <[email protected]>
/submit |
Submitted as [email protected] To fetch this version into
To fetch this version to local tag
|
On the Git mailing list, Carlo Marcelo Arenas Belón wrote (reply to this): On Wed, Jun 25, 2025 at 09:07:11AM -0800, Junio C Hamano wrote:
> "Carlo Marcelo Arenas Belón via GitGitGadget"
> <[email protected]> writes:
>
> > +@@ Makefile: include shared.mak
> > + # when attempting to read from an fopen'ed directory (or even to fopen
> > + # it at all).
> > + #
> > ++# Define USE_NON_POSIX_SIGNAL if don't have support for SA_RESTART or
> > ++# prefer to use ANSI C signal() over POSIX sigaction()
> > ++#
> > ...
> > ++ifdef USE_NON_POSIX_SIGNAL
> > ++ COMPAT_CFLAGS += -DUSE_NON_POSIX_SIGNAL
> > ++endif
>
> The new symbol sounds like "POSIX does not have signal(2) but on
> this platform we have a usable signal(2), so we use it here", but I
> do not think that it is what we want to say (as POSIX inherits this
> from ANSI C anyway). More importantly, this "USE_X" sounds as if we
> allow builders to set it and magically we stop using sigaction(2),
> which is not what is going on. We have tons of calls to both
> signal(2) and sigaction(2), and we turn calls to signal(2) we have
> in daemon.c to sigaction(2) but on some platforms their sigaction(2)
> cannot do what we ask it to do, so we are stuck with signal(2) on
> these platforms only for these calls in daemon.c. It may be obvious
> to those who develop and review this series, but not for anybody else.
>
> Isn't the situation more like:
>
> We use sigaction(2) everywhere and have been happy with it in
> our code, but this topic discovered that on some platforms,
> their sigaction(2) does not do XYZ that everybody else's
> sigaction(2) does, so on them we need to fall back on the plain
> old signal(2) on selected code paths that we need XYZ out of the
> signal handling interface.
>
> What is this XYZ that describes the characteristics of
> signal/sigaction implementation on these platforms? A name
> constructed more like SIGACTION_LACKS_XYZ (hence we have to resort
> to signal), possibly with a more appropriate verb than "lack", would
> be less confusing.
sigaction(2) doesn't have any issues and it is indeed a better option
every time as it behaves the same in all platforms that have it.
the problems we have come from our codebase:
1) we have a hack to workaround the lack of support for SA_RESTART in
some platforms, which sets it to 0 and allow us to compile as if it
works.
2) Windows doesn't have sigaction() and their compability version needs
updating if we would use it for this new code.
Keeping the fallback to use signal() isn't really needed, and is indeed
problematic, as it crashes in AIX and probably other SysIII derived
systems because of the hack Chris described for non BSD signal().
Carlo |
@@ -95,7 +95,7 @@ struct sigaction { | |||
sig_handler_t sa_handler; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Phillip Wood wrote (reply to this):
On 26/06/2025 09:53, Carlo Marcelo Arenas Belón via GitGitGadget wrote:
> From: =?UTF-8?q?Carlo=20Marcelo=20Arenas=20Bel=C3=B3n?= <[email protected]>
> > A future change will start using sigaction to setup a SIGCHLD signal
> handler.
> > The current code uses signal() which returns SIG_ERR (but doesn't
> seem to set errno) so instruct sigaction() to do the same.
Why are we returning -1 below instead of SIG_ERR if we want the behavior to match?
Thanks
Phillip
> > Signed-off-by: Carlo Marcelo Arenas Belón <[email protected]>
> ---
> compat/mingw-posix.h | 1 +
> compat/mingw.c | 4 +++-
> 2 files changed, 4 insertions(+), 1 deletion(-)
> > diff --git a/compat/mingw-posix.h b/compat/mingw-posix.h
> index a0dca756d104..847d558c9b2d 100644
> --- a/compat/mingw-posix.h
> +++ b/compat/mingw-posix.h
> @@ -95,6 +95,7 @@ struct sigaction {
> sig_handler_t sa_handler;
> unsigned sa_flags;
> };
> +#define SA_NOCLDSTOP 1
> > struct itimerval {
> struct timeval it_value, it_interval;
> diff --git a/compat/mingw.c b/compat/mingw.c
> index 8a9972a1ca19..5d69ae32f4b9 100644
> --- a/compat/mingw.c
> +++ b/compat/mingw.c
> @@ -2561,7 +2561,9 @@ int setitimer(int type UNUSED, struct itimerval *in, struct itimerval *out)
> > int sigaction(int sig, struct sigaction *in, struct sigaction *out)
> {
> - if (sig != SIGALRM)
> + if (sig == SIGCHLD)
> + return -1;
> + else if (sig != SIGALRM)
> return errno = EINVAL,
> error("sigaction only implemented for SIGALRM");
> if (out)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Carlo Marcelo Arenas Belón wrote (reply to this):
On Thu, Jun 26, 2025 at 01:52:47PM -0800, Phillip Wood wrote:
> On 26/06/2025 09:53, Carlo Marcelo Arenas Belón via GitGitGadget wrote:
> > From: =?UTF-8?q?Carlo=20Marcelo=20Arenas=20Bel=C3=B3n?= <[email protected]>
> >
> > A future change will start using sigaction to setup a SIGCHLD signal
> > handler.
> >
> > The current code uses signal() which returns SIG_ERR (but doesn't
> > seem to set errno) so instruct sigaction() to do the same.
>
> Why are we returning -1 below instead of SIG_ERR if we want the behavior to
> match?
By "match", I mean that in both cases we will get an error return value
and errno won't be set to EINVAL (which is what POSIX requires)
In our codebase since we ignore the return code anyway, it wouldn't make
a difference, either way.
signal() returns a pointer, and sigaction() returns and int, so you can
have the later be literally SIG_ERR, eventhough it will be ironically
equivalent it casted into an int.
Csrlo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Phillip Wood wrote (reply to this):
On 26/06/2025 14:15, Carlo Marcelo Arenas Belón wrote:
> On Thu, Jun 26, 2025 at 01:52:47PM -0800, Phillip Wood wrote:
>> On 26/06/2025 09:53, Carlo Marcelo Arenas Belón via GitGitGadget wrote:
>>> From: =?UTF-8?q?Carlo=20Marcelo=20Arenas=20Bel=C3=B3n?= <[email protected]>
>>>
>>> A future change will start using sigaction to setup a SIGCHLD signal
>>> handler.
>>>
>>> The current code uses signal() which returns SIG_ERR (but doesn't
>>> seem to set errno) so instruct sigaction() to do the same.
>>
>> Why are we returning -1 below instead of SIG_ERR if we want the behavior to
>> match?
> > By "match", I mean that in both cases we will get an error return value
> and errno won't be set to EINVAL (which is what POSIX requires)
> > In our codebase since we ignore the return code anyway, it wouldn't make
> a difference, either way.
> > signal() returns a pointer, and sigaction() returns and int, Oh right, I'd forgotten they have different return types. I think we should probably be setting errno = EINVAL before returning -1 to match what this function does with other signals it does not support - just because our current callers ignore the return value doesn't mean that future callers will and they might want check errno if they see the function fail.
Thanks
Phillip
> so you can
> have the later be literally SIG_ERR, eventhough it will be ironically
> equivalent it casted into an int.
> > Csrlo
>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Carlo Marcelo Arenas Belón wrote (reply to this):
On Thu, Jun 26, 2025 at 02:56:22PM -0800, Phillip Wood wrote:
> On 26/06/2025 14:15, Carlo Marcelo Arenas Belón wrote:
> > On Thu, Jun 26, 2025 at 01:52:47PM -0800, Phillip Wood wrote:
> > > On 26/06/2025 09:53, Carlo Marcelo Arenas Belón via GitGitGadget wrote:
> > > > From: =?UTF-8?q?Carlo=20Marcelo=20Arenas=20Bel=C3=B3n?= <[email protected]>
> > > >
> > > > A future change will start using sigaction to setup a SIGCHLD signal
> > > > handler.
> > > >
> > > > The current code uses signal() which returns SIG_ERR (but doesn't
> > > > seem to set errno) so instruct sigaction() to do the same.
> > >
> > > Why are we returning -1 below instead of SIG_ERR if we want the behavior to
> > > match?
> >
> > By "match", I mean that in both cases we will get an error return value
> > and errno won't be set to EINVAL (which is what POSIX requires)
> >
> > In our codebase since we ignore the return code anyway, it wouldn't make
> > a difference, either way.
> >
> > signal() returns a pointer, and sigaction() returns and int,
>
> Oh right, I'd forgotten they have different return types. I think we should
> probably be setting errno = EINVAL before returning -1 to match what this
> function does with other signals it does not support - just because our
> current callers ignore the return value doesn't mean that future callers
> will and they might want check errno if they see the function fail.
I agree, and indeed had to triple check and change my implementation after I
confirmed that signal(SIGCHLD) does not change errno on Windows (not our
version, neither of the windows libc or mingw, even if it is documented[1] to
do so.
It might be because the signal number itself is bogus (there is none for
SIGCHLD in their headers, and git uses their own numbers in compat), but
either way, I would rather be consistent with signal() at least originally.
Carlo
[1] https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/signal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, [email protected] wrote (reply to this):
On 26/06/2025 15:58, Carlo Marcelo Arenas Belón wrote:
> On Thu, Jun 26, 2025 at 02:56:22PM -0800, Phillip Wood wrote:
>> On 26/06/2025 14:15, Carlo Marcelo Arenas Belón wrote:
>>> On Thu, Jun 26, 2025 at 01:52:47PM -0800, Phillip Wood wrote:
>>>> On 26/06/2025 09:53, Carlo Marcelo Arenas Belón via GitGitGadget wrote:
>>>>> From: =?UTF-8?q?Carlo=20Marcelo=20Arenas=20Bel=C3=B3n?= <[email protected]>
>>>>>
>>>>> A future change will start using sigaction to setup a SIGCHLD signal
>>>>> handler.
>>>>>
>>>>> The current code uses signal() which returns SIG_ERR (but doesn't
>>>>> seem to set errno) so instruct sigaction() to do the same.
>>>>
>>>> Why are we returning -1 below instead of SIG_ERR if we want the behavior to
>>>> match?
>>>
>>> By "match", I mean that in both cases we will get an error return value
>>> and errno won't be set to EINVAL (which is what POSIX requires)
>>>
>>> In our codebase since we ignore the return code anyway, it wouldn't make
>>> a difference, either way.
>>>
>>> signal() returns a pointer, and sigaction() returns and int,
>>
>> Oh right, I'd forgotten they have different return types. I think we should
>> probably be setting errno = EINVAL before returning -1 to match what this
>> function does with other signals it does not support - just because our
>> current callers ignore the return value doesn't mean that future callers
>> will and they might want check errno if they see the function fail.
> > I agree, and indeed had to triple check and change my implementation after I
> confirmed that signal(SIGCHLD) does not change errno on Windows (not our
> version, neither of the windows libc or mingw, even if it is documented[1] to
> do so.
> > It might be because the signal number itself is bogus (there is none for
> SIGCHLD in their headers, and git uses their own numbers in compat), but
> either way, I would rather be consistent with signal() at least originally.
I'm not sure I understand - don't we want the sigaction() wrapper to behave like sigaction() would?
Thanks
Phillip
> Carlo
> > [1] https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/signal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Carlo Marcelo Arenas Belón wrote (reply to this):
On Thu, Jun 26, 2025 at 04:19:11PM -0800, [email protected] wrote:
> On 26/06/2025 15:58, Carlo Marcelo Arenas Belón wrote:
> > On Thu, Jun 26, 2025 at 02:56:22PM -0800, Phillip Wood wrote:
> > > On 26/06/2025 14:15, Carlo Marcelo Arenas Belón wrote:
> > > > On Thu, Jun 26, 2025 at 01:52:47PM -0800, Phillip Wood wrote:
> > > > > On 26/06/2025 09:53, Carlo Marcelo Arenas Belón via GitGitGadget wrote:
> > > > > > From: =?UTF-8?q?Carlo=20Marcelo=20Arenas=20Bel=C3=B3n?= <[email protected]>
> > > > > >
> > > > > > A future change will start using sigaction to setup a SIGCHLD signal
> > > > > > handler.
> > > > > >
> > > > > > The current code uses signal() which returns SIG_ERR (but doesn't
> > > > > > seem to set errno) so instruct sigaction() to do the same.
> > > > >
> > > > > Why are we returning -1 below instead of SIG_ERR if we want the behavior to
> > > > > match?
> > > >
> > > > By "match", I mean that in both cases we will get an error return value
> > > > and errno won't be set to EINVAL (which is what POSIX requires)
> > > >
> > > > In our codebase since we ignore the return code anyway, it wouldn't make
> > > > a difference, either way.
> > > >
> > > > signal() returns a pointer, and sigaction() returns and int,
> > >
> > > Oh right, I'd forgotten they have different return types. I think we should
> > > probably be setting errno = EINVAL before returning -1 to match what this
> > > function does with other signals it does not support - just because our
> > > current callers ignore the return value doesn't mean that future callers
> > > will and they might want check errno if they see the function fail.
> >
> > I agree, and indeed had to triple check and change my implementation after I
> > confirmed that signal(SIGCHLD) does not change errno on Windows (not our
> > version, neither of the windows libc or mingw, even if it is documented[1] to
> > do so.
> >
> > It might be because the signal number itself is bogus (there is none for
> > SIGCHLD in their headers, and git uses their own numbers in compat), but
> > either way, I would rather be consistent with signal() at least originally.
>
> I'm not sure I understand - don't we want the sigaction() wrapper to behave
> like sigaction() would?
for at least the first iteration, I would rather have sigaction() behave
like signal(), so that the change doesn't introduce any regressions.
eventually, sigaction() should behave like any other sigaction(), but to
do so, I suspect the windows emulation might need to change their SIGCHLD
to match.
just confirmed with MSVC that if I use 20 instead of 17, errno gets updated
just like the documentation says it should.
Carlo
PS. Maybe we should get dscho involved?
> >
> > [1] https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/signal
@@ -915,11 +915,9 @@ static void handle(int incoming, struct sockaddr *addr, socklen_t addrlen) | |||
static void child_handler(int signo UNUSED) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Phillip Wood wrote (reply to this):
Hi Carlo
On 26/06/2025 09:53, Carlo Marcelo Arenas Belón via GitGitGadget wrote:
> From: =?UTF-8?q?Carlo=20Marcelo=20Arenas=20Bel=C3=B3n?= <[email protected]>
> > In a future change, the flags used for processing SIGCHLD will need to
> be updated, which is only possible by using sigaction().
Only because you have chosen to use SA_RESTART here. I think it would be better to drop that and instead say something like
POSIX leaves several aspects of the behavior of signal() up to the implementer. It is unspecified whether long running syscalls are restarted after an interrupt is received. The event loop in "git daemon" assumes that poll() will return with EINTR if a child exits while it is polling but if poll() is restarted after an interrupt is received that does not happen which can lead to a build up of uncollected zombie processes.
It is also unspecified whether the handler is reset after an interrupt is received. In order to work correctly on operating systems such as Solaris where the handler for SIGCLD is reset when a signal is received "git daemon" calls signal() from the SIGCLD handler to re-establish a handler for that signal. Unfortunately this causes infinite recursion on other operating systems such as AIX.
Both of these problems can be addressed by using sigaction() instead of signal() which has much stricter semantics and so, by setting the appropriate flags, we can rely on poll() being interrupted and the SIGCLD handler not being reset. This change means that all long running system calls could fail with EINTR not just pall() but rest of the event loop code is designed to gracefully handle that.
After this change there is still a race where a child that exits after it has been checked in check_dead_children() but before we call poll() will not be collected until a new connection is received or a child exits while we're polling. We could fix this by using the "self-pipe trick" but do not because ....
Then we can drop patches 1 and 4.
Best Wishes
Phillip
> > Replace signal() with an equivalent invocation of sigaction(), which
> has the added benefit of using BSD semantics reliably and therefore
> not needing the rearming call in the signal handler.
> > Signed-off-by: Carlo Marcelo Arenas Belón <[email protected]>
> ---
> daemon.c | 12 +++++++-----
> 1 file changed, 7 insertions(+), 5 deletions(-)
> > diff --git a/daemon.c b/daemon.c
> index d1be61fd5789..155b2e180167 100644
> --- a/daemon.c
> +++ b/daemon.c
> @@ -915,11 +915,9 @@ static void handle(int incoming, struct sockaddr *addr, socklen_t addrlen)
> static void child_handler(int signo UNUSED)
> {
> /*
> - * Otherwise empty handler because systemcalls will get interrupted
> - * upon signal receipt
> - * SysV needs the handler to be rearmed
> + * Otherwise empty handler because systemcalls should get interrupted
> + * upon signal receipt.
> */
> - signal(SIGCHLD, child_handler);
> }
> > static int set_reuse_addr(int sockfd)
> @@ -1120,6 +1118,7 @@ static void socksetup(struct string_list *listen_addr, int listen_port, struct s
> > static int service_loop(struct socketlist *socklist)
> {
> + struct sigaction sa;
> struct pollfd *pfd;
> > CALLOC_ARRAY(pfd, socklist->nr);
> @@ -1129,7 +1128,10 @@ static int service_loop(struct socketlist *socklist)
> pfd[i].events = POLLIN;
> }
> > - signal(SIGCHLD, child_handler);
> + sigemptyset(&sa.sa_mask);
> + sa.sa_flags = SA_NOCLDSTOP | SA_RESTART;
> + sa.sa_handler = child_handler;
> + sigaction(SIGCHLD, &sa, NULL);
> > for (;;) {
> check_dead_children();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
Phillip Wood <[email protected]> writes:
> Only because you have chosen to use SA_RESTART here. I think it would
> be better to drop that and instead say something like
>
>
> POSIX leaves several aspects of the behavior of signal() up to the
> implementer. It is unspecified whether long running syscalls are
> restarted after an interrupt is received. The event loop in "git
> daemon" assumes that poll() will return with EINTR if a child exits
> while it is polling but if poll() is restarted after an interrupt is
> received that does not happen which can lead to a build up of
> uncollected zombie processes.
>
> It is also unspecified whether the handler is reset after an interrupt
> is received. In order to work correctly on operating systems such as
> Solaris where the handler for SIGCLD is reset when a signal is
> received "git daemon" calls signal() from the SIGCLD handler to
> re-establish a handler for that signal. Unfortunately this causes
> infinite recursion on other operating systems such as AIX.
>
> Both of these problems can be addressed by using sigaction() instead
> of signal() which has much stricter semantics and so, by setting the
> appropriate flags, we can rely on poll() being interrupted and the
> SIGCLD handler not being reset. This change means that all long
> running system calls could fail with EINTR not just pall() but rest of
> the event loop code is designed to gracefully handle that.
>
> After this change there is still a race where a child that exits after
> it has been checked in check_dead_children() but before we call poll()
> will not be collected until a new connection is received or a child
> exits while we're polling. We could fix this by using the "self-pipe
> trick" but do not because ....
>
>
> Then we can drop patches 1 and 4.
Thanks for a suggestion. I really like the "everybody in these code
paths are prepared to receive and handle EINTR just fine, so we do
not have to do the SA_RESTART" observation very much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Carlo Marcelo Arenas Belón wrote (reply to this):
On Thu, Jun 26, 2025 at 08:33:29AM -0800, Junio C Hamano wrote:
> Phillip Wood <[email protected]> writes:
>
> > Only because you have chosen to use SA_RESTART here. I think it would
> > be better to drop that and instead say something like
> >
> >
> > POSIX leaves several aspects of the behavior of signal() up to the
> > implementer. It is unspecified whether long running syscalls are
> > restarted after an interrupt is received. The event loop in "git
> > daemon" assumes that poll() will return with EINTR if a child exits
> > while it is polling but if poll() is restarted after an interrupt is
> > received that does not happen which can lead to a build up of
> > uncollected zombie processes.
> >
> > It is also unspecified whether the handler is reset after an interrupt
> > is received. In order to work correctly on operating systems such as
> > Solaris where the handler for SIGCLD is reset when a signal is
> > received "git daemon" calls signal() from the SIGCLD handler to
> > re-establish a handler for that signal. Unfortunately this causes
> > infinite recursion on other operating systems such as AIX.
> >
> > Both of these problems can be addressed by using sigaction() instead
> > of signal() which has much stricter semantics and so, by setting the
> > appropriate flags, we can rely on poll() being interrupted and the
> > SIGCLD handler not being reset. This change means that all long
> > running system calls could fail with EINTR not just pall() but rest of
> > the event loop code is designed to gracefully handle that.
> >
> > After this change there is still a race where a child that exits after
> > it has been checked in check_dead_children() but before we call poll()
> > will not be collected until a new connection is received or a child
> > exits while we're polling. We could fix this by using the "self-pipe
> > trick" but do not because ....
> >
> >
> > Then we can drop patches 1 and 4.
>
> Thanks for a suggestion. I really like the "everybody in these code
> paths are prepared to receive and handle EINTR just fine, so we do
> not have to do the SA_RESTART" observation very much.
Except that is unlikely to be true, as the code has changed a lot on
those 20 years (including that it now uses run_command) and that we
had been effectively using SA_RESTART under the covers for most of
that time because signal() behaviour changed. An obvious bug:
(ex: [email protected])
I think using SA_RESTART by default might be safer, specially considering
that patch 4 already exist and it is not that complicated now that we
have 2.
Carlo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Phillip Wood wrote (reply to this):
Hi Carlo
On 26/06/2025 17:36, Carlo Marcelo Arenas Belón wrote:
> On Thu, Jun 26, 2025 at 08:33:29AM -0800, Junio C Hamano wrote:
>> Phillip Wood <[email protected]> writes:
>>
>> Thanks for a suggestion. I really like the "everybody in these code
>> paths are prepared to receive and handle EINTR just fine, so we do
>> not have to do the SA_RESTART" observation very much.
> > Except that is unlikely to be true, as the code has changed a lot on
> those 20 years (including that it now uses run_command) and that we
> had been effectively using SA_RESTART under the covers for most of
> that time because signal() behaviour changed. An obvious bug:
> (ex: [email protected])
I don't think the syscall handling has changed much since 695605b508 (git-daemon: Simplify dead-children reaping logic, 2008-08-14) when we started relying on poll() returning EINTR to collect children that exit while we are waiting for a new connection. In any case my comment was based on reading the code. You're right that we started using run_command() after 695605b508 but when I read the code in run-command.c it looked to me like it is pretty careful in the way it handles signals and EINTR - what part of the code are you concerned about? As git supports operating systems that do not provide SA_RESTART we should to make sure we handle EINTR correctly in this code anyway. As far as the patch you link to goes, it may not have been handling EINTR as efficiently as you'd like but it would still accept the client on the next iteration of the loop which would happen immediately because the connection would be pending when we call poll().
> I think using SA_RESTART by default might be safer,
The counter argument to this is that it is masking bugs that happen on platforms without SA_RESTART.
Best Wishes
Phillip
@@ -1118,8 +1116,28 @@ static void socksetup(struct string_list *listen_addr, int listen_port, struct s | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Phillip Wood wrote (reply to this):
On 26/06/2025 09:53, Carlo Marcelo Arenas Belón via GitGitGadget wrote:
> From: =?UTF-8?q?Carlo=20Marcelo=20Arenas=20Bel=C3=B3n?= <[email protected]>
> > If the setup for the SIGCHLD signal handler sets SA_RESTART, poll()
> might not return with -1 and set errno to EINTR when a signal is
> received.
> > Since the logic to reap zombie childs relies on those interruptions
> make sure to explicitly disable SA_RESTART around this function.
Given "git daemon" seems to have been designed with the assumption that long running system calls can fail with EINTR I don't know why this series is trying to use SA_RESTART in the first place.
Thanks
Phillip
> Signed-off-by: Carlo Marcelo Arenas Belón <[email protected]>
> ---
> daemon.c | 21 +++++++++++++++++++++
> 1 file changed, 21 insertions(+)
> > diff --git a/daemon.c b/daemon.c
> index 155b2e180167..7e29c03e313f 100644
> --- a/daemon.c
> +++ b/daemon.c
> @@ -1116,6 +1116,25 @@ static void socksetup(struct string_list *listen_addr, int listen_port, struct s
> }
> }
> > +#ifndef NO_RESTARTABLE_SIGNALS
> +
> +static void set_sa_restart(struct sigaction *psa, int enable)
> +{
> + if (enable)
> + psa->sa_flags |= SA_RESTART;
> + else
> + psa->sa_flags &= ~SA_RESTART;
> + sigaction(SIGCHLD, psa, NULL);
> +}
> +
> +#else
> +
> +static void set_sa_restart(struct sigaction *psa UNUSED, int enable UNUSED)
> +{
> +}
> +
> +#endif
> +
> static int service_loop(struct socketlist *socklist)
> {
> struct sigaction sa;
> @@ -1136,6 +1155,7 @@ static int service_loop(struct socketlist *socklist)
> for (;;) {
> check_dead_children();
> > + set_sa_restart(&sa, 0);
> if (poll(pfd, socklist->nr, -1) < 0) {
> if (errno != EINTR) {
> logerror("Poll failed, resuming: %s",
> @@ -1144,6 +1164,7 @@ static int service_loop(struct socketlist *socklist)
> }
> continue;
> }
> + set_sa_restart(&sa, 1);
> > for (size_t i = 0; i < socklist->nr; i++) {
> if (pfd[i].revents & POLLIN) {
On the Git mailing list, Carlo Marcelo Arenas Belón wrote (reply to this): There has always been a small race condition between the time a
children status is checked, and the arrival of a SIGCHLD that
would alert us of their demise, hopefully interrupt poll().
To close the gap, add the reading side of a pipe to the poll and
use the signal handler to write to it for each child.
Suggested=by: Phillip Wood <[email protected]>
Signed-off-by: Carlo Marcelo Arenas Belón <[email protected]>
---
Implements the "self pipe" trick Hillip suggested.
I tried to make self healing and optional, as I also presume the race
condition it is meant to address is very unlikely.
An obvious disadvantage (at least of this implementation), is that it
actually doubles the number of events that need to be handled for each
children process on most cases (ex: when `poll()` gets interrupted)
I suspect that if fixing that last race condition is so important with
the current foundation, it might be better to reintroduce some sort of
timeout to poll(), so that they will be cleared periodically.
I had a prototype (only the bare minimum) that I thought was more
efficient and that would instead remove completely the need for a
signal handler which I would post (only for RFC) later.
daemon.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++---------
1 file changed, 46 insertions(+), 9 deletions(-)
diff --git a/daemon.c b/daemon.c
index d1be61fd57..d3b9421575 100644
--- a/daemon.c
+++ b/daemon.c
@@ -912,14 +912,17 @@ static void handle(int incoming, struct sockaddr *addr, socklen_t addrlen)
add_child(&cld, addr, addrlen);
}
-static void child_handler(int signo UNUSED)
+int poll_pipe[2] = { -1, -1 };
+
+static void child_handler(int signo)
{
/*
- * Otherwise empty handler because systemcalls will get interrupted
- * upon signal receipt
* SysV needs the handler to be rearmed
*/
signal(SIGCHLD, child_handler);
+
+ if (poll_pipe[1] >= 0)
+ write(poll_pipe[1], &signo, 1);
}
static int set_reuse_addr(int sockfd)
@@ -1121,20 +1124,43 @@ static void socksetup(struct string_list *listen_addr, int listen_port, struct s
static int service_loop(struct socketlist *socklist)
{
struct pollfd *pfd;
+ unsigned long nfds = 1 + socklist->nr;
+
+ ALLOC_ARRAY(pfd, nfds);
+ if (!pipe(poll_pipe)) {
+ for (int i = 0; i < 2; i++) {
+ int flags;
+
+ flags = fcntl(poll_pipe[i], F_GETFD, 0);
+ if (flags >= 0)
+ fcntl(poll_pipe[i], F_SETFD, flags | FD_CLOEXEC);
+
+ flags = fcntl(poll_pipe[i], F_GETFL, 0);
+ if (flags < 0 || fcntl(poll_pipe[i], F_SETFL,
+ flags | O_NONBLOCK) == -1) {
+ close(poll_pipe[0]);
+ close(poll_pipe[1]);
+ poll_pipe[0] = poll_pipe[1] = -1;
+ break;
+ }
+ }
+ }
+ pfd[0].fd = poll_pipe[0];
+ pfd[0].events = POLLIN;
- CALLOC_ARRAY(pfd, socklist->nr);
-
- for (size_t i = 0; i < socklist->nr; i++) {
- pfd[i].fd = socklist->list[i];
+ for (size_t i = 1; i < nfds; i++) {
+ pfd[i].fd = socklist->list[i - 1];
pfd[i].events = POLLIN;
}
signal(SIGCHLD, child_handler);
for (;;) {
+ int nevents, scratch;
+
check_dead_children();
- if (poll(pfd, socklist->nr, -1) < 0) {
+ if ((nevents = poll(pfd, nfds, -1)) <= 0) {
if (errno != EINTR) {
logerror("Poll failed, resuming: %s",
strerror(errno));
@@ -1143,7 +1169,17 @@ static int service_loop(struct socketlist *socklist)
continue;
}
- for (size_t i = 0; i < socklist->nr; i++) {
+ if ((pfd[0].revents & POLLIN) && pfd[0].fd != -1) {
+ if (nevents == 1 && read(pfd[0].fd, &scratch, 1) > 0)
+ continue;
+ else if (!read(pfd[0].fd, &scratch, 1)) {
+ close(pfd[0].fd);
+ pfd[0].fd = -1;
+ }
+ nevents--;
+ }
+
+ for (size_t i = 1; nevents && i < nfds; i++) {
if (pfd[i].revents & POLLIN) {
union {
struct sockaddr sa;
@@ -1165,6 +1201,7 @@ static int service_loop(struct socketlist *socklist)
}
}
handle(incoming, &ss.sa, sslen);
+ nevents--;
}
}
}
--
2.50.0.131.gcf6f63ea6b.dirty
|
On the Git mailing list, Carlo Marcelo Arenas Belón wrote (reply to this): This is a minimum POC of what could be done to remove signals when
handling children in git-daemon.
The advantage is that it is at least portable to *NIX systems, unlike
using a Linux specific API like pidfd which also allows collecting
status from children through fds.
It has to do some innefficient management of the fds array that is
given to poll, and wouldn't work as a general solution, since the
children could close the pipe() used for tracking unintentionally,
but since we control both sides in this case, that might not be a
problem.
Carlo Marcelo Arenas Belón (2):
run-command: add a pipe() write end to childs
daemon: poor man's pidfd like POC
daemon.c | 74 ++++++++++++++++++++++++++++++++++++---------------
run-command.c | 3 +++
run-command.h | 2 ++
3 files changed, 57 insertions(+), 22 deletions(-)
--
2.39.5 (Apple Git-154)
|
On the Git mailing list, Carlo Marcelo Arenas Belón wrote (reply to this): Instruct the child to close the read side of a pipe provided by
the parent for notification that would be used in a future patch.
Signed-off-by: Carlo Marcelo Arenas Belón <[email protected]>
---
run-command.c | 3 +++
run-command.h | 2 ++
2 files changed, 5 insertions(+)
diff --git a/run-command.c b/run-command.c
index 8833b23367..0156d3a01d 100644
--- a/run-command.c
+++ b/run-command.c
@@ -796,6 +796,9 @@ int start_command(struct child_process *cmd)
set_error_routine(child_error_fn);
set_warn_routine(child_warn_fn);
+ if (cmd->parent_ipc_in != -1)
+ close(cmd->parent_ipc_in);
+
close(notify_pipe[0]);
set_cloexec(notify_pipe[1]);
child_notifier = notify_pipe[1];
diff --git a/run-command.h b/run-command.h
index 0df25e445f..d731b400b9 100644
--- a/run-command.h
+++ b/run-command.h
@@ -81,6 +81,7 @@ struct child_process {
const char *trace2_child_class;
const char *trace2_hook_name;
+ int parent_ipc_in;
/*
* Using .in, .out, .err:
* - Specify 0 for no redirections. No new file descriptor is allocated.
@@ -147,6 +148,7 @@ struct child_process {
#define CHILD_PROCESS_INIT { \
.args = STRVEC_INIT, \
.env = STRVEC_INIT, \
+ .parent_ipc_in = -1, \
}
/**
--
2.39.5 (Apple Git-154)
|
On the Git mailing list, Carlo Marcelo Arenas Belón wrote (reply to this): Create a pipe for each child, and let the write side leak into
the children, then add the read side to the poll and wait for
the pipe to close (presumed to happen when the child is done).
There is no need to change the children code, since the OS (at
least in *NIX) will close all fds on termination.
This implementation is suboptimal, as it shouldn't need to
scan all children once a notification is received, but does so
to minimize changes.
Signed-off-by: Carlo Marcelo Arenas Belón <[email protected]>
---
daemon.c | 74 +++++++++++++++++++++++++++++++++++++++-----------------
1 file changed, 52 insertions(+), 22 deletions(-)
diff --git a/daemon.c b/daemon.c
index d1be61fd57..c78556be97 100644
--- a/daemon.c
+++ b/daemon.c
@@ -811,9 +811,19 @@ static struct child {
struct sockaddr_storage address;
} *firstborn;
-static void add_child(struct child_process *cld, struct sockaddr *addr, socklen_t addrlen)
+static void add_child(nfds_t *nfds, struct pollfd **pfd,
+ struct child_process *cld,
+ struct sockaddr *addr, socklen_t addrlen)
{
struct child *newborn, **cradle;
+ struct pollfd newfd;
+ nfds_t size = *nfds;
+
+ newfd.fd = cld->parent_ipc_in;
+ newfd.events = POLL_HUP;
+ *nfds = size + 1;
+ REALLOC_ARRAY(*pfd, *nfds);
+ memcpy(*pfd + size, &newfd, sizeof(newfd));
CALLOC_ARRAY(newborn, 1);
live_children++;
@@ -846,10 +856,11 @@ static void kill_some_child(void)
}
}
-static void check_dead_children(void)
+static void check_dead_children(nfds_t *nfds, struct pollfd *pfd, nfds_t base)
{
int status;
pid_t pid;
+ nfds_t size = *nfds;
struct child **cradle, *blanket;
for (cradle = &firstborn; (blanket = *cradle);)
@@ -859,6 +870,20 @@ static void check_dead_children(void)
dead = " (with error)";
loginfo("[%"PRIuMAX"] Disconnected%s", (uintmax_t)pid, dead);
+ for (nfds_t i = base; i < size; i++)
+ if (pfd[i].fd == blanket->cld.parent_ipc_in) {
+ close(blanket->cld.parent_ipc_in);
+ size--;
+ if (size - i) {
+ MOVE_ARRAY(pfd + i,
+ pfd + i + 1,
+ size - i);
+ }
+
+ *nfds = size;
+ break;
+ }
+
/* remove the child */
*cradle = blanket->next;
live_children--;
@@ -869,14 +894,16 @@ static void check_dead_children(void)
}
static struct strvec cld_argv = STRVEC_INIT;
-static void handle(int incoming, struct sockaddr *addr, socklen_t addrlen)
+static void handle(int incoming, nfds_t *nfds, struct pollfd **pfd, nfds_t base,
+ struct sockaddr *addr, socklen_t addrlen)
{
+ int ipc[2] = { -1, -1 };
struct child_process cld = CHILD_PROCESS_INIT;
if (max_connections && live_children >= max_connections) {
kill_some_child();
sleep(1); /* give it some time to die */
- check_dead_children();
+ check_dead_children(nfds, *pfd, base);
if (live_children >= max_connections) {
close(incoming);
logerror("Too many children, dropping connection");
@@ -902,24 +929,22 @@ static void handle(int incoming, struct sockaddr *addr, socklen_t addrlen)
#endif
}
+#ifndef GIT_WINDOWS_NATIVE
+ pipe(ipc);
+ cld.parent_ipc_in = ipc[0];
+#endif
+
strvec_pushv(&cld.args, cld_argv.v);
cld.in = incoming;
cld.out = dup(incoming);
if (start_command(&cld))
logerror("unable to fork");
- else
- add_child(&cld, addr, addrlen);
-}
-
-static void child_handler(int signo UNUSED)
-{
- /*
- * Otherwise empty handler because systemcalls will get interrupted
- * upon signal receipt
- * SysV needs the handler to be rearmed
- */
- signal(SIGCHLD, child_handler);
+ else {
+ if (ipc[1] != -1)
+ close(ipc[1]);
+ add_child(nfds, pfd, &cld, addr, addrlen);
+ }
}
static int set_reuse_addr(int sockfd)
@@ -1121,6 +1146,7 @@ static void socksetup(struct string_list *listen_addr, int listen_port, struct s
static int service_loop(struct socketlist *socklist)
{
struct pollfd *pfd;
+ nfds_t nfds = socklist->nr;
CALLOC_ARRAY(pfd, socklist->nr);
@@ -1129,12 +1155,10 @@ static int service_loop(struct socketlist *socklist)
pfd[i].events = POLLIN;
}
- signal(SIGCHLD, child_handler);
-
for (;;) {
- check_dead_children();
+ size_t i;
- if (poll(pfd, socklist->nr, -1) < 0) {
+ if (poll(pfd, nfds, -1) < 0) {
if (errno != EINTR) {
logerror("Poll failed, resuming: %s",
strerror(errno));
@@ -1143,7 +1167,12 @@ static int service_loop(struct socketlist *socklist)
continue;
}
- for (size_t i = 0; i < socklist->nr; i++) {
+ for (i = socklist->nr; i < nfds; i++) {
+ if (pfd[i].revents & POLLHUP)
+ check_dead_children(&nfds, pfd, socklist->nr);
+ }
+
+ for (i = 0; i < socklist->nr; i++) {
if (pfd[i].revents & POLLIN) {
union {
struct sockaddr sa;
@@ -1164,7 +1193,8 @@ static int service_loop(struct socketlist *socklist)
die_errno("accept returned");
}
}
- handle(incoming, &ss.sa, sslen);
+ handle(incoming, &nfds, &pfd, socklist->nr,
+ &ss.sa, sslen);
}
}
}
--
2.39.5 (Apple Git-154)
|
@@ -37,6 +37,8 @@ include shared.mak | |||
# when attempting to read from an fopen'ed directory (or even to fopen |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
"Carlo Marcelo Arenas Belón via GitGitGadget"
<[email protected]> writes:
> +AC_CACHE_CHECK([whether SA_RESTART is supported], [ac_cv_siginterrupt], [
> + AC_COMPILE_IFELSE(
> + [AC_LANG_PROGRAM([#include <signal.h>], [[
> + #ifdef SA_RESTART
> + restartable signals supported
> + #endif
So, where SA_RESTART is defined, we fail the compilation.
> + ]])],[
> + ac_cv_siginterrupt=no
> + NO_RESTARTABLE_SIGNALS=UnfortunatelyYes
As this is IFELSE, we know the condition that did not fail the
compilation is where we did not see SA_RESTART. So we set the
NO_RESTARTABLE_SIGNALS=UnfortunatelyYes, which makes sense.
> + ], [ac_cv_siginterrupt=yes]
> + )
> +])
> +GIT_CONF_SUBST([NO_RESTARTABLE_SIGNALS])
It is curious that throughout the two renames, the cached variable
used by autoconf hasn't changed its name. Is it because it is
totally invisible to the end-users/builders?
On the Git mailing list, Phillip Wood wrote (reply to this): Hi Carlo
On 26/06/2025 19:24, Carlo Marcelo Arenas Belón wrote:
> There has always been a small race condition between the time a
> children status is checked, and the arrival of a SIGCHLD that
> would alert us of their demise, hopefully interrupt poll().
The race was introduced by 695605b5080 (git-daemon: Simplify dead-children reaping logic, 2008-08-14). Before that children were reaped inside the signal handler so there was no race.
> To close the gap, add the reading side of a pipe to the poll and
> use the signal handler to write to it for each child.
As well as closing the race this fixes the issue with poll() not being interrupted when a child exits. It also allows us to move the re-arming of the handler into the event loop which would fix the infinite recursion on AIX.
> Suggested=by: Phillip Wood <[email protected]>
s/=/-/
> Signed-off-by: Carlo Marcelo Arenas Belón <[email protected]>
> ---
> Implements the "self pipe" trick Hillip suggested.
I've never been called that before :-/
> > I tried to make self healing and optional, as I also presume the race
> condition it is meant to address is very unlikely.
I see this as an alternative fix for the other problems you've been working on as well.
> An obvious disadvantage (at least of this implementation), is that it
> actually doubles the number of events that need to be handled for each
> children process on most cases (ex: when `poll()` gets interrupted)
> > I suspect that if fixing that last race condition is so important with
> the current foundation, it might be better to reintroduce some sort of
> timeout to poll(), so that they will be cleared periodically.
> > I had a prototype (only the bare minimum) that I thought was more
> efficient and that would instead remove completely the need for a
> signal handler which I would post (only for RFC) later.
I'm not sure injecting an fd into each child process is a good direction.
> daemon.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++---------
> 1 file changed, 46 insertions(+), 9 deletions(-)
> > diff --git a/daemon.c b/daemon.c
> index d1be61fd57..d3b9421575 100644
> --- a/daemon.c
> +++ b/daemon.c
> @@ -912,14 +912,17 @@ static void handle(int incoming, struct sockaddr *addr, socklen_t addrlen)
> add_child(&cld, addr, addrlen);
> }
> > -static void child_handler(int signo UNUSED)
> +int poll_pipe[2] = { -1, -1 };
Maybe call this signal_pipe? I'm not sure what poll_pipe means.
> +
> +static void child_handler(int signo)
> {
> /*
> - * Otherwise empty handler because systemcalls will get interrupted
> - * upon signal receipt
> * SysV needs the handler to be rearmed
> */
> signal(SIGCHLD, child_handler);
I think from Chris' email that it is conventional to do this at the end of the handler. As I said above we could add an additional commit that moves this into the event loop to fix the infinite recursion on AIX.
> +
> + if (poll_pipe[1] >= 0)
> + write(poll_pipe[1], &signo, 1);
write() might fail so we should save errno around it. Conventionally one would re-try on EINTR as well though in this case the most likely reason for that is another child exiting which means the pipe would be written to anyway.
> }
> > static int set_reuse_addr(int sockfd)
> @@ -1121,20 +1124,43 @@ static void socksetup(struct string_list *listen_addr, int listen_port, struct s
> static int service_loop(struct socketlist *socklist)
> {
> struct pollfd *pfd;
> + unsigned long nfds = 1 + socklist->nr;
> +
> + ALLOC_ARRAY(pfd, nfds);
> + if (!pipe(poll_pipe)) {
If we cannot create a pipe here then things have gone pretty badly wrong and I think it is unlikely we're going to be able to accept incoming connections so it would be best to die(). Relying on the signal pipe would solve the problem of children not being collected until we receive an new connection as well.
> + for (int i = 0; i < 2; i++) {
The body of this loop is quite indented - I think it would be better to turn it into a function.
> + int flags;
> +
> + flags = fcntl(poll_pipe[i], F_GETFD, 0);
> + if (flags >= 0)
> + fcntl(poll_pipe[i], F_SETFD, flags | FD_CLOEXEC);
I think we should probably close the pipes if we do not set FD_CLOEXEC.
> +
> + flags = fcntl(poll_pipe[i], F_GETFL, 0);
> + if (flags < 0 || fcntl(poll_pipe[i], F_SETFL,
> + flags | O_NONBLOCK) == -1) {
> + close(poll_pipe[0]);
> + close(poll_pipe[1]);
> + poll_pipe[0] = poll_pipe[1] = -1;
> + break;
> + }
> + }
> + }
> + pfd[0].fd = poll_pipe[0];
> + pfd[0].events = POLLIN;
> > - CALLOC_ARRAY(pfd, socklist->nr);
> -
> - for (size_t i = 0; i < socklist->nr; i++) {
> - pfd[i].fd = socklist->list[i];
> + for (size_t i = 1; i < nfds; i++) {
> + pfd[i].fd = socklist->list[i - 1];
> pfd[i].events = POLLIN;
> }
> > signal(SIGCHLD, child_handler);
> > for (;;) {
> + int nevents, scratch;
> +
> check_dead_children();
> > - if (poll(pfd, socklist->nr, -1) < 0) {
> + if ((nevents = poll(pfd, nfds, -1)) <= 0) {
> if (errno != EINTR) {
> logerror("Poll failed, resuming: %s",
> strerror(errno));
sleep(1);
}
continue;
We need to drain the signal pipe on EINTR. I think it would be best to do that just before calling check_dead_childern() above. We can avoid calling check_dead_children() if we don't read anything.
}
> @@ -1143,7 +1169,17 @@ static int service_loop(struct socketlist *socklist)
> continue;
> }
> > - for (size_t i = 0; i < socklist->nr; i++) {
> + if ((pfd[0].revents & POLLIN) && pfd[0].fd != -1) {
> + if (nevents == 1 && read(pfd[0].fd, &scratch, 1) > 0)
There can be more than one byte to read from the signal_pipe - we should drain it until we see EAGAIN. As I said above I think it would be better to do that just before calling check_dead_children(). Here we can just decrement nevents if poll tells us the signal_pipe is readable.
Thanks
Phillip
> + continue;
> + else if (!read(pfd[0].fd, &scratch, 1)) {
> + close(pfd[0].fd);
> + pfd[0].fd = -1;
> + }
> + nevents--;
> + }
> +
> + for (size_t i = 1; nevents && i < nfds; i++) {
> if (pfd[i].revents & POLLIN) {
> union {
> struct sockaddr sa;
> @@ -1165,6 +1201,7 @@ static int service_loop(struct socketlist *socklist)
> }
> }
> handle(incoming, &ss.sa, sslen);
> + nevents--;
> }
> }
> } |
This patch series was integrated into seen via 7d05ba3. |
On the Git mailing list, Carlo Marcelo Arenas Belón wrote (reply to this): On Fri, Jun 27, 2025 at 09:38:36AM -0800, Phillip Wood wrote:
>
> On 26/06/2025 19:24, Carlo Marcelo Arenas Belón wrote:
> >
> > I had a prototype (only the bare minimum) that I thought was more
> > efficient and that would instead remove completely the need for a
> > signal handler which I would post (only for RFC) later.
>
> I'm not sure injecting an fd into each child process is a good direction.
I don't either, but frankly don't think it is as much of an issue, if we
consider that all the children are known internal processes we also own.
Was hoping that by producing a working (albeit ugly in purpose so changes
to the current code would be minimized) prototype, we could have a
discussion of the potential issues/benefits that would be a little more
verbose.
Indeed I posted this an the other series both as RFC, and as replies of
each other hoping to give them both a chance to be evaluated as alternatives
together.
> > diff --git a/daemon.c b/daemon.c
> > index d1be61fd57..d3b9421575 100644
> > --- a/daemon.c
> > +++ b/daemon.c
> > @@ -912,14 +912,17 @@ static void handle(int incoming, struct sockaddr *addr, socklen_t addrlen)
> > add_child(&cld, addr, addrlen);
> > }
> > -static void child_handler(int signo UNUSED)
> > +int poll_pipe[2] = { -1, -1 };
> Maybe call this signal_pipe? I'm not sure what poll_pipe means.
>
> > +
> > +static void child_handler(int signo)
> > {
> > /*
> > - * Otherwise empty handler because systemcalls will get interrupted
> > - * upon signal receipt
> > * SysV needs the handler to be rearmed
> > */
> > signal(SIGCHLD, child_handler);
>
> I think from Chris' email that it is conventional to do this at the end of
> the handler.
It really depends on which flags the signal was expected to use. For maximum
portability it would seem better to have it at the beginning to minimize the
inherent race condition from when SA_RESETHAND is on effect (which was more
of a SysV signal() tradition).
Will move it to the end, regardless, as it won't be needed once the call is
setup using sigaction().
> As I said above we could add an additional commit that moves
> this into the event loop to fix the infinite recursion on AIX.
Not sure I understant what you mean by this. I think Chris made clear that
the AIX issue comes from the handler being expected to do the wait() calls,
so the only way to "solve" it would be to move the `check_dead_children()`
logic back to the handler (which will require a lot more changes), using
sigaction() seems like a simpler solution.
> > +
> > + if (poll_pipe[1] >= 0)
> > + write(poll_pipe[1], &signo, 1);
>
> write() might fail so we should save errno around it. Conventionally one
> would re-try on EINTR as well though in this case the most likely reason for
> that is another child exiting which means the pipe would be written to
> anyway.
EINTR shouldn't be possible here from SIGCHLD, because that signal is
blocked here, I wrote it this way because it was only meant to be used as
a fallback to the EINTR being triggered in poll() and so it wouldn't matter
if it was lost (for example because of a SIGPIPE).
Once we move to sigaction, SIGPIPE could be added to the mask for this
handler, but that code can't be implemented on the current base.
> > }
> > static int set_reuse_addr(int sockfd)
> > @@ -1121,20 +1124,43 @@ static void socksetup(struct string_list *listen_addr, int listen_port, struct s
> > static int service_loop(struct socketlist *socklist)
> > {
> > struct pollfd *pfd;
> > + unsigned long nfds = 1 + socklist->nr;
> > +
> > + ALLOC_ARRAY(pfd, nfds);
> > + if (!pipe(poll_pipe)) {
>
> If we cannot create a pipe here then things have gone pretty badly wrong and
> I think it is unlikely we're going to be able to accept incoming connections
> so it would be best to die().
True, but since this is "optional", there is no harm on letting this continue
even in failure.
> > + int flags;
> > +
> > + flags = fcntl(poll_pipe[i], F_GETFD, 0);
> > + if (flags >= 0)
> > + fcntl(poll_pipe[i], F_SETFD, flags | FD_CLOEXEC);
> I think we should probably close the pipes if we do not set FD_CLOEXEC.
Why?, worst case is that we leak a pipe to the children, that wouldn't know
of it and woule be able to work either way.
Carlo |
On the Git mailing list, Junio C Hamano wrote (reply to this): Phillip Wood <[email protected]> writes:
>> An obvious disadvantage (at least of this implementation), is that it
>> actually doubles the number of events that need to be handled for each
>> children process on most cases (ex: when `poll()` gets interrupted)
>> I suspect that if fixing that last race condition is so important
>> with
>> the current foundation, it might be better to reintroduce some sort of
>> timeout to poll(), so that they will be cleared periodically.
>> I had a prototype (only the bare minimum) that I thought was more
>> efficient and that would instead remove completely the need for a
>> signal handler which I would post (only for RFC) later.
>
> I'm not sure injecting an fd into each child process is a good direction.
Yeah, I thought that the "self pipe trick" (in the title) refers to
the technique to have a single pipe for the daemon to talk to
itself, so that it can write(2) into the pipe in non-blocking way
upon signal and expect its select/poll to be able to notice, where
it can reap the completed children, so having a pipe per child was a
surprise to me. |
This patch series was integrated into seen via 7ad7333. |
This series addresses and ambiguity that is at least visible in OpenBSD, where zombie proceses would only be cleared after a new connection is received.
The underlying problem is that when this code was originally introduced, SA_RESTART was not widely implemented, and the
signal()
call usually implemented SysV like semantics, at least until it started being reimplemented by callingsigaction()
internally.Changes since v2
Changes since v1
cc: Carlo Marcelo Arenas Belón [email protected]
cc: Chris Torek [email protected]
cc: Phillip Wood [email protected]