
gh-91555: disable logger while handling log record #131812


Merged: 5 commits merged into python:main on May 8, 2025
Conversation

@duaneg (Contributor) commented Mar 27, 2025

Prevent the possibility of re-entrancy leading to deadlock or infinite recursion (caused by logging triggered by logging) by disabling logging while the logger is handling log messages.

@duaneg requested a review from vsajip as a code owner March 27, 2025 23:13
@bedevere-app bot commented Mar 27, 2025

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

        if self._is_disabled():
            return
        maybe_record = self.filter(record)
        if not maybe_record:
@graingert (Contributor) commented Mar 29, 2025

rather than disabling the logging, can we instead append the record to a self._reentrant_records = collections.deque(), and then process all of the pending records:

            maybe_record = self.filter(record)
            if not maybe_record:
                return
            if isinstance(maybe_record, LogRecord):
                record = maybe_record
            was_calling_handlers = set_calling_handlers()
            try:
                if not was_calling_handlers:
                    self.callHandlers(record)
                    while True:
                        try:
                            record = self._reentrant_records.popleft()
                        except IndexError:
                            return
                        self.callHandlers(record)
                else:
                    self._reentrant_records.append(record)
            finally:
                set_not_calling_handlers()

@duaneg (Contributor, Author) commented:

This will still produce a stack overflow if handling the deferred log message itself logs another message.

@graingert (Contributor) commented Mar 30, 2025

I'm not following how this would cause a stack overflow: if handling the log message logs another message, it would go onto the _reentrant_records queue and then be processed later, once the stack returns all the way back to where set_calling_handlers() was first called.

@duaneg (Contributor, Author) commented:

> I'm not following how this would cause a stack overflow: if handling the log message logs another message, it would go onto the _reentrant_records queue and then be processed later, once the stack returns all the way back to where set_calling_handlers() was first called.

Sorry, I should have said deadlock with the current example. The stack overflow is from a different way of triggering this (see the second unit test added).

The trouble is that when the first recursive logging call exits the finally block, it clears the "calling handlers" flag, which means a subsequent (still recursive) call takes the wrong path and deadlocks/overflows.

That can be avoided for the original triggering example by only clearing the "handling" flag if it was initially unset (the deferred-records collection also needs to be thread-local, not a plain member variable). It ends up looking something like this:

        # Per-thread queue of records logged re-entrantly while this thread
        # is already calling handlers (self._tls is a threading.local).
        if not hasattr(self._tls, 'reentrant_records'):
            self._tls.reentrant_records = deque()
        deferred = self._tls.reentrant_records
        was_calling_handlers = self.set_calling_handlers()
        try:
            if not was_calling_handlers:
                self.callHandlers(record)
                # Drain any records deferred by recursive logging calls.
                while deferred:
                    self.callHandlers(deferred.popleft())
            else:
                deferred.append(record)
        finally:
            # Only the outermost call clears the flag, so nested calls
            # keep taking the deferral path instead of recursing.
            if not was_calling_handlers:
                self.set_not_calling_handlers()

This fixes the two bugs, which only log the first time they try to process a log record (and means those recursive log messages are logged rather than silently ignored, which is nice). However, a different example which logs every time (such as the second unit test) will still live-lock and never exit that while loop.

@graingert (Contributor) commented Mar 30, 2025

> will still live-lock and never exit that while loop.

Does the system keep logging forever instead?

This seems acceptable as you'd easily track down this bug just by looking at the logs

@duaneg (Contributor, Author) commented:

> Does the system keep logging forever instead?

Yep, it will just sit in a tight loop spamming the log forever, or at least until/unless that exhausts disk space or wherever the logs are actually going.

> This seems acceptable as you'd easily track down this bug just by looking at the logs

IMO it is not a great failure mode, but it will certainly be obvious!

FWIW I think I prefer ignoring them: the code is much simpler and it prevents the issue in non-trivial handler implementations like Sentry's (that would otherwise trigger the live-lock failure). I was hoping this fix would mean they would be able to remove that nasty monkey-patching on supported versions.

OTOH it is definitely nice to actually handle, rather than drop, the recursive log messages in cases where they don't always continue to recurse.

@graingert (Contributor) commented Mar 30, 2025

We could have a logging.(_)N_RECURSIVE_CALLS constant to limit this so it's not forever.

@duaneg (Contributor, Author) commented:

We could. Another alternative would be Victor Stinner's suggestion in the discourse discussion to raise an exception. That would bring it to the user's attention and force them to deal with it.

Ultimately, though, the way they will have to deal with it is by preventing, disabling, or otherwise intercepting and ignoring all such logging. That will be difficult to do reliably outside the core, will likely be overlooked unless/until it bites, and will have to be done in every susceptible log handler or application that uses one.

IMO it would be better for us to do this once, centrally, with a small, simple, and robust fix.

@gpshead (Member) commented Mar 29, 2025

TODO list:

  • unit test(s) for the errant behavior would be helpful while working on this and are needed to prevent future regression.


@duaneg (Contributor, Author) commented Mar 30, 2025

Would a news entry be appropriate for this?

@graingert (Contributor) commented:

> Would a news entry be appropriate for this?

Yes, this needs a news entry.

@vsajip added the 🔨 test-with-buildbots label Apr 14, 2025
@bedevere-bot

🤖 New build scheduled with the buildbot fleet by @vsajip for commit 7b68d12 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F131812%2Fmerge

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@bedevere-bot removed the 🔨 test-with-buildbots label Apr 14, 2025
@vsajip (Member) left a comment

@duaneg, from the issue:

> It disables logging on a per-thread basis and only while the thread runs the message handlers (+filters). Which is to say, it is disabled only while running code that would trigger this bug if it logged a message. Or at least, that is my intent: if I've overlooked anything or there is a bug, please let me know!
>
> It doesn't stop other threads from logging to the same logger and/or handlers.

I think a test with logging from multiple threads and other handlers should be added to confirm this.

@bedevere-app bot commented Apr 17, 2025

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

…andles a message does not block a different thread from handling a message on the same logger.
@duaneg (Contributor, Author) commented Apr 17, 2025

> I think a test with logging from multiple threads and other handlers should be added to confirm this.

Good idea, will add, thanks!

@duaneg requested a review from vsajip April 17, 2025 11:48
@vsajip added the 🔨 test-with-buildbots label Apr 18, 2025
@bedevere-bot

🤖 New build scheduled with the buildbot fleet by @vsajip for commit 99788f7 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F131812%2Fmerge

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@bedevere-bot removed the 🔨 test-with-buildbots label Apr 18, 2025
@vsajip merged commit 2561e14 into python:main May 8, 2025
122 of 126 checks passed
@vsajip added the needs backport to 3.13 and needs backport to 3.14 labels May 11, 2025
@miss-islington-app

Thanks @duaneg for the PR, and @vsajip for merging it 🌮🎉.. I'm working now to backport this PR to: 3.13.
🐍🍒⛏🤖

@miss-islington-app

Thanks @duaneg for the PR, and @vsajip for merging it 🌮🎉.. I'm working now to backport this PR to: 3.14.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request May 11, 2025
…1812)

Prevent the possibility of re-entrancy leading to deadlock or infinite recursion (caused by logging triggered by logging), by disabling logging while the logger is handling log messages.
(cherry picked from commit 2561e14)

Co-authored-by: Duane Griffin <[email protected]>
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request May 11, 2025
…1812)

Prevent the possibility of re-entrancy leading to deadlock or infinite recursion (caused by logging triggered by logging), by disabling logging while the logger is handling log messages.
(cherry picked from commit 2561e14)

Co-authored-by: Duane Griffin <[email protected]>
@bedevere-app

bedevere-app bot commented May 11, 2025

GH-133898 is a backport of this pull request to the 3.13 branch.

@bedevere-app bot removed the needs backport to 3.13 label May 11, 2025
@bedevere-app

bedevere-app bot commented May 11, 2025

GH-133899 is a backport of this pull request to the 3.14 branch.

@bedevere-app bot removed the needs backport to 3.14 label May 11, 2025
vsajip pushed a commit that referenced this pull request May 12, 2025
vsajip pushed a commit that referenced this pull request May 12, 2025
…GH-133899)

Prevent the possibility of re-entrancy leading to deadlock or infinite recursion (caused by logging triggered by logging), by disabling logging while the logger is handling log messages.
(cherry picked from commit 2561e14)

Co-authored-by: Duane Griffin <[email protected]>
openstack-mirroring pushed a commit to openstack/swift that referenced this pull request Jun 6, 2025
It used to be that when a logger encountered an error trying to handle
/ emit a message, it would attempt to log about the error via the same
logger. Long ago, this could lead to infinite recursion in Swift's
LoggerFileObject, so we fixed it and added a test that included
showing the bad stdlib behavior.

Recently, stdlib did a similar fix so the recursion doesn't happen. See

 - python/cpython#91555 and
 - python/cpython#131812

Of course, now our test breaks (at least, on 3.13.4). Relax the
assertion a little.

Related-Change: Ia9ecffc88ce43616977e141498e5ee404f2c29c4
Change-Id: Id2f490d4204b8eaf07857cb84ed783bec19b8511
openstack-mirroring pushed a commit to openstack/openstack that referenced this pull request Jun 6, 2025
* Update swift from branch 'master'
  to 28db9bbcddc966ca69fe64604417a82a46e1c918
  - tests: Fix test_LoggerFileObject_recursion
vsajip added a commit to vsajip/cpython that referenced this pull request Jun 23, 2025
vsajip added a commit that referenced this pull request Jun 25, 2025
…35858)

Revert "gh-91555: disable logger while handling log record (GH-131812)"

This reverts commit 2561e14.
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Jun 25, 2025
…pythonGH-135858)

Revert "pythongh-91555: disable logger while handling log record (pythonGH-131812)"

This reverts commit 2561e14.
(cherry picked from commit 18d32fb)

Co-authored-by: Vinay Sajip <[email protected]>
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Jun 25, 2025
…pythonGH-135858)

Revert "pythongh-91555: disable logger while handling log record (pythonGH-131812)"

This reverts commit 2561e14.
(cherry picked from commit 18d32fb)

Co-authored-by: Vinay Sajip <[email protected]>