component-base/logs: improve handling of re-applying a configuration #117108

pohly · 2023-04-05T14:06:22Z

What type of PR is this?

/kind bug
/kind failing-test

What this PR does / why we need it:

Normal binaries should never have re-apply the configuration. It's not safe when there are already some goroutines running which might do logging. Therefore the new default is to return an error when a binary accidentally re-applies.

The unit tests of component-base/logs ensure that there are no goroutines and have to call the functions more than once.

Integration tests use the same code as the normal binaries. To make reuse of that code safe, component-base/logs can be configured to silently ignore any additional calls. This addresses data races that were found when enabling -race for integration tests.

To avoid having to modify all integration tests which start test servers, reconfiguring component-base/logs is done by the test server packages.

Which issue(s) this PR fixes:

Data race in k8s.io/kubernetes/test/integration/serving.TestComponentSecureServingAndAuth.

WARNING: DATA RACE
Write at 0x00000b345548 by goroutine 6459:
  k8s.io/kubernetes/vendor/k8s.io/klog/v2.ClearLogger()
      /home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/klog/v2/contextual.go:137 +0x524
  k8s.io/kubernetes/vendor/k8s.io/component-base/logs/api/v1.apply()
      /home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/component-base/logs/api/v1/options.go:194 +0x514
  k8s.io/kubernetes/vendor/k8s.io/component-base/logs/api/v1.validateAndApply()
      /home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/component-base/logs/api/v1/options.go:102 +0xab
  k8s.io/kubernetes/vendor/k8s.io/component-base/logs/api/v1.ValidateAndApply()
      /home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/component-base/logs/api/v1/options.go:68 +0xbb
  k8s.io/kubernetes/vendor/k8s.io/cloud-provider/app.NewCloudControllerManagerCommand.func1()
      /home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/cloud-provider/app/controllermanager.go:94 +0x91
  k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).execute()
      /home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:916 +0xbc5
  k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).ExecuteC()
      /home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:1040 +0x5da
  k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).Execute()
      /home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:968 +0x2e
  k8s.io/kubernetes/vendor/k8s.io/cloud-provider/app/testing.StartTestServer.func4()
      /home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/cloud-provider/app/testing/testserver.go:169 +0xe5

Previous read at 0x00000b345548 by goroutine 2137:
  k8s.io/kubernetes/vendor/k8s.io/klog/v2.newVerbose()
      /home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/klog/v2/klog.go:1295 +0x6aa
  k8s.io/kubernetes/vendor/k8s.io/klog/v2.VDepth()
      /home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/klog/v2/klog.go:1359 +0x6b7
  k8s.io/kubernetes/vendor/k8s.io/klog/v2.V()
      /home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/klog/v2/klog.go:1322 +0x72a
  k8s.io/kubernetes/vendor/k8s.io/kube-aggregator/pkg/controllers/openapi.(*AggregationController).processNextWorkItem()
      /home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/kube-aggregator/pkg/controllers/openapi/controller.go:122 +0x737
  k8s.io/kubernetes/vendor/k8s.io/kube-aggregator/pkg/controllers/openapi.(*AggregationController).runWorker()
      /home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/kube-aggregator/pkg/controllers/openapi/controller.go:92 +0x39
  k8s.io/kubernetes/vendor/k8s.io/kube-aggregator/pkg/controllers/openapi.(*AggregationController).runWorker-fm()
      <autogenerated>:1 +0x1d
  k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1()
      /home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x48
  k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.BackoffUntil()
      /home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xce
  k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil()
      /home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x10d
  k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.Until()
      /home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161 +0x48
  k8s.io/kubernetes/vendor/k8s.io/kube-aggregator/pkg/controllers/openapi.(*AggregationController).Run.func4()
      /home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/kube-aggregator/pkg/controllers/openapi/controller.go:86 +0x4c

Goroutine 6459 (running) created at:
  k8s.io/kubernetes/vendor/k8s.io/cloud-provider/app/testing.StartTestServer()
      /home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/cloud-provider/app/testing/testserver.go:165 +0x1ac4
  k8s.io/kubernetes/test/integration/serving.cloudControllerManagerTester.StartTestServer()
      /home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/integration/serving/serving_test.go:62 +0x87
  k8s.io/kubernetes/test/integration/serving.(*cloudControllerManagerTester).StartTestServer()
      <autogenerated>:1 +0x88
  k8s.io/kubernetes/test/integration/serving.testComponentWithSecureServing.func1()
      /home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/integration/serving/serving_test.go:228 +0x231
  testing.tRunner()
      /home/prow/go/src/k8s.io/kubernetes/_output/local/.gimme/versions/go1.20.2.linux.amd64/src/testing/testing.go:1576 +0x216
  testing.(*T).Run.func1()
      /home/prow/go/src/k8s.io/kubernetes/_output/local/.gimme/versions/go1.20.2.linux.amd64/src/testing/testing.go:1629 +0x47

Goroutine 2137 (running) created at:
  k8s.io/kubernetes/vendor/k8s.io/kube-aggregator/pkg/controllers/openapi.(*AggregationController).Run()
      /home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/kube-aggregator/pkg/controllers/openapi/controller.go:86 +0x351
  k8s.io/kubernetes/vendor/k8s.io/kube-aggregator/pkg/apiserver.(*APIAggregator).PrepareRun.func1.1()
      /home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/k8s.io/kube-aggregator/pkg/apiserver/apiserver.go:386 +0x47

Does this PR introduce a user-facing change?

component-base/logs is now more strict about not applying configurations multiple times and will return an error when that is attempted. Can be overridden by binaries which need to do that.

k8s-ci-robot · 2023-04-05T14:06:25Z

Please note that we're already in Test Freeze for the release-1.27 branch. This means every merged PR will be automatically fast-forwarded via the periodic ci-fast-forward job to the release branch of the upcoming v1.27.0 release.

Fast forwards are scheduled to happen every 6 hours, whereas the most recent run was: Wed Apr 5 10:36:42 UTC 2023.

k8s-triage-robot · 2023-04-05T14:50:47Z

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

pohly · 2023-04-06T06:51:21Z

/retest

logicalhan · 2023-04-06T16:34:00Z

/triage accepted

cici37 · 2023-04-06T16:34:02Z

/triage accepted

pohly · 2023-05-23T10:41:49Z

/assign @liggitt

Can you perhaps take a look? You were involved in designing the k8s.io/component-base/logs/api/v1 before.

liggitt · 2023-05-23T15:13:28Z

staging/src/k8s.io/component-base/logs/api/v1/options.go

+ default:
+ return fmt.Errorf("invalid value %d for ReapplyHandling", ReapplyHandling)
+ }
+ }


would it make more sense to actually detect if the config being applied differed, and always error if so?

that would avoid false positives where identical config was specified, and avoid silently ignoring config changes in tests

I think it should continue to be an error in applications (iReapplyHandlingError) f they apply twice, even if the config is the same: the config might only be the same under the current circumstances and different under others. I think it's better to fail under all circumstances because it makes obvious that something is wrong.

Comparing configs might make sense for ReapplyHandlingIgnore: if an integration test tries to run with different configs, then it's better to raise that as an error instead of silently ignoring it.

For ReapplyHandlingAllow it's most likely a different config, so we can compare and skip if equal, it just won't change much.

collapsing the test-caller cases to ignore a reapply if identical and error if different seems good... if a test is reapplying a different config it is not testing what it thinks it is, and I would want to know so I could move my test to a different package or something to get an isolated test process

The unit tests where ReapplyHandlingAllow is used do need to apply different configs - that's their purpose 😅

So combining both cases doesn't work. I've pushed a version where the config get compared for the ReapplyHandlingIgnore case.

I'm saying we should not have ReapplyHandlingAllow, only ReapplyHandlingIgnoreIfEqual and ReapplyHandlingError

Some of those unit tests are elsewhere:

test/integration/logs/benchmark/load_test.go

staging/src/k8s.io/component-base/logs/json/register/register_test.go

Some of those unit tests are elsewhere:

I would move the unit tests, either into the package that can reset the state, or into separate integration packages, rather than enable an "ignore ineffective config" option

either into the package that can reset the state

That would be k8s.io/component-base/logs/api/v1. Then that package inherits new dependencies (like zapr) that are undesirable and/or we get circular dependencies.

separate integration packages

Isn't that what test/integration/logs/benchmark/load_test.go is? How would such a package reset the state in k8s.io/component-base/logs/api/v1 if we don't offer an API for that?

Hmm, perhaps we can add a k8s.io/component-base/logs/api/v1.Reset? That has to come with a big warning that it should only be used when no goroutines are running and is only meant for such unit tests - basically the same semantic as ReapplyHandlingAllow, except that it is a separate call.

that seems ok. we've done things like that before:

// SetServiceResolverForTests allows the service resolver to be overridden during tests. // Tests using this function must run serially as this function is not safe to call concurrently with server start. func SetServiceResolverForTests(resolver webhook.ServiceResolver) func() {

// SetForTests sets capabilities for tests. Convenience method for testing. This should only be called from tests. func SetForTests(c Capabilities) {

Done: https://github.com/kubernetes/kubernetes/compare/64e1273b0d3f9ef984b8022ac1c872fe4c4197d4..4c953f58ae8c66ddc8b4dc620297e45d43c284e4

pohly · 2023-06-26T12:35:12Z

Rebased due to a conflict.

pohly · 2023-06-29T19:03:50Z

This PR is getting tested in combination with enabled race detection over in #117108.

staging/src/k8s.io/component-base/logs/api/v1/options.go

liggitt · 2023-07-05T16:37:34Z

staging/src/k8s.io/component-base/logs/api/v1/options.go

+// Reset restores the default settings. This is not thread-safe and should only
+// be used when there are no goroutines running. The intented users are unit
+// tests in other packages.
+func Reset(featureGate featuregate.FeatureGate) error {


Suggested change

// Reset restores the default settings. This is not thread-safe and should only

// be used when there are no goroutines running. The intented users are unit

// tests in other packages.

func Reset(featureGate featuregate.FeatureGate) error {

// ResetForTest restores the default settings. This is not thread-safe and should only

// be used when there are no goroutines running. The intended users are unit

// tests in other packages.

func ResetForTest(featureGate featuregate.FeatureGate) error {

Normal binaries should never have to do this. It's not safe when there are already some goroutines running which might do logging. Therefore the new default is to return an error when a binary accidentally re-applies. A few unit ensure that there are no goroutines and have to call the functions more then once. The new ResetForTest API gets used by those to enable changing the logging settings more than once in the same process. Integration tests use the same code as the normal binaries. To make reuse of that code safe, component-base/logs can be configured to silently ignore any additional calls. This addresses data races that were found when enabling -race for integration tests. To catch cases where the integration test does want to modify the config, the old and new config get compared and an error is raised when it's not the same. To avoid having to modify all integration tests which start test servers, reconfiguring component-base/logs is done by the test server packages.

liggitt · 2023-07-05T17:10:27Z

/lgtm
/approve

k8s-ci-robot · 2023-07-05T17:10:34Z

LGTM label has been added.

Git tree hash: ce34b562b69c93999bd078d18803ceaffbb46fdb

k8s-ci-robot · 2023-07-05T17:10:49Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: liggitt, pohly

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~cmd/kube-apiserver/OWNERS~~ [liggitt]
~~cmd/kube-controller-manager/OWNERS~~ [liggitt]
~~cmd/kube-scheduler/OWNERS~~ [liggitt]
~~staging/src/k8s.io/apiextensions-apiserver/OWNERS~~ [liggitt]
~~staging/src/k8s.io/cloud-provider/OWNERS~~ [liggitt]
~~staging/src/k8s.io/component-base/logs/OWNERS~~ [liggitt,pohly]
~~staging/src/k8s.io/component-base/logs/api/OWNERS~~ [liggitt]
~~test/integration/logs/OWNERS~~ [liggitt,pohly]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-triage-robot · 2023-07-05T20:08:44Z

The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass.

This bot retests PRs for certain kubernetes repos according to the following rules:

The PR does have any do-not-merge/* labels
The PR does not have the needs-ok-to-test label
The PR is mergeable (does not have a needs-rebase label)
The PR is approved (has cncf-cla: yes, lgtm, approved labels)
The PR is failing tests required for merge

You can:

Review the full test history for this PR
Prevent this bot from retesting with /lgtm cancel or /hold
Help make our tests less flaky by following our Flaky Tests Guide

/retest

k8s-triage-robot · 2023-07-05T23:48:44Z

The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass.

This bot retests PRs for certain kubernetes repos according to the following rules:

The PR does have any do-not-merge/* labels
The PR does not have the needs-ok-to-test label
The PR is mergeable (does not have a needs-rebase label)
The PR is approved (has cncf-cla: yes, lgtm, approved labels)
The PR is failing tests required for merge

You can:

Review the full test history for this PR
Prevent this bot from retesting with /lgtm cancel or /hold
Help make our tests less flaky by following our Flaky Tests Guide

/retest

k8s-triage-robot · 2023-07-06T03:30:45Z

The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass.

This bot retests PRs for certain kubernetes repos according to the following rules:

The PR does have any do-not-merge/* labels
The PR does not have the needs-ok-to-test label
The PR is mergeable (does not have a needs-rebase label)
The PR is approved (has cncf-cla: yes, lgtm, approved labels)
The PR is failing tests required for merge

You can:

Review the full test history for this PR
Prevent this bot from retesting with /lgtm cancel or /hold
Help make our tests less flaky by following our Flaky Tests Guide

/retest

k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 5, 2023

k8s-ci-robot requested review from chendave and dashpole April 5, 2023 14:07

pohly mentioned this pull request Apr 5, 2023

WIP: integration: run with race detection enabled #116980

Draft

k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Apr 6, 2023

k8s-ci-robot assigned liggitt May 23, 2023

liggitt reviewed May 23, 2023

View reviewed changes

pohly force-pushed the test-integration-race-detection-component-base-logs branch from 0bbac15 to 47b086e Compare May 24, 2023 11:17

k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 22, 2023

pohly force-pushed the test-integration-race-detection-component-base-logs branch from 47b086e to f9334fe Compare June 26, 2023 12:35

k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 26, 2023

pohly force-pushed the test-integration-race-detection-component-base-logs branch from f9334fe to 64e1273 Compare June 29, 2023 15:35

pohly force-pushed the test-integration-race-detection-component-base-logs branch from 64e1273 to 4c953f5 Compare July 5, 2023 15:41

pohly commented Jul 5, 2023

View reviewed changes

staging/src/k8s.io/component-base/logs/api/v1/options.go Outdated Show resolved Hide resolved

pohly force-pushed the test-integration-race-detection-component-base-logs branch from 4c953f5 to c031805 Compare July 5, 2023 15:50

liggitt reviewed Jul 5, 2023

View reviewed changes

pohly force-pushed the test-integration-race-detection-component-base-logs branch from c031805 to bd4b902 Compare July 5, 2023 17:08

pohly force-pushed the test-integration-race-detection-component-base-logs branch from bd4b902 to 02efe09 Compare July 5, 2023 17:09

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 5, 2023

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 5, 2023

k8s-ci-robot merged commit e5efa0a into kubernetes:master Jul 6, 2023
13 checks passed

k8s-ci-robot added this to the v1.28 milestone Jul 6, 2023

jaskaransarkaria mentioned this pull request Jun 5, 2024

Planning upgrade to EKS 1.28 ministryofjustice/cloud-platform#5570

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

component-base/logs: improve handling of re-applying a configuration #117108

component-base/logs: improve handling of re-applying a configuration #117108

pohly commented Apr 5, 2023 •

edited

k8s-ci-robot commented Apr 5, 2023

k8s-triage-robot commented Apr 5, 2023

pohly commented Apr 6, 2023

logicalhan commented Apr 6, 2023

cici37 commented Apr 6, 2023

pohly commented May 23, 2023

liggitt May 23, 2023

pohly May 23, 2023

liggitt May 23, 2023 •

edited

pohly May 24, 2023

liggitt May 24, 2023

pohly May 24, 2023

liggitt Jul 5, 2023

pohly Jul 5, 2023

liggitt Jul 5, 2023

pohly Jul 5, 2023

pohly commented Jun 26, 2023

pohly commented Jun 29, 2023

liggitt Jul 5, 2023

pohly Jul 5, 2023

liggitt commented Jul 5, 2023

k8s-ci-robot commented Jul 5, 2023

k8s-ci-robot commented Jul 5, 2023

k8s-triage-robot commented Jul 5, 2023

k8s-triage-robot commented Jul 5, 2023

k8s-triage-robot commented Jul 6, 2023

component-base/logs: improve handling of re-applying a configuration #117108

component-base/logs: improve handling of re-applying a configuration #117108

Conversation

pohly commented Apr 5, 2023 • edited

What type of PR is this?

What this PR does / why we need it:

Which issue(s) this PR fixes:

Does this PR introduce a user-facing change?

k8s-ci-robot commented Apr 5, 2023

k8s-triage-robot commented Apr 5, 2023

pohly commented Apr 6, 2023

logicalhan commented Apr 6, 2023

cici37 commented Apr 6, 2023

pohly commented May 23, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

liggitt May 23, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pohly commented Jun 26, 2023

pohly commented Jun 29, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

liggitt commented Jul 5, 2023

k8s-ci-robot commented Jul 5, 2023

k8s-ci-robot commented Jul 5, 2023

k8s-triage-robot commented Jul 5, 2023

k8s-triage-robot commented Jul 5, 2023

k8s-triage-robot commented Jul 6, 2023

pohly commented Apr 5, 2023 •

edited

liggitt May 23, 2023 •

edited