Fix sample weight handling in SAG(A) #31675

snath-xoc · 2025-06-28T15:17:32Z

Reference Issues/PRs

Fixes the sample weight handling issue in SAG(A), #31536.

What does this implement/fix? Explain your changes.

SAG(A) now accounts for sample weights by:

Sampling the random indices with probability proportional to the sample weights when sample weights are not None. This should be equivalent to sampling uniformly from a dataset in which each row is repeated according to its weight (i.e., frequency-based weighting).
Accounting for sample weights when calculating the step size.
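
The sampling equivalence claimed above can be checked in isolation. The following is a minimal NumPy sketch, not the code from this PR: the helper names `sample_weighted` and `sample_repeated` are hypothetical, and integer weights are assumed so that the repeated dataset is well defined.

```python
import numpy as np


def sample_weighted(rng, sample_weight, size):
    # Hypothetical helper: draw indices with probability proportional to weight.
    p = sample_weight / sample_weight.sum()
    return rng.choice(len(sample_weight), size=size, p=p)


def sample_repeated(rng, sample_weight, size):
    # Hypothetical helper: draw uniformly from the repeated dataset, then map
    # each drawn row back to the index of the original sample it came from.
    origin = np.repeat(np.arange(len(sample_weight)), sample_weight)
    return origin[rng.randint(0, len(origin), size=size)]


rng = np.random.RandomState(0)
sample_weight = np.array([0, 1, 2, 4])  # integer (frequency) weights
n_draws = 200_000

# Empirical selection frequency of each original sample under both schemes.
freq_w = np.bincount(sample_weighted(rng, sample_weight, n_draws), minlength=4) / n_draws
freq_r = np.bincount(sample_repeated(rng, sample_weight, n_draws), minlength=4) / n_draws
expected = sample_weight / sample_weight.sum()
```

Both empirical frequency vectors should approach `expected`, and a sample with weight 0 is never drawn under either scheme.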

TO DO:

Apply the same sample weight corrections in get_auto_step_size

github-actions · 2025-06-28T15:18:25Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 8615425. Link to the linter CI: here

snath-xoc · 2025-06-28T15:20:50Z

Running the test as follows:

import numpy as np
from scipy.stats import kstest
from sklearn.datasets import make_classification
from sklearn.linear_model.tests.test_sag import get_step_size, sag, squared_dloss

alpha = 1
n_features = 6
n_seeds = 50

rng = np.random.RandomState(0)
X, y = make_classification(n_samples=1000, random_state=77, n_features=n_features)
# Integer weights in [0, 4]; a weight of 0 removes the row from the repeated data.
weights = rng.randint(0, 5, size=X.shape[0])

# Frequency-weighted reference: repeat row i exactly weights[i] times.
X_repeated = np.repeat(X, weights, axis=0)
y_repeated = np.repeat(y, weights, axis=0)

# One column of fitted coefficients per random seed.
weights_w_all = np.zeros([n_features, n_seeds])
weights_r_all = np.zeros([n_features, n_seeds])

step_size_w = get_step_size(X, alpha, True, sample_weight=weights)
step_size_r = get_step_size(X_repeated, alpha, True)

for random_state in np.arange(n_seeds):
    # Weighted fit on the original data.
    weights_w, int_w = sag(
        X, y, step_size=step_size_w, sample_weight=weights, alpha=alpha,
        dloss=squared_dloss, random_state=random_state,
    )
    weights_w_all[:, random_state] = weights_w
    # Unweighted fit on the repeated data.
    weights_r, int_r = sag(
        X_repeated, y_repeated, step_size=step_size_r, alpha=alpha,
        dloss=squared_dloss, random_state=random_state,
    )
    weights_r_all[:, random_state] = weights_r

# Compare the distribution of the first coefficient across seeds.
print(kstest(weights_r_all[0], weights_w_all[0]))

now gives the result:

KstestResult(statistic=np.float64(0.2), pvalue=np.float64(0.2719135601522248), statistic_location=np.float64(-0.004336382594871251), statistic_sign=np.int8(1))

ogrisel · 2025-07-01T13:36:03Z

@snath-xoc at the meeting, you mentioned that the statistical test would not pass for some datasets. Could you please post an example and add a TODO item to the PR to investigate this problem?

Also, whenever the penalty is non-zero, the problem is strictly convex and the solution should be unique. So it should be possible to write deterministic tests (with various random seed values) instead of statistical tests to:

craft a minimal reproducer that does not involve running a KS-test;
check whether the proposed fix resolves the bug for all seeds in the [0, 99] range, for instance.

This might require setting tol to a low enough value, max_iter to a large enough value, and checking that no ConvergenceWarning is raised.
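
A deterministic check along these lines could look like the sketch below. It is only an illustration under stated assumptions, not the test from this PR: it uses the public `Ridge` estimator rather than the test-module `sag` helper, the function name `weighted_vs_repeated_gap` is hypothetical, and the dataset size, `tol` and `max_iter` values are arbitrary choices. `ConvergenceWarning` is turned into an error so that a non-converged fit cannot pass silently.

```python
import warnings

import numpy as np
from sklearn.datasets import make_regression
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import Ridge


def weighted_vs_repeated_gap(seed, solver="sag"):
    """Max absolute coefficient gap between a weighted and a repeated fit."""
    X, y = make_regression(n_samples=200, n_features=6, noise=1.0, random_state=77)
    # Integer weights in [0, 4], fixed across solver seeds (assumption).
    sample_weight = np.random.RandomState(0).randint(0, 5, size=X.shape[0])
    X_rep = np.repeat(X, sample_weight, axis=0)
    y_rep = np.repeat(y, sample_weight, axis=0)

    params = dict(alpha=1.0, solver=solver, tol=1e-10, max_iter=100_000,
                  random_state=seed)
    with warnings.catch_warnings():
        # Fail loudly if either fit stops before reaching the tolerance.
        warnings.simplefilter("error", category=ConvergenceWarning)
        coef_w = Ridge(**params).fit(X, y, sample_weight=sample_weight).coef_
        coef_r = Ridge(**params).fit(X_rep, y_rep).coef_
    return np.max(np.abs(coef_w - coef_r))
```

Calling this for every seed in `range(100)` with `solver="sag"` or `solver="saga"` and asserting that the gap stays below a small threshold would give the seed sweep described above. Running it once with the closed-form `solver="cholesky"` is a cheap sanity check of the harness itself, since that solver should treat integer weights and repetition identically.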

snath-xoc added 18 commits April 25, 2025 16:38

lazy test added

82706dd

Merge github.com:snath-xoc/scikit-learn

1908831

kerge branch 'main' of github.com:snath-xoc/scikit-learn

ffcf4d4

Merge branch 'main' of github.com:snath-xoc/scikit-learn

2184bff

modify tests

1fa4984

modify lml gradient tests

e11bc4e

modify lml grad tests

92f04fe

lazy test added

06d6139

Merge branch 'main' into add_gradient_check_gpr

33db8c2

merge main

68bf517

Merge branch 'main' into add_gradient_check_gpr

5da4237

restore sag

bbbc657

initial fix

ddff54d

implement preliminary sag fix

3a73e7a

lazy test added

0adb631

merge main

de36de3

Merge branch 'main' of github.com:snath-xoc/scikit-learn into sag_tests

d160736

Merge branch 'main' into sag_tests

9b78491

github-actions bot added module:gaussian_process module:linear_model labels Jun 28, 2025

revert gpr changes

8615425
