Fix sample weight handling in SAG(A) #31675
Running the test as follows:

import numpy as np
from scipy.stats import kstest
from sklearn.linear_model.tests.test_sag import sag, squared_dloss, get_step_size
from sklearn.datasets import make_regression, make_classification
from sklearn.utils._testing import assert_allclose_dense_sparse

alpha = 1
n_features = 6
rng = np.random.RandomState(0)
X, y = make_classification(n_samples=1000, random_state=77, n_features=n_features)

# Integer sample weights, and the equivalent dataset with rows repeated
weights = rng.randint(0, 5, size=X.shape[0])
X_repeated = np.repeat(X, weights, axis=0)
y_repeated = np.repeat(y, weights, axis=0)

# One column of fitted coefficients per random seed
weights_w_all = np.zeros([n_features, 50])
weights_r_all = np.zeros([n_features, 50])
step_size_w = get_step_size(X, alpha, True, sample_weight=weights)
step_size_r = get_step_size(X_repeated, alpha, True)

for random_state in np.arange(50):
    weights_w, int_w = sag(X, y, step_size=step_size_w, sample_weight=weights, alpha=alpha, dloss=squared_dloss, random_state=random_state)
    weights_w_all[:, random_state] = weights_w
    weights_r, int_r = sag(X_repeated, y_repeated, step_size=step_size_r, alpha=alpha, dloss=squared_dloss, random_state=random_state)
    weights_r_all[:, random_state] = weights_r

# Compare the distribution of the first coefficient across seeds
print(kstest(weights_r_all[0], weights_w_all[0]))

Now gives the result
|
@snath-xoc at the meeting, you mentioned that the statistical test would not pass for some datasets. Could you please post an example and add a TODO item to the PR to investigate this problem? Also, whenever the penalty is non-zero, the problem is strictly convex and the solution should be unique. So it should be possible to write deterministic tests (with various random seed values) instead of statistical tests to:
This might require setting |
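To illustrate why a non-zero penalty makes a deterministic check possible, here is a minimal sketch that does not use SAG at all. It solves an L2-penalized least-squares problem in closed form with NumPy and compares the fit with integer sample weights against the fit on repeated rows. The helper names `ridge_closed_form` and `max_weighted_vs_repeated_gap` are hypothetical and not part of scikit-learn, and the objective is assumed to be the unnormalized weighted sum of squared errors plus `0.5 * alpha * ||w||^2`, which may differ from the scaling used in the PR.

```python
import numpy as np


def ridge_closed_form(X, y, alpha, sample_weight=None):
    """Minimize 0.5 * sum_i s_i * (x_i @ w - y_i)**2 + 0.5 * alpha * ||w||**2.

    Hypothetical helper for illustration; no intercept is fitted.
    """
    if sample_weight is None:
        sample_weight = np.ones(X.shape[0])
    # Weighted normal equations: (X^T S X + alpha * I) w = X^T S y
    XtS = X.T * sample_weight
    A = XtS @ X + alpha * np.eye(X.shape[1])
    b = XtS @ y
    return np.linalg.solve(A, b)


def max_weighted_vs_repeated_gap(seed, n_samples=200, n_features=6, alpha=1.0):
    """Largest coefficient difference between weighted and repeated fits."""
    rng = np.random.RandomState(seed)
    X = rng.randn(n_samples, n_features)
    y = X @ rng.randn(n_features) + 0.1 * rng.randn(n_samples)
    # Integer weights, including zeros, as in the test above
    weights = rng.randint(0, 5, size=n_samples)
    X_repeated = np.repeat(X, weights, axis=0)
    y_repeated = np.repeat(y, weights, axis=0)
    w_weighted = ridge_closed_form(X, y, alpha, sample_weight=weights)
    w_repeated = ridge_closed_form(X_repeated, y_repeated, alpha)
    return np.max(np.abs(w_weighted - w_repeated))


# With alpha > 0 the minimizer is unique, so the gap should be at
# floating-point level for every seed, with no statistical test needed.
gaps = [max_weighted_vs_repeated_gap(seed) for seed in range(10)]
```

For an iterative solver such as SAG, the same comparison would presumably need a tight tolerance and enough iterations for both runs to converge to that unique minimizer.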
Reference Issues/PRs
Fixes issue on sample weight handling within SAG(A) #31536.
What does this implement/fix? Explain your changes.
SAG(A) now accounts for sample weights by:
TO DO:
get_auto_step_size