Skip to content

#31635 FIX ensure proper point dropping in roc_curve with drop_interm… #31647

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

imran4444shaik
Copy link

Fixes #31635

BUG Fix two issues in roc_curve with drop_intermediate=True

  1. Fix incorrect ordering of point dropping vs prepending (0,0):

    • Now prepends (0,0) and inf threshold BEFORE dropping intermediates
    • Ensures full curve context for collinearity checks
  2. Replace faulty collinearity heuristic with geometric check:

    • Uses vector cross-products to properly detect collinear points
    • Fixes cases where redundant points weren't being dropped
    • Handles both vertical/horizontal/diagonal segments

Examples fixed:

  • y_true = [0,0,0,0,1,1,1,1], y_score = [0,1,2,3,4,5,6,7]
    Now correctly returns 3 points instead of 4
  • Cases with horizontal/vertical segments now properly simplified

Maintains array-api compatibility and numerical stability.

Copy link

github-actions bot commented Jun 24, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 44f9242. Link to the linter CI: here

@imran4444shaik imran4444shaik force-pushed the main branch 3 times, most recently from b131ea0 to 7bf5946 Compare June 24, 2025 17:59
@Vishal-sys-code
Copy link

Vishal-sys-code commented Jun 28, 2025

Thanks for this. The introduction of collinear_free_mask() is a particularly elegant solution. Using the geometric cross product to detect collinearity is a robust and generalizable method, and it correctly avoids the limitations of the previous second-difference heuristic, which could miss redundant points in many realistic ROC segments.

One minor suggestion: ensure that fps is also prepended with 0 before the drop_intermediate logic (as is done for tps and thresholds), in case it's not already handled earlier in the flow. This keeps all arrays aligned and avoids subtle mismatches during masking.

Some minor linting issues remain; once those are resolved, this should be good to go. Do squash your commits also...

Only drops truly redundant points where both FPR and TPR are unchanged

FIX ensure proper point dropping in roc_curve by prepending (0,0) before drop_intermediate and maintaining current heuristic for test compatibility

Updated the test case

Updated commit

Improve roc_curve's drop_intermediate with geometric collinearity
@imran4444shaik imran4444shaik force-pushed the main branch 3 times, most recently from a0af0e9 to 9f9154d Compare June 29, 2025 17:52
…d use cross‑product collinearity mask (always keeping the first finite threshold)
@imran4444shaik imran4444shaik force-pushed the main branch 5 times, most recently from cb076fb to ac436b1 Compare June 30, 2025 13:25
@imran4444shaik imran4444shaik force-pushed the main branch 3 times, most recently from 88e5bc2 to eaa8aed Compare July 1, 2025 07:51
@imran4444shaik imran4444shaik force-pushed the main branch 2 times, most recently from 5e3f77d to f098d43 Compare July 1, 2025 11:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Two bugs in sklearn.metrics.roc_curve: drop_intermediate=True option
2 participants