
ENH: add quantile #341


Status: Draft. betatim wants to merge 22 commits into main.

Conversation

betatim (Member) commented Jun 30, 2025

Closes #340

This makes quantile available when the version of SciPy is not new enough to support array API inputs.

This is my first venture into array-api-extra land. The idea in #340 was to use the SciPy implementation if available and otherwise use a copy of that implementation. This is what this PR implements.
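
For context, here is a minimal, self-contained sketch of that idea (not the PR's code: the scipy.stats.quantile entry point, its signature, and the numpy-based stand-in for the copied implementation are all assumptions for illustration):

import numpy as np
import scipy.stats


def _quantile_fallback(x, q):
    # stand-in for the copied, array-API-only implementation in _funcs.py
    return np.quantile(np.asarray(x), q)


def quantile(x, q):
    # delegate when this SciPy version provides a quantile, else use the copy
    if hasattr(scipy.stats, "quantile"):
        return scipy.stats.quantile(np.asarray(x), q)
    return _quantile_fallback(x, q)


print(quantile([1, 2, 3, 4, 5], 0.5))  # 3.0 in either branch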

Given that this was mostly a "copy & paste" job, I thought it would be a good task for learning more about using "agents", so I asked Cursor to do the work. From reviewing the diff, the result looks sensible to me. The caveat is that I have never thought about how to write quantile myself, so you might consider this "the blind leading the blind". There are a few differences from the SciPy implementation, mostly extra "make this an array" conversions and different handling of squeezing/maintaining dimensions. I can't explain why these differences make sense :-/ - both implementations seem sensible when looked at individually. Maybe someone here can explain how that fits with the fact that the diff isn't zero :D

A few of the pre-commit checks related to numpydoc conventions don't understand the idea of having a full docstring only in _delegation.py and not also in _funcs.py, so I turned those checks off once no other errors were left. I'm also not sure I see the point of some of the ruff checks that failed, hence the noqa comments on the raise statements (I can change them to assign the message to a variable first, as shown below).
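
For reference, the EM101-friendly pattern is just this (using one of the messages from the diff):

# assign the message to a variable first, then raise it
msg = "`axis` must be an integer or None."
raise ValueError(msg)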

betatim added 2 commits June 30, 2025 13:46
This makes quantile available when the version of Scipy is not new enough to support array API inputs.
lucascolley changed the title from "Make quantile available for old scipy version" to "ENH: add quantile" on Jun 30, 2025
lucascolley added the enhancement (New feature or request) and new function labels on Jun 30, 2025
lucascolley (Member) commented Jun 30, 2025

This is my first venture into array-api-extra land.

welcome :)

Given that this was mostly a "copy & paste" job, I thought it would be a good task for learning more about using "agents", so I asked Cursor to do the work [...] There are a few differences from the SciPy implementation [...] Maybe someone here can explain how that fits with the fact that the diff isn't zero :D

I would rather merge any changes to the implementation into SciPy first, before diverging into uncharted territory with LLM magic here.

A few of the pre-commit checks related to numpydoc conventions don't understand the idea of having a full docstring only in _delegation.py and not also in _funcs.py [...] hence the noqa comments on the raise statements (I can change them to assign the message to a variable first).

Sounds fine; the existing noqa and numpydoc: ignore comments should be a good guide to what we have typically been doing so far.

lucascolley marked this pull request as draft on June 30, 2025, 12:27
lucascolley added this to the 0.8.1 milestone on Jun 30, 2025
Review comment on a diff hunk (context):

[1., 3.]], dtype=array_api_strict.float64)
"""
xp = array_namespace(x, q) if xp is None else xp

A Member commented:

scikit-learn/scikit-learn#31671 (comment) suggests that delegation to some existing array libraries may be desirable here

lucascolley (Member) commented:

I appeased the linter for you

Comment on lines 36 to 39
# Promote to common dtype
x = xp.astype(x, xp.float64)
q_arr = xp.astype(q_arr, xp.float64)
q_arr = xp.asarray(q_arr, device=_compat.device(x))
A Member commented:

the "common dtype" is always float64?

Review comment on another diff hunk (context):

elif not isinstance(axis, int): # pyright: ignore[reportUnnecessaryIsInstance]
raise ValueError("`axis` must be an integer or None.") # noqa: EM101
elif axis >= ndim or axis < -ndim:
raise ValueError("`axis` is not compatible with the shapes of the inputs.") # noqa: EM101
A Member commented:

nit: the message could specify that `axis` is out of bounds for the array's `ndim`
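
Something along these lines would do it (the message wording is only a suggestion):

msg = f"`axis` {axis} is out of bounds for an array of dimension {ndim}."
raise ValueError(msg)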

lucascolley (Member) commented:

I am breaking out some of the lint changes into gh-344

lucascolley (Member) commented:

diff all tidied up

betatim (Member, Author) commented Jul 1, 2025

With the changes to the linting/pre-commit setup: is https://github.com/evilmartians/lefthook the lefthook you are now using? I think the docs/pyproject.toml need tidying up because they still refer to pre-commit, but I think that is completely gone now.

And/or: it would be nice to have instructions for running the tools required to make the CI checks happy that don't involve tools calling tools calling tools. I'm not sure I care enough about pixi (or, you know, one of the five other tools-that-call-tools that other projects use) to learn whether I can safely install it in a conda env without it touching anything outside that env, or whether it needs installing "globally" or in parallel to a conda env, etc. This isn't specific to pixi; it's more that five different repos use six different of these tools :-/

lucascolley (Member) commented:

I think the docs/pyproject.toml need tidying up because they still refer to pre-commit, but I think that is completely gone now.

Please point out what seems to need tidying up! Bear in mind that the concept of a git pre-commit hook and the tool pre-commit are two different things.

And/or: it would be nice to have instructions for running the tools required to make the CI checks happy that don't involve tools calling tools calling tools.

If you want an easy workflow, everything you need should be described at https://data-apis.org/array-api-extra/contributing.html#development-workflow. It is effectively:

  1. install Pixi
  2. pixi run lint, or the other commands documented there.

It can't get much simpler than that (this is a huge improvement on many projects I have contributed to). I am keen to make this workflow the best it can be, so let me know if you have any ideas for improvements.

If you do not want to install Pixi, or otherwise want to use a different workflow, that is fine, and I am keen to accept contributions which accommodate alternative workflows while not being a detriment to other workflows.

I will not be volunteering my time to work on speculative accommodations for alternative workflows, though, as I strongly believe that the Pixi workflow is both essential to and sufficient for the project.

lucascolley (Member) commented:

With the changes to the linting/pre-commit setup: is evilmartians/lefthook the lefthook you are now using?

Yes

lucascolley (Member) commented:

This isn't specific to pixi; it's more that five different repos use six different of these tools :-/

I feel your pain, which is why I am so excited about Pixi, as a potential replacement for all of them.

betatim (Member, Author) commented Jul 1, 2025

I added a delegation to dask, both to solve some of the errors related to take_along_axis and because it is what we want anyway: dask implements quantile (more delegations to torch, cupy, etc. to follow). However, a basic test now fails. We can reproduce this with plain dask:

import array_api_compat.dask.array as xp

x = xp.asarray([1, 2, 3, 4, 5])
xp.quantile(x, 0.5)

this leads to the following error:

File ~/miniconda/envs/array-api-extra/lib/python3.13/site-packages/dask/array/reductions.py:1567, in quantile(a, q, axis, out, overwrite_input, method, keepdims, weights, interpolation)
   1562 """
   1563 This works by automatically chunking the reduced axes to a single chunk if necessary
   1564 and then calling ``numpy.quantile`` function across the remaining dimensions
   1565 """
   1566 if axis is None:
-> 1567     if any(n_blocks > 1 for n_blocks in a.numblocks):
   1568         raise NotImplementedError(
   1569             "The da.quantile function only works along an axis.  "
   1570             "The full algorithm is difficult to do in parallel"
   1571         )
   1572     else:

File ~/miniconda/envs/array-api-extra/lib/python3.13/site-packages/dask/array/reductions.py:151, in any(a, axis, keepdims, split_every, out)
    149 @derived_from(np)
    150 def any(a, axis=None, keepdims=False, split_every=None, out=None):
--> 151     return reduction(
    152         a,
    153         chunk.any,
    154         chunk.any,
    155         axis=axis,
    156         keepdims=keepdims,
    157         dtype="bool",
    158         split_every=split_every,
    159         out=out,
    160     )

File ~/miniconda/envs/array-api-extra/lib/python3.13/site-packages/dask/array/_reductions_generic.py:141, in reduction(x, chunk, aggregate, axis, keepdims, dtype, split_every, combine, name, out, concatenate, output_size, meta, weights)
     41 """General version of reductions
     42
     43 Parameters
   (...)    138
    139 """
    140 if axis is None:
--> 141     axis = tuple(range(x.ndim))
    142 if isinstance(axis, Integral):
    143     axis = (axis,)

AttributeError: 'generator' object has no attribute 'ndim'
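
My reading of the traceback (an assumption, I haven't checked dask's source beyond what is shown above): dask's module-level any, not the builtin, receives a plain generator and immediately asks it for .ndim. Just that step in isolation:

import dask.array as da

# mirrors `any(n_blocks > 1 for n_blocks in a.numblocks)` in dask's quantile,
# where `any` resolves to dask.array.any rather than builtins.any
da.any(n > 1 for n in (1, 2))  # AttributeError: 'generator' object has no attribute 'ndim'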

What is the normal procedure for dealing with something like this? Can we skip the test for dask?

lucascolley (Member) commented:

What is the normal procedure for dealing with something like this? Can we skip the test for dask?

Yep, the normal procedure would be:

  • open an upstream issue
  • add a skip or xfail linking to the upstream issue, like the existing example below (a quantile-specific sketch follows it):
    @pytest.mark.xfail_xp_backend(Backend.SPARSE, reason="no argsort")
    @pytest.mark.skip_xp_backend(Backend.ARRAY_API_STRICTEST, reason="no unique_values")
    class TestSetDiff1D:
        @pytest.mark.xfail_xp_backend(Backend.DASK, reason="NaN-shaped arrays")
        @pytest.mark.xfail_xp_backend(
            Backend.TORCH, reason="index_select not implemented for uint32"
        )
        @pytest.mark.xfail_xp_backend(
            Backend.TORCH_GPU, reason="index_select not implemented for uint32"
        )
        def test_setdiff1d(self, xp: ModuleType):
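
Applied to this PR, that could look something like the following (the test name and reason string are placeholders; the marker should link the upstream dask issue once it is open):

@pytest.mark.xfail_xp_backend(
    Backend.DASK, reason="da.quantile fails for axis=None; see upstream issue"
)
def test_basic(self, xp: ModuleType):
    ...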

betatim (Member, Author) commented Jul 1, 2025

This isn't specific to pixi; it's more that five different repos use six different of these tools :-/

I feel your pain, which is why I am so excited about Pixi, as a potential replacement for all of them.

Can I install pixi in an existing conda env such that it won't touch or modify anything outside of that env? I don't want to blindly install something that (potentially) messes with my shell or ends up breaking things for other projects, and figuring out the answer to that question isn't easy, I think. Especially because most of the instructions are "just install pixi", which would be fine if I was starting from scratch, but I want the "install pixi the hard way" instructions so I know what it might (or might not) break.

lucascolley (Member) commented:

Can I install pixi in an existing conda env such that it won't touch or modify anything outside of that env?

I don't think that is possible, but I will ask and let you know if it is.

lucascolley (Member) commented:

I don't want to blindly install something that (potentially) messes with my shell or ends up breaking things for other projects, and figuring out the answer to that question isn't easy, I think.

The best I can do is say that I have had no such problems 🤷‍♂️

Labels: enhancement (New feature or request), new function
Projects: none yet
Development: successfully merging this pull request may close "ENH: add quantile"
3 participants