Skip to content

feat: type-check-only dtype protocol #31

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

nstarman
Copy link
Collaborator

@nstarman nstarman commented Jun 22, 2025

An alternative to and inspired by https://github.com/data-apis/array-api-typing/blob/25412da51481363924c284f772926394d602d449/src/array_api_typing/_misc_objects.py
that provides a little more detail, but should hopefully still be safe behind a type_check_only barrier.

Ping @NeilGirdhar for thoughts.

@nstarman nstarman requested a review from jorenham June 22, 2025 18:17
Co-authored-by: NeilGirdhar
Signed-off-by: Nathaniel Starkman <[email protected]>
@NeilGirdhar
Copy link
Contributor

Looks good to me. It satisfies the main thing I want, which is a placeholder until we can have the real protocols, whether it's this or something else.

@@ -0,0 +1,21 @@
__all__ = ("DType",)

from typing import Protocol, type_check_only
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type_check_only does not exist at runtime

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This decorator is itself not available at runtime. It is mainly intended to mark classes that are defined in type stub files if an implementation returns an instance of a private class:

Dagnabbit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I honestly have no idea why that is...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dagnabbit.

I agree

"FBT", # flake8-boolean-trap
"FIX", # flake8-fixme
"ISC001", # Conflicts with formatter
"PLW1641", # Object does not implement `__hash__` method
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a pretty good check actually:

>>> class A:
...     def __eq__(self, other):
...         return True
...         
>>> hash(A())
Traceback (most recent call last):
  File "<python-input-1>", line 1, in <module>
    hash(A())
    ~~~~^^^^^
TypeError: unhashable type: 'A'

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As in, we should add specific ignores for this rather a blanket ignore, like I'm doing here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not prohibit dtypes from being unhashable, which is what this does (but not on mypy: python/mypy#18622)

Comment on lines +6 to +21
@type_check_only
class DType(Protocol):
"""Protocol for classes that represent a data type.

This `typing.Protocol` is `typing.type_check_only` and cannot be used at
runtime. This limitation is intentional since the array API structurally
defines a ``dtype`` object as anything with an ``__eq__`` method that
compares to another ``dtype`` object. This broad definition means that most
Python objects will satisfy this protocol and can be erroneously considered
a ``dtype``.

"""

def __eq__(self, other: object, /) -> bool:
"""Computes the truth value of ``self == other`` in order to test for data type object equality.""" # noqa: E501
...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is equivalent to

class DType(Protocol):
    __hash__ = None

The __eq__ method is already implemented by builtins.object. But if you override it, and don't also provide a __hash__ method, then __hash__ = None (I hate it, but it's a python quirk). Pyright understands this, but mypy does not. So this will lead to inconsistent outcomes between type-checkers.

If you were to add a __hash__ method here that is identical to object.__hash__, then the resulting protocol would be exactly equivalent to a builtins.object. So given the current array api spec, it's not possible to structurally type a dtype in a meaningful way, I'm afraid.

And just to be clear, I'm not saying that we should replace this protocol with a type alias to object or Any or something. There are no problems that it would solve.

So although I certainly appreciate your work here, I just don't think this is a good idea.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are dtypes not guaranteed to be hashable?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They neither are nor aren't guaranteed to be hashable. But there's no way to express that in Python's type system.

@nstarman
Copy link
Collaborator Author

nstarman commented Jun 23, 2025

And just to be clear, I'm not saying that we should replace this protocol with a type alias to object or Any or something. There are no problems that it would solve.

So although I certainly appreciate your work here, I just don't think this is a good idea.

Fair points.
I do agree with @NeilGirdhar that we need to have some type of named reference — typalias, protocol, virtual subclass.
What do you suggest?

@NeilGirdhar
Copy link
Contributor

NeilGirdhar commented Jun 23, 2025

There are no problems that it would solve.

Hold on a minute. There is a problem that it solves, which is that it gives users something to point to. Right now, users are all creating their own type aliases.

As I'm sure you know, catching type errors isn't the only purpose of type annotations. They also act as structured documentation. Having everyone use the same DType type alias reduces the mental load when interpreting code. This is especially true for dtypes, which are convoluted (there are dtype objects and dtype types, and dtype names, and dtype categories).

I think we should be working towards providing stubs for users to point to. You will still be able to engineer whatever solution you want. Nothing is carved in stone.

What do you think?

@jorenham
Copy link
Member

Right now, users are all creating their own type aliases.

That's not a problem.

As I'm sure you know, catching type errors isn't the only purpose of type annotations. They also act as structured documentation.

Yes, but that is secondary. Type safety significantly more important.

By providing this fake dtype alias, we are pretending that it does something. Once we actually are able to annotate dtypes (which is possible, but in a different way than that you're picturing), then most people won't use it. Because after all, they can already do dtype: xpt.DType, which looks like it does the same thing, and is a lot easier to write. So why use that new complicated way?

Also, I don't see the documentation value of dtype: DType. Having a dtype: Incomplete or something is a lot more informative. And if users decide to write type DType = object so that they can pretend to annotate dtypes, then that's on them.

Nothing is carved in stone.

In an ideal world, yes. But in reality, everything we do will be carved in stone. One reason for that is backwards compatibility. The other reason is that most users don't want to (or aren't able to) change their code: Many corporate banks are still using Cobol, many data scientists still use Python 2.7, and the webserver of my family company still runs on ubuntu 16.04.
So if xpt.DType will exist, it will likely exist forever, unless we put a lot of effort into correcting that mistake.

Also, I think we can do better. So let's.

@nstarman
Copy link
Collaborator Author

nstarman commented Jun 30, 2025

which is possible, but in a different way than that you're picturing... So why use that new complicated way?

...

@jorenham
Copy link
Member

which is possible, but in a different way than that you're picturing... So why use that new complicated way?

...

What I was trying to say there, is that there will be a way to annotate a dtype, but that will look something like this:

def dtype_of_array[T](array: xpt.HasDType[T]) -> T: ...

Which annotates the dtype in the best possible way.

But if your code already looks like this:

def dtype_of_array(array: xpt.Array) -> xpt.DType: ...

Then I expect many users to keep it that way, as it looks a lot cleaner.

Without this alias, it would've have looked something like

def dtype_of_array(array: xpt.Array) -> Any: ...

or

type DType = Any  # TODO: replace once array-api-typing is released

def dtype_of_array(array: xpt.Array) -> DType: ...

Then more users will change this to (array: xpt.HasDType[T]) -> T.

@NeilGirdhar
Copy link
Contributor

NeilGirdhar commented Jun 30, 2025

What I was trying to say there, is that there will be a way to annotate a dtype, but that will look something like this:

Got it. Thanks for taking the time to explain. Your explanation satisfies me, not sure what the others think.

One reason for that is backwards compatibility. The other reason is that most users don't want to (or aren't able to) change their code: Many corporate banks are still using Cobol, many data scientists still use Python 2.7, and the webserver of my family company still runs on ubuntu 16.04.
So if xpt.DType will exist, it will likely exist forever, unless we put a lot of effort into correcting that mistake.

I see your point, but we are not even talking about version 0.0.1. People using array-api-typing cannot count on any backwards compatibility—even in practice. And these are just typing annotations. Worst case, we break someone's type checking. Even in the standard library, there are typing proposals that will break checking and the typing committee members have said that typing doesn't have the same guarantees as runtime code.

I think it would help to remember the old maxim: Don't let the perfect be the enemy of the good. In this case, you have a good argument why a mediocre solution doesn't have the benefits we thought it would because the final solution will be so different in form. That makes sense.

But in general, we may want to implement mediocre solutions while we wait for people to have the time to craft and review more perfect solutions. I think it would be nice to get those mediocre solutions out because it allows people to start using array-api-typing sooner. This can identify typing issues sooner, and some people may even contribute to the project.

Also, it will mean that we can remove the typing code from array-api-compat (removing a maintenance burden from that project).

What do you think?

@lucascolley
Copy link
Member

Agreed with your thoughts @NeilGirdhar !

@nstarman
Copy link
Collaborator Author

@jorenham

What about making a DTypeT typevar now?
And also making Array[DTypeT]

def dtype_of_array(array: xpt.Array[DTypeT]) -> xpt.DTypeT: ...

That should work both now and with

def dtype_of_array(array: xpt.HasDType[DTypeT]) -> DTypeT: ...

@jorenham
Copy link
Member

jorenham commented Jun 30, 2025

People using array-api-typing cannot count on any backwards compatibility—even in practice.

That's news to me. I kinda like it, though.
But even so, if we were to just delete xpt.DType at some point, then some might indeed update their codebase. Others, however, will not upgrade because of that, which is arguably a bigger problem.

Even in the standard library, there are typing proposals that will break checking and the typing committee members have said that typing doesn't have the same guarantees as runtime code.

But that's pretty rare, no? I can only think of one example (related to synthesized __replace__ methods that influence covariance inference), which wasn't an intentional breaking change, and the plan is to correct it.

Don't let the perfect be the enemy of the good.

I completely agree. However, in this case, I wouldn't consider this "good" or "mediocre" solution. Because f(dtype: xpt.DType): ... is precisely the same thing as f(dtype): ..., as far as static type-checkers are concerned. So in that sense, this DType alias is not even a solution — or as Shakespeare once put it: "'tis but a no-op".

Also, it will mean that we can remove the typing code from array-api-compat (removing a maintenance burden from that project).

You already can; they're no-ops, after all. So no need to wait for array-api-typing.

@jorenham
Copy link
Member

What about making a DTypeT typevar now?

Do you mean as xpt.DTypeT?

@lucascolley
Copy link
Member

Also, it will mean that we can remove the typing code from array-api-compat (removing a maintenance burden from that project).

You already can; they're no-ops, after all. So no need to wait for array-api-typing.

That sounds like a bad idea to me. The point isn't that we gain anything from a static type-checking perspective, but rather that it forces developers to consider the implications of their decisions w.r.t. typing (even if there is no unavoidable enforcement with a linter). It is a lot easier to ignore the "types" when they aren't staring you in the face, right?

@jorenham
Copy link
Member

The point isn't that we gain anything from a static type-checking perspective, but rather that it forces developers to consider the implications of their decisions w.r.t. typing (even if there is no unavoidable enforcement with a linter). It is a lot easier to ignore the "types" when they aren't staring you in the face, right?

I see what you're saying, but I just don't think that we should design our API based purely on speculative subjective interpretations of type annotations, and rather base ourselves on things that can be measured and verified.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants