feat: type-check-only dtype protocol #31

nstarman · 2025-06-22T18:15:56Z

An alternative to and inspired by https://github.com/data-apis/array-api-typing/blob/25412da51481363924c284f772926394d602d449/src/array_api_typing/_misc_objects.py
that provides a little more detail, but should hopefully still be safe behind a type_check_only barrier.

Ping @NeilGirdhar for thoughts.

Co-authored-by: NeilGirdhar Signed-off-by: Nathaniel Starkman <[email protected]>

NeilGirdhar · 2025-06-23T17:03:29Z

Looks good to me. It satisfies the main thing I want, which is a placeholder until we can have the real protocols, whether it's this or something else.

jorenham · 2025-06-23T17:48:28Z

src/array_api_typing/_misc.py

@@ -0,0 +1,21 @@
+__all__ = ("DType",)
+
+from typing import Protocol, type_check_only


type_check_only does not exist at runtime

This decorator is itself not available at runtime. It is mainly intended to mark classes that are defined in type stub files if an implementation returns an instance of a private class:

Dagnabbit.

I honestly have no idea why that is...

Dagnabbit.

I agree

jorenham · 2025-06-23T17:49:45Z

pyproject.toml

      "FBT",    # flake8-boolean-trap
      "FIX",    # flake8-fixme
      "ISC001", # Conflicts with formatter
+      "PLW1641", # Object does not implement `__hash__` method


this is a pretty good check actually:

>>> class A: ... def __eq__(self, other): ... return True ... >>> hash(A()) Traceback (most recent call last): File "<python-input-1>", line 1, in <module> hash(A()) ~~~~^^^^^ TypeError: unhashable type: 'A'

As in, we should add specific ignores for this rather a blanket ignore, like I'm doing here?

We should not prohibit dtypes from being unhashable, which is what this does (but not on mypy: python/mypy#18622)

jorenham · 2025-06-23T18:02:08Z

src/array_api_typing/_misc.py

+@type_check_only
+class DType(Protocol):
+    """Protocol for classes that represent a data type.
+
+    This `typing.Protocol` is `typing.type_check_only` and cannot be used at
+    runtime.  This limitation is intentional since the array API structurally
+    defines a ``dtype`` object as anything with an ``__eq__`` method that
+    compares to another ``dtype`` object. This broad definition means that most
+    Python objects will satisfy this protocol and can be erroneously considered
+    a ``dtype``.
+
+    """
+
+    def __eq__(self, other: object, /) -> bool:
+        """Computes the truth value of ``self == other`` in order to test for data type object equality."""  # noqa: E501
+        ...


This is equivalent to

class DType(Protocol): __hash__ = None

The __eq__ method is already implemented by builtins.object. But if you override it, and don't also provide a __hash__ method, then __hash__ = None (I hate it, but it's a python quirk). Pyright understands this, but mypy does not. So this will lead to inconsistent outcomes between type-checkers.

If you were to add a __hash__ method here that is identical to object.__hash__, then the resulting protocol would be exactly equivalent to a builtins.object. So given the current array api spec, it's not possible to structurally type a dtype in a meaningful way, I'm afraid.

And just to be clear, I'm not saying that we should replace this protocol with a type alias to object or Any or something. There are no problems that it would solve.

So although I certainly appreciate your work here, I just don't think this is a good idea.

Are dtypes not guaranteed to be hashable?

They neither are nor aren't guaranteed to be hashable. But there's no way to express that in Python's type system.

nstarman · 2025-06-23T18:36:25Z

And just to be clear, I'm not saying that we should replace this protocol with a type alias to object or Any or something. There are no problems that it would solve.

So although I certainly appreciate your work here, I just don't think this is a good idea.

Fair points.
I do agree with @NeilGirdhar that we need to have some type of named reference — typalias, protocol, virtual subclass.
What do you suggest?

NeilGirdhar · 2025-06-23T18:55:57Z

There are no problems that it would solve.

Hold on a minute. There is a problem that it solves, which is that it gives users something to point to. Right now, users are all creating their own type aliases.

As I'm sure you know, catching type errors isn't the only purpose of type annotations. They also act as structured documentation. Having everyone use the same DType type alias reduces the mental load when interpreting code. This is especially true for dtypes, which are convoluted (there are dtype objects and dtype types, and dtype names, and dtype categories).

I think we should be working towards providing stubs for users to point to. You will still be able to engineer whatever solution you want. Nothing is carved in stone.

What do you think?

jorenham · 2025-06-29T14:15:06Z

Right now, users are all creating their own type aliases.

That's not a problem.

As I'm sure you know, catching type errors isn't the only purpose of type annotations. They also act as structured documentation.

Yes, but that is secondary. Type safety significantly more important.

By providing this fake dtype alias, we are pretending that it does something. Once we actually are able to annotate dtypes (which is possible, but in a different way than that you're picturing), then most people won't use it. Because after all, they can already do dtype: xpt.DType, which looks like it does the same thing, and is a lot easier to write. So why use that new complicated way?

Also, I don't see the documentation value of dtype: DType. Having a dtype: Incomplete or something is a lot more informative. And if users decide to write type DType = object so that they can pretend to annotate dtypes, then that's on them.

Nothing is carved in stone.

In an ideal world, yes. But in reality, everything we do will be carved in stone. One reason for that is backwards compatibility. The other reason is that most users don't want to (or aren't able to) change their code: Many corporate banks are still using Cobol, many data scientists still use Python 2.7, and the webserver of my family company still runs on ubuntu 16.04.
So if xpt.DType will exist, it will likely exist forever, unless we put a lot of effort into correcting that mistake.

Also, I think we can do better. So let's.

nstarman · 2025-06-30T19:26:09Z

which is possible, but in a different way than that you're picturing... So why use that new complicated way?

...

jorenham · 2025-06-30T19:49:37Z

which is possible, but in a different way than that you're picturing... So why use that new complicated way?

...

What I was trying to say there, is that there will be a way to annotate a dtype, but that will look something like this:

def dtype_of_array[T](array: xpt.HasDType[T]) -> T: ...

Which annotates the dtype in the best possible way.

But if your code already looks like this:

def dtype_of_array(array: xpt.Array) -> xpt.DType: ...

Then I expect many users to keep it that way, as it looks a lot cleaner.

Without this alias, it would've have looked something like

def dtype_of_array(array: xpt.Array) -> Any: ...

or

type DType = Any  # TODO: replace once array-api-typing is released

def dtype_of_array(array: xpt.Array) -> DType: ...

Then more users will change this to (array: xpt.HasDType[T]) -> T.

NeilGirdhar · 2025-06-30T21:05:37Z

What I was trying to say there, is that there will be a way to annotate a dtype, but that will look something like this:

Got it. Thanks for taking the time to explain. Your explanation satisfies me, not sure what the others think.

One reason for that is backwards compatibility. The other reason is that most users don't want to (or aren't able to) change their code: Many corporate banks are still using Cobol, many data scientists still use Python 2.7, and the webserver of my family company still runs on ubuntu 16.04.
So if xpt.DType will exist, it will likely exist forever, unless we put a lot of effort into correcting that mistake.

I see your point, but we are not even talking about version 0.0.1. People using array-api-typing cannot count on any backwards compatibility—even in practice. And these are just typing annotations. Worst case, we break someone's type checking. Even in the standard library, there are typing proposals that will break checking and the typing committee members have said that typing doesn't have the same guarantees as runtime code.

I think it would help to remember the old maxim: Don't let the perfect be the enemy of the good. In this case, you have a good argument why a mediocre solution doesn't have the benefits we thought it would because the final solution will be so different in form. That makes sense.

But in general, we may want to implement mediocre solutions while we wait for people to have the time to craft and review more perfect solutions. I think it would be nice to get those mediocre solutions out because it allows people to start using array-api-typing sooner. This can identify typing issues sooner, and some people may even contribute to the project.

Also, it will mean that we can remove the typing code from array-api-compat (removing a maintenance burden from that project).

What do you think?

lucascolley · 2025-06-30T21:14:21Z

Agreed with your thoughts @NeilGirdhar !

nstarman · 2025-06-30T21:26:55Z

@jorenham

What about making a DTypeT typevar now?
And also making Array[DTypeT]

def dtype_of_array(array: xpt.Array[DTypeT]) -> xpt.DTypeT: ...

That should work both now and with

def dtype_of_array(array: xpt.HasDType[DTypeT]) -> DTypeT: ...

jorenham · 2025-06-30T22:08:51Z

People using array-api-typing cannot count on any backwards compatibility—even in practice.

That's news to me. I kinda like it, though.
But even so, if we were to just delete xpt.DType at some point, then some might indeed update their codebase. Others, however, will not upgrade because of that, which is arguably a bigger problem.

Even in the standard library, there are typing proposals that will break checking and the typing committee members have said that typing doesn't have the same guarantees as runtime code.

But that's pretty rare, no? I can only think of one example (related to synthesized __replace__ methods that influence covariance inference), which wasn't an intentional breaking change, and the plan is to correct it.

Don't let the perfect be the enemy of the good.

I completely agree. However, in this case, I wouldn't consider this "good" or "mediocre" solution. Because f(dtype: xpt.DType): ... is precisely the same thing as f(dtype): ..., as far as static type-checkers are concerned. So in that sense, this DType alias is not even a solution — or as Shakespeare once put it: "'tis but a no-op".

Also, it will mean that we can remove the typing code from array-api-compat (removing a maintenance burden from that project).

You already can; they're no-ops, after all. So no need to wait for array-api-typing.

jorenham · 2025-06-30T22:10:40Z

What about making a DTypeT typevar now?

Do you mean as xpt.DTypeT?

lucascolley · 2025-06-30T22:13:52Z

Also, it will mean that we can remove the typing code from array-api-compat (removing a maintenance burden from that project).

You already can; they're no-ops, after all. So no need to wait for array-api-typing.

That sounds like a bad idea to me. The point isn't that we gain anything from a static type-checking perspective, but rather that it forces developers to consider the implications of their decisions w.r.t. typing (even if there is no unavoidable enforcement with a linter). It is a lot easier to ignore the "types" when they aren't staring you in the face, right?

jorenham · 2025-06-30T22:18:55Z

The point isn't that we gain anything from a static type-checking perspective, but rather that it forces developers to consider the implications of their decisions w.r.t. typing (even if there is no unavoidable enforcement with a linter). It is a lot easier to ignore the "types" when they aren't staring you in the face, right?

I see what you're saying, but I just don't think that we should design our API based purely on speculative subjective interpretations of type annotations, and rather base ourselves on things that can be measured and verified.

nstarman requested a review from jorenham June 22, 2025 18:17

feat: type-check-only dtype protocol

5e8294d

Co-authored-by: NeilGirdhar Signed-off-by: Nathaniel Starkman <[email protected]>

nstarman force-pushed the dtype branch from 6fe2333 to 5e8294d Compare June 22, 2025 18:56

jorenham reviewed Jun 23, 2025

View reviewed changes

		@@ -0,0 +1,21 @@
		__all__ = ("DType",)

		from typing import Protocol, type_check_only

feat: type-check-only dtype protocol #31

Are you sure you want to change the base?

feat: type-check-only dtype protocol #31

Conversation

nstarman commented Jun 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

NeilGirdhar commented Jun 23, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

nstarman commented Jun 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

NeilGirdhar commented Jun 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jorenham commented Jun 29, 2025

Uh oh!

nstarman commented Jun 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jorenham commented Jun 30, 2025

Uh oh!

NeilGirdhar commented Jun 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lucascolley commented Jun 30, 2025

Uh oh!

nstarman commented Jun 30, 2025

Uh oh!

jorenham commented Jun 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jorenham commented Jun 30, 2025

Uh oh!

lucascolley commented Jun 30, 2025

Uh oh!

jorenham commented Jun 30, 2025

Uh oh!

Uh oh!

nstarman commented Jun 22, 2025 •

edited

Loading

nstarman commented Jun 23, 2025 •

edited

Loading

NeilGirdhar commented Jun 23, 2025 •

edited

Loading

nstarman commented Jun 30, 2025 •

edited

Loading

NeilGirdhar commented Jun 30, 2025 •

edited

Loading

jorenham commented Jun 30, 2025 •

edited

Loading