-
Notifications
You must be signed in to change notification settings - Fork 5
feat: type-check-only dtype protocol #31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Co-authored-by: NeilGirdhar Signed-off-by: Nathaniel Starkman <[email protected]>
Looks good to me. It satisfies the main thing I want, which is a placeholder until we can have the real protocols, whether it's this or something else. |
@@ -0,0 +1,21 @@ | |||
__all__ = ("DType",) | |||
|
|||
from typing import Protocol, type_check_only |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type_check_only
does not exist at runtime
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This decorator is itself not available at runtime. It is mainly intended to mark classes that are defined in type stub files if an implementation returns an instance of a private class:
Dagnabbit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I honestly have no idea why that is...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dagnabbit.
I agree
"FBT", # flake8-boolean-trap | ||
"FIX", # flake8-fixme | ||
"ISC001", # Conflicts with formatter | ||
"PLW1641", # Object does not implement `__hash__` method |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a pretty good check actually:
>>> class A:
... def __eq__(self, other):
... return True
...
>>> hash(A())
Traceback (most recent call last):
File "<python-input-1>", line 1, in <module>
hash(A())
~~~~^^^^^
TypeError: unhashable type: 'A'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As in, we should add specific ignores for this rather a blanket ignore, like I'm doing here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not prohibit dtypes from being unhashable, which is what this does (but not on mypy: python/mypy#18622)
@type_check_only | ||
class DType(Protocol): | ||
"""Protocol for classes that represent a data type. | ||
|
||
This `typing.Protocol` is `typing.type_check_only` and cannot be used at | ||
runtime. This limitation is intentional since the array API structurally | ||
defines a ``dtype`` object as anything with an ``__eq__`` method that | ||
compares to another ``dtype`` object. This broad definition means that most | ||
Python objects will satisfy this protocol and can be erroneously considered | ||
a ``dtype``. | ||
|
||
""" | ||
|
||
def __eq__(self, other: object, /) -> bool: | ||
"""Computes the truth value of ``self == other`` in order to test for data type object equality.""" # noqa: E501 | ||
... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is equivalent to
class DType(Protocol):
__hash__ = None
The __eq__
method is already implemented by builtins.object
. But if you override it, and don't also provide a __hash__
method, then __hash__ = None
(I hate it, but it's a python quirk). Pyright understands this, but mypy does not. So this will lead to inconsistent outcomes between type-checkers.
If you were to add a __hash__
method here that is identical to object.__hash__
, then the resulting protocol would be exactly equivalent to a builtins.object
. So given the current array api spec, it's not possible to structurally type a dtype in a meaningful way, I'm afraid.
And just to be clear, I'm not saying that we should replace this protocol with a type alias to object
or Any
or something. There are no problems that it would solve.
So although I certainly appreciate your work here, I just don't think this is a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are dtypes not guaranteed to be hashable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They neither are nor aren't guaranteed to be hashable. But there's no way to express that in Python's type system.
Fair points. |
Hold on a minute. There is a problem that it solves, which is that it gives users something to point to. Right now, users are all creating their own type aliases. As I'm sure you know, catching type errors isn't the only purpose of type annotations. They also act as structured documentation. Having everyone use the same I think we should be working towards providing stubs for users to point to. You will still be able to engineer whatever solution you want. Nothing is carved in stone. What do you think? |
That's not a problem.
Yes, but that is secondary. Type safety significantly more important. By providing this fake dtype alias, we are pretending that it does something. Once we actually are able to annotate dtypes (which is possible, but in a different way than that you're picturing), then most people won't use it. Because after all, they can already do Also, I don't see the documentation value of
In an ideal world, yes. But in reality, everything we do will be carved in stone. One reason for that is backwards compatibility. The other reason is that most users don't want to (or aren't able to) change their code: Many corporate banks are still using Cobol, many data scientists still use Python 2.7, and the webserver of my family company still runs on ubuntu 16.04. Also, I think we can do better. So let's. |
... |
What I was trying to say there, is that there will be a way to annotate a dtype, but that will look something like this: def dtype_of_array[T](array: xpt.HasDType[T]) -> T: ... Which annotates the dtype in the best possible way. But if your code already looks like this: def dtype_of_array(array: xpt.Array) -> xpt.DType: ... Then I expect many users to keep it that way, as it looks a lot cleaner. Without this alias, it would've have looked something like def dtype_of_array(array: xpt.Array) -> Any: ... or type DType = Any # TODO: replace once array-api-typing is released
def dtype_of_array(array: xpt.Array) -> DType: ... Then more users will change this to |
Got it. Thanks for taking the time to explain. Your explanation satisfies me, not sure what the others think.
I see your point, but we are not even talking about version 0.0.1. People using array-api-typing cannot count on any backwards compatibility—even in practice. And these are just typing annotations. Worst case, we break someone's type checking. Even in the standard library, there are typing proposals that will break checking and the typing committee members have said that typing doesn't have the same guarantees as runtime code. I think it would help to remember the old maxim: Don't let the perfect be the enemy of the good. In this case, you have a good argument why a mediocre solution doesn't have the benefits we thought it would because the final solution will be so different in form. That makes sense. But in general, we may want to implement mediocre solutions while we wait for people to have the time to craft and review more perfect solutions. I think it would be nice to get those mediocre solutions out because it allows people to start using array-api-typing sooner. This can identify typing issues sooner, and some people may even contribute to the project. Also, it will mean that we can remove the typing code from array-api-compat (removing a maintenance burden from that project). What do you think? |
Agreed with your thoughts @NeilGirdhar ! |
What about making a
That should work both now and with def dtype_of_array(array: xpt.HasDType[DTypeT]) -> DTypeT: ... |
That's news to me. I kinda like it, though.
But that's pretty rare, no? I can only think of one example (related to synthesized
I completely agree. However, in this case, I wouldn't consider this "good" or "mediocre" solution. Because
You already can; they're no-ops, after all. So no need to wait for array-api-typing. |
Do you mean as |
That sounds like a bad idea to me. The point isn't that we gain anything from a static type-checking perspective, but rather that it forces developers to consider the implications of their decisions w.r.t. typing (even if there is no unavoidable enforcement with a linter). It is a lot easier to ignore the "types" when they aren't staring you in the face, right? |
I see what you're saying, but I just don't think that we should design our API based purely on speculative subjective interpretations of type annotations, and rather base ourselves on things that can be measured and verified. |
An alternative to and inspired by https://github.com/data-apis/array-api-typing/blob/25412da51481363924c284f772926394d602d449/src/array_api_typing/_misc_objects.py
that provides a little more detail, but should hopefully still be safe behind a
type_check_only
barrier.Ping @NeilGirdhar for thoughts.