-
Notifications
You must be signed in to change notification settings - Fork 15.2k
Add type annotations to S3 hook module #10164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add type annotations to S3 hook module #10164
Conversation
When I ran the $ MP_DIR=$(mktemp -d); mypy --linecount-report ${MP_DIR} airflow/providers/amazon/aws/hooks; \
cat "${MP_DIR}/linecount.txt" | grep providers.amazon.aws.hooks.s3 | grep -v example_dags | \
awk '$4 != 0 {print 100.00 * ($3/$4), $3, $4, $5}'| sort -g However, it looks like when I add annotations to def __init__(self, *args: str, **kwargs: str):
super().__init__(client_type='s3', *args, **kwargs) I can get to 26/26, but get this error: airflow/providers/amazon/aws/hooks/s3.py:110: error: "__init__" of "AwsBaseHook" gets multiple
values for keyword argument "client_type"
super().__init__(client_type='s3', *args, **kwargs)
^ I'm guessing it doesn't like something about how Either way, I just pushed up 573d9a6d73cf565967598b806d411a4a84b6e567 to add these and ignore the type check. That gets us to 26/26. That said, any ideas what might be going on here? I would be happy to revert and change this if we can explain better what's going on or if we are perfectly content to have 25/26 methods covered, which I'm guessing we are =) |
Can't tell if the CI failure is related or not. |
@@ -104,11 +106,11 @@ class S3Hook(AwsBaseHook): | |||
:class:`~airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook` | |||
""" | |||
|
|||
def __init__(self, *args, **kwargs): | |||
super().__init__(client_type='s3', *args, **kwargs) | |||
def __init__(self, *args: str, **kwargs: str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def __init__(self, *args: str, **kwargs: str): | |
def __init__(self, *args, **kwargs): |
We do not add type hints to those (usually it is tuple
and dictionary
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, that makes sense. Any ideas on how to get the type hint to work here? Okay to ignore like I did in 573d9a6? Or are we content to just leave it be?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at some other examples, it seems like you can only get the type hinting to work if and when there are other arguments. Also kinda makes me wonder if we need the def __init__
at all or if we could just call the super
right away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going to just revert 573d9a6
.
prefix: str = '', | ||
delimiter: str = '', | ||
page_size: Optional[int] = None, | ||
max_items: Optional[int] = None) -> Optional[list]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prefix: str = '', | |
delimiter: str = '', | |
page_size: Optional[int] = None, | |
max_items: Optional[int] = None) -> Optional[list]: | |
prefix: Optional[str] = None, | |
delimiter: Optional[str] = None, | |
page_size: Optional[int] = None, | |
max_items: Optional[int] = None) -> Optional[list]: | |
prefix = prefix or "" | |
delimiter = delimiter or "" |
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should avoid strings as default values
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes more sense if the idea behind Optional[str]
is that it can be a str
or None
, yep.
Import S3Transfer object from boto3 as well as BytesIO to add type annotations.
This reverts commit 573d9a6d73cf565967598b806d411a4a84b6e567.
06ea37b
to
138b73c
Compare
@turbaszek - just made a few changes and rebased on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me! 🚀
@@ -304,7 +323,7 @@ def check_for_key(self, key, bucket_name=None): | |||
|
|||
@provide_bucket_name | |||
@unify_bucket_name_and_key | |||
def get_key(self, key, bucket_name=None): | |||
def get_key(self, key: str, bucket_name: Optional[str] = None) -> S3Transfer: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems incorrect to have annotated with return type S3Transfer though actual return type is boto3.s3.Object. This is giving pycharm fits due to the mismatched interface. Hopefully no impact at runtime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. Could you share the output you are seeing from pycharm? I cannot figure out where the boto3.s3.Object
is defined. I have been through the boto3
, botocore
and airflow
code as well as some docs but I have not been able to.
I think we need the boto3-type-annotations
to solve this issue. Check out this Stack Overflow question also.
This is something @mik-laj
once brought up in #11297 it looks like. I don't have time to tackle it right now, but I could in the near future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just dropped a comment over in #11297 as well: #11297 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, Cooper. Saying it's giving pycharm fits might have been hyperbole by me. It's showing an error message due to the confusion over types:
Unresolved attribute reference 'download_fileobj' for class 'S3Transfer'
When I step into the get_key function it's showing rtype as Object:
:rtype: boto3.s3.Object
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://pypi.org/project/boto3-stubs/ might be a helpful project here.
What
Add type annotations to the
S3Hook
module.Import
S3Transfer
object fromboto3
as well asBytesIO
to add type annotations.Why
Related: #9708
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.