
Add type annotations to S3 hook module #10164


Merged

Conversation

coopergillan
Contributor

What

Add type annotations to the S3Hook module.

Import S3Transfer object from boto3 as well as BytesIO to add type annotations.

Why

Related: #9708


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

@boring-cyborg boring-cyborg bot added the provider:amazon AWS/Amazon - related issues label Aug 5, 2020
@coopergillan
Contributor Author

coopergillan commented Aug 5, 2020

When I ran the mypy report for this module using the command from #9708, I kept getting 25 out of 26 checks. I couldn't figure out which check was missing:

$ MP_DIR=$(mktemp -d); mypy --linecount-report ${MP_DIR} airflow/providers/amazon/aws/hooks; \
cat "${MP_DIR}/linecount.txt" | grep providers.amazon.aws.hooks.s3 | grep -v example_dags | \
awk '$4 != 0 {print 100.00 * ($3/$4), $3, $4, $5}'| sort -g

However, it looks like when I add annotations to __init__:

def __init__(self, *args: str, **kwargs: str):
    super().__init__(client_type='s3', *args, **kwargs)

I can get to 26/26, but get this error:

airflow/providers/amazon/aws/hooks/s3.py:110: error: "__init__" of "AwsBaseHook" gets multiple
values for keyword argument "client_type"
            super().__init__(client_type='s3', *args, **kwargs)
            ^

I'm guessing mypy doesn't like that client_type could also arrive positionally through *args, but that seems like it should be fine since we only ever pass it explicitly in the super().__init__ call.
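A minimal, self-contained sketch of what mypy appears to be reacting to (hypothetical Base/Child names, not the real hook classes):

```python
from typing import Any


class Base:
    def __init__(self, client_type: str, *args: Any, **kwargs: Any) -> None:
        self.client_type = client_type


class Child(Base):
    # Because this signature is annotated, mypy type-checks the body and
    # reports: '"__init__" of "Base" gets multiple values for keyword
    # argument "client_type"' -- it cannot prove *args is empty, so a
    # positional value could also land in client_type.
    def __init__(self, *args: Any, **kwargs: Any) -> None:
        super().__init__(client_type='s3', *args, **kwargs)


class ChildUnchecked(Base):
    # With no annotations, mypy skips this body by default (unless
    # --check-untyped-defs is set), so no error is reported -- which
    # would explain why the error only appeared after annotating.
    def __init__(self, *args, **kwargs):
        super().__init__(client_type='s3', *args, **kwargs)
```

At runtime both variants behave the same, as long as nothing actually passes client_type positionally through *args.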

Either way, I just pushed up 573d9a6d73cf565967598b806d411a4a84b6e567 to add these and ignore the type check. That gets us to 26/26. That said, any ideas what might be going on here?

I would be happy to revert and change this if we can explain better what's going on or if we are perfectly content to have 25/26 methods covered, which I'm guessing we are =)

@coopergillan
Contributor Author

Can't tell if the CI failure is related or not.

@@ -104,11 +106,11 @@ class S3Hook(AwsBaseHook):
:class:`~airflow.providers.amazon.aws.hooks.base_aws.AwsBaseHook`
"""

def __init__(self, *args, **kwargs):
super().__init__(client_type='s3', *args, **kwargs)
def __init__(self, *args: str, **kwargs: str):
Member


Suggested change
def __init__(self, *args: str, **kwargs: str):
def __init__(self, *args, **kwargs):

We do not add type hints to *args and **kwargs (they are effectively a tuple and a dictionary anyway)

Contributor Author

@coopergillan coopergillan Aug 5, 2020


Got it, that makes sense. Any ideas on how to get the type hint to work here? Okay to ignore like I did in 573d9a6? Or are we content to just leave it be?

Contributor Author


Looking at some other examples, it seems like you can only get the type hinting to work when there are other, named arguments. It also makes me wonder whether we need the def __init__ at all, or whether we could just call the super right away.

Contributor Author


Going to just revert 573d9a6.

Comment on lines 210 to 213
prefix: str = '',
delimiter: str = '',
page_size: Optional[int] = None,
max_items: Optional[int] = None) -> Optional[list]:
Member


Suggested change
prefix: str = '',
delimiter: str = '',
page_size: Optional[int] = None,
max_items: Optional[int] = None) -> Optional[list]:
prefix: Optional[str] = None,
delimiter: Optional[str] = None,
page_size: Optional[int] = None,
max_items: Optional[int] = None) -> Optional[list]:
prefix = prefix or ""
delimiter = delimiter or ""

WDYT?
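A trimmed-down sketch of the suggested pattern, using a hypothetical list_keys stand-in rather than the real hook method: accept None in the signature and normalize to an empty string inside the body.

```python
from typing import List, Optional


def list_keys(prefix: Optional[str] = None,
              delimiter: Optional[str] = None) -> List[str]:
    # Normalize None to '' so the rest of the body can assume str.
    prefix = prefix or ''
    delimiter = delimiter or ''
    # Stand-in for the real paginator call; just echoes the inputs.
    return [prefix, delimiter]
```

Callers can then pass either a string or nothing at all, and the body never has to branch on None again.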

Member


I think we should avoid strings as default values

Contributor Author


Yep, I think that makes more sense: the idea behind Optional[str] is that the value can be a str or None.

Import S3Transfer object from boto3 as well as BytesIO to add type
annotations.
This reverts commit 573d9a6d73cf565967598b806d411a4a84b6e567.
@coopergillan coopergillan force-pushed the add-type-annotations-to-s3-hook branch from 06ea37b to 138b73c Compare August 6, 2020 14:55
@coopergillan
Contributor Author

@turbaszek - just made a few changes and rebased on apache/master. Let me know if anything else is needed here. I'll keep an eye on the builds also.

Member

@turbaszek turbaszek left a comment


Looks good to me! 🚀

@turbaszek turbaszek merged commit 0c77ea8 into apache:master Aug 6, 2020
@coopergillan coopergillan deleted the add-type-annotations-to-s3-hook branch August 6, 2020 16:22
@@ -304,7 +323,7 @@ def check_for_key(self, key, bucket_name=None):

@provide_bucket_name
@unify_bucket_name_and_key
def get_key(self, key, bucket_name=None):
def get_key(self, key: str, bucket_name: Optional[str] = None) -> S3Transfer:


It seems incorrect to annotate this with return type S3Transfer when the actual return type is boto3.s3.Object. This is giving PyCharm fits due to the mismatched interface. Hopefully there is no impact at runtime.
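To illustrate the "no impact at runtime" point, here is a hedged sketch with stand-in classes (not the real boto3 types): Python never enforces annotations when the code runs, so a wrong return annotation only misleads static tools.

```python
from io import BytesIO


class S3Transfer:
    """Stand-in for boto3's S3Transfer class."""


class S3Object:
    """Stand-in for the object the hook actually returns."""

    def download_fileobj(self, buf: BytesIO) -> None:
        buf.write(b'payload')


def get_key() -> S3Transfer:  # wrong annotation, as in the merged PR
    return S3Object()  # static checkers flag this; runtime does not care


buf = BytesIO()
get_key().download_fileobj(buf)  # works fine at runtime
```

mypy or PyCharm would report an unresolved download_fileobj on S3Transfer here, exactly as described below, while the call itself succeeds.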

Contributor Author


Interesting. Could you share the output you are seeing from PyCharm? I cannot figure out where boto3.s3.Object is defined. I have been through the boto3, botocore and airflow code as well as some docs, but I have not been able to find it.

I think we need the boto3-type-annotations to solve this issue. Check out this Stack Overflow question also.

This is something @mik-laj once brought up in #11297 it looks like. I don't have time to tackle it right now, but I could in the near future.

Contributor Author


Just dropped a comment over in #11297 as well: #11297 (comment)


Thanks, Cooper. Saying it's giving pycharm fits might have been hyperbole by me. It's showing an error message due to the confusion over types:

Unresolved attribute reference 'download_fileobj' for class 'S3Transfer'

When I step into the get_key function it's showing rtype as Object:

:rtype: boto3.s3.Object

Member


https://pypi.org/project/boto3-stubs/ might be a helpful project here.
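A sketch of how the annotation could look with boto3-stubs installed (pip install 'boto3-stubs[s3]'). The module path mypy_boto3_s3.service_resource is my best guess at the stub layout, not verified here; the TYPE_CHECKING guard keeps the import out of the runtime path, so the code runs even without the stubs package.

```python
from typing import TYPE_CHECKING, Optional

if TYPE_CHECKING:
    # Only imported by static type checkers, never at runtime.
    from mypy_boto3_s3.service_resource import Object


def get_key(key: str, bucket_name: Optional[str] = None) -> 'Object':
    """Hypothetical re-annotation of S3Hook.get_key using the stubs.

    Body elided; only the return annotation matters for this sketch.
    """
    raise NotImplementedError
```

With this in place, checkers would resolve download_fileobj and friends on the returned object instead of flagging S3Transfer.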
