-
Notifications
You must be signed in to change notification settings - Fork 15.3k
Add support for S3 dag bundle #46621
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst)
|
moved it to provider package. its ready for review. any chance to trigger the test workflows? |
f79f69f
to
d7073b2
Compare
Triggered workflows again, will review soon 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks reasonable to me (though I'm no expert on Bundles). There are some failing tests and static checks though that need looking at.
974a4a3
to
ed17613
Compare
code related tests are passing now. remaining test is related to airflow version. |
@o-nikolas @jedcunningham its ready for review. |
5780ad3
to
a1c5c1c
Compare
Looks reasonable to me, but I'm no Bundles specialist. I talked to Jed before he left on leave and he mentioned to loop in @dstandish on Bundles stuff. Does this look reasonable to you as well Daniel? |
Did you try https://podman-desktop.io/docs/migrating-from-docker/managing-docker-compatibility |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay on reviewing this. Happy to see it!
providers/amazon/src/airflow/providers/amazon/aws/bundles/s3.py
Outdated
Show resolved
Hide resolved
providers/amazon/src/airflow/providers/amazon/aws/bundles/s3.py
Outdated
Show resolved
Hide resolved
providers/amazon/src/airflow/providers/amazon/aws/bundles/s3.py
Outdated
Show resolved
Hide resolved
providers/amazon/src/airflow/providers/amazon/aws/bundles/s3.py
Outdated
Show resolved
Hide resolved
) | ||
local_s3_objects = [] | ||
s3_bucket = self.s3_hook.get_bucket(self.bucket_name) | ||
for obj in s3_bucket.objects.filter(Prefix=self.prefix): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forgive my s3 ignorance, but is there no higher level construct we can use to bring the local side up to date? Having to iterate through and do our own sync feels a bit odd.
If we do need to do this, it feels like this might be better in the hook - the concept of syncing a directory isn't necessarily s3 bundle specific, but could be more generally useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately nothing in existing libs. There are some cli packages. Moved to S3 hook
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, S3 is very odd in that the API and CLI diverge in that CLI supports a sync function and the API does not 😢
You could potentially shell out to the CLI command, but I don't love that either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha. I'll let you make that call then @o-nikolas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for missing this. I say we stick with this code, so that we don't have a direct dependency to the CLI
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looking good, some naming and test refactors and, from my side at least, I'm happy with this.
Don’t use caplog in general. I think it’s not too difficult to rewrite the tests. |
Would really like to see the pr merged 😍 |
Co-authored-by: Ephraim Anierobi <[email protected]>
Co-authored-by: Jed Cunningham <[email protected]>
Co-authored-by: Jed Cunningham <[email protected]>
The branch is 600 commits behind. I rebased it. lets see what CI says. |
I would love to see this merged as well. @ismailsimsek the use of caplog needs to be addressed (i.e. removed). Do you need any help with this? |
That would be a great help, thank you! I don't have much time to work on this recently, so I'd really appreciate it. |
implements S3 dag bundle, with this end user could set an
S3 bucket
andprefix
(sub folder) as a dag bundle source.related wiki AIP-66: DAG Bundles & Parsing
related project: AIP-66: DAG Bundles & Parsing
closes #9555
closes #8657