Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dataproc submit job operator async #25302

Merged
merged 31 commits into from
Aug 22, 2022

Conversation

bjankie1
Copy link
Contributor

Add deferrable capability to existing DataprocJobBaseOperator and DataprocSubmitJobOperator operators.


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@bjankie1 bjankie1 requested a review from turbaszek as a code owner July 26, 2022 10:00
@boring-cyborg boring-cyborg bot added area:providers provider:google Google (including GCP) related issues labels Jul 26, 2022
@boring-cyborg
Copy link

boring-cyborg bot commented Jul 26, 2022

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
Here are some useful points:

  • Pay attention to the quality of your code (flake8, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: [email protected]
    Slack: https://s.apache.org/airflow-slack

@bjankie1 bjankie1 force-pushed the DataprocSubmitJobOperator-async branch from e90505e to 1ffba7c Compare July 27, 2022 06:35
Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty coll - but we also need some examples and likely entries in the documentation describing the usage and mentioning deferrable options. Otherwise it will not be discoverable enough.

@bjankie1 bjankie1 requested a review from mik-laj as a code owner July 29, 2022 08:49
@bjankie1
Copy link
Contributor Author

Looks pretty coll - but we also need some examples and likely entries in the documentation describing the usage and mentioning deferrable options. Otherwise it will not be discoverable enough.

Thank you for pointing it out. I've fixed with #1bf260a

Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks cool now! Thanks for adding the doc entry !

@potiuk
Copy link
Member

potiuk commented Jul 29, 2022

Some tewsts failing

@rajaths010494
Copy link
Contributor

Test cases for triggerer are missing.

@bjankie1 bjankie1 force-pushed the DataprocSubmitJobOperator-async branch from 470611f to 641810e Compare August 1, 2022 13:32
@potiuk
Copy link
Member

potiuk commented Aug 2, 2022

Some errors to fix.

@pankajastro
Copy link
Member

pankajastro commented Aug 2, 2022

@potiuk @bjankie1 Previous we have created two different operators/sensors one for synchronous and one asynchronous for example DatabricksSubmitRunDeferrableOperator is for async and DatabricksSubmitRunOperator for sync but in this PR we have added a flag for async in the existing operator. does it would be considered as an inconsistency and do we need to worry about it.? https://github.com/apache/airflow/blob/main/airflow/providers/databricks/operators/databricks.py#L368

@potiuk
Copy link
Member

potiuk commented Aug 2, 2022

I am perfectly ok with per-provider consistency, rather than "per-airflow" consistency. There is no reason why we should "force" one way or the other cross-providers. And there might be reaons why it's easier for one provider to do it this way and for another provider - different way.

Traditionally each provider should follow their own "standards" and be consistent - for all things except the Airflow "performace" and best practices. We are gearing up (in our tooling for now but soon in the code) to splitting providers to individual repos and then it will be even less important is such level of consistency is required.

I personally think of this in the very way Apache Software Foundation way does with their projects. There is a very, very small but super-strict interface of the "distributed" component (project in ASF and provider in Airlfow) should follow and it should be strict and followed - but all the rest should be left to decide internally (by project in ASF and by provider in Airlfow).

The things that we care about at the airlfow level should very well described in https://github.com/apache/airflow/blob/main/README.md and strictly regulated by the processes/automation. All the rest - people who are most active in the providers should decide.

async def create_cluster(
self,
region: str,
project_id: str,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This parameter should be set to None by default.

@potiuk
Copy link
Member

potiuk commented Aug 3, 2022

And static/docs need to be fixed too

@bjankie1 bjankie1 force-pushed the DataprocSubmitJobOperator-async branch 2 times, most recently from e85304a to ed00303 Compare August 5, 2022 08:52
@potiuk
Copy link
Member

potiuk commented Aug 5, 2022

Test failing :(

@potiuk
Copy link
Member

potiuk commented Aug 8, 2022

One doc failure left.

@bjankie1 bjankie1 force-pushed the DataprocSubmitJobOperator-async branch from abacb5d to 39b535e Compare August 8, 2022 15:29
@potiuk potiuk force-pushed the DataprocSubmitJobOperator-async branch from 39b535e to 407d5e6 Compare August 8, 2022 21:00
@bjankie1 bjankie1 force-pushed the DataprocSubmitJobOperator-async branch from e3e559a to 9679ca1 Compare August 16, 2022 08:22
@potiuk potiuk merged commit ecf0460 into apache:main Aug 22, 2022
@boring-cyborg
Copy link

boring-cyborg bot commented Aug 22, 2022

Awesome work, congrats on your first merged pull request!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:providers provider:google Google (including GCP) related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants