Skip to content

[AIRFLOW-5944] Display rendered template_fields without accessing DAG files #6788

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 12, 2020

Conversation

kaxil
Copy link
Member

@kaxil kaxil commented Dec 11, 2019

One of the limitations with the current implementation of DAG serialization is that Templated fields still need access to DAG Files.

Potential solution 1: Store rendered version in the DB somewhere.
Limitation: Tasks that are not yet run won't be able to render the templated fields (this is a useful debugging aid, especially for those without CLI access to run airflow tasks render) .

Potential solution 2: Store both rendered & unrendered version in the DB somewhere. This will allow getting past the Limitation in Solution (1). We should store only the current unrendered version and store the last X number of rendered versions. This will also have an advantage that old task would have the rendered version that was correct at that time (which is not possible currently).
Limitation/Issue: Maintaining two separate tables without much benefit

Potential solution 3: Store rendered version in TI table and unrendered version in a separate column in DAG serialization table. This means that each TI would have an associated rendered field without creating an extra table and duplicating dag_id, task_id & execution_date. Once we start storing different version of serialized DAGs we can have each DAG row have it's unrendered version.


Issue link: AIRFLOW-5944

Make sure to mark the boxes below before creating PR: [x]

  • Description above provides context of the change
  • Commit message/PR title starts with [AIRFLOW-NNNN]. AIRFLOW-NNNN = JIRA ID*
  • Unit tests coverage for changes (not needed for documentation changes)
  • Commits follow "How to write a good git commit message"
  • Relevant documentation is updated including usage instructions.
  • I will engage committers as explained in Contribution Workflow Example.

* For document-only changes commit message can start with [AIRFLOW-XXXX].


In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.

@kaxil kaxil requested a review from ashb December 11, 2019 10:32
)

if conn.dialect.name == "mysql":
conn.execute("SET time_zone = '+00:00'")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've got this in earlier migrations - I don't think we need it here too do we? Also: how about we just don't use a "TIMESTAMP" type, but "DATETIME" instead?

Copy link
Member Author

@kaxil kaxil Jan 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It won't contain the timezones though if we just use DATETIME! It might have undesirable effects when comparing current time with this time!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really? https://www.eversql.com/mysql-datetime-vs-timestamp-column-types-which-one-i-should-use/ would imply otherwise.

(Also TIMESTAMP only supports dates up to 2038.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the same article :

To summarize, if you want to serve your date and time data the same way regardless of timezones, you can use the DATETIME type (which will also allow you to use all the date & type functions built in MySQL). Otherwise, you can use TIMESTAMP and serve the data on a per-timezone basis.

Also, from https://docs.sqlalchemy.org/en/13/core/type_basics.html#sqlalchemy.types.DateTime

For timezone support, use at least the TIMESTAMP datatype, if not the dialect-specific datatype object.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read that and my conclusion is we should use DATETIME for mysql - Airflow puts in checks to ensure that we only ever store UTC datetimes in the DB.

By using a datetime column the migrations become a lot simpler.

@kaxil kaxil force-pushed the AIRFLOW-5944-rendering-templated-fields branch from ef008b5 to a61fc97 Compare December 12, 2019 00:35
@github-actions github-actions bot added the area:webserver Webserver related Issues label Jan 5, 2020
@stale stale bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Jan 26, 2020
@kaxil kaxil added pinned Protect from Stalebot auto closing and removed stale Stale PRs per the .github/workflows/stale.yml policy file labels Jan 26, 2020
@apache apache deleted a comment from stale bot Jan 26, 2020
@kaxil kaxil force-pushed the AIRFLOW-5944-rendering-templated-fields branch from a61fc97 to aa27c4a Compare January 29, 2020 06:56
@anitakar
Copy link
Contributor

anitakar commented Jan 29, 2020

Hey @kaxil. Do we have any ETA on when it will be merged?
And will it solve the problem of showing source code in web UI when webserver does not have access to dag folder?

@mik-laj
Copy link
Member

mik-laj commented Jan 29, 2020

I am also very interested in changes regarding DAG serialization. If you could share your work progress, I would be very grateful. WIP PR is better than no PR.

@kaxil
Copy link
Member Author

kaxil commented Jan 29, 2020

Hi @anitakar @mik-laj - I am on holidays this week but would be fulltime on it from next week. Will keep you guys posted.

@kaxil kaxil force-pushed the AIRFLOW-5944-rendering-templated-fields branch 4 times, most recently from 70be2e7 to 4639f07 Compare February 2, 2020 22:02
@kaxil
Copy link
Member Author

kaxil commented Feb 3, 2020

2 tests are failing:

FAILED tests/operators/test_dagrun_operator.py::TestDagRunOperator::test_trigger_dagrun_operator_conf
FAILED tests/operators/test_dagrun_operator.py::TestDagRunOperator::test_trigger_dagrun_with_execution_date
= 2 failed, 4823 passed, 180 skipped, 7 xfailed, 1 xpassed, 130 warnings in 1623.07s (0:27:03) =
self = <_mysql.connection closed at 559364d105d0>
query = b"INSERT INTO rendered_task_instance_fields (dag_id, task_id, execution_date, rendered_fields) VALUES ('triggerdag', 'test', '2020-02-03 10:48:04.120893', '{}')"
    def query(self, query):
        # Since _mysql releases GIL while querying, we need immutable buffer.
        if isinstance(query, bytearray):
            query = bytes(query)
        if self.waiter is not None:
            self.send_query(query)
            self.waiter(self.fileno())
            self.read_query_result()
        else:
>           _mysql.connection.query(self, query)
E           sqlalchemy.exc.IntegrityError: (_mysql_exceptions.IntegrityError) (1062, "Duplicate entry 'triggerdag-test-2020-02-03 10:48:04' for key 'PRIMARY'")
E           [SQL: INSERT INTO rendered_task_instance_fields (dag_id, task_id, execution_date, rendered_fields) VALUES (%s, %s, %s, %s)]
E           [parameters: ('triggerdag', 'test', datetime.datetime(2020, 2, 3, 10, 48, 4, 120893, tzinfo=<Timezone [UTC]>), '{}')]
E           (Background on this error at: http://sqlalche.me/e/gkpj)

Copy link
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a test (with a json blob in the .py file) showing how this is stored.

Unless I'm blind you aren't reading RenderedTaskInstanceFields yet?

@kaxil
Copy link
Member Author

kaxil commented Feb 3, 2020

Needs a test (with a json blob in the .py file) showing how this is stored.

Unless I'm blind you aren't reading RenderedTaskInstanceFields yet?

We are reading it here: https://github.com/apache/airflow/pull/6788/files#diff-948e87b4f8f644b3ad8c7950958df033R590-R602

kaxil added a commit that referenced this pull request Apr 2, 2020
…8051)

This is because "The composite IN construct is not supported by all backends"

Based on discussion in #6788 (comment)

(cherry picked from commit 99370fe)
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Feb 11, 2021
…(#8051)

This is because "The composite IN construct is not supported by all backends"

Based on discussion in apache/airflow#6788 (comment)

(cherry picked from commit 99370fe770ad32f01cb9bcb38c97024dfe31ff51)

GitOrigin-RevId: 5499302cc3c7fb49affe83e6ee08f8e40b11712c
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Feb 23, 2021
…(#8051)

This is because "The composite IN construct is not supported by all backends"

Based on discussion in apache/airflow#6788 (comment)

(cherry picked from commit 99370fe770ad32f01cb9bcb38c97024dfe31ff51)

GitOrigin-RevId: 5499302cc3c7fb49affe83e6ee08f8e40b11712c
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Feb 24, 2021
…(#8051)

This is because "The composite IN construct is not supported by all backends"

Based on discussion in apache/airflow#6788 (comment)

(cherry picked from commit 99370fe770ad32f01cb9bcb38c97024dfe31ff51)

GitOrigin-RevId: 5499302cc3c7fb49affe83e6ee08f8e40b11712c
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Jun 3, 2021
…(#8051)

This is because "The composite IN construct is not supported by all backends"

Based on discussion in apache/airflow#6788 (comment)

(cherry picked from commit 99370fe770ad32f01cb9bcb38c97024dfe31ff51)

GitOrigin-RevId: 5499302cc3c7fb49affe83e6ee08f8e40b11712c
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Jun 4, 2021
…(#8051)

This is because "The composite IN construct is not supported by all backends"

Based on discussion in apache/airflow#6788 (comment)

(cherry picked from commit 99370fe770ad32f01cb9bcb38c97024dfe31ff51)

GitOrigin-RevId: 5499302cc3c7fb49affe83e6ee08f8e40b11712c
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Jun 4, 2021
…(#8051)

This is because "The composite IN construct is not supported by all backends"

Based on discussion in apache/airflow#6788 (comment)

(cherry picked from commit 99370fe770ad32f01cb9bcb38c97024dfe31ff51)

GitOrigin-RevId: 5499302cc3c7fb49affe83e6ee08f8e40b11712c
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Jun 8, 2021
…(#8051)

This is because "The composite IN construct is not supported by all backends"

Based on discussion in apache/airflow#6788 (comment)

(cherry picked from commit 99370fe770ad32f01cb9bcb38c97024dfe31ff51)

GitOrigin-RevId: 5499302cc3c7fb49affe83e6ee08f8e40b11712c
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Sep 15, 2021
…(#8051)

This is because "The composite IN construct is not supported by all backends"

Based on discussion in apache/airflow#6788 (comment)

GitOrigin-RevId: 99370fe770ad32f01cb9bcb38c97024dfe31ff51
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Sep 17, 2021
…(#8051)

This is because "The composite IN construct is not supported by all backends"

Based on discussion in apache/airflow#6788 (comment)

GitOrigin-RevId: 99370fe770ad32f01cb9bcb38c97024dfe31ff51
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Sep 23, 2021
…(#8051)

This is because "The composite IN construct is not supported by all backends"

Based on discussion in apache/airflow#6788 (comment)

GitOrigin-RevId: 99370fe770ad32f01cb9bcb38c97024dfe31ff51
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Nov 25, 2021
…(#8051)

This is because "The composite IN construct is not supported by all backends"

Based on discussion in apache/airflow#6788 (comment)

GitOrigin-RevId: 99370fe770ad32f01cb9bcb38c97024dfe31ff51
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Mar 9, 2022
…(#8051)

This is because "The composite IN construct is not supported by all backends"

Based on discussion in apache/airflow#6788 (comment)

GitOrigin-RevId: 99370fe770ad32f01cb9bcb38c97024dfe31ff51
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Jun 3, 2022
…(#8051)

This is because "The composite IN construct is not supported by all backends"

Based on discussion in apache/airflow#6788 (comment)

GitOrigin-RevId: 99370fe770ad32f01cb9bcb38c97024dfe31ff51
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Jun 6, 2022
…(#8051)

This is because "The composite IN construct is not supported by all backends"

Based on discussion in apache/airflow#6788 (comment)

GitOrigin-RevId: 99370fe770ad32f01cb9bcb38c97024dfe31ff51
kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Jul 8, 2022
…(#8051)

This is because "The composite IN construct is not supported by all backends"

Based on discussion in apache/airflow#6788 (comment)

GitOrigin-RevId: 99370fe770ad32f01cb9bcb38c97024dfe31ff51
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Aug 26, 2022
…(#8051)

This is because "The composite IN construct is not supported by all backends"

Based on discussion in apache/airflow#6788 (comment)

GitOrigin-RevId: 99370fe770ad32f01cb9bcb38c97024dfe31ff51
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Oct 3, 2022
…(#8051)

This is because "The composite IN construct is not supported by all backends"

Based on discussion in apache/airflow#6788 (comment)

GitOrigin-RevId: 99370fe770ad32f01cb9bcb38c97024dfe31ff51
aglipska pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Oct 7, 2022
…(#8051)

This is because "The composite IN construct is not supported by all backends"

Based on discussion in apache/airflow#6788 (comment)

GitOrigin-RevId: 99370fe770ad32f01cb9bcb38c97024dfe31ff51
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Dec 7, 2022
…(#8051)

This is because "The composite IN construct is not supported by all backends"

Based on discussion in apache/airflow#6788 (comment)

GitOrigin-RevId: 99370fe770ad32f01cb9bcb38c97024dfe31ff51
leahecole pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Jan 27, 2023
…(#8051)

This is because "The composite IN construct is not supported by all backends"

Based on discussion in apache/airflow#6788 (comment)

GitOrigin-RevId: 99370fe770ad32f01cb9bcb38c97024dfe31ff51
kosteev pushed a commit to kosteev/composer-airflow-test-copybara that referenced this pull request Sep 11, 2024
…(#8051)

This is because "The composite IN construct is not supported by all backends"

Based on discussion in apache/airflow#6788 (comment)

GitOrigin-RevId: 99370fe770ad32f01cb9bcb38c97024dfe31ff51
kosteev pushed a commit to kosteev/composer-airflow-test-copybara that referenced this pull request Sep 12, 2024
…(#8051)

This is because "The composite IN construct is not supported by all backends"

Based on discussion in apache/airflow#6788 (comment)

GitOrigin-RevId: 99370fe770ad32f01cb9bcb38c97024dfe31ff51
kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Sep 16, 2024
…(#8051)

This is because "The composite IN construct is not supported by all backends"

Based on discussion in apache/airflow#6788 (comment)

GitOrigin-RevId: 99370fe770ad32f01cb9bcb38c97024dfe31ff51
kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Nov 6, 2024
…(#8051)

This is because "The composite IN construct is not supported by all backends"

Based on discussion in apache/airflow#6788 (comment)

GitOrigin-RevId: 99370fe770ad32f01cb9bcb38c97024dfe31ff51
kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Apr 30, 2025
…(#8051)

This is because "The composite IN construct is not supported by all backends"

Based on discussion in apache/airflow#6788 (comment)

GitOrigin-RevId: 99370fe770ad32f01cb9bcb38c97024dfe31ff51
kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request May 21, 2025
…(#8051)

This is because "The composite IN construct is not supported by all backends"

Based on discussion in apache/airflow#6788 (comment)

GitOrigin-RevId: 99370fe770ad32f01cb9bcb38c97024dfe31ff51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:Scheduler including HA (high availability) scheduler area:serialization area:webserver Webserver related Issues pinned Protect from Stalebot auto closing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants