Skip to content

Fix Certain DAG import errors ("role does not exist") don't persist in Airflow #51511

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

sjyangkevin
Copy link
Contributor

Motivation

As described in #49651, when the access control for a DAG is set to an non-exist role, the DAG import error show up in Airflow UI for a while and then disappear. The update is to fix this issue, and let the import error persist in the metadata DB until the DAG is updated with a correct access control setting.

Close #49651

What is the issue

def _serialize_dag_capturing_errors(

When the DAG's access control is set to a non-exist role, the following process will raise an Exception "Failed to write serialized DAG dag_id=...". So, how this exception is triggered?

  1. dag_was_updated will be True when the first time SerializedDagModel.write_dag write the serialized DAG to the database.
  2. when dag_was_updated is True, _sync_dag_perms will be triggered to sync DAG specific permissions. At the moment, it detects that the role doesn't exist, and raise an error, resulting in the exception.
  3. This exception will be captured, and being logged as an import error temporary in the DB, and show up in the UI.

From my understanding, this sync process will run for every MIN_SERIALIZED_DAG_UPDATE_INTERVAL. So, what happen in the second run.

  1. dag_was_updated will be False since the DAG code is not updated.
  2. In this case, _sync_dag_perms will NOT BE TRIGGERED even though in the access control is set incorrectly in the DAG code.
  3. Therefore, no exception will be raised, and no import error will be logged. Therefore, the import error is removed from the DB, as well as from the UI.

What is the fix

In the current state, _sync_dag_perms runs only when the DAG is updated (i.e., dag_was_updated is True). This can be more performant because it doesn't run for all the DAGs. However, it cannot properly handle the sync for permissions. Therefore, the current fix is to make _sync_dag_perms run for all the DAGs during the DAG sync process. I understand it might not be an ideal fix, but I wasn't able to find a better solution due to my limited understanding on the code. I would really appreciate if anyone could suggest some ideas to further improve it.


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in airflow-core/newsfragments.

@eladkal eladkal added this to the Airflow 3.0.3 milestone Jun 8, 2025
@eladkal eladkal added type:bug-fix Changelog: Bug Fixes backport-to-v3-0-test Mark PR with this label to backport to v3-0-test branch labels Jun 8, 2025
@sjyangkevin sjyangkevin changed the title Solve Certain DAG import errors ("role does not exist") don't persist in Airflow Fix Certain DAG import errors ("role does not exist") don't persist in Airflow Jun 9, 2025
@uranusjr uranusjr self-requested a review June 11, 2025 06:35
Copy link
Member

@jason810496 jason810496 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix! Could you add the corresponding unit test?

@sjyangkevin
Copy link
Contributor Author

Thanks for the fix! Could you add the corresponding unit test?

Thanks for the feedback. I am wondering if it’s a proper fix but will add unit tests for it.

@jason810496
Copy link
Member

Thanks for the fix! Could you add the corresponding unit test?

Thanks for the feedback. I am wondering if it’s a proper fix but will add unit tests for it.

The investigation seems reasonable to me, but still need second eye on it to confirm.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:DAG-processing backport-to-v3-0-test Mark PR with this label to backport to v3-0-test branch type:bug-fix Changelog: Bug Fixes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Certain DAG import errors ("role does not exist") don't persist in Airflow
3 participants