Skip to content

Cleanup parsing_pre_import_modules setting #49839

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from
Closed

Conversation

eladkal
Copy link
Contributor

@eladkal eladkal commented Apr 27, 2025

Remove usage of AIRFLOW__SCHEDULER__PARSING_PRE_IMPORT_MODULES from code base.
This optimization added in 2.6.0 #30495
it was not ported with Airflow 3 and isn't used anymore.

Note: this is not a breaking change since Airflow 3 is ignoring this setting anyway thus this is only a cleanup.

@eladkal eladkal added this to the Airflow 3.0.1 milestone Apr 27, 2025
@eladkal eladkal added the type:misc/internal Changelog: Misc changes that should appear in change log label Apr 27, 2025
@ashb
Copy link
Member

ashb commented Apr 27, 2025

I wonder if we should respect it instead of?

@potiuk
Copy link
Member

potiuk commented Apr 27, 2025

I wonder if we should respect it instead of?

I have not looked at how we we are parsing the dags now but if we are still forking Dag File processor processes, it might make sense to pre-import the modules before we actually fork.

Though probably what would make sense we could import ALL "airflow" modules this way, not only those that are present in DAGs when we start DAG file processor - this way we could just delay the dag file processor startup a "little". but all airflow modules would be imported before we fork. That could be IMHO a bit better optimisation.

As I've learned forking does not give that much of memory sharing in Python as I thought (due to reference counting) , the elapsed time neeed to run the import for each DAG will be still smaller.

@kaxil
Copy link
Member

kaxil commented May 2, 2025

Ah, I didn’t see Ash’s and Jarek’s valid points/comments. It is worth adding support for them or just creating a GH issue to take care of it. and then improve further (with Jarek's points) it in a separate PR

@ashb
Copy link
Member

ashb commented May 6, 2025

As I've learned forking does not give that much of memory sharing in Python as I thought (due to reference counting) , the elapsed time neeed to run the import for each DAG will be still smaller.

The main benefit to forking here is (or at least was, when I mage this change around v1.10.7 period) due to two things - 1. not having to spawn an entirely fresh python process, and 2. Not having to import airflow, all of its deps, parse all it's config and connect to DB etc. Some of that is lessened these days, but imports are still surprisingly slow

@eladkal
Copy link
Contributor Author

eladkal commented May 8, 2025

Closing. will track progress on #50348

@eladkal eladkal closed this May 8, 2025
@eladkal eladkal deleted the import branch May 8, 2025 12:09
@eladkal eladkal removed this from the Airflow 3.0.2 milestone May 8, 2025
Lzzz666 added a commit to Lzzz666/airflow that referenced this pull request May 8, 2025
Airflow 2.6.0 introduced the `AIRFLOW__SCHEDULER__PARSING_PRE_IMPORT_MODULES` setting (apache#30495) to pre-import commonly used modules in the DAG File Processor parent process before forking, which mitigated this performance issue.

This optimization logic and the corresponding setting were not carried over to Airflow 3 and the setting is currently ignored (as noted in apache#49839). While apache#49839 proposed removing the setting as unused, this commit re-implements the underlying pre-import optimization functionality to restore this performance benefit in Airflow 3.

This helps to reduce the cumulative time spent on imports during serial DAG file parsing.

Refs apache#50348
Addresses discussion in apache#49839
Refs apache#30495
Lzzz666 added a commit to Lzzz666/airflow that referenced this pull request Jun 24, 2025
Airflow 2.6.0 introduced the `AIRFLOW__SCHEDULER__PARSING_PRE_IMPORT_MODULES` setting (apache#30495) to pre-import commonly used modules in the DAG File Processor parent process before forking, which mitigated this performance issue.

This optimization logic and the corresponding setting were not carried over to Airflow 3 and the setting is currently ignored (as noted in apache#49839). While apache#49839 proposed removing the setting as unused, this commit re-implements the underlying pre-import optimization functionality to restore this performance benefit in Airflow 3.

This helps to reduce the cumulative time spent on imports during serial DAG file parsing.

Refs apache#50348
Addresses discussion in apache#49839
Refs apache#30495
Lzzz666 added a commit to Lzzz666/airflow that referenced this pull request Jun 24, 2025
Airflow 2.6.0 introduced the `AIRFLOW__SCHEDULER__PARSING_PRE_IMPORT_MODULES` setting (apache#30495) to pre-import commonly used modules in the DAG File Processor parent process before forking, which mitigated this performance issue.

This optimization logic and the corresponding setting were not carried over to Airflow 3 and the setting is currently ignored (as noted in apache#49839). While apache#49839 proposed removing the setting as unused, this commit re-implements the underlying pre-import optimization functionality to restore this performance benefit in Airflow 3.

This helps to reduce the cumulative time spent on imports during serial DAG file parsing.

Refs apache#50348
Addresses discussion in apache#49839
Refs apache#30495
Lzzz666 added a commit to Lzzz666/airflow that referenced this pull request Jun 24, 2025
Airflow 2.6.0 introduced the `AIRFLOW__SCHEDULER__PARSING_PRE_IMPORT_MODULES` setting (apache#30495) to pre-import commonly used modules in the DAG File Processor parent process before forking, which mitigated this performance issue.

This optimization logic and the corresponding setting were not carried over to Airflow 3 and the setting is currently ignored (as noted in apache#49839). While apache#49839 proposed removing the setting as unused, this commit re-implements the underlying pre-import optimization functionality to restore this performance benefit in Airflow 3.

This helps to reduce the cumulative time spent on imports during serial DAG file parsing.

Refs apache#50348
Addresses discussion in apache#49839
Refs apache#30495
Lzzz666 added a commit to Lzzz666/airflow that referenced this pull request Jun 24, 2025
Airflow 2.6.0 introduced the `AIRFLOW__SCHEDULER__PARSING_PRE_IMPORT_MODULES` setting (apache#30495) to pre-import commonly used modules in the DAG File Processor parent process before forking, which mitigated this performance issue.

This optimization logic and the corresponding setting were not carried over to Airflow 3 and the setting is currently ignored (as noted in apache#49839). While apache#49839 proposed removing the setting as unused, this commit re-implements the underlying pre-import optimization functionality to restore this performance benefit in Airflow 3.

This helps to reduce the cumulative time spent on imports during serial DAG file parsing.

Refs apache#50348
Addresses discussion in apache#49839
Refs apache#30495
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:ConfigTemplates type:misc/internal Changelog: Misc changes that should appear in change log
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants