-
Notifications
You must be signed in to change notification settings - Fork 15.3k
Cleanup parsing_pre_import_modules setting #49839
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I wonder if we should respect it instead of? |
I have not looked at how we we are parsing the dags now but if we are still forking Dag File processor processes, it might make sense to pre-import the modules before we actually fork. Though probably what would make sense we could import ALL "airflow" modules this way, not only those that are present in DAGs when we start DAG file processor - this way we could just delay the dag file processor startup a "little". but all airflow modules would be imported before we fork. That could be IMHO a bit better optimisation. As I've learned forking does not give that much of memory sharing in Python as I thought (due to reference counting) , the elapsed time neeed to run the import for each DAG will be still smaller. |
Ah, I didn’t see Ash’s and Jarek’s valid points/comments. It is worth adding support for them or just creating a GH issue to take care of it. and then improve further (with Jarek's points) it in a separate PR |
The main benefit to forking here is (or at least was, when I mage this change around v1.10.7 period) due to two things - 1. not having to spawn an entirely fresh python process, and 2. Not having to import |
Closing. will track progress on #50348 |
Airflow 2.6.0 introduced the `AIRFLOW__SCHEDULER__PARSING_PRE_IMPORT_MODULES` setting (apache#30495) to pre-import commonly used modules in the DAG File Processor parent process before forking, which mitigated this performance issue. This optimization logic and the corresponding setting were not carried over to Airflow 3 and the setting is currently ignored (as noted in apache#49839). While apache#49839 proposed removing the setting as unused, this commit re-implements the underlying pre-import optimization functionality to restore this performance benefit in Airflow 3. This helps to reduce the cumulative time spent on imports during serial DAG file parsing. Refs apache#50348 Addresses discussion in apache#49839 Refs apache#30495
Airflow 2.6.0 introduced the `AIRFLOW__SCHEDULER__PARSING_PRE_IMPORT_MODULES` setting (apache#30495) to pre-import commonly used modules in the DAG File Processor parent process before forking, which mitigated this performance issue. This optimization logic and the corresponding setting were not carried over to Airflow 3 and the setting is currently ignored (as noted in apache#49839). While apache#49839 proposed removing the setting as unused, this commit re-implements the underlying pre-import optimization functionality to restore this performance benefit in Airflow 3. This helps to reduce the cumulative time spent on imports during serial DAG file parsing. Refs apache#50348 Addresses discussion in apache#49839 Refs apache#30495
Airflow 2.6.0 introduced the `AIRFLOW__SCHEDULER__PARSING_PRE_IMPORT_MODULES` setting (apache#30495) to pre-import commonly used modules in the DAG File Processor parent process before forking, which mitigated this performance issue. This optimization logic and the corresponding setting were not carried over to Airflow 3 and the setting is currently ignored (as noted in apache#49839). While apache#49839 proposed removing the setting as unused, this commit re-implements the underlying pre-import optimization functionality to restore this performance benefit in Airflow 3. This helps to reduce the cumulative time spent on imports during serial DAG file parsing. Refs apache#50348 Addresses discussion in apache#49839 Refs apache#30495
Airflow 2.6.0 introduced the `AIRFLOW__SCHEDULER__PARSING_PRE_IMPORT_MODULES` setting (apache#30495) to pre-import commonly used modules in the DAG File Processor parent process before forking, which mitigated this performance issue. This optimization logic and the corresponding setting were not carried over to Airflow 3 and the setting is currently ignored (as noted in apache#49839). While apache#49839 proposed removing the setting as unused, this commit re-implements the underlying pre-import optimization functionality to restore this performance benefit in Airflow 3. This helps to reduce the cumulative time spent on imports during serial DAG file parsing. Refs apache#50348 Addresses discussion in apache#49839 Refs apache#30495
Airflow 2.6.0 introduced the `AIRFLOW__SCHEDULER__PARSING_PRE_IMPORT_MODULES` setting (apache#30495) to pre-import commonly used modules in the DAG File Processor parent process before forking, which mitigated this performance issue. This optimization logic and the corresponding setting were not carried over to Airflow 3 and the setting is currently ignored (as noted in apache#49839). While apache#49839 proposed removing the setting as unused, this commit re-implements the underlying pre-import optimization functionality to restore this performance benefit in Airflow 3. This helps to reduce the cumulative time spent on imports during serial DAG file parsing. Refs apache#50348 Addresses discussion in apache#49839 Refs apache#30495
Remove usage of
AIRFLOW__SCHEDULER__PARSING_PRE_IMPORT_MODULES
from code base.This optimization added in 2.6.0 #30495
it was not ported with Airflow 3 and isn't used anymore.
Note: this is not a breaking change since Airflow 3 is ignoring this setting anyway thus this is only a cleanup.