Grid view optimization #51805

Merged: 13 commits, Jun 26, 2025

Contributor

@dstandish dstandish commented Jun 16, 2025

Work in progress branch for grid optimization

I break up the monolith that is grid_data.

The headline here is, with 6k tasks in a dag, loading time for 10 runs drops from 1.5m to < 10s in a quick local test.

I split it into smaller, more purpose-specific requests that each do less. So we have one request for just the structure, and another one for TI states (per dag run). I also found ways to stop refreshing when there's no active dag run (or when the particular dag run is not active and its TIs don't need refreshing). I also changed the "latest dag run" query (which checks for a new run triggered externally) to be a simpler dedicated endpoint. It runs every couple of seconds even when there is nothing going on, and it now takes 10ms instead of 300ms.
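
The refresh rule described above can be sketched in plain Python. This is only an illustration under stated assumptions: `RunSummary`, `ACTIVE_STATES`, `should_refresh_structure`, and `runs_needing_ti_refresh` are hypothetical names, and the set of states treated as active is assumed. None of it is taken from the Airflow API or UI code.

```python
from dataclasses import dataclass

# Assumption: which states count as "active" is chosen here for illustration only.
ACTIVE_STATES = frozenset({"queued", "running"})


@dataclass(frozen=True)
class RunSummary:
    """Hypothetical lightweight record, as a runs request might return it."""

    run_id: str
    state: str


def is_active(run: RunSummary) -> bool:
    """A run is active if its state is in the assumed ACTIVE_STATES set."""
    return run.state in ACTIVE_STATES


def should_refresh_structure(runs: list[RunSummary]) -> bool:
    """Keep re-requesting the structure only while at least one run is active."""
    return any(is_active(run) for run in runs)


def runs_needing_ti_refresh(runs: list[RunSummary]) -> list[str]:
    """Per-run TI state requests are repeated only for runs that are still active."""
    return [run.run_id for run in runs if is_active(run)]


runs = [
    RunSummary("run_1", "success"),
    RunSummary("run_2", "running"),
    RunSummary("run_3", "failed"),
]
print(should_refresh_structure(runs))  # True: run_2 is still running
print(runs_needing_ti_refresh(runs))   # ['run_2']
```

With no active run in the list, both functions report that nothing needs refreshing, which is the case where polling can stop.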

In order to have the grid/structure request stop refreshing when nothing is active, I had to add a new context provider so the state could be propagated from the grid/runs request (which knows if there are active runs). There may be a better way to do this, and the linter may not like it.
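
As a rough analogy for the propagation described above, here is a minimal Python sketch of a shared state holder. The actual change is a UI context provider, so this is not the PR's implementation: `ActiveRunContext` and `StructurePoller` are hypothetical names used only to show how one request's knowledge (whether any run is active) can switch another request's polling on and off.

```python
from typing import Callable


class ActiveRunContext:
    """Hypothetical shared holder, standing in for the UI context provider."""

    def __init__(self) -> None:
        self.has_active_run = False
        self._listeners: list[Callable[[bool], None]] = []

    def subscribe(self, listener: Callable[[bool], None]) -> None:
        """Register a listener and tell it the current value right away."""
        self._listeners.append(listener)
        listener(self.has_active_run)

    def publish(self, has_active_run: bool) -> None:
        """Called by whoever handles the runs response; notifies only on change."""
        if has_active_run == self.has_active_run:
            return
        self.has_active_run = has_active_run
        for listener in self._listeners:
            listener(has_active_run)


class StructurePoller:
    """Hypothetical consumer: polls the structure only while a run is active."""

    def __init__(self, context: ActiveRunContext) -> None:
        self.polling = False
        context.subscribe(self._on_change)

    def _on_change(self, has_active_run: bool) -> None:
        self.polling = has_active_run


context = ActiveRunContext()
poller = StructurePoller(context)
context.publish(True)   # the runs request saw an active run
print(poller.polling)   # True
context.publish(False)  # all runs finished
print(poller.polling)   # False
```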

Here's a dag you can test this with. Before, if you had 10 runs, it would take 1.5 minutes to load. When I tried with this change, it was 6 seconds. It's faster in non-dev mode than in dev mode, presumably because of parallelism.

from __future__ import annotations

from airflow.providers.standard.operators.empty import EmptyOperator
from airflow.sdk import DAG, TaskGroup, chain

with DAG("bighello_deeper_only_normal"):
    for i in range(10):
        with TaskGroup(f"group_{i}"):
            EmptyOperator(task_id="hello")
            with TaskGroup(f"group_{i}2"):
                chain([EmptyOperator(task_id=f"empty_{j}") for j in range(100)])
                chain([EmptyOperator(task_id=f"empty2_{j}") for j in range(100)])
                # EmptyOperator.partial(task_id=f"hello2").expand(doc=list(range(100)))
            with TaskGroup(f"group_{i}3"):
                chain([EmptyOperator(task_id=f"empty_{j}") for j in range(100)])
                chain([EmptyOperator(task_id=f"empty2_{j}") for j in range(100)])
                # EmptyOperator.partial(task_id=f"hello2").expand(doc=list(range(100)))
            with TaskGroup(f"group_{i}4"):
                chain([EmptyOperator(task_id=f"empty_{j}") for j in range(100)])
                chain([EmptyOperator(task_id=f"empty2_{j}") for j in range(100)])
                # EmptyOperator.partial(task_id=f"hello2").expand(doc=list(range(100)))
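
For reference, the task count of the test dag above can be checked with a little arithmetic. The sketch below does not import Airflow; it only mirrors the loop bounds of the dag definition, and the constant names are made up for illustration.

```python
# Mirrors the loop bounds of the test dag above; names are illustrative only.
GROUPS = 10                     # for i in range(10)
STANDALONE_TASKS_PER_GROUP = 1  # the "hello" task in each top-level group
SUBGROUPS_PER_GROUP = 3         # group_{i}2, group_{i}3, group_{i}4
CHAINS_PER_SUBGROUP = 2         # the empty_* and empty2_* lists
TASKS_PER_CHAIN = 100           # for j in range(100)

tasks_per_group = (
    STANDALONE_TASKS_PER_GROUP
    + SUBGROUPS_PER_GROUP * CHAINS_PER_SUBGROUP * TASKS_PER_CHAIN
)
total_tasks = GROUPS * tasks_per_group

print(tasks_per_group)  # 601
print(total_tasks)      # 6010
```

That gives 6010 tasks, in line with the "6k tasks" figure in the description.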

Before (1.6m): [screenshot: before-grid-optimization]

After (6s): [screenshot: after-grid-optimization]

In action: [video: refresh.of.big.dag.mov]

@boring-cyborg boring-cyborg bot added area:API Airflow's REST/HTTP API area:UI Related to UI/UX. For Frontend Developers. labels Jun 16, 2025
Member

@pierrejeambrun pierrejeambrun left a comment

I love the general approach of splitting things into smaller requests so as not to block UI rendering. Results are impressive!

It needs some polishing but I think we are going in the right direction.

That's a big effort, thanks Daniel for taking this one.

@dstandish dstandish force-pushed the grid-view-optimization branch from 3cf7428 to a8106bd Compare June 17, 2025 16:39
@dstandish dstandish force-pushed the grid-view-optimization branch 2 times, most recently from 375116d to 337c898 Compare June 24, 2025 17:02
dstandish and others added 7 commits June 24, 2025 15:36
The headline here is, with 3k tasks in a dag, loading time for 10 runs drops from 1.5m to < 10s in a quick local test.

I split it into smaller, more purpose-specific requests that each do less. So we have one request for just the structure, and another one for TI states (per dag run). I also found ways to stop refreshing when there's no active dag run (or when the particular dag run is not active and its TIs don't need refreshing). I also changed the "latest dag run" query (which checks for a new run triggered externally) to be a simpler dedicated endpoint. It runs every couple of seconds even when there is nothing going on, and it now takes 10ms instead of 300ms.

In order to have the grid/structure request stop refreshing when nothing is active, I had to add a new context provider so the state could be propagated from the grid/runs request (which knows if there are active runs). There may be a better way to do this, and the linter may not like it.

Co-authored-by: Jed Cunningham <[email protected]>
@dstandish dstandish force-pushed the grid-view-optimization branch from 83ff71a to 2fddc03 Compare June 24, 2025 22:37
@dstandish dstandish requested a review from bbovenzi June 25, 2025 20:29
@dstandish dstandish requested a review from pierrejeambrun June 25, 2025 21:15
@dstandish dstandish requested a review from jedcunningham June 25, 2025 21:15
@kaxil
Member

kaxil commented Jun 26, 2025

#protm

@dstandish dstandish merged commit eaa8ca0 into apache:main Jun 26, 2025
99 checks passed
@dstandish dstandish deleted the grid-view-optimization branch June 26, 2025 15:53
@pierrejeambrun
Member

Nice one!

Labels
area:API Airflow's REST/HTTP API area:UI Related to UI/UX. For Frontend Developers.

6 participants