-
Notifications
You must be signed in to change notification settings - Fork 15.3k
feat(api/ui)#50779 : add asset group filtering and asset group views #51682
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
API side looks good! Could you please fix static checks? It is failing for Compile / format / lint UI
Runned precommit locally and got zero errors how can I replicate this setup and make sure it passes consistently? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this feature is great. Feel free to ask anything.
<HStack gap={4} mb={2}> | ||
<Box> | ||
<Text color="chakra-fg" fontSize="xs"> | ||
{/* i18n: Producing Tasks label */}Producing Tasks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We currently disable literal string in ui due to i18n thus you need to update the translation file like en/assets.json
or en/common.json
and use useTranslation
hook to access them. You could also run npm run lint
in ui folder to see your lint error.
Yes, I have a question. I created a new file called sidebar.tsx, but I'm not able to catch the errors in it , even though the runner here on Git is marking them. Do you know what might be going on? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice effort!
I believe there are a few things to update before we can merge but this can be a great feature and addition for Airflow 3.1.0
all_nodes = [] | ||
all_edges = [] | ||
seen_nodes = set() | ||
seen_edges = set() | ||
for nid in ids_to_fetch: | ||
comp = extract_single_connected_component(nid, data["nodes"], data["edges"]) | ||
for n in comp["nodes"]: | ||
if n["id"] not in seen_nodes: | ||
all_nodes.append(n) | ||
seen_nodes.add(n["id"]) | ||
for e in comp["edges"]: | ||
edge_tuple = (e["source_id"], e["target_id"]) | ||
if edge_tuple not in seen_edges: | ||
all_edges.append(e) | ||
seen_edges.add(edge_tuple) | ||
data = {"nodes": all_nodes, "edges": all_edges} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should do this manually.
extract_single_connected_component
is actually calling extract_connected_components
which holds all the connected components of the graph. We could directly use all of those and merge them to avoid computing multiple times extract_connected_components
(for each node id).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since extract_connected_components returns all components, should I use it even for single node IDs (by finding the one containing the ID), or should I keep using extract_single_connected_component in that case for simplicity?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you can rework it a little bit to have:
- extract_connected_components (return all)
- rework and rename extract_single_connected component to instead take a list of node, and extrat the relevant coponent IDs. So you can always use this function everywhere, whether you want a single connected component or multiple of them filtered out.
# If node_id contains commas, treat as multiple IDs (extra protection) | ||
if node_ids: | ||
ids_to_fetch = [nid.strip() for nid in node_ids.split(",") if nid.strip()] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, FastAPI handles node_ids: list[str] | None = None
natively. FastAPI chose exploded form query params, i.e ?node_ids=1&node_ids=2
you will directly receive `node_ids=["1","2"] in your function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I chose to send the parameters as a string specifically to avoid the exploded format in the URL. node_ids=1&node_ids=2, I send something like ?node_ids=1,2 (or another delimited format).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please stay consistent with the rest of the API, we don't want some endpoints doing that and some other doing something else, also this will remove the need for you to manually parse that string.
const ChevronDownIcon = () => ( | ||
<svg fill="currentColor" height="1em" viewBox="0 0 20 20" width="1em"> | ||
<path | ||
clipRule="evenodd" | ||
d="M5.23 7.21a.75.75 0 011.06.02L10 11.085l3.71-3.855a.75.75 0 111.08 1.04l-4.24 4.4a.75.75 0 01-1.08 0l-4.24-4.4a.75.75 0 01.02-1.06z" | ||
fillRule="evenodd" | ||
/> | ||
</svg> | ||
); | ||
|
||
const ChevronRightIcon = () => ( | ||
<svg fill="currentColor" height="1em" viewBox="0 0 20 20" width="1em"> | ||
<path | ||
clipRule="evenodd" | ||
d="M7.21 5.23a.75.75 0 011.06-.02l4.4 4.24a.75.75 0 010 1.08l-4.4 4.24a.75.75 0 11-1.04-1.08L11.085 10 7.23 6.29a.75.75 0 01-.02-1.06z" | ||
fillRule="evenodd" | ||
/> | ||
</svg> | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already do this at different places in the code. (We use chevron icons from react-icon lib instead), you can do something similar.
const { data, error, isLoading } = useAssetServiceGetAssets({ | ||
limit: 1000, | ||
offset: 0, | ||
orderBy, | ||
}); | ||
|
||
const [search, setSearch] = useState(""); | ||
const [sortAsc, setSortAsc] = useState(true); | ||
|
||
// Group assets by group | ||
const grouped = useMemo(() => { | ||
const filtered = (data?.assets ?? []).filter((asset) => | ||
asset.group.toLowerCase().includes(search.toLowerCase()), | ||
); | ||
const groupedObj = filtered.reduce<Record<string, Array<AssetResponse>>>((acc, asset) => { | ||
const groupName = asset.group; | ||
|
||
acc[groupName] ??= []; | ||
acc[groupName].push(asset); | ||
|
||
return acc; | ||
}, {}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this logic can holds a few problems, first of the endpoint useAssetServiceGetAssets
is paginated. It means that you'll only retrieve the first 50 items, and you need a mechanism to display the remaining items (cf tables pagination component).
In addition, there is no guarantee at this point that all assets for a specified group will be on the same page, you could end up with a group missing some assets because those were not returned yet.
I think the easiest approach would be to add a backend ui endpoints to list/retrieve asset groups directly. (I group would list all of it's child asset and we would paginate on groups directly). Maybe there are other approaches possible but I would need to think about it.
const { | ||
data: dataByName, | ||
error: errorByName, | ||
isLoading: isLoadingByName, | ||
} = useAssetServiceGetAssets({ | ||
limit: pagination.pageSize, | ||
namePattern, | ||
namePattern: searchValue ?? undefined, | ||
offset: pagination.pageIndex * pagination.pageSize, | ||
orderBy, | ||
}); | ||
|
||
const { | ||
data: dataByGroup, | ||
error: errorByGroup, | ||
isLoading: isLoadingByGroup, | ||
} = useAssetServiceGetAssets({ | ||
groupPattern: searchValue ?? undefined, | ||
limit: pagination.pageSize, | ||
offset: pagination.pageIndex * pagination.pageSize, | ||
orderBy, | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't do that manually in the UI. (call the endpoints with two different parameters and then merge results in the UI).
The backend should support this. (You can write a custom filter that will do the search accross both 'asset.nameand
asset.group, or expand the
search_param_factoryand
_SearchParamto be able to take a
list[ColumnElement]` as attribute. And perform a search across multiple columns.
query.data?.nodes.forEach((node) => { | ||
const key = UseDependenciesServiceGetDependenciesKeyFn({ nodeId: node.id }); | ||
const queryData = queryClient.getQueryData(key); | ||
|
||
if (!Boolean(queryData)) { | ||
queryClient.setQueryData(key, query.data); | ||
} | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice idea but I think in its current form it could induce some bugs. Setting the cache for a particular node_id, with the result of useDependenciesServiceGetDependencies
with many node ids. Because the response returned for many node ids will hold more connected components than just the 1 single connected component that we should retrieve from the nodeId: node.id
filter.
This could therefore display unrelated dependencies when displaying dependencies of a particular node.
Be sure to be rebased on latest main with latest devtools updates (We recently merged a few improvement to this part). You can then run pre-commit hooks or eslint:
|
I'm getting a mypy error when running the pre-commit hook for airflow-core. Here's the stack trace: AssertionError: Should never get here in normal mode, got TypeAlias:numpy.bool_ instead of TypeInfo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No idea what is wrong in your setup. You can run pre-commit on main to be sure that everything is in order. It is really hard for me to help you without more context and iterating. Everything should be detailed in the contribution docs where you can find all the steps and troubleshooting help.
all_nodes = [] | ||
all_edges = [] | ||
seen_nodes = set() | ||
seen_edges = set() | ||
for nid in ids_to_fetch: | ||
comp = extract_single_connected_component(nid, data["nodes"], data["edges"]) | ||
for n in comp["nodes"]: | ||
if n["id"] not in seen_nodes: | ||
all_nodes.append(n) | ||
seen_nodes.add(n["id"]) | ||
for e in comp["edges"]: | ||
edge_tuple = (e["source_id"], e["target_id"]) | ||
if edge_tuple not in seen_edges: | ||
all_edges.append(e) | ||
seen_edges.add(edge_tuple) | ||
data = {"nodes": all_nodes, "edges": all_edges} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you can rework it a little bit to have:
- extract_connected_components (return all)
- rework and rename extract_single_connected component to instead take a list of node, and extrat the relevant coponent IDs. So you can always use this function everywhere, whether you want a single connected component or multiple of them filtered out.
47ce735
to
f71a6f5
Compare
f71a6f5
to
7bd5da7
Compare
Hi, the pipeline is currently failing on Tests AMD / CI image checks / Static checks (pull_request), specifically at the step "Generate SVG from Airflow CTL Commands". However, when I run the checks locally: pre-commit run (on modified files only) skips this check, as expected, since I haven't modified any related files; pre-commit run --all-files passes the check successfully, including the "Generate SVG from Airflow CTL Commands" step. Please let me know if there's anything else I should look into or adjust. Thanks! |
That was temporary. As usual - rebase. Always Be Rebased. There is another issue in main being fixed as we speak though so you might wait a few minutes. |
This is a PR that targets fixing the current main "generate constraints" failure #52464 - once merged, rebasing your PR should fix unrelated issue. |
…views Add backend and OpenAPI support for filtering assets by group (`group_pattern`) in /assets endpoint. Implement multi-asset dependency queries in /ui/dependencies endpoint with node_ids parameter. Add new frontend pages and components for asset group views, including group graph and sidebar. Refactor asset list and group list to support group filtering, merging, and deduplication. Add and update tests for group filtering, group views, and multi-asset dependency queries. Co-authored-by: Francisco Núncio <[email protected]>
- Added `/assets/groups` endpoint to return asset groups with their assets and counts. - Added `AssetGroupResponse` and `AssetGroupCollectionResponse` models to API and OpenAPI schemas. - Updated backend, OpenAPI YAML, and TypeScript types to support asset group listing and filtering. - Updated frontend to use new `useAssetServiceGetAssetGroups` hook for grouped asset views and sidebar. - Refactored asset list and group pages to use new group API and improved i18n. - Updated dependencies API and frontend to use exploded `node_ids` query parameter (array, not comma-separated). - Updated backend logic for `/dependencies` to return 404 if any requested node_id is missing. - Updated and fixed unit tests for dependencies and asset groups to match new API and error handling. - Improved search and filtering for assets and asset groups in UI.
Rebased it for you after merging my PR. |
Add backend and OpenAPI support for filtering assets by group (
group_pattern
) in /assets endpoint.Implement multi-asset dependency queries in /ui/dependencies endpoint with node_ids parameter.
Add new frontend pages and components for asset group views, including group graph and sidebar.
Refactor asset list and group list to support group filtering, merging, and deduplication.
Add and update tests for group filtering, group views, and multi-asset dependency queries.
introduced small bug in caching on the asset_grath.
https://youtu.be/ybpRUjEhdoQ
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named
{pr_number}.significant.rst
or{issue_number}.significant.rst
, in airflow-core/newsfragments.