Add tracking of triggering user to Dag runs #51738
Conversation
Note: Marking as DRAFT, as I assume some pytests fail and need to be adjusted.
Force-pushed 21f2921 to 53e8254
nice! Highly requested feature.
airflow-core/src/airflow/api_fastapi/core_api/datamodels/dag_run.py (resolved review thread)
To be perfectly honest, our audit log concept should be completely reworked if we want to continue naming it "audit log". The way it is currently designed, it has nothing to do with "audit". There is no way to have an audit log that can be deleted without leaving a trace. As it is implemented now, the only real audit you can make out of it is if you subscribe to the events of the table and move the entries to an "append-only" storage. Otherwise it's just a "log" - but it's never "audit". I think it might be very confusing to users, personally, because "audit log" really means something that you are not able to modify once written. Of course, we have no such storage readily available, and we do not want to complicate "open-source" Airflow with such an option. So... if we really want to "fix" the audit log, add the run_id and keep the user id in it, we should:
If we do it this way and make "log" a regular part of our data model (but not pretend it's an audit, and tell our users that they should rather use the events exposed via the API to keep the audit) - then I am fine with having to look up the user in the log. But mixing a "data model" table and "audit log" functionality is just plain wrong, almost by definition.
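The "subscribe to the events of the table and move the entries to an append-only storage" idea mentioned above can be sketched with database triggers. This is only an illustration with hypothetical table and column names, using SQLite for self-containment (a real deployment would target the metadata database or an external store):

```python
import sqlite3

# Hypothetical sketch: mirror rows from a working "log" table into an
# append-only "audit_log" copy, so pruning the working table leaves a trace.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE log (id INTEGER PRIMARY KEY, event TEXT, owner TEXT);
CREATE TABLE audit_log (id INTEGER, event TEXT, owner TEXT);

-- copy every insert into the append-only table
CREATE TRIGGER log_audit AFTER INSERT ON log
BEGIN
    INSERT INTO audit_log VALUES (NEW.id, NEW.event, NEW.owner);
END;

-- reject any modification of the audit copy
CREATE TRIGGER audit_no_update BEFORE UPDATE ON audit_log
BEGIN SELECT RAISE(ABORT, 'audit_log is append-only'); END;
CREATE TRIGGER audit_no_delete BEFORE DELETE ON audit_log
BEGIN SELECT RAISE(ABORT, 'audit_log is append-only'); END;
""")

conn.execute("INSERT INTO log VALUES (1, 'trigger_dag_run', 'alice')")
conn.execute("DELETE FROM log")  # the working table can be freely pruned
conn.commit()

rows = conn.execute("SELECT owner FROM audit_log").fetchall()
print(rows)  # the audit copy survives the delete
```

The point of the sketch is only the separation of concerns: the application-facing table stays mutable, while the audit copy rejects updates and deletes at the storage layer.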
Basically, my "security" hat is raised by the remainder of my hair every time I see "audit log" applied to the thing we have implemented here.
Definition by Gemini ("tamper-evident" and "verifiable" are crucial parts of the definition): An audit log is a chronological record of system activities, capturing events, actions, and changes within a system or application. It serves as a detailed, tamper-evident history used for accountability, security, compliance, and troubleshooting. Purpose: Audit logs are crucial for maintaining a clear and verifiable record of system behavior. This helps organizations track who did what, when, and where within their systems, which is vital for security, compliance, and troubleshooting.
Reworking the current audit log table to make it useful could be nice indeed. (Maybe rename it too while we are at it.) Real auditing log records are, I think, a different subject all by themselves, so I won't comment on that, but Jarek's idea makes sense if we don't want the burden on Airflow 3's shoulders.
I've always been a fan of changing from Audit Log. But then calling it Events or Log was confusing to users. Happy to think about a better one.
Maybe
@bbovenzi We have cases where we may have 200+ manual triggers for a dag in a day, and the only other alternative is to create 1 dag per user (dynamically created per user), which is unnecessary stress on the dag-processor.
Force-pushed 4fe66fc to bd8ebb8
A lot of debate about the triggering user... actually just because of adding one field to the schema, which consumes a few bytes; there are many more fields on task instance which would also be candidates for DB optimization. We never had such a long discussion in other PRs adding e.g. "triggered_by", which falls into the same category. I understand the idea of putting everything into a "nice" action log, but this would be a major rework. That will take much more effort than I have capacity for, at least currently. Also, this would add DB overhead, as filtering by user would always need a join... Looking at the problem from a pragmatic view, we can merge it now and plan a rework from audit log to a structured action log in the future. Or, if a majority dislikes it, we can block this demanded feature. (Given missing DB access, and having the same workarounds as @dheerajturaga, this would actually be a loss of function for us if we wanted to move to Airflow 3 from 2.10.) Needless to say, I belong to the pragmatic party.
100% agree. This is mostly bike-shedding on a feature that adds almost zero overhead; it could just be merged and would save a lot of hassle for users.
Force-pushed 3ebb1ad to 2390c87
Just for reference, in Airflow 3, retrieving the triggering user from event logs requires the following approach. I've encapsulated the access_token handling within the

That said, there may be scenarios where the logical date does not align as expected, which introduces additional complexity, particularly in our unit tests where we need to mock API responses. Given that over 50% of our DAGs rely on this functionality, its absence could significantly delay our adoption of Airflow 3. I hope this concern is understandable and that accommodating this request is feasible. I truly appreciate your consideration.

```python
import json
import logging

import airflow_client.client
from airflow.exceptions import AirflowException

from custom.api.utils import get_airflow_client_configuration

logger = logging.getLogger(__name__)


def _find_owner_v3(dag_run=None) -> str | None:
    """
    Airflow 3 only: use the Airflow Client API to fetch the event logs
    and find the user who triggered this dag run.
    """
    # Only run for manual runs
    if dag_run.run_type.name.upper() != 'MANUAL':
        logger.error("Not manually triggered. run_type: %s", dag_run.run_type)
        return None
    # Can't correlate if the logical date is missing for the dag run
    if not dag_run.logical_date:
        logger.error("No logical date available for this run, can't find owner")
        return None
    logical_date = dag_run.logical_date.strftime('%Y-%m-%dT%H:%M:%S.%f')[:-3] + 'Z'

    configuration = get_airflow_client_configuration()
    # Enter a context with an instance of the API client
    with airflow_client.client.ApiClient(configuration) as api_client:
        # Create an instance of the API class
        api_instance = airflow_client.client.EventLogApi(api_client)
        event_name = 'trigger_dag_run'
        try:
            logger.info("DAG ID: %s, RUN ID: %s, Logical Date: %s",
                        dag_run.dag_id, dag_run.run_id, logical_date)
            # Get event logs for this dag and event type
            api_response = api_instance.get_event_logs(dag_id=dag_run.dag_id, event=event_name)
            if not api_response:
                logger.error("No trigger events found!")
                return None
            # Walk newest events first and match on logical_date
            for event in reversed(api_response.event_logs):
                logger.info(event)
                if not event.extra:
                    continue
                event_info = json.loads(event.extra)
                if event_info.get("logical_date") == logical_date:
                    logger.info("Matching event: %s", event)
                    logger.info("Dag triggered by: %s", event.owner)
                    return event.owner
        except Exception as e:
            raise AirflowException(f"Exception when calling EventLogApi->get_event_logs: {e}")
    return None
```
Just a few nits, questions, but looks good to me.
Force-pushed 2390c87 to 1ad52d8
That's because it's not about adding a column, and it's not about the code itself, but about how we do the feature, and about the most sustainable way to develop Airflow for the long term: is this short-term approach right (which we will likely have in place for years), or should we spend more time to build a longer-term, more generic approach?
I think we are all well aware of that. And we have different views on what the more generic way would be, and apparently slightly different design assumptions. Nobody who wants to add the user here does it because they think it's a "short term gain" - the argument here is that this is a good design decision. There are clearly voices that the audit log is a different thing than the data model of the app and that they should not be mixed. That's one of the design views here. I think we should jointly make a decision based on those different design assumptions. IMHO, a design where the "triggering user" is part of the task instance model is a good design decision.
This PR adds a feature to Airflow so that the triggering user is tracked at the Dag run level. So far, if you had usages where users trigger manually, you needed to find out which user it was by looking into the audit log.
The user is tracked with the unix user name when using the CLI or airflowctl; in web UI or REST cases, the authenticated user is used.
In case a backfill is started via the UI, the user who started the backfill propagates to the runs of the Dags.
FYI @clellmann @wolfdn @AutomationDev85