Refactor serve_logs with FastAPI #52581
Branch updated from 8be599f to 99651ed.
Commits: "Fix test_invalid_characters_handled", "Refactor with StaticFiles"
Branch updated from 99651ed to b5facdf.
I thought there was no FastAPI alternative to Flask's `send_from_directory`, so in my first try I implemented the file-path validation myself (and the security check complained about it).
Fortunately, I found a gist showing how to add authorization to FastAPI's `StaticFiles` (linked in the PR description).
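For context, the core check that `send_from_directory` and `StaticFiles` perform for you (and that had to be hand-rolled in the first try) is path containment: the requested file must resolve to a location inside the log directory. A minimal stdlib sketch; `resolve_within` is a hypothetical helper, not part of Airflow, Flask, or Starlette:

```python
from __future__ import annotations

from pathlib import Path


def resolve_within(base_dir: str, requested: str) -> Path | None:
    """Return the resolved path of ``requested`` under ``base_dir``, or None if it escapes.

    Hypothetical helper for illustration only; it does not check that the file exists.
    """
    base = Path(base_dir).resolve()
    # resolve() collapses ".." segments and follows symlinks, so "../secret" and a
    # symlink pointing outside the base both end up outside ``base``.
    # Joining an absolute ``requested`` replaces ``base`` entirely, which the
    # containment check below also rejects.
    candidate = (base / requested).resolve()
    if candidate == base or not candidate.is_relative_to(base):
        return None
    return candidate
```

Returning `None` instead of a reason lets the caller answer with a plain 403 or 404 without telling the client what went wrong.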
```python
raise ImportError(f"Unable to load {log_config_class} due to error: {e}")


fastapi_app = FastAPI()
fastapi_app.state.signer = JWTValidator(
```
We have to set the `signer` instance on `app.state`, just like we do for `dag_bag` in core-api, so that it is a singleton.
Nice! TIL a bit more about FastAPI by reviewing this PR.
```diff
-    options = [bind_option, GunicornOption("workers", 2)]
-    StandaloneGunicornApplication(wsgi_app, options).run()
+    # Use Uvicorn worker class for ASGI applications
+    options = [
+        bind_option,
+        GunicornOption("workers", 2),
+        GunicornOption("worker_class", "uvicorn.workers.UvicornWorker"),
+    ]
+    StandaloneGunicornApplication(asgi_app, options).run()
```
IMO, I'm also thinking about replacing `StandaloneGunicornApplication` with `uvicorn.run`, since `api_server_command` uses `uvicorn.run` to start the whole core-api.
Any comments on this?
airflow/airflow-core/src/airflow/cli/commands/api_server_command.py
Lines 106 to 117 in b5facdf
```python
uvicorn.run(
    "airflow.api_fastapi.main:app",
    host=args.host,
    port=args.port,
    workers=num_workers,
    timeout_keep_alive=worker_timeout,
    timeout_graceful_shutdown=worker_timeout,
    ssl_keyfile=ssl_key,
    ssl_certfile=ssl_cert,
    access_log=access_logfile,
    proxy_headers=proxy_headers,
)
```
Not sure what the difference is. But we have to remember that `serve_logs` runs in the Celery worker, and we do not have `api_server` running there: `serve_logs` is the only thing Celery workers expose. I think that's the reason we had a "standalone" server. I do not know too much about those.
@pierrejeambrun -> maybe you can help here?
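If `StandaloneGunicornApplication` were replaced with `uvicorn.run`, one detail matters: uvicorn only starts multiple `workers` when the application is passed as an import string rather than an app object, and `factory=True` tells it to call an app factory such as `create_app()`. A minimal sketch; `uvicorn_kwargs`, `run_serve_logs`, and the `app_factory_path` argument are hypothetical, and it does not settle whether a standalone server is still needed on Celery workers. Separately, recent uvicorn releases deprecate the `uvicorn.workers` module used in the Gunicorn diff in favour of the separate `uvicorn-worker` package, which may be worth checking if Gunicorn is kept.

```python
from __future__ import annotations

from typing import Any


def uvicorn_kwargs(host: str, port: int, workers: int = 2) -> dict[str, Any]:
    """Hypothetical helper: keyword arguments for uvicorn.run when serving logs."""
    return {
        "host": host,
        "port": port,
        # Mirrors GunicornOption("workers", 2) in the Gunicorn-based version.
        "workers": workers,
        # The import string points at an app *factory*, so uvicorn must call it.
        "factory": True,
    }


def run_serve_logs(app_factory_path: str, host: str, port: int, workers: int = 2) -> None:
    import uvicorn  # imported lazily so uvicorn_kwargs() works without uvicorn installed

    # With workers > 1 the app must be an import string such as "package.module:create_app";
    # given an app object instead, uvicorn refuses to start multiple workers.
    uvicorn.run(app_factory_path, **uvicorn_kwargs(host, port, workers))
```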
@@ -43,74 +44,55 @@ | |||
logger = logging.getLogger(__name__) | |||
|
|||
|
|||
def create_app(): |
❤️
```diff
     if token_filename is None:
         logger.warning("The payload does not contain 'filename' key: %s.", payload)
-        abort(403)
+        raise HTTPException(status_code=status.HTTP_403_FORBIDDEN, detail="Invalid token payload")
```
We do not actually want to give the client more details on why we rejected the request. This is an important security principle: if the failure is an authentication problem, never explain it to the client; just return "403" without any details and log the details on the server side.
Otherwise it might make it easier for a potential attacker to see what is wrong and adjust their attack, including leveraging "timing" attacks, for example to see whether tokens partially match.
The less we tell the client about the reasons, the more secure we are.
```diff
     if token_filename != request_filename:
         logger.warning(
             "The payload log_relative_path key is different than the one in token:"
             "Request path: %s. Token path: %s.",
             request_filename,
             token_filename,
         )
-        abort(403)
+        raise HTTPException(status_code=status.HTTP_403_FORBIDDEN, detail="Token filename mismatch")
```
Yeah. This one for example is pretty informative to the attacker - so we should just return 403 and keep all the details in the server log for diagnostics.
What we can do in this case is generate a random ID for such a request, return it to the client, and log it on the server side. This makes it easier to correlate client-side requests with server-side errors when the errors are legitimate.
```diff
     except HTTPException:
         raise
     except InvalidAudienceError:
         logger.warning("Invalid audience for the request", exc_info=True)
-        abort(403)
+        raise HTTPException(status_code=status.HTTP_403_FORBIDDEN, detail="Invalid audience")
```
Same here and in all the other cases. Returning a bare 403 here was deliberate.
We could add a comment here mentioning that it's deliberate; otherwise future contributors might try to "fix" it in the same way.
```diff
 import gunicorn.app.base
-from flask import Flask, abort, request, send_from_directory
```
❤️ ❤️
One other thing: I think we can now get rid of these dependencies:

```toml
# We could get rid of flask and gunicorn if we replace serve_logs with a starlette + unicorn
"flask>=2.1.1",
# We could get rid of flask and gunicorn if we replace serve_logs with a starlette + unicorn
"gunicorn>=20.1.0",
```
closes: #52526
related: https://lists.apache.org/thread/hfr8q85rgr6knpp5wblbz301ysnmzhht
Why
What
Replace Flask's `send_from_directory` by mounting `JWTAuthStaticFiles` with `fastapi_app.mount` (`JWTAuthStaticFiles` inherits from FastAPI's `StaticFiles` and extends it with the existing authorization; based on fastapi/fastapi#858 (comment)).