Skip to content

Ensure HttpHook.run() does not alter extra_options passed to it #51893

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Jun 26, 2025

Conversation

jroachgolf84
Copy link
Collaborator

Previously, running HttpHook.run() altered the dictionary that was passed via extra_options. This was illustrated and reproduced in #51686 using the following logic within a Task.

...

extra_options = {"verify_ssl": None, "proxies": None}
original_extra_options = deepcopy(extra_options)

# jsonplaceholder Connection has
# * Host = https://jsonplaceholder.typicode.com
# * Extra = {"foo": "bar"}
# Note: some Extra value is necessary, otherwise the extra_options passed to run are not used, see https://github.com/apache/airflow/issues/51685
http_hook = HttpHook(method='GET', http_conn_id='jsonplaceholder')

for _ in range(2):
    http_hook.run(
        endpoint=f"/posts/1",
        extra_options=extra_options,
    )
    assert extra_options == original_extra_options, \
        f"extra_options: {extra_options}, original_extra_options: {original_extra_options}"

To fix this, two major changes were made. First, the dictionary that was passed into hook via the run or get_conn method was "deep copied" before being transformed. Since the previous logic relied on that dictionary being transformed in different scopes, a merged_extra was added as an instance-level attribute to the HttpHook class. The appropriate changes were also made to its async counterpart.

These changes were tested using the logic that was provided by @brki, as well as via the unit tests for this hook.

closes: #51686

Copy link

@Nataneljpwd Nataneljpwd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello, I have added a few comments, the main concern being the use of copy instead of just dict.get, and a check for whether to run a specific function

@jroachgolf84 jroachgolf84 changed the title Ensuring HttpHook.run() does not alter extra_options passed to it Ensure HttpHook.run() does not alter extra_options passed to it Jun 19, 2025
@jroachgolf84
Copy link
Collaborator Author

Hello, I have added a few comments, the main concern being the use of copy instead of just dict.get, and a check for whether to run a specific function

I've addressed this concerns in each of your comments, thanks for the feedback!

@Nataneljpwd
Copy link

Checked it out again, answered all the comments.
Looks good to me!

Copy link

@Nataneljpwd Nataneljpwd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, just mark all the comments as resolved

@jroachgolf84 jroachgolf84 requested a review from uranusjr June 20, 2025 12:09
Copy link

@Nataneljpwd Nataneljpwd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!
Great job!

Copy link
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love the idea of storing self.merged_extra and would like to see if we can cleanly avoid doing it

If I'm following the logic right (and there's a decent chance I'm not) all the stuff we store in self.merge_extra is already set as a property on the session object which we return.

If that is the case isn't session already configured and we don't need merged_extra again after this?

@jroachgolf84
Copy link
Collaborator Author

jroachgolf84 commented Jun 24, 2025

I don't love the idea of storing self.merged_extra and would like to see if we can cleanly avoid doing it

If I'm following the logic right (and there's a decent chance I'm not) all the stuff we store in self.merge_extra is already set as a property on the session object which we return.

If that is the case isn't session already configured and we don't need merged_extra again after this?

@ashb - here is the explanation as to why this was implemented. It boils down to the get_conn method. In run, we call get_conn. This is where the session is created via the _configure_session_from_extra. However, there are additional fields NOT included in the session that is returned that must be used later, meaning that we need to hold onto some form of extra_options.

Eventually, some form of extra_options needs to be passed to run_and_check, which is looking for the timeout, allow_redirects, and check_response fields. This mapping should be the merged mapping, which will include the timeout, allow_redirects, and check_response fields that could have been either in the Connection "extras" or the original extra_options. If we didn't create something like self.merged_extra, we'd need to change the return type of get_conn, which would be a breaking change.

@RNHTTR RNHTTR merged commit 885e40c into apache:main Jun 26, 2025
72 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

HttpHook.run() alters the extra_options dict passed to it
5 participants