Skip to content

gh-135640: Adds type checking to ElementTree.ElementTree constructor #135643

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

abstractedfox
Copy link
Contributor

@abstractedfox abstractedfox commented Jun 17, 2025

Addresses issue #135640 by adding type checking to the constructor and to ElementTree._setroot, since both have the opportunity to assign _root. In addition to addressing the primary issue (possible data loss), this makes relevant tracebacks more descriptive.

It looks like there had once been asserts that did these checks. I couldn't find rationale for why they were commented out in blame (it looks like this was done on svn, and I'm not sure where to look to find those commit messages). I opted to use raise instead as I felt it would be more descriptive.

At present, the way iselement is written still allows an object of the wrong type to pass through if it has a tag attribute. Commit a72a98f shows there were once comments indicating that the current implementation was done for a reason, but it isn't clear to me why that is. I left it alone for now, but changing it to this:

def iselement(element):
    return isinstance(element, Element)

...also passed tests. If there's no disagreement, I think it would be good to change that too.

abstractedfox and others added 4 commits June 17, 2025 13:42
Need to switch machines to test on Linux
Add tests verifying the new type checking behavior
@python-cla-bot
Copy link

python-cla-bot bot commented Jun 17, 2025

All commit authors signed the Contributor License Agreement.

CLA signed

vstinner
vstinner previously approved these changes Jun 18, 2025
@efimov-mikhail
Copy link
Contributor

efimov-mikhail commented Jun 18, 2025

Code changes and test look good to me.
But IMO, it's better to provide some news entry with blurb tool, since it slightly changes code behavior.

And I'd suggest to add test on _setroot function too.

@vstinner
Copy link
Member

@abstractedfox: Can you add a NEWS entry? You can also try https://blurb-it.herokuapp.com/

Copy link
Member

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the tests are expecting more attributes than the runtime for an "Element": compare check_element() and iselement().

@@ -527,7 +527,9 @@ class ElementTree:
"""
def __init__(self, element=None, file=None):
# assert element is None or iselement(element)
if element is not None and not iselement(element):
raise TypeError(f"element must be etree.Element, "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would appreciate either writing xml.etree.Element or Element only.

I also suspect that the isinstance() check was removed so that elements not inheriting from xml.Element still work, or elements that implement the XML interface without directly inheriting from it.

And the minimal element interface is "tag".

@@ -247,6 +247,13 @@ def check_element(element):
self.assertRegex(repr(element), r"^<Element 't\xe4g' at 0x.*>$")
element = ET.Element("tag", key="value")

# Verify type checking for ElementTree constructor
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make a separate test method, say "test_constructor" and put that test before test_interface? TiA

@@ -247,6 +247,13 @@ def check_element(element):
self.assertRegex(repr(element), r"^<Element 't\xe4g' at 0x.*>$")
element = ET.Element("tag", key="value")

# Verify type checking for ElementTree constructor

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add also a test where only an Element-like class is tested (only with a tag attribute, nothing else, not even inheriting from xml.etree.Element).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I tried this in a boilerplate file:

class element impl:
    def __init__(self, tag):
        self.tag = tag

..but an instance of that wasn't enough to be able to pass as a root element to an ElementTree instance, call write, and have it work as normal (assuming I understood you properly). I ended up needing to add iter(), items(), __len__(), text, and tail before it would. Was that what you meant?

As a side thought, if ElementTree supports custom element implementations, should I remove the type check and change iselement to check for all of those attributes instead?

@picnixz picnixz changed the title gh-135640: Adds type checking to ElementTree.ElementTree constructor, plus relevant tests gh-135640: Adds type checking to ElementTree.ElementTree constructor Jun 19, 2025
Adjusts iselement() based off feedback from picnixz. Since ElementTree
needs to be compatible with element-like objects, the function now checks
for all attributes that are necessary to instantiate an ElementTree object
and successfully write it to a file (preventing issue python#135640).
This commit also incorporates other feedback requesting moving the
tests and adding more tests, and adds a news entry.
@serhiy-storchaka serhiy-storchaka self-requested a review June 22, 2025 08:47
@serhiy-storchaka
Copy link
Member

iselement() was initially only used in assertions, and they were commented out in 3e8c189 (bpo-6472/gh-50721). I guess the reason of both was performance. Many of assertions were replaced with _assert_is_element() in 396e8fc (bpo-13782/gh-57991).

It is unlikely the the ElementTree constructor is used in tight loops, so restoring check for it should not hurm performance. Although, this is not the only way to set the root element -- in parse() it is set to what the parser returned.

If you are going to make the check more strict, note that other places require more Element methods: find(), findtext(), findall(), iterfind(). Which is write() more important than findtext()? But adding more attribute checks will make iselement() more slow and can break code which only uses the part of the Element interface. Even the current version of the PR can break a working code that does not need to call write().

@abstractedfox
Copy link
Contributor Author

Actually, the reason I encountered the original issue was because find() worked with a root object that was the wrong type; I'd passed the constructor an ElementTree.ElementTree instead of a root element, and was using .find() to interface with it (was expecting Element.find(), but they behaved the same way in my code), so it worked as normal until .write() was called. In my particular case, this pattern only happened when opening an existing file, so at that point the file would be destroyed.

I suppose the question now is, is this type of check the correct way to handle this? My opinion at the beginning was that the ElementTree class shouldn't demonstrate behavior that works sometimes and might break in an undocumented way when used with objects of a correct-ish type, but if it needs to support this type of usage (ie, to avoid breaking code that already depends on this) perhaps .write() itself should check for its minimum required attributes, and everything else should be left as it was? What do you think?


return (hasattr(element, 'tag') and hasattr(element, 'text') and
hasattr(element, 'tail') and callable(element.iter) and
callable(element.items) and callable(element.__len__))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this function raise an exception if iter(), items() or __len__() method doesn't exist?

try:
tree = ET.ElementTree(element_like)
except Exception as err:
self.fail(err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe test also _setroot() here?

@serhiy-storchaka
Copy link
Member

Note that the ElementTree constructor accepts also element=None (and it is default), but most of its methods don't work in this case. There are comment-out # assert self._root is not None in these methods, but they are not particularly useful -- they would simply change an AttributeError to AssertionError.

I suggest to leave iselement() as is for now. Testing only the "tag" attribute will catch most errors and reduce the risk of breaking existing code.

If something does not work -- user will get error in any case. It is a programming error, not an input data error, so its type is less important -- it is not supposed to be caught.

@abstractedfox
Copy link
Contributor Author

Could we at least put these checks in write itself so it will fail a little more gracefully in that condition? It's going to error in both cases, but it still feels less than ideal for it to damage a file opened for writing, especially if it would take a relatively small change to prevent that

@vstinner
Copy link
Member

If the PR changes iselement(), I don't think that it's safe to backport it to stable branches (3.13 and 3.14).

@vstinner vstinner dismissed their stale review June 24, 2025 16:25

PR (deeply) changed after my approval

@abstractedfox
Copy link
Contributor Author

At this point I suppose I'd like some guidance on which path is most acceptable; put the PR back in its original state (iselement unaltered, keep the type check I added; allows element-like objects but could still enable edge cases with .write()), leave constructor entirely untouched and add the checks to .write() itself (would not break any code that wasn't broken already, but prevents it from damaging an existing file), or change iselement to be more strict (seems like we don't want to do this for aforementioned reasons, which I agree with)

@vstinner
Copy link
Member

I would prefer to leave iselement() as it is:

def iselement(element):
    """Return True if *element* appears to be an Element."""
    return hasattr(element, 'tag')

And add iselement() checks in write() and _setroot().

@@ -0,0 +1,3 @@
Address bug where calling :func:`xml.etree.ElementTree.ElementTree.write` on
an ElementTree object with an invalid root element would blank the file
passed to ``write`` if it already existed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe rewrite this in more simple manner? Split one sentence into two, for example.

self.tag = "tag"
self.text = None
self.tail = None
def iter(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's iter, not __iter__?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the sake of testing what .write() needs to complete successfully, it is (used in _namespaces in ElementTree.py). But actually, that test should be changed anyway since it was made in anticipation of the changes to iselement that have been undone

abstractedfox and others added 3 commits June 26, 2025 15:22
Co-authored-by: Mikhail Efimov <[email protected]>
Test was written for a change to `iselement` that has since been undone
Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. This does not completely prevent failure during writing. Even if you check for all attributes used by write(), as in your previous versions, you do not check that attributes has correct type, that methods are callable, that they have correct signature and semantic. You do not check that subelements has correct types. Checking all this would be expensive and will not guarantee that write() will not fail.
  2. This is not a problem that requires expensive solution. If your program contains such bug, it will be caught in the first test run. The difference between AttributeError and TypeError is not so important. You don't run your program on production files without testing, right?

The only way to guarantee that the output file will not be corrupted due to errors is to write to temporary file or in-memory file, and then atomically replace the content.

@@ -527,7 +527,10 @@ class ElementTree:
"""
def __init__(self, element=None, file=None):
# assert element is None or iselement(element)
if element is not None and not iselement(element):
raise TypeError(f"element must be xml.etree.Element or "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is xml.etree.ElementTree.Element.

Look at _assert_is_element(). We need a similar error message.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's my bad, I forgot the ElementTree part in my suggestion

@abstractedfox
Copy link
Contributor Author

@serhiy-storchaka hmm I see, do you suggest doing something differently overall? It does narrow the opportunities for the initial problem in a fairly low-overhead way in its present state, which I feel is an improvement

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@efimov-mikhail
Copy link
Contributor

LGTM. But it's better to add test for _setroot with ElementLike, IMO.

@@ -716,7 +716,8 @@ def write(self, file_or_filename,
"""
if not iselement(self._root):
raise TypeError(f"Root element must be of type xml.etree.Element "
raise TypeError(f"Root element must be "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need for f-strings.

@serhiy-storchaka
Copy link
Member

Most likely reason of raising this exception is if the user forget to call parse(). Error message is wrong in that case.

Other possible cases -- when the parser returned wrong object or user set the _root attribute (omitting _setroot()) are much less common, and there is no reason to check this in write().

So the code raises wrong exception in most common case and raises unnecessary exception in less common cases. I suggest to remove check in write(). Or replace it with a check for self._root is None with relevant error message (uninitialized ElementTree or missed parse() call).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting merge needs backport to 3.13 bugs and security fixes needs backport to 3.14 bugs and security fixes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants