gh-135640: Adds type checking to ElementTree.ElementTree constructor #135643

abstractedfox · 2025-06-17T20:39:14Z

Addresses issue #135640 by adding type checking to the constructor and to ElementTree._setroot, since both have the opportunity to assign _root. In addition to addressing the primary issue (possible data loss), this makes relevant tracebacks more descriptive.

It looks like there had once been asserts that did these checks. I couldn't find rationale for why they were commented out in blame (it looks like this was done on svn, and I'm not sure where to look to find those commit messages). I opted to use raise instead as I felt it would be more descriptive.

At present, the way iselement is written still allows an object of the wrong type to pass through if it has a tag attribute. Commit a72a98f shows there were once comments indicating that the current implementation was done for a reason, but it isn't clear to me why that is. I left it alone for now, but changing it to this:

def iselement(element):
    return isinstance(element, Element)

...also passed tests. If there's no disagreement, I think it would be good to change that too.

Issue: Calling ElementTree.write on an ElementTree instance where ElementTree._root is the wrong type causes possible data loss #135640

Need to switch machines to test on Linux

Add tests verifying the new type checking behavior

python-cla-bot · 2025-06-17T20:39:19Z

All commit authors signed the Contributor License Agreement.

efimov-mikhail · 2025-06-18T20:06:16Z

Code changes and test look good to me.
But IMO, it's better to provide some news entry with blurb tool, since it slightly changes code behavior.

And I'd suggest to add test on _setroot function too.

vstinner · 2025-06-19T09:19:18Z

@abstractedfox: Can you add a NEWS entry? You can also try https://blurb-it.herokuapp.com/

picnixz

I think the tests are expecting more attributes than the runtime for an "Element": compare check_element() and iselement().

picnixz · 2025-06-19T09:54:35Z

Lib/xml/etree/ElementTree.py

@@ -527,7 +527,9 @@ class ElementTree:

    """
    def __init__(self, element=None, file=None):
-        # assert element is None or iselement(element)
+        if element is not None and not iselement(element):
+            raise TypeError(f"element must be etree.Element, "


I would appreciate either writing xml.etree.Element or Element only.

I also suspect that the isinstance() check was removed so that elements not inheriting from xml.Element still work, or elements that implement the XML interface without directly inheriting from it.

And the minimal element interface is "tag".

picnixz · 2025-06-19T09:57:51Z

Lib/test/test_xml_etree.py

@@ -247,6 +247,13 @@ def check_element(element):
        self.assertRegex(repr(element), r"^<Element 't\xe4g' at 0x.*>$")
        element = ET.Element("tag", key="value")

+        # Verify type checking for ElementTree constructor


Can you make a separate test method, say "test_constructor" and put that test before test_interface? TiA

picnixz · 2025-06-19T10:03:30Z

Lib/test/test_xml_etree.py

@@ -247,6 +247,13 @@ def check_element(element):
        self.assertRegex(repr(element), r"^<Element 't\xe4g' at 0x.*>$")
        element = ET.Element("tag", key="value")

+        # Verify type checking for ElementTree constructor
+


Add also a test where only an Element-like class is tested (only with a tag attribute, nothing else, not even inheriting from xml.etree.Element).

Hmm, I tried this in a boilerplate file:

class element impl: def __init__(self, tag): self.tag = tag

..but an instance of that wasn't enough to be able to pass as a root element to an ElementTree instance, call write, and have it work as normal (assuming I understood you properly). I ended up needing to add iter(), items(), __len__(), text, and tail before it would. Was that what you meant?

As a side thought, if ElementTree supports custom element implementations, should I remove the type check and change iselement to check for all of those attributes instead?

Adjusts iselement() based off feedback from picnixz. Since ElementTree needs to be compatible with element-like objects, the function now checks for all attributes that are necessary to instantiate an ElementTree object and successfully write it to a file (preventing issue python#135640). This commit also incorporates other feedback requesting moving the tests and adding more tests, and adds a news entry.

serhiy-storchaka · 2025-06-22T09:27:57Z

iselement() was initially only used in assertions, and they were commented out in 3e8c189 (bpo-6472/gh-50721). I guess the reason of both was performance. Many of assertions were replaced with _assert_is_element() in 396e8fc (bpo-13782/gh-57991).

It is unlikely the the ElementTree constructor is used in tight loops, so restoring check for it should not hurm performance. Although, this is not the only way to set the root element -- in parse() it is set to what the parser returned.

If you are going to make the check more strict, note that other places require more Element methods: find(), findtext(), findall(), iterfind(). Which is write() more important than findtext()? But adding more attribute checks will make iselement() more slow and can break code which only uses the part of the Element interface. Even the current version of the PR can break a working code that does not need to call write().

abstractedfox · 2025-06-22T17:07:55Z

Actually, the reason I encountered the original issue was because find() worked with a root object that was the wrong type; I'd passed the constructor an ElementTree.ElementTree instead of a root element, and was using .find() to interface with it (was expecting Element.find(), but they behaved the same way in my code), so it worked as normal until .write() was called. In my particular case, this pattern only happened when opening an existing file, so at that point the file would be destroyed.

I suppose the question now is, is this type of check the correct way to handle this? My opinion at the beginning was that the ElementTree class shouldn't demonstrate behavior that works sometimes and might break in an undocumented way when used with objects of a correct-ish type, but if it needs to support this type of usage (ie, to avoid breaking code that already depends on this) perhaps .write() itself should check for its minimum required attributes, and everything else should be left as it was? What do you think?

vstinner · 2025-06-23T10:33:38Z

Lib/xml/etree/ElementTree.py

-
+    return (hasattr(element, 'tag') and hasattr(element, 'text') and
+        hasattr(element, 'tail') and callable(element.iter) and
+        callable(element.items) and callable(element.__len__))


Does this function raise an exception if iter(), items() or __len__() method doesn't exist?

vstinner · 2025-06-23T10:34:50Z

Lib/test/test_xml_etree.py

+        try:
+            tree = ET.ElementTree(element_like)
+        except Exception as err:
+            self.fail(err)


Maybe test also _setroot() here?

serhiy-storchaka · 2025-06-23T15:50:40Z

Note that the ElementTree constructor accepts also element=None (and it is default), but most of its methods don't work in this case. There are comment-out # assert self._root is not None in these methods, but they are not particularly useful -- they would simply change an AttributeError to AssertionError.

I suggest to leave iselement() as is for now. Testing only the "tag" attribute will catch most errors and reduce the risk of breaking existing code.

If something does not work -- user will get error in any case. It is a programming error, not an input data error, so its type is less important -- it is not supposed to be caught.

abstractedfox · 2025-06-23T16:50:53Z

Could we at least put these checks in write itself so it will fail a little more gracefully in that condition? It's going to error in both cases, but it still feels less than ideal for it to damage a file opened for writing, especially if it would take a relatively small change to prevent that

vstinner · 2025-06-24T16:24:55Z

If the PR changes iselement(), I don't think that it's safe to backport it to stable branches (3.13 and 3.14).

PR (deeply) changed after my approval

abstractedfox · 2025-06-24T18:10:11Z

At this point I suppose I'd like some guidance on which path is most acceptable; put the PR back in its original state (iselement unaltered, keep the type check I added; allows element-like objects but could still enable edge cases with .write()), leave constructor entirely untouched and add the checks to .write() itself (would not break any code that wasn't broken already, but prevents it from damaging an existing file), or change iselement to be more strict (seems like we don't want to do this for aforementioned reasons, which I agree with)

vstinner · 2025-06-26T13:26:27Z

I would prefer to leave iselement() as it is:

def iselement(element):
    """Return True if *element* appears to be an Element."""
    return hasattr(element, 'tag')

And add iselement() checks in write() and _setroot().

Lib/xml/etree/ElementTree.py

efimov-mikhail · 2025-06-26T18:57:51Z

Misc/NEWS.d/next/Library/2025-06-22-02-16-17.gh-issue-135640.FXyFL6.rst

@@ -0,0 +1,3 @@
+Address bug where calling :func:`xml.etree.ElementTree.ElementTree.write` on
+an ElementTree object with an invalid root element would blank the file
+passed to ``write`` if it already existed.


Maybe rewrite this in more simple manner? Split one sentence into two, for example.

efimov-mikhail · 2025-06-26T19:00:18Z

Lib/test/test_xml_etree.py

+                self.tag = "tag"
+                self.text = None
+                self.tail = None
+            def iter(self):


It's iter, not __iter__?

For the sake of testing what .write() needs to complete successfully, it is (used in _namespaces in ElementTree.py). But actually, that test should be changed anyway since it was made in anticipation of the changes to iselement that have been undone

Co-authored-by: Mikhail Efimov <[email protected]>

Test was written for a change to `iselement` that has since been undone

serhiy-storchaka

This does not completely prevent failure during writing. Even if you check for all attributes used by write(), as in your previous versions, you do not check that attributes has correct type, that methods are callable, that they have correct signature and semantic. You do not check that subelements has correct types. Checking all this would be expensive and will not guarantee that write() will not fail.
This is not a problem that requires expensive solution. If your program contains such bug, it will be caught in the first test run. The difference between AttributeError and TypeError is not so important. You don't run your program on production files without testing, right?

The only way to guarantee that the output file will not be corrupted due to errors is to write to temporary file or in-memory file, and then atomically replace the content.

serhiy-storchaka · 2025-06-27T07:11:50Z

Lib/xml/etree/ElementTree.py

@@ -527,7 +527,10 @@ class ElementTree:

    """
    def __init__(self, element=None, file=None):
-        # assert element is None or iselement(element)
+        if element is not None and not iselement(element):
+            raise TypeError(f"element must be xml.etree.Element or "


It is xml.etree.ElementTree.Element.

Look at _assert_is_element(). We need a similar error message.

That's my bad, I forgot the ElementTree part in my suggestion

abstractedfox · 2025-06-27T17:39:26Z

@serhiy-storchaka hmm I see, do you suggest doing something differently overall? It does narrow the opportunities for the initial problem in a fairly low-overhead way in its present state, which I feel is an improvement

vstinner

LGTM

efimov-mikhail · 2025-07-01T16:32:48Z

LGTM. But it's better to add test for _setroot with ElementLike, IMO.

efimov-mikhail · 2025-07-01T16:33:41Z

Lib/xml/etree/ElementTree.py

@@ -716,7 +716,8 @@ def write(self, file_or_filename,

        """
        if not iselement(self._root):
-            raise TypeError(f"Root element must be of type xml.etree.Element "
+            raise TypeError(f"Root element must be "


There is no need for f-strings.

serhiy-storchaka · 2025-07-01T16:37:44Z

Most likely reason of raising this exception is if the user forget to call parse(). Error message is wrong in that case.

Other possible cases -- when the parser returned wrong object or user set the _root attribute (omitting _setroot()) are much less common, and there is no reason to check this in write().

So the code raises wrong exception in most common case and raises unnecessary exception in less common cases. I suggest to remove check in write(). Or replace it with a check for self._root is None with relevant error message (uninitialized ElementTree or missed parse() call).

abstractedfox and others added 4 commits June 17, 2025 13:42

Tenuous first fix

b0a8a64

Need to switch machines to test on Linux

Remove commented assertions

f0356c8

Add tests for ElementTree constructor

06eaff5

Add tests verifying the new type checking behavior

Add appropriate text to an exception

6d4d65f

bedevere-app bot added the awaiting review label Jun 17, 2025

bedevere-app bot mentioned this pull request Jun 17, 2025

Calling ElementTree.write on an ElementTree instance where ElementTree._root is the wrong type causes possible data loss #135640

Open

vstinner added needs backport to 3.13 bugs and security fixes needs backport to 3.14 bugs and security fixes labels Jun 18, 2025

vstinner previously approved these changes Jun 18, 2025

View reviewed changes

bedevere-app bot added awaiting merge and removed awaiting review labels Jun 18, 2025

picnixz reviewed Jun 19, 2025

View reviewed changes

picnixz changed the title ~~gh-135640: Adds type checking to ElementTree.ElementTree constructor, plus relevant tests~~ gh-135640: Adds type checking to ElementTree.ElementTree constructor Jun 19, 2025

serhiy-storchaka self-requested a review June 22, 2025 08:47

vstinner reviewed Jun 23, 2025

View reviewed changes

bedevere-app bot added awaiting review and removed awaiting merge labels Jun 24, 2025

Revert iselement to previous state

1d2f0d9

abstractedfox added 2 commits June 26, 2025 14:00

Add iselement check to ElementTree.write

d5bbe23

Make errors more descriptive

4e2f822

efimov-mikhail reviewed Jun 26, 2025

View reviewed changes

Lib/xml/etree/ElementTree.py Outdated Show resolved Hide resolved

efimov-mikhail reviewed Jun 26, 2025

View reviewed changes

abstractedfox and others added 3 commits June 26, 2025 15:22

Fix whitespace

8b588f3

Co-authored-by: Mikhail Efimov <[email protected]>

Rephrase news entry

e1e489e

Test: remove unnecessary fields for element-like

28fcc69

Test was written for a change to `iselement` that has since been undone

serhiy-storchaka reviewed Jun 27, 2025

View reviewed changes

Change error messages

78118ea

vstinner approved these changes Jul 1, 2025

View reviewed changes

bedevere-app bot added awaiting merge and removed awaiting review labels Jul 1, 2025

efimov-mikhail reviewed Jul 1, 2025

View reviewed changes

Uh oh!

gh-135640: Adds type checking to ElementTree.ElementTree constructor #135643

Are you sure you want to change the base?

gh-135640: Adds type checking to ElementTree.ElementTree constructor #135643

Conversation

abstractedfox commented Jun 17, 2025 • edited by bedevere-app bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

python-cla-bot bot commented Jun 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

efimov-mikhail commented Jun 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

vstinner commented Jun 19, 2025

Uh oh!

picnixz left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

serhiy-storchaka commented Jun 22, 2025

Uh oh!

abstractedfox commented Jun 22, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

serhiy-storchaka commented Jun 23, 2025

Uh oh!

abstractedfox commented Jun 23, 2025

Uh oh!

vstinner commented Jun 24, 2025

Uh oh!

abstractedfox commented Jun 24, 2025

Uh oh!

vstinner commented Jun 26, 2025

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

serhiy-storchaka left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

abstractedfox commented Jun 27, 2025

Uh oh!

vstinner left a comment

Choose a reason for hiding this comment

Uh oh!

efimov-mikhail commented Jul 1, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

serhiy-storchaka commented Jul 1, 2025

Uh oh!

Uh oh!

abstractedfox commented Jun 17, 2025 •

edited by bedevere-app bot

Loading

python-cla-bot bot commented Jun 17, 2025 •

edited

Loading

efimov-mikhail commented Jun 18, 2025 •

edited

Loading