-
-
Notifications
You must be signed in to change notification settings - Fork 32.2k
gh-135640: Adds type checking to ElementTree.ElementTree constructor #135643
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Need to switch machines to test on Linux
Add tests verifying the new type checking behavior
Code changes and test look good to me. And I'd suggest to add test on |
@abstractedfox: Can you add a NEWS entry? You can also try https://blurb-it.herokuapp.com/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the tests are expecting more attributes than the runtime for an "Element": compare check_element() and iselement().
Lib/xml/etree/ElementTree.py
Outdated
@@ -527,7 +527,9 @@ class ElementTree: | |||
""" | |||
def __init__(self, element=None, file=None): | |||
# assert element is None or iselement(element) | |||
if element is not None and not iselement(element): | |||
raise TypeError(f"element must be etree.Element, " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would appreciate either writing xml.etree.Element or Element only.
I also suspect that the isinstance() check was removed so that elements not inheriting from xml.Element still work, or elements that implement the XML interface without directly inheriting from it.
And the minimal element interface is "tag".
Lib/test/test_xml_etree.py
Outdated
@@ -247,6 +247,13 @@ def check_element(element): | |||
self.assertRegex(repr(element), r"^<Element 't\xe4g' at 0x.*>$") | |||
element = ET.Element("tag", key="value") | |||
|
|||
# Verify type checking for ElementTree constructor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make a separate test method, say "test_constructor" and put that test before test_interface? TiA
Lib/test/test_xml_etree.py
Outdated
@@ -247,6 +247,13 @@ def check_element(element): | |||
self.assertRegex(repr(element), r"^<Element 't\xe4g' at 0x.*>$") | |||
element = ET.Element("tag", key="value") | |||
|
|||
# Verify type checking for ElementTree constructor | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add also a test where only an Element-like class is tested (only with a tag attribute, nothing else, not even inheriting from xml.etree.Element).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I tried this in a boilerplate file:
class element impl:
def __init__(self, tag):
self.tag = tag
..but an instance of that wasn't enough to be able to pass as a root element to an ElementTree instance, call write
, and have it work as normal (assuming I understood you properly). I ended up needing to add iter()
, items()
, __len__()
, text
, and tail
before it would. Was that what you meant?
As a side thought, if ElementTree supports custom element implementations, should I remove the type check and change iselement
to check for all of those attributes instead?
Adjusts iselement() based off feedback from picnixz. Since ElementTree needs to be compatible with element-like objects, the function now checks for all attributes that are necessary to instantiate an ElementTree object and successfully write it to a file (preventing issue python#135640). This commit also incorporates other feedback requesting moving the tests and adding more tests, and adds a news entry.
It is unlikely the the If you are going to make the check more strict, note that other places require more Element methods: |
Actually, the reason I encountered the original issue was because I suppose the question now is, is this type of check the correct way to handle this? My opinion at the beginning was that the ElementTree class shouldn't demonstrate behavior that works sometimes and might break in an undocumented way when used with objects of a correct-ish type, but if it needs to support this type of usage (ie, to avoid breaking code that already depends on this) perhaps |
Lib/xml/etree/ElementTree.py
Outdated
|
||
return (hasattr(element, 'tag') and hasattr(element, 'text') and | ||
hasattr(element, 'tail') and callable(element.iter) and | ||
callable(element.items) and callable(element.__len__)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this function raise an exception if iter(), items() or __len__()
method doesn't exist?
try: | ||
tree = ET.ElementTree(element_like) | ||
except Exception as err: | ||
self.fail(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe test also _setroot() here?
Note that the I suggest to leave If something does not work -- user will get error in any case. It is a programming error, not an input data error, so its type is less important -- it is not supposed to be caught. |
Could we at least put these checks in |
If the PR changes |
At this point I suppose I'd like some guidance on which path is most acceptable; put the PR back in its original state ( |
I would prefer to leave iselement() as it is: def iselement(element):
"""Return True if *element* appears to be an Element."""
return hasattr(element, 'tag') And add iselement() checks in |
@@ -0,0 +1,3 @@ | |||
Address bug where calling :func:`xml.etree.ElementTree.ElementTree.write` on | |||
an ElementTree object with an invalid root element would blank the file | |||
passed to ``write`` if it already existed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe rewrite this in more simple manner? Split one sentence into two, for example.
Lib/test/test_xml_etree.py
Outdated
self.tag = "tag" | ||
self.text = None | ||
self.tail = None | ||
def iter(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's iter
, not __iter__
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the sake of testing what .write() needs to complete successfully, it is (used in _namespaces
in ElementTree.py). But actually, that test should be changed anyway since it was made in anticipation of the changes to iselement
that have been undone
Co-authored-by: Mikhail Efimov <[email protected]>
Test was written for a change to `iselement` that has since been undone
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- This does not completely prevent failure during writing. Even if you check for all attributes used by
write()
, as in your previous versions, you do not check that attributes has correct type, that methods are callable, that they have correct signature and semantic. You do not check that subelements has correct types. Checking all this would be expensive and will not guarantee thatwrite()
will not fail. - This is not a problem that requires expensive solution. If your program contains such bug, it will be caught in the first test run. The difference between AttributeError and TypeError is not so important. You don't run your program on production files without testing, right?
The only way to guarantee that the output file will not be corrupted due to errors is to write to temporary file or in-memory file, and then atomically replace the content.
Lib/xml/etree/ElementTree.py
Outdated
@@ -527,7 +527,10 @@ class ElementTree: | |||
""" | |||
def __init__(self, element=None, file=None): | |||
# assert element is None or iselement(element) | |||
if element is not None and not iselement(element): | |||
raise TypeError(f"element must be xml.etree.Element or " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is xml.etree.ElementTree.Element
.
Look at _assert_is_element()
. We need a similar error message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's my bad, I forgot the ElementTree
part in my suggestion
@serhiy-storchaka hmm I see, do you suggest doing something differently overall? It does narrow the opportunities for the initial problem in a fairly low-overhead way in its present state, which I feel is an improvement |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
LGTM. But it's better to add test for |
@@ -716,7 +716,8 @@ def write(self, file_or_filename, | |||
""" | |||
if not iselement(self._root): | |||
raise TypeError(f"Root element must be of type xml.etree.Element " | |||
raise TypeError(f"Root element must be " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no need for f-strings.
Most likely reason of raising this exception is if the user forget to call Other possible cases -- when the parser returned wrong object or user set the So the code raises wrong exception in most common case and raises unnecessary exception in less common cases. I suggest to remove check in |
Addresses issue #135640 by adding type checking to the constructor and to
ElementTree._setroot
, since both have the opportunity to assign _root. In addition to addressing the primary issue (possible data loss), this makes relevant tracebacks more descriptive.It looks like there had once been asserts that did these checks. I couldn't find rationale for why they were commented out in blame (it looks like this was done on svn, and I'm not sure where to look to find those commit messages). I opted to use
raise
instead as I felt it would be more descriptive.At present, the way
iselement
is written still allows an object of the wrong type to pass through if it has atag
attribute. Commit a72a98f shows there were once comments indicating that the current implementation was done for a reason, but it isn't clear to me why that is. I left it alone for now, but changing it to this:...also passed tests. If there's no disagreement, I think it would be good to change that too.