Escaping special XML characters in the API

There is a bug in the XML output of the ranking systems. It should escape these characters as follows:

" to &quot;
' to &apos;
< to &lt;
> to &gt;
& to &amp;
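For reference, Python's standard library already implements this mapping. xml.sax.saxutils.escape() replaces &, < and > on its own, and its optional second argument adds further replacements, here the two quote characters from the list above. A minimal sketch (escape_all is just an illustrative name, not a library function):

```python
from xml.sax.saxutils import escape

# escape() always replaces &, < and > (handling & first, so nothing is
# double-escaped); the extra mapping covers the two quote characters.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def escape_all(text: str) -> str:
    return escape(text, QUOTE_ENTITIES)


print(escape_all('A&B "x" <y> it\'s'))
# A&amp;B &quot;x&quot; &lt;y&gt; it&apos;s
```

Strictly, only & and < must be escaped in element text; the quotes matter inside attribute values.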

If any of these characters appear in the name or formula of a factor, the parser fails.
I can replace them by hand before parsing and it works fine, but I need to do it programmatically, or have it fixed when the XML is created.
I have tried every approach I could find online.
The most popular seems to be the escape() function from the SAX library (xml.sax.saxutils).
I get errors trying to use it. I think the fix needs to happen before I attempt to parse the file, since the etree parser is choking on it. Every solution I could find loads the XML from a string, whereas I am loading it from a file. The best solution is to fix it at the source so the XML is well-formed when it is created.

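import lxml.etree as ET
from xml.sax.saxutils import escape  # imported for the escaping attempts

# Parsing the exported file directly raises lxml.etree.XMLSyntaxError
# when a factor name or formula contains a bare & or <.
tree = ET.parse("P123_XML.xml")
root = tree.getroot()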
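To see why the fix has to happen before parsing, here is a small check using the standard-library xml.etree.ElementTree (lxml behaves the same way but raises lxml.etree.XMLSyntaxError instead of ParseError). Both fragments are hypothetical stand-ins for the exported text, not real P123 output:

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for the exported text: RAW has a bare "&" and "<",
# FIXED is the same content with those characters escaped.
RAW = '<StockFormula Name="A&B to C"><Formula>A + B < C</Formula></StockFormula>'
FIXED = '<StockFormula Name="A&amp;B to C"><Formula>A + B &lt; C</Formula></StockFormula>'


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text, False if it raises ParseError."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        # err.position is the (line, column) where the parser gave up.
        print("parse failed at", err.position)
        return False
    return True


print(is_well_formed(RAW))    # False
print(is_well_formed(FIXED))  # True
```

The parser has no way to guess that a bare & was meant as text, so the input must already be well-formed when it reaches ET.parse() or ET.fromstring().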


Yes, it’s probably best to make sure the file at rest is a well-formed XML document so the parser can handle it properly. Note that if you were to create the XML programmatically with etree and then serialize it, the API would automatically escape the special syntax characters so that the result is well-formed:

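from lxml import etree
root = etree.Element("StockFormula", Name="A&B to C")
formula = etree.SubElement(root, "Formula")  # attach the child to root
root.text = "A + B < C"  # raw & and < are fine in memory
print(etree.tostring(root))  # serialization escapes them:
# b'<StockFormula Name="A&amp;B to C">A + B &lt; C<Formula/></StockFormula>'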
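The same holds when the tree is written to a file and read back. A minimal sketch using the standard-library ElementTree instead of lxml, so it has no third-party dependency; write_and_reload is an illustrative name and the file is a throwaway temporary one:

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def write_and_reload(path: str) -> ET.Element:
    root = ET.Element("StockFormula", Name="A&B to C")
    ET.SubElement(root, "Formula").text = "A + B < C"
    # write() escapes & and < on the way out, so the file on disk is well-formed.
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)
    # Parsing the file back turns the entities into the original characters.
    return ET.parse(path).getroot()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "ranking.xml")
        loaded = write_and_reload(path)
        with open(path, encoding="utf-8") as f:
            print(f.read())  # shows &amp; and &lt; in the stored text
        print(loaded.get("Name"), "|", loaded.find("Formula").text)
```

The escaping is only a property of the serialized form: code working with the parsed tree always sees the original & and < characters.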

Right. But I am not creating the XML. I am copying it from the ranking system “text” area.
If this is something that P123 is willing to fix on their end, then I’ll stop searching for a solution.


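In that case you could use Python’s open() and read() calls to read the XML text in as a string without parsing it, then pass the affected text to the SAX escape() function before parsing. Apply escape() only to the factor names and formulas, not to the whole string, or the tags themselves will be escaped as well.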
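A minimal sketch of that approach, assuming the formulas sit in <Formula> elements and the factor names in Name="..." attributes. Those names are borrowed from the etree example earlier in this thread and are hypothetical; the real P123 export layout may differ, so adjust the two patterns. repair_xml_text and parse_p123_file are illustrative names, not library functions, and the sketch uses the standard-library parser rather than lxml. It escapes only the captured text, because running escape() over the whole file would escape the tags too:

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

# HYPOTHETICAL layout: formulas in <Formula>...</Formula> elements and factor
# names in Name="..." attributes. Adjust both patterns to the real export.
FORMULA_RE = re.compile(r"(<Formula>)(.*?)(</Formula>)", re.DOTALL)
NAME_RE = re.compile(r'(Name=")([^"]*)(")')

# Extra entities beyond the &, < and > that escape()/unescape() always handle.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}
QUOTE_REVERSE = {"&quot;": '"', "&apos;": "'"}


def _escape_group(match: re.Match) -> str:
    # Undo any escaping that is already present so "&amp;" does not become
    # "&amp;amp;", then escape the special characters in the captured text only.
    opening, text, closing = match.groups()
    return opening + escape(unescape(text, QUOTE_REVERSE), QUOTE_ENTITIES) + closing


def repair_xml_text(raw: str) -> str:
    """Escape special characters inside formulas and Name attributes."""
    raw = FORMULA_RE.sub(_escape_group, raw)
    return NAME_RE.sub(_escape_group, raw)


def parse_p123_file(path: str) -> ET.Element:
    """Read the file as plain text, repair it, then hand it to the parser."""
    with open(path, encoding="utf-8") as f:
        return ET.fromstring(repair_xml_text(f.read()))


if __name__ == "__main__":
    raw = '<StockFormula Name="A&B to C"><Formula>A + B < C</Formula></StockFormula>'
    root = ET.fromstring(repair_xml_text(raw))
    print(root.get("Name"), "|", root.find("Formula").text)  # A&B to C | A + B < C
```

This is a heuristic repair, not a general fix: a double quote inside a Name value, or a literal "</Formula>" inside a formula, would defeat the regular expressions. Producing well-formed XML at the source avoids these cases entirely.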

But I would agree that the XML representation produced by P123 should be well-formed, including escaping of syntax characters.

I’ve changed the raw editor to provide properly escaped text in XML output. Hopefully this resolves your issue.

Yes! Thank you Aaron.