There is a bug in the XML output of the ranking systems. It should escape these characters like this:
" to &quot;
' to &apos;
< to &lt;
> to &gt;
& to &amp;
If these characters appear in the name or formula of a factor, it breaks.
I can change them by hand before parsing and it works fine, but I need to do it programmatically, or have it fixed during the XML creation…
I have tried every possible way I can find online.
The most popular seems to be using the escape() function of the SAX library.
I get errors trying to do it. I think the fix needs to occur before I attempt to parse the file, since the etree parser is choking on it. Every solution I could find loads the XML from a string, whereas I am loading it from a file. The best solution is to fix it at the source so the XML is well-formed at creation.
import lxml.etree as ET
from xml.sax.saxutils import escape
tree = ET.parse("P123_XML.xml")
root = tree.getroot()
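For reference, here is a minimal sketch of what escape() from xml.sax.saxutils does. By default it only handles &, < and >; the quote characters have to be passed in through its optional entities argument. The helper name escape_value is made up for illustration. Note that escape() is meant for individual values (a factor name or formula), not for a whole document: run over the full file, it would escape the tag brackets as well.

```python
from xml.sax.saxutils import escape

# Extra replacements for the two quote characters, which escape() leaves alone by default.
QUOTES = {'"': "&quot;", "'": "&apos;"}


def escape_value(value: str) -> str:
    """Escape one text or attribute value (hypothetical helper, not part of any library)."""
    return escape(value, QUOTES)


print(escape_value('A & B < "C"'))  # A &amp; B &lt; &quot;C&quot;
```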
Yes, it’s probably best to make sure the file holds a well-formed XML document so the parser can handle it properly. Note that if you were to create the XML programmatically using etree and then serialize it, the API would automatically escape the special syntax characters so that the result is well-formed:
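from lxml import etree

# The attribute value and text contain raw "&" and "<"; etree escapes them on output.
root = etree.Element("StockFormula", Name="A&B to C")
formula = etree.Element("Formula")
root.text = "A + B < C"
root.append(formula)
etree.tostring(root)
b'<StockFormula Name="A&amp;B to C">A + B &lt; C<Formula/></StockFormula>'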
Right. But I am not creating the XML. I am copying it from the ranking system “text” area.
If this is something that P123 is willing to fix on that end, then I’ll stop searching for a solution.
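In the meantime, one possible workaround is to read the file as plain text, patch it, and only then hand it to the parser. The sketch below is a rough illustration, not a general fix. It assumes attribute values are double-quoted and contain no raw double quote, and it only repairs bare ampersands plus angle brackets inside attribute values. The function name repair_xml_text and the sample Formula attribute are made up; it uses the standard-library ElementTree rather than lxml.

```python
import re
import xml.etree.ElementTree as ET

# An "&" that does not already start a predefined or numeric entity.
BARE_AMP = re.compile(r"&(?!(?:amp|lt|gt|quot|apos|#\d+|#x[0-9A-Fa-f]+);)")

# A double-quoted attribute value: ="...".
# Assumption: the value itself never contains a raw double quote.
ATTR_VALUE = re.compile(r'="([^"]*)"')


def repair_xml_text(raw: str) -> str:
    """Escape bare ampersands everywhere and angle brackets inside attribute values."""
    fixed = BARE_AMP.sub("&amp;", raw)

    def fix_attr(match: re.Match) -> str:
        value = match.group(1).replace("<", "&lt;").replace(">", "&gt;")
        return '="' + value + '"'

    return ATTR_VALUE.sub(fix_attr, fixed)


# Hypothetical broken input, standing in for the text read from the file.
broken = '<StockFormula Name="A&B to C" Formula="Close(0) < SMA(50) & Vol > 100"/>'

root = ET.fromstring(repair_xml_text(broken))
print(root.get("Name"))     # A&B to C
print(root.get("Formula"))  # Close(0) < SMA(50) & Vol > 100
```

For a file on disk, the same idea would be to read its contents into a string, pass that through repair_xml_text, and parse the result with fromstring instead of calling parse on the path. A raw < or > in element text (outside attributes) is not handled here, since it cannot be told apart from real markup with a simple pattern.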