I have to digest an XML file that arrived with a UTF-8-BOM encoding. When I am outside of Rhino - using standard Python at a command prompt - I can open the file using this command:
FileHandle = open(Filename,"r", encoding='utf-8')
When I try the same command in Rhino it responds with the following error:
Message: open() got an unexpected keyword argument 'encoding'
In the left side panel of the Python editor, if I click on <Python> and then open, it shows that the encoding parameter is not allowed. However, at the very bottom of the editor there is a helpful link, “more help on open can be found at:”, which opens a web page showing the encoding parameter.
What is the story with the missing parameter? Is there a workaround for this?
This does load, and it creates a System.Xml.XmlDocument object. ElementTree and some other modules also open it without problems.
Originally, I wanted to use the xmltodict module which requires opening the file first. Instead, I just wrote a function to do the XML to dictionary conversion myself. It uses ElementTree, so I skirted the issue.
Still, it seems to be an issue that the open function doesn’t support the standard parameters. It could make life difficult for somebody someday.
Hi Henry,
It’s definitely possible. You just need to use the old-school Python 2 method of opening the file in bytes mode and decoding the bytes manually. Note that this reads the whole file into one variable, which will fail if the file is bigger than available memory. If you know something about the encoding, you can work out which particular byte or ASCII character is invariant under that encoding (e.g. newlines or null bytes) and split the file on it:
with open(Filename,'rb') as f:
    bytes_ = f.read()
str_ = bytes_.decode(encoding = 'utf8')
If the Xml library needs it as a file object or stream you can do:
import io
io.StringIO(str_)
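Putting the two steps together, here is a minimal sketch of a helper that returns a file-like object. The function name open_text is made up for illustration. It also swaps in the 'utf-8-sig' codec, which is not part of the snippet above: in CPython that codec strips a leading BOM while decoding, and I am assuming it behaves the same way in Rhino's IronPython, so treat that part as untested there.

```python
import io

def open_text(path, encoding="utf-8-sig"):
    """Hypothetical helper: read a file as bytes, decode it manually,
    and return an in-memory text stream.

    'utf-8-sig' is an assumption here; it drops a leading UTF-8 BOM
    if one is present and otherwise decodes like plain UTF-8.
    """
    # Binary mode avoids the unsupported encoding keyword on open().
    with open(path, "rb") as f:
        raw = f.read()
    # Decode the whole file at once (fine for files that fit in memory).
    text = raw.decode(encoding)
    # Wrap the decoded text so libraries expecting a file object can use it.
    return io.StringIO(text)
```

Something like xmltodict.parse(open_text(Filename).read()) should then receive text without the BOM, though I have not tried that inside Rhino.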
In CPython 2.7 you should be aware that str is bytes (i.e. ascii only) and that unicode is a separate type (plus there are unicode literals and even bytes literals for future-proofing). This is unlike Python 3, where str is not bytes (and str supports Unicode code points independent of encoding). Luckily, in IronPython (and hence Rhino) str is a .NET System.String (which is actually UTF-16 encoded internally, but that’s an implementation detail that needn’t concern you). Most other operations can then be done in IronPython much like in Python 3.
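In the same spirit, CPython 2.7 ships io.open, which is the Python 3 style open and does accept an encoding argument. The sketch below assumes IronPython's io module matches CPython's here, which I have not verified in Rhino; read_text is a made-up name, and 'utf-8-sig' is again my assumption for dropping the BOM.

```python
import io

def read_text(path, encoding="utf-8-sig"):
    """Hypothetical helper: read a whole text file via io.open.

    io.open takes an encoding keyword in CPython 2.7 and 3.x; whether
    Rhino's IronPython supports it the same way is an assumption.
    """
    # Text mode with an explicit encoding returns decoded text, not bytes.
    with io.open(path, "r", encoding=encoding) as f:
        return f.read()
```

If io.open turns out not to work in your Rhino version, the bytes-mode approach above remains the fallback.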
This is a great solution. I hope that anyone looking on the forum for encoding problems finds your post and sample code. That would have solved my problem and saved some time.