Discussion:
[Generateds-users] Export of complex elements as unicode strings
Dave Kuhlman
2017-01-12 00:22:05 UTC
Andrii,

Your suggestion sounds reasonable. We definitely do not want an
exception. I'll take a look. I need to try to make sure that the
change you suggest would not cause problems with Python 2 or Python 3
or some other data.

Thanks for the report.

By the way, I took a quick look at your Web site
(http://www.ebi.ac.uk/, right?) and it looks like you do incredible
and fascinating work.

Dave
Dear Dave,
We have encountered the following issue: when an export generated for a
outfile.write((quote_xml(self.valueOf_) if type(self.valueOf_) is str else
self.gds_encode(str(self.valueOf_))))
However, if this complex type element is a unicode string, then
str(self.valueOf_) results in UnicodeEncodeError. Please could you tell if
it would be possible to add a condition that, in addition to checking
whether type(self.valueOf_) is str, will check if type(self.valueOf_) is
unicode, and if so, then apply self.gds_encode directly to self.valueOf_?
Or, perhaps, there is a better solution to this?
Many thanks and best regards,
Andrii
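The failure described above can be reproduced in isolation. The sketch below is not taken from the generated module; it only assumes that, on Python 2, str() applied to a unicode object performs an implicit ASCII encoding, and it makes that encoding step explicit so the same behaviour can be observed on Python 3. The helper name fails_ascii_encoding is hypothetical.

```python
# -*- coding: utf-8 -*-
# Sketch only: shows why an implicit ASCII conversion fails on non-ASCII text.
# On Python 2, str(some_unicode_object) performs this encoding implicitly.


def fails_ascii_encoding(value):
    """Return True if value cannot be encoded with the ASCII codec."""
    try:
        value.encode("ascii")
    except UnicodeEncodeError:
        return True
    return False


plain = u"an ascii string"
accented = u"Sel\u00e7uk < > \u0130stanbul"

print(fails_ascii_encoding(plain))     # False
print(fails_ascii_encoding(accented))  # True
```

Any value that makes the helper return True would trigger the reported UnicodeEncodeError if it were passed through str() on Python 2.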
--
Dave Kuhlman
http://www.davekuhlman.org
Dave Kuhlman
2017-01-17 00:17:23 UTC
Andrii,

I believe that I have a fix for this, IFUICWITID (if I understand
it correctly which I think I do). I've tested with both Python 2
and Python 3 and it does what I expect with both.

I used this schema:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="container" type="containerType"/>

    <xs:complexType name="containerType">
        <xs:sequence>
            <xs:element name="field1" type="xs:string" minOccurs="0"/>
            <xs:element name="field2" type="xs:string" minOccurs="0"/>
            <xs:element name="field3" type="simpleStringType" minOccurs="0"/>
            <xs:element name="field4" type="simpleStringType" minOccurs="0"/>
        </xs:sequence>
    </xs:complexType>

    <xs:complexType name="simpleStringType">
        <xs:simpleContent>
            <xs:extension base="xs:string">
            </xs:extension>
        </xs:simpleContent>
    </xs:complexType>

</xs:schema>

And, I used the following XML instance doc:

<?xml version="1.0"?>
<container>
    <field3>an ascii string</field3>
    <field4>Selçuk &lt; &gt; İstanbul</field4>
</container>
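To see what values such an instance document actually carries, it can be parsed with the standard library. This is only a sketch using xml.etree.ElementTree rather than a module generated by generateDS.py, and the variable names are hypothetical. It shows that field4 holds non-ASCII text with the entities already resolved, which is the kind of value that exercises the export problem.

```python
# -*- coding: utf-8 -*-
import xml.etree.ElementTree as ET

# The instance document from the message, with non-ASCII characters
# written as escapes so this source file stays pure ASCII.
INSTANCE_DOC = u"""<?xml version="1.0"?>
<container>
    <field3>an ascii string</field3>
    <field4>Sel\u00e7uk &lt; &gt; \u0130stanbul</field4>
</container>
"""

# Parse from UTF-8 bytes; with no encoding declaration, XML defaults to UTF-8.
root = ET.fromstring(INSTANCE_DOC.encode("utf-8"))

field3 = root.find("field3").text
field4 = root.find("field4").text

# field4 now contains literal '<' and '>' plus two non-ASCII characters,
# so it must be both re-escaped and encoded when it is exported.
print(repr(field3))
print(repr(field4))
```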

Am I testing against the correct problem?

If so, the fix is at Bitbucket:
https://bitbucket.org/dkuhlman/generateds

I'll also attach generateDS.py to a separate message.

Let me know if it fixes things for you.

Dave
--
Dave Kuhlman
http://www.davekuhlman.org
Dave Kuhlman
2017-01-17 21:33:39 UTC
Andrii,

Thanks for catching this.

That fix seems good to me. I've added it. It's basically your fix,
I believe, with a little bit of extra caution added to protect
against converting a string that is not unicode. Perhaps that
cannot occur, but I worry. Oh, and I used isinstance to check the
type, which I should have done to begin with.
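A rough illustration of an isinstance-based check of this kind follows. It is a sketch under assumptions, not the code in generateDS.py: escape_xml and to_xml_text are hypothetical names, and escape_xml is only a simplified stand-in for quote_xml.

```python
# -*- coding: utf-8 -*-
import sys

PY2 = sys.version_info[0] == 2
# On Python 2 the text type is `unicode`; on Python 3 it is `str`.
text_type = unicode if PY2 else str  # noqa: F821


def escape_xml(value):
    """Hypothetical stand-in for quote_xml: escape &, < and >."""
    return (value.replace("&", "&amp;")
                 .replace("<", "&lt;")
                 .replace(">", "&gt;"))


def to_xml_text(value):
    """Escape str/unicode values directly; stringify anything else first."""
    if isinstance(value, (str, text_type)):
        # Already a string type: never pass it through str(), which on
        # Python 2 would attempt an implicit ASCII encoding of unicode.
        return escape_xml(value)
    # Non-string values (numbers and so on) are converted, then escaped.
    return escape_xml(str(value))


print(to_xml_text(u"a < b & c"))
print(to_xml_text(42))
```

Using isinstance rather than comparing type() directly also accepts subclasses of the string types.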

Attached is a patch. And, the change is also at the Bitbucket repo:
https://bitbucket.org/dkuhlman/generateds

Sigh. I finally figured out why my tests did not exhibit this bug.
I was exporting with the standard, unmodified module generated by
generateDS.py. That code uses sys.stdout to write its output, and
apparently the file sys.stdout does not throw an exception when you
write non-ascii, unicode characters to it, whereas a file opened
with open('xxx', 'w') does cause an exception. I wonder how many
more weird things there are for me to learn about unicode?
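One way to avoid the difference is to give the output file an explicit encoding. On Python 2, a file returned by the built-in open() has no encoding attached, so writing a unicode object falls back to the ASCII codec, whereas sys.stdout connected to a terminal typically carries the terminal's encoding. The sketch below assumes io.open with encoding="utf-8" is acceptable for the output file; the file name is hypothetical.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

text = u"Sel\u00e7uk &lt; &gt; \u0130stanbul"

# Hypothetical output location inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "out.xml")

# io.open behaves the same on Python 2 and Python 3: the file object
# encodes unicode text to UTF-8 itself, so no implicit ASCII step occurs.
with io.open(path, "w", encoding="utf-8") as outfile:
    outfile.write(text)

# Read the file back to confirm the text survived the round trip.
with io.open(path, "r", encoding="utf-8") as infile:
    round_tripped = infile.read()

print(round_tripped == text)
```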

Thanks again for your help with this.

Dave
Dear Dave,
Many thanks for implementing and providing the fix. It has solved the
problem with the export part. However, now there is a problem with
outfile.write(self.convert_unicode(self.valueOf_))
Python seems to be refusing to write unicode into a file. It may be solved
by encoding to utf-8 in
...
result = quote_xml(instring).encode('utf8')
Please tell if this looks like a reasonable solution.
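The effect of the proposed .encode('utf8') call can be sketched separately. The result of encoding is a byte string, so this example assumes the output file is opened in binary mode ('wb'), which the message does not state. On Python 3, writing bytes to a file opened in text mode raises TypeError instead. The file name is hypothetical.

```python
# -*- coding: utf-8 -*-
import os
import tempfile

text = u"Sel\u00e7uk &lt; &gt; \u0130stanbul"

# Encoding turns the text into UTF-8 bytes; no ASCII conversion is involved.
encoded = text.encode("utf8")

# Hypothetical output location inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "out.xml")

# Binary mode accepts the encoded bytes on both Python 2 and Python 3.
with open(path, "wb") as outfile:
    outfile.write(encoded)

# Read the bytes back and decode them to check nothing was lost.
with open(path, "rb") as infile:
    decoded = infile.read().decode("utf8")

print(decoded == text)
```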
Also, thank you for praising the EBI site! Even though we do only a part of
the work (there are many teams here responsible for different services), it
is very nice to hear.
Best regards,
Andrii
generateDS.py is attached.
Dave
--
Dave Kuhlman
http://www.davekuhlman.org
Andrii Iudin
2017-01-18 10:20:03 UTC
Dear Dave,

Thank you very much for adding another fix!
I agree, unicode can get quite tricky. We had to do quite a few
adjustments in our services to handle it.

Best regards,
Andrii