[Python] How Python 2.6 and 2.7 differ in the error handling of ElementTree.write()

2015-12-21 ephrain

Today a Python program in our project ran into a strange problem: writing out an XML file with ElementTree raised an exception.

Here is a test program that reproduces it:

import xml.etree.cElementTree as ET
root = ET.fromstring('<?xml version="1.0" encoding="UTF-8" ?><REPORT></REPORT>')
elem = ET.Element("S")
elem.text = "aaa\xb3\\\xa5\\\xbb\\ccc"
root.append(elem)
ET.ElementTree(root).write("test.xml", "UTF-8")

Running the program above on CentOS 7 raises a UnicodeDecodeError:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 820, in write
    serialize(write, self._root, encoding, qnames, namespaces)
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 939, in _serialize_xml
    _serialize_xml(write, e, encoding, qnames, None)
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 937, in _serialize_xml
    write(_escape_cdata(text, encoding))
  File "/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1073, in _escape_cdata
    return text.encode(encoding, "xmlcharrefreplace")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xb3 in position 3: ordinal not in range(128)

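Oddly, the very same code runs without any problem on CentOS 5.4.

After some digging, it turned out to be a Python version issue. On CentOS 5.4 we use Python 2.6, while on CentOS 7 we use Python 2.7, and ElementTree handles strings differently in these two versions.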

Python 2.6

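ElementTree.write() -> _write() -> _escape_cdata()

When _escape_cdata() hits a UnicodeError while encoding the text, it calls _encode_entity(). In that function, any character that cannot be encoded is written to the XML as "&#%d;" (a numeric character reference), so no exception is raised: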

def _encode_entity(text, pattern=_escape):
    # map reserved and non-ascii characters to numerical entities
    def escape_entities(m, map=_escape_map):
        out = []
        append = out.append
        for char in m.group():
            text = map.get(char)
            if text is None:
                text = "&#%d;" % ord(char)
            append(text)
        return string.join(out, "")
    try:
        return _encode(pattern.sub(escape_entities, text), "ascii")
    except TypeError:
        _raise_serialization_error(text)
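
To make the effect concrete, here is a minimal Python 3 sketch that imitates what this fallback does to the byte string from the test program. It is not the library code: encode_entities_like_26 and ESCAPE_MAP are hypothetical names, and the sketch assumes each non-ASCII byte is treated as its own character, which is how Python 2 iterates over a byte str.

```python
# Hypothetical imitation of the Python 2.6 fallback, written for Python 3.
ESCAPE_MAP = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;"}


def encode_entities_like_26(raw):
    """Escape reserved characters and turn every non-ASCII byte into &#N;."""
    out = []
    for byte in raw:  # iterating over bytes yields integers in Python 3
        char = chr(byte)
        if char in ESCAPE_MAP:
            out.append(ESCAPE_MAP[char])
        elif byte >= 0x80:
            out.append("&#%d;" % byte)  # one reference per byte
        else:
            out.append(char)
    return "".join(out)


escaped = encode_entities_like_26(b"aaa\xb3\\\xa5\\\xbb\\ccc")
print(escaped)  # aaa&#179;\&#165;\&#187;\ccc
```

Note that nothing is raised, but each non-ASCII byte becomes a separate reference. If the bytes were really Big5 text, the file that gets written no longer contains the original characters.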

Python 2.7

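ElementTree.write() -> _serialize_xml() -> _escape_cdata()

This function calls encode() to convert the text string to the requested encoding, but it assumes that text is always a Unicode string. When text is just an ordinary DBCS (e.g. BIG5) byte string, Python 2 first decodes it implicitly as ASCII, and that step fails with a UnicodeDecodeError, which the except clause does not catch: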

def _escape_cdata(text, encoding):
    # escape character data
    try:
        # it's worth avoiding do-nothing calls for strings that are
        # shorter than 500 character, or so.  assume that's, by far,
        # the most common case in most applications.
        if "&" in text:
            text = text.replace("&", "&amp;")
        if "<" in text:
            text = text.replace("<", "&lt;")
        if ">" in text:
            text = text.replace(">", "&gt;")
        return text.encode(encoding, "xmlcharrefreplace")
    except (TypeError, AttributeError):
        _raise_serialization_error(text)
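
The same failure can be reproduced outside ElementTree. The sketch below is written for Python 3, where bytes has no encode() method, so it spells out the ASCII decode that Python 2 performs behind the scenes when encode() is called on a byte str. The variable names are illustrative only.

```python
raw = b"aaa\xb3\\\xa5\\\xbb\\ccc"  # the byte string from the test program

try:
    # Python 2 effectively does this decode before it can encode a byte str.
    raw.decode("ascii").encode("UTF-8", "xmlcharrefreplace")
except UnicodeDecodeError as exc:
    error = exc

print(error)
# The except clause in _escape_cdata() only handles these two types,
# so a UnicodeDecodeError passes straight through to the caller.
print(isinstance(error, (TypeError, AttributeError)))  # False
```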

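Python 2.6's approach seems a bit more forgiving, since an exception is less likely to occur, but some people may well want an exception in this situation, and they would prefer what Python 2.7 does.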
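
Whichever behavior you prefer, one way to avoid depending on it is to give ElementTree a Unicode string in the first place. This is a minimal sketch, assuming the bytes are Big5-encoded (an assumption, since only the raw bytes are known), and writing to an in-memory buffer instead of test.xml:

```python
import io
import xml.etree.ElementTree as ET

raw = b"aaa\xb3\\\xa5\\\xbb\\ccc"

root = ET.fromstring("<REPORT></REPORT>")
elem = ET.Element("S")
elem.text = raw.decode("big5")  # assumption: the bytes are Big5 text
root.append(elem)

# Serialize to bytes in memory; a file name would work the same way.
buf = io.BytesIO()
ET.ElementTree(root).write(buf, encoding="UTF-8")
xml_bytes = buf.getvalue()
print(xml_bytes.decode("utf-8"))
```

With the text decoded up front, the serializer only ever sees Unicode, so the version-specific fallback paths are never reached.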

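This was my first time comparing the Python 2.6 and 2.7 source code, and it was quite fun. Even a small bump in the version number can bring a fair number of differences in behavior.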


EPH 的程式日記 (EPH's Programming Diary)

Recording the bits and pieces of a programmer's life
