ElementTree is Python's XML parsing module, and cElementTree is the C implementation of ElementTree. Both have been part of the standard library since Python 2.5.
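
Because the module has lived in different places over the years, code that has to run on several interpreter versions often imports it through a fallback chain. The sketch below is one common way to do that, not something prescribed by the library. It assumes the standalone `cElementTree` package for Python 2.4 and earlier, `xml.etree.cElementTree` from Python 2.5 on, and plain `xml.etree.ElementTree` on Python 3.9 and later, where the separate C module name was removed and the accelerator is picked up automatically.

```python
# Try the historical module locations in order; which import succeeds
# depends on the interpreter version.
try:
    import xml.etree.cElementTree as ET      # Python 2.5 through 3.8
except ImportError:
    try:
        import cElementTree as ET            # standalone package, Python 2.4 and earlier
    except ImportError:
        import xml.etree.ElementTree as ET   # Python 3.9+, C accelerator used automatically

# Quick sanity check that the imported module parses XML.
probe = ET.fromstring('<a b="1"/>')
print(probe.get('b'))
```

Whichever branch succeeds, the rest of the code can keep using the `ET` alias unchanged.
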
The following benchmark data is taken from the cElementTree website:
Here are some benchmark figures, using a number of popular XML toolkits to parse a 3405k document-style XML file, from disk to memory:
| library | time | space | notes |
|---|---|---|---|
| xml.dom.minidom (Python 2.1) | 6.3 s | 80000k | (1) |
| gnosis.objectify | 2.0 s | 22000k | (5) |
| xml.dom.minidom (Python 2.4) | 1.4 s | 53000k | (1) |
| ElementTree 1.2 | 1.6 s | 14500k | |
| ElementTree 1.2.4/1.3 | 1.1 s | 14500k | |
| cDomlette (C extension) | 0.540 s | 20500k | (1) |
| PyRXPU (C extension) | 0.175 s | 10850k | (2) |
| lxml.etree (C extension) | (4) | (4) | (3) |
| libxml2 (C extension) | 0.098 s | 16000k | (3) |
| readlines (read as utf-8) | 0.093 s | 8850k | |
| cElementTree (C extension) | 0.047 s | 4900k | |
| readlines (read as ascii) | 0.032 s | 5050k | |

| library | time | throughput |
|---|---|---|
| xml.sax (Python 2.1) | 0.330 s | 10300 k/s |
| xml.sax (Python 2.4) | 0.292 s | 11700 k/s |
| xml.parsers.expat | 0.184 s | 18500 k/s |
| cElementTree XMLParser | 0.124 s | 27500 k/s |
| sgmlop | 0.092 s | 37000 k/s |
| cElementTree iterparse | 0.071 s | 48000 k/s |
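
The `iterparse` entry in the table refers to incremental parsing, where elements are handed over one at a time as the parser completes them. The following is a minimal sketch of that style on a modern Python 3, using `xml.etree.ElementTree` in place of the old `cElementTree` name. The sample data, the `entry` tag, and the `count_entries` helper are all made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for a large file on disk.
SAMPLE = (
    "<log>"
    + "".join("<entry id='%d'>message %d</entry>" % (i, i) for i in range(5))
    + "</log>"
).encode("utf-8")


def count_entries(source):
    """Stream through `source` and return (count, last_text) for <entry> elements."""
    count = 0
    last_text = None
    # iterparse yields (event, element) pairs; "end" fires once an element is complete.
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "entry":
            count += 1
            last_text = elem.text
            elem.clear()  # drop the element's content once it has been handled
    return count, last_text


count, last_text = count_entries(io.BytesIO(SAMPLE))
print(count, last_text)
```

Calling `elem.clear()` after each element is handled is what keeps memory use low on large inputs, at the cost of no longer having the full tree available afterwards.
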
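
An ElementTree is a tree made up of element nodes. Text content is exposed through each element's `text` or `tail` attribute, for example `ele.text`. This is considerably simpler and more convenient than the DOM approach, where elements and text are both nodes. An element supports some dictionary and list operations: attributes are accessed dictionary-style, child elements list-style. Use the `find` or `findall` functions to search.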
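
The difference between `text` and `tail` is easiest to see on mixed content. The sketch below uses a made-up fragment and the Python 3 module name `xml.etree.ElementTree`: `text` holds what comes before an element's first child, and `tail` holds what follows an element's closing tag.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment with text interleaved between child elements.
para = ET.fromstring("<p>Hello <b>bold</b> world<i>x</i></p>")

bold = para.find("b")       # first matching child, or None if there is none
italics = para.findall("i")  # list of all matching children

print(repr(para.text))   # text before the first child of <p>
print(repr(bold.text))   # text inside <b>
print(repr(bold.tail))   # text after </b>, up to the next tag

# Walking text and tail in document order recovers the full character data.
full_text = "".join(para.itertext())
print(full_text)
```

Note that `find` returns `None` when nothing matches, so its result should be checked before use.
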

| Operation | Result |
|---|---|
| elem[n] | Returns n'th child element. |
| elem[m:n] | Returns list of child elements from index m up to, but not including, n. |
| len(elem) | Returns number of child elements. |
| list(elem) | Returns list of child elements. |
| elem.append(elem2) | Adds elem2 as a child. |
| elem.insert(index, elem2) | Inserts elem2 at the specified location. |
| del elem[n] | Deletes n'th child element. |
| elem.keys() | Returns list of attribute names. |
| elem.get(name) | Returns value of attribute name. |
| elem.set(name, value) | Sets new value for attribute name. |
| elem.attrib | Retrieves the dictionary containing attributes. |
| del elem.attrib[name] | Deletes attribute name. |

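
The operations in the table can be tried out on a small tree built in memory. This is only an illustrative sketch for Python 3: the tag names and attribute values are invented, and `xml.etree.ElementTree` stands in for `cElementTree`.

```python
import xml.etree.ElementTree as ET

# Build a small hypothetical tree: <root name="demo"><a/><b/><c/></root>
root = ET.Element("root", {"name": "demo"})
for child_tag in ("a", "b", "c"):
    ET.SubElement(root, child_tag)

# List-style access to child elements
first = root[0]                      # the <a> element
middle = root[1:3]                   # [<b>, <c>]; the end index is exclusive
child_count = len(root)              # 3 at this point

root.append(ET.Element("d"))         # children: a, b, c, d
root.insert(0, ET.Element("zero"))   # children: zero, a, b, c, d
del root[1]                          # removes <a>
tags = [child.tag for child in root]

# Dictionary-style access to attributes
root.set("user", "someone")
keys = sorted(root.keys())
name = root.get("name")
del root.attrib["name"]

print(tags, keys, name, root.attrib)
```
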
It really is a good tool and very convenient to use. Here are a few lines of code to try it out:

    # Code for Python 2.4
    import cElementTree as ET
    # Parse the file
    tree = ET.parse('test.xml')
    # Get the root node
    root = tree.getroot()
    # Find the first tagformat element
    tag = root.find('tagformat')
    # Iterate over all opt elements
    for ele in tag.findall('opt'):
        print ele.text
    # Read an attribute
    print root.get('name')
    # Modify or create an attribute
    root.set('user', 'liujunzhi')
    # Save with utf-8 encoding (binary mode, since write() emits encoded bytes)
    f = open('output.xml', 'wb')
    tree.write(f, encoding='utf-8')
    f.close()
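
For readers on a current interpreter, the same walkthrough might look roughly like this on Python 3. This is a sketch under a few assumptions: `xml.etree.ElementTree` replaces the old `cElementTree` import, the XML content is a made-up stand-in because the original `test.xml` is not shown, and the output goes to an in-memory buffer rather than `output.xml` so the example is self-contained.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in for test.xml; the real file is not part of the article.
SAMPLE = b"""<config name="demo">
  <tagformat>
    <opt>artist</opt>
    <opt>title</opt>
  </tagformat>
</config>"""

# Parse from a file-like object (a file name works the same way)
tree = ET.parse(io.BytesIO(SAMPLE))
root = tree.getroot()

# find() returns None when nothing matches, so guard before iterating
tag = root.find("tagformat")
options = [ele.text for ele in tag.findall("opt")] if tag is not None else []
print(options)

# Read an attribute, then modify or create one
print(root.get("name"))
root.set("user", "liujunzhi")

# With an explicit encoding, write() produces bytes, so the target must be binary
buffer = io.BytesIO()
tree.write(buffer, encoding="utf-8", xml_declaration=True)
output = buffer.getvalue()
print(output.decode("utf-8"))
```

To write to disk instead, pass a file name such as `'output.xml'` as the first argument to `tree.write`.
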
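
This article covered the basic usage of Python's cElementTree module along with a performance comparison, using an example to show how to parse an XML file and work with element nodes and their attributes.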
