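Searching an XML document
Hello,
I've been reading about ElementTree and ElementPath so that I can use
them to find the right elements in the DOM. Unfortunately, neither of
these seems to offer XPath-like functionality, where I can find elements
based on tag, attribute values, etc.
Is there anything that gives me XPath-like functionality?
Thanks in advance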
Gowri wrote:
lxml does that.
Diez
On Jan 15, 3:49 pm, "Diez B. Roggisch" wrote:
lxml does that.
Diez
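For readers who cannot install lxml, here is a minimal sketch of tag- and attribute-based lookup using the limited XPath subset that the standard library's xml.etree.ElementTree accepts in current Python versions. It is not full XPath, and the sample document and its element names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, not taken from the thread.
doc = """<library>
  <book id="1" lang="en"><title>Alpha</title></book>
  <book id="2" lang="de"><title>Beta</title></book>
  <magazine id="3" lang="en"><title>Gamma</title></magazine>
</library>"""

root = ET.fromstring(doc)

# Find by tag anywhere below the root.
titles = [t.text for t in root.findall(".//title")]

# Find by tag plus attribute value.
english_books = root.findall(".//book[@lang='en']")

# Find any element, whatever its tag, carrying a given attribute value.
english_any = root.findall(".//*[@lang='en']")

print(titles)
print([b.get("id") for b in english_books])
print([e.tag for e in english_any])
```

Anything beyond this subset (axes, functions, boolean operators) most likely still calls for lxml's xpath() method.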
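Hi Diez,
I was trying lxml out and couldn't find any examples that would help me
parse an XML file with namespaces. For example, my XML file looks like
this:
<phedexData xmlns="http://a.b.com/phedex"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://a.b.com/phedex requests.xsd">
<!-- Low priority replication request -->
<request id="1234" last_update="1060199000.0">
<status>
<approved>T1_RAL_MSS</approved>
<approved>T2_London_ICHEP</approved>
<disapproved>T2_Southgrid_Bristol</disapproved>
<pending/>
<move_pending/>
</status>
<subscription open="1" priority="0" type="replicate">
<items>
<dataset>/PrimaryDS1/ProcessedDS1/Tier</dataset>
<block>/PrimaryDS2/ProcessedDS2/Tier/block</block>
</items>
</subscription>
</request>
</phedexData>
If my XPath query is //request, it obviously won't work. Is there some
form of namespace registration etc. needed to issue the query? Sample
code would help a lot.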
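One way to handle this kind of "namespace registration", sketched here with the standard library's xml.etree.ElementTree rather than lxml, is to pass a prefix-to-URI mapping along with the query. The document below is an abbreviated version of the one discussed in this thread, and the prefix "p" is an arbitrary choice made for this example.

```python
import xml.etree.ElementTree as ET

xml_text = """<phedexData xmlns="http://a.b.com/phedex"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://a.b.com/phedex requests.xsd">
  <request id="1234" last_update="1060199000.0">
    <status>
      <approved>T1_RAL_MSS</approved>
      <approved>T2_London_ICHEP</approved>
      <disapproved>T2_Southgrid_Bristol</disapproved>
    </status>
  </request>
</phedexData>"""

root = ET.fromstring(xml_text)

# The prefix is local to the query; only the URI has to match the
# document's default namespace.
ns = {"p": "http://a.b.com/phedex"}

requests = root.findall(".//p:request", ns)
approved = [e.text for e in root.findall(".//p:request/p:status/p:approved", ns)]

# An unqualified query matches nothing, because every element in the
# document belongs to the default namespace.
unqualified = root.findall(".//request")

print(len(requests), approved, unqualified)
```

With lxml, the same mapping can be handed to the xpath() method, for example root.xpath('//p:request', namespaces=ns).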
On Jan 15, 9:33 pm, Gowri wrote:
Create your query like:
ns0 = '{http://a.b.com/phedex}'
query = '%srequest/%sstatus' % (ns0, ns0)
Also, although imperfect, some people have found this useful:
http://gflanagan.net/site/python/uti...tfilter.py.txt
test = '''<phedexData xmlns="http://a.b.com/phedex"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://a.b.com/phedex requests.xsd">
<!-- Low priority replication request -->
<request id="1234" last_update="1060199000.0">
<status>
<approved>T1_RAL_MSS</approved>
<approved>T2_London_ICHEP</approved>
<disapproved>T2_Southgrid_Bristol</disapproved>
<pending/>
<move_pending/>
</status>
<subscription open="1" priority="0" type="replicate">
<items>
<dataset>/PrimaryDS1/ProcessedDS1/Tier</dataset>
<block>/PrimaryDS2/ProcessedDS2/Tier/block</block>
</items>
</subscription>
</request>
</phedexData>
'''

from xml.etree import ElementTree as ET

root = ET.fromstring(test)

# Clark notation: the namespace URI in braces is prefixed to every tag.
ns0 = '{http://a.b.com/phedex}'

# Third-party helper module, available from:
#http://gflanagan.net/site/python/utils/elementfilter/elementfilter.py.txt
from rattlebag.elementfilter import findall, data

query0 = '%(ns)srequest/%(ns)sstatus' % {'ns': ns0}
# The '==' comparisons below are elementfilter syntax, not XPath.
query1 = '%(ns)srequest/%(ns)ssubscription[@type=="replicate"]/%(ns)sitems' % {'ns': ns0}
query2 = '%(ns)srequest[@id==1234]/%(ns)sstatus/%(ns)sapproved' % {'ns': ns0}

print 'With ElementPath: '
print root.findall(query0)

print 'With ElementFilter:'
for query in [query0, query1, query2]:
    print '+'*50
    print 'query: ', query
    for item in findall(root, query):
        print 'item: ', item
        print 'xml:'
        ET.dump(item)
print '-'*50
print 'approved: ', data(root, query2)
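[output]
With ElementPath:
[<Element {http://a.b.com/phedex}status at ...>]
With ElementFilter:
++++++++++++++++++++++++++++++++++++++++++++++++++
query:  {http://a.b.com/phedex}request/{http://a.b.com/phedex}status
item:  <Element {http://a.b.com/phedex}status at ...>
xml:
<ns0:status xmlns:ns0="http://a.b.com/phedex">
<ns0:approved>T1_RAL_MSS</ns0:approved>
<ns0:approved>T2_London_ICHEP</ns0:approved>
<ns0:disapproved>T2_Southgrid_Bristol</ns0:disapproved>
<ns0:pending />
<ns0:move_pending />
</ns0:status>
++++++++++++++++++++++++++++++++++++++++++++++++++
query:  {http://a.b.com/phedex}request/{http://a.b.com/phedex}subscription[@type=="replicate"]/{http://a.b.com/phedex}items
item:  <Element {http://a.b.com/phedex}items at ...>
xml:
<ns0:items xmlns:ns0="http://a.b.com/phedex">
<ns0:dataset>/PrimaryDS1/ProcessedDS1/Tier</ns0:dataset>
<ns0:block>/PrimaryDS2/ProcessedDS2/Tier/block</ns0:block>
</ns0:items>
++++++++++++++++++++++++++++++++++++++++++++++++++
query:  {http://a.b.com/phedex}request[@id==1234]/{http://a.b.com/phedex}status/{http://a.b.com/phedex}approved
item:  <Element {http://a.b.com/phedex}approved at ...>
xml:
<ns0:approved xmlns:ns0="http://a.b.com/phedex">T1_RAL_MSS</ns0:approved>
item:  <Element {http://a.b.com/phedex}approved at ...>
xml:
<ns0:approved xmlns:ns0="http://a.b.com/phedex">T2_London_ICHEP</ns0:approved>
--------------------------------------------------
approved:  ['T1_RAL_MSS', 'T2_London_ICHEP']
End of message log.
[/output]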
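As a possible alternative to the elementfilter helper, the three queries can probably be expressed with the standard library alone in current Python versions, since xml.etree.ElementTree now accepts simple attribute predicates. This is only a sketch under that assumption: the predicates use XPath-style [@attr='value'] with a single = and quoted values instead of elementfilter's == syntax, and the variable names status, items and approved are introduced here for illustration.

```python
import xml.etree.ElementTree as ET

test = '''<phedexData xmlns="http://a.b.com/phedex"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://a.b.com/phedex requests.xsd">
  <!-- Low priority replication request -->
  <request id="1234" last_update="1060199000.0">
    <status>
      <approved>T1_RAL_MSS</approved>
      <approved>T2_London_ICHEP</approved>
      <disapproved>T2_Southgrid_Bristol</disapproved>
      <pending/>
      <move_pending/>
    </status>
    <subscription open="1" priority="0" type="replicate">
      <items>
        <dataset>/PrimaryDS1/ProcessedDS1/Tier</dataset>
        <block>/PrimaryDS2/ProcessedDS2/Tier/block</block>
      </items>
    </subscription>
  </request>
</phedexData>
'''

root = ET.fromstring(test)
ns0 = '{http://a.b.com/phedex}'

# Same Clark-notation paths as in the post above, with ElementTree-style
# attribute predicates. Attribute values are always compared as strings.
query0 = '%(ns)srequest/%(ns)sstatus' % {'ns': ns0}
query1 = "%(ns)srequest/%(ns)ssubscription[@type='replicate']/%(ns)sitems" % {'ns': ns0}
query2 = "%(ns)srequest[@id='1234']/%(ns)sstatus/%(ns)sapproved" % {'ns': ns0}

status = root.findall(query0)
items = root.findall(query1)
approved = [e.text for e in root.findall(query2)]

print(status)
print([child.tag for child in items[0]])
print(approved)
```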
Hi Gerard,
I don't know what to say :) Thank you so much for taking the time to post
all of this. Really appreciate it :)
grflanagan wrote:
Create your query like:
ns0 = '{http://a.b.com/phedex}'
query = '%srequest/%sstatus' % (ns0, ns0)
lxml supports the same thing, BTW, and how to work with namespaces is
explained in the tutorial:
http://codespeak.net/lxml/dev/tutorial.html#namespaces
Stefan
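To illustrate the point that the same Clark-notation query works in lxml, here is a small sketch that prefers lxml and falls back to the standard library if lxml is not installed. The find() and findall() calls used exist in both. The qname() helper and the abbreviated document are assumptions made for this example, not code from the tutorial.

```python
try:
    from lxml import etree  # assumption: lxml is installed
except ImportError:
    # Fallback so the sketch still runs; the calls below exist in both APIs.
    import xml.etree.ElementTree as etree

NS = "http://a.b.com/phedex"


def qname(tag):
    """Return the tag in Clark notation, e.g. '{uri}request' (hypothetical helper)."""
    return "{%s}%s" % (NS, tag)


xml_text = """<phedexData xmlns="http://a.b.com/phedex">
  <request id="1234">
    <status>
      <approved>T1_RAL_MSS</approved>
      <approved>T2_London_ICHEP</approved>
    </status>
  </request>
</phedexData>"""

root = etree.fromstring(xml_text)

# Equivalent to '%srequest/%sstatus' % (ns0, ns0) from the post above.
query = "%s/%s" % (qname("request"), qname("status"))
status = root.find(query)
approved = [e.text for e in status.findall(qname("approved"))]

print(status.tag)
print(approved)
```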
Tags: python