lxml, comparing nodes
I'd like to know if there is any built-in mechanism in lxml that lets you check the equality of two nodes from separate documents. I'd like it to ignore attribute order and so on. It would be even better if there were a built-in method for checking the equality of whole documents (ignoring document order). Please let me know if you know of such a method or an existing script. I don't like reinventing the wheel :)
--
http://mail.python.org/mailman/listinfo/python-list
Re: lxml, comparing nodes
On Jul 23, 6:29 pm, Stefan Behnel <[EMAIL PROTECTED]> wrote:
> Your requirements for a single Element are simple enough to write it in three
> to five lines of Python code (depending on your definition of equality).
> Checking this equality recursively is another two to three lines. Not complex
> enough to be considered a wheel in the first place.
Forgive my ignorance, as I am new to both Python and lxml ;)
Re: lxml, comparing nodes
> off the top of my head (untested):
>
> >>> def equal(a, b):
> ...     if a.tag != b.tag or a.attrib != b.attrib:
> ...         return False
> ...     if a.text != b.text or a.tail != b.tail:
> ...         return False
> ...     if len(a) != len(b):
> ...         return False
> ...     if any(not equal(a, b) for a, b in zip(a, b)):
> ...         return False
> ...     return True
>
> this should work for arbitrary ET implementations (lxml, xml.etree, ET,
> etc). tweak as necessary.
>
Thanks for the help. That's inspiring, though not exactly what I need, because ignoring document order is a requirement (ignoring changes in the order of different siblings of the same type, etc.). I plan to try something like this:

def xmlCmp(xmlStr1, xmlStr2):
    et1 = etree.XML(xmlStr1)
    et2 = etree.XML(xmlStr2)
    queue = []
    tmpq = deque([et1])
    tmpq2 = deque([et2])
    while tmpq:
        el = tmpq.popleft()
        tmpq.extend(el)
        queue.append(el.tag)
    while queue:
        el = queue.pop()
        foundEl = findMatchingElem(el, et2)
        if foundEl:
            et1.remove(el)
            tmpq2.remove(foundEl)
        else:
            return False
    if len(tmpq2) == 0:
        return True
    else:
        return False

def findMatchingElem(el, eTree):
    for elem in eTree:
        if elemCmp(el, elem):
            return elem
    return None

def elemCmp(el1, el2):
    pass  # yet to be implemented ;)
Re: lxml, comparing nodes
> If document order doesn't matter, try sorting the elements of each level in
> the two documents by some arbitrary deterministic key, such as (tag name,
> text, attr count, whatever), and then compare them in order, instead of trying
> to find matches in multiple passes. itertools.groupby() might be your friend
> here.
I think that sorting multiple times by each attribute would cost more
than what I've managed to come up with:
from lxml import etree
from collections import deque
import string, re, time

def xmlEqual(xmlStr1, xmlStr2):
    et1 = etree.XML(xmlStr1)
    et2 = etree.XML(xmlStr2)
    let1 = [x for x in et1.iter()]
    let2 = [x for x in et2.iter()]
    if len(let1) != len(let2):
        return False
    while let1:
        el = let1.pop(0)
        foundEl = findMatchingElem(el, let2)
        if foundEl is None:
            return False
        let2.remove(foundEl)
    return True

def findMatchingElem(el, eList):
    for elem in eList:
        if elemsEqual(el, elem):
            return elem
    return None

def elemsEqual(el1, el2):
    if el1.tag != el2.tag or el1.attrib != el2.attrib:
        return False
    # no requirement for text checking for now
    #if el1.text != el2.text or el1.tail != el2.tail:
        #return False
    path1 = el1.getroottree().getpath(el1)
    path2 = el2.getroottree().getpath(el2)
    idxRE = re.compile(r"(\[\d*\])")
    path1 = idxRE.sub("", path1)
    path2 = idxRE.sub("", path2)
    if path1 != path2:
        return False
    return True
Notice that if the documents are in the exact same order, each element is
compared only once!
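
For comparison, the sorting idea from the quoted suggestion could look roughly like the sketch below. It is not the code from this thread: the names canonical_key and xml_equal_unordered are made up for illustration, it uses the standard library's xml.etree.ElementTree instead of lxml so it runs without extra packages, and it ignores text and tail just like the code above. Note that it is stricter than the path-based check, because it compares whole subtrees and so also keeps parent/child relationships.

```python
import xml.etree.ElementTree as ET


def canonical_key(elem):
    """Build an order-independent, comparable key for an element subtree.

    Hypothetical helper: attributes and children are sorted, so neither
    attribute order nor sibling order affects the result.
    """
    child_keys = sorted(canonical_key(child) for child in elem)
    attrs = tuple(sorted(elem.attrib.items()))
    return (elem.tag, attrs, tuple(child_keys))


def xml_equal_unordered(xml_str1, xml_str2):
    """Compare two XML strings, ignoring sibling and attribute order.

    Text and tail are deliberately ignored (an assumption carried over
    from the thread's code, where the text check is commented out).
    """
    root1 = ET.fromstring(xml_str1)
    root2 = ET.fromstring(xml_str2)
    return canonical_key(root1) == canonical_key(root2)


if __name__ == "__main__":
    doc1 = '<r><a x="1" y="2"/><b/></r>'
    doc2 = '<r><b/><a y="2" x="1"/></r>'
    print(xml_equal_unordered(doc1, doc2))
```

Each level is sorted once, so no repeated search for a matching element is needed.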
Re: lxml, comparing nodes
> Not in your code.
>
> Stefan
Not sure what you mean, but I tested it, and so far, for every pair of documents with the same order of elements, the number of comparisons was equal to the number of nodes.
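
The comparison count of a "search, then remove the match" loop can be checked with a small stand-alone sketch. It is a simplified model, not the code from this thread: count_comparisons is a hypothetical helper, and plain list items stand in for elements, with == standing in for elemsEqual.

```python
def count_comparisons(items1, items2):
    """Match every item of items1 against items2 by linear search.

    Returns the number of equality checks performed, or None when
    some item has no match. Matched items are removed, as in the
    thread's approach.
    """
    remaining = list(items2)
    comparisons = 0
    for item in items1:
        for index, candidate in enumerate(remaining):
            comparisons += 1
            if item == candidate:
                del remaining[index]
                break
        else:
            # Inner loop finished without finding a match.
            return None
    return comparisons


if __name__ == "__main__":
    nodes = list(range(100))
    print(count_comparisons(nodes, nodes))        # same order
    print(count_comparisons(nodes, nodes[::-1]))  # reversed order
```

With identical order, the first candidate always matches, so the count equals the number of items. With reversed order and distinct items, the count grows quadratically. The model counts only equality checks and ignores the cost of removing items from the list.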
SWIG and char* newb questions :)
Hi, I'm relatively new to Python and my C/C++ knowledge is close to None. Having said that, I feel justified in asking stupid questions :) OK, now more seriously. I have a question about char* function parameters that are used to return values. I have read the SWIG manual to find the best way to handle that, but there are many warnings about memory leaks and such, so I am confused. To put it more simply: how do I safely define a variable in Python and have it modified by a C/C++ function? Even better would be a way to return a tuple of the return value and the out parameters, but that's probably a lot more work. Any hint will be appreciated!
Re: SWIG and char* newb questions :)
Ok I think I got it:
PyObject* myFuncXXX(char* p_1, int p_2, char* p_3, int p_4)
{
    int res;
    char _host[255] = "";
    int _port;
    res = funcXXX(p_1, p_2, p_3, p_4, _host, &_port);
    PyObject* res1 = PyInt_FromLong(res);
    PyObject* res2 = PyString_FromStringAndSize(_host, strlen(_host));
    PyObject* res3 = PyInt_FromLong(_port);
    PyObject* resTuple = PyTuple_New(3);
    PyTuple_SetItem(resTuple, 0, res1);
    PyTuple_SetItem(resTuple, 1, res2);
    PyTuple_SetItem(resTuple, 2, res3);
    return resTuple;
}
It seems to work when I put it into swig's "*.i" file.
me proud of me.self :D
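
For readers who want to see the same out-parameter pattern from the Python side, here is a rough sketch using ctypes rather than SWIG (a different technique from the wrapper above). Everything named here is hypothetical: fake_func_xxx is a stand-in written in Python that fills a caller-supplied buffer and an int pointer the way funcXXX is assumed to, and the host and port values are made up.

```python
import ctypes

HOST_BUF_SIZE = 255  # same size as the _host buffer in the C wrapper


def fake_func_xxx(host_buf, port_ptr):
    """Hypothetical stand-in for funcXXX: fills two out parameters."""
    host = b"example.org"  # made-up value for illustration
    # Copy the bytes plus a terminating NUL into the caller's buffer.
    ctypes.memmove(host_buf, host + b"\0", len(host) + 1)
    # Write through the pointer, like `*port = 8080;` in C.
    port_ptr[0] = 8080
    return 0


def my_func_xxx():
    """Allocate the out parameters in Python and return them as a tuple."""
    host_buf = ctypes.create_string_buffer(HOST_BUF_SIZE)
    port = ctypes.c_int(0)
    res = fake_func_xxx(host_buf, ctypes.pointer(port))
    # .value reads the buffer up to the first NUL byte.
    return res, host_buf.value.decode("ascii"), port.value


if __name__ == "__main__":
    print(my_func_xxx())
```

The buffers are owned by Python objects, so they are released by normal garbage collection and no manual freeing is needed. With a real shared library, the stand-in would be replaced by a function loaded through ctypes with matching argument types.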
