Hi Kent,

Well, even before reading your last email I gave ElementTree a go, just parsing the XML file and trying out some basic functions. It ran in less than two seconds. I don't know why BeautifulSoup is taking so long...
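For anyone following along, here is a rough sketch of that kind of quick test. It is not the code actually run on the real file: it assumes the xml.etree.ElementTree module that ships with current Python (rather than the separate elementtree package used in Kent's example below), and the sample XML, the names_of_type helper and the tiny namespaced document are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the real file's layout is only partly shown in the thread.
SAMPLE = """\
<scene>
  <sceneobject name="Camera_Root" type="CameraRoot">
    <parameter name="roty" type="Parameter" sourceclassname="nosource">0</parameter>
  </sceneobject>
  <sceneobject name="Light" type="Light"/>
</scene>
"""


def names_of_type(root, wanted_type):
    """Return the name attribute of every sceneobject whose type matches."""
    # './/sceneobject' searches all descendants of root, at any depth.
    return [obj.get("name")
            for obj in root.findall(".//sceneobject")
            if obj.get("type") == wanted_type]


root = ET.fromstring(SAMPLE)
camera_names = names_of_type(root, "CameraRoot")
print(camera_names)

# Namespaced documents need the namespace URI in braces before the tag name.
# This small manifest is invented; only the URI comes from Kent's example.
NS_URI = "http://www.imsproject.org/xsd/imscp_rootv1p1p2"
NS_SAMPLE = (
    '<manifest xmlns="%s"><resources>'
    '<resource identifier="r1"/></resources></manifest>' % NS_URI
)
ns_root = ET.fromstring(NS_SAMPLE)
resources = ns_root.findall(".//{%s}resource" % NS_URI)
print(len(resources))
```

To read from disk instead of a string, ET.parse('myfile.xml').getroot() gives the same kind of root element.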
Thanks for the "to get you started"!

Bernard

On 9/14/05, Kent Johnson <[EMAIL PROTECTED]> wrote:
> Bernard Lebel wrote:
> > The file size is 112 Kb. Most lines look this way:
> >
> > <parameter name="roty" type="Parameter" sourceclassname="nosource">
> >
> >
> > I'll give a try to ElementTree.
>
> To get you started:
>
> from elementtree import ElementTree
> doc = ElementTree.parse('myfile.xml')
> for sceneobject in doc.findall('//sceneobject'):
>     if sceneobject.get('type') == 'CameraRoot':
>         # this is a sceneobject that you want
>         print sceneobject.get('name')
>
> One gotcha - if your XML uses namespaces, you have to prefix the namespace to
> the tag name in findall(). It will look something like
> d.findall('//{http://www.imsproject.org/xsd/imscp_rootv1p1p2}resource')
>
> Let us know how long that takes...
>
> Kent
>
> >
> > Bernard
> >
> >
> > On 9/14/05, Kent Johnson <[EMAIL PROTECTED]> wrote:
> >
> >>Bernard Lebel wrote:
> >>
> >>>Thanks for that pointer Kent, I'll check it out. Also thanks for
> >>>letting me know I'm not nuts! :-)
> >>>
> >>>Alan's suggestion about BeautifulSoup is actually excellent. The
> >>>documentation is nice and the tool is very easy to use.
> >>>
> >>>However is it normal that to parse a 2618 lines xml file it takes
> >>>20-30 seconds or so?
> >>
> >>That seems slow to me unless the lines are really long! How many bytes is
> >>the file? But I don't have much experience with BeautifulSoup.
> >>
> >>ElementTree is fast and cElementTree (the C implementation) is really fast.
> >>I have used it to read, process and write a 28 MB XML file, it took about
> >>10 seconds.
> >>
> >>Kent
> >>
> >>>
> >>>Thanks
> >>>Bernard
> >>>
> >>>On 9/14/05, Kent Johnson <[EMAIL PROTECTED]> wrote:
> >>>
> >>>>Bernard Lebel wrote:
> >>>>
> >>>>>Thanks Alan,
> >>>>>
> >>>>>I'll check BeautifulSoup asap.
> >>>>>
> >>>>>I'm using regex simply because I have no clue where to start to parse
> >>>>>XML.
> >>>>>I have read the various xml tools available in the Python
> >>>>>library, however I'm at a complete loss as to what to make of them. Many
> >>>>>of them seem to use some programming standards, which I am completely
> >>>>>unfamiliar with (this is the first time that I dig into XML writing
> >>>>>and parsing).
> >>>>>
> >>>>>I don't know where to start to learn about all these standards, and as
> >>>>>usual with new programming things, the documentation is hard to
> >>>>>swallow (it usually is written more as a reference than a proper user
> >>>>>guide/tutorial). I have to admit this is very frustrating, so if I'm
> >>>>>looking at things from a wrong perspective please advise me, I need
> >>>>>it.
> >>>>
> >>>>I agree that the Python XML story is confusing even for the files in the
> >>>>standard library. Worse, the (IMO) best solutions are not to be found in
> >>>>the standard lib or PyXML at all.
> >>>>
> >>>>The std lib and PyXML are based on the DOM and SAX standards. These
> >>>>standards were designed to be "language-neutral" - there are
> >>>>implementations in Python, Java and other languages. The good side of
> >>>>this is, if you learn how to use them, the knowledge is pretty portable
> >>>>to other languages. The bad side is, the APIs defined by the standard are
> >>>>IMO clunky and painful to use, especially in Python.
> >>>>
> >>>>There is a current thread on comp.lang.python discussing this with good
> >>>>suggestions and pointers to more info:
> >>>>http://groups.google.com/group/comp.lang.python/browse_frm/thread/a48891aa645ead13/dcd8fdc20b4b191b?hl=en#dcd8fdc20b4b191b
> >>>>
> >>>>My personal preference is ElementTree. Beautiful Soup is good too though
> >>>>I have only tried it with HTML. If I was running on Linux I would try
> >>>>lxml which uses the ElementTree API and adds full XPath support. Amara
> >>>>looks like the Cadillac solution - big and cushy. I haven't tried it.
> >>>>Uche's articles (referenced in the thread above) have pointers to many
> >>>>other choices but these seem to be the most popular.
> >>>>
> >>>>My favorite XML lib is actually dom4j which is in Java. It works great
> >>>>with Jython.
> >>>>
> >>>>Kent
> >>>>
> >>>>>So right now I'm just taking a shortcut and using ultra-simple
> >>>>>re-based parser to retrieve the tags I'm looking for. I know it will
> >>>>>probably be slow, but hopefully I'll get familiar with sophisticated
> >>>>>parsing in the future and improve my code. As it stands right now,
> >>>>>even the re syntax is not super easy to learn.
> >>>>
> >>>>For what you are doing re seems fine to me. You can get in trouble using
> >>>>re's with XML because of nested tags, variations in spelling and order,
> >>>>probably a bunch of other things. But for simple stuff it can work fine.
> >>>>
> >>>>Kent
> >>>>
> >>>>>Kent: That works (of course!). Thanks a bunch once again!
> >>>>>
> >>>>>
> >>>>>Thanks
> >>>>>Bernard
> >>>>>
> >>>>>On 9/14/05, Alan G <[EMAIL PROTECTED]> wrote:
> >>>>>
> >>>>>>Hi Bernard,
> >>>>>>
> >>>>>>>Hello, yet another regular expression question :-)
> >>>>>>>
> >>>>>>>So I have this xml file that I'm trying to find a
> >>>>>>>specific tag in.
> >>>>>>
> >>>>>>I'm always suspicious when I see regular expression
> >>>>>>and xml/html in the same context. regex are not good
> >>>>>>for parsing xml/html files and it's usually much easier
> >>>>>>to use a proper parser - such as beautiful soup.
> >>>>>>
> >>>>>>http://www.crummy.com/software/BeautifulSoup/
> >>>>>>
> >>>>>>Is there any special reason why you are using a regex
> >>>>>>sledgehammer to crack this particular nut? Or is it
> >>>>>>just to gain experience using regex?
> >>>>>>
> >>>>>>Alan G.
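
As an illustration of the point made in the thread about regular expressions tripping over "variations in spelling and order", here is a small sketch comparing a naive pattern with a parser. The two sample lines and the pattern are invented for the example (only the first parameter line resembles the one quoted above), and xml.etree.ElementTree from the current standard library stands in for the elementtree package.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical data: the second element carries the same kind of information,
# but with its attributes in a different order and one value in single quotes.
XML = """\
<params>
  <parameter name="roty" type="Parameter" sourceclassname="nosource"/>
  <parameter type='Parameter' sourceclassname="nosource" name="rotx"/>
</params>
"""

# A naive pattern that assumes name comes first and double quotes are used.
NAIVE = re.compile(r'<parameter name="([^"]+)" type="Parameter"')

# The regex only sees the element written in the exact layout it expects.
regex_names = NAIVE.findall(XML)

# The parser reads attributes by name, so order and quoting do not matter.
parsed_names = [p.get("name")
                for p in ET.fromstring(XML).iter("parameter")
                if p.get("type") == "Parameter"]

print(regex_names)
print(parsed_names)
```

The regex finds only roty, while the parser returns both roty and rotx. A more elaborate pattern could cope with this particular variation, which is consistent with Kent's remark that re can work fine for simple, regular input.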
_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor