Hi Kent,

Well, even before reading your last email, I gave it a go, just parsing
the XML file and trying out some basic functions. It ran in less than
two seconds. I don't know why BeautifulSoup is taking so long...

Thanks for the "to get you started"!


Bernard



On 9/14/05, Kent Johnson <[EMAIL PROTECTED]> wrote:
> Bernard Lebel wrote:
> > The file size is 112 KB. Most lines look this way:
> >
> > <parameter name="roty" type="Parameter" sourceclassname="nosource">
> >
> >
> > I'll give ElementTree a try.
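A minimal sketch, written for a current Python 3, of reading the attributes of a `<parameter>` element like the one quoted above with the standard-library `xml.etree.ElementTree` (where the `elementtree` package used in this thread later ended up). The `<scene>` root, the second `<parameter>` element and the helper name `parameter_attributes` are hypothetical; only the first `<parameter>` line comes from the message.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: only the "roty" <parameter> line comes from
# the message above; the <scene> root and the "posx" element are made up.
SAMPLE = """
<scene>
  <parameter name="roty" type="Parameter" sourceclassname="nosource"/>
  <parameter name="posx" type="Parameter" sourceclassname="nosource"/>
</scene>
"""

def parameter_attributes(xml_text):
    """Return a list of attribute dicts, one per <parameter> element."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree, so the nesting depth does not matter.
    return [dict(param.attrib) for param in root.iter("parameter")]

for attrs in parameter_attributes(SAMPLE):
    print(attrs["name"], attrs["type"], attrs["sourceclassname"])
```

Each element's `attrib` is already a plain dictionary, so no string slicing is needed to pull out `name`, `type` or `sourceclassname`.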
> 
> To get you started:
> 
> from elementtree import ElementTree  # xml.etree.ElementTree in Python 2.5+
> doc = ElementTree.parse('myfile.xml')
> # './/' matches sceneobject elements at any depth below the root
> for sceneobject in doc.findall('.//sceneobject'):
>     if sceneobject.get('type') == 'CameraRoot':
>         # this is a sceneobject that you want
>         print(sceneobject.get('name'))
> 
> One gotcha: if your XML uses namespaces, you have to prefix the tag name in
> findall() with the namespace URI in curly braces. It will look something like
>   doc.findall('.//{http://www.imsproject.org/xsd/imscp_rootv1p1p2}resource')
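To illustrate the namespace gotcha, here is a small sketch for a current Python 3. The namespace URI is the one from the example above; the `<manifest>` document around it is made up. It shows that the plain tag name finds nothing, while the `{uri}tag` form works. It also shows the `namespaces=` prefix mapping that newer ElementTree versions (1.3 and Python 3) accept. The prefix `cp` is an arbitrary choice.

```python
import xml.etree.ElementTree as ET

# Namespace URI from the example above; the document itself is hypothetical.
NS_URI = "http://www.imsproject.org/xsd/imscp_rootv1p1p2"
SAMPLE = """
<manifest xmlns="%s">
  <resources>
    <resource identifier="r1"/>
    <resource identifier="r2"/>
  </resources>
</manifest>
""" % NS_URI

root = ET.fromstring(SAMPLE)

# The plain tag name does not match namespaced elements: this is the gotcha.
unqualified = root.findall(".//resource")

# Option 1: spell the namespace out in {curly braces} before the tag name.
found = root.findall(".//{%s}resource" % NS_URI)

# Option 2: map an arbitrary prefix ("cp") to the URI and use prefix:tag.
found_prefixed = root.findall(".//cp:resource", namespaces={"cp": NS_URI})

print(len(unqualified), [r.get("identifier") for r in found])
```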
> 
> Let us know how long that takes...
> 
> Kent
> 
> >
> >
> > Bernard
> >
> >
> >
> > On 9/14/05, Kent Johnson <[EMAIL PROTECTED]> wrote:
> >
> >>Bernard Lebel wrote:
> >>
> >>>Thanks for that pointer Kent, I'll check it out. Also thanks for
> >>>letting me know I'm not nuts! :-)
> >>>
> >>>Alan's suggestion about BeautifulSoup is actually excellent. The
> >>>documentation is nice and the tool is very easy to use.
> >>>
> >>>However, is it normal for parsing a 2618-line XML file to take
> >>>20-30 seconds or so?
> >>
> >>That seems slow to me unless the lines are really long! How many bytes is 
> >>the file? But I don't have much experience with BeautifulSoup.
> >>
> >>ElementTree is fast and cElementTree (the C implementation) is really fast. 
> >>I have used it to read, process and write a 28 MB XML file; it took about 
> >>10 seconds.
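A hedged sketch for measuring parse time yourself on a current Python 3. It prefers the explicit C implementation, `xml.etree.cElementTree`, where it still exists. That module was deprecated in Python 3.3 and removed in 3.9. Since 3.3, plain `xml.etree.ElementTree` picks up the C accelerator on its own. The test document is synthetic: 2600 made-up `<parameter>` lines, roughly the size Bernard mentions. The helper name `time_parse` is hypothetical, and actual timings depend on the machine.

```python
import time

try:
    # Explicit C implementation (Python 2.5 through 3.8; removed in 3.9).
    import xml.etree.cElementTree as ET
except ImportError:
    # Python 3.3+ uses the C accelerator here automatically.
    import xml.etree.ElementTree as ET

def time_parse(xml_text, repeat=3):
    """Return the best wall-clock time in seconds to parse xml_text."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        ET.fromstring(xml_text)
        best = min(best, time.perf_counter() - start)
    return best

# Synthetic test document of about 2600 lines (made-up content).
lines = ['<parameter name="p%d" type="Parameter" sourceclassname="nosource"/>' % i
         for i in range(2600)]
doc = "<scene>\n" + "\n".join(lines) + "\n</scene>"
print("parsed %d bytes in %.4f s" % (len(doc), time_parse(doc)))
```

Taking the best of a few runs reduces noise from whatever else the machine is doing at the time.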
> >>
> >>Kent
> >>
> >>
> >>>
> >>>Thanks
> >>>Bernard
> >>>
> >>>
> >>>
> >>>On 9/14/05, Kent Johnson <[EMAIL PROTECTED]> wrote:
> >>>
> >>>
> >>>>Bernard Lebel wrote:
> >>>>
> >>>>
> >>>>>Thanks Alan,
> >>>>>
> >>>>>I'll check BeautifulSoup asap.
> >>>>>
> >>>>>I'm using regex simply because I have no clue where to start to parse
> >>>>>XML. I have looked at the various XML tools available in the Python
> >>>>>library; however, I'm at a complete loss as to what to make of them. Many
> >>>>>of them seem to use some programming standards, which I am completely
> >>>>>unfamiliar with (this is the first time I have dug into XML writing
> >>>>>and parsing).
> >>>>>
> >>>>>I don't know where to start to learn about all these standards, and as
> >>>>>usual with new programming things, the documentation is hard to
> >>>>>swallow (it is usually written more as a reference than as a proper user
> >>>>>guide/tutorial). I have to admit this is very frustrating, so if I'm
> >>>>>looking at things from the wrong perspective, please advise me; I need
> >>>>>it.
> >>>>
> >>>>I agree that the Python XML story is confusing, even for the modules in the 
> >>>>standard library. Worse, the (IMO) best solutions are not to be found in 
> >>>>the standard lib or PyXML at all.
> >>>>
> >>>>The std lib and PyXML are based on the DOM and SAX standards. These 
> >>>>standards were designed to be "language-neutral" - there are 
> >>>>implementations in Python, Java and other languages. The good side of 
> >>>>this is that if you learn how to use them, the knowledge is pretty portable 
> >>>>to other languages. The bad side is, the APIs defined by the standard are 
> >>>>IMO clunky and painful to use, especially in Python.
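To make the contrast concrete, here is a sketch for a current Python 3 that does the same job three ways: find the `name` of every `sceneobject` whose `type` is `CameraRoot`, using the standard-library DOM (`xml.dom.minidom`), SAX (`xml.sax`) and ElementTree. The sample document and the helper and handler names are hypothetical. Which style reads best is a matter of taste, but the DOM and SAX versions show the generic node and callback machinery the standards impose.

```python
import xml.dom.minidom
import xml.sax
import xml.etree.ElementTree as ET

# Hypothetical sample; element and attribute names follow the thread.
SAMPLE = b"""<scene>
  <sceneobject name="cam1" type="CameraRoot"/>
  <sceneobject name="light1" type="Light"/>
</scene>"""

def names_dom(data):
    """DOM: build the whole tree, then walk generic Node objects."""
    doc = xml.dom.minidom.parseString(data)
    return [node.getAttribute("name")
            for node in doc.getElementsByTagName("sceneobject")
            if node.getAttribute("type") == "CameraRoot"]

class CameraHandler(xml.sax.ContentHandler):
    """SAX: the parser calls us back per event; we keep our own state."""
    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        if name == "sceneobject" and attrs.get("type") == "CameraRoot":
            self.names.append(attrs.get("name"))

def names_sax(data):
    handler = CameraHandler()
    xml.sax.parseString(data, handler)
    return handler.names

def names_etree(data):
    """ElementTree: a tree of Element objects with dict-like attributes."""
    root = ET.fromstring(data)
    return [e.get("name") for e in root.iter("sceneobject")
            if e.get("type") == "CameraRoot"]

print(names_dom(SAMPLE), names_sax(SAMPLE), names_etree(SAMPLE))
```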
> >>>>
> >>>>There is a current thread on comp.lang.python discussing this with good 
> >>>>suggestions and pointers to more info:
> >>>>http://groups.google.com/group/comp.lang.python/browse_frm/thread/a48891aa645ead13/dcd8fdc20b4b191b?hl=en#dcd8fdc20b4b191b
> >>>>
> >>>>My personal preference is ElementTree. Beautiful Soup is good too though 
> >>>>I have only tried it with HTML. If I were running on Linux, I would try 
> >>>>lxml which uses the ElementTree API and adds full XPath support. Amara 
> >>>>looks like the Cadillac solution - big and cushy. I haven't tried it. 
> >>>>Uche's articles (referenced in the thread above) have pointers to many 
> >>>>other choices but these seem to be the most popular.
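For comparison with lxml's full XPath, plain ElementTree in current Python 3 versions already understands a limited XPath subset. That subset includes descendant searches and simple attribute predicates, but not XPath functions such as `count()`. A small sketch with a made-up document:

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one camera is nested one level deeper than the other.
SAMPLE = """<scene>
  <model name="world">
    <sceneobject name="cam1" type="CameraRoot"/>
  </model>
  <sceneobject name="cam2" type="CameraRoot"/>
  <sceneobject name="light1" type="Light"/>
</scene>"""

root = ET.fromstring(SAMPLE)

# Descendant search plus an attribute predicate, both in ElementTree's subset.
cameras = root.findall(".//sceneobject[@type='CameraRoot']")
print([c.get("name") for c in cameras])  # ['cam1', 'cam2']
```

The predicate moves the `type == 'CameraRoot'` test from the earlier loop into the path expression itself.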
> >>>>
> >>>>My favorite XML lib is actually dom4j which is in Java. It works great 
> >>>>with Jython.
> >>>>
> >>>>Kent
> >>>>
> >>>>
> >>>>
> >>>>>So right now I'm just taking a shortcut and using an ultra-simple
> >>>>>re-based parser to retrieve the tags I'm looking for. I know it will
> >>>>>probably be slow, but hopefully I'll get familiar with sophisticated
> >>>>>parsing in the future and improve my code. As it stands right now,
> >>>>>even the re syntax is not super easy to learn.
> >>>>
> >>>>For what you are doing, re seems fine to me. You can get into trouble using 
> >>>>regexes with XML because of nested tags, variations in spelling and attribute 
> >>>>order, and probably a bunch of other things. But for simple stuff it can work fine.
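A small sketch for a current Python 3 showing one of the pitfalls mentioned above. The naive pattern is a hypothetical stand-in for an ultra-simple re-based parser, not Bernard's actual code. It assumes `name=` comes before `type=` and that double quotes are used. Reordering the attributes and switching to single quotes still gives valid, equivalent XML, but the regex then misses the element, while ElementTree does not.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical naive pattern: assumes attribute order and double quotes.
NAIVE = re.compile(r'<sceneobject name="([^"]*)" type="CameraRoot"')

def names_regex(text):
    return NAIVE.findall(text)

def names_etree(text):
    root = ET.fromstring(text)
    return [e.get("name") for e in root.iter("sceneobject")
            if e.get("type") == "CameraRoot"]

simple = '<scene><sceneobject name="cam1" type="CameraRoot"/></scene>'
# Same data with reordered, single-quoted attributes: still valid XML.
reordered = "<scene><sceneobject type='CameraRoot' name='cam1'/></scene>"

print(names_regex(simple), names_etree(simple))        # both find cam1
print(names_regex(reordered), names_etree(reordered))  # regex misses it
```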
> >>>>
> >>>>Kent
> >>>>
> >>>>
> >>>>
> >>>>>Kent: That works (of course!). Thanks a bunch once again!
> >>>>>
> >>>>>
> >>>>>Thanks
> >>>>>Bernard
> >>>>>
> >>>>>On 9/14/05, Alan G <[EMAIL PROTECTED]> wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>>>Hi Bernard,
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>>Hello, yet another regular expression question :-)
> >>>>>>>
> >>>>>>>So I have this xml file that I'm trying to find a
> >>>>>>>specific tag in.
> >>>>>>
> >>>>>>I'm always suspicious when I see regular expressions
> >>>>>>and XML/HTML in the same context. Regexes are not good
> >>>>>>for parsing XML/HTML files, and it's usually much easier
> >>>>>>to use a proper parser, such as Beautiful Soup.
> >>>>>>
> >>>>>>http://www.crummy.com/software/BeautifulSoup/
> >>>>>>
> >>>>>>Is there any special reason why you are using a regex
> >>>>>>sledgehammer to crack this particular nut? Or is it
> >>>>>>just to gain experience using regex?
> >>>>>>
> >>>>>>Alan G.
> >>>>>>
> >>>>>
> >>>>>_______________________________________________
> >>>>>Tutor maillist  -  Tutor@python.org
> >>>>>http://mail.python.org/mailman/listinfo/tutor
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>
> >>>
> >>>
> >>
> >>
> >
> >
> >
> 
>