Bernard Lebel wrote:
> Thanks for that pointer Kent, I'll check it out. Also thanks for
> letting me know I'm not nuts! :-)
>
> Alan's suggestion about BeautifulSoup is actually excellent. The
> documentation is nice and the tool is very easy to use.
>
> However, is it normal that parsing a 2618-line xml file takes
> 20-30 seconds or so?

That seems slow to me unless the lines are really long! How many bytes
is the file? But I don't have much experience with BeautifulSoup.

ElementTree is fast and cElementTree (the C implementation) is really
fast. I have used it to read, process and write a 28 MB XML file; it
took about 10 seconds.

Kent

>
> Thanks
> Bernard
>
>
> On 9/14/05, Kent Johnson <[EMAIL PROTECTED]> wrote:
>
>>Bernard Lebel wrote:
>>
>>>Thanks Alan,
>>>
>>>I'll check BeautifulSoup asap.
>>>
>>>I'm using regex simply because I have no clue where to start to parse
>>>XML. I have read about the various xml tools available in the Python
>>>library, however I'm at a complete loss as to what to make of them. Many
>>>of them seem to use some programming standards, which I am completely
>>>unfamiliar with (this is the first time that I dig into XML writing
>>>and parsing).
>>>
>>>I don't know where to start to learn about all these standards, and as
>>>usual with new programming things, the documentation is hard to
>>>swallow (it usually is written more as a reference than a proper user
>>>guide/tutorial). I have to admit this is very frustrating, so if I'm
>>>looking at things from a wrong perspective please advise me, I need
>>>it.
>>
>>I agree that the Python XML story is confusing even for the files in the
>>standard library. Worse, the (IMO) best solutions are not to be found in the
>>standard lib or PyXML at all.
>>
>>The std lib and PyXML are based on the DOM and SAX standards. These standards
>>were designed to be "language-neutral" - there are implementations in Python,
>>Java and other languages. The good side of this is, if you learn how to use
>>them, the knowledge is pretty portable to other languages. The bad side is,
>>the APIs defined by the standard are IMO clunky and painful to use,
>>especially in Python.
>>
>>There is a current thread on comp.lang.python discussing this with good
>>suggestions and pointers to more info:
>>http://groups.google.com/group/comp.lang.python/browse_frm/thread/a48891aa645ead13/dcd8fdc20b4b191b?hl=en#dcd8fdc20b4b191b
>>
>>My personal preference is ElementTree. Beautiful Soup is good too though I
>>have only tried it with HTML. If I was running on Linux I would try lxml
>>which uses the ElementTree API and adds full XPath support. Amara looks like
>>the Cadillac solution - big and cushy. I haven't tried it. Uche's articles
>>(referenced in the thread above) have pointers to many other choices but
>>these seem to be the most popular.
>>
>>My favorite XML lib is actually dom4j which is in Java. It works great with
>>Jython.
>>
>>Kent
>>
>>
>>>So right now I'm just taking a shortcut and using an ultra-simple
>>>re-based parser to retrieve the tags I'm looking for. I know it will
>>>probably be slow, but hopefully I'll get familiar with sophisticated
>>>parsing in the future and improve my code. As it stands right now,
>>>even the re syntax is not super easy to learn.
>>
>>For what you are doing re seems fine to me. You can get in trouble using re's
>>with XML because of nested tags, variations in spelling and order, probably a
>>bunch of other things. But for simple stuff it can work fine.
>>
>>Kent
>>
>>
>>>
>>>Kent: That works (of course!). Thanks a bunch once again!
>>>
>>>
>>>Thanks
>>>Bernard
>>>
>>>On 9/14/05, Alan G <[EMAIL PROTECTED]> wrote:
>>>
>>>
>>>>Hi Bernard,
>>>>
>>>>
>>>>>Hello, yet another regular expression question :-)
>>>>>
>>>>>So I have this xml file that I'm trying to find a
>>>>>specific tag in.
>>>>
>>>>I'm always suspicious when I see regular expression
>>>>and xml/html in the same context. regex are not good
>>>>for parsing xml/html files and it's usually much easier
>>>>to use a proper parser - such as beautiful soup.
>>>>
>>>>http://www.crummy.com/software/BeautifulSoup/
>>>>
>>>>Is there any special reason why you are using a regex
>>>>sledgehammer to crack this particular nut? Or is it
>>>>just to gain experience using regex?
>>>>
>>>>Alan G.
>>>>
>>>
>>>_______________________________________________
>>>Tutor maillist - Tutor@python.org
>>>http://mail.python.org/mailman/listinfo/tutor
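
For comparison with an re-based search, here is a minimal sketch of the ElementTree approach recommended in the reply above. It assumes the standard-library xml.etree.ElementTree module (ElementTree was a separate download when this thread was written and has shipped with Python since 2.5). The sample document, the tag and attribute names, and the find_names helper are all made up for illustration and are not taken from Bernard's file.

```python
import time
import xml.etree.ElementTree as ET

# Hypothetical sample document; element and attribute names are invented.
SAMPLE = """<scene>
  <object name="cube" type="mesh"/>
  <object name="lamp" type="light"/>
  <group>
    <object name="sphere" type="mesh"/>
  </group>
</scene>"""


def find_names(xml_text, tag, attr, value):
    """Return the 'name' attribute of every <tag> element whose attr equals value."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree in document order, so nested elements
    # are found too, which is where a simple regex tends to get into trouble.
    return [el.get("name") for el in root.iter(tag) if el.get(attr) == value]


# Time the parse-and-search step, in the spirit of the timing question above.
start = time.perf_counter()
meshes = find_names(SAMPLE, "object", "type", "mesh")
elapsed = time.perf_counter() - start

print(meshes)  # ['cube', 'sphere']
print("parsed and searched in %.6f s" % elapsed)
```

For a file on disk, ET.parse(path).getroot() would take the place of ET.fromstring(); the timing lines can stay as they are to check whether the parser or the surrounding code is the slow part.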