Thanks for that pointer Kent, I'll check it out. Also thanks for letting me know I'm not nuts! :-)
Alan's suggestion about BeautifulSoup is actually excellent. The documentation is nice and the tool is very easy to use. However, is it normal that parsing a 2618-line XML file takes 20-30 seconds or so?

Thanks
Bernard

On 9/14/05, Kent Johnson <[EMAIL PROTECTED]> wrote:
> Bernard Lebel wrote:
> > Thanks Alan,
> >
> > I'll check BeautifulSoup asap.
> >
> > I'm using regex simply because I have no clue where to start to parse
> > XML. I have read about the various XML tools available in the Python
> > library, however I'm at a complete loss as to what to make of them. Many
> > of them seem to use some programming standards, which I am completely
> > unfamiliar with (this is the first time that I dig into XML writing
> > and parsing).
> >
> > I don't know where to start to learn about all these standards, and as
> > usual with new programming things, the documentation is hard to
> > swallow (it usually is written more as a reference than a proper user
> > guide/tutorial). I have to admit this is very frustrating, so if I'm
> > looking at things from a wrong perspective please advise me, I need
> > it.
>
> I agree that the Python XML story is confusing even for the files in the
> standard library. Worse, the (IMO) best solutions are not to be found in the
> standard lib or PyXML at all.
>
> The std lib and PyXML are based on the DOM and SAX standards. These standards
> were designed to be "language-neutral" - there are implementations in Python,
> Java and other languages. The good side of this is, if you learn how to use
> them, the knowledge is pretty portable to other languages. The bad side is,
> the APIs defined by the standards are IMO clunky and painful to use,
> especially in Python.
>
> There is a current thread on comp.lang.python discussing this with good
> suggestions and pointers to more info:
> http://groups.google.com/group/comp.lang.python/browse_frm/thread/a48891aa645ead13/dcd8fdc20b4b191b?hl=en#dcd8fdc20b4b191b
>
> My personal preference is ElementTree. Beautiful Soup is good too, though I
> have only tried it with HTML. If I were running on Linux I would try lxml,
> which uses the ElementTree API and adds full XPath support. Amara looks like
> the Cadillac solution - big and cushy. I haven't tried it. Uche's articles
> (referenced in the thread above) have pointers to many other choices but
> these seem to be the most popular.
>
> My favorite XML lib is actually dom4j, which is in Java. It works great with
> Jython.
>
> Kent
>
> >
> > So right now I'm just taking a shortcut and using an ultra-simple
> > re-based parser to retrieve the tags I'm looking for. I know it will
> > probably be slow, but hopefully I'll get familiar with sophisticated
> > parsing in the future and improve my code. As it stands right now,
> > even the re syntax is not super easy to learn.
>
> For what you are doing re seems fine to me. You can get in trouble using re's
> with XML because of nested tags, variations in spelling and order, probably a
> bunch of other things. But for simple stuff it can work fine.
>
> Kent
>
> >
> >
> > Kent: That works (of course!). Thanks a bunch once again!
> >
> >
> > Thanks
> > Bernard
> >
> > On 9/14/05, Alan G <[EMAIL PROTECTED]> wrote:
> >
> >>Hi Bernard,
> >>
> >>
> >>>Hello, yet another regular expression question :-)
> >>>
> >>>So I have this xml file that I'm trying to find a
> >>>specific tag in.
> >>
> >>I'm always suspicious when I see regular expression
> >>and xml/html in the same context. regex are not good
> >>for parsing xml/html files and it's usually much easier
> >>to use a proper parser - such as beautiful soup.
> >>
> >>http://www.crummy.com/software/BeautifulSoup/
> >>
> >>Is there any special reason why you are using a regex
> >>sledgehammer to crack this particular nut? Or is it
> >>just to gain experience using regex?
> >>
> >>Alan G.
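
For anyone following the thread, here is a minimal sketch of the ElementTree approach Kent mentions, as an alternative to a re-based tag search. It assumes the ElementTree API as shipped in the standard library under xml.etree.ElementTree (at the time of this thread it was a separate download). The sample document, its tag and attribute names, and the find_parameters helper are all hypothetical, since the actual file is not shown in the thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the real file's tag names are not shown in the thread.
SAMPLE = """\
<scene>
  <object name="cube">
    <parameter name="size">2.0</parameter>
  </object>
  <object name="sphere">
    <parameter name="radius">1.5</parameter>
  </object>
</scene>
"""


def find_parameters(xml_text, param_name):
    """Return (object name, parameter text) pairs for every matching parameter."""
    root = ET.fromstring(xml_text)
    matches = []
    # iter() visits every <object> element in the tree, however deeply nested.
    for obj in root.iter("object"):
        # findall() with a bare tag name looks at direct children only.
        for param in obj.findall("parameter"):
            if param.get("name") == param_name:
                matches.append((obj.get("name"), param.text))
    return matches


if __name__ == "__main__":
    print(find_parameters(SAMPLE, "radius"))
```

For a file on disk, ET.parse(path).getroot() gives the same kind of root element as ET.fromstring() does for a string, so the rest of the helper would stay unchanged.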
_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor