Thanks, Greg, for the good idea! I have been trying to keep a global array that records the current element. If that element is already recorded when the characters handler is called, I append the text to the section body. This assumes that the same element does not appear on two consecutive lines in the XML file.
Cheers,
Jason

Date: Thu, 1 Jul 2010 18:09:01 -0500
Subject: Re: A problem while using XML::Parser::PerlSAX
From: [email protected]
To: [email protected]

Jason,

I have not worked with PerlSAX, however I played around with SAX in Java and have an idea you can try out. I'm not sure if you can do this in PerlSAX, but in Java I simply kept a flag variable to let me know if the opening tag was encountered. After the opening tag is encountered, simply append the section body until you encounter a closing tag. In the code handling characters, simply check if you are still within the specific tag by checking the flag variable.

Good luck,
Greg

On Thu, Jul 1, 2010 at 5:51 PM, Jason Feng <[email protected]> wrote:

Thanks Jenda,

But it is a bit frustrating that I can't predict when the multiple calls will happen. For the same repetitive element, most of the time there is one call. But suddenly multiple calls take place.

Cheers,
Jason

> From: [email protected]
> To: [email protected]
> Date: Thu, 1 Jul 2010 08:09:20 +0200
> Subject: Re: A problem while using XML::Parser::PerlSAX
>
> From: Jason Feng <[email protected]>
> > I am using XML::Parser::PerlSAX
> > to parse a 300M XML file. I meet a strange issue with handler characters.
> > This handler is supposed to return
> > all the contents between start markup and end markup. But sometimes it just
> > returns one part of the whole contents. On the second call, perhaps it
> > returns
> > the rest part of the contents.
>
> That is to be expected.
>
> From the docs of XML::Parser:
>
> Char (Expat, String)
>
> This event is generated when non-markup is recognized. The
> non-markup sequence of characters is in String. A single
> non-markup sequence of characters may generate multiple calls
> to this handler. Whatever the encoding of the string in
> the original document, this is given to the handler in UTF-8.
>
> Write your code so that it handles this. Or use a module that does
> this for you.
>
> Jenda
> ===== [email protected] === http://Jenda.Krynicky.cz =====
> When it comes to wine, women and song, wizards are allowed
> to get drunk and croon as much as they like.
> -- Terry Pratchett in Sourcery
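
For reference, the flag-and-append approach suggested in this thread could look roughly like the sketch below. It is only an illustration, not code from anyone in the thread: the package name SectionCollector, the element name 'section', and the hash keys are all made up. It assumes the usual PerlSAX handler convention, where start_element and end_element receive a hash with a Name key and characters receives a hash with a Data key. The parser itself is simulated with direct method calls so the sketch runs without any XML module installed.

```perl
use strict;
use warnings;

# Hypothetical handler: collects the complete text of each target element,
# no matter how many characters() calls the parser splits it into.
package SectionCollector;

sub new {
    my ($class, %args) = @_;
    my $self = {
        target   => $args{target},  # element name to collect (assumed)
        inside   => 0,              # flag: are we between start and end tag?
        buffer   => '',             # text accumulated for the current element
        sections => [],             # finished element bodies
    };
    return bless $self, $class;
}

sub start_element {
    my ($self, $el) = @_;
    if ($el->{Name} eq $self->{target}) {
        $self->{inside} = 1;
        $self->{buffer} = '';       # start a fresh body for this element
    }
}

sub characters {
    my ($self, $chars) = @_;
    # Append only; the body is not complete until end_element fires.
    $self->{buffer} .= $chars->{Data} if $self->{inside};
}

sub end_element {
    my ($self, $el) = @_;
    if ($el->{Name} eq $self->{target}) {
        push @{ $self->{sections} }, $self->{buffer};
        $self->{inside} = 0;
    }
}

package main;

my $handler = SectionCollector->new(target => 'section');

# Simulate the parser delivering one text run in two characters() calls.
$handler->start_element({ Name => 'section' });
$handler->characters({ Data => 'first half, ' });
$handler->characters({ Data => 'second half' });
$handler->end_element({ Name => 'section' });

print "$_\n" for @{ $handler->{sections} };
```

Because the buffer is reset in start_element and stored in end_element, this does not depend on how the file is laid out in lines, so two identical elements in a row are kept apart. It does not handle a target element nested inside another element of the same name; that would need a depth counter instead of a simple flag. With the real parser, an object like this would presumably be passed as the Handler option when constructing XML::Parser::PerlSAX.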
