Thanks, Greg, for the good idea!

I have been keeping a global array that records the current element. If that
element is the one I want when the characters handler is called, I append the
text to the section body. This assumes that the same element never appears on
two consecutive lines of the XML file.
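
Below is a minimal sketch of a variant that drops that assumption. It uses Python's standard xml.sax module rather than PerlSAX, because Python's characters() callback can also deliver one text node in several pieces. Each start tag pushes a fresh buffer onto a stack, so two same-named elements in a row, or nested ones, never share text. The class name StackHandler and the element name "item" are hypothetical. The same pattern should map onto the PerlSAX start-element, characters and end-element handlers.

```python
import xml.sax


class StackHandler(xml.sax.ContentHandler):
    """Hypothetical handler: one text buffer per open element, kept on a stack.

    Pieces from characters() go to the innermost open element's buffer and
    are joined when that element closes, so split deliveries are harmless
    and same-named neighbouring elements never share text.
    """

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted    # element name to report (illustrative)
        self.stack = []         # [(name, [pieces]), ...] for open elements
        self.results = []       # complete text of each wanted element

    def startElement(self, name, attrs):
        self.stack.append((name, []))

    def characters(self, content):
        # May be called several times for one text node; just accumulate.
        if self.stack:
            self.stack[-1][1].append(content)

    def endElement(self, name):
        open_name, pieces = self.stack.pop()
        if open_name == self.wanted:
            self.results.append("".join(pieces))


handler = StackHandler("item")
xml.sax.parseString(b"<list><item>a &lt; b</item><item>c</item></list>", handler)
print(handler.results)  # ['a < b', 'c']
```

This keeps only each element's own text. Text inside child elements goes into the child's buffer instead.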

Cheers,
Jason

Date: Thu, 1 Jul 2010 18:09:01 -0500
Subject: Re: A problem while using XML::Parser::PerlSAX
From: [email protected]
To: [email protected]

Jason,

I have not worked with PerlSAX; however, I have played around with SAX in Java
and have an idea you can try.

I'm not sure whether you can do this in PerlSAX, but in Java I simply kept a
flag variable that told me whether the opening tag had been encountered. Once
the opening tag is encountered, keep appending to the section body until you
reach the closing tag. In the code that handles characters, simply check the
flag variable to see whether you are still within that specific tag.
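
The flag approach described above can be sketched as follows. This illustration uses Python's standard xml.sax module, not PerlSAX or Java, and assumes a hypothetical target tag named "section". The entity reference and newline in the sample input are the kind of content at which Expat-based parsers may split a text node across several characters() calls. The buffer is reset at the opening tag and joined only at the closing tag, so the split does not matter.

```python
import xml.sax


class SectionHandler(xml.sax.ContentHandler):
    """Collects the full text of every <section> element (tag name is illustrative)."""

    def __init__(self, target="section"):
        super().__init__()
        self.target = target
        self.inside = False   # the flag: are we between <section> and </section>?
        self.buffer = []      # pieces delivered by characters()
        self.bodies = []      # one complete string per target element

    def startElement(self, name, attrs):
        if name == self.target:
            self.inside = True
            self.buffer = []

    def characters(self, content):
        # Only append while the flag is set; pieces are joined later.
        if self.inside:
            self.buffer.append(content)

    def endElement(self, name):
        if name == self.target:
            self.inside = False
            self.bodies.append("".join(self.buffer))


handler = SectionHandler()
xml.sax.parseString(
    b"<doc><section>fish &amp; chips\nand peas</section>"
    b"<other>skip</other><section>second</section></doc>",
    handler,
)
print(handler.bodies)  # ['fish & chips\nand peas', 'second']
```

Because the flag stays set until the closing tag, the text of any child elements nested inside a section is collected as well.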


Good luck,
Greg

On Thu, Jul 1, 2010 at 5:51 PM, Jason Feng <[email protected]> wrote:

Thanks Jenda,

But it is a bit frustrating that I can't predict when the multiple calls will
happen. For the same repeated element, there is usually just one call, but
then suddenly multiple calls take place.

Cheers,
Jason

> From: [email protected]
> To: [email protected]
> Date: Thu, 1 Jul 2010 08:09:20 +0200
> Subject: Re: A problem while using XML::Parser::PerlSAX
>
> From: Jason Feng <[email protected]>
> > I am using XML::Parser::PerlSAX to parse a 300 MB XML file, and I have run
> > into a strange issue with the characters handler. This handler is supposed
> > to return all the contents between the start tag and the end tag, but
> > sometimes it returns only part of the contents. On a second call, it may
> > return the rest.
>
> That is to be expected.
>
> From the docs of XML::Parser:
>
>   Char (Expat, String)
>
>   This event is generated when non-markup is recognized. The
>   non-markup sequence of characters is in String. A single
>   non-markup sequence of characters may generate multiple calls
>   to this handler. Whatever the encoding of the string in
>   the original document, this is given to the handler in UTF-8.
>
> Write your code so that it handles this. Or use a module that does
> this for you.
>
> Jenda
> ===== [email protected] === http://Jenda.Krynicky.cz =====
> When it comes to wine, women and song, wizards are allowed
> to get drunk and croon as much as they like.
>       -- Terry Pratchett in Sourcery