Sorry, this would be a job for one of the PDFBox developers. So far I have only been doing some support on the list and don't have much know-how about this part. I can only have a look in the evening, and maybe I will find a solution. ;)
Daniel

2008/12/29 Duseja, Sushil <[email protected]>

> If possible, can you please let us know your contact number to discuss
> this issue?
>
> Thanks!
>
> *From:* Daniel Manzke [mailto:[email protected]]
> *Sent:* Monday, December 29, 2008 5:12 PM
> *To:* Duseja, Sushil; [email protected]
> *Cc:* Rally, Menka
> *Subject:* Re: Garbage Output
>
> Hi,
>
> I've just added this line:
>
> //after stripper.extractRegions();
> stripper.getText(document);
>
> After doing this I got some text for the regions. But it seems that this
> text is related to page 1. Have you found an example of how to use the
> Stripper? Maybe someone else could help you, due to the fact that I don't
> have any knowledge about the Stripper.
>
> If I have some time in the evening I will give it another test.
>
> Bye,
> Daniel
>
> 2008/12/29 Duseja, Sushil <[email protected]>
>
> Hello Daniel,
>
> I tried using the compiled version sent across by you with no luck.
>
> I tried running a Java program (for text extraction) with PDFBox 0.7.3 and
> 0.8 versions in the classpath separately. With 0.8, I am not able to
> fetch anything. However, with 0.7.3, I could extract all values apart from
> "Year of Form", whose value is garbage - À¾´» , which is why you recommended
> using 0.8.
>
> Note - Java program and my PDF are attached for your kind reference. The
> names of the Java files are self-explanatory and indicate which version
> they are using. The contents of these Java files are exactly the same.
>
> Please advise.
>
> Thanks!
>
> *From:* Daniel Manzke [mailto:[email protected]]
> *Sent:* Monday, December 29, 2008 2:45 PM
> *To:* Duseja, Sushil
> *Cc:* [email protected]; Rally, Menka
> *Subject:* Re: Garbage Output
>
> Just check out the latest source code and run Maven.
>
> I will send you a compiled version.
>
> Bye
>
> 2008/12/29 Duseja, Sushil <[email protected]>
>
> Thanks Daniel.
>
> Do you mean that - I need to fetch the latest source code from the trunk in
> the Subversion repository? If not, how can I get the source code for 0.8?
>
> I would really appreciate it if you can build me a compiled version. I hope I
> am not bothering you.
>
> Thanking you in anticipation.
>
> *From:* Daniel Manzke [mailto:[email protected]]
> *Sent:* Monday, December 29, 2008 1:41 PM
> *To:* Duseja, Sushil
> *Cc:* [email protected]; Rally, Menka
> *Subject:* Re: Garbage Output
>
> PDFBox is still under incubation and there is no 0.8 distribution. What
> you could do is download the source code and build it on your own. Then
> you could have a look at the code and debug where the garbage is
> produced. Or ask me and I will build you a compiled version.
>
> Daniel
>
> 2008/12/29 Duseja, Sushil <[email protected]>
>
> Thanks again for responding.
>
> Can you please point me to the URL/location from which the 0.8 version can be
> downloaded?
>
> I referred to -
> http://sourceforge.net/project/showfiles.php?group_id=78314; however it
> shows the latest version is 0.7.3.
>
> Thanks for your time.
>
> *From:* Daniel Manzke [mailto:[email protected]]
> *Sent:* Monday, December 29, 2008 1:29 PM
> *To:* Duseja, Sushil
> *Cc:* [email protected]; Rally, Menka
> *Subject:* Re: Garbage Output
>
> Try to check out the latest development build, due to the fact that 0.7.3 is
> outdated (year: 2006). In 0.8 there are a lot of issues fixed.
>
> Bye,
> daniel
>
> 2008/12/29 Duseja, Sushil <[email protected]>
>
> Hello Daniel,
>
> Thanks for the response.
>
> I am using version 0.7.3.
>
> Thanks!
>
> -----Original Message-----
> From: Daniel Manzke [mailto:[email protected]]
> Sent: Friday, December 26, 2008 9:11 PM
> To: [email protected]
> Subject: Re: Garbage Output
>
> Hi,
> standard question. ;) Which version are you using?
>
> Daniel
>
> 2008/12/26 Duseja, Sushil <[email protected]>
>
> > Hello,
> >
> > While extracting text from a pdf file (attached for your kind reference)
> > using PDFBox, I get garbage output (*À¾´»*) for a special text value
> > "*2007*" (please see page 2); I can fetch other values correctly though.
> >
> > Is this an *encoding issue*; if yes, can anyone please let me know how to
> > fix it? If possible, please point me to some working examples.
> >
> > Thanks in advance.
>
> --
> Kind regards
>
> Daniel Manzke

--
Kind regards

Daniel Manzke
