Hi Dexter,
First of all, thanks for replying so promptly.

As I mentioned in my earlier mail, using the PDFTextStripper I get a series of TextPosition objects. Each TextPosition object gives me its PDFont object, so I know which font is used to render the corresponding text in the PDF. Going forward, what I need to understand is this: for any font (PDFont) returned by a TextPosition object, in what ways can I calculate the width of any string?

Thanks,
Shishir.

From: Dexter Mishra [mailto:[email protected]]
Sent: Wednesday, March 25, 2009 9:03 PM
To: Shishir Mane-Patil
Subject: Re: Finding the x-coordinate and width of a sub-string

Hi Shishir,

As I said before, the width of a string depends on the font you are using. It will be different for the strings "AAA" and "lll", unless it is a monospaced font like Courier. What font are you using?

I haven't looked at the implementation of the stringWidth function. I will have a look at it and see what I can do; if there is a bug, I will try to fix it. But again, you need to be very careful when calculating these widths.

~Thanks
Dexter

On Wed, Mar 25, 2009 at 4:27 PM, Shishir Mane-Patil <[email protected]> wrote:

Hi,

I already have all the TextPosition objects for a particular PDF page, so I can always retrieve the font and font size for a particular string. For instance, consider the earlier example:

String[75.0,278.8 fs=10.0 xscale=1.0 height=7.0000005 space=5.830001 width=108.87001]Primary Diagnosis: elder

Earlier, to find the x-coordinate of the word "Diagnosis", I would perform the following steps (using the above example):

1. Find the PDFont object using the TextPosition.
2. Use the stringWidth function to calculate the string width of "Primary ". Let's say it is sw. The current value of the x-coordinate is x, the x-scale is xs and the font size is fs.
3. Calculate the new x-coordinate of, let's say, the word "Diagnosis" using the following formula:
   New X-Coordinate = x+((sw/1000)*xs*fs)
4. Similarly, find the string width for the word "Diagnosis".

The above steps worked satisfactorily for substrings in many PDFs, but they seem to fail for some. In case of success, the string width returned from the TextPosition object was very close to the one calculated by the above formula. In case of failure, the string width returned by the PDFont object was either zero or calculated incorrectly.

So can anyone help me find a way to accurately calculate the starting x-coordinate of a substring, or in other words the actual width of any string for a particular font?

On Tue, Mar 24, 2009 at 6:52 PM, Dexter Mishra <[email protected]> wrote:

Shishir,

PDF does not store the word coordinates in parts for this string. "Primary Diagnosis: elder" will be a single entry in the PDF. The information you can get is the string length, width, height etc. So, if you know the font point size, you need to calculate the x-coordinate of "Diagnosis:" yourself. But beware: this is quite tricky with variable-pitch fonts (Arial, Times New Roman etc.).

~Thanks
Dexter

On Tue, Mar 24, 2009 at 11:42 AM, Shishir Mane-Patil <[email protected]> wrote:

> Hi,
>
> I wish to accurately find the width of a sub-string using the
> PDFTextStripper. For example, part of the output of the PDF text
> extraction example is as follows:
>
> String[75.0,278.8 fs=10.0 xscale=1.0 height=7.0000005 space=5.830001 width=108.87001]Primary Diagnosis: elder
>
> Here the width calculated for the entire string "Primary Diagnosis: elder"
> is 108.87001. I wish to find the starting x-coordinate of just the word
> 'Diagnosis' and the width of that word. How can I find the exact
> x-coordinate and width of such substrings?
>
> Thanks and Regards,
> Shishir Mane
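
For reference, the offset formula discussed in this thread, New X-Coordinate = x+((sw/1000)*xs*fs), can be sketched in plain Java as below. This is only an illustration under stated assumptions, not PDFBox code: the class SubstringOffset, its methods and the per-character width table are hypothetical stand-ins for whatever the font's stringWidth call returns (in 1/1000 units), and the width values used in main are made up. The closeTo check mirrors the sanity test described above, where the computed width of the whole string is compared with the width reported by the TextPosition. The sketch ignores character spacing, word spacing and any per-glyph positioning adjustments in the content stream.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of PDFBox.
final class SubstringOffset {

    // Sum of glyph widths in 1/1000 units, read from an assumed width table.
    // Characters missing from the table fall back to defaultWidth.
    static double stringWidthUnits(String s, Map<Character, Double> widths, double defaultWidth) {
        double total = 0.0;
        for (int i = 0; i < s.length(); i++) {
            total += widths.getOrDefault(s.charAt(i), defaultWidth);
        }
        return total;
    }

    // The formula from the thread: New X = x + (sw / 1000) * xs * fs.
    static double offsetX(double x, double sw, double xs, double fs) {
        return x + (sw / 1000.0) * xs * fs;
    }

    // True when a computed width is within tolerance of the reported width.
    static boolean closeTo(double computed, double reported, double tolerance) {
        return Math.abs(computed - reported) <= tolerance;
    }

    public static void main(String[] args) {
        // Made-up widths for illustration only; real values come from the font.
        Map<Character, Double> widths = new HashMap<>();
        widths.put('P', 667.0);
        widths.put(' ', 278.0);
        double defaultWidth = 500.0;

        double x = 75.0, xs = 1.0, fs = 10.0; // values from the example String[...] line
        double sw = stringWidthUnits("Primary ", widths, defaultWidth);
        if (sw == 0.0) {
            // The failure case reported in the thread: the font returned a zero width.
            System.out.println("Width is zero; the offset cannot be trusted.");
        } else {
            System.out.println("Start of \"Diagnosis\": " + offsetX(x, sw, xs, fs));
        }
    }
}
```

If the width computed this way for the whole string is not close to the width reported by the TextPosition (108.87001 in the example), the per-character widths are probably unreliable for that font, and the substring offset should be treated with caution.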
