This happens for fonts where Tika does not have font metrics. Open the document
in Adobe Reader, then use the document properties (File > Properties, Fonts tab)
to find the list of fonts.

Then post this question to the Tika list, and include the font names.
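
A few concrete examples of the broken words would also help that report. Here is a minimal Python sketch for pulling them out of text that has already been extracted. It is diagnostic only, not a fix. The function name and the regex are illustrative and are not part of Tika or Solr. It assumes the affected words appear as single letters separated by single spaces.

```python
import re

# Heuristic: three or more single letters separated by single spaces,
# e.g. "S u b t i t l e". Assumption: this is how the broken words look
# in the extracted text; adjust the pattern if yours differ.
SPACED_WORD = re.compile(r"(?<!\w)(?:[^\W\d_] ){2,}[^\W\d_](?!\w)")


def find_spaced_words(text):
    """Return (as_extracted, collapsed) pairs for each letter-spaced run."""
    return [
        (m.group(0), m.group(0).replace(" ", ""))
        for m in SPACED_WORD.finditer(text)
    ]


if __name__ == "__main__":
    sample = "The S u b t i t l e of the chapter"
    for raw, collapsed in find_spaced_words(sample):
        print(repr(raw), "->", collapsed)
```

Two letter-spaced words in a row will be reported as one run, and a run that wraps across a line break will be missed, so treat the output as a rough list of examples.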

Fix it in Tika, don’t patch it in Solr.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Feb 22, 2016, at 6:40 PM, Binoy Dalal <binoydala...@gmail.com> wrote:
> 
> Is there some set pattern to how these words occur or do they occur
> randomly in the text, i.e., somewhere it'll be "subtitle" and somewhere "s
> u b t i t l e"?
> 
> On Tue, 23 Feb 2016, 05:01 Francisco Andrés Fernández <fra...@gmail.com>
> wrote:
> 
>> Hi all,
>> I'm extracting some text from PDFs. As a result, some important words end up
>> with spaces between characters. I know they are words, but I don't know how
>> to make Solr detect and index them.
>> For example, I could have the word "Subtitle" that I want to detect,
>> written like "S u b t i t l e". If I parse the text with a standard
>> tokenizer, the word will be lost.
>> How could I make Solr detect this type of word occurrence?
>> Many thanks,
>> 
>> Francisco
>> 
> -- 
> Regards,
> Binoy Dalal
