This happens for fonts where Tika does not have font metrics. Open the document in Adobe Reader, then use document info to find the list of fonts.
Then post this question to the Tika list.

Fix it in Tika, don’t patch it in Solr.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On Feb 22, 2016, at 6:40 PM, Binoy Dalal <binoydala...@gmail.com> wrote:
>
> Is there some set pattern to how these words occur or do they occur
> randomly in the text, i.e., somewhere it'll be "subtitle" and somewhere
> "s u b t i t l e"?
>
> On Tue, 23 Feb 2016, 05:01 Francisco Andrés Fernández <fra...@gmail.com>
> wrote:
>
>> Hi all,
>> I'm extracting some text from pdf. As result, some important words end with
>> spaces between characters. I know they are words but, don't know how to
>> make Solr detect and index them.
>> For example, I could have the word "Subtitle" that I want to detect,
>> written like "S u b t i t l e". If I would parse the text with a standard
>> tokenizer, the word will be lost.
>> How could I make Solr detect this type of word occurrence?
>> Many thanks,
>>
>> Francisco
>>
> --
> Regards,
> Binoy Dalal
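
When reporting this to the Tika list, it may help to show which extracted words are affected. Below is a minimal diagnostic sketch in Python, assuming the extracted text is already available as a plain string. The helper name `find_spaced_words` and the three-letter minimum are assumptions made for illustration, not part of Tika or Solr. It only flags runs such as "S u b t i t l e" so they can be listed in a report; it is not meant as an indexing-time workaround.

```python
import re

# Hypothetical diagnostic helper; not part of Tika or Solr.
# Matches three or more single letters, each separated by exactly one
# space or tab, with whitespace (or the text boundary) on both sides.
SPACED_WORD = re.compile(r"(?<!\S)(?:[^\W\d_][ \t]){2,}[^\W\d_](?!\S)")


def find_spaced_words(text):
    """Return (original, collapsed) pairs for each letter-spaced run.

    Caveat: two letter-spaced words in a row are reported as one run,
    and short sequences of genuine one-letter words can be false positives.
    """
    results = []
    for match in SPACED_WORD.finditer(text):
        original = match.group(0)
        collapsed = re.sub(r"[ \t]", "", original)
        results.append((original, collapsed))
    return results


if __name__ == "__main__":
    sample = 'The S u b t i t l e of this page'
    for original, collapsed in find_spaced_words(sample):
        print(f"{original!r} -> {collapsed!r}")
```

Running this over the text extracted from a problem PDF gives a short list of affected words to attach to the report, alongside the font list from the document info.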