Binoy and Walter,

Many thanks for your answers. I think I'll go with Walter's suggestion.

Best regards,
Francisco

On Mon, Feb 22, 2016 at 23:43, Walter Underwood <wun...@wunderwood.org> wrote:

> This happens for fonts where Tika does not have font metrics. Open the
> document in Adobe Reader, then use document info to find the list of fonts.
>
> Then post this question to the Tika list.
>
> Fix it in Tika, don’t patch it in Solr.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
>
> > On Feb 22, 2016, at 6:40 PM, Binoy Dalal <binoydala...@gmail.com> wrote:
> >
> > Is there some set pattern to how these words occur or do they occur
> > randomly in the text, i.e., somewhere it'll be "subtitle" and somewhere
> > "s u b t i t l e"?
> >
> > On Tue, 23 Feb 2016, 05:01 Francisco Andrés Fernández <fra...@gmail.com>
> > wrote:
> >
> >> Hi all,
> >> I'm extracting some text from PDF. As a result, some important words end
> >> up with spaces between characters. I know they are words, but I don't
> >> know how to make Solr detect and index them.
> >> For example, I could have the word "Subtitle" that I want to detect,
> >> written like "S u b t i t l e". If I parse the text with a standard
> >> tokenizer, the word will be lost.
> >> How could I make Solr detect this type of word occurrence?
> >> Many thanks,
> >>
> >> Francisco
> >>
> > --
> > Regards,
> > Binoy Dalal
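
To check how regular the pattern is before posting to the Tika list, a minimal Python sketch along these lines might help. It is not part of Solr or Tika; the function names and the four-letter threshold are assumptions chosen for illustration. It looks for runs of single letters separated by single spaces in already extracted text and shows what the collapsed word would look like.

```python
import re

# Hypothetical helper, not a Solr or Tika API.
# Matches a run of at least four single letters, each separated by exactly
# one space, and not glued to any other word character on either side.
SPACED_RUN = re.compile(r"(?<!\w)(?:[^\W\d_] ){3,}[^\W\d_](?!\w)")


def find_spaced_words(text):
    """Return every letter-spaced run found in the text, e.g. 'S u b t i t l e'."""
    return SPACED_RUN.findall(text)


def collapse_spaced_words(text):
    """Return the text with each letter-spaced run joined into one word."""
    return SPACED_RUN.sub(lambda m: m.group(0).replace(" ", ""), text)


if __name__ == "__main__":
    sample = 'The S u b t i t l e is here'
    print(find_spaced_words(sample))
    print(collapse_spaced_words(sample))
```

One known limitation of this sketch: two letter-spaced words in a row, or a letter-spaced word followed by a one-letter word, are merged into a single token, because a single space is the only separator the pattern can see. That is one more reason to treat it as a diagnostic aid and to fix the extraction in Tika, as Walter suggests.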