Binoy and Walter, many thanks for your answers.
I think I'll go with Walter's suggestion.
Best regards,

Francisco

On Mon, Feb 22, 2016 at 23:43, Walter Underwood <
wun...@wunderwood.org> wrote:

> This happens for fonts where Tika does not have font metrics. Open the
> document in Adobe Reader, then use document info to find the list of fonts.
>
> Then post this question to the Tika list.
>
> Fix it in Tika, don’t patch it in Solr.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
> > On Feb 22, 2016, at 6:40 PM, Binoy Dalal <binoydala...@gmail.com> wrote:
> >
> > Is there some set pattern to how these words occur or do they occur
> > randomly in the text, i.e., somewhere it'll be "subtitle" and somewhere
> > "s u b t i t l e"?
> >
> > On Tue, 23 Feb 2016, 05:01 Francisco Andrés Fernández <fra...@gmail.com>
> > wrote:
> >
> >> Hi all,
> >> I'm extracting some text from PDF files. As a result, some important words
> >> end up with spaces between their characters. I know they are words, but I
> >> don't know how to make Solr detect and index them.
> >> For example, I could have the word "Subtitle" that I want to detect,
> >> written like "S u b t i t l e". If I parse the text with a standard
> >> tokenizer, the word will be lost.
> >> How could I make Solr detect this type of word occurrence?
> >> Many thanks,
> >>
> >> Francisco
> >>
> > --
> > Regards,
> > Binoy Dalal
>
>

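For anyone collecting examples to include in a report to the Tika list, here is a minimal sketch in plain Python for spotting letter-spaced words such as "S u b t i t l e" in already-extracted text. It is not part of Solr or Tika; the names SPACED_WORD, find_spaced_words and collapse_spaced_words are made up for this illustration, and it assumes the letters are separated by exactly one space and that a run has at least three letters.

```python
import re

# Assumption: a "spaced word" is three or more single letters,
# each separated from the next by exactly one space.
SPACED_WORD = re.compile(r"\b(?:[^\W\d_] ){2,}[^\W\d_]\b")


def find_spaced_words(text):
    """Return the letter-spaced runs found in text, in order."""
    return [m.group(0) for m in SPACED_WORD.finditer(text)]


def collapse_spaced_words(text):
    """Join each letter-spaced run into a single word."""
    return SPACED_WORD.sub(lambda m: m.group(0).replace(" ", ""), text)


if __name__ == "__main__":
    sample = 'The S u b t i t l e of the page'
    print(find_spaced_words(sample))
    print(collapse_spaced_words(sample))
```

Note that two letter-spaced words separated by a single space would be joined into one word, so this is only useful for finding affected documents and fonts, not as a substitute for fixing the extraction itself.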