When you say "they're not indexed correctly", what's your evidence?
You cannot rely on the display in the browser; that's the raw input
just as it was sent to Solr, _not_ the actual tokens in the index.
What do you see when you go to the admin schema browser page and
load the actual tokens?
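
One way to take the browser's rendering out of the picture is to list the code points in the stored value directly. The following is a minimal sketch, not part of the original advice: the function name and the sample string are hypothetical, and it only assumes you have copied the stored field value into a Python string.

```python
import unicodedata

def suspicious_chars(text):
    """Return (index, code point, Unicode name) for each control or non-ASCII character."""
    found = []
    for i, ch in enumerate(text):
        # Printable ASCII runs from 0x20 (space) to 0x7E (~); report everything else.
        if ord(ch) < 0x20 or ord(ch) > 0x7E:
            found.append((i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")))
    return found

# Hypothetical sample: a no-break space and a replacement character
# standing in where ordinary blanks were expected.
sample = "korrekt\u00a0und\ufffdvollst\u00e4ndig"
for hit in suspicious_chars(sample):
    print(hit)
```

If U+FFFD (REPLACEMENT CHARACTER) shows up, the character was already lost before or during extraction; if it is something like U+00A0 (NO-BREAK SPACE), the data is intact and the display is the likelier culprit.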

Or use the TermsComponent
(https://cwiki.apache.org/confluence/display/solr/The+Terms+Component)
to see the actual terms in the index, as opposed to the stored data you
see in the browser when you look at search results.
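
As a rough sketch of what such a TermsComponent request could look like, the helper below builds the URL. The host, core name ("collection1") and field name ("content") are assumptions for illustration, and it presumes a /terms request handler is configured in solrconfig.xml, as it is in the stock example configuration.

```python
from urllib.parse import urlencode

def terms_url(base_url, core, field, limit=20, prefix=None):
    """Build a TermsComponent request URL for one field (names are illustrative)."""
    params = {
        "terms": "true",       # enable the terms component
        "terms.fl": field,     # field whose indexed terms should be listed
        "terms.limit": limit,  # maximum number of terms to return
        "wt": "json",
    }
    if prefix is not None:
        params["terms.prefix"] = prefix  # only terms starting with this prefix
    return f"{base_url.rstrip('/')}/{core}/terms?{urlencode(params)}"

def pair_terms(flat):
    """Turn a flat [term, count, term, count, ...] list into (term, count) pairs.

    Assumption: the JSON response lists terms in this flat form.
    """
    return list(zip(flat[0::2], flat[1::2]))

print(terms_url("http://localhost:8983/solr", "collection1", "content"))
```

Opening the printed URL (or fetching it with curl) lists indexed terms with their document counts, which is what matters for search, independent of how the stored text is displayed.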

If the actual terms don't seem right _in the index_, we need to see
your analysis chain, i.e. your fieldType definition.

I'm 90% sure you're seeing the stored data and your terms are indexed
just fine, but I've certainly been wrong before, more times than I
want to remember...

Best,
Erick

On Thu, Apr 23, 2015 at 1:18 AM,  <steve.sch...@t-systems.com> wrote:
> Hey Erick,
>
> thanks for your answer. They are not indexed correctly. Also, through the
> Solr admin interface I see these typical question marks within a rhombus where
> a blank space should be.
> I now figured out the following (not sure if it is relevant at all):
> - PDF documents created with "Acrobat PDFMaker 10.0 for Word" are indexed 
> correctly, no issues
> - PDF documents (with editable form fields) created with "Adobe InDesign CS5 
> (7.0.1)"  are indexed with the blank space issue
>
> Best
> Steve
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Wednesday, 22 April 2015 17:11
> To: solr-user@lucene.apache.org
> Subject: Re: Odp.: solr issue with pdf forms
>
> Are they not _indexed_ correctly or not being displayed correctly?
> Take a look at admin UI>>schema browser>> your field and press the "load 
> terms" button. That'll show you what is _in_ the index as opposed to what the 
> raw data looked like.
>
> When you return the field in a Solr search, you get a verbatim, un-analyzed
> copy of your original input. My guess is that your browser isn't using a
> compatible character encoding for display.
>
> Best,
> Erick
>
> On Wed, Apr 22, 2015 at 7:08 AM,  <steve.sch...@t-systems.com> wrote:
>> Thanks for your answer. Maybe my English is not good enough, what are you 
>> trying to say? Sorry I didn't get the point.
>> :-(
>>
>>
>> -----Original Message-----
>> From: LAFK [mailto:tomasz.bo...@gmail.com]
>> Sent: Wednesday, 22 April 2015 14:01
>> To: solr-user@lucene.apache.org; solr-user@lucene.apache.org
>> Subject: Odp.: solr issue with pdf forms
>>
>> Off the top of my head, I'd look into how writable PDFs are created and encoded.
>>
>> @LAFK_PL
>>   Original Message
>> From: steve.sch...@t-systems.com
>> Sent: Wednesday, 22 April 2015 12:41
>> To: solr-user@lucene.apache.org
>> Reply-To: solr-user@lucene.apache.org
>> Subject: solr issue with pdf forms
>>
>> Hi guys,
>>
>> hopefully you can help me with my issue. We are using a solr setup and have 
>> the following issue:
>> - usual pdf files are indexed just fine
>> - pdf files with writable form-fields look like this:
>> Ich bestätige mit meiner Unterschrift, dass alle Angaben korrekt und v
>> ollständig sind
>>
>> Somehow the blank space character is not indexed correctly.
>>
>> Is this a known issue? Does anybody have an idea?
>>
>> Thanks a lot
>> Best
>> Steve
