Hey, thanks a lot for the tip about pdfbox-app.jar.
For testing purposes I have now extracted an affected PDF form and a regular PDF file.
The result is the following:
Regular PDF file:
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod
tempor invidunt ut
labore et d
PDF form:
Thank you very much for the detailed information.
I have now checked the properties of the content field. In my opinion it is
indexed, right?:
Field: content
Properties: Indexed, Tokenized, Stored, TermVector Stored
Schema: Indexed, Tokenized, Stored, TermVector Stored
Index: Indexed, Tokenized, Store
Sorry, but there really isn't... :-/
I have never used the TermsComponent. So I first checked whether it is
configured, and it really is.
Then I tried to get an idea of how it works and tried the examples described in
the documentation.
After that I tried to figure out how to get the output from the "miscoded" PDF
Thanks a lot for being patient with me. Unfortunately there is no "Load Term
Info" button. :-(
Could you maybe help me use the TermsComponent instead? I read that it is
configured by default.
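
For reference, a TermsComponent request for the "content" field could be built roughly as follows. This is only a minimal sketch: the host, port and core name are placeholders, the helper name build_terms_url is made up for illustration, and it assumes a /terms request handler is registered in solrconfig.xml as in the Solr example configuration.

```python
from urllib.parse import urlencode


def build_terms_url(core_url, field, limit=20, prefix=None):
    """Build a TermsComponent request URL (hypothetical helper).

    core_url is an assumed base such as http://localhost:8983/solr/collection1;
    the /terms handler path is an assumption about the local solrconfig.xml.
    """
    params = [
        ("terms", "true"),            # enable the TermsComponent
        ("terms.fl", field),          # field whose indexed terms are listed
        ("terms.limit", str(limit)),  # maximum number of terms returned
        ("wt", "json"),               # response format
    ]
    if prefix is not None:
        # Only list terms starting with this prefix.
        params.append(("terms.prefix", prefix))
    return core_url.rstrip("/") + "/terms?" + urlencode(params)


# Placeholder host and core name; adjust to the actual installation.
print(build_terms_url("http://localhost:8983/solr/collection1", "content", limit=50))
```

Opening the printed URL in a browser should list the indexed terms of the field, which makes it possible to see whether the garbled characters are already present in the index.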
Thanks a lot
Best
Steve
-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.
Erick,
thanks a lot for helping me here. In my case it is the "content" field that is
not displayed correctly. So I went to the schema browser as you pointed out.
Here is the information I found:
Field: content
Field Type: text
Properties: Indexed, Tokenized, Stored, TermVector Stored
Schema
Hey Erick,
thanks a lot for your answer. I went to the admin schema browser, but what
should I see there? Sorry, I'm not familiar with the admin schema browser. :-(
Best
Steve
-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Thursday, 23 April 201
Hey Erick,
thanks for your answer. They are not indexed correctly. In the Solr admin
interface I also see those typical question marks inside a diamond where a
blank space should be.
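
A question mark inside a diamond is usually how the Unicode replacement character U+FFFD is rendered. As a rough sketch for checking extracted or stored text for it, something like the following could be used. The helper name and the sample string are made up for illustration and are not taken from the affected documents.

```python
REPLACEMENT_CHAR = "\ufffd"  # U+FFFD, typically drawn as a question mark in a diamond


def find_replacement_chars(text, context=10):
    """Return (position, surrounding snippet) for every U+FFFD in text."""
    hits = []
    for i, ch in enumerate(text):
        if ch == REPLACEMENT_CHAR:
            start = max(0, i - context)
            hits.append((i, text[start:i + context + 1]))
    return hits


# Made-up sample in which two blanks were replaced by U+FFFD.
sample = "Lorem\ufffdipsum dolor\ufffdsit"
for pos, snippet in find_replacement_chars(sample, context=5):
    print(pos, repr(snippet))
```

If the character already shows up in the text extracted from the PDF, the problem lies in the extraction step rather than in the indexing.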
I have now found out the following (not sure whether it is relevant at all):
- PDF documents created with "Acrobat P
Thanks for your answer. Maybe my English is not good enough; what are you
trying to say? Sorry, I didn't get the point.
:-(
-Original Message-
From: LAFK [mailto:tomasz.bo...@gmail.com]
Sent: Wednesday, 22 April 2015 14:01
To: solr-user@lucene.apache.org; solr-user@lucene.apac