RE: Odp.: solr issue with pdf forms

2015-04-30 Thread Davis, Daniel (NIH/NLM) [C]
ck Erickson [mailto:erickerick...@gmail.com] Sent: Thursday, April 30, 2015 11:28 AM To: solr-user@lucene.apache.org Subject: Re: Odp.: solr issue with pdf forms Jack: I keep forgetting those things exist, thanks for the reminder! On Thu, Apr 30, 2015 at 8:23 AM, Jack Krupansky wrote: > Or use a S

Re: Odp.: solr issue with pdf forms

2015-04-30 Thread Erick Erickson
Daz Best, Steve -----Original Message----- From: Allison, Timothy B. [mailto:talli...@mitre.org] Sent: Wednesday, April 29, 2015 14:16 To: solr-user@lucene.apache.org

Re: Odp.: solr issue with pdf forms

2015-04-30 Thread Jack Krupansky
nachweise bei. Daz Best, Steve -----Original Message----- From: Allison, Timothy B. [mailto:talli...@mitre.org] Sent: Wednesday, April 29, 2015 14:16 To: solr-user@lucene.apache.org Cc: u...@tika.apache.org

Re: Odp.: solr issue with pdf forms

2015-04-30 Thread Erick Erickson
Sent: Wednesday, April 29, 2015 14:16 To: solr-user@lucene.apache.org Cc: u...@tika.apache.org Subject: RE: Odp.: solr issue with pdf forms I completely agree with Erick about the utility of the TermsComponent to see what is actually being indexed. If you find probl
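A minimal sketch of querying the TermsComponent directly, assuming a /terms request handler is configured in solrconfig.xml (as it is in the stock example configs). The host, core name (collection1) and field name (content) are hypothetical; terms.fl, terms.limit and terms.prefix are standard TermsComponent parameters. The function only builds the URL, so it can be pasted into a browser or fetched with any HTTP client.

```python
from urllib.parse import urlencode


def terms_url(base, core, field, limit=100, prefix=None):
    """Build a TermsComponent URL listing the indexed terms of one field.

    Assumes a /terms handler exists on the core (hypothetical setup).
    """
    params = {"terms.fl": field, "terms.limit": limit, "wt": "json"}
    if prefix is not None:
        # Restrict the listing to terms starting with this prefix.
        params["terms.prefix"] = prefix
    return f"{base.rstrip('/')}/{core}/terms?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical local Solr instance and core name.
    print(terms_url("http://localhost:8983/solr", "collection1", "content", limit=20))
```

If the PDF form text is being split on stray blanks, the returned term list should show it as many short fragments rather than whole words.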

Re: Odp.: solr issue with pdf forms

2015-04-29 Thread Erick Erickson
e pdfbox-app.jar (ExtractText option) on your files outside of Solr to see what text/noise you're getting for the files that are causing problems. -----Original Message----- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Tuesday, April 28,
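A minimal sketch of running PDFBox's ExtractText tool from Python, so the raw extracted text can be compared with what Solr indexes. The jar name pdfbox-app.jar and the file names are placeholders (the real jar carries a version number); this assumes Java is on the PATH and that the tool writes UTF-8 output, which may depend on the PDFBox version.

```python
import subprocess
from pathlib import Path


def extract_text_cmd(jar, pdf, out=None):
    """Build the command line: java -jar <jar> ExtractText <pdf> [out]."""
    cmd = ["java", "-jar", str(jar), "ExtractText", str(pdf)]
    if out is not None:
        cmd.append(str(out))
    return cmd


def extract_text(jar, pdf, out):
    """Run ExtractText and return the text it wrote to `out`.

    Requires Java and the PDFBox app jar; the UTF-8 decoding is an assumption.
    """
    subprocess.run(extract_text_cmd(jar, pdf, out), check=True)
    return Path(out).read_text(encoding="utf-8", errors="replace")


if __name__ == "__main__":
    # Placeholder file names; replace with a problematic PDF form.
    print(" ".join(extract_text_cmd("pdfbox-app.jar", "form.pdf", "form.txt")))
```

If the blank-space noise already appears in form.txt, the problem lies in text extraction rather than in Solr's analysis chain.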

RE: Odp.: solr issue with pdf forms

2015-04-29 Thread Allison, Timothy B.
day, April 28, 2015 9:07 PM To: solr-user@lucene.apache.org Subject: Re: Odp.: solr issue with pdf forms There better be.
1> go to the admin UI
2> select a core
3> select "schema browser"
4> select a field from the drop-down
Until you do step 4 the window will be pr

Re: Odp.: solr issue with pdf forms

2015-04-28 Thread Erick Erickson
fault configured. Thanks a lot. Best, Steve -----Original Message----- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Monday, April 27, 2015 17:23 To: solr-user@lucene.apache.org Subject: Re: Odp.: solr issue with pdf forms W

Re: Odp.: solr issue with pdf forms

2015-04-27 Thread Erick Erickson
inct: 160403 Does this help to figure out the issue? Thanks. Best, Steve -----Original Message----- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Friday, April 24, 2015 20:15 To: solr-user@lucene.apache.org Be

Re: Odp.: solr issue with pdf forms

2015-04-24 Thread Erick Erickson
admin schema browser, but what should I see there? Sorry, I'm not familiar with the admin schema browser. :-( Best, Steve -----Original Message----- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Thursday, April 23, 2015 18:00

Re: Odp.: solr issue with pdf forms

2015-04-23 Thread Dan Davis
fields) created with "Adobe InDesign CS5 (7.0.1)" are indexed with the blank space issue. Best, Steve -----Original Message----- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Wednesday, April 22

Re: Odp.: solr issue with pdf forms

2015-04-23 Thread Erick Erickson
Message----- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Wednesday, April 22, 2015 17:11 To: solr-user@lucene.apache.org Subject: Re: Odp.: solr issue with pdf forms Are they not _indexed_ correctly or not being displayed correctly? Ta

Re: Odp.: solr issue with pdf forms

2015-04-22 Thread Dan Davis
+1 - I like Erick's answer. Let me know if that turns out to be the problem - I'm interested in this problem and would be happy to help. On Wed, Apr 22, 2015 at 11:11 AM, Erick Erickson wrote: > Are they not _indexed_ correctly or not being displayed correctly? > Take a look at admin UI>>schema

Re: Odp.: solr issue with pdf forms

2015-04-22 Thread Erick Erickson
Are they not _indexed_ correctly or not being displayed correctly? Take a look at admin UI>>schema browser>> your field and press the "load terms" button. That'll show you what is _in_ the index as opposed to what the raw data looked like. When you return the field in a Solr search, you get a verb