Re: Problem with pdf files indexing

Erick Erickson Wed, 23 Nov 2011 09:34:21 -0800

The first thing I'd do is go over to the server and
try using the admin interface to query on *:*. If that
returns nothing, look at the admin/schema browser page
and see what's in your fields, if anything. Then go back
to SolrJ and work on the query part sans the indexing
part once you're sure you have data to work with.
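
The *:* check can also be run outside the admin page by requesting
the select handler directly. Below is a minimal sketch in plain Java
that only builds that URL. It assumes the base URL
http://localhost:8080/solr from the quoted message and the stock
/select handler; the class and method names are made up for
illustration and are not part of SolrJ.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of SolrJ: builds a match-all query URL.
final class SolrQueryUrl {

    // baseUrl is assumed to be the Solr webapp root,
    // e.g. http://localhost:8080/solr
    static String matchAllUrl(String baseUrl, int rows) {
        try {
            // URL-encode the match-all query "*:*" (the colon becomes %3A)
            String q = URLEncoder.encode("*:*", "UTF-8");
            return baseUrl + "/select?q=" + q + "&rows=" + rows;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // rows=0 asks only for the hit count, not the documents themselves
        System.out.println(matchAllUrl("http://localhost:8080/solr", 0));
    }
}
```

Opening the printed URL in a browser (or with curl) returns a response
whose numFound attribute is the number of indexed documents; 0 means
nothing was actually indexed.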


Also, do your Solr logs show anything?

Best
Erick

On Tue, Nov 22, 2011 at 4:13 AM, Dali <medalibenmans...@gmail.com> wrote:
> Hi! I'm using Solr 3.3 and I have some PDF files which I want to
> index. I followed the instructions on the wiki page:
> http://wiki.apache.org/solr/ExtractingRequestHandler
> The problem is that I can add my documents to Solr but I cannot query
> them. Here is what I have:
>
> *solrconfig.xml*:
> <requestHandler name="/update/extract"
>                  startup="lazy"
>                  class="solr.extraction.ExtractingRequestHandler" >
>    <lst name="defaults">
>      <str name="fmap.content">text</str>
>      <str name="lowernames">true</str>
>      <str name="uprefix">ignored_</str>
>      <str name="captureAttr">true</str>
>      <str name="fmap.a">links</str>
>      <str name="fmap.div">ignored_</str>
>    </lst>
>  </requestHandler>
>
> *schema.xml*:
>  <field name="title" type="string" indexed="true" stored="true"/>
>  <field name="author" type="string" indexed="true" stored="true"/>
>  <field name="text" type="text_general" indexed="true" stored="true"
>         multiValued="true"/>
>
> *data-config.xml* :
>  ...
> <dataSource type="BinFileDataSource" name="ds-file"/>
> ...
>  <entity processor="TikaEntityProcessor" dataSource="ds-file"
>          url="../${document.filename}">
>    <field column="Author" name="author" meta="true"/>
>    <field column="title" name="title" meta="true"/>
>    <field column="text" name="text"/>
>  </entity>
> ...
>
> I use SolrJ to add documents as follows:
> SolrServer server = new CommonsHttpSolrServer("http://localhost:8080/solr");
> ContentStreamUpdateRequest up =
>     new ContentStreamUpdateRequest("/update/extract");
> up.addFile(new File("d:\\test.pdf"));
> up.setParam("literal.id", "test");
> up.setParam("extractOnly", "true");
> server.commit();
> NamedList result = server.request(up);
> // can display information about test.pdf
> System.out.println("Result: " + result);
> QueryResponse rsp = server.query(new SolrQuery("*:*"));
> System.out.println("rsp: " + rsp); // returns nothing
>
> Any suggestion?
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Problem-with-pdf-files-indexing-tp3527202p3527202.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
