Problem with pdf files indexing

Dali Tue, 22 Nov 2011 01:13:59 -0800

Hi! I'm using Solr 3.3 and I have some PDF files that I want to
index. I followed the instructions on the wiki page:
http://wiki.apache.org/solr/ExtractingRequestHandler
The problem is that I can add my documents to Solr, but I cannot retrieve
them with a query. Here is what I have:


*solrconfig.xml*:
<requestHandler name="/update/extract"
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.content">text</str>
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>
    <str name="captureAttr">true</str>
    <str name="fmap.a">links</str>
    <str name="fmap.div">ignored_</str>
  </lst>
</requestHandler>

*schema.xml*:
<field name="title" type="string" indexed="true" stored="true"/>
<field name="author" type="string" indexed="true" stored="true"/>
<field name="text" type="text_general" indexed="true" stored="true" multiValued="true"/>

*data-config.xml*:
...
<dataSource type="BinFileDataSource" name="ds-file"/>
...
<entity processor="TikaEntityProcessor" dataSource="ds-file"
        url="../${document.filename}">
  <field column="Author" name="author" meta="true"/>
  <field column="title" name="title" meta="true"/>
  <field column="text" name="text"/>
</entity>
...

I use SolrJ to add documents as follows:
SolrServer server = new CommonsHttpSolrServer("http://localhost:8080/solr");
ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
up.addFile(new File("d:\\test.pdf"));
up.setParam("literal.id", "test");
up.setParam("extractOnly", "true");
server.commit();
NamedList result = server.request(up);
System.out.println("Result: " + result);  // can display information about test.pdf
QueryResponse rsp = server.query(new SolrQuery("*:*"));
System.out.println("rsp: " + rsp); // returns nothing

Any suggestions?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-with-pdf-files-indexing-tp3527202p3527202.html
Sent from the Solr - User mailing list archive at Nabble.com.
