Hi! I'm using Solr 3.3 and I have some PDF files that I want to
index. I followed the instructions on the wiki page:
http://wiki.apache.org/solr/ExtractingRequestHandler
The problem is that I can send my documents to Solr, but I cannot find
them when I query. Here is what I have:
*solrconfig.xml*:
<requestHandler name="/update/extract"
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.content">text</str>
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>
    <str name="captureAttr">true</str>
    <str name="fmap.a">links</str>
    <str name="fmap.div">ignored_</str>
  </lst>
</requestHandler>
*schema.xml*:
<field name="title" type="string" indexed="true" stored="true"/>
<field name="author" type="string" indexed="true" stored="true" />
<field name="text" type="text_general" indexed="true" stored="true" multiValued="true"/>
*data-config.xml*:
...
<dataSource type="BinFileDataSource" name="ds-file"/>
...
<entity processor="TikaEntityProcessor" dataSource="ds-file"
        url="../${document.filename}">
  <field column="Author" name="author" meta="true"/>
  <field column="title" name="title" meta="true"/>
  <field column="text" name="text"/>
</entity>
...
I use SolrJ to add documents as follows:
SolrServer server = new CommonsHttpSolrServer("http://localhost:8080/solr");
ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
up.addFile(new File("d:\\test.pdf"));
up.setParam("literal.id", "test");
up.setParam("extractOnly", "true");
server.commit();
NamedList result = server.request(up);
System.out.println("Result: " + result); // displays information about test.pdf
QueryResponse rsp = server.query( new SolrQuery( "*:*") );
System.out.println("rsp: " + rsp); // returns nothing
Any suggestions?
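
Two things in the SolrJ code above may be worth checking. According to the ExtractingRequestHandler wiki page, extractOnly=true makes the handler return the extracted content without indexing the document. In addition, server.commit() is called before the request is sent, so there may be nothing new to commit at that point. Below is a minimal sketch, using only the JDK, of the two request URLs involved. The class and method names (ExtractUrlSketch, buildExtractUrl) are made up for illustration; the base URL and id are the ones from the post.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of Solr or SolrJ: it only builds the URL
// that a request to the /update/extract handler would use.
class ExtractUrlSketch {

    // extractOnly = true  -> ask Tika for the extracted content only (nothing is indexed)
    // extractOnly = false -> index the document and commit in the same request
    static String buildExtractUrl(String solrBase, String id, boolean extractOnly) {
        try {
            StringBuilder url = new StringBuilder(solrBase).append("/update/extract");
            url.append("?literal.id=").append(URLEncoder.encode(id, "UTF-8"));
            if (extractOnly) {
                url.append("&extractOnly=true");
            } else {
                url.append("&commit=true");
            }
            return url.toString();
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a standard JVM
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String base = "http://localhost:8080/solr";
        // What the code in the post effectively requests:
        System.out.println(buildExtractUrl(base, "test", true));
        // What an indexing request would look like instead:
        System.out.println(buildExtractUrl(base, "test", false));
    }
}
```

In SolrJ terms this would presumably correspond to dropping the extractOnly parameter and calling server.commit() after server.request(up) rather than before it.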