You can use the Tika library to parse the PDFs on the client side and then post the extracted text to the Solr server as a regular document.
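A rough sketch of that approach, in case it helps. It is not a definitive implementation: it uses only the JDK (Java 11+) so it runs without extra dependencies, and it posts to Solr's JSON /update handler instead of going through SolrJ. The class name, the TextExtractor hook and the "content" field are my own assumptions; the field has to exist in your schema. With Tika on the classpath, the extractor could be the Tika facade's parseToString.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

public class ExtractThenPost {

    /**
     * Hypothetical hook for the extraction step. With Tika on the classpath it
     * could be implemented as: path -> new Tika().parseToString(path.toFile())
     */
    @FunctionalInterface
    interface TextExtractor {
        String extract(Path file) throws Exception;
    }

    /** Escapes a string so it can be placed inside a JSON string literal. */
    static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    // Remaining control characters become numeric JSON escapes.
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Builds the body for the /update handler: a JSON array holding one document. */
    static String buildUpdateBody(String id, String text) {
        // "content" is an assumed field name; adjust it to your schema.
        return "[{\"id\":\"" + jsonEscape(id) + "\",\"content\":\"" + jsonEscape(text) + "\"}]";
    }

    /** Extracts the text, posts it as a plain document and returns the HTTP status. */
    static int index(String coreUrl, String id, Path file, TextExtractor extractor) throws Exception {
        String body = buildUpdateBody(id, extractor.extract(file));
        HttpRequest request = HttpRequest.newBuilder(URI.create(coreUrl + "/update?commit=true"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body, StandardCharsets.UTF_8))
                .build();
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }

    public static void main(String[] args) throws Exception {
        // Stub extractor so the sketch runs without Tika; swap in a real one.
        TextExtractor stub = path -> "text extracted from " + path.getFileName();
        System.out.println(buildUpdateBody("doc1", stub.extract(Path.of("testfile.pdf"))));
        // To actually send it (needs a running Solr core):
        // index("http://localhost:8983/solr/localDocs16", "doc1", Path.of("testfile.pdf"), stub);
    }
}
```

If you prefer to stay with SolrJ, the same idea should work by putting the extracted string into a SolrInputDocument field and calling solr.add(doc), just like the loop in your mail that already works. Either way the PDF bytes never reach Solr, so no content type other than a normal update is involved.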

> On 19.05.2019 at 11:02, Mareike Glock <mareike.gl...@student.htw-berlin.de> wrote:
>
> Dear Solr Team,
>
> I am trying to index Word and PDF documents with Solr using SolrJ, but most
> of the examples I found on the internet use the SolrServer class which I
> guess is deprecated.
> The connection to Solr itself is working, because I can add
> SolrInputDocuments to the index but it does not work for rich documents
> because I get an exception.
>
>
> public static void main(String[] args) throws IOException, SolrServerException {
>     String urlString = "http://localhost:8983/solr/localDocs16";
>     HttpSolrClient solr = new HttpSolrClient.Builder(urlString).build();
>
>     //is working
>     for(int i=0;i<1000;++i) {
>         SolrInputDocument doc = new SolrInputDocument();
>         doc.addField("cat", "book");
>         doc.addField("id", "book-" + i);
>         doc.addField("name", "The Legend of the Hobbit part " + i);
>         solr.add(doc);
>         if(i%100==0) solr.commit(); // periodically flush
>     }
>
>     //is not working
>     File file = new File("path\\testfile.pdf");
>
>     ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("update/extract");
>
>     req.addFile(file, "application/pdf");
>     req.setParam("literal.id", "doc1");
>     req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
>     try{
>         solr.request(req);
>     }
>     catch(IOException e){
>         PrintWriter out = new PrintWriter("C:\\Users\\mareike\\Desktop\\filename.txt");
>         e.printStackTrace(out);
>         out.close();
>         System.out.println("IO message: " + e.getMessage());
>     } catch(SolrServerException e){
>         PrintWriter out = new PrintWriter("C:\\Users\\mareike\\Desktop\\filename.txt");
>         e.printStackTrace(out);
>         out.close();
>         System.out.println("SolrServer message: " + e.getMessage());
>     } catch(Exception e){
>         PrintWriter out = new PrintWriter("C:\\Users\\mareike\\Desktop\\filename.txt");
>         e.printStackTrace(out);
>         out.close();
>         System.out.println("UnknownException message: " + e.getMessage());
>     }finally{
>         solr.commit();
>     }
> }
>
>
> I am using Maven (pom.xml attached) and created a JAR file, which I then
> tried to execute from the command line, and this is the output I get:
>
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
> SLF4J: Failed to load class "org.slf4j.impl.StaticMDCBinder".
> SLF4J: Defaulting to no-operation MDCAdapter implementation.
> SLF4J: See http://www.slf4j.org/codes.html#no_static_mdc_binder for further details.
> message: UnknownException message: Error from server at http://localhost:8983/solr/localDocs17: Bad contentType for search handler :application/pdf request={wt=javabin&version=2}
>
>
> I hope you may be able to help me with this. I also posted this issue on
> Github.
>
> Cheers,
> Mareike Glock
>
> <pom.xml>