I think the extract handler (/update/extract) is not defined in the schemaless example. This may be a change from earlier versions that has left the documentation out of sync.
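
For reference, a configuration that supports Solr Cell normally carries two things in its solrconfig.xml: <lib> directives that load the extraction jars, and the /update/extract request handler itself. The fragment below is only a sketch, modelled on what the techproducts configset ships in Solr 8.x. The relative lib paths and the default parameters are assumptions, so compare them with the actual file before copying anything.

```xml
<!-- Sketch only, not copied from a running install.
     Assumed source: server/solr/configsets/sample_techproducts_configs/conf/solrconfig.xml
     The "../../../.." fallback depends on where the core directory lives. -->

<!-- Load the Tika/Solr Cell jars from the contrib and dist directories -->
<lib dir="${solr.install.dir:../../../..}/contrib/extraction/lib" regex=".*\.jar" />
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-cell-\d.*\.jar" />

<!-- Register the handler that bin/post and curl are calling -->
<requestHandler name="/update/extract"
                startup="lazy"
                class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- Assumed defaults: lowercase field names, map extracted body text to _text_ -->
    <str name="lowernames">true</str>
    <str name="fmap.content">_text_</str>
  </lst>
</requestHandler>
```

If neither piece is present in the configuration behind gettingstarted, a 404 on /update/extract is what you would expect to see.
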
Can you try the 'techproducts' example instead of schemaless:

bin/solr stop (if you are still running it)
bin/solr start -e techproducts

Then run the import command again, pointing it at the techproducts core (-c techproducts) rather than gettingstarted.

The Tika integration is defined in solrconfig.xml and needs both the handler defined and some libraries loaded. Once you have confirmed that you like what you see, you can copy those into whatever configuration you are working with.

Regards,
   Alex.

On Fri, 5 Feb 2021 at 07:38, nq <nq@uber.space> wrote:
>
> Hi,
>
>
> I am new to Solr and tried to follow the guide to upload PDF data using
> Tika, on Solr 8.7.0 (running on Debian 10):
>
> https://lucene.apache.org/solr/guide/8_7/uploading-data-with-solr-cell-using-apache-tika.html
>
> but I get an HTTP 404 error when trying to import the file.
>
>
> In the solr installation directory, after spinning up the example server
> using
>
> solr/bin/solr -e schemaless
>
> I firstly used the Post Tool to index a PDF file as described in the
> guide, giving the following output (paths truncated using “[…]” for
> privacy reasons):
>
> bin/post -c gettingstarted example/exampledocs/solr-word.pdf -params "literal.id=doc1"
>
> > java -classpath /[…]/solr-8.7.0/dist/solr-core-8.7.0.jar -Dauto=yes
> > -Dparams=literal.id=doc1 -Dc=gettingstarted -Ddata=files org.apache.solr.util.SimplePostTool example/exampledocs/solr-word.pdf
> > SimplePostTool version 5.0.0
> > Posting files to [base] url
> > http://localhost:8983/solr/gettingstarted/update?literal.id=doc1...
> > Entering auto mode.
> > File endings considered are
> > xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
> > POSTing file solr-word.pdf (application/pdf) to [base]/extract
> > SimplePostTool: WARNING: Solr returned an error #404 (Not Found) for url:
> > http://localhost:8983/solr/gettingstarted/update/extract?literal.id=doc1&resource.name=%2F[…]%2Fsolr-8.7.0%2Fexample%2Fexampledocs%2Fsolr-word.pdf
> > SimplePostTool: WARNING: Response: <html>
> > <head>
> > <meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
> > <title>Error 404 Not Found</title>
> > </head>
> > <body><h2>HTTP ERROR 404 Not Found</h2>
> > <table>
> > <tr><th>URI:</th><td>/solr/gettingstarted/update/extract</td></tr>
> > <tr><th>STATUS:</th><td>404</td></tr>
> > <tr><th>MESSAGE:</th><td>Not Found</td></tr>
> > <tr><th>SERVLET:</th><td>default</td></tr>
> > </table>
> >
> > </body>
> > </html>
> > SimplePostTool: WARNING: IOException while reading response:
> > java.io.FileNotFoundException:
> > http://localhost:8983/solr/gettingstarted/update/extract?literal.id=doc1&resource.name=%2F[…]%2Fsolr-8.7.0%2Fexample%2Fexampledocs%2Fsolr-word.pdf
> >
> > 1 files indexed.
> > COMMITting Solr index changes to
> > http://localhost:8983/solr/gettingstarted/update?literal.id=doc1...
> > Time spent: 0:00:00.038
> resulting in no actual changes being visible in the Solr.
>
>
> Using curl results in the same HTTP response:
>
> > curl 'http://localhost:8983/solr/gettingstarted/update/extract?literal.id=doc1&commit=true' -F "myfile=@example/exampledocs/solr-word.pdf"
> > <html>
> > <head>
> > <meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
> > <title>Error 404 Not Found</title>
> > </head>
> > <body><h2>HTTP ERROR 404 Not Found</h2>
> > <table>
> > <tr><th>URI:</th><td>/solr/gettingstarted/update/extract</td></tr>
> > <tr><th>STATUS:</th><td>404</td></tr>
> > <tr><th>MESSAGE:</th><td>Not Found</td></tr>
> > <tr><th>SERVLET:</th><td>default</td></tr>
> > </table>
> >
> > </body>
> > </html>
>
>
> Sorry if this has already been discussed somewhere; I have not been able
> to find anything helpful yet.
>
> Thank you!
>
> Leon
>