Hi Leon,

Feel free to create a JIRA issue at https://issues.apache.org/jira/secure/Dashboard.jspa and then open a GitHub pull request to fix the example name. The documentation is in AsciiDoc format at: https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide/src with file names matching those on the server.
This could be a great issue to cut your teeth on with helping Solr :-)

Regards,
   Alex.

On Fri, 5 Feb 2021 at 10:35, nq <nq@uber.space> wrote:
>
> Hi Alex,
>
> Thanks a lot for your help!
>
> I have tested the same using the 'techproducts' example as proposed, and
> it worked fine.
>
> You are right, the documentation seems to be outdated in this aspect.
>
> I have just reviewed the solrconfig.xml of the 'schemaless' example and
> found all the Solr Cell config was completely missing.
>
> After adding it as described at
>
> https://lucene.apache.org/solr/guide/8_8/uploading-data-with-solr-cell-using-apache-tika.html#configuring-the-extractingrequesthandler-in-solrconfig-xml
>
> everything worked fine again.
>
> What can I do to help updating the docs?
>
> Best regards,
>
> Leon
>
>
> On 05.02.21 at 16:15, Alexandre Rafalovitch wrote:
> > I think the extract handler is not defined in schemaless. This may be
> > a change from before and the documentation is out of sync.
> >
> > Can you try 'techproducts' example instead of schemaless:
> > bin/solr stop (if you are still running it)
> > bin/solr start -e techproducts
> >
> > Then the import command.
> >
> > The Tika integration is defined in solrconfig.xml and needs both
> > handler defined and some libraries loaded. Once you confirmed you like
> > what you see, you can copy those into whatever configuration you are
> > working with.
> >
> > Regards,
> > Alex.
> >
> > On Fri, 5 Feb 2021 at 07:38, nq <nq@uber.space> wrote:
> >> Hi,
> >>
> >> I am new to Solr and tried to follow the guide to upload PDF data using
> >> Tika, on Solr 8.7.0 (running on Debian 10):
> >>
> >> https://lucene.apache.org/solr/guide/8_7/uploading-data-with-solr-cell-using-apache-tika.html
> >>
> >> but I get an HTTP 404 error when trying to import the file.
> >>
> >> In the solr installation directory, after spinning up the example server
> >> using
> >>
> >> solr/bin/solr -e schemaless
> >>
> >> I firstly used the Post Tool to index a PDF file as described in the
> >> guide, giving the following output (paths truncated using “[…]” for
> >> privacy reasons):
> >>
> >> bin/post -c gettingstarted example/exampledocs/solr-word.pdf -params "literal.id=doc1"
> >>
> >>> java -classpath /[…]/solr-8.7.0/dist/solr-core-8.7.0.jar -Dauto=yes
> >>> -Dparams=literal.id=doc1 -Dc=gettingstarted -Ddata=files
> >>> org.apache.solr.util.SimplePostTool example/exampledocs/solr-word.pdf
> >>> SimplePostTool version 5.0.0
> >>> Posting files to [base] url
> >>> http://localhost:8983/solr/gettingstarted/update?literal.id=doc1...
> >>> Entering auto mode. File endings considered are
> >>> xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
> >>> POSTing file solr-word.pdf (application/pdf) to [base]/extract
> >>> SimplePostTool: WARNING: Solr returned an error #404 (Not Found) for url:
> >>> http://localhost:8983/solr/gettingstarted/update/extract?literal.id=doc1&resource.name=%2F[…]%2Fsolr-8.7.0%2Fexample%2Fexampledocs%2Fsolr-word.pdf
> >>> SimplePostTool: WARNING: Response: <html>
> >>> <head>
> >>> <meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
> >>> <title>Error 404 Not Found</title>
> >>> </head>
> >>> <body><h2>HTTP ERROR 404 Not Found</h2>
> >>> <table>
> >>> <tr><th>URI:</th><td>/solr/gettingstarted/update/extract</td></tr>
> >>> <tr><th>STATUS:</th><td>404</td></tr>
> >>> <tr><th>MESSAGE:</th><td>Not Found</td></tr>
> >>> <tr><th>SERVLET:</th><td>default</td></tr>
> >>> </table>
> >>>
> >>> </body>
> >>> </html>
> >>> SimplePostTool: WARNING: IOException while reading response:
> >>> java.io.FileNotFoundException:
> >>> http://localhost:8983/solr/gettingstarted/update/extract?literal.id=doc1&resource.name=%2F[…]%2Fsolr-8.7.0%2Fexample%2Fexampledocs%2Fsolr-word.pdf
> >>>
> >>> 1 files indexed.
> >>> COMMITting Solr index changes to
> >>> http://localhost:8983/solr/gettingstarted/update?literal.id=doc1...
> >>> Time spent: 0:00:00.038
> >> resulting in no actual changes being visible in the Solr.
> >>
> >> Using curl results in the same HTTP response:
> >>
> >>> curl 'http://localhost:8983/solr/gettingstarted/update/extract?literal.id=doc1&commit=true' -F "myfile=@example/exampledocs/solr-word.pdf"
> >>> <html>
> >>> <head>
> >>> <meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
> >>> <title>Error 404 Not Found</title>
> >>> </head>
> >>> <body><h2>HTTP ERROR 404 Not Found</h2>
> >>> <table>
> >>> <tr><th>URI:</th><td>/solr/gettingstarted/update/extract</td></tr>
> >>> <tr><th>STATUS:</th><td>404</td></tr>
> >>> <tr><th>MESSAGE:</th><td>Not Found</td></tr>
> >>> <tr><th>SERVLET:</th><td>default</td></tr>
> >>> </table>
> >>>
> >>> </body>
> >>> </html>
> >>>
> >> Sorry if this has already been discussed somewhere; I have not been able
> >> to find anything helpful yet.
> >>
> >> Thank you!
> >>
> >> Leon
> >>
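
Since the 404 in this thread came from the /update/extract handler not being defined in the collection's solrconfig.xml, it may help to check for the handler before posting files. The following is only a rough sketch, not something taken from the thread: it assumes Solr is listening on localhost:8983 as in the output above, that the Config API endpoint /config/requestHandler is reachable for the collection, and that curl and grep are installed. The function names has_extract_handler and check_collection are hypothetical helpers made up for illustration.

```shell
#!/bin/sh
# Sketch: check whether a collection has the /update/extract handler
# configured before trying to post PDFs to it.
# Assumptions: Solr on localhost:8983 (as in the thread); helper names
# below are hypothetical and not part of Solr.

# Reads a Config API JSON response on stdin.
# Exits 0 if the handler name appears in it, non-zero otherwise.
has_extract_handler() {
  grep -q '"/update/extract"'
}

# Fetches the configured request handlers for the collection given as $1
# and reports whether /update/extract is among them.
check_collection() {
  if curl -s "http://localhost:8983/solr/$1/config/requestHandler" | has_extract_handler; then
    echo "$1: /update/extract is configured"
  else
    echo "$1: /update/extract is missing (or Solr is not reachable)"
    return 1
  fi
}
```

With the schemaless example running, `check_collection gettingstarted` would be expected to report the handler as missing, which matches the 404 above. The text match is deliberately crude; a JSON-aware tool would be more robust if one is available.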