Hi Leon,

Feel free to create a JIRA issue at
https://issues.apache.org/jira/secure/Dashboard.jspa
and then open a GitHub pull request to fix the example name. The
documentation is in AsciiDoc format at:
https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide/src
with file names matching those on the server.

This could be a great first issue to cut your teeth on while helping Solr :-)

Regards,
   Alex.

On Fri, 5 Feb 2021 at 10:35, nq <nq@uber.space> wrote:
>
> Hi Alex,
>
>
> Thanks a lot for your help!
>
> I have tested the same using the 'techproducts' example as proposed, and
> it worked fine.
>
>
> You are right, the documentation seems to be outdated in this respect.
>
> I have just reviewed the solrconfig.xml of the 'schemaless' example and
> found that the Solr Cell config was completely missing.
>
> After adding it as described at
>
> https://lucene.apache.org/solr/guide/8_8/uploading-data-with-solr-cell-using-apache-tika.html#configuring-the-extractingrequesthandler-in-solrconfig-xml
>
> everything worked fine again.
>
>
> What can I do to help update the docs?
>
>
> Best regards,
>
> Leon
>
>
> On 05.02.21 at 16:15, Alexandre Rafalovitch wrote:
> > I think the extract handler is not defined in schemaless. This may be
> > a change from before and the documentation is out of sync.
> >
> > Can you try 'techproducts' example instead of schemaless:
> > bin/solr stop (if you are still running it)
> > bin/solr start -e techproducts
> >
> > Then the import command.
> >
> > The Tika integration is defined in solrconfig.xml and needs both the
> > handler defined and some libraries loaded. Once you have confirmed that
> > you like what you see, you can copy those into whatever configuration
> > you are working with.
> >
> > Regards,
> > Alex.
> >
> > On Fri, 5 Feb 2021 at 07:38, nq <nq@uber.space> wrote:
> >> Hi,
> >>
> >>
> >> I am new to Solr and tried to follow the guide to upload PDF data using
> >> Tika, on Solr 8.7.0 (running on Debian 10):
> >>
> >> https://lucene.apache.org/solr/guide/8_7/uploading-data-with-solr-cell-using-apache-tika.html
> >>
> >> but I get an HTTP 404 error when trying to import the file.
> >>
> >>
> >> In the solr installation directory, after spinning up the example server
> >> using
> >>
> >> solr/bin/solr -e schemaless
> >>
> >> I first used the Post Tool to index a PDF file as described in the
> >> guide, giving the following output (paths truncated using “[…]” for
> >> privacy reasons):
> >>
> >> bin/post -c gettingstarted example/exampledocs/solr-word.pdf -params
> >> "literal.id=doc1"
> >>
> >>> java -classpath /[…]/solr-8.7.0/dist/solr-core-8.7.0.jar -Dauto=yes
> >>> -Dparams=literal.id=doc1 -Dc=gettingstarted -Ddata=files
> >>> org.apache.solr.util.SimplePostTool example/exampledocs/solr-word.pdf
> >>> SimplePostTool version 5.0.0
> >>> Posting files to [base] url
> >>> http://localhost:8983/solr/gettingstarted/update?literal.id=doc1...
> >>> Entering auto mode. File endings considered are
> >>> xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
> >>> POSTing file solr-word.pdf (application/pdf) to [base]/extract
> >>> SimplePostTool: WARNING: Solr returned an error #404 (Not Found) for
> >>> url:
> >>> http://localhost:8983/solr/gettingstarted/update/extract?literal.id=doc1&resource.name=%2F[…]%2Fsolr-8.7.0%2Fexample%2Fexampledocs%2Fsolr-word.pdf
> >>> SimplePostTool: WARNING: Response: <html>
> >>> <head>
> >>> <meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
> >>> <title>Error 404 Not Found</title>
> >>> </head>
> >>> <body><h2>HTTP ERROR 404 Not Found</h2>
> >>> <table>
> >>> <tr><th>URI:</th><td>/solr/gettingstarted/update/extract</td></tr>
> >>> <tr><th>STATUS:</th><td>404</td></tr>
> >>> <tr><th>MESSAGE:</th><td>Not Found</td></tr>
> >>> <tr><th>SERVLET:</th><td>default</td></tr>
> >>> </table>
> >>>
> >>> </body>
> >>> </html>
> >>> SimplePostTool: WARNING: IOException while reading response:
> >>> java.io.FileNotFoundException:
> >>> http://localhost:8983/solr/gettingstarted/update/extract?literal.id=doc1&resource.name=%2F[…]%2Fsolr-8.7.0%2Fexample%2Fexampledocs%2Fsolr-word.pdf
> >>>
> >>> 1 files indexed.
> >>> COMMITting Solr index changes to
> >>> http://localhost:8983/solr/gettingstarted/update?literal.id=doc1...
> >>> Time spent: 0:00:00.038
> >> resulting in no actual changes being visible in Solr.
> >>
> >>
> >> Using curl results in the same HTTP response:
> >>
> >>> curl
> >>> 'http://localhost:8983/solr/gettingstarted/update/extract?literal.id=doc1&commit=true'
> >>> -F "myfile=@example/exampledocs/solr-word.pdf"
> >>> <html>
> >>> <head>
> >>> <meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
> >>> <title>Error 404 Not Found</title>
> >>> </head>
> >>> <body><h2>HTTP ERROR 404 Not Found</h2>
> >>> <table>
> >>> <tr><th>URI:</th><td>/solr/gettingstarted/update/extract</td></tr>
> >>> <tr><th>STATUS:</th><td>404</td></tr>
> >>> <tr><th>MESSAGE:</th><td>Not Found</td></tr>
> >>> <tr><th>SERVLET:</th><td>default</td></tr>
> >>> </table>
> >>>
> >>> </body>
> >>> </html>
> >>>
> >> Sorry if this has already been discussed somewhere; I have not been able
> >> to find anything helpful yet.
> >>
> >> Thank you!
> >>
> >> Leon
> >>
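
The thread traces the 404 to a solrconfig.xml that lacks the Solr Cell
setup: the /update/extract handler plus the libraries it needs. As a rough
sketch only, the check below looks for both pieces in a config. The function
name check_solr_cell and the sample config are hypothetical; the sample is
modeled on the snippet in the Solr Cell page of the reference guide, and the
rule that an extraction library has "extraction" in its dir attribute is an
assumption, not something Solr itself defines.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only, modeled on the Solr Cell section of the
# reference guide. A real solrconfig.xml contains far more than this.
SAMPLE_CONFIG = r"""
<config>
  <lib dir="${solr.install.dir:../../..}/contrib/extraction/lib" regex=".*\.jar" />
  <lib dir="${solr.install.dir:../../..}/dist/" regex="solr-cell-\d.*\.jar" />
  <requestHandler name="/update/extract"
                  startup="lazy"
                  class="solr.extraction.ExtractingRequestHandler" />
</config>
"""


def check_solr_cell(config_xml: str) -> dict:
    """Report whether a config defines the extract handler and its libraries.

    Hypothetical helper: it inspects the XML text only and does not talk
    to a running Solr instance.
    """
    root = ET.fromstring(config_xml)

    # The handler must be registered at the path the Post Tool and curl use.
    has_handler = any(
        handler.get("name") == "/update/extract"
        and "ExtractingRequestHandler" in (handler.get("class") or "")
        for handler in root.iter("requestHandler")
    )

    # Assumption: a <lib> whose dir mentions "extraction" loads the Tika jars.
    has_libs = any(
        "extraction" in (lib.get("dir") or "")
        for lib in root.iter("lib")
    )

    return {"handler": has_handler, "libs": has_libs}


if __name__ == "__main__":
    print(check_solr_cell(SAMPLE_CONFIG))
    print(check_solr_cell("<config/>"))
```

A config that reports False for "handler" would be expected to answer
requests to /update/extract with the 404 shown above, as the 'schemaless'
example did in this thread.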
