Ravisher Singh created SOLR-14959:
-------------------------------------

             Summary: Getting an error trying to web crawl a website
                 Key: SOLR-14959
                 URL: https://issues.apache.org/jira/browse/SOLR-14959
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
          Components: website
    Affects Versions: 8.6.3
         Environment: OS: Mac

 
            Reporter: Ravisher Singh


Hi,

I am getting the following error when trying to crawl a website. Please point 
me in the right direction.

Ravishers-MacBook-Air:solr-8.6.3 ravishersingh$ bin/post -c solrhelp -filetypes html https://factorpad.com/tech/solr/index.html

java -classpath /Users/ravishersingh/desktop/solr-8.6.3/dist/solr-core-8.6.3.jar -Dauto=yes -Dfiletypes=html -Dc=solrhelp -Ddata=web org.apache.solr.util.SimplePostTool https://factorpad.com/tech/solr/index.html

SimplePostTool version 5.0.0

Posting web pages to Solr url http://localhost:8983/solr/solrhelp/update/extract

Entering auto mode. Indexing pages with content-types corresponding to file endings html

Entering crawl at level 0 (1 links total, 1 new)

SimplePostTool: WARNING: Solr returned an error #404 (Not Found) for url: 
http://localhost:8983/solr/solrhelp/update/extract?literal.id=https%3A%2F%2Ffactorpad.com%2Ftech%2Fsolr%2Findex.html&literal.url=https%3A%2F%2Ffactorpad.com%2Ftech%2Fsolr%2Findex.html

SimplePostTool: WARNING: Response: <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 404 Not Found</title>
</head>
<body><h2>HTTP ERROR 404 Not Found</h2>
<table>
<tr><th>URI:</th><td>/solr/solrhelp/update/extract</td></tr>
<tr><th>STATUS:</th><td>404</td></tr>
<tr><th>MESSAGE:</th><td>Not Found</td></tr>
<tr><th>SERVLET:</th><td>default</td></tr>
</table>
</body>
</html>

SimplePostTool: WARNING: IOException while reading response: 
java.io.FileNotFoundException: 
http://localhost:8983/solr/solrhelp/update/extract?literal.id=https%3A%2F%2Ffactorpad.com%2Ftech%2Fsolr%2Findex.html&literal.url=https%3A%2F%2Ffactorpad.com%2Ftech%2Fsolr%2Findex.html

SimplePostTool: WARNING: An error occurred while posting 
https://factorpad.com/tech/solr/index.html

0 web pages indexed.

COMMITting Solr index changes to 
http://localhost:8983/solr/solrhelp/update/extract...

SimplePostTool: WARNING: Solr returned an error #404 (Not Found) for url: 
http://localhost:8983/solr/solrhelp/update/extract?commit=true

SimplePostTool: WARNING: Response: <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 404 Not Found</title>
</head>
<body><h2>HTTP ERROR 404 Not Found</h2>
<table>
<tr><th>URI:</th><td>/solr/solrhelp/update/extract</td></tr>
<tr><th>STATUS:</th><td>404</td></tr>
<tr><th>MESSAGE:</th><td>Not Found</td></tr>
<tr><th>SERVLET:</th><td>default</td></tr>
</table>
</body>
</html>

Time spent: 0:00:01.356
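
For readability, the percent-encoded request URL from the first warning can be decoded with a few lines of Python. This is only a sketch for reading the log: the helper name describe_request is made up for illustration, and nothing here talks to Solr. It shows that the 404 was returned for the /update/extract path on the solrhelp collection, with the crawled page passed as the literal.id and literal.url parameters.

```python
from urllib.parse import urlsplit, parse_qs

def describe_request(url):
    """Split a request URL into its path and decoded query parameters.

    Hypothetical helper for reading the log above; not part of Solr.
    """
    parts = urlsplit(url)
    # parse_qs percent-decodes values and returns a list per key;
    # each key appears once in this URL, so keep the first value.
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return parts.path, params

# URL copied verbatim from the first WARNING line in the log.
failing_url = (
    "http://localhost:8983/solr/solrhelp/update/extract"
    "?literal.id=https%3A%2F%2Ffactorpad.com%2Ftech%2Fsolr%2Findex.html"
    "&literal.url=https%3A%2F%2Ffactorpad.com%2Ftech%2Fsolr%2Findex.html"
)

path, params = describe_request(failing_url)
print(path)
for name, value in params.items():
    print(name, "=", value)
```

The path printed is the URI reported in the 404 response table, so the "Not Found" refers to the Solr endpoint itself rather than to the page being crawled.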



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
