Ravisher Singh created SOLR-14959:
-------------------------------------

             Summary: Getting an error trying to web crawl a website
                 Key: SOLR-14959
                 URL: https://issues.apache.org/jira/browse/SOLR-14959
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
          Components: website
    Affects Versions: 8.6.3
         Environment: OS: Mac
            Reporter: Ravisher Singh


Hi,

I am getting the following error when trying to crawl a website. Please point me in the right direction.

Ravishers-MacBook-Air:solr-8.6.3 ravishersingh$ bin/post -c solrhelp -filetypes html https://factorpad.com/tech/solr/index.html
java -classpath /Users/ravishersingh/desktop/solr-8.6.3/dist/solr-core-8.6.3.jar -Dauto=yes -Dfiletypes=html -Dc=solrhelp -Ddata=web org.apache.solr.util.SimplePostTool https://factorpad.com/tech/solr/index.html
SimplePostTool version 5.0.0
Posting web pages to Solr url http://localhost:8983/solr/solrhelp/update/extract
Entering auto mode. Indexing pages with content-types corresponding to file endings html
Entering crawl at level 0 (1 links total, 1 new)
SimplePostTool: WARNING: Solr returned an error #404 (Not Found) for url: http://localhost:8983/solr/solrhelp/update/extract?literal.id=https%3A%2F%2Ffactorpad.com%2Ftech%2Fsolr%2Findex.html&literal.url=https%3A%2F%2Ffactorpad.com%2Ftech%2Fsolr%2Findex.html
SimplePostTool: WARNING: Response: <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 404 Not Found</title>
</head>
<body><h2>HTTP ERROR 404 Not Found</h2>
<table>
<tr><th>URI:</th><td>/solr/solrhelp/update/extract</td></tr>
<tr><th>STATUS:</th><td>404</td></tr>
<tr><th>MESSAGE:</th><td>Not Found</td></tr>
<tr><th>SERVLET:</th><td>default</td></tr>
</table>

</body>
</html>
SimplePostTool: WARNING: IOException while reading response: java.io.FileNotFoundException: http://localhost:8983/solr/solrhelp/update/extract?literal.id=https%3A%2F%2Ffactorpad.com%2Ftech%2Fsolr%2Findex.html&literal.url=https%3A%2F%2Ffactorpad.com%2Ftech%2Fsolr%2Findex.html
SimplePostTool: WARNING: An error occurred while posting https://factorpad.com/tech/solr/index.html
0 web pages indexed.
COMMITting Solr index changes to http://localhost:8983/solr/solrhelp/update/extract...
SimplePostTool: WARNING: Solr returned an error #404 (Not Found) for url: http://localhost:8983/solr/solrhelp/update/extract?commit=true
SimplePostTool: WARNING: Response: <html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
<title>Error 404 Not Found</title>
</head>
<body><h2>HTTP ERROR 404 Not Found</h2>
<table>
<tr><th>URI:</th><td>/solr/solrhelp/update/extract</td></tr>
<tr><th>STATUS:</th><td>404</td></tr>
<tr><th>MESSAGE:</th><td>Not Found</td></tr>
<tr><th>SERVLET:</th><td>default</td></tr>
</table>

</body>
</html>
Time spent: 0:00:01.356



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org