I want to use solr to index a markdown website.  The files
are in native markdown, but they are served in HTML (by markserv).

Here's what I did:

docker run --name solr -d -p 8983:8983 -t solr
docker exec -it --user=solr solr bin/solr create_core -c handbook

Then, to crawl the site:

quadra[git:master]$ docker exec -it --user=solr solr bin/post -c handbook 
http://quadra.franz.com:9091/index.md -recursive 10 -delay 0 -filetypes md
/docker-java-home/jre/bin/java -classpath /opt/solr/dist/solr-core-7.0.1.jar 
-Dauto=yes -Drecursive=10 -Ddelay=0 -Dfiletypes=md -Dc=handbook -Ddata=web 
org.apache.solr.util.SimplePostTool http://quadra.franz.com:9091/index.md
SimplePostTool version 5.0.0
Posting web pages to Solr url http://localhost:8983/solr/handbook/update/extract
Entering auto mode. Indexing pages with content-types corresponding to file 
endings md
SimplePostTool: WARNING: Never crawl an external web site faster than every 10 
seconds, your IP will probably be blocked
Entering recursive mode, depth=10, delay=0s
Entering crawl at level 0 (1 links total, 1 new)
Exception in thread "main" java.lang.NullPointerException
        at 
org.apache.solr.util.SimplePostTool$PageFetcher.readPageFromUrl(SimplePostTool.java:1138)
        at org.apache.solr.util.SimplePostTool.webCrawl(SimplePostTool.java:603)
        at 
org.apache.solr.util.SimplePostTool.postWebPages(SimplePostTool.java:563)
        at 
org.apache.solr.util.SimplePostTool.doWebMode(SimplePostTool.java:365)
        at org.apache.solr.util.SimplePostTool.execute(SimplePostTool.java:187)
        at org.apache.solr.util.SimplePostTool.main(SimplePostTool.java:172)
quadra[git:master]$ 


Any ideas on what I did wrong?

Thanks.

Kevin

Reply via email to