Strange. Can you set the Content-Type to "text/html;charset=utf-8"? That is the Content-Type the wiki.apache.org pages send. Let's see what it says now.
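
If patching markserv is awkward, a throwaway server is one way to test the header. The following is only a rough sketch, not markserv's code: it uses Python's standard http.server, and the names MarkdownAsHtmlHandler and serve are made up for illustration. It serves the raw .md files from the current directory (it does not render Markdown) with the Content-Type suggested above; port 9091 is taken from the command in the quoted message.

```python
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

# The header value suggested in this thread.
MD_CONTENT_TYPE = "text/html;charset=utf-8"


class MarkdownAsHtmlHandler(SimpleHTTPRequestHandler):
    """Hypothetical handler: serves files from the current directory and
    reports .md files with the Content-Type above. It does NOT render
    Markdown to HTML; the file bytes are sent unchanged."""

    def guess_type(self, path):
        # SimpleHTTPRequestHandler calls guess_type() to pick the
        # Content-Type header for each file it serves.
        if str(path).lower().endswith(".md"):
            return MD_CONTENT_TYPE
        return super().guess_type(path)


def serve(port=9091):
    # Port 9091 matches the URL in the quoted command; change as needed.
    ThreadingHTTPServer(("", port), MarkdownAsHtmlHandler).serve_forever()
```

Calling serve() from the directory that holds index.md and re-running the same bin/post command should show whether the warning changes with the charset parameter present.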

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Fri, Oct 13, 2017 at 6:44 PM, Kevin Layer <la...@franz.com> wrote:

> OK, so I hacked markserv to add Content-Type text/html, but now I get
>
> SimplePostTool: WARNING: Skipping URL with unsupported type text/html
>
> What is it expecting?
>
> $ docker exec -it --user=solr solr bin/post -c handbook
> http://quadra:9091/index.md -recursive 10 -delay 0 -filetypes md
> /docker-java-home/jre/bin/java -classpath /opt/solr/dist/solr-core-7.0.1.jar
> -Dauto=yes -Drecursive=10 -Ddelay=0 -Dfiletypes=md -Dc=handbook -Ddata=web
> org.apache.solr.util.SimplePostTool http://quadra:9091/index.md
> SimplePostTool version 5.0.0
> Posting web pages to Solr url http://localhost:8983/solr/
> handbook/update/extract
> Entering auto mode. Indexing pages with content-types corresponding to
> file endings md
> SimplePostTool: WARNING: Never crawl an external web site faster than
> every 10 seconds, your IP will probably be blocked
> Entering recursive mode, depth=10, delay=0s
> Entering crawl at level 0 (1 links total, 1 new)
> SimplePostTool: WARNING: Skipping URL with unsupported type text/html
> SimplePostTool: WARNING: The URL http://quadra:9091/index.md returned a
> HTTP result status of 415
> 0 web pages indexed.
> COMMITting Solr index changes to http://localhost:8983/solr/
> handbook/update/extract...
> Time spent: 0:00:03.882
> $
>
> Thanks.
>
> Kevin
>