solr 7.0.1: exception running post to crawl simple website

2017-10-11 Thread Kevin Layer
I want to use solr to index a markdown website. The files are in native markdown, but they are served in HTML (by markserv). Here's what I did:
  docker run --name solr -d -p 8983:8983 -t solr
  docker exec -it --user=solr solr bin/solr create_core -c handbook
Then, to crawl the site:
  quadra[git:m

Re: solr 7.0.1: exception running post to crawl simple website

2017-10-13 Thread Kevin Layer
Amrit Sarkar wrote:
>> Kevin,
>>
>> You are getting NPE at:
>>
>> String type = rawContentType.split(";")[0]; //HERE - rawContentType is NULL
>>
>> // related code
>>
>> String rawContentType = conn.getContentType();
>>
>> public String getContentType() {
>>     return getHeaderField("content
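The failure mode quoted above can be reproduced in isolation: `URLConnection.getContentType()` returns null when the server sends no Content-Type header, so calling `split` on the result throws. The following is a minimal sketch of a null-safe version of that parsing step. The class name `ContentTypeGuard` and the method `baseType` are hypothetical names chosen for illustration and are not part of SimplePostTool.

```java
import java.util.Locale;

public class ContentTypeGuard {
    /**
     * Returns the bare MIME type without parameters such as charset,
     * or null when the header value is absent (hypothetical helper).
     */
    public static String baseType(String rawContentType) {
        if (rawContentType == null) {
            // No Content-Type header was sent; report "unknown" instead of throwing.
            return null;
        }
        return rawContentType.split(";")[0].trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(baseType("text/html; charset=utf-8")); // prints text/html
        System.out.println(baseType(null));                       // prints null
    }
}
```

A caller would then treat a null result as "type unknown" and skip or log the URL rather than crash.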

Re: solr 7.0.1: exception running post to crawl simple website

2017-10-13 Thread Kevin Layer
OK, so I hacked markserv to add Content-Type text/html, but now I get
  SimplePostTool: WARNING: Skipping URL with unsupported type text/html
What is it expecting?
  $ docker exec -it --user=solr solr bin/post -c handbook http://quadra:9091/index.md -recursive 10 -delay 0 -filetypes md
  /docker-java
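One possible reading of this warning, offered as an assumption rather than a confirmed account of SimplePostTool's internals: if a crawler decides whether to index a page by mapping the response's MIME type back to a file extension and checking that extension against the `-filetypes` list, then a page served as text/html is rejected when only `md` is allowed. The sketch below illustrates that kind of check. The class `FileTypeFilter`, the method `typeAllowed`, and the extension table are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class FileTypeFilter {
    // Hypothetical extension -> MIME type table; a real tool's table may differ.
    private static final Map<String, String> MIME_BY_EXT = new HashMap<>();
    static {
        MIME_BY_EXT.put("html", "text/html");
        MIME_BY_EXT.put("htm", "text/html");
        MIME_BY_EXT.put("txt", "text/plain");
    }

    /** True if some allowed extension maps to the given MIME type. */
    public static boolean typeAllowed(String mimeType, Set<String> allowedExts) {
        for (Map.Entry<String, String> e : MIME_BY_EXT.entrySet()) {
            if (e.getValue().equals(mimeType) && allowedExts.contains(e.getKey())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> onlyMd = new HashSet<>(Arrays.asList("md"));
        Set<String> withHtml = new HashSet<>(Arrays.asList("md", "html"));
        System.out.println(typeAllowed("text/html", onlyMd));   // prints false
        System.out.println(typeAllowed("text/html", withHtml)); // prints true
    }
}
```

Under that assumption, allowing `html` in the file type list (or not restricting file types at all) would be the thing to try, since the server delivers HTML regardless of the `.md` URL suffix.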

Re: solr 7.0.1: exception running post to crawl simple website

2017-10-13 Thread Kevin Layer
Kevin
>>
>> Amrit Sarkar
>> Search Engineer
>> Lucidworks, Inc.
>> 415-589-9269
>> www.lucidworks.com
>> Twitter http://twitter.com/lucidworks
>> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>>
>> On Fri, Oct 13, 2017 at

Re: solr 7.0.1: exception running post to crawl simple website

2017-10-13 Thread Kevin Layer
rted.
>> >
>> >
>> > Amrit Sarkar
>> > Search Engineer
>> > Lucidworks, Inc.
>> > 415-589-9269
>> > www.lucidworks.com
>> > Twitter http://twitter.com/lucidworks
>> > LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>>

Re: solr 7.0.1: exception running post to crawl simple website

2017-10-13 Thread Kevin Layer
Fri, Oct 13, 2017 at 6:56 PM, Amrit Sarkar
>> wrote:
>>
>> > Ah!
>> >
>> > Only supported type is: text/html; encoding=utf-8
>> >
>> > I am not confident of this either :) but this should work.
>> >
>> > See the

Re: solr 7.0.1: exception running post to crawl simple website

2017-10-13 Thread Kevin Layer
ucidworks, Inc.
>> 415-589-9269
>> www.lucidworks.com
>> Twitter http://twitter.com/lucidworks
>> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>>
>> On Fri, Oct 13, 2017 at 7:42 PM, Kevin Layer wrote:
>>
>> > Amrit Sarkar wrote:
>> >

Re: solr 7.0.1: exception running post to crawl simple website

2017-10-13 Thread Kevin Layer
89-9269
>> www.lucidworks.com
>> Twitter http://twitter.com/lucidworks
>> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>>
>> On Fri, Oct 13, 2017 at 7:42 PM, Kevin Layer wrote:
>>
>> > Amrit Sarkar wrote:
>> >
>> > >> Kevin,
>

Re: solr 7.0.1: exception running post to crawl simple website

2017-10-13 Thread Kevin Layer
:38.861 INFO (qtp1911006827-14) [ ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1507905257696&since=0} status=0 QTime=14
2017-10-13 14:49:48.853 INFO (qtp1911006827-18) [ ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging par

Re: solr 7.0.1: exception running post to crawl simple website

2017-10-13 Thread Kevin Layer
md" back to the regular output that doesn't scan anything.
>>
>> If you get past this hurdle, let me know.
>>
>> Amrit Sarkar
>> Search Engineer
>> Lucidworks, Inc.
>> 415-589-9269
>> www.lucidworks.com
>> Twitter http:

Re: solr 7.0.1: exception running post to crawl simple website

2017-10-13 Thread Kevin Layer
.java:339)
>> > at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
>> > at org.apache.solr.util.SimplePostTool.makeDom(SimplePostTool.java:1061)
>> > at org.apache.solr.util.SimplePostTool$PageFetcher.
>> >