I want to use Solr to index a markdown website. The files
are in native markdown, but they are served as HTML (by markserv).
Here's what I did:
docker run --name solr -d -p 8983:8983 -t solr
docker exec -it --user=solr solr bin/solr create_core -c handbook
Then, to crawl the site:
quadra[git:m
Amrit Sarkar wrote:
>> Kevin,
>>
>> You are getting NPE at:
>>
>> String type = rawContentType.split(";")[0]; //HERE - rawContentType is NULL
>>
>> // related code
>>
>> String rawContentType = conn.getContentType();
>>
>> public String getContentType() {
>> return getHeaderField("content
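[Note: the NPE described above comes from calling split() on a null Content-Type. Here is a minimal sketch of a null-safe version of that step. The class ContentTypeSketch and the method baseType are hypothetical names used only for illustration; they are not part of SimplePostTool, and this is not a patch to it.]

```java
import java.util.Locale;

final class ContentTypeSketch {
    // Hypothetical helper (not part of SimplePostTool): returns the media type
    // without its parameters, or null when no Content-Type header was sent.
    static String baseType(String rawContentType) {
        if (rawContentType == null) {
            // URLConnection.getContentType() returns null when the type is not known
            return null;
        }
        // "text/html; encoding=utf-8" -> "text/html"
        return rawContentType.split(";")[0].trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(baseType("text/html; encoding=utf-8"));
        System.out.println(baseType(null));
    }
}
```

[With a guard like this, a server that omits the header yields a null type that the caller can skip or report, instead of a NullPointerException.]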
OK, so I hacked markserv to add Content-Type text/html, but now I get
SimplePostTool: WARNING: Skipping URL with unsupported type text/html
What is it expecting?
$ docker exec -it --user=solr solr bin/post -c handbook \
    http://quadra:9091/index.md -recursive 10 -delay 0 -filetypes md
/docker-java
Kevin
>>
>> Amrit Sarkar
>> Search Engineer
>> Lucidworks, Inc.
>> 415-589-9269
>> www.lucidworks.com
>> Twitter http://twitter.com/lucidworks
>> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>>
>> On Fri, Oct 13, 2017 at
rted.
>> >
>> >
>> > Amrit Sarkar
>>
>> On Fri, Oct 13, 2017 at 6:56 PM, Amrit Sarkar wrote:
>>
>> > Ah!
>> >
>> > Only supported type is: text/html; encoding=utf-8
>> >
>> > I am not confident of this either :) but this should work.
>> >
>> > See the
>>
>> On Fri, Oct 13, 2017 at 7:42 PM, Kevin Layer wrote:
>>
>> > Amrit Sarkar wrote:
>> >
>> > >> Kevin,
>
:38.861 INFO (qtp1911006827-14) [ ] o.a.s.s.HttpSolrCall
[admin] webapp=null path=/admin/info/logging
params={wt=json&_=1507905257696&since=0} status=0 QTime=14
2017-10-13 14:49:48.853 INFO (qtp1911006827-18) [ ] o.a.s.s.HttpSolrCall
[admin] webapp=null path=/admin/info/logging
par
md" back to the regular output that doesn't scan
anything.
>>
>> If you get past this hurdle, let me know.
>>
>> Amrit Sarkar
.java:339)
>> > at javax.xml.parsers.DocumentBuilder.parse(
>> > DocumentBuilder.java:121)
>> > at org.apache.solr.util.SimplePostTool.makeDom(
>> > SimplePostTool.java:1061)
>> > at org.apache.solr.util.SimplePostTool$PageFetcher.
>> >