Ah, Docker. The log is placed under [solr-home]/server/logs/solr.log on the machine. I haven't played much with Docker; any way you can get that file from that location is fine.
Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Fri, Oct 13, 2017 at 8:08 PM, Kevin Layer <la...@franz.com> wrote:

> Amrit Sarkar wrote:
>
> >> Hi Kevin,
> >>
> >> Can you post the solr log in the mail thread. I don't think it handled the
> >> .md by itself by first glance at code.
>
> How do I extract the log you want?
>
> >> On Fri, Oct 13, 2017 at 7:42 PM, Kevin Layer <la...@franz.com> wrote:
> >>
> >> > Amrit Sarkar wrote:
> >> >
> >> > >> Kevin,
> >> > >>
> >> > >> Just put "html" too and give it a shot. These are the types it is expecting:
> >> >
> >> > Same thing.
> >> >
> >> > >> mimeMap = new HashMap<>();
> >> > >> mimeMap.put("xml", "application/xml");
> >> > >> mimeMap.put("csv", "text/csv");
> >> > >> mimeMap.put("json", "application/json");
> >> > >> mimeMap.put("jsonl", "application/json");
> >> > >> mimeMap.put("pdf", "application/pdf");
> >> > >> mimeMap.put("rtf", "text/rtf");
> >> > >> mimeMap.put("html", "text/html");
> >> > >> mimeMap.put("htm", "text/html");
> >> > >> mimeMap.put("doc", "application/msword");
> >> > >> mimeMap.put("docx", "application/vnd.openxmlformats-officedocument.wordprocessingml.document");
> >> > >> mimeMap.put("ppt", "application/vnd.ms-powerpoint");
> >> > >> mimeMap.put("pptx", "application/vnd.openxmlformats-officedocument.presentationml.presentation");
> >> > >> mimeMap.put("xls", "application/vnd.ms-excel");
> >> > >> mimeMap.put("xlsx", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
> >> > >> mimeMap.put("odt", "application/vnd.oasis.opendocument.text");
> >> > >> mimeMap.put("ott", "application/vnd.oasis.opendocument.text");
> >> > >> mimeMap.put("odp", "application/vnd.oasis.opendocument.presentation");
> >> > >> mimeMap.put("otp", "application/vnd.oasis.opendocument.presentation");
> >> > >> mimeMap.put("ods", "application/vnd.oasis.opendocument.spreadsheet");
> >> > >> mimeMap.put("ots", "application/vnd.oasis.opendocument.spreadsheet");
> >> > >> mimeMap.put("txt", "text/plain");
> >> > >> mimeMap.put("log", "text/plain");
> >> > >>
> >> > >> The keys are the types supported.
> >> > >>
> >> > >> On Fri, Oct 13, 2017 at 6:56 PM, Amrit Sarkar <sarkaramr...@gmail.com> wrote:
> >> > >>
> >> > >> > Ah!
> >> > >> >
> >> > >> > Only supported type is: text/html; encoding=utf-8
> >> > >> >
> >> > >> > I am not confident of this either :) but this should work.
> >> > >> >
> >> > >> > See the code-snippet below:
> >> > >> >
> >> > >> > ......
> >> > >> >
> >> > >> > if(res.httpStatus == 200) {
> >> > >> >   // Raw content type of form "text/html; encoding=utf-8"
> >> > >> >   String rawContentType = conn.getContentType();
> >> > >> >   String type = rawContentType.split(";")[0];
> >> > >> >   if(typeSupported(type) || "*".equals(fileTypes)) {
> >> > >> >     String encoding = conn.getContentEncoding();
> >> > >> >
> >> > >> > ....
> >> > >> >
> >> > >> > On Fri, Oct 13, 2017 at 6:51 PM, Kevin Layer <la...@franz.com> wrote:
> >> > >> >
> >> > >> >> Amrit Sarkar wrote:
> >> > >> >>
> >> > >> >> >> Strange,
> >> > >> >> >>
> >> > >> >> >> Can you add: "text/html;charset=utf-8". This is wiki.apache.org page's
> >> > >> >> >> Content-Type. Let's see what it says now.
> >> > >> >>
> >> > >> >> Same thing. Verified Content-Type:
> >> > >> >>
> >> > >> >> quadra[git:master]$ wget -S -O /dev/null http://quadra:9091/index.md |& grep Content-Type
> >> > >> >> Content-Type: text/html;charset=utf-8
> >> > >> >> quadra[git:master]$ ]
> >> > >> >>
> >> > >> >> quadra[git:master]$ docker exec -it --user=solr solr bin/post -c handbook http://quadra:9091/index.md -recursive 10 -delay 0 -filetypes md
> >> > >> >> /docker-java-home/jre/bin/java -classpath /opt/solr/dist/solr-core-7.0.1.jar -Dauto=yes -Drecursive=10 -Ddelay=0 -Dfiletypes=md -Dc=handbook -Ddata=web org.apache.solr.util.SimplePostTool http://quadra:9091/index.md
> >> > >> >> SimplePostTool version 5.0.0
> >> > >> >> Posting web pages to Solr url http://localhost:8983/solr/handbook/update/extract
> >> > >> >> Entering auto mode. Indexing pages with content-types corresponding to file endings md
> >> > >> >> SimplePostTool: WARNING: Never crawl an external web site faster than every 10 seconds, your IP will probably be blocked
> >> > >> >> Entering recursive mode, depth=10, delay=0s
> >> > >> >> Entering crawl at level 0 (1 links total, 1 new)
> >> > >> >> SimplePostTool: WARNING: Skipping URL with unsupported type text/html
> >> > >> >> SimplePostTool: WARNING: The URL http://quadra:9091/index.md returned a HTTP result status of 415
> >> > >> >> 0 web pages indexed.
> >> > >> >> COMMITting Solr index changes to http://localhost:8983/solr/handbook/update/extract...
> >> > >> >> Time spent: 0:00:00.531
> >> > >> >> quadra[git:master]$
> >> > >> >>
> >> > >> >> Kevin
> >> > >> >>
> >> > >> >> >> On Fri, Oct 13, 2017 at 6:44 PM, Kevin Layer <la...@franz.com> wrote:
> >> > >> >> >>
> >> > >> >> >> > OK, so I hacked markserv to add Content-Type text/html, but now I get
> >> > >> >> >> >
> >> > >> >> >> > SimplePostTool: WARNING: Skipping URL with unsupported type text/html
> >> > >> >> >> >
> >> > >> >> >> > What is it expecting?
> >> > >> >> >> >
> >> > >> >> >> > $ docker exec -it --user=solr solr bin/post -c handbook http://quadra:9091/index.md -recursive 10 -delay 0 -filetypes md
> >> > >> >> >> > /docker-java-home/jre/bin/java -classpath /opt/solr/dist/solr-core-7.0.1.jar -Dauto=yes -Drecursive=10 -Ddelay=0 -Dfiletypes=md -Dc=handbook -Ddata=web org.apache.solr.util.SimplePostTool http://quadra:9091/index.md
> >> > >> >> >> > SimplePostTool version 5.0.0
> >> > >> >> >> > Posting web pages to Solr url http://localhost:8983/solr/handbook/update/extract
> >> > >> >> >> > Entering auto mode. Indexing pages with content-types corresponding to file endings md
> >> > >> >> >> > SimplePostTool: WARNING: Never crawl an external web site faster than every 10 seconds, your IP will probably be blocked
> >> > >> >> >> > Entering recursive mode, depth=10, delay=0s
> >> > >> >> >> > Entering crawl at level 0 (1 links total, 1 new)
> >> > >> >> >> > SimplePostTool: WARNING: Skipping URL with unsupported type text/html
> >> > >> >> >> > SimplePostTool: WARNING: The URL http://quadra:9091/index.md returned a HTTP result status of 415
> >> > >> >> >> > 0 web pages indexed.
> >> > >> >> >> > COMMITting Solr index changes to http://localhost:8983/solr/handbook/update/extract...
> >> > >> >> >> > Time spent: 0:00:03.882
> >> > >> >> >> > $
> >> > >> >> >> >
> >> > >> >> >> > Thanks.
> >> > >> >> >> >
> >> > >> >> >> > Kevin
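
For anyone following the thread, here is a rough, stand-alone sketch of how the quoted type check might behave. It is not the actual SimplePostTool source: the class name `MimeCheckSketch`, the helper `baseType`, and the two-argument `typeSupported` are hypothetical. It assumes a content type is accepted only when one of the extensions passed to -filetypes is a key in mimeMap that maps to that type. Under that assumption, "md" can never match, because the quoted map has no "md" key.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone sketch; NOT the real SimplePostTool code.
final class MimeCheckSketch {
    // A small subset of the extension -> MIME type map quoted in the thread.
    static final Map<String, String> MIME_MAP = new HashMap<>();
    static {
        MIME_MAP.put("html", "text/html");
        MIME_MAP.put("htm", "text/html");
        MIME_MAP.put("txt", "text/plain");
        MIME_MAP.put("pdf", "application/pdf");
    }

    // Drop parameters such as ";charset=utf-8" from a raw Content-Type header.
    static String baseType(String rawContentType) {
        return rawContentType.split(";")[0].trim();
    }

    // Assumed rule: accept the type only if some extension listed in
    // fileTypes (comma-separated) is a key in MIME_MAP that maps to it.
    static boolean typeSupported(String type, String fileTypes) {
        List<String> wanted = Arrays.asList(fileTypes.split(","));
        for (Map.Entry<String, String> e : MIME_MAP.entrySet()) {
            if (e.getValue().equals(type) && wanted.contains(e.getKey())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String type = baseType("text/html;charset=utf-8");
        System.out.println(type + " with -filetypes md: " + typeSupported(type, "md"));
        System.out.println(type + " with -filetypes html: " + typeSupported(type, "html"));
    }
}
```

In this sketch, "text/html" is rejected for -filetypes md and accepted for -filetypes html. Kevin reports "Same thing" after adding html, though, so the real tool may apply further checks that this sketch does not model.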