Toby,
Your mention of "-recursive" causing a problem reminded me of a simple
crawl (of the 7.0 Ref Guide) with bin/post that I tried to get working
the other day and couldn't.
The order of the parameters seems to change which error
you get (this is with 7.1):
1. "./bin/post -c g
Amrit Sarkar wrote:
> The above is a SAXParseException raised at runtime. Nothing can be done on the Solr end
> except curating your own data.
I'm trying to replace a solr-4.6.0 system (which has been working
brilliantly for 3 years!) with solr-7.1.0. I'm running into this exact same
problem.
I do not believe i
On 2017-10-13 04:19 PM, Kevin Layer wrote:
Amrit Sarkar wrote:
>> Kevin,
>>
>> fileType md is not a recognized format in SimplePostTool; anyway, moving
>> on.
OK, thanks. Looks like I'll have to abandon using Solr for this
project (or find another way to crawl the site).
Thank you for all the help, though. I appreciate it.
>> The
Kevin,
fileType md is not a recognized format in SimplePostTool; anyway, moving
on.
The above is a SAXParseException raised at runtime. Nothing can be done on the Solr end
except curating your own data.
Some helpful links:
https://stackoverflow.com/questions/2599919/java-parsing-xml-document-gives-content-n
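To illustrate why curating the data is the only fix, here is a minimal Java sketch (not Solr or SimplePostTool code) that hands Markdown text to the JDK's built-in XML parser. Text that is not well-formed markup fails with a SAXParseException no matter how the tool is invoked. The class XmlParseCheck and method parseError are hypothetical names, and this shows only the general failure mode, not where Solr actually parses the content.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: does this text parse as XML, and if not, why?
final class XmlParseCheck {
    // Returns null when the text is well-formed XML, otherwise a description of the error.
    static String parseError(String text) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
            return null;
        } catch (SAXParseException e) {
            // The parse exception carries the position of the offending content.
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber() + ": " + e.getMessage();
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return e.toString();
        }
    }
}
```

For example, parseError("# Handbook") reports an error at line 1, while parseError("<doc/>") returns null.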
Amrit Sarkar wrote:
>> Kevin,
>>
>> I am not able to replicate the issue on my system, which is a bit annoying
>> for me. Try this out one last time:
>>
>> docker exec -it --user=solr solr bin/post -c handbook http://quadra.franz.com:9091/index.md -recursive 10 -delay 0 -filetypes html
>>
>> a
Kevin,
I am not able to replicate the issue on my system, which is a bit annoying
for me. Try this out one last time:
docker exec -it --user=solr solr bin/post -c handbook http://quadra.franz.com:9091/index.md -recursive 10 -delay 0 -filetypes html
and have Content-Type: "html" and "text/html", tr
Amrit Sarkar wrote:
>> Ah, Docker. They are placed under [solr-home]/server/log/solr/log on
>> the machine. I haven't played much with Docker; is there any way you can get that
>> file from that location?
I see these files:
/opt/solr/server/logs/archived
/opt/solr/server/logs/solr_gc.log.0.current
/o
Pardon: [solr-home]/server/log/solr.log
Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
On Fri, Oct 13, 2017 at 8:10 PM, Amrit Sarkar wrote:
> ah oh, dockers. They are placed u
Ah, Docker. They are placed under [solr-home]/server/log/solr/log on
the machine. I haven't played much with Docker; is there any way you can get that
file from that location?
Amrit Sarkar wrote:
>> Hi Kevin,
>>
>> Can you post the Solr log to the mail thread? At first glance at the code, I don't think it handles
>> .md by itself.
Note that when I use the admin web interface and click on "Logging"
on the left, I just see a spinner that implies it's trying to retrieve
Amrit Sarkar wrote:
>> Hi Kevin,
>>
>> Can you post the Solr log to the mail thread? At first glance at the code, I don't think it handles
>> .md by itself.
How do I extract the log you want?
Hi Kevin,
Can you post the Solr log to the mail thread? At first glance at the code, I don't think it handles
.md by itself.
Amrit Sarkar wrote:
>> Kevin,
>>
>> Just put "html" too and give it a shot. These are the types it is expecting:
Same thing.
>>
>> mimeMap = new HashMap<>();
>> mimeMap.put("xml", "application/xml");
>> mimeMap.put("csv", "text/csv");
>> mimeMap.put("json", "application/json");
>> mimeMap.put(
Amrit Sarkar wrote:
>> Reference to the code:
>>
>> .
>>
>> String rawContentType = conn.getContentType();
>> String type = rawContentType.split(";")[0];
>> if(typeSupported(type) || "*".equals(fileTypes)) {
>> String encoding = conn.getContentEncoding();
>>
>> .
>>
>> protected bool
Ah!
The only supported type is: text/html; encoding=utf-8
I am not confident of this either :) but this should work.
See the code snippet below:
..
if(res.httpStatus == 200) {
// Raw content type of form "text/html; encoding=utf-8"
String rawContentType = conn.getContentType();
String ty
Kevin,
Just put "html" too and give it a shot. These are the types it is expecting:
mimeMap = new HashMap<>();
mimeMap.put("xml", "application/xml");
mimeMap.put("csv", "text/csv");
mimeMap.put("json", "application/json");
mimeMap.put("jsonl", "application/json");
mimeMap.put("pdf", "application/
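The table above can be pictured as a plain map from file extension to MIME type, searched in reverse when a crawled URL reports its Content-Type. A minimal sketch with an illustrative subset: the xml, csv, json and jsonl entries are the ones quoted above, while the htm/html entries and the names MimeTable and extensionsFor are assumptions for illustration. With no md key, nothing lines up with -filetypes md, consistent with md not being a recognized fileType.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical extension -> MIME type table, modelled on the mimeMap quoted in this thread.
final class MimeTable {
    static final Map<String, String> MIME_MAP = new TreeMap<>();

    static {
        MIME_MAP.put("xml", "application/xml");
        MIME_MAP.put("csv", "text/csv");
        MIME_MAP.put("json", "application/json");
        MIME_MAP.put("jsonl", "application/json");
        MIME_MAP.put("htm", "text/html");   // assumed entry
        MIME_MAP.put("html", "text/html");  // assumed entry
        // Note: no "md" entry.
    }

    // All extensions whose MIME type equals the given base Content-Type, in key order.
    static List<String> extensionsFor(String type) {
        return MIME_MAP.entrySet().stream()
                .filter(entry -> entry.getValue().equals(type))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```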
Reference to the code:
.
String rawContentType = conn.getContentType();
String type = rawContentType.split(";")[0];
if(typeSupported(type) || "*".equals(fileTypes)) {
String encoding = conn.getContentEncoding();
.
protected boolean typeSupported(String type) {
for(String key : mimeM
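Put together, the check reads: take the Content-Type before the ";", then accept the URL only if an extension mapped to that type was requested with -filetypes, or if -filetypes is "*". A minimal sketch of that logic as described in this thread; it assumes -filetypes is a comma-separated list and that htm/html map to text/html, and the class TypeFilter and method accepts are hypothetical. The real SimplePostTool code may differ in detail.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical re-creation of the content-type filter discussed in the thread.
final class TypeFilter {
    private final Map<String, String> mimeMap = new HashMap<>();
    private final String fileTypesArg;   // raw -filetypes value, e.g. "md,html" or "*"
    private final Set<String> fileTypes; // the same value split on commas

    TypeFilter(String fileTypesArg) {
        // Assumed subset of the extension -> MIME type table.
        mimeMap.put("xml", "application/xml");
        mimeMap.put("htm", "text/html");
        mimeMap.put("html", "text/html");
        this.fileTypesArg = fileTypesArg;
        this.fileTypes = new HashSet<>(Arrays.asList(fileTypesArg.split(",")));
    }

    // True if some extension mapped to this MIME type was requested via -filetypes.
    boolean typeSupported(String type) {
        for (Map.Entry<String, String> entry : mimeMap.entrySet()) {
            if (entry.getValue().equals(type) && fileTypes.contains(entry.getKey())) {
                return true;
            }
        }
        return false;
    }

    // Mirrors typeSupported(type) || "*".equals(fileTypes), applied to the raw header value.
    boolean accepts(String rawContentType) {
        String type = rawContentType.split(";")[0].trim();
        return typeSupported(type) || "*".equals(fileTypesArg);
    }
}
```

Under these assumptions, new TypeFilter("md").accepts("text/html") is false, which matches the "Skipping URL with unsupported type text/html" warning reported with -filetypes md.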
Amrit Sarkar wrote:
>> Strange,
>>
>> Can you add: "text/html;charset=utf-8". This is wiki.apache.org page's
>> Content-Type. Let's see what it says now.
Same thing. Verified Content-Type:
quadra[git:master]$ wget -S -O /dev/null http://quadra:9091/index.md |& grep Content-Type
Content-Type
Strange.
Can you add "text/html;charset=utf-8"? That is the Content-Type the wiki.apache.org
pages send. Let's see what it says now.
OK, so I hacked markserv to add Content-Type text/html, but now I get
SimplePostTool: WARNING: Skipping URL with unsupported type text/html
What is it expecting?
$ docker exec -it --user=solr solr bin/post -c handbook http://quadra:9091/index.md -recursive 10 -delay 0 -filetypes md
/docker-java
Amrit Sarkar wrote:
>> Kevin,
>>
>> You are getting NPE at:
>>
>> String type = rawContentType.split(";")[0]; //HERE - rawContentType is NULL
>>
>> // related code
>>
>> String rawContentType = conn.getContentType();
>>
>> public String getContentType() {
>> return getHeaderField("content
Kevin,
You are getting an NPE at:
String type = rawContentType.split(";")[0]; //HERE - rawContentType is NULL
// related code
String rawContentType = conn.getContentType();
public String getContentType() {
return getHeaderField("content-type");
}
HttpURLConnection conn = (HttpURLConnection)
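URLConnection.getContentType() returns null when the server sends no Content-Type header, so the split on ";" dereferences null. A minimal null-safe sketch of that one line; ContentTypes and baseType are hypothetical names, not SimplePostTool's code. The caller gets an empty Optional and can log and skip the URL instead of crashing.

```java
import java.util.Optional;

// Hypothetical null-safe replacement for: rawContentType.split(";")[0]
final class ContentTypes {
    // rawContentType is whatever conn.getContentType() returned; it may be null
    // when the server sent no Content-Type header.
    static Optional<String> baseType(String rawContentType) {
        if (rawContentType == null || rawContentType.trim().isEmpty()) {
            return Optional.empty();
        }
        // Drop parameters such as "; charset=utf-8" and surrounding whitespace.
        return Optional.of(rawContentType.split(";")[0].trim());
    }
}
```

Usage would look like ContentTypes.baseType(conn.getContentType()), followed by a skip-and-warn branch when the result is empty.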