> Another side issue: Using the extracting handler for handling rich documents
> is discouraged. Tika (which is what is used by the extracting handler) is
> pretty amazing software, but it has a habit of crashing or consuming all the
> heap memory when it encounters a document that it doesn't know how to
> properly handle. It is best to run Tika in your external program and send
> its output to Solr, so that if there's a problem, it won't affect your
> search capability.
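For what it's worth, a minimal sketch of the isolation the quoted advice describes might look like the following: extraction runs in a child JVM via the standalone tika-app jar, so a crash or runaway heap kills only the child process, never the indexer. The jar path, heap cap, and timeout below are my assumptions, not anything from the original setup:

    import java.io.File;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.concurrent.TimeUnit;

    public class ExternalTikaExtractor {

        // Assumed location of the standalone Tika jar; adjust to your install.
        private static final String TIKA_APP_JAR = "/opt/tika/tika-app.jar";

        /**
         * Extracts plain text by running Tika in a child JVM. A crash, OOM,
         * or hang in Tika kills only the child, never the indexer. Returns
         * null when extraction fails so the caller can skip the file.
         */
        public static String extractText(File f) throws Exception {
            Path out = Files.createTempFile("tika-", ".txt");
            try {
                Process p = new ProcessBuilder(
                        "java", "-Xmx512m", "-jar", TIKA_APP_JAR,
                        "--text", f.getCanonicalPath())
                    // Write stdout to a file and drop stderr, so a full pipe
                    // buffer can never deadlock the child.
                    .redirectOutput(out.toFile())
                    .redirectError(ProcessBuilder.Redirect.DISCARD)
                    .start();
                // Kill the child if it wedges on a pathological document.
                if (!p.waitFor(60, TimeUnit.SECONDS)) {
                    p.destroyForcibly();
                    return null;
                }
                if (p.exitValue() != 0) {
                    return null;
                }
                return new String(Files.readAllBytes(out),
                        StandardCharsets.UTF_8);
            } finally {
                Files.delete(out);
            }
        }
    }

The extracted text can then be added to a SolrInputDocument and sent with SolrJ, exactly as in the fragment below, keeping Tika's failure modes out of the indexing process.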
As an alternative to the earlier code, I had tried this (with exactly the same set of files going in):

    // Class-level imports this fragment relies on (the Tika and SolrJ jars
    // must be on the classpath):
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.InputStream;
    import java.util.logging.Level;
    import java.util.logging.Logger;
    import org.apache.solr.common.SolrInputDocument;
    import org.apache.tika.metadata.Metadata;
    import org.apache.tika.parser.AutoDetectParser;
    import org.apache.tika.parser.ParseContext;
    import org.apache.tika.parser.Parser;
    import org.apache.tika.sax.BodyContentHandler;
    import org.xml.sax.ContentHandler;

    File f = new File(filename);
    // Cap the in-memory body at 10 MB so a huge document can't blow the heap.
    ContentHandler textHandler = new BodyContentHandler(10 * 1024 * 1024);
    Metadata metadata = new Metadata();
    Parser parser = new AutoDetectParser();
    ParseContext context = new ParseContext();
    // try-with-resources closes the stream; the original version leaked it.
    try (InputStream input = new FileInputStream(f)) {
        parser.parse(input, textHandler, metadata, context);
    } catch (Exception e) {
        // The original call passed null as the message and the formatted
        // string as a parameter; this form logs the message and the stack.
        Logger.getLogger(JsMapAdminService.class.getName()).log(Level.SEVERE,
                String.format("File %s failed", f.getCanonicalPath()), e);
    }
    SolrInputDocument up = new SolrInputDocument();
    up.addField("id", f.getCanonicalPath());
    up.addField("fileLocation", idString);
    up.addField("access", access);
    up.addField("title", metadata.get("title"));
    up.addField("author", metadata.get("author"));
    String content = textHandler.toString();
    up.addField("_text_", content);
    solr.add(up);
    return true;

Exceptions were never triggered, but the metadata was essentially empty except for the content type, and content was always an empty string. I don't know what the parser was doing, so I gave up and went with the extracting handler route instead, which did at least build a full index.
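If anyone else hits the same symptom: empty content with no exception and nothing in the metadata beyond the content type is, as far as I can tell, the classic sign that AutoDetectParser found no real parsers on the classpath. With only tika-core present (no tika-parsers jar and its dependencies), type detection still works, but every document silently falls through to the empty fallback parser. A small check along these lines (the class name is mine, for illustration) would confirm it:

    import java.util.Map;
    import org.apache.tika.mime.MediaType;
    import org.apache.tika.parser.AutoDetectParser;
    import org.apache.tika.parser.Parser;

    public class TikaClasspathCheck {
        public static void main(String[] args) {
            AutoDetectParser parser = new AutoDetectParser();
            // AutoDetectParser extends CompositeParser, so we can list
            // exactly which parsers were discovered on the classpath.
            Map<MediaType, Parser> parsers = parser.getParsers();
            System.out.println("Registered parser mappings: " + parsers.size());
            // With only tika-core present this prints 0, which would match
            // the empty-content symptom described above.
            parsers.forEach((type, p) ->
                    System.out.println(type + " -> " + p.getClass().getName()));
        }
    }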