Shawn,

Thank you, I will follow Erick's steps.
BTW, I am also trying to ingest using Flume; Flume uses Morphlines along
with Tika.
Will the Flume SolrSink have the same issue?

Currently my SolrSink does not ingest the data, and I also do not see any
errors in my logs.
I am seeing a lot of issues with Solr.

Could you please suggest what the issue with my Flume SolrSink could be?

I have attached another email I sent about the SolrSink issue.

Regards,
~Sri

-----Original Message-----
From: Shawn Heisey [mailto:apa...@elyograg.org] 
Sent: Wednesday, February 08, 2017 2:21 PM
To: solr-user@lucene.apache.org
Subject: Re: DataImportHandler - Unable to load Tika Config Processing Document 
# 1

On 2/8/2017 9:08 AM, Anatharaman, Srinatha (Contractor) wrote:
> Thank you for your reply.
> The other archive message you mentioned was posted by me. I am new to
> Solr; when you say to process outside the Solr program, what exactly
> should I do?
>
> I have lots of text documents which I need to index. What should I apply
> to these documents before loading them into Solr?

Did you not see Erick's reply, where he provided the following link, and said 
that the program shown there was a decent guide to writing your own program to 
handle Tika processing?

https://lucidworks.com/2012/02/14/indexing-with-solrj/

The blog post includes code that talks to a database, which would be fairly 
easy to remove/change.  Some knowledge of how to write Java programs is 
required.  Tika is a Java API, so writing the program in Java is a prerequisite.

The entire point of this idea is to take the Tika processing out of the Solr 
server(s).  If Tika runs within Solr, it can cause Solr to hang or crash.  The 
authors of Tika try as hard as they can to make sure it works well, but the 
software is dealing with proprietary data formats that are not publicly 
documented.  Sometimes one of those documents can cause Tika to explode.  
Crashes in client code won't break your application, and it is likely easier to 
recover from a crash at that level.
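
For reference, here is a minimal sketch of such a client-side program (this
is not the blog's exact code; the Solr URL, the collection name, and the
field names are illustrative assumptions and need to match your own setup
and schema):

import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.sax.BodyContentHandler;

public class TikaSolrIndexer {
    public static void main(String[] args) throws Exception {
        // Assumed Solr URL and collection name -- adjust for your install.
        SolrClient solr = new HttpSolrClient("http://localhost:8983/solr/gsearch");
        AutoDetectParser parser = new AutoDetectParser();
        try {
            for (String path : args) {
                BodyContentHandler handler = new BodyContentHandler(-1); // no write limit
                Metadata metadata = new Metadata();
                try (InputStream in = new FileInputStream(path)) {
                    // Tika runs here, in the client JVM, not inside Solr.
                    parser.parse(in, handler, metadata);
                } catch (Exception e) {
                    // A document that makes Tika explode only affects this client.
                    System.err.println("Skipping " + path + ": " + e);
                    continue;
                }
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", path);  // illustrative unique key
                doc.addField("content", handler.toString());
                doc.addField("content_type", metadata.get(Metadata.CONTENT_TYPE));
                solr.add(doc);
            }
            solr.commit();
        } finally {
            solr.close();
        }
    }
}

Newer SolrJ versions construct the client through a Builder instead of the
constructor shown here, but the overall flow is the same: parse with Tika in
your own program, build a SolrInputDocument, and send it with SolrJ.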

Thanks,
Shawn


--- Begin Message ---
Hi,

I am indexing text documents using Flume. I do not see any error or warning
messages, but the data is not getting ingested into Solr.

The log level for both Solr and Flume is set to TRACE, ALL.

Flume version : 1.5.2.2.3
Solr version : 5.5

The config files are as follows:

Flume Config:

agent.sources = SpoolDirSrc
agent.channels = FileChannel
agent.sinks = SolrSink

# Configure Source
agent.sources.SpoolDirSrc.channels = fileChannel
agent.sources.SpoolDirSrc.type = spooldir
agent.sources.SpoolDirSrc.spoolDir = /home/flume/source_emails
agent.sources.SpoolDirSrc.basenameHeader = true
agent.sources.SpoolDirSrc.fileHeader = true
#agent.sources.src1.fileSuffix = .COMPLETED
agent.sources.SpoolDirSrc.deserializer = org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder

# Use a channel that buffers events in memory
agent.channels.FileChannel.type = file
agent.channels.FileChannel.capacity = 10000
#agent.channels.FileChannel.transactionCapacity = 10000

# Configure Solr Sink
agent.sinks.SolrSink.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
agent.sinks.SolrSink.morphlineFile = /etc/flume/conf/morphline.conf
agent.sinks.SolrSink.batchsize = 1000
agent.sinks.SolrSink.batchDurationMillis = 2500
agent.sinks.SolrSink.channel = fileChannel
agent.sinks.SolrSink.morphlineId = morphline1
agent.sources.SpoolDirSrc.channels = FileChannel
agent.sinks.SolrSink.channel = FileChannel

Morphline Config:

solrLocator: {
  collection : gsearch
  #zkHost : "127.0.0.1:9983"
  zkHost : "codesolr-as-r3p:21810,codesolr-as-r3p:21811,codesolr-as-r3p:21812"
}

morphlines :
[
  {
    id : morphline1
    importCommands : ["org.kitesdk.**", "org.apache.solr.**"]
    commands :
    [
      { detectMimeType { includeDefaultMimeTypes : true } }
      {
        solrCell {
          solrLocator : ${solrLocator}
          captureAttr : true
          lowernames : true
          capture : [_attachment_body, _attachment_mimetype, basename, content, content_encoding, content_type, file, meta]
          parsers : [ { parser : org.apache.tika.parser.txt.TXTParser } ]
        }
      }
      { generateUUID { field : id } }
      { sanitizeUnknownSolrFields { solrLocator : ${solrLocator} } }
      { logDebug { format : "output record: {}", args : ["@{}"] } }
      { loadSolr: { solrLocator : ${solrLocator} } }
    ]
  }
]



Please help me figure out what the issue could be.

Regards,

~Sri




--- End Message ---
