Shawn,

Thank you. I will follow Erick's steps.

BTW, I am also trying to ingest using Flume. Flume uses Morphlines along with Tika. Will the Flume SolrSink have the same issue?

Currently my SolrSink does not ingest the data, and I do not see any errors in my logs. I am seeing a lot of issues with Solr. Could you please suggest what the issue with my Flume SolrSink could be? I have attached my other email about the SolrSink issue.

Regards,
~Sri

-----Original Message-----
From: Shawn Heisey [mailto:[email protected]]
Sent: Wednesday, February 08, 2017 2:21 PM
To: [email protected]
Subject: Re: DataImportHandler - Unable to load Tika Config Processing Document # 1

On 2/8/2017 9:08 AM, Anatharaman, Srinatha (Contractor) wrote:
> Thank you for your reply
> Other archive message you mentioned is posted by me only I am new to
> Solr, When you say process outside Solr program. What exactly I should do?
>
> I am having lots of text document which I need to index, what should I apply
> to these document before loading it to Solr?

Did you not see Erick's reply, where he provided the following link, and said that the program shown there was a decent guide to writing your own program to handle Tika processing?

https://lucidworks.com/2012/02/14/indexing-with-solrj/

The blog post includes code that talks to a database, which would be fairly easy to remove/change. Some knowledge of how to write Java programs is required. Tika is a Java API, so writing the program in Java is a prerequisite.

The entire point of this idea is to take the Tika processing out of the Solr server(s). If Tika runs within Solr, it can cause Solr to hang or crash. The authors of Tika try as hard as they can to make sure it works well, but the software is dealing with proprietary data formats that are not publicly documented. Sometimes one of those documents can cause Tika to explode. Crashes in client code won't break your application, and it is likely easier to recover from a crash at that level.

Thanks,
Shawn
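
The point about recovering from failures at the client level can be sketched roughly as follows. This is only an illustration under stated assumptions, not the program from the blog post: `TextExtractor` and `ClientSideIndexer` are hypothetical names, the extractor stands in for a real Tika call, and the `docs` list stands in for documents that a real program would send to Solr through SolrJ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a Tika call; a real program would wrap a Tika parser here. */
interface TextExtractor {
    String extract(String path) throws Exception;
}

/** Hypothetical client-side loop: one bad document is recorded and skipped, the run continues. */
class ClientSideIndexer {
    // Stand-in for the documents a real program would send to Solr.
    final List<Map<String, String>> docs = new ArrayList<>();
    // Paths whose extraction failed, kept so they can be inspected or retried later.
    final List<String> failed = new ArrayList<>();

    void process(List<String> paths, TextExtractor extractor) {
        for (String path : paths) {
            try {
                Map<String, String> doc = new LinkedHashMap<>();
                doc.put("id", path);
                doc.put("content", extractor.extract(path));
                docs.add(doc);
            } catch (Exception e) {
                // The failure stays in the client; the Solr server never sees this document.
                failed.add(path);
            }
        }
    }

    public static void main(String[] args) {
        ClientSideIndexer indexer = new ClientSideIndexer();
        // Fake extractor for the demo: pretends that any path starting with "bad" breaks the parser.
        TextExtractor fake = path -> {
            if (path.startsWith("bad")) {
                throw new IllegalStateException("parser failed on " + path);
            }
            return "text of " + path;
        };
        indexer.process(Arrays.asList("a.txt", "bad.doc", "c.txt"), fake);
        System.out.println("indexed=" + indexer.docs.size() + " failed=" + indexer.failed);
    }
}
```

Catching exceptions covers only ordinary parser failures. A hang or an out-of-memory condition would need stronger isolation, such as running the extraction in a separate process.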
--- Begin Message ---
Hi,

I am indexing text documents using Flume. I do not see any error or warning messages, but data is not getting ingested into Solr.
The log level for both Solr and Flume is set to TRACE, ALL.

Flume version : 1.5.2.2.3
Solr version : 5.5

Config files are as below.

Flume Config :

agent.sources = SpoolDirSrc
agent.channels = FileChannel
agent.sinks = SolrSink

# Configure Source
agent.sources.SpoolDirSrc.channels = fileChannel
agent.sources.SpoolDirSrc.type = spooldir
agent.sources.SpoolDirSrc.spoolDir = /home/flume/source_emails
agent.sources.SpoolDirSrc.basenameHeader = true
agent.sources.SpoolDirSrc.fileHeader = true
#agent.sources.src1.fileSuffix = .COMPLETED
agent.sources.SpoolDirSrc.deserializer = org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder

# Use a channel that buffers events in memory
agent.channels.FileChannel.type = file
agent.channels.FileChannel.capacity = 10000
#agent.channels.FileChannel.transactionCapacity = 10000

# Configure Solr Sink
agent.sinks.SolrSink.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
agent.sinks.SolrSink.morphlineFile = /etc/flume/conf/morphline.conf
agent.sinks.SolrSink.batchsize = 1000
agent.sinks.SolrSink.batchDurationMillis = 2500
agent.sinks.SolrSink.channel = fileChannel
agent.sinks.SolrSink.morphlineId = morphline1

agent.sources.SpoolDirSrc.channels = FileChannel
agent.sinks.SolrSink.channel = FileChannel

Morphline Config :

solrLocator: {
  collection : gsearch
  #zkHost : "127.0.0.1:9983"
  zkHost : "codesolr-as-r3p:21810,codesolr-as-r3p:21811,codesolr-as-r3p:21812"
}

morphlines : [
  {
    id : morphline1
    importCommands : ["org.kitesdk.**", "org.apache.solr.**"]
    commands : [
      { detectMimeType { includeDefaultMimeTypes : true } }
      {
        solrCell {
          solrLocator : ${solrLocator}
          captureAttr : true
          lowernames : true
          capture : [_attachment_body, _attachment_mimetype, basename, content, content_encoding, content_type, file, meta]
          parsers : [ { parser : org.apache.tika.parser.txt.TXTParser } ]
        }
      }
      { generateUUID { field : id } }
      { sanitizeUnknownSolrFields { solrLocator : ${solrLocator} } }
      { logDebug { format : "output record: {}", args : ["@{}"] } }
      { loadSolr: { solrLocator : ${solrLocator} } }
    ]
  }
]

Please help me find out what the issue could be.

Regards,
~Sri
--- End Message ---
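
The Flume config in the attached message assigns the source and sink channels twice, first to `fileChannel` and later to `FileChannel`, while only `FileChannel` is declared. Channel names are case-sensitive strings, so a quick consistency check can rule that out as a cause. The sketch below is a hypothetical helper, not part of Flume. It assumes the file is read with `java.util.Properties` semantics, where the last assignment to a key wins, so in the attached config the later `FileChannel` lines would be the effective ones. The demo input in `main` is a shortened example, not the full config.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Properties;
import java.util.Set;

/** Hypothetical helper, not part of Flume: reports channel references that were never declared. */
class FlumeChannelCheck {

    static List<String> findUndeclaredChannels(Properties props, String agent) {
        // Channel names declared on the "<agent>.channels" line, separated by whitespace.
        Set<String> declared = new HashSet<>(
                Arrays.asList(props.getProperty(agent + ".channels", "").trim().split("\\s+")));

        List<String> problems = new ArrayList<>();
        for (String key : props.stringPropertyNames()) {
            // Sources use the plural "channels" key, sinks use the singular "channel" key.
            boolean sourceRef = key.startsWith(agent + ".sources.") && key.endsWith(".channels");
            boolean sinkRef = key.startsWith(agent + ".sinks.") && key.endsWith(".channel");
            if (!sourceRef && !sinkRef) {
                continue;
            }
            for (String name : props.getProperty(key).trim().split("\\s+")) {
                // The comparison is case-sensitive, so "fileChannel" does not match "FileChannel".
                if (!name.isEmpty() && !declared.contains(name)) {
                    problems.add(key + " -> " + name);
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) throws IOException {
        // Shortened demo input with a deliberate case mismatch on the source.
        Properties props = new Properties();
        props.load(new StringReader(String.join("\n",
                "agent.channels = FileChannel",
                "agent.sources.SpoolDirSrc.channels = fileChannel",
                "agent.sinks.SolrSink.channel = FileChannel")));
        System.out.println(findUndeclaredChannels(props, "agent"));
    }
}
```

An empty result only means the channel wiring is consistent. It says nothing about the morphline or the Solr side.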
