Shawn,
Thank you. I will follow Erick's steps.
By the way, I am also trying to ingest data using Flume, which uses Morphlines
along with Tika.
Will the Flume SolrSink have the same issue?
Currently my SolrSink does not ingest any data, and I do not see any errors
in my logs.
I am seeing a lot of issues with Solr.
Could you please suggest what the issue with my Flume SolrSink could be?
I have attached the other email I sent about the SolrSink issue.
Regards,
~Sri
-----Original Message-----
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: Wednesday, February 08, 2017 2:21 PM
To: solr-user@lucene.apache.org
Subject: Re: DataImportHandler - Unable to load Tika Config Processing Document # 1
On 2/8/2017 9:08 AM, Anatharaman, Srinatha (Contractor) wrote:
> Thank you for your reply.
> The other archived message you mentioned was also posted by me. I am new to
> Solr. When you say to process the documents outside Solr, what exactly should I do?
>
> I have a lot of text documents that I need to index. What should I apply
> to these documents before loading them into Solr?
Did you not see Erick's reply, where he provided the following link, and said
that the program shown there was a decent guide to writing your own program to
handle Tika processing?
https://lucidworks.com/2012/02/14/indexing-with-solrj/
The blog post includes code that talks to a database, which would be fairly
easy to remove/change. Some knowledge of how to write Java programs is
required. Tika is a Java API, so writing the program in Java is a prerequisite.
The entire point of this idea is to take the Tika processing out of the Solr
server(s). If Tika runs within Solr, it can cause Solr to hang or crash. The
authors of Tika try as hard as they can to make sure it works well, but the
software is dealing with proprietary data formats that are not publicly
documented. Sometimes one of those documents can cause Tika to explode.
Crashes in client code won't break your application, and it is likely easier to
recover from a crash at that level.
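As a rough illustration of what "processing outside Solr" can look like, here is a minimal sketch of a standalone Java client. It is not the program from the blog post. It uses only the JDK and assumes plain text files, so `extractText` is just a file read; for other formats, that method is where a Tika call would go. The class name, the `id` and `content` field names, and the update URL passed on the command line are assumptions for illustration and would have to match the real schema and host. Each file is handled in its own try/catch, so a failure on one document is logged and the run continues.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical client-side indexer: extraction happens here, not inside Solr.
final class ClientSideIndexer {

    /** Escapes a string so it can be placed inside a JSON string literal. */
    static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    // Remaining control characters become 4-digit hex escapes.
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Builds a one-document JSON array; "id" and "content" are assumed field names. */
    static String toUpdateBody(String id, String content) {
        return "[{\"id\":\"" + jsonEscape(id) + "\",\"content\":\"" + jsonEscape(content) + "\"}]";
    }

    /** Extraction step. Plain text only; a Tika parser call would replace this read. */
    static String extractText(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    /** POSTs the JSON body to the given update URL and returns the HTTP status code. */
    static int post(String updateUrl, String body) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(updateUrl).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json; charset=UTF-8");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        int status = conn.getResponseCode();
        conn.disconnect();
        return status;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: ClientSideIndexer <solr-update-url> <directory>");
            return;
        }
        int failed = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(Paths.get(args[1]))) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) continue;
                try {
                    String id = file.getFileName().toString();
                    int status = post(args[0], toUpdateBody(id, extractText(file)));
                    if (status != 200) {
                        failed++;
                        System.err.println(file + ": HTTP " + status);
                    }
                } catch (Exception e) {
                    // A bad document fails here, in the client; Solr keeps running.
                    failed++;
                    System.err.println(file + ": " + e);
                }
            }
        }
        System.out.println("done, failures: " + failed);
    }
}
```

The update URL would be something along the lines of `http://<host>:<port>/solr/gsearch/update`, where host and port are placeholders. Sending one document per request is slow for large sets, so a real program would batch documents, and documents only become searchable after a commit.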
Thanks,
Shawn
--- Begin Message ---
Hi,
I am indexing text documents using Flume.
I do not see any error or warning messages, but the data is not getting
ingested into Solr.
The log level for both Solr and Flume is set to TRACE, ALL.
Flume version : 1.5.2.2.3
Solr Version : 5.5
Config files are as below
Flume Config :
agent.sources = SpoolDirSrc
agent.channels = FileChannel
agent.sinks = SolrSink
# Configure Source
agent.sources.SpoolDirSrc.channels = fileChannel
agent.sources.SpoolDirSrc.type = spooldir
agent.sources.SpoolDirSrc.spoolDir = /home/flume/source_emails
agent.sources.SpoolDirSrc.basenameHeader = true
agent.sources.SpoolDirSrc.fileHeader = true
#agent.sources.src1.fileSuffix = .COMPLETED
agent.sources.SpoolDirSrc.deserializer = org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder
# Use a file channel, which buffers events on disk
agent.channels.FileChannel.type = file
agent.channels.FileChannel.capacity = 10000
#agent.channels.FileChannel.transactionCapacity = 10000
# Configure Solr Sink
agent.sinks.SolrSink.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
agent.sinks.SolrSink.morphlineFile = /etc/flume/conf/morphline.conf
agent.sinks.SolrSink.batchSize = 1000
agent.sinks.SolrSink.batchDurationMillis = 2500
agent.sinks.SolrSink.channel = fileChannel
agent.sinks.SolrSink.morphlineId = morphline1
agent.sources.SpoolDirSrc.channels = FileChannel
agent.sinks.SolrSink.channel = FileChannel
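One mechanical check that can be run on a Flume properties file is whether every `channels` or `channel` reference names a channel declared in `<agent>.channels`. The names are plain strings, so the comparison is case-sensitive. The following is a minimal sketch of such a check, with a hypothetical class name. It reads the file line by line instead of going through a properties loader, on the assumption that a loader keeps only the last value of a duplicate key and would therefore hide an earlier mismatching line. Lines without `=` are skipped, so a value wrapped onto its own line is not inspected.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Flume: reports channel references
// that do not match any declared channel name.
final class FlumeChannelCheck {

    static List<String> check(String agent, List<String> lines) {
        Set<String> declared = new LinkedHashSet<>();
        List<String[]> refs = new ArrayList<>(); // {line number, key, value}
        String channelsKey = agent + ".channels";
        int lineNo = 0;
        for (String raw : lines) {
            lineNo++;
            String line = raw.trim();
            int eq = line.indexOf('=');
            if (line.isEmpty() || line.startsWith("#") || eq < 0) continue;
            String key = line.substring(0, eq).trim();
            String value = line.substring(eq + 1).trim();
            if (key.equals(channelsKey)) {
                // Declaration: space-separated list of channel names.
                declared.addAll(Arrays.asList(value.split("\\s+")));
            } else if ((key.startsWith(agent + ".sources.") && key.endsWith(".channels"))
                    || (key.startsWith(agent + ".sinks.") && key.endsWith(".channel"))) {
                refs.add(new String[] {String.valueOf(lineNo), key, value});
            }
        }
        // Compare references only after the whole file is read, so the
        // declaration may appear anywhere in the file.
        List<String> findings = new ArrayList<>();
        for (String[] ref : refs) {
            for (String name : ref[2].split("\\s+")) {
                if (!declared.contains(name)) {
                    findings.add("line " + ref[0] + ": " + ref[1]
                            + " refers to undeclared channel '" + name + "'");
                }
            }
        }
        return findings;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: FlumeChannelCheck <agent-name> <properties-file>");
            return;
        }
        check(args[0], Files.readAllLines(Paths.get(args[1]))).forEach(System.out::println);
    }
}
```

An empty result only means the channel names line up; it says nothing about whether the sink, the morphline, or Solr itself is the reason documents are not arriving.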
Morphline Config :
solrLocator: {
  collection : gsearch
  #zkHost : "127.0.0.1:9983"
  zkHost : "codesolr-as-r3p:21810,codesolr-as-r3p:21811,codesolr-as-r3p:21812"
}
morphlines :
[
  {
    id : morphline1
    importCommands : ["org.kitesdk.**", "org.apache.solr.**"]
    commands :
    [
      { detectMimeType { includeDefaultMimeTypes : true } }
      {
        solrCell {
          solrLocator : ${solrLocator}
          captureAttr : true
          lowernames : true
          capture : [_attachment_body, _attachment_mimetype, basename, content,
                     content_encoding, content_type, file, meta]
          parsers : [ { parser : org.apache.tika.parser.txt.TXTParser } ]
        }
      }
      { generateUUID { field : id } }
      { sanitizeUnknownSolrFields { solrLocator : ${solrLocator} } }
      { logDebug { format : "output record: {}", args : ["@{}"] } }
      { loadSolr: { solrLocator : ${solrLocator} } }
    ]
  }
]
Please help me work out what the issue could be.
Regards,
~Sri
--- End Message ---