Re: Offline Indexes Update to Shard

Wolfgang Hoschek Tue, 03 Jun 2014 11:17:51 -0700

Hi, see comments inline below.

On Jun 2, 2014, at 6:49 AM, Vineet Mishra <clearmido...@gmail.com> wrote:


> Hi Wolfgang,
> 
> Thanks for your response. Can you point me to a working example of
> MapReduceIndexerTool for indexing CSV files?
> If you are referring to
> http://www.cloudera.com/content/cloudera-content/cloudera-docs/Search/latest/Cloudera-Search-User-Guide/csug_mapreduceindexertool.html?scroll=csug_topic_6_1
> I have a few points to clarify:
> * What is a morphline?

See http://kitesdk.org/docs/current/kite-morphlines/index.html and 
http://kitesdk.org/docs/current/kite-morphlines/morphlinesReferenceGuide.html#/readCSV

> * Is it necessary to use a morphline for indexing? If yes, how do I create one?

Yes, it requires a morphline (which is basically a chain of plugins), and you 
can plug any custom Java code and custom commands into a morphline.
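
To make "a chain of plugins" concrete, here is a minimal conceptual sketch in
plain Python. It is not the Kite Morphlines API or its configuration syntax;
the names `read_csv`, `to_float`, and `run_chain` are made up for illustration.
Each command takes a stream of records, transforms it, and hands the result to
the next command, roughly the way a CSV-reading step would feed later commands
that clean up fields before a document is indexed.

```python
import csv
import io
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]
# A command turns a stream of records into another stream of records.
Command = Callable[[Iterable[Record]], Iterator[Record]]


def read_csv(columns: List[str]) -> Command:
    """Hypothetical command: split each record's raw 'line' into named fields."""
    def command(records: Iterable[Record]) -> Iterator[Record]:
        for record in records:
            # Parse one CSV line; an empty line yields an empty row.
            row = next(csv.reader(io.StringIO(str(record["line"]))), [])
            yield dict(zip(columns, row))
    return command


def to_float(fields: List[str]) -> Command:
    """Hypothetical command: convert the named fields from text to float."""
    def command(records: Iterable[Record]) -> Iterator[Record]:
        for record in records:
            converted = dict(record)
            for field in fields:
                converted[field] = float(str(converted[field]))
            yield converted
    return command


def run_chain(commands: List[Command], records: Iterable[Record]) -> List[Record]:
    """Feed the records through every command in order and collect the output."""
    stream: Iterable[Record] = records
    for command in commands:
        stream = command(stream)
    return list(stream)


if __name__ == "__main__":
    lines = [{"line": "sensor-1,21.5"}, {"line": "sensor-2,19.0"}]
    chain = [read_csv(["id", "temperature"]), to_float(["temperature"])]
    for doc in run_chain(chain, lines):
        print(doc)
```

A real morphline is declared in a configuration file rather than in code like
this; see the reference guide linked above for the actual commands.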

> * Can the index reside only on HDFS and not on the local FS?

The implementation is only on HDFS.

> * What is the minimum CDH version supported for it?

CDH 4 or CDH 5.

Wolfgang.

> 
> Looking forward to your response.
> 
> Thanks!
> 
> 
> On Mon, Jun 2, 2014 at 2:24 PM, Wolfgang Hoschek <whosc...@cloudera.com>
> wrote:
> 
>> Sounds like you should consider using MapReduceIndexerTool. AFAIK, this is
>> the most scalable indexing (and merging) solution out there.
>> 
>> Wolfgang.
>> 
>> On Jun 2, 2014, at 10:33 AM, Vineet Mishra <clearmido...@gmail.com> wrote:
>> 
>>> Hi Erick,
>>> 
>>> Thanks for your mail. Let me walk you through my use case.
>>> I have around 20-40 billion records to index, and each record has
>>> around 200-400 fields. The data is sensor data, so it can easily be
>>> stored as Integer or Float. To index this huge amount of data I am
>>> using EmbeddedSolrServer, which was working fine, but I am looking
>>> for a way to move the generated indexes to different shards, ideally
>>> without copying them to each machine by hand: some approach where I
>>> submit the indexes to a shard and let the shard take care of
>>> distributing them to the leader and replicas.
>>> One more thing: when I started indexing with EmbeddedSolrServer, it
>>> went fine for the first few million documents, but after that the
>>> indexing speed became pathetically slow. It indexed around 20 GB in a
>>> day and has indexed just 9 GB in the following 2 days.
>>> Any suggestions for optimizing indexing would also be appreciated.
>>> 
>>> Hope this makes things much clearer.
>>> Looking forward to hearing from you soon.
>>> 
>>> Thanks and Regards!
>>> 
>>> 
>>> On Fri, May 30, 2014 at 9:09 PM, Erick Erickson <erickerick...@gmail.com>
>>> wrote:
>>> 
>>>> You can copy to the shards and use the mergeindexes command; the
>>>> MapReduceIndexerTool follows that approach.
>>>> 
>>>> But really, what is the higher-level use-case you're trying to support?
>>>> This feels a little like an XY problem. You could do things like
>>>> 1> index to a different collection, then use collection aliasing to switch
>>>> 2> just re-index to the current collection.
>>>> 3> use the MapReduceIndexerTool (admittedly it needs Hadoop).
>>>> 
>>>> All in all, it feels like you're doing work you don't need to do. But
>>>> that's a guess since you haven't told us what the use-case is.
>>>> 
>>>> Best,
>>>> Erick
>>>> 
>>>> 
>>>> On Thu, May 29, 2014 at 7:22 AM, Otis Gospodnetic <
>>>> otis.gospodne...@gmail.com> wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> On Wed, May 28, 2014 at 4:25 AM, Vineet Mishra <clearmido...@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> Hi All,
>>>>>> 
>>>>>> Has anyone tried building offline indexes with EmbeddedSolrServer and
>>>>>> posting them to shards?
>>>>>> 
>>>>> 
>>>>> What do you mean by "posting it to shards"?  How is that different than
>>>>> copying them manually to the right location in FS?  Could you please
>>>>> elaborate?
>>>>> 
>>>>> Otis
>>>>> --
>>>>> Performance Monitoring * Log Analytics * Search Analytics
>>>>> Solr & Elasticsearch Support * http://sematext.com/
>>>>> 
>>>>> 
>>>>> 
>>>>>> FYI, I am done building the indexes but am looking for a way to post
>>>>>> these index files to the shards.
>>>>>> Copying the indexes manually to each shard's replica is possible and is
>>>>>> working fine, but I don't want to go with that approach.
>>>>>> 
>>>>>> Thanks!
>>>>>> 
>>>>> 
>>>> 
>> 
>>
