Hi Wolfgang,

Thanks for your response. Could you share a working example of
MapReduceIndexerTool for indexing CSV files?

If you are referring to
http://www.cloudera.com/content/cloudera-content/cloudera-docs/Search/latest/Cloudera-Search-User-Guide/csug_mapreduceindexertool.html?scroll=csug_topic_6_1
I have a few points to clarify:
* What is a morphline?
* Is a morphline required for indexing? If so, how do I create one?
* Does the index have to reside on HDFS, or can it be on the local filesystem?
* What is the minimum CDH version that supports it?

Looking forward to your response.

Thanks!


On Mon, Jun 2, 2014 at 2:24 PM, Wolfgang Hoschek <whosc...@cloudera.com>
wrote:

> Sounds like you should consider using MapReduceIndexerTool. AFAIK, this is
> the most scalable indexing (and merging) solution out there.
>
> Wolfgang.
>
> On Jun 2, 2014, at 10:33 AM, Vineet Mishra <clearmido...@gmail.com> wrote:
>
> > Hi Erick,
> >
> > Thanks for your mail; let me walk you through my use case.
> > I have around 20-40 billion records to index, each with around 200-400
> > fields. The data is sensor data, so it can easily be stored as Integer or
> > Float. To index this volume of data I am using EmbeddedSolrServer, which
> > works fine, but I am looking for a way to move the generated indexes to the
> > different shards without copying and pasting them to each machine, for
> > example by submitting the indexes to a shard and letting the shard take
> > care of distributing them over the leader and replicas.
> > One more thing: when I started indexing with EmbeddedSolrServer, it went
> > fine for the first few million documents, but after that the indexing
> > speed became extremely slow. It indexed around 20 GB in one day, and only
> > 9 GB in the following 2 days.
> > Any suggestions for optimizing indexing would also be appreciated.
> >
> > Hope this makes things much clearer.
> > Looking forward to hearing from you soon.
> >
> > Thanks and Regards!
> >
> >
> > On Fri, May 30, 2014 at 9:09 PM, Erick Erickson <erickerick...@gmail.com>
> > wrote:
> >
> >> You can copy to the shards and use the mergeindexes command; the
> >> MapReduceIndexerTool follows that approach.
> >>
> >> But really, what is the higher-level use-case you're trying to support?
> >> This feels a little like an XY problem. You could do things like:
> >> 1> index to a different collection, then use collection aliasing to
> >>    switch.
> >> 2> just re-index to the current collection.
> >> 3> use the MapReduceIndexerTool (admittedly it needs Hadoop).
> >>
> >> All in all, it feels like you're doing work you don't need to do. But
> >> that's a guess since you haven't told us what the use-case is.
> >>
> >> Best,
> >> Erick
> >>
> >>
> >> On Thu, May 29, 2014 at 7:22 AM, Otis Gospodnetic <
> >> otis.gospodne...@gmail.com> wrote:
> >>
> >>> Hi,
> >>>
> >>> On Wed, May 28, 2014 at 4:25 AM, Vineet Mishra <clearmido...@gmail.com>
> >>> wrote:
> >>>
> >>>> Hi All,
> >>>>
> >>>> Has anyone tried building offline indexes with EmbeddedSolrServer and
> >>>> posting them to shards?
> >>>>
> >>>
> >>> What do you mean by "posting it to shards"?  How is that different than
> >>> copying them manually to the right location in FS?  Could you please
> >>> elaborate?
> >>>
> >>> Otis
> >>> --
> >>> Performance Monitoring * Log Analytics * Search Analytics
> >>> Solr & Elasticsearch Support * http://sematext.com/
> >>>
> >>>
> >>>
> >>>> FYI, I am done building the indexes, but I am looking for a way to post
> >>>> these index files to the shards.
> >>>> Copying the indexes manually to each shard's replica is possible and
> >>>> works fine, but I don't want to go with that approach.
> >>>>
> >>>> Thanks!
> >>>>
> >>>
> >>
>
>
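
For reference, the two alternatives Erick mentions above (the mergeindexes
command and collection aliasing) are both plain HTTP admin requests: the
CoreAdmin MERGEINDEXES action and the Collections API CREATEALIAS action. The
following is only a minimal Java sketch of how those request URLs are put
together; the class name SolrAdminUrls, the base URL, and all core,
collection, alias and directory names are hypothetical placeholders, and the
sketch only builds the URLs rather than sending them.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

/**
 * Hypothetical helper that builds Solr admin request URLs.
 * Host, core, collection, alias and directory names are placeholders.
 */
public class SolrAdminUrls {

    // URL-encode a single query parameter value (Java 10+ overload).
    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    /**
     * CoreAdmin MERGEINDEXES: merge one or more index directories into an
     * existing target core. One indexDir parameter is emitted per directory.
     */
    public static String mergeIndexes(String baseUrl, String targetCore, List<String> indexDirs) {
        String dirs = indexDirs.stream()
                .map(d -> "&indexDir=" + enc(d))
                .collect(Collectors.joining());
        return baseUrl + "/admin/cores?action=MERGEINDEXES&core=" + enc(targetCore) + dirs;
    }

    /**
     * Collections API CREATEALIAS: point an alias at one or more collections,
     * e.g. to switch queries over to a freshly built collection.
     */
    public static String createAlias(String baseUrl, String alias, List<String> collections) {
        return baseUrl + "/admin/collections?action=CREATEALIAS&name=" + enc(alias)
                + "&collections=" + enc(String.join(",", collections));
    }

    public static void main(String[] args) {
        String base = "http://localhost:8983/solr"; // placeholder Solr node
        System.out.println(mergeIndexes(base, "core1", List.of("/data/offline/index1")));
        System.out.println(createAlias(base, "sensors", List.of("sensors_v2")));
    }
}
```

Either URL would then be sent as an HTTP GET to a live Solr node. After a
merge, a commit on the target core is typically needed before the merged
documents become searchable. The aliasing route avoids touching the live
collection at all: you build the new collection, then repoint the alias in
one call.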
