Hi Wolfgang,

Thanks for your response. Could you point me to a working example of MapReduceIndexerTool that indexes CSV files? I assume you are referring to this page:
http://www.cloudera.com/content/cloudera-content/cloudera-docs/Search/latest/Cloudera-Search-User-Guide/csug_mapreduceindexertool.html?scroll=csug_topic_6_1
I have a few points to clarify:
* What is a morphline?
* Is it necessary to use a morphline for indexing? If so, how do I create one?
* Can the index only reside on HDFS, and not on the local filesystem?
* What is the minimum CDH version that supports it?

Looking forward to your response.
Thanks!

On Mon, Jun 2, 2014 at 2:24 PM, Wolfgang Hoschek <whosc...@cloudera.com> wrote:
> Sounds like you should consider using MapReduceIndexerTool. AFAIK, this is
> the most scalable indexing (and merging) solution out there.
>
> Wolfgang.
>
> On Jun 2, 2014, at 10:33 AM, Vineet Mishra <clearmido...@gmail.com> wrote:
>
> > Hi Erick,
> >
> > Thanks for your mail. Let me walk you through my use case.
> > I have around 20-40 billion records to index, each with around 200-400
> > fields. The data is sensor data, so the values can easily be stored as
> > Integer or Float. To index this volume of data I am using
> > EmbeddedSolrServer, which was working fine, but I am looking for a way to
> > move the generated indexes to the different shards without copying them
> > to each machine by hand: ideally I would submit the indexes to a shard and
> > let the shard take care of distributing them to the leader and replicas.
> > One more thing: when I started indexing with EmbeddedSolrServer, it went
> > fine for the first few million documents, but after that indexing became
> > extremely slow. It indexed around 20 GB in the first day and only 9 GB
> > over the next 2 days.
> > Any suggestions for optimizing indexing would also be appreciated.
> >
> > Hope this makes things much clearer.
> > Looking forward to hearing from you soon.
> >
> > Thanks and Regards!
> >
> >
> > On Fri, May 30, 2014 at 9:09 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> >
> > > You can copy to the shards and use the mergeindexes command; the
> > > MapReduceIndexerTool follows that approach.
> > >
> > > But really, what is the higher-level use case you're trying to support?
> > > This feels a little like an XY problem. You could do things like
> > > 1> index to a different collection, then use collection aliasing to switch
> > > 2> just re-index to the current collection.
> > > 3> use the MapReduceIndexerTool (admittedly it needs Hadoop).
> > >
> > > All in all, it feels like you're doing work you don't need to do. But
> > > that's a guess, since you haven't told us what the use case is.
> > >
> > > Best,
> > > Erick
> > >
> > >
> > > On Thu, May 29, 2014 at 7:22 AM, Otis Gospodnetic <otis.gospodne...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > On Wed, May 28, 2014 at 4:25 AM, Vineet Mishra <clearmido...@gmail.com> wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > Has anyone tried building offline indexes with EmbeddedSolrServer and
> > > > > posting them to shards?
> > > > >
> > > >
> > > > What do you mean by "posting them to shards"? How is that different from
> > > > copying them manually to the right location in the FS? Could you please
> > > > elaborate?
> > > >
> > > > Otis
> > > > --
> > > > Performance Monitoring * Log Analytics * Search Analytics
> > > > Solr & Elasticsearch Support * http://sematext.com/
> > > >
> > > >
> > > > > FYI, I am done building the indexes but am looking for a way to post
> > > > > these index files to the shards.
> > > > > Copying the indexes manually to each shard's replica is possible and
> > > > > works fine, but I don't want to go with that approach.
> > > > >
> > > > > Thanks!
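For reference, the two Solr HTTP calls Erick mentions (the CoreAdmin MERGEINDEXES action, which merges offline-built index directories into an existing core, and the Collections API CREATEALIAS action, which points an alias at a newly built collection) can be sketched as follows. This is a minimal Python sketch that only builds the request URLs and does not contact a server. The base URL, core name, index path, alias and collection names are hypothetical placeholders, and it assumes the source index directories are readable from the Solr node hosting the target core.

```python
from urllib.parse import urlencode

# Hypothetical Solr base URL; adjust host and port for your cluster.
SOLR_BASE = "http://localhost:8983/solr"


def merge_indexes_url(base, target_core, index_dirs):
    """Build a CoreAdmin MERGEINDEXES request that merges offline-built
    index directories into an existing core.

    Assumption: every directory is on a filesystem the Solr node hosting
    target_core can read.
    """
    params = [("action", "MERGEINDEXES"), ("core", target_core)]
    # indexDir is repeated once per source index directory.
    params += [("indexDir", d) for d in index_dirs]
    return f"{base}/admin/cores?{urlencode(params)}"


def create_alias_url(base, alias, collection):
    """Build a Collections API CREATEALIAS request that points an alias at
    a newly built collection, so queries switch over in a single step."""
    params = [
        ("action", "CREATEALIAS"),
        ("name", alias),
        ("collections", collection),
    ]
    return f"{base}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical core, path, alias and collection names.
    print(merge_indexes_url(SOLR_BASE, "collection1_shard1_replica1",
                            ["/data/offline/shard1/index"]))
    print(create_alias_url(SOLR_BASE, "sensors", "sensors_20140602"))
```

Sending the first URL (for example with curl) asks the node to merge the given segments into the target core; you will likely need to issue a commit on that core before the merged documents become visible. Keeping URL construction separate from sending makes the sketch runnable without a live cluster.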