Hi, see comments inline below…

On Jun 2, 2014, at 6:49 AM, Vineet Mishra <clearmido...@gmail.com> wrote:

> Hi Wolfgang,
>
> Thanks for your response, can you quote some running example of
> MapReduceIndexerTool
> for indexing through csv files.
> If you are referring to
> http://www.cloudera.com/content/cloudera-content/cloudera-docs/Search/latest/Cloudera-Search-User-Guide/csug_mapreduceindexertool.html?scroll=csug_topic_6_1
>
> I had a few points to clarify,
> *what is the morphline?

See http://kitesdk.org/docs/current/kite-morphlines/index.html and http://kitesdk.org/docs/current/kite-morphlines/morphlinesReferenceGuide.html#/readCSV

> *Is it necessary to use morphline for indexing, if yes how to create one?

Yes, it requires a morphline (which is basically a chain of plugins), and you can plug in any custom java code and custom commands into a morphline.

> *can Index only reside on HDFS and not on LocalFS?

The implementation is only on HDFS.

> *what is the minimum cdh version supported for it?

CDH 4 or CDH 5.

Wolfgang.

>
> Looking forward to your response.
>
> Thanks!
>
>
> On Mon, Jun 2, 2014 at 2:24 PM, Wolfgang Hoschek <whosc...@cloudera.com>
> wrote:
>
>> Sounds like you should consider using MapReduceIndexerTool. AFAIK, this is
>> the most scalable indexing (and merging) solution out there.
>>
>> Wolfgang.
>>
>> On Jun 2, 2014, at 10:33 AM, Vineet Mishra <clearmido...@gmail.com> wrote:
>>
>>> Hi Erick,
>>>
>>> Thanks for your mail, please let me go through with my use case.
>>> I am having around 20-40 Billion Records to index with each record is
>>> having around 200-400 fields, the data is sensor data so it can be easily
>>> stored in Integer or Float. Now to index this huge amount of data I am
>>> going with the indexing through EmbeddedSolrServer which was working fine
>>> but I was looking out for a way to move these generated indexes to
>>> different shards possibly without copying pasting it to each machines but
>>> some other approach as to submit this indexes to some shard and let the
>>> shard take care of it distributing it over leader and replica.
>>> I want to mention one more thing, as I started indexing with EmbeddedSolrServer
>>> it went fine for some million of starting documents but there after the
>>> indexing speed is pathetically slow, it indexed around 20GB in a day and
>>> just have indexed 9 GB in another 2 days.
>>> Any indexing optimization approach also requested.
>>>
>>> Hope this makes things much clearer.
>>> Looking forward to soon hear from you.
>>>
>>> Thanks and Regards!
>>>
>>>
>>> On Fri, May 30, 2014 at 9:09 PM, Erick Erickson <erickerick...@gmail.com>
>>> wrote:
>>>
>>>> You can copy to the shards and use the mergeindexes command, the
>>>> MapReduceIndexerTool follows that approach.
>>>>
>>>> But really, what is the higher-level use-case you're trying to support?
>>>> This feels a little like an XY problem. You could do things like
>>>> 1> index to a different collection then use collection aliasing to switch
>>>> 2> just re-index to the current collection.
>>>> 3> use the MapReduceIndexerTool (admittedly it needs Hadoop).
>>>>
>>>> All in all, it feels like you're doing work you don't need to do. But
>>>> that's a guess since you haven't told us what the use-case is.
>>>>
>>>> Best,
>>>> Erick
>>>>
>>>>
>>>> On Thu, May 29, 2014 at 7:22 AM, Otis Gospodnetic <
>>>> otis.gospodne...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> On Wed, May 28, 2014 at 4:25 AM, Vineet Mishra <clearmido...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> Has anyone tried with building Offline indexes with EmbeddedSolrServer and
>>>>>> posting it to Shards.
>>>>>>
>>>>>
>>>>> What do you mean by "posting it to shards"? How is that different than
>>>>> copying them manually to the right location in FS? Could you please
>>>>> elaborate?
>>>>>
>>>>> Otis
>>>>> --
>>>>> Performance Monitoring * Log Analytics * Search Analytics
>>>>> Solr & Elasticsearch Support * http://sematext.com/
>>>>>
>>>>>
>>>>>
>>>>>> FYI, I am done building the indexes but looking out for a way to post these
>>>>>> index files on shards.
>>>>>> Copying the indexes manually to each shard's replica is possible and is
>>>>>> working fine but I don't want to go with that approach.
>>>>>>
>>>>>> Thanks!
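
As an illustration of the "chain of plugins" description of a morphline given in the reply at the top of this thread, here is a minimal sketch in Python. It is not the Kite Morphlines API and not a morphline configuration file; it only models the idea that each command takes a stream of records and passes its output to the next command. The names `read_csv`, `to_float`, `run_chain`, and the sample column names are all hypothetical, chosen to resemble the sensor-data CSV use case discussed above.

```python
import csv
import io
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]
Command = Callable[[Iterable[Record]], Iterator[Record]]


def read_csv(columns: List[str]) -> Command:
    """Hypothetical stand-in for a CSV-reading command: splits the raw
    line held in the 'message' field into named columns."""
    def command(records: Iterable[Record]) -> Iterator[Record]:
        for record in records:
            for row in csv.reader(io.StringIO(str(record["message"]))):
                yield dict(zip(columns, (cell.strip() for cell in row)))
    return command


def to_float(fields: List[str]) -> Command:
    """Hypothetical conversion command: turns sensor readings into floats."""
    def command(records: Iterable[Record]) -> Iterator[Record]:
        for record in records:
            converted = dict(record)
            for name in fields:
                converted[name] = float(converted[name])
            yield converted
    return command


def run_chain(commands: List[Command], records: Iterable[Record]) -> List[Record]:
    """Pipe the records through each command in order, like a plugin chain."""
    stream: Iterable[Record] = records
    for command in commands:
        stream = command(stream)
    return list(stream)


if __name__ == "__main__":
    lines = [{"message": "s1, 21.5, 0.93"}, {"message": "s2, 19.0, 0.88"}]
    chain = [
        read_csv(["id", "temperature", "humidity"]),
        to_float(["temperature", "humidity"]),
    ]
    for doc in run_chain(chain, lines):
        print(doc)
```

In an actual morphline the final command would typically hand each record to Solr rather than print it; the readCSV entry in the reference guide linked above documents the real command and its options.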