If SolrCloud meets your needs, without Hadoop, then there's no real reason to introduce the added complexity.
There are a bunch of problems that do _not_ work well with SolrCloud over non-Hadoop file systems. For those problems, the combination of SolrCloud and Hadoop makes tackling them possible.

Best,
Erick

On Thu, Aug 7, 2014 at 3:55 AM, Ali Nazemian <alinazem...@gmail.com> wrote:

> Thank you very much. But why should we go for Solr distributed with Hadoop?
> There is already SolrCloud, which is pretty applicable in the case of a big
> index. Is there any advantage to sending indexes over MapReduce that
> SolrCloud cannot provide?
> Regards.
>
>
> On Wed, Aug 6, 2014 at 9:09 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
> > bq: Are you aware of Cloudera Search? I know they provide an integrated
> > Hadoop ecosystem.
> >
> > What Cloudera Search does via the MapReduceIndexerTool (MRIT) is create N
> > sub-indexes for each shard in the M/R paradigm via EmbeddedSolrServer.
> > Eventually, these sub-indexes for each shard are merged (perhaps through
> > some number of levels) in the reduce phase and maybe merged into a live
> > Solr instance (--go-live). You'll note that this tool requires the address
> > of the ZK ensemble from which it can get the network topology,
> > configuration files, all that rot. If you don't use the --go-live option,
> > the output is still a Solr index, it's just that the index for each shard
> > is left in a specific directory on HDFS. Being on HDFS allows this kind of
> > M/R paradigm for massively parallel indexing operations, and perhaps
> > massively complex analysis.
> >
> > Nowhere is there any low-level non-Solr manipulation of the indexes.
> >
> > The Flume fork just writes directly to the Solr nodes. It knows about the
> > ZooKeeper ensemble and the collection too and communicates via SolrJ I'm
> > pretty sure.
> >
> > As far as integrating with HDFS, you're right, HA is part of the package.
> > As far as using the Solr indexes for analysis, well you can write anything
> > you want to use the Solr indexes from anywhere in the M/R world and have
> > them available from anywhere in the cluster. There's no real need to even
> > have Solr running, you could use the output from MRIT and access the
> > sub-shards with the EmbeddedSolrServer if you wanted, leaving out all the
> > pesky servlet container stuff.
> >
> > bq: So why do we go for HDFS in the case of analysis if we want to use
> > SolrJ for this purpose? What is the point?
> >
> > Scale and data access in a nutshell. In the HDFS world, you can scale
> > pretty linearly with the number of nodes you can rack together.
> >
> > Frankly though, if your data set is small enough to fit on a single
> > machine _and_ you can get through your analysis in a reasonable time
> > (reasonable here is up to you), then HDFS is probably not worth the
> > hassle. But in the big data world where we're talking petabyte scale,
> > having HDFS as the underpinning opens up possibilities for working on
> > data that were difficult/impossible with Solr previously.
> >
> > Best,
> > Erick
> >
> >
> > On Tue, Aug 5, 2014 at 9:37 PM, Ali Nazemian <alinazem...@gmail.com>
> > wrote:
> >
> > > Dear Erick,
> > > I remember that some time ago somebody asked what the point is of
> > > modifying Solr to use HDFS for storing indexes. As far as I remember,
> > > somebody told him that integrating Solr with HDFS has two advantages:
> > > 1) having Hadoop replication and HA. 2) using indexes and Solr documents
> > > for other purposes such as analysis. So why do we go for HDFS in the
> > > case of analysis if we want to use SolrJ for this purpose? What is the
> > > point?
> > > Regards.
> > >
> > >
> > > On Wed, Aug 6, 2014 at 8:59 AM, Ali Nazemian <alinazem...@gmail.com>
> > > wrote:
> > >
> > > > Dear Erick,
> > > > Hi,
> > > > Thank you for your reply.
> > > > Yeah I am aware that SolrJ is my last option. I was thinking about raw
> > > > I/O operations. So according to your reply that is probably not
> > > > applicable. What about the Lily project that Michael mentioned? Is that
> > > > considered SolrJ too? Are you aware of Cloudera Search? I know they
> > > > provide an integrated Hadoop ecosystem. Do you know what their
> > > > suggestion is?
> > > > Best regards.
> > > >
> > > >
> > > > On Wed, Aug 6, 2014 at 12:28 AM, Erick Erickson <erickerick...@gmail.com>
> > > > wrote:
> > > >
> > > >> What you haven't told us is what you mean by "modify the
> > > >> index outside Solr". SolrJ? Using raw Lucene? Trying to modify
> > > >> things by writing your own codec? Standard Java I/O operations?
> > > >> Other?
> > > >>
> > > >> You could use SolrJ to connect to an existing Solr server and
> > > >> both read and modify at will from your M/R jobs. But if you're
> > > >> thinking of trying to write/modify the segment files by raw I/O
> > > >> operations, good luck! I'm 99.99% certain that's going to cause
> > > >> you endless grief.
> > > >>
> > > >> Best,
> > > >> Erick
> > > >>
> > > >>
> > > >> On Tue, Aug 5, 2014 at 9:55 AM, Ali Nazemian <alinazem...@gmail.com>
> > > >> wrote:
> > > >>
> > > >> > Actually I am going to do some analysis on the Solr data using
> > > >> > MapReduce. For this purpose it might be necessary to change some
> > > >> > part of the data or add new fields from outside Solr.
> > > >> >
> > > >> >
> > > >> > On Tue, Aug 5, 2014 at 5:51 PM, Shawn Heisey <s...@elyograg.org>
> > > >> > wrote:
> > > >> >
> > > >> > > On 8/5/2014 7:04 AM, Ali Nazemian wrote:
> > > >> > > > I changed Solr 4.9 to write index and data on HDFS. Now I am
> > > >> > > > going to connect to that data from outside Solr to change
> > > >> > > > some of the values. Could somebody please tell me how that is
> > > >> > > > possible?
> > > >> > > > Suppose I am using HBase over HDFS to do these changes.
> > > >> > >
> > > >> > > I don't know how you could safely modify the index without a
> > > >> > > Lucene application or another instance of Solr, but if you do
> > > >> > > manage to modify the index, simply reloading the core or
> > > >> > > restarting Solr should cause it to pick up the changes. Either
> > > >> > > you would need to make sure that Solr never modifies the index,
> > > >> > > or you would need some way of coordinating updates so that Solr
> > > >> > > and the other application would never try to modify the index at
> > > >> > > the same time.
> > > >> > >
> > > >> > > Thanks,
> > > >> > > Shawn
> > > >> >
> > > >> > --
> > > >> > A.Nazemian
> > > >
> > > > --
> > > > A.Nazemian
> > >
> > > --
> > > A.Nazemian
>
> --
> A.Nazemian
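
For anyone landing on this thread later: the advice above is to change documents through Solr rather than by touching segment files on HDFS. The thread names SolrJ for that; the rough sketch below swaps in Solr's HTTP update handler with an atomic update ("set"), called from plain Python, purely as an illustration. The base URL, collection name, document id and field name are all made-up placeholders, and atomic updates generally need the affected fields to be stored and the update log enabled, so treat this as a starting point under those assumptions rather than a recipe.

```python
import json
import urllib.request


def build_atomic_update(doc_id, new_fields, id_field="id"):
    """Return the JSON body for a Solr atomic update on one document.

    Each entry in new_fields becomes {"set": value}, which asks Solr to
    add or overwrite just that field and leave the rest of the document alone.
    """
    doc = {id_field: doc_id}
    for name, value in new_fields.items():
        doc[name] = {"set": value}
    return json.dumps([doc])


def post_update(solr_url, collection, body, commit=True):
    """POST an update body to <solr_url>/<collection>/update.

    solr_url and collection are placeholders for your own deployment.
    Returns the HTTP status code of the response.
    """
    url = "{}/{}/update?commit={}".format(
        solr_url.rstrip("/"), collection, "true" if commit else "false"
    )
    request = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    # Hypothetical document id and field name, e.g. a value computed in an M/R job.
    body = build_atomic_update("doc-1", {"sentiment_s": "positive"})
    print(body)
    # Uncomment once a Solr node is reachable at this (assumed) address:
    # post_update("http://localhost:8983/solr", "collection1", body)
```

Because the write goes through Solr, Solr remains the only writer to the index, which sidesteps the coordination problem Shawn describes for two applications modifying the same index.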