Can it read distributed Lucene indexes in SolrCloud?
On Jul 26, 2012 7:11 PM, "Lance Norskog" <goks...@gmail.com> wrote:

> Mahout includes a file reader for Lucene indexes. It will read from
> HDFS or local disks.
>
> On Thu, Jul 26, 2012 at 6:57 PM, Darren Govoni <dar...@ontrenet.com>
> wrote:
> > You raise an interesting possibility: a map/reduce Solr handler over
> > SolrCloud...
> >
> > On Thu, 2012-07-26 at 18:52 -0700, Trung Pham wrote:
> >
> >> I think the performance should be close to Hadoop running on HDFS, if
> >> a Hadoop job can somehow read the Solr index files directly while
> >> executing on the local Solr node.
> >>
> >> Kinda like how HBase and Cassandra integrate with Hadoop.
> >>
> >> Plus, we can run the map reduce job on a standby Solr4 cluster.
> >>
> >> This way, the documents in Solr will be our primary source of truth. And we
> >> have the ability to run near real time search queries and analytics on it.
> >> No need to export data around.
> >>
> >> Solr4 is becoming a very interesting solution to many web scale problems.
> >> Just missing the map/reduce component. :)
> >>
> >> On Thu, Jul 26, 2012 at 3:01 PM, Darren Govoni <dar...@ontrenet.com> wrote:
> >>
> >> > Of course you can do it, but the question is whether this will produce
> >> > the performance results you expect.
> >> > I've seen talk about this in other forums, so you might find some prior
> >> > work here.
> >> >
> >> > Solr and HDFS serve somewhat different purposes. The key issue would be
> >> > if your map and reduce code overloads the Solr endpoint. Even using
> >> > SolrCloud, I believe all requests will have to go through a single
> >> > URL (to be routed), so if you have thousands of map/reduce jobs all
> >> > running simultaneously, the question is whether your Solr is
> >> > architected to handle that amount of throughput.
> >> >
> >> >
> >> > On Thu, 2012-07-26 at 14:55 -0700, Trung Pham wrote:
> >> >
> >> > > Is it possible to run map reduce jobs directly on Solr4?
> >> > >
> >> > > I'm asking this because I want to use Solr4 as the primary storage engine.
> >> > > And I want to be able to run near real time analytics against it as well.
> >> > > Rather than export solr4 data out to a hadoop cluster.
> >> >
> >> >
> >> >
> >
> >
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>
