For HDFS, failover, and sharding, you may want to use Solr with Katta.
There's an issue open at:
http://issues.apache.org/jira/browse/SOLR-1301

Near-realtime search still needs to be added incrementally to Solr.  Today I
wouldn't recommend relying on it.

On Wed, Sep 2, 2009 at 10:14 AM, Zhenyu Zhong<zhongresea...@gmail.com> wrote:
> Dear all,
>
> I am very interested in Solr and would like to deploy Solr for distributed
> indexing and searching. I hope you are the right Solr expert who can help me
> out.
> However, I have concerns about the scalability and management overhead of
> Solr. I am wondering if anyone could give me some guidance on Solr.
>
> Basically, I have the following questions,
> For indexing
> 1.  How does Solr handle the distributed indexing? It seems Solr generates
> index on a single box. What if the index is huge and can't sit on one box?
> 2.  Is it possible for Solr to generate index in HDFS?
>
> For searching
> 3.  Solr provides a Master/Slave framework. How does Solr distribute the
> search? Does Solr know which index/shard to deliver the query to? Or does it
> have to do a multicast query to all the nodes?
>
> For fault tolerance
> 4. Does Solr handle the management overhead automatically? Suppose the master
> goes down: how does Solr recover the master in order to get the latest index
> updates?
>    Do we have to code ourselves to handle this?
> 5. Suppose master goes down immediately after the index updates, while the
> updates haven't been replicated to the slaves, data loss seems to happen.
> Does Solr have any mechanism to deal with that?
>
> Performance of real-time index updating
> 6. How is the performance of this realtime index updating? Suppose we are
> frequently updating a million records in a huge index with billions of
> records. Can Solr provide reasonable performance and low latency for
> that? (Probably it is related to the Lucene library)
>
> I would very much appreciate it if you could give us some guidance.
>
> Best,
> edward
>
