For HDFS, failover, and sharding, you may want to use Solr with Katta. There's an open issue at: http://issues.apache.org/jira/browse/SOLR-1301
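On question 3: with Solr's built-in distributed search, Solr does not work out which shard holds a document. The client names the shards in the `shards` request parameter, and the node that receives the request queries every listed shard and merges the results. Here is a minimal sketch of building such a request. The host names `search1` and `search2` and the helper `build_distributed_query` are made up for illustration, and the code only builds the URL; it sends nothing.

```python
from urllib.parse import urlencode


def build_distributed_query(coordinator, shard_hosts, query, rows=10):
    """Build a Solr select URL that asks `coordinator` to query every shard.

    `coordinator` and `shard_hosts` are hypothetical host:port values;
    each shard entry is a base URL without the scheme, e.g. "host:8983/solr".
    """
    params = {
        "q": query,
        "rows": rows,
        # Comma-separated list of shards; the coordinator fans the query
        # out to all of them and merges the per-shard results.
        "shards": ",".join(shard_hosts),
    }
    return "http://%s/solr/select?%s" % (coordinator, urlencode(params))


if __name__ == "__main__":
    print(build_distributed_query(
        "search1:8983",
        ["search1:8983/solr", "search2:8983/solr"],
        "title:hadoop",
    ))
```

Every shard in the list gets the query, so the shard list itself has to be maintained outside Solr (by hand, or by something like Katta).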
Near realtime search needs to be added incrementally to Solr. Today I wouldn't recommend it.

On Wed, Sep 2, 2009 at 10:14 AM, Zhenyu Zhong<zhongresea...@gmail.com> wrote:
> Dear all,
>
> I am very interested in Solr and would like to deploy Solr for distributed
> indexing and searching. I hope you are the right Solr expert who can help me
> out.
> However, I have concerns about the scalability and management overhead of
> Solr. I am wondering if anyone could give me some guidance on Solr.
>
> Basically, I have the following questions,
> For indexing
> 1. How does Solr handle the distributed indexing? It seems Solr generates
> index on a single box. What if the index is huge and can't sit on one box?
> 2. Is it possible for Solr to generate index in HDFS?
>
> For searching
> 3. Solr provides Master/Slave framework. How does the Solr distribute the
> search? Does Solr know which index/shard to deliver the query to? Or does it
> have to do a multicast query to all the nodes?
>
> For fault tolerance
> 4. Does Solr handle the management overhead automatically? suppose master
> goes down, how does Solr recover the master in order to get the latest index
> updates?
> Do we have to code ourselves to handle this?
> 5. Suppose master goes down immediately after the index updates, while the
> updates haven't been replicated to the slaves, data loss seems to happen.
> Does Solr have any mechanism to deal with that?
>
> Performance of real-time index updating
> 6. How is the performance of this realtime index updating? Suppose we are
> updating a million records for a huge index with billions of records
> frequently. Can Solr provides a reasonable performance and low latency on
> that? (Probably it is related to Lucene library)
>
> I would be very appreciated if you can give us some guidance.
>
> Best,
> edward
>