Hello,
> I am not sure if this is the right forum for this question, but it would
> be great if I could be pointed in the right direction. We have been using a
> combination of MySQL and Solr for all our company full-text and query
> needs. But as our customers have grown, so has the amount of data, and MySQL
> is just not proving to be the right option for storing/querying.
>
> I have been looking at SolrCloud and it looks really impressive, but I am
> not sure if we should give up our storage system. So, I have been
> exploring DataStax, but a commercial option is out of the question. So we were
> thinking of using HBase to store the data and at the same time index the
> data into SolrCloud, but for many reasons this design doesn't seem
> convincing (also seen the basics of Lily).
>
> 1) Would it be recommended to just use SolrCloud with multiple
> replicas, or does hbase-solr seem like a good option?
>

If you trust SolrCloud with replication and keep all your fields stored,
then you could live without an external DB. At this point I personally
would still want an external DB. Whether HBase is the right DB for the job
I can't tell, because I don't know anything about your data, volume, access
patterns, etc. I can tell you that HBase does scale well: we have tables
with many billions of rows stored in it, for instance.

> 2) How much strain would it be to keep both a Solr shard and an HBase node
> on the same machine?
>

HBase loves memory. So does Solr. They both dislike disk IO (who doesn't!).
Solr can use a lot of CPU for indexing/searching, depending on the volume.
HBase RegionServers can use a lot of CPU if you run MapReduce on data in
HBase.

> 3) Is there a calculation for what kind of machine configuration I would
> need to store 500-1000 million records? Most of these will be social data
> (Twitter/Facebook/blogs etc.). And how many shards?
>

No recipe here, unfortunately. You'd have to experiment and test, do load
and performance testing, etc.

If you need help with Solr + HBase, we happen to have a lot of experience
with both and have even used them together for some of our clients.

Otis
--
Performance Monitoring - http://sematext.com/spm/index.html
Search Analytics - http://sematext.com/search-analytics/index.html