OK, you want to get to a go/no-go decision fast. It's actually relatively cheap to do....
Here's the long form:
https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

The short form is:

Take your best stab at a schema.
Take your best stab at a "realistic" set of queries.
Take your best stab at the query-per-second rate you expect to support.
Index a bunch of data.
Fire up jmeter or similar to put a load on it.
Keep adding docs until it falls over.
Tune.
Repeat.

Note, you only need a single box for this, perhaps two for a two-shard
SolrCloud if you want.

Now you have the following data: You can fit D documents on server X with
JVM memory Y at QPS rate Q and still get 'reasonable' performance.

Scaling to 10D documents means 10 shards.
Scaling to 10Q QPS rates means 10 replicas/shard.
Scaling to 10D documents _and_ 10Q QPS means 100 machines.

You see where this is going. There are a bunch of tradeoffs, but you need
to start somewhere.

From here, getting an idea of the hardware requirements for your eventual
deployment is at least reasonable to estimate. You'll still come out with
"between X and 2X machines" or some such, but that's a heck of a lot
better than "I have no clue".

As to whether to use HBase or not, "It depends" mostly on how much data
we're talking about here. And whether you can store it all in Solr or use
Solr for searching and HBase for serving the docs. Or... there are about a
zillion variations, but until and unless you do the sizing exercise in the
link above it's all pure speculation.

The link above won't give you totally accurate information, but at least
gives you a sense of whether what you want to do is possible given your
time/money constraints... and it's cheap in terms of hardware to actually
do....

Best,
Erick

On Tue, Nov 24, 2015 at 11:14 AM, GW <thegeofo...@gmail.com> wrote:

> I hope I am in the context of this mailing list,
>
> Thanks in advance.
>
> A little background
>
> I learned computers with 6800 machine assembly.
> With decades of RDBMS, jumping into Solr/Hadoop/HBase is still a
> pilgrimage through hell. I think I didn't need to learn Hadoop or HBase.
>
> So, I have a personal project that I think could go viral. This means I
> have boatloads of uncertainties to deal with because there seem to be no
> real guidelines on scaling and hardware selection. I cannot just pull out
> my calculator. Ironically I have a similar problem in RDBs. The calculator
> shows I am bordering on stupidity. Seeing as how I am also spending other
> people's money as well as my own you can guess I am nervous.
>
> I will try to speak Apache now.
>
> I have 4 cores that are completely flat, i.e., I will never join them. Two
> cores will almost never change. One will be large. Extremely large.
>
> This large core has a schema of 11 fields, 7 of which I need to store. It
> seems crazy to offload this to HBase. Am I crazy? These indexes will be
> updated weekly and regularly. Daily.
>
> As it stands I plan to deploy a dedicated ZooKeeper ensemble of three
> servers that scale vertically to insanity but minimal HW config, single
> quad Xeon processor on a dual socket board 2 16G strips. Intel SSD
>
> I'm planning 5 quadcore Solr boxes all Intel SSD drives.
>
> From what I have read Intel makes the only SSD drive that supports caching
> in RAID 0 and some people say they're happening. So I'm thinking 5 and
> alive. 3 servers on JBODs and two on RAID 0.
>
> I'm having a tough time not doing VMware on this. It's funny when I reflect
> on going to Xen, KVM, VMware. Oh well, knowledge be damned, I'm back in the
> iron age lol. These indexes do not need to be backed up so why should I
> care. I only need to worry about my crawler's DB so I'm not worried for
> backups. It's all in my comfort zone.
>
> Ahhhhh! So SolrCloud is a petri dish! I hear people yelling jump in the
> water's fine! I still want to VM. Am I better off in the land of tarballs
> and configs? Can I use Linux volume manager?
>
> My primary concern is whether I should use HBase. Everything is screaming
> no at me. It just looks like a useless abstraction to me. I think I am a
> pure Solr project after grasping Hadoop/HBase to some degree. At the end
> of the day I know $h1t3. I'm only one guy so if I can skip Hadoop/HBase
> management I will. I think I fit the criteria for just being a search
> engine.
>
> I almost wish I never did the Solr/Nutch/HBase tutorial.
>
> Any criticism or comments appreciated.
>
> :-)
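
For what it's worth, the scaling arithmetic in the reply (D documents and Q queries per second on one box, then shards for more documents and replicas per shard for more QPS) can be sketched in a few lines of Python. This is only a rough illustration, assuming capacity scales linearly, which the reply itself treats as a starting point rather than a guarantee. The names `Baseline` and `estimate_cluster` and the sample numbers are made up for the example; they are not part of Solr or any tool.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Baseline:
    """Measured result of the single-box load test (hypothetical container)."""
    docs_per_node: int    # D: documents one server held with acceptable latency
    qps_per_node: float   # Q: queries per second that same server sustained


def estimate_cluster(baseline: Baseline, target_docs: int, target_qps: float) -> dict:
    """Back-of-envelope sizing, assuming linear scaling from the baseline.

    More documents -> more shards; more QPS -> more replicas per shard.
    """
    # Integer ceiling division avoids float rounding on large doc counts.
    shards = max(1, -(-target_docs // baseline.docs_per_node))
    replicas = max(1, math.ceil(target_qps / baseline.qps_per_node))
    return {
        "shards": shards,
        "replicas_per_shard": replicas,
        "machines": shards * replicas,
    }


if __name__ == "__main__":
    # Illustrative numbers only: pretend one box handled 50M docs at 20 QPS.
    measured = Baseline(docs_per_node=50_000_000, qps_per_node=20.0)
    print(estimate_cluster(measured, target_docs=500_000_000, target_qps=200.0))
```

With 10D documents and 10Q QPS this reproduces the 10 shards, 10 replicas per shard, 100 machines figure from the reply. Treat the result as the low end of the "between X and 2X machines" range, since real deployments rarely scale perfectly linearly.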