I hope this is on topic for this mailing list. Thanks in advance.
A little background: I learned computers with 6800 machine assembly. Even with decades of RDBMS experience, jumping into Solr/Hadoop/HBase is still a pilgrimage through hell. I think I didn't need to learn Hadoop or HBase.

So, I have a personal project that I think could go viral. This means I have boatloads of uncertainties to deal with, because there seem to be no real guidelines on scaling and hardware selection. I cannot just pull out my calculator. Ironically, I have a similar problem with RDBs. The calculator shows I am bordering on stupidity. Seeing as I am spending other people's money as well as my own, you can guess I am nervous.

I will try to speak Apache now. I have 4 cores that are completely flat, i.e., I will never join them. Two cores will almost never change. One will be large. Extremely large. This large core has a schema of 11 fields, 7 of which I need to store. It seems crazy to offload this to HBase. Am I crazy? These indexes will be updated regularly: weekly, even daily.

As it stands, I plan to deploy a dedicated ZooKeeper ensemble of three servers that scale vertically to insanity but start with a minimal HW config: a single quad-core Xeon processor on a dual-socket board, two 16 GB memory sticks, and an Intel SSD. I'm planning 5 quad-core Solr boxes, all with Intel SSD drives. From what I have read, Intel makes the only SSD drive that supports caching in RAID 0, and some people say they're happening. So I'm thinking 5 and alive: 3 servers on JBODs and two on RAID 0.

I'm having a tough time not doing VMware on this. It's funny when I reflect on going to Xen, KVM, VMware. Oh well, knowledge be damned, I'm back in the iron age lol. These indexes do not need to be backed up, so why should I care? I only need to worry about my crawler's DB, so I'm not worried about backups. It's all in my comfort zone.

Ahhhhh! So SolrCloud is a petri dish! I hear people yelling "jump in, the water's fine!" I still want to VM. Am I better off in the land of tarballs and configs? Can I use the Linux volume manager (LVM)?
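For reference, the arithmetic behind the three-server ZooKeeper ensemble mentioned above can be sketched roughly like this. It assumes ZooKeeper's usual strict-majority quorum rule; the function name is made up for illustration and is not part of any ZooKeeper or Solr API.

```python
def zk_fault_tolerance(ensemble_size: int) -> int:
    """Return how many servers can fail while a strict majority survives.

    Hypothetical helper for illustration only. Assumes the ensemble needs
    more than half of its servers up to keep serving requests.
    """
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    quorum = ensemble_size // 2 + 1  # smallest strict majority
    return ensemble_size - quorum


if __name__ == "__main__":
    # Compare a few candidate ensemble sizes.
    for n in (3, 4, 5):
        print(f"{n} servers: tolerates {zk_fault_tolerance(n)} failure(s)")
```

Under that assumption, going from three servers to four buys no extra fault tolerance, which is why odd ensemble sizes are the usual choice.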
My primary concern is whether I should use HBase. Everything is screaming no at me. It just looks like a useless abstraction to me. Having grasped Hadoop/HBase to some degree, I think I am a pure Solr project. At the end of the day I know $h1t3. I'm only one guy, so if I can skip Hadoop/HBase management I will. I think I fit the criteria for just being a search engine. I almost wish I never did the Solr/Nutch/HBase tutorial. Any criticism or comments appreciated. :-)