I hope this is on topic for this mailing list.

Thanks in advance.

A little background

I learned computers with 6800 assembly language. Even with decades of RDBMS
experience, jumping into Solr/Hadoop/HBase is still a pilgrimage through hell.
I don't think I needed to learn Hadoop or HBase.

So, I have a personal project that I think could go viral. This means I
have boatloads of uncertainties to deal with, because there seem to be no
real guidelines on scaling and hardware selection. I cannot just pull out
my calculator. Ironically, I have a similar problem in RDBs. The calculator
shows I am bordering on stupidity. Seeing as I am spending other
people's money as well as my own, you can guess I am nervous.

I will try to speak Apache now.

I have 4 cores that are completely flat, i.e., I will never join them. Two
cores will almost never change. One will be large. Extremely large.

This large core has a schema of 11 fields, 7 of which I need to store. It
seems crazy to offload this to HBase. Am I crazy? These indexes will be
updated regularly, weekly or even daily.
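
To show the kind of calculator math I wish I could trust, here is a rough
sketch of the raw size of the stored fields alone. Only the 7 stored fields
come from my schema; the document count and average field size are made-up
placeholders, and the result ignores the inverted index and any compression.

```python
def estimate_stored_bytes(num_docs, stored_fields, avg_bytes_per_field):
    """Rough raw size of stored field data, before compression or indexing overhead."""
    return num_docs * stored_fields * avg_bytes_per_field


# Hypothetical inputs: only stored_fields=7 comes from my schema.
raw = estimate_stored_bytes(
    num_docs=100_000_000,      # placeholder document count
    stored_fields=7,           # 7 of the 11 fields are stored
    avg_bytes_per_field=50,    # placeholder average field size
)
print(f"{raw / 1e9:.1f} GB of raw stored data")
```

If a number like this fits comfortably on the SSDs I am planning, that is
part of why offloading storage elsewhere looks unnecessary to me.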

As it stands, I plan to deploy a dedicated ZooKeeper ensemble of three
servers that can scale vertically to insanity but start with a minimal
hardware config: a single quad-core Xeon on a dual-socket board, two 16 GB
sticks of RAM, and an Intel SSD.
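
My reasoning for three ZooKeeper servers is the usual majority-quorum
arithmetic, sketched below. The function names are my own and are not part
of any ZooKeeper API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of the ensemble."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be lost while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (3, 4, 5):
    print(f"{n} servers: quorum {quorum_size(n)}, survives {tolerated_failures(n)} failure(s)")
```

By this math a fourth server buys nothing over three, which is why I
stopped at three.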

I'm planning 5 quad-core Solr boxes, all with Intel SSDs.

From what I have read Intel makes the only SSD drive that supports caching
in RAID 0 and some people say they're happening. So I'm thinking 5 and
alive. Three servers on JBOD and two on RAID 0.

I'm having a tough time not running this on VMware. It's funny when I reflect
on going to Xen, KVM, VMware. Oh well, knowledge be damned, I'm back in the
iron age lol. These indexes do not need to be backed up, so why should I
care? I only need to worry about my crawler's DB, so I'm not worried about
backups. It's all in my comfort zone.

Ahhhhh! So SolrCloud is a petri dish! I hear people yelling, "Jump in, the
water's fine!" I still want to virtualize. Am I better off in the land of
tarballs and configs? Can I use the Linux volume manager (LVM)?

My primary concern is whether I should use HBase. Everything is screaming no
at me. It just looks like a useless abstraction to me. Having grasped
Hadoop/HBase to some degree, I think I am a pure Solr project. At the end of
the day I know $h1t3. I'm only one guy, so if I can skip Hadoop/HBase
management I will. I think I fit the criteria for just being a search engine.

I almost wish I had never done the Solr/Nutch/HBase tutorial.

Any criticism or comments appreciated.

:-)
