Sorry, BTW, in case what I wrote below is unclear: is the concern that the Hadoop InputFormat (as an example) will need a separate InputSplit (each of which corresponds to a "SELECT foo FROM bar WHERE token(baz) > min AND token(baz) < max") for every vnode, instead of one per node (each node owning a single token when vnodes are disabled)?
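To make the question concrete, here is a rough Python sketch of the idea. It is not the actual Hadoop InputFormat code. It assumes the Murmur3Partitioner token bounds, a hypothetical 6-node ring with randomly placed tokens, and 256 tokens per node when vnodes are on (an assumed num_tokens setting). The names foo, bar, and baz are the placeholders from the query above, and the helper functions are made up for illustration. The end of each range is inclusive (<=) so that every token is covered, and the wrap-around range is split in two at the ring boundary for simplicity.

```python
import random

# Murmur3Partitioner token bounds (assumption: this partitioner is in use).
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def ring_tokens(num_nodes, tokens_per_node, seed=0):
    """Return sorted, distinct random tokens for a hypothetical ring."""
    rng = random.Random(seed)
    tokens = set()
    while len(tokens) < num_nodes * tokens_per_node:
        tokens.add(rng.randint(MIN_TOKEN, MAX_TOKEN))
    return sorted(tokens)


def range_queries(tokens):
    """Build one SELECT per token range; the wrap-around range becomes two."""
    base = "SELECT foo FROM bar WHERE "
    # Everything up to and including the first token on the ring.
    queries = [base + f"token(baz) <= {tokens[0]}"]
    # One query per pair of adjacent tokens: (start, end].
    for start, end in zip(tokens, tokens[1:]):
        queries.append(base + f"token(baz) > {start} AND token(baz) <= {end}")
    # Everything after the last token on the ring.
    queries.append(base + f"token(baz) > {tokens[-1]}")
    return queries


# Hypothetical 6-node datacenter, without and with vnodes.
single_token = range_queries(ring_tokens(num_nodes=6, tokens_per_node=1))
with_vnodes = range_queries(ring_tokens(num_nodes=6, tokens_per_node=256))
print(len(single_token), len(with_vnodes))
```

Under these assumptions the number of range queries grows roughly in proportion to the number of tokens per node. A real InputFormat may merge or further subdivide ranges, so the counts here only illustrate the scaling.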
(I assume this would be an increase of several orders of magnitude in the number of input splits.)

Best regards,
Clint

On Wed, Jul 2, 2014 at 6:04 PM, Clint Kelly <clint.ke...@gmail.com> wrote:
> Hi Tupshin,
>
> Thanks for the quick reply. Is the performance concern from the
> Hadoop integration needing to set up separate SELECT operations for
> all of the unique vnode ranges?
>
> Best regards,
> Clint
>
> On Wed, Jul 2, 2014 at 6:00 PM, Tupshin Harper <tups...@tupshin.com> wrote:
>> For performance reasons, you shouldn't enable vnodes on any Cassandra/DSE
>> datacenter that is doing hadoop analytics workloads. Other DCs in the
>> cluster can use vnodes.
>>
>> -Tupshin
>>
>> On Jul 2, 2014 5:50 PM, "Clint Kelly" <clint.ke...@gmail.com> wrote:
>>>
>>> Hi everyone,
>>>
>>> Apologies if this is the incorrect forum for a question like this.
>>>
>>> I am going to set up a mixed-workload (real-time and analytics)
>>> installation of DSE 4.5 using bring-your-own Hadoop (BYOH). We are
>>> using CDH 5.0.
>>>
>>> I was reviewing the installation instructions, and I came across the
>>> following comment here:
>>>
>>> http://www.datastax.com/documentation/datastax_enterprise/4.5/datastax_enterprise/byoh/byohInstall.html
>>>
>>> "6. Observe workload isolation best practices. Do not enable vnodes."
>>>
>>> Does this mean that the use of vnodes is not compatible with a
>>> mixed-workload installation? Or with BYOH? I am confused why this
>>> would be the case.
>>>
>>> If anyone can clarify, I would greatly appreciate it.
>>>
>>> Best regards,
>>> Clint