Generated code?

2010-06-14 Thread Masood Mortazavi
Hi,

My assumption is that what one finds in

  interface/thrift/gen-java

is actually generated code.

If so, why is it checked in as source under SVN?

(Certainly, the avro generated code doesn't seem to be checked in.)

Regards,
Masood


Re: Generated code?

2010-06-14 Thread David Strauss
On 2010-06-15 03:58, Masood Mortazavi wrote:
> Hi,
> 
> My assumption is that what one finds in
> 
>   interface/thrift/gen-java
> 
> is actually generated code.
> 
> If so, why is it checked in as source under SVN?
> 
> (Certainly, the avro generated code doesn't seem to be checked in.)
> 
> Regards,
> Masood
> 

It simplifies the end user's build process. If the code isn't in
Subversion, then you'd need to get all the Thrift dependencies and do
the generation yourself just to build Cassandra. Sure, there are other
methods that don't involve checking into Subversion, but they're more
complex.

-- 
David Strauss
   | da...@fourkitchens.com
   | +1 512 577 5827 [mobile]
Four Kitchens
   | http://fourkitchens.com
   | +1 512 454 6659 [office]
   | +1 512 870 8453 [direct]





Re: Generated code?

2010-06-14 Thread Masood Mortazavi
On Mon, Jun 14, 2010 at 9:04 PM, David Strauss wrote:

> On 2010-06-15 03:58, Masood Mortazavi wrote:
> > Hi,
> >
> > My assumption is that what one finds in
> >
> >   interface/thrift/gen-java
> >
> > is actually generated code.
> >
> > If so, why is it checked in as source under SVN?
> >
> > (Certainly, the avro generated code doesn't seem to be checked in.)
> >
> > Regards,
> > Masood
> >
>
> It simplifies the end user's build process. If the code isn't in
> Subversion, then you'd need to get all the Thrift dependencies and do
> the generation yourself just to build Cassandra. Sure, there are other
> methods that don't involve checking into Subversion, but they're more
> complex.
>


Thank you very much for explaining this. It helps me understand the
reasoning.

Out of curiosity, are those dependencies anything more than one or more jar
files in lib? The lib directory is already loaded with many other jar files . . .

(I'm not a Thrift expert, but I did work on RMI in the JDK some years ago, so I
can guess what may be needed to generate the code. Avro, in Cassandra, seems to
have been able to get away with including some jars in lib. Having one system
for Avro and quite another for Thrift seems a bit odd, but maybe I'm missing
something larger.)

- m.


Replication Factor and Data Centers

2010-06-14 Thread Masood Mortazavi
Is a clearer interpretation of this statement (from
conf/datacenters.properties) given anywhere else?

# The sum of all the datacenter replication factor values should equal
# the replication factor of the keyspace (i.e. sum(dc_rf) = RF)

# keyspace\:datacenter=replication factor
Keyspace1\:DC1=3
Keyspace1\:DC2=2
Keyspace1\:DC3=1

Does the above example configuration imply that Keyspace1 has an RF of 6, and
that of these, 3 replicas will go to DC1, 2 to DC2, and 1 to DC3?

What will happen if datacenters.properties and cassandra-rack.properties are
simply empty?

- m.


Reviewing . . . RackAwareStrategy.java . . . ( rev 954657 )

2010-06-14 Thread Masood Mortazavi
Hi,

Ran into this as I was going through the new config files for data centers
and racks.
(I may have some comments on those configuration models but will send them
later.)

Turning to RackAwareStrategy.java:

The comment on the top of RackAwareStrategy says:

/*
 * This Replication Strategy returns the nodes responsible for a given
 * key but respects rack awareness. It places one replica in a
 * different data center from the first (if there is any such data center),
 * and remaining replicas in different racks in the same datacenter as
 * the first.
 */

However, the code -- as it is written today -- *seems* to be actually doing
something like the following:

/*
 * This Replication Strategy returns the nodes responsible for a given
 * key but "respects" rack awareness. It places one replica in a
 * different data center from the first (if there is any such data center),
 * and *one* replica in different rack but in the same data center
 * (if there is any such rack), and it spreads the remaining replicas
 * on nodes along the ring, distinct from the first two non-primary
 * replicas.
 */
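To make the described behavior concrete, here is a toy model of the placement the revised comment describes (this is not the actual Cassandra code; the Node type, ring layout, and method names are invented for illustration): one replica in a different data center, one in a different rack of the primary's data center, and the remainder on the next distinct nodes along the ring.

```java
import java.util.ArrayList;
import java.util.List;

public class RackAwareSketch {
    record Node(String name, String dc, String rack) {}

    static List<Node> place(List<Node> ring, int primaryIndex, int rf) {
        Node primary = ring.get(primaryIndex);
        List<Node> replicas = new ArrayList<>(List.of(primary));
        Node otherDc = null, otherRack = null;
        // One clockwise pass from the primary: take the first node in a
        // different DC, then the first in a different rack of the same DC.
        for (int i = 1; i < ring.size() && replicas.size() < rf; i++) {
            Node n = ring.get((primaryIndex + i) % ring.size());
            if (otherDc == null && !n.dc().equals(primary.dc())) {
                otherDc = n;
                replicas.add(n);
            } else if (otherRack == null && n.dc().equals(primary.dc())
                       && !n.rack().equals(primary.rack())) {
                otherRack = n;
                replicas.add(n);
            }
        }
        // Remaining replicas: the next distinct nodes along the ring.
        for (int i = 1; i < ring.size() && replicas.size() < rf; i++) {
            Node n = ring.get((primaryIndex + i) % ring.size());
            if (!replicas.contains(n))
                replicas.add(n);
        }
        return replicas;
    }
}
```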

It may make sense to clarify this important semantic difference by updating
the comment (along the above lines) to better reflect the code.

Alternatively, the code inside the first while loop in
calculateNaturalEndpoints can be changed to implement some other semantics
that would be more suitable.

In general, with the introduction of data center configurations, the
semantics of this class need to be clarified so the strategy for placing
endpoints in the "endpoints" set can be implemented accordingly.

There are other issues to think about. For example, for quorum write
(consistency.quorum) to work faster, shouldn't the first replicas be as
close as possible (i.e. on the same rack)?  The whole point of choosing this
level of consistency is to improve performance. Right?

I hope this helps, and I hope I've not missed something completely obvious.

Best regards,
- m.


Max number of connections

2010-06-14 Thread Lev Stesin
Hi,

How many connections does one node support? Is it a configurable property? Thanks.

-- 
Lev


Re: Max number of connections

2010-06-14 Thread Brandon Williams
On Tue, Jun 15, 2010 at 1:19 AM, Lev Stesin  wrote:

> Hi,
>
> How many connections does one node support? Is it configurable property?
> Thanks.
>

As many as a node can reasonably handle in a thread-per-connection model:
many thousands on a decent OS.
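For intuition on what bounds this, a minimal thread-per-connection accept loop looks like the sketch below (a generic illustration, not Cassandra's actual server code; the class and method names are invented): every accepted socket gets its own thread, so the ceiling is whatever thread count the OS, JVM stack sizes, and ulimits will sustain.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

public class ThreadPerConnectionServer {
    // Accept up to maxClients connections, one thread per client; the
    // effective connection limit is the sustainable thread count.
    static void serve(ServerSocket server, int maxClients) throws IOException {
        for (int i = 0; i < maxClients; i++) {
            Socket client = server.accept();
            new Thread(() -> handle(client)).start();
        }
    }

    static void handle(Socket client) {
        try (client) { // close the socket when the handler exits
            client.getOutputStream().write("ok\n".getBytes());
        } catch (IOException ignored) {
        }
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(0)) { // ephemeral port
            System.out.println("listening on " + server.getLocalPort());
            serve(server, 1); // handle a single client, then exit
        }
    }
}
```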

-Brandon