get_slice needs offset

2011-07-06 Thread Boris Yen
Hi,

It seems an offset for get_slice was implemented before but was removed
due to CASSANDRA-286. I was wondering whether it could be put back, because
computing the offset on the client side should be less efficient than doing
it on the server side.

I think this could be achieved by tweaking both
CassandraServer.multigetSliceInternal and CassandraServer.getSlice. Let us
assume there is one extra attribute called "offset" inside the SliceRange
object. When the "offset" attribute is set in the SliceRange, the read
command should be created like "new SliceFromReadCommand(keyspace, key,
column_parent, range.start, range.finish, range.reversed,
range.count+range.offset)". Then the only thing left to do is to make
CassandraServer.getSlice aware of the "offset", so that it skips the first
"offset" columns and returns the right fragment of data.
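
A minimal sketch of the idea above, written in Python rather than
Cassandra's Java, with a sorted in-memory list standing in for one row. The
function name get_slice_with_offset and its parameters are hypothetical and
are not part of Cassandra's API; the sketch only illustrates "read
count+offset columns, then trim by offset":

```python
from bisect import bisect_left


def get_slice_with_offset(columns, start, count, offset=0):
    """Emulate the proposed read: fetch count + offset columns, then trim.

    `columns` is a sorted list of column names standing in for one row.
    This is a hypothetical helper, not Cassandra code.
    """
    # Position of the first column >= start ("" means start of the row).
    begin = bisect_left(columns, start) if start else 0
    # The widened read, mirroring range.count + range.offset.
    fetched = columns[begin:begin + count + offset]
    # Drop the first `offset` columns before returning the result.
    return fetched[offset:]


row = ["c%02d" % i for i in range(20)]
page = get_slice_with_offset(row, "", count=5, offset=10)
print(page)
```

Note that in this sketch the read still touches all offset + count columns;
only the amount returned to the caller shrinks.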

Boris


Re: get_slice needs offset

2011-07-06 Thread Jonathan Ellis
It was removed because the "right" way to do this is to page by
last-column-seen, which can take advantage of the per-row column
index.  Offset cannot.
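
A rough sketch of paging by last-column-seen, in Python with a sorted list
standing in for one row. The names next_page and iter_pages are made up for
illustration, and bisect is only an analogy for the per-row column index
(it jumps to a position without scanning the columns before it); none of
this is Cassandra's actual code:

```python
from bisect import bisect_right


def next_page(columns, last_seen, count):
    """Return up to `count` columns that sort strictly after `last_seen`."""
    # bisect plays the role of the column index: it locates the position
    # directly instead of walking over the columns that precede it.
    begin = bisect_right(columns, last_seen) if last_seen is not None else 0
    return columns[begin:begin + count]


def iter_pages(columns, count):
    """Yield consecutive pages, remembering only the last column seen."""
    last_seen = None
    while True:
        page = next_page(columns, last_seen, count)
        if not page:
            return
        yield page
        last_seen = page[-1]


row = ["c%02d" % i for i in range(7)]
pages = list(iter_pages(row, 3))
print(pages)
```

Each request costs roughly the same regardless of how deep into the row the
page is, which is the property an offset-based read lacks.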




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Tickets for someone looking to get his or her feet wet in Cassandra internals

2011-07-06 Thread David Boxenhorn
You should include the first link in the second, maybe as a bullet in step
4, e.g.:

- If you're looking for a ticket to work on, check out our low-hanging-fruit
list:
https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=project+%3D+12310865+AND+labels+%3D+lhf


On Tue, Jul 5, 2011 at 6:31 PM, Jonathan Ellis wrote:

> I regularly get requests for pointers to "starter" tickets.  I suggest
> we tag such tickets as LHF (for low-hanging fruit).  I've tagged four
> such to get started:
>
> https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=project+%3D+12310865+AND+labels+%3D+lhf
>
> I'll try to keep this up to date.
>
> Aspiring contributors should also see
> http://wiki.apache.org/cassandra/HowToContribute, and feel free to
> drop by #cassandra-dev on freenode.
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>


Re: Tickets for someone looking to get his or her feet wet in Cassandra internals

2011-07-06 Thread Jonathan Ellis
Good idea.



Re: Reorganizing drivers

2011-07-06 Thread Jonathan Ellis
I don't think this is working out as well as hoped.

- the git mirror won't pick up anything under drivers/
- building the Java drivers is fragile and complicated, and there's a
lot of duplication with the "main" ant build
- patches that affect both Cassandra and JDBC are cumbersome since
they have to be committed separately (e.g.
https://issues.apache.org/jira/browse/CASSANDRA-2857)

I'm inclined to think we should move it back to trunk (but not have
multiple versions for 0.8 branch).  We can still tag/branch separately
from there.

On Tue, Jun 7, 2011 at 3:08 AM, Eric Evans wrote:
>
> Sylvain and I have been discussing release issues while here at
> buzzwords, and some of the issues are related to drivers.  Not
> surprising since that's a new concept for us, and there wasn't much
> thought given to the current organization.
>
> Because the CQL drivers are independently versioned and capable of
> releasing on their own timelines, the current location in SVN is
> suboptimal.  There are a number of reasons why, not least of which is
> that it sets the expectation that the correct version of a driver is
> whatever corresponds to the release version of Cassandra.
>
> So, we'd like to move the drivers sub-directory up one level, making it
> look something like the following:
>
> |- branches
> |- tags
> |- site
> |- drivers
> |  |- java
> |  |- py
> |  |- txpy
> |- trunk
>
> There are a few additional implied changes here as well, for example the
> JDBC driver will need its own build, and Cassandra's will need some
> minor changes as well (JDBC driver tests, release artifacts, etc).
>
> Does anyone object to this?
>
>
> --
> Eric Evans
> eev...@rackspace.com
>
>





RPMs made easy

2011-07-06 Thread Niels Basjes
Hi all,

I'm new to the realm of Cassandra and I'm just starting to get to know
the tool you've been working on.
I wanted to try it out on my laptop (CentOS 5.6) and found that
getting an RPM for installation was "hard".
Also building one from source was not quite as easy as I would like it to be.

So I fixed that ... and provided a patch via Jira
https://issues.apache.org/jira/browse/CASSANDRA-2861

With this patch in place you can simply do "ant rpm" from a fresh
checkout (assuming you have rpmbuild installed) and you get an RPM and
SRPM.

I'm not quite sure what your procedure is for integrating "unsolicited &
unexpected patches" like this, so I'm sending this message to the dev
list.

Would you like to integrate this? If so, do I have to do anything in
addition to what I've done so far?

-- 
Best regards / Met vriendelijke groeten,

Niels Basjes


Re: Tickets for someone looking to get his or her feet wet in Cassandra internals

2011-07-06 Thread Savino Sguera
Excellent idea indeed, I have been looking for such tickets as well :)

I'll take the chance to introduce myself to the list, by the way: I'm
currently getting my toes wet in Cassandra, developing (mostly) in Python,
and willing to contribute. I'm not there yet, though; I will have a look at
the LHF tickets for sure.

Savino



Re: get_slice needs offset

2011-07-06 Thread Boris Yen
I suppose you mean using the last column seen as the "start" of the
SliceRange for the next query, so that the server can use the per-row
column index to jump to the right position in the file and retrieve the
column data.

Based on that, suppose an application or Cassandra client wants to let its
users jump to arbitrary pages of data. It needs to send two requests to
Cassandra to accomplish this: one to find the "start" of the SliceRange,
and another that uses the prepared SliceRange to retrieve the data needed.
From the client's perspective, this takes more resources, because the
first request actually needs to get all the columns >= "start" back. From
Cassandra's perspective, it has to handle two requests instead of one, and
depending on the consistency level there might be more requests going on
between cluster nodes.
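
To make the cost argument above concrete, here is a small Python sketch of
the two-request approach, again with a sorted list standing in for one row.
The helper names and the "transferred" counter are assumptions made for
illustration; they are not Cassandra API calls or real measurements:

```python
from bisect import bisect_left


def get_slice(columns, start, count):
    """Stand-in for get_slice: up to `count` columns >= start."""
    begin = bisect_left(columns, start) if start else 0
    return columns[begin:begin + count]


def fetch_page_two_requests(columns, page_number, page_size):
    """Fetch an arbitrary zero-based page using two get_slice calls.

    Returns the page and the total number of columns sent to the client.
    """
    skip = page_number * page_size
    # Request 1: read skip + 1 columns just to learn the page's first name.
    probe = get_slice(columns, "", skip + 1)
    if len(probe) <= skip:
        return [], len(probe)  # the requested page is past the end
    start = probe[-1]
    # Request 2: read the page itself, beginning at that column.
    page = get_slice(columns, start, page_size)
    return page, len(probe) + len(page)


row = ["c%02d" % i for i in range(20)]
page, transferred = fetch_page_two_requests(row, 2, 5)
print(page, transferred)
```

In this toy model the client receives 16 columns to display a page of 5,
which is the overhead the message describes.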

What I am actually proposing is to use the "offset" to widen the range of
data fetched by a get_slice query. The internal mechanism of the get_slice
query remains the same, and "start" and "offset" can coexist. The "offset"
enlarges the "count" of the SliceRange, so that the effective "count"
becomes "count+offset", and it is also used to trim the data returned by
the internal mechanism before the result is sent back.

Regards
Boris
