New SecondaryIndex expression=IndexOperator.LIKE

2011-06-03 Thread Joseph Stein
Hey folks, I was contemplating having a LIKE type search on Secondary
Indexes.

LIKE_S, S_LIKE and S_LIKE_S (similar to LIKE '%eg', 'eg%' and '%eg%').

I am not sure whether this has already been discussed, whether there is an
existing JIRA, or whether it is something I could contribute myself.

It looks like the best way to do this would be to add some additional
properties to IndexExpression so we know we want to do a LIKE check.

Pseudo-coding my thoughts, taking this from ColumnFamilyStore:

int v = data.metadata().getValueValidator(expression.column_name)
            .compare(column.value(), expression.value);
if (!satisfies(v, expression.op))
    return false;

and make it something like this

if (expression.LIKE) {
    // this would actually fall into a switch for S_LIKE, LIKE_S, S_LIKE_S
    if (!column.MakeItAStringValue().equalsIgnoreCase(expression.value))
        return false;
} else {
    int v = data.metadata().getValueValidator(expression.column_name)
                .compare(column.value(), expression.value);
    if (!satisfies(v, expression.op))
        return false;
}
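
To make the switch mentioned in the pseudocode concrete, here is a minimal, standalone sketch. It is not Cassandra code: the class LikeMatchSketch, the enum LikeOp and the method matches are hypothetical names, and the mapping of each operator to a pattern shape simply follows the order given above (LIKE_S as '%eg', S_LIKE as 'eg%', S_LIKE_S as '%eg%'). It also assumes the column value has already been turned into a String, and it compares case-insensitively because the pseudocode uses equalsIgnoreCase.

```java
import java.util.Locale;

public class LikeMatchSketch {
    /** Hypothetical operators mirroring the names proposed in the message. */
    public enum LikeOp { S_LIKE, LIKE_S, S_LIKE_S }

    /** Case-insensitive prefix, suffix or substring match, depending on op. */
    public static boolean matches(String columnValue, String pattern, LikeOp op) {
        String value = columnValue.toLowerCase(Locale.ROOT);
        String needle = pattern.toLowerCase(Locale.ROOT);
        switch (op) {
            case LIKE_S:
                return value.endsWith(needle);    // like LIKE '%eg'
            case S_LIKE:
                return value.startsWith(needle);  // like LIKE 'eg%'
            case S_LIKE_S:
                return value.contains(needle);    // like LIKE '%eg%'
            default:
                throw new IllegalArgumentException("unknown op: " + op);
        }
    }

    public static void main(String[] args) {
        System.out.println(matches("Egret", "eg", LikeOp.S_LIKE));
        System.out.println(matches("nutmeg", "EG", LikeOp.LIKE_S));
        System.out.println(matches("vegan", "eg", LikeOp.S_LIKE_S));
    }
}
```

A check like this only filters rows that have already been read; it says nothing about whether the index itself could serve such a query efficiently.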

am i on the right track?  smoking crack?

let me know please

/*
Joe Stein
http://www.linkedin.com/in/charmalloc
Twitter: @allthingshadoop
*/


CASSANDRA-2495

2011-06-24 Thread Joseph Stein
https://issues.apache.org/jira/browse/CASSANDRA-2495

Is anyone working on this? If so, can I help? If not, then I would like
to contribute!

I am going to jump into the code later tonight and start thinking about it,
because at some point this is going to really bite us at Medialets.

What is the process for ideas on this topic: just comment on the JIRA ticket?
Or is it better to kick it around on this list? IRC? Just asking for best
practices.

thanks!



Re: CASSANDRA-2495

2011-06-27 Thread Joseph Stein
Hey Jake, thanks!

Is there a label like "newbie" that folks can search for to find tickets to
jump in on when first contributing and getting into the swing of things? Or
should I just look at anything not assigned?

Sylvain: I just read your update to CASSANDRA-2495 and will follow up in the
ticket with questions/thoughts. The TTL counter ticket is also interesting.
Granted, both might be a big first swallow, but I want to jump in!

On Fri, Jun 24, 2011 at 9:19 AM, Jake Luciani  wrote:

> Hi Joe,
>
> JIRA and IRC.  also see http://wiki.apache.org/cassandra/HowToContribute
>
> -Jake
>
> --
> http://twitter.com/tjake
>





No module named cql - nosetests test/system/

2011-06-27 Thread Joseph Stein
So this is probably the first of a few noob questions and I am happy to
update the Wiki or try to help contribute fixes (if it is not something
stupid I am doing of course) as I learn the answers :)

So by following the http://wiki.apache.org/cassandra/HowToContribute

I am getting this error 


======================================================================
ERROR: Failure: ImportError (No module named cql)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Library/Python/2.6/site-packages/nose-0.11.4-py2.6.egg/nose/loader.py", line 382, in loadTestsFromName
    addr.filename, addr.module)
  File "/Library/Python/2.6/site-packages/nose-0.11.4-py2.6.egg/nose/importer.py", line 39, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/Library/Python/2.6/site-packages/nose-0.11.4-py2.6.egg/nose/importer.py", line 86, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/Users/joestein/source-apache-cassandra/cassandra-trunk/test/system/test_cql.py", line 26, in <module>
    import cql
ImportError: No module named cql

I followed the steps

new-host-3:source-apache-cassandra joestein$ sudo easy_install nose
Password:
Searching for nose
Best match: nose 0.11.4
Processing nose-0.11.4-py2.6.egg
nose 0.11.4 is already the active version in easy-install.pth
Installing nosetests-2.6 script to /usr/local/bin
Installing nosetests script to /usr/local/bin

Using /Library/Python/2.6/site-packages/nose-0.11.4-py2.6.egg
Processing dependencies for nose
Finished processing dependencies for nose

new-host-3:cassandra-trunk joestein$ ant gen-thrift-py
Buildfile: /Users/joestein/source-apache-cassandra/cassandra-trunk/build.xml

gen-thrift-py:
 [echo] Generating Thrift Python code from
/Users/joestein/source-apache-cassandra/cassandra-trunk/interface/cassandra.thrift


BUILD SUCCESSFUL
Total time: 2 seconds

I have not done anything with CQL yet, but I will try to figure out later
what I am doing wrong. If someone already knows, that would be great.

Thanks!!!



Re: No module named cql - nosetests test/system/

2011-06-27 Thread Joseph Stein
Thanks Jonathan, that worked great. I also had to build the source, so I
added both of these steps to the Wiki (hopefully that is OK; I was not sure
of the etiquette).

On Mon, Jun 27, 2011 at 3:47 PM, Jonathan Ellis  wrote:

> install cql first.  check out from
> https://svn.apache.org/repos/asf/cassandra/drivers; cd drivers/py;
> python setup.py build; sudo python setup.py install
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>





CounterColumn as a double

2011-06-27 Thread Joseph Stein
So has anyone considered using the CounterColumn for summing?

I wanted to do this over the weekend until I realized the value is only a
long :( so using it for things like duration (for me, keeping track of
aggregate durations of ad impressions would have been great) or total costs
when processing business workflows is not possible.

I thought this might be a little more the speed of a first contribution too
:) and also helps out with more functionality since a lot of real time
analytics will need double.

Let me know, I think it is a good feature.

As for implementing it, I am not sure we would want to break the thrift
interface, so I would suggest creating another interface for the double
value.

Under the hood of the thrift interface I was thinking of creating a
CounterValue class and then setting the lValue or the dValue depending on
which thrift function was called. I can update the thrift, add a sister
function and re-work the entire code path from long CounterColumn.value to
CounterValue CounterColumn.value.
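
A rough sketch of what such a CounterValue holder might look like follows. Only the names CounterValue, lValue and dValue come from the message above; the factory methods, the accessors and the rule that long and double counters cannot be mixed are assumptions made for illustration, not an existing Cassandra class.

```java
public final class CounterValue {
    private final long lValue;      // used when the long thrift call was made
    private final double dValue;    // used when the double thrift call was made
    private final boolean isDouble; // records which of the two is meaningful

    private CounterValue(long lValue, double dValue, boolean isDouble) {
        this.lValue = lValue;
        this.dValue = dValue;
        this.isDouble = isDouble;
    }

    public static CounterValue ofLong(long v) {
        return new CounterValue(v, 0.0, false);
    }

    public static CounterValue ofDouble(double v) {
        return new CounterValue(0L, v, true);
    }

    public boolean isDouble() { return isDouble; }
    public long longValue() { return lValue; }
    public double doubleValue() { return dValue; }

    /** Adds a delta of the same kind; mixing kinds is rejected (an assumed rule). */
    public CounterValue add(CounterValue delta) {
        if (isDouble != delta.isDouble) {
            throw new IllegalArgumentException("cannot mix long and double counters");
        }
        return isDouble ? ofDouble(dValue + delta.dValue)
                        : ofLong(lValue + delta.lValue);
    }

    public static void main(String[] args) {
        CounterValue total = CounterValue.ofDouble(1.23).add(CounterValue.ofDouble(1.23));
        System.out.println(total.doubleValue());
    }
}
```

Keeping the two representations behind one type is what would let the existing long code path stay untouched while a sister thrift function feeds the double path.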



Re: CounterColumn as a double

2011-06-27 Thread Joseph Stein
I will give that a shot, seems that it will work fantastically, thanks!

I will keep trolling JIRA then for something I feel I can get my feet wet
with and contribute then.

On Mon, Jun 27, 2011 at 7:33 PM, Jason Fager  wrote:

> Longs and Doubles are both 64-bit values and are pretty easily
> convertible.  Check out Double.doubleToLongBits and
> Double.longBitsToDouble in the JDK; you can also read more about the
> details of the conversion and get some pointers to some code in a post
> I wrote last year:
> http://jasonfager.com/770-lexi-sortable-number-strings/  (the emphasis
> is on using doubles in key strings, but it should cover what you
> need).
>





Re: CounterColumn as a double

2011-06-27 Thread Joseph Stein
Hmmm, well Jason, it is not as accurate as I would have thought at first, and
the increments on the long are whacked (which, now that I think about it more,
makes sense, since a +1 on the bits of the double read as a long would not
necessarily represent a +1 on the double).

So I am setting the increment to be

1.23

which comes back as 1.3213044541238036E308

and then another increment of 1.23 comes back as 10.3599987

so the increment (which just adds the value to the long) does not shift the
double by the right amount.

Even if under the hood I keep the thrift interface as a long, the 1.23 vs
1.321 is a big difference when you have billions of them.
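
The effect can be reproduced outside Cassandra with a few lines of plain Java. The sketch below is only an illustration: the class DoubleBitsDemo and the method addViaLongBits are made-up names, and it models the counter as nothing more than a long addition of the raw bit patterns. It shows that a round trip through Double.doubleToLongBits and Double.longBitsToDouble is lossless, while adding two bit patterns as longs is not the same operation as adding the two doubles.

```java
public class DoubleBitsDemo {
    /** Sums the raw IEEE 754 bit patterns as longs, then reinterprets the sum as a double. */
    public static double addViaLongBits(double a, double b) {
        long sum = Double.doubleToLongBits(a) + Double.doubleToLongBits(b);
        return Double.longBitsToDouble(sum);
    }

    /** Converts to bits and straight back, with no arithmetic in between. */
    public static double roundTrip(double a) {
        return Double.longBitsToDouble(Double.doubleToLongBits(a));
    }

    public static void main(String[] args) {
        double increment = 1.23;
        System.out.println("round trip:      " + roundTrip(increment));
        System.out.println("arithmetic sum:  " + (increment + increment));
        System.out.println("bit-pattern sum: " + addViaLongBits(increment, increment));
    }
}
```

The bit-pattern sum adds the exponent fields together as well as the mantissas, which is why the result lands many orders of magnitude away from the arithmetic sum.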

So I think it goes back to my original idea/proposition. Does anyone have an
issue with it? Any +1? Should I create a JIRA and get to it?



thrift generated java changes

2011-07-02 Thread Joseph Stein
So I am working on https://issues.apache.org/jira/browse/CASSANDRA-2833

And when I generate the Java code from cassandra.thrift I am getting weird
results and differences from what is in source.

Should I be modifying the CounterColumn.java by hand?

I am using thrift 0.5.0 and doing

thrift -gen java cassandra.thrift from the command line

some of the issues (as an example)

-tmpMap.put(_Fields.VALUE, new org.apache.thrift.meta_data.FieldMetaData("value", org.apache.thrift.TFieldRequirementType.REQUIRED,
-new org.apache.thrift.meta_data.FieldValueMetaData(org.apache.thrift.protocol.TType.I64)));
+Map<_Fields, FieldMetaData> tmpMap = new EnumMap<_Fields, FieldMetaData>(_Fields.class);
+tmpMap.put(_Fields.NAME, new FieldMetaData("name", TFieldRequirementType.REQUIRED,

   public CounterColumn setName(byte[] name) {
-setName(name == null ? (ByteBuffer)null : ByteBuffer.wrap(name));
+setName(ByteBuffer.wrap(name));
     return this;
   }

-  /** Returns true if field name is set (has been assigned a value) and false otherwise */
+  /** Returns true if field name is set (has been asigned a value) and false otherwise */

This last one makes me suspect I am using the wrong thrift version, and
maybe not the right commands.
http://wiki.apache.org/cassandra/InstallThrift led me to what I did, but if
there is something different or wrong with what I am doing, please let me
know and I can update the wiki and get back on track.
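
Of the differences listed above, the setName change is the one that alters behaviour rather than just formatting or spelling: ByteBuffer.wrap throws a NullPointerException when handed a null array, so the generated variant without the null check rejects a null name that the checked-in variant accepts. The standalone sketch below shows the two variants side by side; the class SetNameSketch and its method names are hypothetical and are not the generated CounterColumn code.

```java
import java.nio.ByteBuffer;

public class SetNameSketch {
    private ByteBuffer name;

    /** Null-safe variant, corresponding to the line marked '-' in the diff. */
    public SetNameSketch setNameNullSafe(byte[] bytes) {
        this.name = (bytes == null) ? null : ByteBuffer.wrap(bytes);
        return this;
    }

    /** Unchecked variant, corresponding to the line marked '+'; fails on null input. */
    public SetNameSketch setNameUnchecked(byte[] bytes) {
        this.name = ByteBuffer.wrap(bytes); // NullPointerException if bytes is null
        return this;
    }

    public ByteBuffer getName() {
        return name;
    }

    public static void main(String[] args) {
        System.out.println(new SetNameSketch().setNameNullSafe(null).getName());
        try {
            new SetNameSketch().setNameUnchecked(null);
        } catch (NullPointerException e) {
            System.out.println("unchecked variant threw NullPointerException");
        }
    }
}
```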

and all I did to the cassandra.thrift was:

-2: required i64 value
+   2: optional i64 value,
+   3: optional double operand


Thanks!



Re: thrift generated java changes

2011-07-02 Thread Joseph Stein
Using 0.6.1 it went better, almost perfect.

The diff still removes this code, though, and I am not sure why:

   @Override
   public int hashCode() {
-HashCodeBuilder builder = new HashCodeBuilder();
-
-boolean present_name = true && (isSetName());
-builder.append(present_name);
-if (present_name)
-  builder.append(name);
-
-boolean present_value = true;
-builder.append(present_value);
-if (present_value)
-  builder.append(value);
-
-return builder.toHashCode();
+return 0;
   }

For now I have manually put this function back, since it is the only thing
(well, the license header too, of course) that the generation changed from
what is in source, besides my own required change.
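
For context on why the removed body matters: a hashCode that always returns 0 still satisfies the equals/hashCode contract, but every instance then lands in the same bucket of a HashMap or HashSet. The sketch below is a standalone illustration with hypothetical names (CounterColumnHashSketch is not the generated class); it uses java.util.Objects.hash in place of the HashCodeBuilder seen in the diff, which is a deliberate substitution to keep the example dependency-free. Whether the generator can be told to emit the builder-based version itself (for example through a generator option) is an unverified guess that the thread does not confirm.

```java
import java.nio.ByteBuffer;
import java.util.Objects;

public class CounterColumnHashSketch {
    private final ByteBuffer name;
    private final long value;

    public CounterColumnHashSketch(ByteBuffer name, long value) {
        this.name = name;
        this.value = value;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof CounterColumnHashSketch)) return false;
        CounterColumnHashSketch that = (CounterColumnHashSketch) other;
        return value == that.value && Objects.equals(name, that.name);
    }

    /** Combines both fields, so equal objects agree and different values usually differ. */
    @Override
    public int hashCode() {
        return Objects.hash(name, value);
    }

    public static void main(String[] args) {
        ByteBuffer n = ByteBuffer.wrap(new byte[] {1, 2, 3});
        System.out.println(new CounterColumnHashSketch(n, 1L).hashCode()
                == new CounterColumnHashSketch(n.duplicate(), 1L).hashCode());
    }
}
```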


On Sat, Jul 2, 2011 at 12:26 PM, Jake Luciani  wrote:

> 0.8 uses thrift 0.6
>





can't reproduce duplicate SELECT rows - CASSANDRA-2717

2011-07-05 Thread Joseph Stein
I tried to reproduce https://issues.apache.org/jira/browse/CASSANDRA-2717
but was not able to :( Maybe it was already fixed as a side effect of another
change? =8^|

It was my first time ever using CQL (it took a few minutes to figure out that
downloading the drivers is the way to get cqlsh>), so maybe I am just doing
something wrong.

cqlsh> CREATE KEYSPACE test with strategy_class = 'SimpleStrategy' and strategy_options:replication_factor=1;
cqlsh> use test;
cqlsh> CREATE COLUMNFAMILY users (
   ... key varchar PRIMARY KEY,
   ... full_name varchar,
   ... birth_date int,
   ... state varchar
   ... );
cqlsh> CREATE INDEX ON users (birth_date);
cqlsh> CREATE INDEX ON users (state);
cqlsh> INSERT INTO users (key, full_name, birth_date, state) VALUES ('bsanderson', 'Brandon Sanderson', 1975, 'UT');
cqlsh> INSERT INTO users (key, full_name, birth_date, state) VALUES ('prothfuss', 'Patrick Rothfuss', 1973, 'WI');
cqlsh> INSERT INTO users (key, full_name, birth_date, state) VALUES ('htayler', 'Howard Tayler', 1968, 'UT');
cqlsh> SELECT key, state FROM users;
        key | state |
 bsanderson |    UT |
  prothfuss |    WI |
    htayler |    UT |

cqlsh> SELECT key, state FROM users where key = 'htayler';
     key | state |
 htayler |    UT |

cqlsh> SELECT key, state FROM users where key = 'htayler' and key = 'htayler';
Bad Request: SELECTs can contain only by by-key clause

> Not sure why this errors; I was expecting 2 results, as the ticket said.

cqlsh> SELECT key, state FROM users where state = 'UT';
        key | state |
 bsanderson |    UT |
    htayler |    UT |

cqlsh> SELECT key, state FROM users where state = 'UT' and state = 'UT';
        key | state |
 bsanderson |    UT |
    htayler |    UT |

> I kind of expected this to give me 2 rows back as well.

cqlsh> SELECT key, state FROM users where state = 'UT' and state ='WI';
cqlsh>

> No rows, as expected.
