Re: CASSANDRA-1472 (bitmap indexes)

2010-11-12 Thread dragos cernahoschi
I confirm: the KEYS indexes have the same behavior as the KEYS_BITMAP
indexes: time out/succeed on the same queries.

By the way, inserting my data set with KEYS_BITMAP is much faster than with
KEYS (about 5.5 times) and less GC-intensive.

Dragos
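
To illustrate the hypothesis quoted below (multi-clause index queries that filter candidates by brute force instead of joining indexes), here is a minimal Python sketch. It is not Cassandra code: the toy rows, the column names, and the functions build_bitmap_index, brute_force and bitmap_join are all hypothetical, and each bitmap is modelled as a plain Python integer.

```python
# Hypothetical toy data: row key -> {column: value}. Not taken from the thread.
ROWS = {
    0: {"state": "UT", "year": 1999},
    1: {"state": "TX", "year": 1999},
    2: {"state": "UT", "year": 2000},
    3: {"state": "UT", "year": 1999},
}


def build_bitmap_index(rows, column):
    """Map each distinct value to an int whose bit i is set when row i has it."""
    index = {}
    for key, row in rows.items():
        index[row[column]] = index.get(row[column], 0) | (1 << key)
    return index


def brute_force(rows, index, first, rest):
    """Use an index for the first clause only, then read and filter each candidate row."""
    column, value = first
    bitmap = index[column].get(value, 0)
    candidates = [k for k in rows if (bitmap >> k) & 1]
    return [k for k in candidates if all(rows[k][c] == v for c, v in rest)]


def bitmap_join(index, clauses):
    """AND the per-clause bitmaps; rows are only touched once the result is known."""
    result = -1  # all bits set, so the first AND yields the first bitmap
    for column, value in clauses:
        result &= index[column].get(value, 0)
    return [k for k in range(result.bit_length()) if (result >> k) & 1]


INDEX = {c: build_bitmap_index(ROWS, c) for c in ("state", "year")}
CLAUSES = [("state", "UT"), ("year", 1999)]

print(brute_force(ROWS, INDEX, CLAUSES[0], CLAUSES[1:]))  # [0, 3]
print(bitmap_join(INDEX, CLAUSES))                        # [0, 3]
```

Both functions return the same rows. The difference is that the brute-force path reads every row matching the first clause, so its cost depends on how selective that first clause is, while the join path only combines bitmaps.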

On Tue, Nov 9, 2010 at 8:05 PM, Stu Hood wrote:

> Interesting, thanks for the info.
>
> Perhaps the limitation is that index queries involving multiple clauses are
> currently implemented using brute-force filtering rather than an index join?
> The bitmap indexes have native support for this type of join, but it's not
> being used yet.
>
> To confirm: have you tried the same scenario with KEYS indexes? They use
> the same codepath for multiple index expressions, and should experience the
> same timeouts. Also, can you rerun the KEYS_BITMAP test with DEBUG logging
> enabled, to ensure that we aren't going into some kind of infinite loop?
>
> Thanks for the help,
> Stu
>
> -----Original Message-----
> From: "dragos cernahoschi" 
> Sent: Tuesday, November 9, 2010 11:50am
> To: dev@cassandra.apache.org
> Subject: Re: CASSANDRA-1472 (bitmap indexes)
>
> I'm running the query on three columns with cardinalities 22, 17 and 10.
> Interestingly, when combining columns with these cardinalities:
>
> 22 + 17 => no exception
> 22 + 10 => no exception
> 10 + 17 => timed out exception
> 22 + 17 + 10 => timed out exception
>
>
> On Tue, Nov 9, 2010 at 6:29 PM, Stu Hood wrote:
>
> > Can you tell me a little bit about your key distribution? How many unique
> > values are indexed (the cardinality)?
> >
> > Until the OrBiC projection I mention on 1472 is implemented, the bitmap
> > secondary indexes will perform terribly for high cardinality datasets.
> >
> > Thanks!
> >
> >
> > -----Original Message-----
> > From: "dragos cernahoschi" 
> > Sent: Tuesday, November 9, 2010 10:14am
> > To: dev@cassandra.apache.org
> > Subject: Re: CASSANDRA-1472 (bitmap indexes)
> >
> > Meanwhile, the number of SSTables has dropped to just 7. Initially the
> > compaction thread suffered from the same "too many open files" problem and
> > couldn't do any compaction.
> >
> > But I'm still not able to run my tests: TimedOutException :(
> >
> > On Tue, Nov 9, 2010 at 5:51 PM, Stu Hood wrote:
> >
> > > Hmm, 500 sstables is definitely a degenerate case: did you disable
> > > compaction? By default, Cassandra strives to keep the sstable count below
> > > ~32, since accesses to separate sstables require seeks.
> > >
> > > In this case, the query will seek 500 times to check the secondary index
> > > for each sstable: if it finds matches it will need to seek to find them in
> > > the primary index, and seek again for the data file.
> > >
> > > -----Original Message-----
> > > From: "dragos cernahoschi" 
> > > Sent: Tuesday, November 9, 2010 5:33am
> > > To: dev@cassandra.apache.org
> > > Subject: Re: CASSANDRA-1472 (bitmap indexes)
> > >
> > > There are about 500 SSTables (12GB of data including index data,
> > > statistics...). The source data file had about 3GB/26 million rows.
> > >
> > > I am only testing with EQ expressions for now.
> > >
> > > Increasing the file limit resolved the problem, but now I'm getting
> > > TimedOutException(s) from thrift when "querying", even with a slice size
> > > of 1. Is my machine too small (core 2 duo 2.93 2GB RAM Ubuntu 10.04) for
> > > such a test?
> > >
> > > I really have some interesting sets of data to test indexes with, and I
> > > want to make a comparison between ordinary indexes and bitmap indexes.
> > >
> > > Thank you,
> > > Dragos
> > >
> > > On Mon, Nov 8, 2010 at 6:42 PM, Stu Hood wrote:
> > >
> > > > Dragos,
> > > >
> > > > How many SSTables did you have on disk, and were any of your index
> > > > expressions GT(E)/LT(E)?
> > > >
> > > > I expect that you are bumping into a limitation of the current
> > > > implementation: it opens up to 128 file-handles per SSTable in the worst
> > > > case for a GT/LT query (one per index bucket).
> > > >
> > > > A future version might remove that requirement, but for now, you should
> > > > probably bump the file handle limit on your machine to at least 2^16.
> > > >
> > > > Thanks,
> > > > Stu
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: "dragos cernahoschi" 
> > > > Sent: Monday, November 8, 2010 10:05am
> > > > To: dev@cassandra.apache.org
> > > > Subject: CASSANDRA-1472 (bitmap indexes)
> > > >
> > > > Hi,
> > > >
> > > > I've got an exception during the following test:
> > > >
> > > > test machine: core 2 duo 2.93 2GB RAM Ubuntu 10.04
> > > >
> > > > test scenario:
> > > > - 1 column family
> > > > - about 15 columns
> > > > - 7 indexed columns (bitmap)
> > > > - 26 million rows (insert operation went fine)
> > > > - thrift "query" on 3 of the indexed columns with get_indexed_slices
> > > > (count: 100)
> > > > - got the following exception:
> > > >
> > > > 10/11/08 17:52:40 ERROR service.AbstractCassandraDaemon: Fatal exception in
> > > > thread Thread[ReadStage:3,5,ma
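
As a rough check of the file-handle figures discussed earlier in the thread (up to 128 handles per SSTable for a GT/LT query, about 500 SSTables, a suggested limit of 2^16), the arithmetic can be sketched in Python. The function name and the 1024 default limit are assumptions made for illustration; the real limit on a given machine is whatever `ulimit -n` reports.

```python
def worst_case_handles(sstables: int, per_sstable: int = 128) -> int:
    """Upper bound on index file handles for one GT/LT query (one per index bucket)."""
    return sstables * per_sstable


SUGGESTED_LIMIT = 2 ** 16      # 65536, the limit suggested in the thread
ASSUMED_DEFAULT_LIMIT = 1024   # assumption: a common Linux default, not from the thread

before = worst_case_handles(500)  # roughly 500 SSTables before compaction
after = worst_case_handles(7)     # 7 SSTables after compaction

print(before, before <= SUGGESTED_LIMIT)      # 64000 True
print(after, after <= ASSUMED_DEFAULT_LIMIT)  # 896 True
```

This counts index handles only and ignores sockets, data files and other descriptors, so it is a lower bound on what the process needs in the worst case. The test in the thread used only EQ expressions, so the 128-per-SSTable figure is an upper bound rather than what the query necessarily opened.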

Build failed in Hudson: Cassandra #594

2010-11-12 Thread Apache Hudson Server
See 

Changes:

[jbellis] merge from 0.7

[eevans] remove (unused) jvm opts from cassandra.in.sh

Patch by eevans for CASSANDRA-1726

--
[...truncated 1652 lines...]
[junit] Cobertura: Saved information on 979 classes.
[junit] Testsuite: org.apache.cassandra.io.BloomFilterTrackerTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.367 sec
[junit] 
[junit] Cobertura: Loaded information on 979 classes.
[junit] Cobertura: Saved information on 979 classes.
[junit] Testsuite: org.apache.cassandra.io.LazilyCompactedRowTest
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 3.078 sec
[junit] 
[junit] Cobertura: Loaded information on 979 classes.
[junit] Cobertura: Saved information on 979 classes.
[junit] Testsuite: org.apache.cassandra.io.sstable.LegacySSTableTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 3.71 sec
[junit] 
[junit] ------------- Standard Error -----------------
[junit]  WARN 15:24:35,433 Invalid file '.svn' in data directory 

[junit] ------------- ---------------- ---------------
[junit] Cobertura: Loaded information on 979 classes.
[junit] Cobertura: Saved information on 979 classes.
[junit] Testsuite: org.apache.cassandra.io.sstable.SSTableReaderTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 1.286 sec
[junit] 
[junit] Cobertura: Loaded information on 979 classes.
[junit] Cobertura: Saved information on 979 classes.
[junit] Testsuite: org.apache.cassandra.io.sstable.SSTableTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 6.826 sec
[junit] 
[junit] Cobertura: Loaded information on 979 classes.
[junit] Cobertura: Saved information on 979 classes.
[junit] Testsuite: org.apache.cassandra.io.sstable.SSTableWriterTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.781 sec
[junit] 
[junit] Cobertura: Loaded information on 979 classes.
[junit] Cobertura: Saved information on 979 classes.
[junit] Testsuite: org.apache.cassandra.io.util.BufferedRandomAccessFileTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 0.107 sec
[junit] 
[junit] Cobertura: Loaded information on 979 classes.
[junit] Cobertura: Saved information on 979 classes.
[junit] Testsuite: org.apache.cassandra.locator.DynamicEndpointSnitchTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 1.217 sec
[junit] 
[junit] Cobertura: Loaded information on 979 classes.
[junit] Cobertura: Saved information on 979 classes.
[junit] Testsuite: org.apache.cassandra.locator.NetworkTopologyStrategyTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.489 sec
[junit] 
[junit] Cobertura: Loaded information on 979 classes.
[junit] Cobertura: Saved information on 979 classes.
[junit] Testsuite: org.apache.cassandra.locator.OldNetworkTopologyStrategyTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.421 sec
[junit] 
[junit] Cobertura: Loaded information on 979 classes.
[junit] Cobertura: Saved information on 979 classes.
[junit] Testsuite: org.apache.cassandra.locator.ReplicationStrategyEndpointCacheTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.737 sec
[junit] 
[junit] Cobertura: Loaded information on 979 classes.
[junit] Cobertura: Saved information on 979 classes.
[junit] Testsuite: org.apache.cassandra.locator.SimpleStrategyTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 0.768 sec
[junit] 
[junit] Cobertura: Loaded information on 979 classes.
[junit] Cobertura: Saved information on 979 classes.
[junit] Testsuite: org.apache.cassandra.locator.TokenMetadataTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.495 sec
[junit] 
[junit] Cobertura: Loaded information on 979 classes.
[junit] Cobertura: Saved information on 979 classes.
[junit] Testsuite: org.apache.cassandra.service.AntiEntropyServiceTest
[junit] Exception in thread "Thread-5" java.lang.RuntimeException: java.lang.NullPointerException
[junit] at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
[junit] at org.apache.cassandra.streaming.StreamInSession.closeIfFinished(StreamInSession.java:162)
[junit] at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:66)
[junit] at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:75)
[junit] Caused by: java.lang.NullPointerException
[junit] at org.apache.cassandra.service.AntiEntropyService.completedRequest(AntiEntropyService.java:133)
[junit] at org.ap

Re: CASSANDRA-1472 (bitmap indexes)

2010-11-12 Thread Stu Hood
Great, thanks for the variable, Dragos: I'm fairly sure I broke this in the
refactoring I did in 1472 to fit in a second index type.


On Fri, Nov 12, 2010 at 4:03 AM, dragos cernahoschi <
dragos.cernahos...@gmail.com> wrote:

> I confirm: the KEYS indexes have the same behavior as the KEYS_BITMAP
> indexes: time out/succeed on the same queries.
>
> By the way, inserting my data set with KEYS_BITMAP is much faster than with
> KEYS (about 5.5 times) and less GC-intensive.
>
> Dragos

[VOTE RESULTS] was: [VOTE] 0.6.8 RC2

2010-11-12 Thread Eric Evans
On Thu, 2010-11-11 at 17:39 -0600, Eric Evans wrote:
> SVN:
> https://svn.apache.org/repos/asf/cassandra/branches/cassandra-...@r1034172
> 0.6.8 artifacts: http://people.apache.org/~eevans
> 
> In the first go 'round, most people seemed happy to shorten this vote to
> 24 hours, so unless there are any objections, 24 hours it is.

24 hours later, 5 binding +1s and no -1s.  It is so.  I'll get the
artifacts up.

Thanks.

-- 
Eric Evans
eev...@rackspace.com