Hudson build is back to normal : Cassandra #595

2010-11-13 Thread Apache Hudson Server

Re: CASSANDRA-1472 (bitmap indexes)

2010-11-13 Thread Jonathan Ellis
Is it worth testing 0.7-branch-without-1472 to make sure of that?

On Fri, Nov 12, 2010 at 10:28 AM, Stu Hood  wrote:
> Great, thanks for the variable Dragos: I'm fairly sure I broke this in the
> refactoring I did in 1472 to fit in a second index type.
>
>
> On Fri, Nov 12, 2010 at 4:03 AM, dragos cernahoschi <
> dragos.cernahos...@gmail.com> wrote:
>
>> I confirm: the KEYS indexes have the same behavior as the KEYS_BITMAP
>> indexes: time out/succeed on the same queries.
>>
>> By the way, the insert of my data set with KEYS_BITMAP is much faster than
>> KEYS (about 5.5 times) and less gc intensive.
>>
>> Dragos
>>
>> On Tue, Nov 9, 2010 at 8:05 PM, Stu Hood  wrote:
>>
>> > Interesting, thanks for the info.
>> >
>> > Perhaps the limitation is that index queries involving multiple clauses
>> > are currently implemented using brute-force filtering rather than an
>> > index join? The bitmap indexes have native support for this type of
>> > join, but it's not being used yet.
>> >
>> > To confirm: have you tried the same scenario with KEYS indexes? They use
>> > the same codepath for multiple index expressions, and should experience
>> > the same timeouts. Also, can you rerun the KEYS_BITMAP test with DEBUG
>> > logging enabled, to ensure that we aren't going into some kind of
>> > infinite loop?
>> >
>> > Thanks for the help,
>> > Stu
>> >
>> > -----Original Message-----
>> > From: "dragos cernahoschi"
>> > Sent: Tuesday, November 9, 2010 11:50am
>> > To: dev@cassandra.apache.org
>> > Subject: Re: CASSANDRA-1472 (bitmap indexes)
>> >
>> > I'm running the query on three columns with cardinalities: 22, 17 and 10.
>> > Interestingly, when combining columns with these cardinalities:
>> >
>> > 22 + 17 => no exception
>> > 22 + 10 => no exception
>> > 10 + 17 => timed out exception
>> > 22 + 17 + 10 => timed out exception
>> >
>> >
>> > On Tue, Nov 9, 2010 at 6:29 PM, Stu Hood  wrote:
>> >
>> > > Can you tell me a little bit about your key distribution? How many
>> > > unique values are indexed (the cardinality)?
>> > >
>> > > Until the OrBiC projection I mention on 1472 is implemented, the bitmap
>> > > secondary indexes will perform terribly for high cardinality datasets.
>> > >
>> > > Thanks!
>> > >
>> > >
>> > > -----Original Message-----
>> > > From: "dragos cernahoschi"
>> > > Sent: Tuesday, November 9, 2010 10:14am
>> > > To: dev@cassandra.apache.org
>> > > Subject: Re: CASSANDRA-1472 (bitmap indexes)
>> > >
>> > > In the meantime, the number of SSTables has dropped to just 7. Initially
>> > > the compaction thread hit the same "too many open files" problem and
>> > > couldn't do any compaction.
>> > >
>> > > But I'm still not able to run my tests: TimedOutException :(
>> > >
>> > > On Tue, Nov 9, 2010 at 5:51 PM, Stu Hood wrote:
>> > >
>> > > > Hmm, 500 sstables is definitely a degenerate case: did you disable
>> > > > compaction? By default, Cassandra strives to keep the sstable count
>> > > > below ~32, since accesses to separate sstables require seeks.
>> > > >
>> > > > In this case, the query will seek 500 times to check the secondary
>> > > > index for each sstable: if it finds matches it will need to seek to
>> > > > find them in the primary index, and seek again for the data file.
>> > > >
>> > > > -----Original Message-----
>> > > > From: "dragos cernahoschi"
>> > > > Sent: Tuesday, November 9, 2010 5:33am
>> > > > To: dev@cassandra.apache.org
>> > > > Subject: Re: CASSANDRA-1472 (bitmap indexes)
>> > > >
>> > > > There are about 500 SSTables (12GB of data including index data,
>> > > > statistics...) The source data file had about 3GB/26 million rows.
>> > > >
>> > > > I only test with EQ expressions for now.
>> > > >
>> > > > Increasing the file limit resolved the problem, but now I'm getting
>> > > > TimedOutException(s) from thrift when "querying" even with slice size
>> > > > of 1. Is my machine too small (core 2 duo 2.93 2GB RAM Ubuntu 10.04)
>> > > > for such a test?
>> > > >
>> > > > I really have some interesting sets of data to test indexes with and
>> > > > I want to make a comparison between ordinary indexes and bitmap
>> > > > indexes.
>> > > >
>> > > > Thank you,
>> > > > Dragos
>> > > >
>> > > > On Mon, Nov 8, 2010 at 6:42 PM, Stu Hood wrote:
>> > > >
>> > > > > Dragos,
>> > > > >
>> > > > > How many SSTables did you have on disk, and were any of your index
>> > > > > expressions GT(E)/LT(E)?
>> > > > >
>> > > > > I expect that you are bumping into a limitation of the current
>> > > > > implementation: it opens up to 128 file-handles per SSTable in the
>> > > > > worst case for a GT/LT query (one per index bucket).
>> > > > >
>> > > > > A future version might remove that requirement, but for now, you
>> > > > > should probably bump the file handle limit on your machine to at
>> > > > > least 2^16.
>> > > > >
>> > > > > Thanks,
>> > > > > Stu
>>
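
For reference, the arithmetic behind the figures quoted above can be sketched as follows. The function names are hypothetical, and the seek formula is only a rough reading of Stu's description (one secondary-index seek per sstable, plus a primary-index seek and a data-file seek for each sstable that has matches); it is not taken from Cassandra's code.

```python
# Figures quoted in the thread.
HANDLES_PER_SSTABLE = 128  # worst case for a GT/LT query: one per index bucket
SUGGESTED_LIMIT = 2 ** 16  # file handle limit suggested in the thread


def worst_case_open_files(sstables, handles_per_sstable=HANDLES_PER_SSTABLE):
    """Upper bound on file handles one GT/LT index query could open."""
    return sstables * handles_per_sstable


def estimated_seeks(sstables, sstables_with_matches):
    """Rough seek count (an assumption, see the note above the code):
    one secondary-index seek per sstable, then a primary-index seek and
    a data-file seek for every sstable that contains matches."""
    return sstables + 2 * sstables_with_matches


if __name__ == "__main__":
    # 500 is the degenerate case reported, 32 the default target, 7 the
    # count after compaction finally ran.
    for count in (500, 32, 7):
        files = worst_case_open_files(count)
        fits = files <= SUGGESTED_LIMIT
        print(count, "sstables:", files, "handles worst case, fits:", fits)
```

With 500 sstables the worst case is 64,000 handles, just under 2^16 = 65,536, which is consistent with the suggested limit. Dragos reports using only EQ expressions, so this worst case may not be what his run actually hit.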

Re: CASSANDRA-1472 (bitmap indexes)

2010-11-13 Thread Stu Hood
> Is it worth testing 0.7-branch-without-1472 to make sure of that?
Dragos: if you have time, this would be helpful. If you already have a KEYS
index created, you shouldn't need to re-load the data, as the file format
hasn't changed.

Thanks,
Stu
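
As background for the comparison being tested, the difference between brute-force filtering and a bitmap index join (raised earlier in this thread) can be illustrated with a toy sketch. Everything here is hypothetical: the row data, the function names, and the use of Python integers as bitmaps are assumptions for illustration only and do not reflect how KEYS or KEYS_BITMAP indexes are implemented in Cassandra.

```python
from collections import defaultdict

# Hypothetical rows; positions in this list stand in for row keys.
ROWS = [
    {"color": "red", "size": "S"},
    {"color": "blue", "size": "M"},
    {"color": "red", "size": "M"},
    {"color": "red", "size": "S"},
]


def build_bitmap_index(rows, column):
    """Map each distinct value to an int whose set bits mark matching rows."""
    index = defaultdict(int)
    for pos, row in enumerate(rows):
        index[row[column]] |= 1 << pos
    return index


def bits_to_positions(bits):
    """Expand a bitmap into the sorted list of set bit positions."""
    positions, pos = [], 0
    while bits:
        if bits & 1:
            positions.append(pos)
        bits >>= 1
        pos += 1
    return positions


def query_bitmap_join(indexes, clauses):
    """Index join: AND the bitmaps of all EQ clauses; no row data is read."""
    result = None
    for column, value in clauses:
        bitmap = indexes[column].get(value, 0)
        result = bitmap if result is None else result & bitmap
    return bits_to_positions(result or 0)


def query_brute_force(rows, indexes, clauses):
    """Use the index for the first clause only, then read and check each
    candidate row against the remaining clauses."""
    (first_col, first_val), rest = clauses[0], clauses[1:]
    candidates = bits_to_positions(indexes[first_col].get(first_val, 0))
    return [p for p in candidates if all(rows[p][c] == v for c, v in rest)]


INDEXES = {col: build_bitmap_index(ROWS, col) for col in ("color", "size")}
CLAUSES = [("color", "red"), ("size", "S")]
```

Both functions return the same row positions for `CLAUSES`; the difference is that the brute-force path has to read every candidate row selected by the first clause, which is the cost the thread suspects for multi-clause queries.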


Re: CASSANDRA-1472 (bitmap indexes)

2010-11-13 Thread dragos cernahoschi
No problem. I'll do the test on Monday.
