Can you tell me a little bit about your key distribution? How many unique 
values are indexed (the cardinality)?

Until the OrBiC projection I mention on CASSANDRA-1472 is implemented, the 
bitmap secondary indexes will perform terribly for high-cardinality datasets.
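
If it's easier to measure than to estimate: below is a rough, untested sketch
for counting distinct values in one column of your source file. The delimiter
and the column position are placeholders; adjust both to your data.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.util.HashSet;
    import java.util.Set;

    // Usage: java CardinalityCheck <path-to-source-file> <0-based-column-index>
    public class CardinalityCheck
    {
        public static void main(String[] args) throws Exception
        {
            String path = args[0];
            int column = Integer.parseInt(args[1]);

            // For 26M rows a HashSet of strings can be tight in 2GB of heap;
            // sampling every Nth line is good enough for a cardinality estimate.
            Set<String> distinct = new HashSet<String>();
            long rows = 0;
            BufferedReader reader = new BufferedReader(new FileReader(path));
            String line;
            while ((line = reader.readLine()) != null)
            {
                rows++;
                String[] fields = line.split("\t"); // assumes tab-delimited input
                if (column < fields.length)
                    distinct.add(fields[column]);
            }
            reader.close();
            System.out.println(rows + " rows, " + distinct.size() + " distinct values");
        }
    }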

Thanks!


-----Original Message-----
From: "dragos cernahoschi" <dragos.cernahos...@gmail.com>
Sent: Tuesday, November 9, 2010 10:14am
To: dev@cassandra.apache.org
Subject: Re: CASSANDRA-1472 (bitmap indexes)

In the meantime, the number of SSTables has dropped to just 7. Initially the
compaction thread hit the same "too many open files" problem and couldn't do
any compaction.

But I'm still not able to run my tests: TimedOutException :(

On Tue, Nov 9, 2010 at 5:51 PM, Stu Hood <stu.h...@rackspace.com> wrote:

> Hmm, 500 sstables is definitely a degenerate case: did you disable
> compaction? By default, Cassandra strives to keep the sstable count below
> ~32, since accesses to separate sstables require seeks.
>
> In this case, the query will seek 500 times to check the secondary index
> for each sstable: if it finds matches it will need to seek to find them in
> the primary index, and seek again for the data file.
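>
> For a rough back-of-envelope on what that costs (my numbers, assuming a
> single spinning disk at ~10ms per seek, not measured):
>
>     public class SeekEstimate
>     {
>         public static void main(String[] args)
>         {
>             int sstables = 500;
>             double seekMs = 10.0;                     // assumed average seek latency
>             double indexProbeMs = sstables * seekMs;  // ~5s just probing each bitmap index
>             double perMatchMs = 2 * seekMs;           // primary index seek + data file seek
>             double worstCaseMs = indexProbeMs + sstables * perMatchMs; // matches in every sstable
>             System.out.println(indexProbeMs + " ms probing, " + worstCaseMs + " ms worst case");
>         }
>     }
>
> On that order of magnitude a single query can easily outlast the RPC timeout,
> which would explain the TimedOutExceptions.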
>
> -----Original Message-----
> From: "dragos cernahoschi" <dragos.cernahos...@gmail.com>
> Sent: Tuesday, November 9, 2010 5:33am
> To: dev@cassandra.apache.org
> Subject: Re: CASSANDRA-1472 (bitmap indexes)
>
> There are about 500 SSTables (12 GB of data including index data,
> statistics, etc.). The source data file was about 3 GB / 26 million rows.
>
> I'm only testing with EQ expressions for now.
>
> Increasing the file limit resolved the problem, but now I'm getting
> TimedOutExceptions from Thrift when querying, even with a slice size of 1.
> Is my machine too small (Core 2 Duo 2.93 GHz, 2 GB RAM, Ubuntu 10.04) for
> such a test?
>
> I really do have some interesting data sets to test the indexes with, and I
> want to compare the ordinary indexes against the bitmap indexes.
>
> Thank you,
> Dragos
>
> On Mon, Nov 8, 2010 at 6:42 PM, Stu Hood <stu.h...@rackspace.com> wrote:
>
> > Dragos,
> >
> > How many SSTables did you have on disk, and were any of your index
> > expressions GT(E)/LT(E)?
> >
> > I expect that you are bumping into a limitation of the current
> > implementation: it opens up to 128 file-handles per SSTable in the worst
> > case for a GT/LT query (one per index bucket).
> >
> > A future version might remove that requirement, but for now, you should
> > probably bump the file handle limit on your machine to at least 2^16.
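> >
> > The rough arithmetic behind that number (the sstable count below is only an
> > example; substitute whatever you actually have on disk):
> >
> >     public class HandleEstimate
> >     {
> >         public static void main(String[] args)
> >         {
> >             int sstables = 500;           // example on-disk sstable count
> >             int handlesPerSSTable = 128;  // worst case for a GT/LT query, one per index bucket
> >             System.out.println(sstables * handlesPerSSTable); // 64000, just under 2^16 = 65536
> >         }
> >     }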
> >
> > Thanks,
> > Stu
> >
> >
> > -----Original Message-----
> > From: "dragos cernahoschi" <dragos.cernahos...@gmail.com>
> > Sent: Monday, November 8, 2010 10:05am
> > To: dev@cassandra.apache.org
> > Subject: CASSANDRA-1472 (bitmap indexes)
> >
> > Hi,
> >
> > I've got an exception during the following test:
> >
> > test machine: Core 2 Duo 2.93 GHz, 2 GB RAM, Ubuntu 10.04
> >
> > test scenario:
> > - 1 column family
> > - about 15 columns
> > - 7 indexed columns (bitmap)
> > - 26 million rows (insert operation went fine)
> > - thrift "query" on 3 of the indexed columns with get_indexed_slices
> >   (count: 100); a rough sketch of the call is at the end of this message
> > - got the following exception:
> >
> > 10/11/08 17:52:40 ERROR service.AbstractCassandraDaemon: Fatal exception in thread Thread[ReadStage:3,5,main]
> > java.io.IOError: java.io.FileNotFoundException: /home/dragos/cassandra/data/keyspace/visit-e-814-4-Bitidx.db (Too many open files)
> >    at org.apache.cassandra.io.sstable.bitidx.SegmentIterator.open(SegmentIterator.java:78)
> >    at org.apache.cassandra.io.sstable.bitidx.BitmapIndexReader.openBin(BitmapIndexReader.java:226)
> >    at org.apache.cassandra.io.sstable.bitidx.BitmapIndexReader.iterator(BitmapIndexReader.java:214)
> >    at org.apache.cassandra.io.sstable.SSTableReader.scan(SSTableReader.java:523)
> >    at org.apache.cassandra.db.secindex.KeysBitmapIndex.iterator(KeysBitmapIndex.java:103)
> >    at org.apache.cassandra.db.ColumnFamilyStore.scan(ColumnFamilyStore.java:1371)
> >    at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:41)
> >    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:51)
> >    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >    at java.lang.Thread.run(Thread.java:662)
> > Caused by: java.io.FileNotFoundException: /home/dragos/cassandra/data/keyspace/visit-e-814-4-Bitidx.db (Too many open files)
> >    at java.io.FileInputStream.open(Native Method)
> >    at java.io.FileInputStream.<init>(FileInputStream.java:106)
> >    at org.apache.avro.file.SeekableFileInput.<init>(SeekableFileInput.java:29)
> >    at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:38)
> >    at org.apache.cassandra.io.sstable.bitidx.SegmentIterator.open(SegmentIterator.java:72)
> >    ... 10 more
> > 10/11/08 17:52:40 ERROR service.AbstractCassandraDaemon: Fatal exception in thread Thread[ReadStage:2,5,main]
> > java.io.IOError: java.io.FileNotFoundException: /home/dragos/cassandra/data/keyspace/visit-e-1018-Index.db (Too many open files)
> >    at org.apache.cassandra.io.util.BufferedSegmentedFile.getSegment(BufferedSegmentedFile.java:68)
> >    at org.apache.cassandra.io.util.SegmentedFile$SegmentIterator.next(SegmentedFile.java:129)
> >    at org.apache.cassandra.io.util.SegmentedFile$SegmentIterator.next(SegmentedFile.java:1)
> >    at org.apache.cassandra.io.sstable.SSTableReader.getPosition(SSTableReader.java:455)
> >    at org.apache.cassandra.io.sstable.SSTableReader.getFileDataInput(SSTableReader.java:572)
> >    at org.apache.cassandra.db.columniterator.SSTableSliceIterator.<init>(SSTableSliceIterator.java:49)
> >    at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:72)
> >    at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:84)
> >    at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1190)
> >    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1082)
> >    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1052)
> >    at org.apache.cassandra.db.ColumnFamilyStore.scan(ColumnFamilyStore.java:1378)
> >    at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:41)
> >    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:51)
> >    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >    at java.lang.Thread.run(Thread.java:662)
> > Caused by: java.io.FileNotFoundException: /home/dragos/cassandra/data/keyspace/visit-e-1018-Index.db (Too many open files)
> >    at java.io.RandomAccessFile.open(Native Method)
> >    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
> >    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:98)
> >    at org.apache.cassandra.io.util.BufferedRandomAccessFile.<init>(BufferedRandomAccessFile.java:142)
> >    at org.apache.cassandra.io.util.BufferedSegmentedFile.getSegment(BufferedSegmentedFile.java:62)
> >    ... 16 more
> >
> > The same test worked fine with 1 million rows.
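> >
> > For reference, a trimmed-down sketch of the kind of get_indexed_slices call
> > I'm making over Thrift (the column names, values and connection details here
> > are placeholders, and error handling is omitted):
> >
> >     import java.nio.ByteBuffer;
> >     import java.util.Arrays;
> >     import java.util.List;
> >
> >     import org.apache.cassandra.thrift.*;
> >     import org.apache.thrift.protocol.TBinaryProtocol;
> >     import org.apache.thrift.transport.TFramedTransport;
> >     import org.apache.thrift.transport.TSocket;
> >     import org.apache.thrift.transport.TTransport;
> >
> >     public class IndexedSliceTest
> >     {
> >         private static ByteBuffer bytes(String s) throws Exception
> >         {
> >             return ByteBuffer.wrap(s.getBytes("UTF-8"));
> >         }
> >
> >         public static void main(String[] args) throws Exception
> >         {
> >             TTransport transport = new TFramedTransport(new TSocket("localhost", 9160));
> >             Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
> >             transport.open();
> >             client.set_keyspace("keyspace");
> >
> >             // EQ expressions on three of the indexed columns
> >             List<IndexExpression> expressions = Arrays.asList(
> >                 new IndexExpression(bytes("col1"), IndexOperator.EQ, bytes("value1")),
> >                 new IndexExpression(bytes("col2"), IndexOperator.EQ, bytes("value2")),
> >                 new IndexExpression(bytes("col3"), IndexOperator.EQ, bytes("value3")));
> >
> >             // start from the beginning of the key range, return at most 100 rows
> >             IndexClause clause = new IndexClause(expressions, bytes(""), 100);
> >
> >             // return all columns of each matching row (up to 100 per row)
> >             SlicePredicate predicate = new SlicePredicate();
> >             predicate.setSlice_range(new SliceRange(bytes(""), bytes(""), false, 100));
> >
> >             List<KeySlice> rows = client.get_indexed_slices(
> >                 new ColumnParent("visit"), clause, predicate, ConsistencyLevel.ONE);
> >             System.out.println(rows.size() + " rows returned");
> >
> >             transport.close();
> >         }
> >     }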
> >
> >
> >
>
>
>

