issues CASSANDRA-1472
Hi,

1. I've tried to apply the patches for this bug. They worked, except for the unit test modifications, which git refused to apply.

2. After applying the patches I ran the stress.py script (with 500,000 keys). The script output seems to be fine, but the Cassandra console contains the exception below.

JVM options: -Dcassandra.config=file:///home/dragos/Workspace/oss/cassandra/conf/cassandra.yaml -Dcassandra-foreground -ea -Xmx1280M
binary_memtable_throughput_in_mb: 64 (also tried 128 and 256)

Is this a JVM memory configuration problem? The exception starts to appear around key 100,000.

10/10/29 17:54:57 INFO service.StorageService: Starting up server gossip
10/10/29 17:54:57 INFO db.ColumnFamilyStore: switching in a fresh Memtable for LocationInfo at CommitLogContext(file='/home/dragos/cassandra/commitlog/CommitLog-1288364097273.log', position=700)
10/10/29 17:54:57 INFO db.ColumnFamilyStore: Enqueuing flush of memtable-locationi...@15696851(227 bytes, 4 operations)
10/10/29 17:54:57 INFO db.Memtable: Writing memtable-locationi...@15696851(227 bytes, 4 operations)
10/10/29 17:54:57 INFO db.Memtable: Completed flushing /home/dragos/cassandra/data/system/LocationInfo-e-1-Data.db
10/10/29 17:54:57 WARN service.StorageService: Generated random token 94710572475423860127984872289063475144. Random tokens will result in an unbalanced ring; see http://wiki.apache.org/cassandra/Operations
10/10/29 17:54:57 INFO db.ColumnFamilyStore: switching in a fresh Memtable for LocationInfo at CommitLogContext(file='/home/dragos/cassandra/commitlog/CommitLog-1288364097273.log', position=848)
10/10/29 17:54:57 INFO db.ColumnFamilyStore: Enqueuing flush of memtable-locationi...@19141351(36 bytes, 1 operations)
10/10/29 17:54:57 INFO db.Memtable: Writing memtable-locationi...@19141351(36 bytes, 1 operations)
10/10/29 17:54:57 INFO db.Memtable: Completed flushing /home/dragos/cassandra/data/system/LocationInfo-e-2-Data.db
10/10/29 17:54:57 INFO utils.Mx4jTool: Will not load MX4J, mx4j-tools.jar is not in the classpath
10/10/29 17:54:57 INFO thrift.CassandraDaemon: Binding thrift service to localhost/127.0.0.1:9160
10/10/29 17:54:57 INFO thrift.CassandraDaemon: Using TFramedTransport with a max frame size of 15728640 bytes.
10/10/29 17:54:57 INFO thrift.CassandraDaemon: Listening for thrift clients...
10/10/29 17:55:04 INFO db.ColumnFamilyStore: switching in a fresh Memtable for Migrations at CommitLogContext(file='/home/dragos/cassandra/commitlog/CommitLog-1288364097273.log', position=12544)
10/10/29 17:55:04 INFO db.ColumnFamilyStore: Enqueuing flush of memtable-migrati...@22958990(6993 bytes, 1 operations)
10/10/29 17:55:04 INFO db.Memtable: Writing memtable-migrati...@22958990(6993 bytes, 1 operations)
10/10/29 17:55:04 INFO db.ColumnFamilyStore: switching in a fresh Memtable for Schema at CommitLogContext(file='/home/dragos/cassandra/commitlog/CommitLog-1288364097273.log', position=12544)
10/10/29 17:55:04 INFO db.ColumnFamilyStore: Enqueuing flush of memtable-sch...@29336531(2649 bytes, 3 operations)
10/10/29 17:55:05 INFO db.Memtable: Completed flushing /home/dragos/cassandra/data/system/Migrations-e-1-Data.db
10/10/29 17:55:05 INFO db.Memtable: Writing memtable-sch...@29336531(2649 bytes, 3 operations)
10/10/29 17:55:05 INFO db.Memtable: Completed flushing /home/dragos/cassandra/data/system/Schema-e-1-Data.db
10/10/29 17:55:05 INFO db.ColumnFamilyStore: read 0 from saved key cache
10/10/29 17:55:05 INFO db.ColumnFamilyStore: read 0 from saved key cache
10/10/29 17:55:05 INFO db.ColumnFamilyStore: loading row cache for Super1 of Keyspace1
10/10/29 17:55:05 INFO db.ColumnFamilyStore: completed loading (0 ms; 0 keys) row cache for Super1 of Keyspace1
10/10/29 17:55:05 INFO db.ColumnFamilyStore: loading row cache for Standard1 of Keyspace1
10/10/29 17:55:05 INFO db.ColumnFamilyStore: completed loading (0 ms; 0 keys) row cache for Standard1 of Keyspace1
10/10/29 17:55:19 INFO service.GCInspector: GC for PS MarkSweep: 255 ms, 99152 reclaimed leaving 98881416 used; max is 1442054144
10/10/29 17:55:22 INFO db.ColumnFamilyStore: switching in a fresh Memtable for Standard1 at CommitLogContext(file='/home/dragos/cassandra/commitlog/CommitLog-1288364097273.log', position=37144548)
10/10/29 17:55:22 INFO db.ColumnFamilyStore: Enqueuing flush of memtable-standa...@23934262(17798235 bytes, 348985 operations)
10/10/29 17:55:22 INFO db.Memtable: Writing memtable-standa...@23934262(17798235 bytes, 348985 operations)
10/10/29 17:55:23 INFO service.GCInspector: GC for PS MarkSweep: 357 ms, 81144 reclaimed leaving 209802568 used; max is 1437138944
10/10/29 17:55:27 ERROR service.AbstractCassandraDaemon: Fatal exception in thread Thread[FlushWriter:1,5,main]
java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException: 1
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java
more info CASSANDRA-1472
The stress.py command:

1. python contrib/py_stress/stress.py -C 32 -x keys_bitmap

2. more info from the stack trace:

10/10/29 18:15:28 INFO db.Memtable: Writing memtable-standa...@23048841(17806140 bytes, 349140 operations)
10/10/29 18:15:29 INFO service.GCInspector: GC for PS MarkSweep: 789 ms, 7178512 reclaimed leaving 320182640 used; max is 1383202816
10/10/29 18:15:31 ERROR service.AbstractCassandraDaemon: Fatal exception in thread Thread[FlushWriter:1,5,main]
java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException: 1
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
    at org.apache.avro.io.ResolvingDecoder.readEnum(ResolvingDecoder.java:177)
    at org.apache.avro.generic.GenericDatumReader.readEnum(GenericDatumReader.java:172)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:115)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:118)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:142)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:114)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:142)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:114)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:105)
    at org.apache.cassandra.io.SerDeUtils.deserializeWithSchema(SerDeUtils.java:112)
    at org.apache.cassandra.io.sstable.bitidx.BitmapIndexReader.open(BitmapIndexReader.java:87)
    at org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:196)
    at org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:178)
    at org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:160)
    at org.apache.cassandra.db.Memtable.access$1(Memtable.java:152)
    at org.apache.cassandra.db.Memtable$1.runMayThrow(Memtable.java:172)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
    ... 6 more
10/10/29 18:15:38 INFO service.GCInspector: GC for PS MarkSweep: 896 ms, 72939816 reclaimed leaving 341851792 used; max is 1359740928
10/10/29 18:15:38 INFO service.GCInspector: GC for PS Scavenge: 312 ms, 188201560 reclaimed leaving 414791608 used; max is 1359740928
10/10/29 18:15:40 INFO db.ColumnFamilyStore: switching in a fresh Memtable for Standard1 at CommitLogContext(file='/home/dragos/cassandra/commitlog/CommitLog-1288365280350.log', position=111458032)
10/10/29 18:15:40 INFO db.ColumnFamilyStore: Enqueuing flush of memtable-standa...@25762909(17806140 bytes, 349140 operations)
10/10/29 18:15:40 INFO db.Memtable: Writing memtable-standa...@25762909(17806140 bytes, 349140 operations)
10/10/29 18:15:43 INFO service.GCInspector: GC for PS MarkSweep: 1309 ms, 293768 reclaimed leaving 456751520 used; max is 1349451776
10/10/29 18:15:43 INFO service.GCInspector: Pool Name            Active   Pending
10/10/29 18:15:43 INFO service.GCInspector: ResponseStage             0         0
10/10/29 18:15:43 INFO service.GCInspector: ReadStage                 0         0
10/10/29 18:15:43 INFO service.GCInspector: ReadRepair                0         0
10/10/29 18:15:43 INFO service.GCInspector: MutationStage          3249
10/10/29 18:15:43 INFO service.GCInspector: GossipStage               0         0
10/10/29 18:15:43 INFO service.GCInspector: AntientropyStage          0         0
10/10/29 18:15:43 INFO service.GCInspector: MigrationStage            0         0
10/10/29 18:15:43 INFO service.GCInspector: StreamStage               0         0
10/10/29 18:15:43 INFO service.GCInspector: MemtablePostFlusher       1         3
10/10/29 18:15:43 INFO service.GCInspector: FlushWriter               1         1
10/10/29 18:15:43 INFO service.GCInspector: MiscStage                 0         0
10/10/29 18:15:43 INFO service.GCInspector: FlushSorter               0         0
10/10/29 18:15:43 INFO service.GCInspector: CompactionManager       n/a         0
10/10/29 18:15:43 INFO service.GCInspector: MessagingService        n/a       0,0
10/10/29 18:15:43 INFO service.GCInspector: ColumnFamily        Memtable ops,data   Row cache size/cap   Key cache size/cap
10/10/29 18:15:43 INFO service.GCInspector: Keyspace1.Super1    0,0                 0/0                  0/20
10/10/29 18:15:43 INFO service.GCInspector: Keyspace1.Standard1 67080,3421080
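The root cause sits in Avro's ResolvingDecoder.readEnum, which resolves an enum ordinal read from disk against the reader's schema. A plausible failure mode for an ArrayIndexOutOfBoundsException there is an ordinal written under a newer schema being looked up in a shorter symbol table. A minimal Python sketch of that mismatch (the symbol names below are hypothetical, not the actual on-disk schema):

```python
# Hypothetical illustration of an Avro-style enum schema mismatch: the
# writer stored ordinal 1 under a two-symbol schema, but the reader
# resolves it against a stale one-symbol schema.
writer_symbols = ["KEYS", "KEYS_BITMAP"]  # schema in effect at write time
reader_symbols = ["KEYS"]                 # stale schema at read time

stored_ordinal = writer_symbols.index("KEYS_BITMAP")  # 1, written to disk

def read_enum(ordinal, symbols):
    # Avro encodes an enum as an integer index into the symbol table;
    # an out-of-range index raises (AIOOBE in Java, IndexError here).
    return symbols[ordinal]

try:
    read_enum(stored_ordinal, reader_symbols)
except IndexError:
    print("ordinal 1 is not present in the reader schema")
```

This would be consistent with the patches touching the bitmap-index Avro schema while an older schema was still being applied at read time.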
Re: more info CASSANDRA-1472
I can't apply the patches on the latest trunk. Which trunk revision did you rebase against?

Thank you!

On Mon, Nov 1, 2010 at 6:50 AM, Stu Hood wrote:
> I can't reproduce this, but I've posted a rebased version on #1472 that you
> can try out. Thanks for trying it out!
>
> -----Original Message-----
> From: "dragos cernahoschi"
> Sent: Friday, October 29, 2010 10:38am
> To: dev@cassandra.apache.org
> Subject: more info CASSANDRA-1472
>
> [quoted stress.py command and stack trace snipped; see the message above]
Re: more info CASSANDRA-1472
Patches applied. Exception disappeared.

On Mon, Nov 1, 2010 at 7:37 PM, Stu Hood wrote:
> trunk @ r1029546
>
> -----Original Message-----
> From: "dragos cernahoschi"
> Sent: Monday, November 1, 2010 12:20pm
> To: dev@cassandra.apache.org
> Subject: Re: more info CASSANDRA-1472
>
> I can't apply the patches on the latest trunk. Which trunk version
> (checksum) did you rebased?
>
> Thank you!
>
> [earlier quoted messages snipped]
CASSANDRA-1472 (bitmap indexes)
Hi,

I've got an exception during the following test:

test machine: core 2 duo 2.93 GHz, 2GB RAM, Ubuntu 10.04

test scenario:
- 1 column family
- about 15 columns
- 7 indexed columns (bitmap)
- 26 million rows (the insert operation went fine)
- thrift "query" on 3 of the indexed columns with get_indexed_slices (count: 100)
- got the following exception:

10/11/08 17:52:40 ERROR service.AbstractCassandraDaemon: Fatal exception in thread Thread[ReadStage:3,5,main]
java.io.IOError: java.io.FileNotFoundException: /home/dragos/cassandra/data/keyspace/visit-e-814-4-Bitidx.db (Too many open files)
    at org.apache.cassandra.io.sstable.bitidx.SegmentIterator.open(SegmentIterator.java:78)
    at org.apache.cassandra.io.sstable.bitidx.BitmapIndexReader.openBin(BitmapIndexReader.java:226)
    at org.apache.cassandra.io.sstable.bitidx.BitmapIndexReader.iterator(BitmapIndexReader.java:214)
    at org.apache.cassandra.io.sstable.SSTableReader.scan(SSTableReader.java:523)
    at org.apache.cassandra.db.secindex.KeysBitmapIndex.iterator(KeysBitmapIndex.java:103)
    at org.apache.cassandra.db.ColumnFamilyStore.scan(ColumnFamilyStore.java:1371)
    at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:41)
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:51)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.FileNotFoundException: /home/dragos/cassandra/data/keyspace/visit-e-814-4-Bitidx.db (Too many open files)
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:106)
    at org.apache.avro.file.SeekableFileInput.<init>(SeekableFileInput.java:29)
    at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:38)
    at org.apache.cassandra.io.sstable.bitidx.SegmentIterator.open(SegmentIterator.java:72)
    ... 10 more
10/11/08 17:52:40 ERROR service.AbstractCassandraDaemon: Fatal exception in thread Thread[ReadStage:2,5,main]
java.io.IOError: java.io.FileNotFoundException: /home/dragos/cassandra/data/keyspace/visit-e-1018-Index.db (Too many open files)
    at org.apache.cassandra.io.util.BufferedSegmentedFile.getSegment(BufferedSegmentedFile.java:68)
    at org.apache.cassandra.io.util.SegmentedFile$SegmentIterator.next(SegmentedFile.java:129)
    at org.apache.cassandra.io.util.SegmentedFile$SegmentIterator.next(SegmentedFile.java:1)
    at org.apache.cassandra.io.sstable.SSTableReader.getPosition(SSTableReader.java:455)
    at org.apache.cassandra.io.sstable.SSTableReader.getFileDataInput(SSTableReader.java:572)
    at org.apache.cassandra.db.columniterator.SSTableSliceIterator.<init>(SSTableSliceIterator.java:49)
    at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:72)
    at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:84)
    at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1190)
    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1082)
    at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1052)
    at org.apache.cassandra.db.ColumnFamilyStore.scan(ColumnFamilyStore.java:1378)
    at org.apache.cassandra.service.IndexScanVerbHandler.doVerb(IndexScanVerbHandler.java:41)
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:51)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.FileNotFoundException: /home/dragos/cassandra/data/keyspace/visit-e-1018-Index.db (Too many open files)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:98)
    at org.apache.cassandra.io.util.BufferedRandomAccessFile.<init>(BufferedRandomAccessFile.java:142)
    at org.apache.cassandra.io.util.BufferedSegmentedFile.getSegment(BufferedSegmentedFile.java:62)
    ... 16 more

The same test worked fine with 1 million rows.
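For reference, a get_indexed_slices query like the one in the scenario above takes an IndexClause holding a list of IndexExpressions. The dataclasses below are simplified Python stand-ins for the generated Thrift structs (not the real cassandra.ttypes classes, though the field names follow the Thrift IDL of that era); they only sketch the shape of the request:

```python
from dataclasses import dataclass, field
from typing import List

# Simplified stand-ins for the Thrift structs; the generated classes
# carry the same fields (column_name, op, value / expressions,
# start_key, count).
@dataclass
class IndexExpression:
    column_name: bytes
    op: str          # "EQ", "GT", "GTE", "LT", or "LTE"
    value: bytes

@dataclass
class IndexClause:
    expressions: List[IndexExpression]
    start_key: bytes = b""
    count: int = 100

# An EQ query over three indexed columns, as in the test (column names
# here are hypothetical):
clause = IndexClause(
    expressions=[
        IndexExpression(b"col_a", "EQ", b"v1"),
        IndexExpression(b"col_b", "EQ", b"v2"),
        IndexExpression(b"col_c", "EQ", b"v3"),
    ],
    count=100,
)
```

Every expression must name an indexed column; with count=100, the server stops after gathering 100 matching rows.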
Re: CASSANDRA-1472 (bitmap indexes)
There are about 500 SSTables (12GB of data including index data, statistics...). The source data file had about 3GB / 26 million rows.

I only test with EQ expressions for now.

Increasing the file limit resolved the problem, but now I'm getting TimedOutException(s) from thrift when "querying", even with a slice size of 1. Is my machine too small (core 2 duo 2.93 GHz, 2GB RAM, Ubuntu 10.04) for such a test?

I really have some interesting sets of data to test indexes with, and I want to make a comparison between ordinary indexes and bitmap indexes.

Thank you,
Dragos

On Mon, Nov 8, 2010 at 6:42 PM, Stu Hood wrote:
> Dragos,
>
> How many SSTables did you have on disk, and were any of your index
> expressions GT(E)/LT(E)?
>
> I expect that you are bumping into a limitation of the current
> implementation: it opens up to 128 file-handles per SSTable in the worst
> case for a GT/LT query (one per index bucket).
>
> A future version might remove that requirement, but for now, you should
> probably bump the file handle limit on your machine to at least 2^16.
>
> Thanks,
> Stu
>
> -----Original Message-----
> From: "dragos cernahoschi"
> Sent: Monday, November 8, 2010 10:05am
> To: dev@cassandra.apache.org
> Subject: CASSANDRA-1472 (bitmap indexes)
>
> [quoted test scenario and stack traces snipped; see the message above]
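Stu's suggestion to raise the file-handle limit to at least 2^16 can be inspected and applied from Python's stdlib resource module. This is a sketch of the mechanics; for the Cassandra JVM itself you would normally run `ulimit -n 65536` in the shell that launches the daemon, or set it persistently in /etc/security/limits.conf:

```python
import resource  # Unix-only stdlib module

# Inspect the per-process open-file limit that "Too many open files"
# is hitting; getrlimit returns a (soft, hard) pair.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft:", soft, "hard:", hard)

# Move the soft limit toward 2^16. An unprivileged process can only
# set its soft limit up to the hard limit, so clamp to that.
if hard == resource.RLIM_INFINITY:
    target = 65536
else:
    target = min(65536, hard)
resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
```

Raising the hard limit itself requires root, which is why the limits.conf route is the usual fix for a long-running server.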
Re: CASSANDRA-1472 (bitmap indexes)
In the meantime the number of SSTables dropped to just 7. Initially the compaction thread suffered the same "too many open files" problem and couldn't do any compaction.

But I'm still not able to run my tests: TimedOutException :(

On Tue, Nov 9, 2010 at 5:51 PM, Stu Hood wrote:
> Hmm, 500 sstables is definitely a degenerate case: did you disable
> compaction? By default, Cassandra strives to keep the sstable count below
> ~32, since accesses to separate sstables require seeks.
>
> In this case, the query will seek 500 times to check the secondary index
> for each sstable: if it finds matches it will need to seek to find them in
> the primary index, and seek again for the data file.
>
> [earlier quoted messages snipped]
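Stu's seek accounting can be turned into a rough estimate of why 500 sstables time out while 7 do not. The ~10 ms per seek and the one-index-probe-plus-two-seeks-per-match pattern below are assumptions derived from his description, not measurements:

```python
SEEK_MS = 10.0  # assumed average seek time for a 2010-era laptop disk

def worst_case_scan_ms(sstables, matches_per_sstable=1):
    # One seek per sstable to probe its secondary index, plus (per
    # match) one seek into the primary index and one into the data file.
    return sstables * SEEK_MS + sstables * matches_per_sstable * 2 * SEEK_MS

print(worst_case_scan_ms(500) / 1000.0, "s")  # 15.0 s with 500 sstables
print(worst_case_scan_ms(7) / 1000.0, "s")    # 0.21 s after compaction to 7
```

At 500 sstables the estimate is well past a Thrift RPC timeout on the order of 10 seconds, so even a slice size of 1 cannot help: the per-sstable probing dominates, not the result size.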
Re: CASSANDRA-1472 (bitmap indexes)
I'm running the query on three columns with cardinalities 22, 17, and 10. Interestingly, when combining columns by cardinality:

22 + 17 => no exception
22 + 10 => no exception
10 + 17 => timed out exception
22 + 17 + 10 => timed out exception

On Tue, Nov 9, 2010 at 6:29 PM, Stu Hood wrote:
> Can you tell me a little bit about your key distribution? How many unique
> values are indexed (the cardinality)?
>
> Until the OrBiC projection I mention on 1472 is implemented, the bitmap
> secondary indexes will perform terribly for high-cardinality datasets.
>
> Thanks!
>
> -----Original Message-----
> From: "dragos cernahoschi"
> Sent: Tuesday, November 9, 2010 10:14am
> To: dev@cassandra.apache.org
> Subject: Re: CASSANDRA-1472 (bitmap indexes)
>
> Meanwhile the number of SSTables has dropped to just 7. Initially the
> compaction thread suffered the same "too many open files" problem and
> couldn't do any compaction.
>
> But I'm still not able to run my tests: TimedOutException :(
>
> On Tue, Nov 9, 2010 at 5:51 PM, Stu Hood wrote:
> > Hmm, 500 sstables is definitely a degenerate case: did you disable
> > compaction? By default, Cassandra strives to keep the sstable count below
> > ~32, since accesses to separate sstables require seeks.
> >
> > In this case, the query will seek 500 times to check the secondary index
> > for each sstable: if it finds matches it will need to seek to find them in
> > the primary index, and seek again for the data file.
> >
> > -----Original Message-----
> > From: "dragos cernahoschi"
> > Sent: Tuesday, November 9, 2010 5:33am
> > To: dev@cassandra.apache.org
> > Subject: Re: CASSANDRA-1472 (bitmap indexes)
> >
> > There are about 500 SSTables (12GB of data including index data,
> > statistics...). The source data file had about 3GB/26 million rows.
> >
> > I only test with EQ expressions for now.
> >
> > Increasing the file limit resolved the problem, but now I'm getting
> > TimedOutExceptions from thrift when querying, even with a slice size of 1.
> > Is my machine too small (core 2 duo 2.93, 2GB RAM, Ubuntu 10.04) for such
> > a test?
> >
> > I really have some interesting sets of data to test indexes with, and I
> > want to make a comparison between ordinary indexes and bitmap indexes.
> >
> > Thank you,
> > Dragos
> >
> > On Mon, Nov 8, 2010 at 6:42 PM, Stu Hood wrote:
> > > Dragos,
> > >
> > > How many SSTables did you have on disk, and were any of your index
> > > expressions GT(E)/LT(E)?
> > >
> > > I expect that you are bumping into a limitation of the current
> > > implementation: it opens up to 128 file handles per SSTable in the
> > > worst case for a GT/LT query (one per index bucket).
> > >
> > > A future version might remove that requirement, but for now, you should
> > > probably bump the file handle limit on your machine to at least 2^16.
> > >
> > > Thanks,
> > > Stu
> > >
> > > -----Original Message-----
> > > From: "dragos cernahoschi"
> > > Sent: Monday, November 8, 2010 10:05am
> > > To: dev@cassandra.apache.org
> > > Subject: CASSANDRA-1472 (bitmap indexes)
> > >
> > > Hi,
> > >
> > > I've got an exception during the following test:
> > >
> > > test machine: core 2 duo 2.93, 2GB RAM, Ubuntu 10.04
> > >
> > > test scenario:
> > > - 1 column family
> > > - about 15 columns
> > > - 7 indexed columns (bitmap)
> > > - 26 million rows (insert operation went fine)
> > > - thrift "query" on 3 of the indexed columns with get_indexed_slices (count: 100)
> > > - got the following exception:
> > >
> > > 10/11/08 17:52:40 ERROR service.AbstractCassandraDaemon: Fatal exception in
> > > thread Thread[ReadStage:3,5,main]
> > > java.io.IOError: java.io.FileNotFoundException:
> > > /home/dragos/cassandra/data/keyspace/visit-e-814-4-Bitidx.db (Too many open
> > > files)
> > >   at org.apache.cassandra.io.sstable.bitidx.SegmentIterator.open(SegmentIterator.java:78)
> > >   at org.apache.cassandra.io.sstable.bitidx.BitmapIndexReader.openBin(BitmapIndexReader.java:226)
> > >   at org.apache.cassandra.io.
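Stu's advice to raise the file-handle limit to at least 2^16 can be checked from Python with the standard `resource` module; this is a minimal, best-effort sketch (the shell equivalent would be `ulimit -n 65536`, and it is not part of Cassandra itself):

```python
# Inspect, and try to raise, the per-process open-file limit that the
# "Too many open files" IOError above is hitting. Raising the soft limit
# beyond the hard limit (or raising the hard limit itself) requires root,
# so the attempt here is best-effort.
import resource

TARGET = 2 ** 16  # 65536 handles, the minimum Stu suggests

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open-file limits before: soft={soft}, hard={hard}")

if soft < TARGET:
    # Without root we can only raise the soft limit up to the hard limit.
    new_soft = TARGET if hard == resource.RLIM_INFINITY else min(TARGET, hard)
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    except (ValueError, OSError):
        pass  # not permitted; fall back to ulimit / limits.conf as root
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"open-file limits after:  soft={soft}, hard={hard}")
```

On Ubuntu, a persistent limit is usually configured with `nofile` entries in /etc/security/limits.conf rather than a per-shell `ulimit`.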
Re: CASSANDRA-1472 (bitmap indexes)
Welcome. It seems to be exactly that: when running one of the queries that generates a timed out exception, Cassandra enters some kind of infinite loop. Trace:

DEBUG 12:15:22,307 scan
DEBUG 12:15:22,348 restricted ranges for query [78703492656118554854272571946195123045,0] are [[78703492656118554854272571946195123045,0]]
DEBUG 12:15:22,348 scan ranges are [78703492656118554854272571946195123045,0]
DEBUG 12:15:22,380 reading org.apache.cassandra.db.indexscancomm...@1544e44 from 1...@localhost /127.0.0.1
DEBUG 12:15:22,402 For operator EQ on Lynx 2.7 in rows (1481600,3203072): bins (12,12) of #
DEBUG 12:15:22,422 For operator EQ on Lynx 2.7 in rows (1852032,4003840): bins (12,12) of #
DEBUG 12:15:22,423 For operator EQ on Lynx 2.7 in rows (718336,1551616): bins (12,12) of #
DEBUG 12:15:22,423 For operator EQ on Lynx 2.7 in rows (1482112,3203072): bins (12,12) of #
DEBUG 12:15:22,424 For operator EQ on Lynx 2.7 in rows (370432,800768): bins (12,12) of #
DEBUG 12:15:22,424 For operator EQ on Lynx 2.7 in rows (5755392,12436992): bins (12,12) of #
DEBUG 12:15:22,425 For operator EQ on Lynx 2.7 in rows (369664,800768): bins (12,12) of #
DEBUG 12:15:22,515 collecting 0 of 2147483647: 62726f77736572:false:8...@0
DEBUG 12:15:22,515 collecting 1 of 2147483647: 636f6e6e656374696f6e:false:3...@0
DEBUG 12:15:22,515 collecting 2 of 2147483647: 636f756e747279:false:7...@0
DEBUG 12:15:22,516 collecting 3 of 2147483647: 646f6d61696e:false:1...@0
DEBUG 12:15:22,518 collecting 4 of 2147483647: 6475726174696f6e:false:3...@0
DEBUG 12:15:22,521 collecting 5 of 2147483647: 6c696e65:false:4...@0
DEBUG 12:15:22,521 collecting 6 of 2147483647: 6f73:false:1...@0
DEBUG 12:15:22,521 collecting 7 of 2147483647: 7069:false:3...@0
DEBUG 12:15:22,521 collecting 8 of 2147483647: 74696d657374616d70:false:1...@0
DEBUG 12:15:22,522 collecting 9 of 2147483647: 75736572:false:1...@0
DEBUG 12:15:22,522 collecting 10 of 2147483647: 7a6970:false:5...@0
DEBUG 12:15:22,523 collecting 0 of 2147483647: 62726f77736572:false:8...@0
DEBUG 12:15:22,524 collecting 1 of 2147483647: 636f6e6e656374696f6e:false:3...@0
DEBUG 12:15:22,524 collecting 2 of 2147483647: 636f756e747279:false:7...@0
DEBUG 12:15:22,524 collecting 3 of 2147483647: 646f6d61696e:false:1...@0
DEBUG 12:15:22,524 collecting 4 of 2147483647: 6475726174696f6e:false:3...@0
DEBUG 12:15:22,525 collecting 5 of 2147483647: 6c696e65:false:4...@0
DEBUG 12:15:22,525 collecting 6 of 2147483647: 6f73:false:1...@0
DEBUG 12:15:22,525 collecting 7 of 2147483647: 7069:false:3...@0

... and it goes on forever. I'll try the KEYS indexes on the same scenario and let you know.

Dragos

On Tue, Nov 9, 2010 at 8:05 PM, Stu Hood wrote:
> Interesting, thanks for the info.
>
> Perhaps the limitation is that index queries involving multiple clauses are
> currently implemented using brute-force filtering rather than an index join?
> The bitmap indexes have native support for this type of join, but it's not
> being used yet.
>
> To confirm: have you tried the same scenario with KEYS indexes? They use
> the same codepath for multiple index expressions, and should experience the
> same timeouts. Also, can you rerun the KEYS_BITMAP test with DEBUG logging
> enabled, to ensure that we aren't going into some kind of infinite loop?
>
> Thanks for the help,
> Stu
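The column names in the "collecting" lines of the trace above are hex-encoded byte strings. Decoding them (a quick sanity check in plain Python, not Cassandra code) shows that the scan keeps re-collecting the same 11 columns on every pass:

```python
# Decode the hex-encoded column names that appear in the DEBUG
# "collecting N of 2147483647" lines of the trace above.
hex_names = [
    "62726f77736572", "636f6e6e656374696f6e", "636f756e747279",
    "646f6d61696e", "6475726174696f6e", "6c696e65", "6f73",
    "7069", "74696d657374616d70", "75736572", "7a6970",
]
decoded = [bytes.fromhex(h).decode("ascii") for h in hex_names]
print(decoded)
# ['browser', 'connection', 'country', 'domain', 'duration',
#  'line', 'os', 'pi', 'timestamp', 'user', 'zip']
```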
Re: CASSANDRA-1472 (bitmap indexes)
I confirm: the KEYS indexes have the same behavior as the KEYS_BITMAP indexes: they time out/succeed on the same queries.

By the way, inserting my data set with KEYS_BITMAP is much faster than with KEYS (about 5.5 times) and less GC intensive.

Dragos

On Tue, Nov 9, 2010 at 8:05 PM, Stu Hood wrote:
> Interesting, thanks for the info.
>
> Perhaps the limitation is that index queries involving multiple clauses are
> currently implemented using brute-force filtering rather than an index join?
> The bitmap indexes have native support for this type of join, but it's not
> being used yet.
>
> To confirm: have you tried the same scenario with KEYS indexes? They use
> the same codepath for multiple index expressions, and should experience the
> same timeouts.
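Stu's distinction between brute-force filtering and an index join can be illustrated with toy data (hypothetical rows and per-column indexes, not Cassandra internals):

```python
# Matching two index clauses by brute-force filtering (walk one index,
# re-check every candidate row) versus an index join (intersect the
# per-clause row-key sets directly, no row lookups). Toy data only.
rows = {
    1: {"os": "linux", "browser": "lynx"},
    2: {"os": "linux", "browser": "firefox"},
    3: {"os": "osx",   "browser": "lynx"},
    4: {"os": "linux", "browser": "lynx"},
}

# Per-column secondary indexes: value -> set of row keys.
index = {
    "os":      {"linux": {1, 2, 4}, "osx": {3}},
    "browser": {"lynx": {1, 3, 4}, "firefox": {2}},
}

clauses = {"os": "linux", "browser": "lynx"}

# Brute force: scan one index's candidates, test every clause per row.
candidates = index["os"]["linux"]
brute = {k for k in candidates
         if all(rows[k][col] == val for col, val in clauses.items())}

# Index join: intersect the row sets of all clauses.
joined = set.intersection(*(index[col][val] for col, val in clauses.items()))

print(brute, joined)  # both find the rows matching every clause
```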
Re: CASSANDRA-1472 (bitmap indexes)
No problem. I'll do the test on Monday.

On Nov 14, 2010 2:35 AM, "Stu Hood" wrote:
> > Is it worth testing 0.7-branch-without-1472 to make sure of that?
>
> Dragos: if you have time, this would be helpful. If you already have a KEYS
> index created, you shouldn't need to re-load the data, as the file format
> hasn't changed.
>
> Thanks,
> Stu
>
> On Sat, Nov 13, 2010 at 4:40 PM, Jonathan Ellis wrote:
> > Is it worth testing 0...
Re: CASSANDRA-1472 (bitmap indexes)
I've tested the 0.7-beta3 branch index feature without the 1472 patch. Queries on more than one column work better than in the patched version, but definitely not correctly.

1. - query on 3 columns, start key 1, row count 1 => no results
   - query on the same columns, start key 1, row count 10 => 8 results

2. - same query, start key 1, row count 2 => 1 result
   - query again, start key = max(keys from prev query) + 1, row count 2 => *time out, infinite cycle*

3. Is there any example of the pagination feature (without knowing the expected number of rows)? Will get_indexed_slices return an empty list when there are no more results?
   - query on 1 column, start key 1, row count 1000 => ok
   - same query, start key = max(keys from prev query) + 1, row count 1000 => ok
   ...
   - *at some point max(keys from prev query) < start key and my pagination loop runs forever*
   Maybe I'm missing something on this.

4. - query on 1 column, row count 1000 => ok
   - query on 3 columns, row count 100 => time out (there is no infinite loop; the thread eventually terminates)

Dragos

On Sun, Nov 14, 2010 at 2:34 AM, Stu Hood wrote:
> > Is it worth testing 0.7-branch-without-1472 to make sure of that?
>
> Dragos: if you have time, this would be helpful. If you already have a KEYS
> index created, you shouldn't need to re-load the data, as the file format
> hasn't changed.
>
> Thanks,
> Stu
>
> On Sat, Nov 13, 2010 at 4:40 PM, Jonathan Ellis wrote:
> > Is it worth testing 0.7-branch-without-1472 to make sure of that?
> >
> > On Fri, Nov 12, 2010 at 10:28 AM, Stu Hood wrote:
> > > Great, thanks for the variable Dragos: I'm fairly sure I broke this in
> > > the refactoring I did in 1472 to fit in a second index type.
> > >
> > > On Fri, Nov 12, 2010 at 4:03 AM, dragos cernahoschi <
> > > dragos.cernahos...@gmail.com> wrote:
> > >> I confirm: the KEYS indexes have the same behavior as the KEYS_BITMAP
> > >> indexes: time out/succeed on the same queries.
Re: CASSANDRA-1472 (bitmap indexes)
Back. I've tested the keys index pagination once again, on 0.7 head, with a smaller data set: 1 million rows. It seems there are still some issues:

1. *test*: query on one column, count: 1000, expected number of distinct results: 48251
   *result*: 5 pages of 1000 results; then, after the 6th page, the results begin to repeat. I would expect repetition to begin after the 48251st row.

2. *test*: query on 3 columns, count: 10 (count 100 and count 1000 failed with time out)
   *result*: 1 page of 10 results, then second page => time out

3. There are queries with combinations of 2 or 3 columns that fail right away with time out (count 10, 100).

Dragos

On Mon, Nov 15, 2010 at 2:29 PM, Jonathan Ellis wrote:
> On Mon, Nov 15, 2010 at 5:57 AM, dragos cernahoschi wrote:
> > I've tested the 0.7-beta3 branch index feature without the 1472 patch.
> > The queries on more than one column work better than the patched version,
> > but definitely not correctly.
>
> Please test 0.7 branch head; as you can see from the CHANGES there
> have been a lot of fixes.
>
> > 1.
> > 2.
> > 4.
>
> Should be fixed in head.
>
> > 3. Is there any example on the pagination feature? (without knowing the
> > expected number of rows).
>
> Same way you paginate through range slices or columns within a row:
> set start to the last result you got w/ the previous query.
>
> > Will get_indexed_slices return an empty list when there is no more
> > results?
>
> No, all queries are start-inclusive.
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
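Jonathan's two rules — set the next start to the last key of the previous page, and expect start-inclusive results — imply a client-side loop like the following sketch. `fetch_page` is a hypothetical stand-in for a real `get_indexed_slices` call, operating on made-up integer keys:

```python
# Start-inclusive pagination sketch: each page begins at the last key of
# the previous page, so that key comes back again and must be dropped.
# A short page (fewer than `count` rows) signals the final page.
ALL_MATCHING_KEYS = list(range(1, 26))  # pretend the index matches keys 1..25

def fetch_page(start_key, count):
    """Stand-in for get_indexed_slices: up to `count` keys >= start_key."""
    return [k for k in ALL_MATCHING_KEYS if k >= start_key][:count]

def paginate(count=10):
    results = []
    start = ALL_MATCHING_KEYS[0]  # or the smallest possible key
    while True:
        page = fetch_page(start, count)
        # After the first page, drop the start-inclusive duplicate key.
        results.extend(page if not results else page[1:])
        if len(page) < count:
            return results  # short page: nothing left beyond it
        start = page[-1]    # next page starts at the last key seen

print(paginate(10))  # every matching key exactly once
```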
Re: CASSANDRA-1472 (bitmap indexes)
I've tried to reproduce my test data and the failing queries with stress.py. I've slightly modified stress.py and added 2 more indexes for insertion; the indexedrangeslice query is also performed on 3 indexes. The insert is done using a uniform distribution of values. Then:

1. python contrib/py_stress/stress.py -r -C 32 -x keys
2. python contrib/py_stress/stress.py -C 32 -o indexedrangeslice -t 3

The queries fail as in the attachment: not on the first query but on the 3rd, 4th... not always the same.

Dragos

On Mon, Nov 22, 2010 at 9:39 PM, Jonathan Ellis wrote:
> Let's start with the low-hanging fruit: can you give steps to reproduce
> queries that fail right away?
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com

#!/usr/bin/python
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# expects a Cassandra server to be running and listening on port 9160.
# (read tests expect insert tests to have run first too.)
have_multiproc = False
try:
    from multiprocessing import Array as array, Process as Thread
    from uuid import uuid1 as get_ident
    Thread.isAlive = Thread.is_alive
    have_multiproc = True
except ImportError:
    from threading import Thread
    from thread import get_ident
    from array import array
from hashlib import md5
import time, random, sys, os
from random import randint, gauss
from optparse import OptionParser

from thrift.transport import TTransport
from thrift.transport import TSocket
from thrift.transport import THttpClient
from thrift.protocol import TBinaryProtocol

try:
    from cassandra import Cassandra
    from cassandra.ttypes import *
except ImportError:
    # add cassandra directory to sys.path
    L = os.path.abspath(__file_