Compacting single file forever

2011-04-20 Thread Shotaro Kamio
Hi,

I found that our cluster (Cassandra 0.7.5) keeps compacting a single
file forever. We are wondering whether the compaction logic is wrong,
and I'd like to hear your comments.

Situation:
- After trying to repair a column family, our cluster's disk usage is
quite high. Cassandra cannot compact all sstables at once. I think it
ends up compacting a single file over and over (see the attached log
below).
- Our data doesn't have deletes, so compacting a single file doesn't
free any disk space.

We are approaching a full disk. I believe the repair operation created
a lot of duplicate data on disk, which needs compaction to clean up.
However, most of the nodes are stuck compacting a single file. The only
thing we can do is restart the nodes.

My question is why the compaction doesn't stop.

I looked at the logic in CompactionManager.java:
-
String compactionFileLocation = table.getDataFileLocation(cfs.getExpectedCompactedFileSize(sstables));
// If the compaction file path is null that means we have no space left for this compaction.
// try again w/o the largest one.
List<SSTableReader> smallerSSTables = new ArrayList<SSTableReader>(sstables);
while (compactionFileLocation == null && smallerSSTables.size() > 1)
{
    logger.warn("insufficient space to compact all requested files " + StringUtils.join(smallerSSTables, ", "));
    smallerSSTables.remove(cfs.getMaxSizeFile(smallerSSTables));
    compactionFileLocation = table.getDataFileLocation(cfs.getExpectedCompactedFileSize(smallerSSTables));
}
if (compactionFileLocation == null)
{
    logger.error("insufficient space to compact even the two smallest files, aborting");
    return 0;
}
-

The while condition is: smallerSSTables.size() > 1
Should this be "smallerSSTables.size() > 2" instead?

As I understand it, compacting a single file frees disk space only when
the sstable contains a lot of tombstones and those tombstones are
actually removed during the compaction. If Cassandra knows the sstable
has tombstones that can be removed, it is worth compacting. Otherwise,
it frees space only if you are lucky. In the worst case, it leads to an
infinite loop, as in our case.
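
To make the difference between the two loop bounds concrete, here is a
minimal, self-contained sketch of the selection loop. It is not the
Cassandra code: the class name, the select method, the minFiles
parameter and the sizes are made up for illustration, and the expected
compacted size is assumed to be the plain sum of the input sizes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class CompactionSelectionSketch {
    // Assumption: the expected compacted size is the sum of the input sizes.
    static long total(List<Long> sizes) {
        long sum = 0;
        for (long s : sizes) {
            sum += s;
        }
        return sum;
    }

    /**
     * Drops the largest file while the set does not fit in freeBytes,
     * but never shrinks the set below minFiles entries.
     * Returns the files to compact, or an empty list when aborting.
     * minFiles = 1 mimics "size() > 1"; minFiles = 2 mimics "size() > 2".
     */
    static List<Long> select(List<Long> sizes, long freeBytes, int minFiles) {
        List<Long> smaller = new ArrayList<>(sizes);
        while (total(smaller) > freeBytes && smaller.size() > minFiles) {
            // Collections.max returns a Long, so this calls remove(Object).
            smaller.remove(Collections.max(smaller));
        }
        if (total(smaller) > freeBytes) {
            return Collections.emptyList(); // insufficient space, abort
        }
        return smaller;
    }
}
```

With two hypothetical files of sizes 300 and 260 and 280 units of free
space, select(sizes, 280, 1) returns the single file of size 260 (the
pointless single-file compaction), while select(sizes, 280, 2) returns
an empty list, i.e. the compaction is aborted.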

What do you think of this code change?


Best regards,
Shotaro


* Cassandra compaction log
-
 WARN [CompactionExecutor:1] 2011-04-20 01:03:14,446
CompactionManager.java (line 405) insufficient space to compact all
requested files SSTableReader(path='foobar-f-3020-Data.db'),
SSTableReader(path='foobar-f-3034-Data.db')
 INFO [CompactionExecutor:1] 2011-04-20 03:47:29,833
CompactionManager.java (line 482) Compacted to
foobar-tmp-f-3035-Data.db.  260,646,760,319 to 260,646,760,319 (~100%
of original) bytes for 6,893,896 keys.  Time: 9,855,385ms.

 WARN [CompactionExecutor:1] 2011-04-20 03:48:11,308
CompactionManager.java (line 405) insufficient space to compact all
requested files SSTableReader(path='foobar-f-3020-Data.db'),
SSTableReader(path='foobar-f-3035-Data.db')
 INFO [CompactionExecutor:1] 2011-04-20 06:31:41,193
CompactionManager.java (line 482) Compacted to
foobar-tmp-f-3036-Data.db.  260,646,760,319 to 260,646,760,319 (~100%
of original) bytes for 6,893,896 keys.  Time: 9,809,882ms.

 WARN [CompactionExecutor:1] 2011-04-20 06:32:22,476
CompactionManager.java (line 405) insufficient space to compact all
requested files SSTableReader(path='foobar-f-3020-Data.db'),
SSTableReader(path='foobar-f-3036-Data.db')
 INFO [CompactionExecutor:1] 2011-04-20 09:20:29,903
CompactionManager.java (line 482) Compacted to
foobar-tmp-f-3037-Data.db.  260,646,760,319 to 260,646,760,319 (~100%
of original) bytes for 6,893,896 keys.  Time: 10,087,424ms.
-
You can see that the compacted size is always the same. It keeps
compacting the same single sstable.
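
As a rough way to spot this pattern in a log, the sketch below checks
whether a "Compacted to" line reports the same byte count before and
after. The class and method names are hypothetical, the regular
expression is only inferred from the lines quoted above, and each log
entry is assumed to have been joined back onto a single line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NoOpCompactionCheck {
    // Matches "<bytes> to <bytes> (~" as it appears in the quoted log lines.
    private static final Pattern SIZES =
            Pattern.compile("([\\d,]+) to ([\\d,]+) \\(~");

    /** Returns true when the line reports identical input and output sizes. */
    static boolean isNoOp(String logLine) {
        Matcher m = SIZES.matcher(logLine);
        if (!m.find()) {
            return false; // not a "Compacted to" summary line
        }
        long before = Long.parseLong(m.group(1).replace(",", ""));
        long after = Long.parseLong(m.group(2).replace(",", ""));
        return before == after;
    }
}
```

Applied to the three INFO lines above, isNoOp returns true each time,
since every run reads and writes 260,646,760,319 bytes.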


Jenkins build is back to normal : Cassandra-0.8 #28

2011-04-20 Thread Apache Hudson Server
See 




Build failed in Jenkins: Cassandra-0.8 #29

2011-04-20 Thread Apache Jenkins Server
See 

Changes:

[jbellis] recognize key type metadata in CLI
patch by Pavel Yaskevich; reviewed by jbellis for CASSANDRA-2497

[jbellis] merge from 0.7

[jbellis] merge from 0.7

--
[...truncated 1999 lines...]
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 1.52 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.db.commitlog.CommitLogHeaderTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.381 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.db.context.CounterContextTest
[junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 0.553 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.db.marshal.IntegerTypeTest
[junit] Tests run: 10, Failures: 0, Errors: 0, Time elapsed: 0.196 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.db.marshal.RoundTripTest
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.424 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.db.marshal.TimeUUIDTypeTest
[junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 0.146 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.db.marshal.TypeCompareTest
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.082 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.db.marshal.TypeValidationTest
[junit] Tests run: 9, Failures: 0, Errors: 0, Time elapsed: 0.341 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.db.marshal.UUIDTypeTest
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.103 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.db.migration.SerializationsTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.638 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.dht.BootStrapperTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 2.386 sec
[junit] 
[junit] - Standard Error -
[junit]  WARN 09:55:45,790 Generated random token Token(bytes[e81dcd90b34f4562949e2b9013d8d0ad]). Random tokens will result in an unbalanced ring; see http://wiki.apache.org/cassandra/Operations
[junit] -  ---
[junit] Testsuite: org.apache.cassandra.dht.ByteOrderedPartitionerTest
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 1.134 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.dht.CollatingOrderPreservingPartitionerTest
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 1.445 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.dht.OrderPreservingPartitionerTest
[junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 1.169 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.dht.RandomPartitionerTest
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.768 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.dht.RangeTest
[junit] Tests run: 13, Failures: 0, Errors: 0, Time elapsed: 0.387 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.gms.ArrivalWindowTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.124 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.gms.GossipDigestTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.063 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.gms.SerializationsTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.38 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.hadoop.ColumnFamilyInputFormatTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.2 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.io.BloomFilterTrackerTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.364 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.io.CompactSerializerTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 1.414 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.io.LazilyCompactedRowTest
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 2.224 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.io.sstable.DescriptorTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.056 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.io.sstable.IndexHelperTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.064 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.io.sstable.LegacySSTableTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 1.372 sec
[junit] 
[junit] - Standard Error -
[junit]  WARN 09:56:09,826 Invalid file '.svn' in data directory 

[junit]  WARN 09:56:10,439 Invalid file '.svn' in data directory 


Jenkins build is back to normal : Cassandra-0.8 #30

2011-04-20 Thread Apache Jenkins Server
See 




Jenkins build became unstable: Cassandra-Coverage #33

2011-04-20 Thread Apache Jenkins Server
See 




Re: [VOTE] Apache Cassandra 0.8.0-beta1 (take #2)

2011-04-20 Thread Vijay
+1
Regards,




On Tue, Apr 19, 2011 at 6:35 PM, Eric Evans  wrote:

>
> Let's try this again.  I propose the following artifacts for release as
> 0.8.0 beta1.
>
> You will note the addition of three new artifacts, cql-1.0.0.tar.gz,
> txcql-1.0.0.tar.gz and apache-cassandra-cql-1.0.0.jar.  These are
> language drivers for CQL; be sure to include them in your review.
>
> SVN:
> https://svn.apache.org/repos/asf/cassandra/branches/cassandra-0.8@r1095239
> 0.8.0-beta1 artifacts: http://people.apache.org/~eevans
>
> The vote will be open for 72 hours, longer if needed.
>
> Thanks!
>
> --
> Eric Evans
> eev...@rackspace.com
>
>


Re: [VOTE] Apache Cassandra 0.8.0-beta1 (take #2)

2011-04-20 Thread Jeremy Hanna
+1 (non-binding)

On Apr 19, 2011, at 8:35 PM, Eric Evans wrote:

> 
> Let's try this again.  I propose the following artifacts for release as
> 0.8.0 beta1.
> 
> You will note the addition of three new artifacts, cql-1.0.0.tar.gz,
> txcql-1.0.0.tar.gz and apache-cassandra-cql-1.0.0.jar.  These are
> language drivers for CQL; be sure to include them in your review.
> 
> SVN:
> https://svn.apache.org/repos/asf/cassandra/branches/cassandra-0.8@r1095239
> 0.8.0-beta1 artifacts: http://people.apache.org/~eevans
> 
> The vote will be open for 72 hours, longer if needed.
> 
> Thanks!
> 
> -- 
> Eric Evans
> eev...@rackspace.com
> 



Re: [VOTE] Apache Cassandra 0.8.0-beta1 (take #2)

2011-04-20 Thread Chris Goffinet
+1

-Chris

On Apr 19, 2011, at 6:35 PM, Eric Evans wrote:

> 
> Let's try this again.  I propose the following artifacts for release as
> 0.8.0 beta1.
> 
> You will note the addition of three new artifacts, cql-1.0.0.tar.gz,
> txcql-1.0.0.tar.gz and apache-cassandra-cql-1.0.0.jar.  These are
> language drivers for CQL; be sure to include them in your review.
> 
> SVN:
> https://svn.apache.org/repos/asf/cassandra/branches/cassandra-0.8@r1095239
> 0.8.0-beta1 artifacts: http://people.apache.org/~eevans
> 
> The vote will be open for 72 hours, longer if needed.
> 
> Thanks!
> 
> -- 
> Eric Evans
> eev...@rackspace.com
> 



Re: Limitations on number of secondary indexes

2011-04-20 Thread aaron morton
Moving to user.
Aaron

On 20 Apr 2011, at 10:45, Jason Kolb wrote:

> I apologize if this has been answered before. I've searched the archives
> pretty exhaustively and haven't been able to find an answer.
> 
> I was wondering if anyone knows if there is a practical upper limit on the
> number of secondary indexes used, if they're sparsely populated (say, 10,000
> secondary indexes only 2 of which are populated per row).  My understanding
> is that Cassandra creates another column family for each secondary index in
> the background, so the real limitation would appear to be the number of
> column families.
> 
> Is this correct?  And if so (or even if not), does anyone know the answer to
> the question about the upper limit on the number of secondary indexes?
> 
> Thanks!
> Jason



Re: [VOTE] Apache Cassandra 0.8.0-beta1 (take #2)

2011-04-20 Thread Courtney Robinson

+1

-Original Message- 
From: Eric Evans 
Sent: Wednesday, April 20, 2011 2:35 AM 
To: dev@cassandra.apache.org 
Subject: [VOTE] Apache Cassandra 0.8.0-beta1 (take #2) 



Let's try this again.  I propose the following artifacts for release as
0.8.0 beta1.

You will note the addition of three new artifacts, cql-1.0.0.tar.gz,
txcql-1.0.0.tar.gz and apache-cassandra-cql-1.0.0.jar.  These are
language drivers for CQL; be sure to include them in your review.

SVN:
https://svn.apache.org/repos/asf/cassandra/branches/cassandra-0.8@r1095239
0.8.0-beta1 artifacts: http://people.apache.org/~eevans

The vote will be open for 72 hours, longer if needed.

Thanks!

--
Eric Evans
eev...@rackspace.com