[VETOED] was: [VOTE] 0.7.1 (3 times the charm?)

2011-02-10 Thread Eric Evans
On Mon, 2011-02-07 at 18:52 -0600, Jonathan Ellis wrote:
> Should be fixed in r1068241.  Thanks!

I guess it goes without saying that this one is a bust.  Expect a new
vote to be started presently.

-- 
Eric Evans
eev...@rackspace.com



Re: Does Ruby library returns the RowKey?

2011-02-10 Thread Ryan King
On Thu, Feb 10, 2011 at 2:17 AM, Joshua Partogi  wrote:
> Hi,
>
> Does the Ruby library currently return the RowKey during a row get?
> From what I am seeing, it seems to return only an
> OrderedHash of the columns. Would it be possible to return the RowKey,
> or does it not make sense to do so?

Which method are you talking about? It doesn't make sense to return
the row key on a get or get_slice, but it does for multiget and company
(which should already return it).

-ryan
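
A rough sketch of the distinction drawn above, written in Python rather than
Ruby and using a made-up in-memory store. FakeStore, get and multiget here are
hypothetical names for illustration only, not the Ruby client's API. A
single-row get is called with the key, so it only needs to return the ordered
columns; a multi-row fetch has to key each result by its row key.

```python
from collections import OrderedDict


class FakeStore:
    """Hypothetical in-memory stand-in for a column family; not a real client."""

    def __init__(self, rows):
        # rows: {row_key: {column_name: value}}
        self._rows = rows

    def get(self, key):
        # The caller supplied the key, so only the columns come back,
        # ordered by column name.
        return OrderedDict(sorted(self._rows.get(key, {}).items()))

    def multiget(self, keys):
        # Several rows come back at once, so each result is keyed by row key.
        return OrderedDict((k, self.get(k)) for k in keys)


store = FakeStore({"u1": {"name": "a", "age": "1"}, "u2": {"name": "b"}})
print(list(store.get("u1")))                # column names only
print(list(store.multiget(["u1", "u2"])))   # row keys are the outer keys
```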


[VOTE] 0.7.1 (what are we at now, 4?)

2011-02-10 Thread Eric Evans

I propose the following for release as 0.7.1.

SVN:
https://svn.apache.org/repos/asf/cassandra/branches/cassandra-0.7@r1069461
0.7.1 artifacts: http://people.apache.org/~eevans

The vote will be open for 72 hours.

[1]: http://goo.gl/5VAPP (CHANGES.txt)
[2]: http://goo.gl/C9M5W (NEWS.txt)
[3]: http://goo.gl/8dZUr

-- 
Eric Evans
eev...@rackspace.com



Build failed in Hudson: Cassandra-0.7 #275

2011-02-10 Thread Apache Hudson Server
See 

Changes:

[jbellis] copy DecoratedKey.key when inserting into caches
patch by mdennis; reviewed by jbellis for CASSANDRA-2102

[eevans] debian/changelog: freshen timestamp

[eevans] prepend missing license headers

--
[...truncated 1599 lines...]
[junit] Testsuite: org.apache.cassandra.db.RemoveColumnFamilyWithFlush2Test
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 1.744 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.db.RemoveColumnTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 1.151 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.db.RemoveSubColumnTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.967 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.db.RemoveSuperColumnTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 2.064 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.db.RowIterationTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.768 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.db.RowTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 2.453 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.db.SerializationsTest
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.484 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.db.SuperColumnTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.124 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.db.TableTest
[junit] Tests run: 10, Failures: 0, Errors: 0, Time elapsed: 3.917 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.db.TimeSortTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 6.787 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.db.commitlog.CommitLogHeaderTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.448 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.db.marshal.IntegerTypeTest
[junit] Tests run: 10, Failures: 0, Errors: 0, Time elapsed: 0.182 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.db.marshal.TimeUUIDTypeTest
[junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 0.071 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.db.marshal.TypeCompareTest
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.073 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.db.marshal.TypeValidationTest
[junit] Tests run: 9, Failures: 0, Errors: 0, Time elapsed: 0.294 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.db.migration.SerializationsTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.667 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.dht.BootStrapperTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 2.837 sec
[junit] 
[junit] ------------- Standard Error -----------------
[junit]  WARN 18:02:43,763 Generated random token Token(bytes[298437429cf4e70c8108ac8a05094de2]). Random tokens will result in an unbalanced ring; see http://wiki.apache.org/cassandra/Operations
[junit] ------------- ---------------- ---------------
[junit] Testsuite: org.apache.cassandra.dht.ByteOrderedPartitionerTest
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 1.128 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.dht.CollatingOrderPreservingPartitionerTest
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 1.343 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.dht.OrderPreservingPartitionerTest
[junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 1.168 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.dht.RandomPartitionerTest
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.791 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.dht.RangeTest
[junit] Tests run: 13, Failures: 0, Errors: 0, Time elapsed: 0.444 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.gms.ArrivalWindowTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.128 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.gms.GossipDigestTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.053 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.gms.SerializationsTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.445 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.hadoop.ColumnFamilyInputFormatTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.183 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.io.BloomFilterTrackerTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.388 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.io.CompactSerializerTest
[junit] Tests run: 2, Failure

Re: [VOTE] 0.7.1 (what are we at now, 4?)

2011-02-10 Thread Stephen Connolly
I'll restage central artifacts by tomorrow morning. Hoping this is the last
take.

- Stephen

---
Sent from my Android phone, so random spelling mistakes, random nonsense
words and other nonsense are a direct result of using swype to type on the
screen
On 10 Feb 2011 17:50, "Eric Evans"  wrote:
>
> I propose the following for release as 0.7.1.
>
> SVN:
> https://svn.apache.org/repos/asf/cassandra/branches/cassandra-0.7@r1069461
> 0.7.1 artifacts: http://people.apache.org/~eevans
>
> The vote will be open for 72 hours.
>
> [1]: http://goo.gl/5VAPP (CHANGES.txt)
> [2]: http://goo.gl/C9M5W (NEWS.txt)
> [3]: http://goo.gl/8dZUr
>
> --
> Eric Evans
> eev...@rackspace.com
>


Cross Pollination is a beautiful thing

2011-02-10 Thread Joshua D. Drake
How else did we get the duckbill platypus?

It is beer-thirty, folks, and PostgreSQL would love to hook up with you. I
know we are probably a little old school for your tastes, but where
we lack energy, we have experience. As you know, experience goes a long
way in helping a relationship build strong ties. So what do you say,
Cassandra? You beautiful, young and energetic technology? Wanna hook up?

Last call for CFP (closes TODAY):

https://www.postgresqlconference.org/talk_types

JD 
-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt



Hudson build is back to normal : Cassandra-0.7 #276

2011-02-10 Thread Apache Hudson Server
See 




Build failed in Hudson: Cassandra #722

2011-02-10 Thread Apache Hudson Server
See 

Changes:

[eevans] basic JDBC'ish (non-compliant) driver for CQL

Patch by Vivek Mishra; reviewed by eevans for CASSANDRA-2124

--
[...truncated 1765 lines...]
[junit] Testsuite: org.apache.cassandra.dht.OrderPreservingPartitionerTest
[junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 1.149 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.dht.RandomPartitionerTest
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.799 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.dht.RangeTest
[junit] Tests run: 13, Failures: 0, Errors: 0, Time elapsed: 0.448 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.gms.ArrivalWindowTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.112 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.gms.GossipDigestTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.052 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.gms.SerializationsTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.455 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.hadoop.ColumnFamilyInputFormatTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.181 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.io.BloomFilterTrackerTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.394 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.io.CompactSerializerTest
[junit] Tests run: 0, Failures: 0, Errors: 1, Time elapsed: 1.298 sec
[junit] 
[junit] Testcase: org.apache.cassandra.io.CompactSerializerTest:Caused an ERROR
[junit] java.lang.ClassNotFoundException: null.BaseTest
[junit] java.lang.RuntimeException: java.lang.ClassNotFoundException: null.BaseTest
[junit] at org.apache.cassandra.io.CompactSerializerTest$1DirScanner.scan(CompactSerializerTest.java:117)
[junit] at org.apache.cassandra.io.CompactSerializerTest$1DirScanner.scan(CompactSerializerTest.java:86)
[junit] at org.apache.cassandra.io.CompactSerializerTest.scanClasspath(CompactSerializerTest.java:128)
[junit] Caused by: java.lang.ClassNotFoundException: null.BaseTest
[junit] at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
[junit] at java.security.AccessController.doPrivileged(Native Method)
[junit] at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
[junit] at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
[junit] at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
[junit] at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
[junit] at java.lang.Class.forName0(Native Method)
[junit] at java.lang.Class.forName(Class.java:169)
[junit] at org.apache.cassandra.io.CompactSerializerTest$1DirScanner.scan(CompactSerializerTest.java:95)
[junit] 
[junit] 
[junit] Test org.apache.cassandra.io.CompactSerializerTest FAILED
[junit] Testsuite: org.apache.cassandra.io.LazilyCompactedRowTest
[junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 2.048 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.io.sstable.LegacySSTableTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 1.329 sec
[junit] 
[junit] ------------- Standard Error -----------------
[junit]  WARN 16:00:23,554 Invalid file '.svn' in data directory 
[junit]  WARN 16:00:24,150 Invalid file '.svn' in data directory 
[junit] ------------- ---------------- ---------------
[junit] Testsuite: org.apache.cassandra.io.sstable.SSTableReaderTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.803 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.io.sstable.SSTableTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 1.087 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.io.sstable.SSTableWriterAESCommutativeTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.257 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.io.sstable.SSTableWriterTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.562 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.io.util.BufferedRandomAccessFileTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 0.186 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.locator.DynamicEndpointSnitchTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 6.196 sec
[junit] 
[junit] Testsuite: org.apache.cassandra.locator.NetworkTopologyStrategyTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Ti

RE: SEVERE Data Corruption Problems

2011-02-10 Thread Dan Hendry
Upgraded one node to 0.7. It's logging exceptions like mad (thousands per
minute). All like below (which is fairly new to me):

ERROR [ReadStage:721] 2011-02-10 18:13:56,190 AbstractCassandraDaemon.java (line 114) Fatal exception in thread Thread[ReadStage:721,5,main]
java.io.IOError: java.io.EOFException
        at org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:75)
        at org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:59)
        at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:80)
        at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1275)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1167)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1095)
        at org.apache.cassandra.db.Table.getRow(Table.java:384)
        at org.apache.cassandra.db.SliceByNamesReadCommand.getRow(SliceByNamesReadCommand.java:60)
        at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:473)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:375)
        at org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:48)
        at org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:30)
        at org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:108)
        at org.apache.cassandra.db.columniterator.SSTableNamesIterator.read(SSTableNamesIterator.java:106)
        at org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:71)
        ... 12 more

Dan


-----Original Message-----
From: Jonathan Ellis [mailto:jbel...@gmail.com] 
Sent: February-09-11 18:14
To: dev
Subject: Re: SEVERE Data Corruption Problems

Hi Dan,

it would be very useful to test with 0.7 branch instead of 0.7.0 so at
least you're not chasing known and fixed bugs like CASSANDRA-1992.

As you say, there's a lot of people who aren't seeing this, so it
would also be useful if you can provide some kind of test harness
where you can say "point this at a cluster and within a few hours

On Wed, Feb 9, 2011 at 4:31 PM, Dan Hendry wrote:
> I have been having SEVERE data corruption issues with SSTables in my
> cluster; for one CF it was happening almost daily (I have since shut down
> the service using that CF as it was too much work to manage the Cassandra
> errors). At this point, I can’t see how it is anything but a Cassandra bug,
> yet it’s somewhat strange and very scary that I am the only one who seems to
> be having such serious issues. Most of my data is indexed in two ways, so I
> have been able to write a validator which goes through and backfills
> missing data, but it’s kind of defeating the whole point of Cassandra. The
> only way I have found to deal with issues when they crop up, to prevent nodes
> crashing from repeated failed compactions, is to delete the SSTable. My cluster
> is running a slightly modified 0.7.0 version which logs which files the errors
> are for so that I can stop the node and delete them.
>
> The problem:
>
> -  Reads, compactions and hinted handoff fail with various
> exceptions (samples shown at the end of this email) which seem to indicate
> sstable corruption.
>
> -  I have seen failed reads/compactions/hinted handoff on 4 out of 4
> nodes (RF=2) for 3 different super column families and 1 standard column
> family (4 out of 11) and, just now, the Hints system CF (if it matters, the
> ring has not changed since one CF which has been giving me trouble was
> created). I have checked SMART disk info and run various diagnostics, and there
> do not seem to be any hardware issues; plus, what are the chances of all
> four nodes having the same hardware problems at the same time when, for all
> other purposes, they appear fine?
>
> -  I have added logging which outputs which sstables are causing
> exceptions to be thrown. The corrupt sstables have been both freshly flushed
> memtables and the output of compaction (i.e., 4 sstables which all seem to be
> fine get compacted to 1 which is then corrupt). It seems that the majority
> of corrupt sstables are post-compaction (vs. post-memtable flush).
>
> -  The one CF which was giving me the most problems was heavily
> written to (1000-1500 writes/second continually across the cluster). For
> that CF, I was having to delete 4-6 sstables a day across the cluster (and
> the number was going up, ev

Re: RE: SEVERE Data Corruption Problems

2011-02-10 Thread Aaron Morton
Looks like the bloom filter for the row is corrupted. Does it happen for all
reads or just for reads on one row? After the upgrade to 0.7 (assuming a 0.7
nightly build), did you run anything like nodetool repair? Have you tried
asking in the #cassandra IRC channel to see if there are any committers around?

Aaron

On 11 Feb, 2011, at 01:18 PM, Dan Hendry wrote:

Upgraded one node to 0.7. It's logging exceptions like mad (thousands per
minute). All like below (which is fairly new to me):

ERROR [ReadStage:721] 2011-02-10 18:13:56,190 AbstractCassandraDaemon.java (line 114) Fatal exception in thread Thread[ReadStage:721,5,main]
java.io.IOError: java.io.EOFException

Re: RE: SEVERE Data Corruption Problems

2011-02-10 Thread Jonathan Ellis
To me it looks like either we broke upgrading from older sstables in
https://issues.apache.org/jira/browse/CASSANDRA-1555 (unlikely), or
your system is hosed enough that you probably need to start from a
fresh 0.7.1 install.
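
For readers following the trace earlier in this thread: the root cause there
is an EOFException from java.io.DataInputStream.readInt, which is what that
method throws when the stream ends before four bytes have been read. The
sketch below imitates the same failure mode in Python. The record layout (a
4-byte count followed by that many 4-byte ints) is an assumption for
illustration and is not claimed to be Cassandra's bloom filter format; read_int
and read_length_prefixed are hypothetical helper names.

```python
import io
import struct


def read_int(stream):
    """Read a big-endian 32-bit int, raising EOFError if the stream ends early."""
    data = stream.read(4)
    if len(data) < 4:
        raise EOFError("needed 4 bytes, got %d" % len(data))
    return struct.unpack(">i", data)[0]


def read_length_prefixed(stream):
    # Assumed layout: a 4-byte count followed by that many 4-byte ints.
    count = read_int(stream)
    return [read_int(stream) for _ in range(count)]


intact = struct.pack(">i3i", 3, 10, 20, 30)
print(read_length_prefixed(io.BytesIO(intact)))   # [10, 20, 30]

truncated = intact[:-2]  # simulate a record that was cut short on disk
try:
    read_length_prefixed(io.BytesIO(truncated))
except EOFError as exc:
    print("EOF:", exc)
```

A reader that hits this on every access to the same file is consistent with a
short or damaged serialized structure, though the sketch alone cannot say
whether the cause is a bad write or a format mismatch after an upgrade.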

On Thu, Feb 10, 2011 at 7:16 PM, Aaron Morton  wrote:
> Looks like the bloom filter for the row is corrupted. Does it happen for all
> reads or just for reads on one row? After the upgrade to 0.7 (assuming a
> 0.7 nightly build), did you run anything like nodetool repair?
> Have you tried asking in the #cassandra IRC channel to see if there are any
> committers around?
>
> Aaron
> On 11 Feb, 2011, at 01:18 PM, Dan Hendry wrote:
>
> Upgraded one node to 0.7. It's logging exceptions like mad (thousands per
> minute). All like below (which is fairly new to me):
>
> ERROR [ReadStage:721] 2011-02-10 18:13:56,190 AbstractCassandraDaemon.java (line 114) Fatal exception in thread Thread[ReadStage:721,5,main]
> java.io.IOError: java.io.EOFException

Re: SEVERE Data Corruption Problems

2011-02-10 Thread Jake Luciani
Can you show us a listing of the sstable file names? They should be *-f-Data.db
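
A minimal sketch of checking such a listing, assuming data file names follow
the layout <ColumnFamily>-<version>-<generation>-Data.db. That layout is
inferred from names that appear on this list (for example
HintsColumnFamily-e-109-Data.db) and is not taken from the Cassandra source;
sstable_version and files_not_at_version are hypothetical helper names, and
"Users" is a made-up column family.

```python
import re

# Assumed layout, inferred from file names seen on this list:
#   <ColumnFamily>-<version>-<generation>-Data.db
SSTABLE_RE = re.compile(
    r"^(?P<cf>.+)-(?P<version>[a-z]+)-(?P<generation>\d+)-Data\.db$"
)


def sstable_version(filename):
    """Return the version component of an sstable data file name, or None."""
    match = SSTABLE_RE.match(filename)
    return match.group("version") if match else None


def files_not_at_version(filenames, expected="f"):
    # Data files whose version differs from the expected one.
    # Names that do not look like data files are skipped.
    return [f for f in filenames
            if sstable_version(f) not in (None, expected)]


names = [
    "HintsColumnFamily-e-109-Data.db",
    "Users-f-12-Data.db",
    "Users-f-12-Index.db",
]
print(files_not_at_version(names))  # ['HintsColumnFamily-e-109-Data.db']
```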

On Thu, Feb 10, 2011 at 7:18 PM, Dan Hendry wrote:

> Upgraded one node to 0.7. It's logging exceptions like mad (thousands per
> minute). All like below (which is fairly new to me):
>
> ERROR [ReadStage:721] 2011-02-10 18:13:56,190 AbstractCassandraDaemon.java (line 114) Fatal exception in thread Thread[ReadStage:721,5,main]
> java.io.IOError: java.io.EOFException

AbstractCassandraDaemon.java (line 91) Fatal exception in thread Thread[CompactionExecutor:1,1,main]

2011-02-10 Thread lztaomin

Hi,
My two-node cluster runs into the following error after a period of time.
The server log:
 
INFO [HintedHandoff:1] 2011-02-10 00:15:28,131 ColumnFamilyStore.java (line 952) Enqueuing flush of Memtable-HintsColumnFamily@2096038135(3281 bytes, 77 operations)
INFO [FlushWriter:1] 2011-02-10 00:15:28,132 Memtable.java (line 155) Writing Memtable-HintsColumnFamily@2096038135(3281 bytes, 77 operations)
INFO [CompactionExecutor:1] 2011-02-10 00:15:28,141 CompactionManager.java (line 272) Compacting [org.apache.cassandra.io.sstable.SSTableReader(path='/home/mengting/cassandra/data/system/HintsColumnFamily-e-109-Data.db'),org.apache.cassandra.io.sstable.SSTableReader(path='/home/mengting/cassandra/data/system/HintsColumnFamily-e-108-Data.db')]
ERROR [CompactionExecutor:1] 2011-02-10 00:17:07,926 AbstractCassandraDaemon.java (line 91) Fatal exception in thread Thread[CompactionExecutor:1,1,main]
ERROR [HintedHandoff:1] 2011-02-10 00:17:07,927 AbstractCassandraDaemon.java (line 91) Fatal exception in thread Thread[HintedHandoff:1,1,main]
INFO [FlushWriter:1] 2011-02-10 00:17:07,988 Memtable.java (line 162) Completed flushing /home/mengting/cassandra/data/system/HintsColumnFamily-e-110-Data.db (2939 bytes)
INFO [COMMIT-LOG-WRITER] 2011-02-10 00:17:07,989 CommitLog.java (line 470) Discarding obsolete commit log:CommitLogSegment(/home/mengting/cassandra/commitlog/CommitLog-1297186486611.log)
 
Thanks 


Hudson build is back to normal : Cassandra #723

2011-02-10 Thread Apache Hudson Server
See