Hi,

I'm using Cassandra to store some aggregated data in a structure like this:

KEY - product_id 
SUPER COLUMN NAME - timestamp
and in the super column, I have a few columns with actual data.

I am using a scan operation to find the latest super column 
(start=Long.MAX_VALUE, reversed=true, count=1) for a key, which worked fine for 
quite some time.
But recently I needed to remove some of the columns within the super columns.
After that things got weird: for some keys, the scan for the latest super column 
still works normally, but for others it stopped returning any results. I 
checked the data using the CLI and the data is definitely there. I can get it if 
I specify the super column name, but scanning for the latest does not work. If I 
scan for earlier data (start=some other timestamp lower than the maximum 
timestamp stored in Cassandra), it works fine. 
I compared the data for keys that work and keys that don't, but there is no 
difference: the super column names are exactly the same and they contain the 
same number of columns.
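
To make the query concrete, here is a minimal sketch of the result I expect 
from the scan. It does not call Cassandra at all: a row is modelled as a Java 
TreeMap, and the class name, method name, column names and values are made up 
for illustration, not part of any client API.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class LatestSuperColumn {

    // One row: super column name (timestamp) -> columns (name -> value).
    // A reversed slice with count=1 should return the first super column
    // whose name is at or below `start`; floorEntry models exactly that.
    static Map.Entry<Long, Map<String, String>> latest(
            NavigableMap<Long, Map<String, String>> row, long start) {
        return row.floorEntry(start);
    }

    public static void main(String[] args) {
        NavigableMap<Long, Map<String, String>> row = new TreeMap<>();
        row.put(1000L, new TreeMap<>(Map.of("views", "10", "sales", "2")));
        row.put(2000L, new TreeMap<>(Map.of("views", "15", "sales", "3")));

        // Remove one column inside the newest super column.
        row.get(2000L).remove("sales");

        // start=Long.MAX_VALUE: expected to find the newest super column.
        System.out.println(latest(row, Long.MAX_VALUE).getKey());

        // start=an earlier timestamp: expected to find the previous one.
        System.out.println(latest(row, 1500L).getKey());
    }
}
```

In this model, removing a column inside a super column never hides the super 
column itself, which is the behaviour I expected from Cassandra as well.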

But the really weird thing is that the scans did not stop working immediately 
after some columns were removed. I was able to scan for the data and verify 
that the columns were removed correctly and only after a couple of minutes some 
scans stopped returning data. When I looked in the log, I saw that 
Cassandra had been compacting, flushing and deleting .db files 
at more or less the time that the scans stopped working.
I tried restarting Cassandra, but it did not help. 
Has anyone had a similar problem?

regards
Pawel Dabrowski 
