Hi, I'm using Cassandra to store some aggregated data in a structure like this:

KEY - product_id
SUPER COLUMN NAME - timestamp

Each super column holds a few columns with the actual data.

To find the latest super column for a key, I use a scan operation (start=Long.MAX_VALUE, reversed=true, count=1). This worked fine for quite some time, but recently I needed to remove some of the columns within the super columns. After that, things got weird: for some keys the scan for the latest super column works normally, but for others it stopped returning any results.

I checked the data using the CLI and it is obviously there. I can get it if I specify the super column name, but scanning for the latest one does not work. If I scan for previous data (start=some other timestamp less than the maximum timestamp stored in Cassandra), it works fine. I compared the data for keys that work with the data for keys that don't, but there is no difference: the super column names are exactly the same and they contain the same number of columns.

The really weird thing is that the scans did not stop working immediately after the columns were removed. I was able to scan for the data and verify that the columns had been removed correctly, and only after a couple of minutes did some scans stop returning data. When I looked in the log, I saw that Cassandra had been compacting, flushing and deleting .db files at more or less the time the scans stopped working.

I tried restarting Cassandra, but it did not help.

Has anyone had a similar problem?

Regards,
Pawel Dabrowski
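
P.S. To make the query concrete, here is a minimal in-memory sketch of the behaviour I expect from the scan. It is not Cassandra client code: the class ReversedSliceSketch, the method reversedSlice, the column names and the timestamps are all made up for illustration, and a TreeMap stands in for one row of super columns keyed by timestamp.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class ReversedSliceSketch {

    // Models a reversed slice over one row: walk the super column names
    // (timestamps) downward from `start` and return at most `count` of them.
    // This is an illustration of the expected semantics, not the real API.
    static List<Long> reversedSlice(NavigableMap<Long, Map<String, String>> row,
                                    long start, int count) {
        List<Long> names = new ArrayList<>();
        // headMap(start, true) keeps names <= start; descendingMap() reverses the order.
        for (Long name : row.headMap(start, true).descendingMap().keySet()) {
            if (names.size() >= count) {
                break;
            }
            names.add(name);
        }
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical row for one product_id: timestamp -> columns.
        NavigableMap<Long, Map<String, String>> row = new TreeMap<>();
        row.put(1000L, Map.of("views", "10", "sales", "2"));
        row.put(2000L, Map.of("views", "15", "sales", "3"));
        // Some columns removed from the newest super column, but it is not empty.
        row.put(3000L, Map.of("views", "20"));

        // start=Long.MAX_VALUE, reversed=true, count=1: expect the newest super column.
        System.out.println(reversedSlice(row, Long.MAX_VALUE, 1));
        // start=some earlier timestamp: expect the newest super column at or below it.
        System.out.println(reversedSlice(row, 2500L, 1));
    }
}
```

In other words, as long as a super column still has at least one live column, I would expect the scan from Long.MAX_VALUE to return it, exactly as the scan from an earlier timestamp does.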