We've seen a lot of errors that could be exactly this. Would this fit the mold? If so, I can confirm that this happens in the wild; we've gotten hundreds of these in the last few months:
ERROR [ReadStage:58705] 2014-02-12 01:06:18,254 CassandraDaemon.java (line 192) Exception in thread Thread[ReadStage:58705,5,main]
java.lang.IllegalArgumentException: Unknown CF d2225e32-bec7-373d-bdf8-4642896f0755

On Wed, Feb 26, 2014 at 9:27 AM, Dan LaRocque <dal...@hopcount.org> wrote:
> Hi,
>
> I think cfIdMap in config/Schema.java may be subject to unsynchronized
> access by distinct threads.
>
> Say one thread adds a CF, maybe triggering resizes on cfIdMap's internal
> tables. What guarantees that other threads calling Schema.instance.getId
> concurrent with CF addition see an internally-consistent cfIdMap?
> HashBiMap is not threadsafe, and Schema's methods that touch cfIdMap have
> no explicit synchronization or locking, except for clear(). I think this
> scenario could lead to spurious and rare "Unknown table/cf" exceptions on
> reads/writes during unrelated schema migrations in 1.2 (reworded to
> "Unknown keyspace/cf" in 2.0), which is how I got here in the first place.
>
> I could be misreading the access pattern, maybe by missing external
> synchronization somewhere. I brought this to the list instead of JIRA
> because I'm uncertain about the problem. I'm hoping for a sanity check.
>
> If this is actually a bug and not a misunderstanding, then a fix should be
> pretty straightforward. Even though Maps.synchronizedBiMap could be deemed
> unacceptable for read throughput reasons, it should be possible to get
> decent reads by changing cfIdMap into a volatile reference to an
> unmodifiable bimap and guarding all modifications with a single write lock.
>
> thanks,
> Dan
>
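For illustration, here is a minimal sketch of the "volatile reference to an unmodifiable bimap plus a single write lock" idea, using only java.util instead of Guava's HashBiMap. The class name CfIdRegistry, its methods, and the "keyspace.cf" string key are hypothetical and are not Cassandra's actual Schema API. Readers do one volatile read of an immutable snapshot that holds both directions, so they can never observe a half-resized table; writers copy, modify, and republish under the lock.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch (not Cassandra's Schema class): a CF-id bimap published
// through a volatile reference to an immutable snapshot, with all writes
// serialized by a single lock.
final class CfIdRegistry {
    // Both directions live in one immutable object, so a reader holding a
    // snapshot always sees the forward and inverse maps in agreement.
    private static final class Snapshot {
        final Map<String, UUID> idByName;
        final Map<UUID, String> nameById;

        Snapshot(Map<String, UUID> idByName, Map<UUID, String> nameById) {
            this.idByName = Collections.unmodifiableMap(idByName);
            this.nameById = Collections.unmodifiableMap(nameById);
        }
    }

    // Readers perform one volatile read and never block.
    private volatile Snapshot snapshot =
            new Snapshot(new HashMap<>(), new HashMap<>());

    // Writers take this lock so concurrent updates cannot overwrite each other.
    private final ReentrantLock writeLock = new ReentrantLock();

    // Hypothetical key format: "keyspace.cf".
    UUID getId(String ksAndCf) {
        return snapshot.idByName.get(ksAndCf);
    }

    String getName(UUID cfId) {
        return snapshot.nameById.get(cfId);
    }

    void add(String ksAndCf, UUID cfId) {
        writeLock.lock();
        try {
            Snapshot cur = snapshot;
            // Keep bimap semantics: neither the name nor the id may already be bound.
            if (cur.idByName.containsKey(ksAndCf) || cur.nameById.containsKey(cfId)) {
                throw new IllegalArgumentException("already registered: " + ksAndCf + " / " + cfId);
            }
            // Modify private copies, then publish them with a single volatile write.
            Map<String, UUID> fwd = new HashMap<>(cur.idByName);
            Map<UUID, String> inv = new HashMap<>(cur.nameById);
            fwd.put(ksAndCf, cfId);
            inv.put(cfId, ksAndCf);
            snapshot = new Snapshot(fwd, inv);
        } finally {
            writeLock.unlock();
        }
    }

    void remove(String ksAndCf) {
        writeLock.lock();
        try {
            Snapshot cur = snapshot;
            UUID cfId = cur.idByName.get(ksAndCf);
            if (cfId == null) {
                return; // nothing registered under this name
            }
            Map<String, UUID> fwd = new HashMap<>(cur.idByName);
            Map<UUID, String> inv = new HashMap<>(cur.nameById);
            fwd.remove(ksAndCf);
            inv.remove(cfId);
            snapshot = new Snapshot(fwd, inv);
        } finally {
            writeLock.unlock();
        }
    }
}
```

Each write copies the whole map, which is O(n) per schema change; that seems acceptable on the assumption that CF additions and drops are rare compared to the per-request getId lookups, which stay lock-free.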