Fwd: goods search with cassandra
Can someone give me some advice? Thanks!

-- Forwarded message --
From: Chen Xinli
Date: 2010/7/19
Subject: goods search with cassandra
To: u...@cassandra.apache.org

Hi,

I want to implement goods search with Cassandra, and I have some questions. Can someone help me out?

The case is this: there are about 1 million shops, each shop has about 10,000 goods, and each goods item has properties such as "title", "price", etc. A typical search is "give me 10 goods in a specific shop whose price is less than $10".

For the data model, I use the shop name as the row key and the goods id as the column name; "title" and "price" are specially encoded into the column value.

There are too many goods in one shop, so filtering the data in the Thrift client is impossible because of the network transfer cost. I want to implement a special ColumnValueFilter that extends QueryFilter to get the result locally. Is this the best way?

Insertion of goods is about 100/second for the whole cluster, so a Thrift client for insertion is OK. For reads, latency and QPS are important, and I must provide an HTTP service for user searches. Embedding a Thrift client in such a service would add another network hop, so I want to build the service directly on top of Cassandra. I reviewed the code of ClientOnlyExample.java. What confuses me is this: if insertion goes through the Thrift client and reads go through Cassandra directly, is data consistency guaranteed, and how?

Any help is appreciated. Thanks!

--
Best Regards,
Chen Xinli
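To make the question concrete, here is a minimal sketch of the filtering idea in plain Java. It is not Cassandra code: the class name, the "title|priceInCents" encoding, and the method names are all assumptions made for illustration, and it does not show how a real QueryFilter subclass would be wired in. It only shows the intended behaviour: scan one shop's columns where the data lives and return at most N goods ids below a price.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch; none of these names are Cassandra classes. */
final class GoodsValueFilterSketch {

    /** Assumed encoding: title and price packed into one column value, e.g. "pen|250" (cents). */
    static String encode(String title, long priceCents) {
        return title + "|" + priceCents;
    }

    /** Reads the price back out of an encoded column value. */
    static long priceOf(String columnValue) {
        return Long.parseLong(columnValue.substring(columnValue.lastIndexOf('|') + 1));
    }

    /**
     * Scans one shop's row (goods id -> encoded value) in column order and
     * returns at most 'limit' goods ids priced strictly below 'maxPriceCents'.
     */
    static List<String> filterRow(Map<String, String> row, long maxPriceCents, int limit) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, String> column : row.entrySet()) {
            if (matches.size() >= limit) {
                break; // enough results; stop scanning early
            }
            if (priceOf(column.getValue()) < maxPriceCents) {
                matches.add(column.getKey());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        Map<String, String> shopRow = new TreeMap<>(); // columns sorted by goods id
        shopRow.put("g1", encode("pen", 250));
        shopRow.put("g2", encode("lamp", 1999));
        shopRow.put("g3", encode("cup", 700));
        // "10 goods in this shop cheaper than $10"
        System.out.println(filterRow(shopRow, 1000, 10)); // prints [g1, g3]
    }
}
```

The point of doing this on the node holding the row is that only the matching ids cross the network, not all 10,000 columns.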
Re: cassandra for a inbox search with high reading qps
Hi,

Regardless of our use case, isn't it a good feature to disable reads on a node while it is doing hinted handoff, just like during bootstrapping? It would be very useful for reads at consistency level ONE.

Or it could be an option in storage-conf.xml that the user sets when necessary.

I'd like to implement this feature if it's useful.

2010/8/18 Chen Xinli

> Thanks for your reply.
>
> Cassandra, in our case, is used for search, not for data storage.
> We can rebuild the Cassandra keyspace data daily or weekly when system load is lower.
>
> We have modified the Cassandra code to add a value filter, which stops data-repair from working.
> The value filter, as I said, filters the columns of a key so that only the desired columns are returned.
> The filtering is done locally in Cassandra, not in the Thrift client, so we have to disable data-repair.
>
> Cassandra meets most of our needs, except for this:
> if a node fails, recovers after a while, rejoins the cluster, and is doing hinted handoff, then a read forwarded to this node returns out-of-date data.
>
> Node failure is not frequent, but if it does happen, we should keep reads consistent.
>
> 2010/8/18 Benjamin Black
>
>> On Tue, Aug 17, 2010 at 7:55 PM, Chen Xinli wrote:
>> > Hi,
>> >
>> > We are going to use Cassandra for search, like inbox search.
>> > The read QPS is very high; we'd like to use ConsistencyLevel.ONE for
>> > reads and disable read repair at the same time.
>>
>> In 0.7 you can set a probability for read repair, but disabling it is
>> a spectacularly bad idea. Any write problems on a node will result in
>> persistent inconsistency.
>>
>> > For read consistency in this condition, writes should use
>> > ConsistencyLevel.ALL. But a write will fail if one node fails.
>>
>> You are free to read and write with consistency levels where R+W < N,
>> it just means you have weaker consistency guarantees.
>>
>> > We want a ConsistencyLevel for writing/reading such that:
>> > 1. a write will succeed if any node is alive for this key
>> > 2. a read will not be forwarded to a node that has just recovered and is doing hinted handoff
>> >
>> > So that, if some node fails, the other replica nodes will receive the data
>> > and serve reads successfully;
>> > when the failed node recovers, it will receive hinted handoff from other
>> > nodes and will not serve reads until hinted handoff is done.
>> >
>> > Does Cassandra support these cases already, or should I modify the code to
>> > meet our requirements?
>>
>> You are phrasing these requirements in terms of a specific
>> implementation. What are your actual consistency goals? If node
>> failure is such a common occurrence in your system, you are going to
>> have _numerous_ problems.
>>
>> b
>
> --
> Best Regards,
> Chen Xinli

--
Best Regards,
Chen Xinli
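The R+W versus N remark in the quoted reply can be made concrete with a little arithmetic. With N replicas, a read that contacts R of them and a write acknowledged by W of them are guaranteed to share at least one replica only when R + W > N. The sketch below is a generic illustration of that rule, not Cassandra code; the class and method names are made up here.

```java
/** Hypothetical sketch of the replica-overlap rule; not part of Cassandra. */
final class ConsistencySketch {

    /** Smallest majority of n replicas (what QUORUM means for replication factor n). */
    static int quorum(int n) {
        return n / 2 + 1;
    }

    /** True when any r read replicas must intersect any w write replicas out of n. */
    static boolean readsSeeLatestWrite(int r, int w, int n) {
        return r + w > n;
    }

    public static void main(String[] args) {
        int n = 3; // replication factor assumed for the example
        // read ONE, write ALL: overlap guaranteed, but writes fail when one replica is down
        System.out.println(readsSeeLatestWrite(1, n, n));                 // true
        // read ONE, write ONE: cheap, but a read may hit a replica that missed the write
        System.out.println(readsSeeLatestWrite(1, 1, n));                 // false
        // read QUORUM, write QUORUM: overlap guaranteed, tolerates one replica down
        System.out.println(readsSeeLatestWrite(quorum(n), quorum(n), n)); // true
    }
}
```

Reading at ONE after writing at less than ALL falls in the second case, which is why a node that is still receiving hints can return stale data.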
Re: cassandra for a inbox search with high reading qps
2010/8/19 Oleg Anastasjev

> Chen Xinli writes:
>
> > Hi,
> >
> > Regardless of our use case, isn't it a good feature to disable reads on a node
> > while it is doing hinted handoff, just like during bootstrapping?
> > It would be very useful for reads at consistency level ONE.
> >
> > Or it could be an option in storage-conf.xml that the user sets when
> > necessary.
> >
> > I'd like to implement this feature if it's useful.
>
> Yes, we also use this failback policy, but currently we do it manually:
> a node is returned to the clients' node list only after hinted handoff is
> completed.
> So at least for us it would be handy if you implemented this to happen
> automatically.

Would you please describe the manual operation in more detail? I have not found any related information.

--
Best Regards,
Chen Xinli
Re: cassandra for a inbox search with high reading qps
Thanks for the update. I got the idea; it also works in our case.

I read the last post by Rob; this feature has been raised before, and there seems to be no easy way to work it out. Anyway, the manual operation can solve our problem.

Thanks a lot!

2010/8/20 Oleg Anastasjev

> Chen Xinli writes:
>
> > Would you please describe the manual operation in more detail?
> > I have not found any related information.
>
> Um, this is code in our in-house implementation of the Cassandra client
> libraries.
> The main idea is that clients normally query the ring and work directly with
> the nodes found in the ring until they detect the failure or slowdown of a
> particular node.
> Then clients fail over to the next node in the ring automatically. The failed
> node is placed by clients on a failed-nodes list and will not be used by
> clients again until an operator's command. This command is not sent to the
> Cassandra servers but to the clients, telling them to remove the node from
> the failed-nodes list.
>
> And there is a procedure: the operator should check whether the failed node
> has finished hinted handoff before issuing this command to the clients.
>
> If we had a way to check this condition automatically, we would
> eliminate this manual inspection from our workflow.

--
Best Regards,
Chen Xinli
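The client-side policy described in the quoted message can be sketched roughly as follows. This is not Oleg's in-house library, whose code was not posted; the class and method names are hypothetical, node addresses are plain strings, and the sketch is not thread-safe. It only captures the three steps: prefer ring order, skip nodes on the failed list, and readmit a node only on an explicit operator command.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch of a client-side failed-nodes list; not a real client library. */
final class FailoverNodeListSketch {
    private final List<String> ring;                           // nodes in ring order
    private final Set<String> failed = new LinkedHashSet<>();  // nodes excluded from use

    FailoverNodeListSketch(List<String> ring) {
        this.ring = new ArrayList<>(ring);
    }

    /** Returns the first node in ring order that is not on the failed list. */
    Optional<String> pick() {
        for (String node : ring) {
            if (!failed.contains(node)) {
                return Optional.of(node);
            }
        }
        return Optional.empty(); // every node is currently excluded
    }

    /** Called by the client when it detects a failure or slowdown of a node. */
    void markFailed(String node) {
        failed.add(node);
    }

    /**
     * Operator command: readmit a node, to be issued only after the operator has
     * checked that hinted handoff to it has finished. Returns false if the node
     * was not on the failed list.
     */
    boolean readmit(String node) {
        return failed.remove(node);
    }

    public static void main(String[] args) {
        List<String> nodes = new ArrayList<>();
        nodes.add("10.0.0.1");
        nodes.add("10.0.0.2");
        FailoverNodeListSketch clients = new FailoverNodeListSketch(nodes);
        clients.markFailed("10.0.0.1");
        System.out.println(clients.pick().get()); // prints 10.0.0.2
        clients.readmit("10.0.0.1");
        System.out.println(clients.pick().get()); // prints 10.0.0.1
    }
}
```

The manual step is the call to readmit: nothing in the sketch checks hinted handoff itself, which is exactly the inspection the operator has to do by hand.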
when will cassandra 0.7 be released?
Hi,

We are going to use Cassandra in our production environment, and we want to use the feature of defining keyspaces on the fly. When will version 0.7 be released? Or is the beta OK to use?

Thanks.

--
Best Regards,
Chen Xinli
Re: when will cassandra 0.7 be released?
Thanks Jonathan. I can do testing with the beta, then upgrade to the final version in October.

2010/9/17 Jonathan Ellis

> I don't recommend using the betas for anything but testing.
>
> We should see 0.7 final in October.
>
> On Wed, Sep 15, 2010 at 10:27 PM, Chen Xinli wrote:
> > Hi,
> >
> > We are going to use Cassandra in our production environment, and we want to
> > use the feature of defining keyspaces on the fly.
> > When will version 0.7 be released? Or is the beta OK to use?
> >
> > Thanks.
> >
> > --
> > Best Regards,
> > Chen Xinli
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com

--
Best Regards,
Chen Xinli
a bug in Hinted handoff
Hi,

I'm using 0.6.6 and I find that hinted handoff never works.

In HintedHandOffManager.java, in sendMessage(InetAddress endPoint, String tableName, String key):

    QueryFilter filter = new SliceQueryFilter(tableName, new QueryPath(cfs.getColumnFamilyName()), startColumn, ArrayUtils.EMPTY_BYTE_ARRAY, false, PAGE_SIZE);
    ColumnFamily cf = cfs.getColumnFamily(filter);
    if (pagingFinished(cf, startColumn))
        break;

The tableName passed to SliceQueryFilter should be key.

--
Best Regards,
Chen Xinli
Re: a bug in Hinted handoff
2010/10/22 Brandon Williams

> On Fri, Oct 22, 2010 at 3:28 AM, Chen Xinli wrote:
>
> > Hi,
> >
> > I'm using 0.6.6 and I find that hinted handoff never works.
> >
> > In HintedHandOffManager.java, in sendMessage(InetAddress endPoint, String
> > tableName, String key):
> >
> >     QueryFilter filter = new SliceQueryFilter(tableName, new
> >     QueryPath(cfs.getColumnFamilyName()), startColumn,
> >     ArrayUtils.EMPTY_BYTE_ARRAY, false, PAGE_SIZE);
> >     ColumnFamily cf = cfs.getColumnFamily(filter);
> >     if (pagingFinished(cf, startColumn))
> >         break;
> >
> > The tableName passed to SliceQueryFilter should be key.
>
> In 0.6, the HH schema is a SCF, where the row key is the keyspace to which
> the hints belong, the supercolumn name is the row key the hint belongs to,
> and the subcolumns are the IP addresses of the destinations. What
> sendMessage is doing here is using the keyspace (tableName) as the row key,
> which is correct.
>
> -Brandon

You are talking about fetching row keys and IPs from the HINTS column family; using tableName as the row key is correct there. That logic is in the method deliverHintsToEndpoint.

In sendMessage, using tableName as the row key is obviously wrong. I think it was caused by copy-pasting source code.

--
Best Regards,
Chen Xinli
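For readers following the exchange, the hints layout Brandon describes (row key = keyspace, supercolumn = hinted row key, subcolumns = destination IPs) can be pictured as a nested map. The sketch below is only a model of that description in plain Java; the class and method names are invented, and it does not reproduce HintedHandOffManager or take a position on which method contains the lookup in question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical model of the 0.6 hints layout as described in the thread. */
final class HintsLayoutSketch {
    // keyspace (row key) -> hinted row key (supercolumn) -> destination IPs (subcolumns)
    private final Map<String, Map<String, Set<String>>> hints = new TreeMap<>();

    /** Records that 'rowKey' in 'keyspace' still has to be delivered to 'endpointIp'. */
    void addHint(String keyspace, String rowKey, String endpointIp) {
        hints.computeIfAbsent(keyspace, k -> new TreeMap<>())
             .computeIfAbsent(rowKey, k -> new TreeSet<>())
             .add(endpointIp);
    }

    /** Looks up the hints row by keyspace and returns the row keys owed to one endpoint. */
    List<String> rowKeysFor(String keyspace, String endpointIp) {
        List<String> keys = new ArrayList<>();
        Map<String, Set<String>> byRowKey = hints.get(keyspace);
        if (byRowKey == null) {
            return keys; // no hints stored for this keyspace
        }
        for (Map.Entry<String, Set<String>> entry : byRowKey.entrySet()) {
            if (entry.getValue().contains(endpointIp)) {
                keys.add(entry.getKey());
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        HintsLayoutSketch layout = new HintsLayoutSketch();
        layout.addHint("Keyspace1", "shopA", "10.0.0.3");
        layout.addHint("Keyspace1", "shopB", "10.0.0.4");
        System.out.println(layout.rowKeysFor("Keyspace1", "10.0.0.3")); // prints [shopA]
    }
}
```

In this model the keyspace name is the right key only for the outer lookup; the row keys it yields are what identify the actual data rows to be sent, which is the distinction the reply above is drawing.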