nodetool repair of large partition

2017-01-30 Thread Jimmy Lin
Hi, if I have a row in a table that contains large data (not necessarily a super-wide row), say 10 GB, and a replication factor of 3: during a repair, if the row's data on each node differs by just 1 byte, is Cassandra smart enough to stream only part of the data (maybe based on a range o

testing retry policy

2016-08-31 Thread Jimmy Lin
Hi all, I have some custom retry policies that I want to test. In my single-node local cluster, is there any way to simulate a read/write timeout and/or an unavailable exception? I tried killing the Cassandra process, but that results in a no-host-available exception rather than an unavailable exception, and s
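A retry policy's decision logic can be unit-tested on its own by feeding it simulated timeout parameters, with no cluster involved. The Python sketch below is hypothetical: `on_read_timeout` and `Decision` are illustrative names, and the rule (retry once at the same consistency level if enough replicas answered but the data itself was not retrieved) is modeled on, not copied from, the Java driver's default policy.

```python
from enum import Enum


class Decision(Enum):
    RETRY = "retry"
    RETHROW = "rethrow"


def on_read_timeout(received: int, required: int,
                    data_retrieved: bool, retry_count: int) -> Decision:
    """Hypothetical read-timeout rule: retry once when enough replicas
    responded but the replica asked for the data did not return it."""
    if retry_count > 0:
        return Decision.RETHROW
    if received >= required and not data_retrieved:
        return Decision.RETRY
    return Decision.RETHROW


# Simulated timeouts: (received, required, data_retrieved, retry_count)
scenarios = [
    (2, 2, False, 0),  # enough responses, data missing -> retry
    (1, 2, False, 0),  # too few responses -> rethrow
    (2, 2, False, 1),  # already retried once -> rethrow
]
for args in scenarios:
    print(args, on_read_timeout(*args).value)
```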

how expensive is light weight transaction: if not exists

2016-04-27 Thread Jimmy Lin
Hi all, we would like to consider using a lightweight transaction like the following: begin batch: update table set x=y where id=A if not exists; update table set x=y where id=B if not exists; update table set x=y where id=C if not exists; update table set x=y where id=D if not exists; apply batch (using

Limit 1

2016-04-20 Thread Jimmy Lin
I have the following table (using the default size-tiered compaction) whose columns get TTLed every hour (as we want to keep only the last hour of events), and I do Select * from mytable where object_id = ‘’ LIMIT 1; Since the query is only interested in the last/latest value, will Cassandra need to scan m
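A simplified model of why clustering order matters for `LIMIT 1` when rows expire through TTL: expired rows remain on disk as tombstones until compaction purges them, and a slice read has to step over them until it finds enough live rows. The sketch assumes, hypothetically, that each event is a row clustered by an `event_time` column; it is an illustration, not Cassandra's read path.

```python
from dataclasses import dataclass


@dataclass
class Row:
    # Hypothetical layout: one row per event, clustered by event_time.
    event_time: int  # seconds
    expires_at: int  # event_time + TTL


def read_limit(rows, now, limit=1, newest_first=True):
    """Simplified slice read: walk rows in clustering order, stepping over
    expired rows (tombstones) until `limit` live rows are found.
    Returns (live_rows, rows_scanned)."""
    ordered = sorted(rows, key=lambda r: r.event_time, reverse=newest_first)
    live, scanned = [], 0
    for row in ordered:
        scanned += 1
        if row.expires_at > now:
            live.append(row)
            if len(live) == limit:
                break
    return live, scanned


TTL = 3600
now = 3 * 3600
# One event per minute for three hours; only the last hour is still live.
rows = [Row(t, t + TTL) for t in range(0, now, 60)]

latest, scanned_desc = read_limit(rows, now, newest_first=True)
_, scanned_asc = read_limit(rows, now, newest_first=False)
print(latest[0].event_time, scanned_desc, scanned_asc)  # 10740 1 122
```

With newest-first clustering (for example `WITH CLUSTERING ORDER BY (event_time DESC)`) the latest live row is the first one reached; in ascending order, every expired row is scanned first.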

datastax java driver Batch vs BatchStatement

2016-03-24 Thread Jimmy Lin
Hi all, what is the difference between the DataStax driver's Batch and BatchStatement? In particular, BatchStatement calls out that it needs native protocol version 2 or above. What is the advantage of using native protocol 2.0 for batch execution? Is either of these two APIs smart enough to split a big b

Re: how to read parent_repair_history table?

2016-02-29 Thread Jimmy Lin
* from repair_history where keyspace = 'ks' and columnfamily_name = 'cf' and id > mintimeuuid(now() - gc_grace_seconds/2) AND participants CONTAINS 'node_IP'; > 2016-02-25 16:22 GMT-03:00 Jimmy Lin: >> hi Paulo, >> one more fo

Re: Checking replication status

2016-02-29 Thread Jimmy Lin
nd updates _on the query being performed_. > 3) Repair. > If a machine goes down for longer than max_hint_window_in_ms, AFAIK you _will_ have missing data. If you cannot tolerate this situation, you need to take a look at your tunable consistency and/or trigger a repair.
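The hint-window rule quoted above reduces to a simple comparison. In this sketch, 10,800,000 ms (3 hours) is the stock `max_hint_window_in_ms` value in cassandra.yaml; the helper name is hypothetical.

```python
# Stock cassandra.yaml value: 3 hours.
DEFAULT_MAX_HINT_WINDOW_MS = 3 * 60 * 60 * 1000  # 10_800_000


def needs_repair(downtime_ms: int,
                 max_hint_window_ms: int = DEFAULT_MAX_HINT_WINDOW_MS) -> bool:
    """Hypothetical helper. Once a node has been down longer than the hint
    window, coordinators stop storing hints for it, so writes it missed after
    that point are only recovered by read repair or by running nodetool repair."""
    return downtime_ms > max_hint_window_ms


HOUR_MS = 60 * 60 * 1000
print(needs_repair(2 * HOUR_MS))  # False
print(needs_repair(5 * HOUR_MS))  # True
```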

Re: Checking replication status

2016-02-25 Thread Jimmy Lin
> On Thu, Feb 25, 2016 at 11:36 AM, Jimmy Lin wrote: >> hi all, >> what are the better ways to check repl

Re: how to read parent_repair_history table?

2016-02-25 Thread Jimmy Lin
* from repair_history where keyspace = 'ks' and columnfamily_name = 'cf' and id > mintimeuuid(now() - gc_grace_seconds/2) AND participants CONTAINS 'node_IP'; > 2016-02-25 16:22 GMT-03:00 Jimmy Lin: >> hi Paulo,

Checking replication status

2016-02-25 Thread Jimmy Lin
Hi all, what are the better ways to check the overall replication status of a Cassandra cluster? Within a single DC, unless a node is down for a long time, I feel it is mostly a non-issue and things replicate pretty fast. But when a node comes back after being offline for a long time, is there

Re: how to read parent_repair_history table?

2016-02-25 Thread Jimmy Lin
different keyspaces will have different repair session ids. > 2016-02-25 15:04 GMT-03:00 Jimmy Lin: >> hi Paulo, >> following up on the # of entries question... >> why does each repair job execution have 2 entries? I thought it would be one entry, begin

Re: how to read parent_repair_history table?

2016-02-25 Thread Jimmy Lin
> cluster? > Check if repair is being executed on all nodes within gc_grace_seconds, and tune that value or troubleshoot problems otherwise. >> Scanning through parent_repair_history and making sure all the known keyspaces have a good repair run in recent d
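One way to run the "repaired within gc_grace_seconds" check from the client, assuming rows have already been fetched from `system_distributed.repair_history` as `(id, participants)` pairs: the repair time is recovered from the version-1 `id` and compared against a threshold of `now - gc_grace_seconds/2`. The helper names are hypothetical, and 864000 s (10 days) is the default `gc_grace_seconds`.

```python
import time
import uuid

# 100 ns intervals between the UUID epoch (1582-10-15) and the Unix epoch.
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def timeuuid_to_unix(u: uuid.UUID) -> float:
    """Unix time (seconds) embedded in a version-1 (time-based) UUID."""
    return (u.time - UUID_EPOCH_OFFSET) / 1e7


def unix_to_timeuuid(ts: float) -> uuid.UUID:
    """Version-1 UUID for a Unix time, clock sequence and node zeroed
    (hypothetical helper, only useful for building test data)."""
    t = int(ts * 1e7) + UUID_EPOCH_OFFSET
    return uuid.UUID(fields=(t & 0xFFFFFFFF, (t >> 32) & 0xFFFF,
                             (t >> 48) & 0x0FFF, 0, 0, 0), version=1)


def repaired_recently(rows, node_ip, gc_grace_seconds, now=None):
    """Hypothetical check: rows are (id, participants) pairs for one table.
    True if node_ip took part in a repair newer than now - gc_grace_seconds/2."""
    now = time.time() if now is None else now
    cutoff = now - gc_grace_seconds / 2
    return any(node_ip in participants and timeuuid_to_unix(rid) > cutoff
               for rid, participants in rows)


now = 1_456_000_000  # a fixed "current time" (Feb 2016) for the example
rows = [
    (unix_to_timeuuid(now - 2 * 86400), {"10.0.0.1", "10.0.0.3"}),
    (unix_to_timeuuid(now - 8 * 86400), {"10.0.0.2"}),
]
print(repaired_recently(rows, "10.0.0.1", 864000, now))  # True
print(repaired_recently(rows, "10.0.0.2", 864000, now))  # False
```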

Re: how to read parent_repair_history table?

2016-02-25 Thread Jimmy Lin
all nodes within gc_grace_seconds, and tune that value or troubleshoot problems otherwise. >> Scanning through parent_repair_history and making sure all the known keyspaces has a good repair run in recent days? > Sounds good. > You can check https://

how to read parent_repair_history table?

2016-02-24 Thread Jimmy Lin
Hi all, a few questions regarding how to read or digest the system_distributed.parent_repair_history CF, which I am very interested in using to find out our repair status... - Will every invocation of nodetool repair be recorded as one entry in the parent_repair_history CF regardless of whether it is a

Re: timeout creating table

2015-04-23 Thread Jimmy Lin
> That is a problem; you should not have RF > N. Do an ALTER KEYSPACE to fix it. This will affect your reads and writes if you're doing anything above CL ONE --> timeouts. > On Apr 23, 2015 4:35 AM, "Jimmy Lin" wrote: >> Als
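The arithmetic behind the RF > N warning: QUORUM needs floor(RF/2) + 1 replica responses, so a keyspace with RF 2 on a single node can serve ONE but never QUORUM. A minimal sketch with hypothetical helper names; it only counts replicas and ignores timeouts, hints and per-DC placement.

```python
# Hypothetical helpers; assumes every node in the cluster is up.
def quorum(replication_factor: int) -> int:
    """Replica responses QUORUM waits for: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def can_satisfy(replication_factor: int, nodes: int, required: int) -> bool:
    """With RF > N there are only N distinct replicas to ask."""
    return min(replication_factor, nodes) >= required


# RF 2 keyspace on a single-node cluster
print(can_satisfy(2, 1, 1))          # ONE    -> True
print(can_satisfy(2, 1, quorum(2)))  # QUORUM -> False
```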

Re: timeout creating table

2015-04-23 Thread Jimmy Lin
Also, I am not sure it matters, but I just realized the keyspace was created with a replication factor of 2 when my Cassandra is really just a single node. Is Cassandra smart enough to ignore the RF of 2 and work with only one node? On Mon, Apr 20, 2015 at 8:23 PM, Jimmy Lin wrote: > hi, >

Re: timeout creating table

2015-04-20 Thread Jimmy Lin

Re: timeout creating table

2015-04-20 Thread Jimmy Lin
> On Sun, Apr 19, 2015 at 7:13 PM, Jimmy Lin wrote: >> hi, >> we have some unit tests that run parallel that will create tmp keyspace, and tables and then drop them after tests are done.

timeout creating table

2015-04-19 Thread Jimmy Lin
Hi, we have some unit tests that run in parallel and create a tmp keyspace and tables, then drop them after the tests are done. From time to time, our create table statement runs into an "All hosts(s) for query failed... Timeout during read" error (from the DataStax driver). We later turned on tracing, a

timeout when using secondary index

2015-03-06 Thread Jimmy Lin
Hi, I ran into an RPC timeout exception when executing a query that involves a secondary index on a Boolean column, for example when the company has more than 1k people: select * from company where company_id= and isMale = true; such extremely low cardinality of a secondary index, like the other docs stat

Re: read repair across DC and latency

2014-11-19 Thread Jimmy Lin
> On Sun, Nov 16, 2014 at 5:13 PM, Jimmy Lin wrote: >> I have read that read repair is supposed to run in the background, but does the coordinator node need to wait for the response (along with other normal read tasks) before returning the entire result

read repair across DC and latency

2014-11-16 Thread Jimmy Lin
I have a CF that uses the defaults, read_repair_chance (0.1) and dclocal_read_repair_chance (0). Our reads and writes are all LOCAL_QUORUM, on one of the 2 DCs, with a replication factor of 3, so a read has a 10% chance of triggering a read repair to the other DC. # I have read that read repair is supposed to run in the back
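A sketch of how a single random draw per read can choose between global and DC-local read repair, modeled on my understanding of Cassandra 2.x behavior (the draw is compared against `read_repair_chance` first, then `dclocal_read_repair_chance`); the function is hypothetical and not Cassandra's code. With the settings above, about 10% of reads select GLOBAL.

```python
import random


def read_repair_decision(read_repair_chance: float,
                         dclocal_read_repair_chance: float,
                         draw: float) -> str:
    """Hypothetical model: one draw in [0, 1) is compared against both
    settings, the global one first."""
    if read_repair_chance > draw:
        return "GLOBAL"    # compare with replicas in every DC
    if dclocal_read_repair_chance > draw:
        return "DC_LOCAL"  # compare with replicas in the local DC only
    return "NONE"


rng = random.Random(42)
N = 100_000
decisions = [read_repair_decision(0.1, 0.0, rng.random()) for _ in range(N)]
global_fraction = decisions.count("GLOBAL") / N
print(f"GLOBAL read repairs: {global_fraction:.3f}")  # close to 0.1
```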

Re: query tracing

2014-11-15 Thread Jimmy Lin
> On Sat, Nov 15, 2014 at 9:40 AM, Jimmy Lin wrote: >> Well, we are able to do the tracing under normal load, but not yet able to turn on tracing on demand during heavy load from the client side (due to hard to

Re: query tracing

2014-11-15 Thread Jimmy Lin
>>> It saves a lot of information for each request that's traced, so there is significant overhead. If you start at a low probability and move it up based on the load impact, it will provide a lot of insight and you can control

query tracing

2014-11-07 Thread Jimmy Lin
Is there any significant performance penalty if one turns on Cassandra query tracing through the DataStax Java driver (say, for every request of some troublesome query)? More sampling seems better, but doing so may also slow down the system in other ways? Thanks

Re: tuning concurrent_reads param

2014-11-06 Thread Jimmy Lin
I see, thanks for explaining what that means. If we are using SSDs, then reordering/merging matters less than on a traditional mechanical hard disk, so SSDs can probably handle an increased concurrent_reads value better (?)

Re: tuning concurrent_reads param

2014-11-05 Thread Jimmy Lin
are actually all busy or not. If it's near 32 (or whatever you set it at) all the time, it may be a bottleneck. > --- > Chris Lohfink > On Wed, Oct 29, 2014 at 10:41 PM, Jimmy Lin wrote: >> Hi, looking at the docs, the default value for concurrent_re

tuning concurrent_reads param

2014-10-29 Thread Jimmy Lin
Hi, looking at the docs, the default value for concurrent_reads is 32, which seems a bit small to me (compared to, say, an HTTP server)? Because if my node is receiving a bit more traffic, any concurrent read queries beyond 32 will have to wait (?). The recommended rule is 16 * number of drives. Would that be dif
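Rough arithmetic for the question above, assuming each read holds a thread for a fixed service time: the 16-per-drive rule gives the pool size, and Little's law (L = λW) bounds how many reads per second that pool can finish before requests start queuing. Both helpers are hypothetical.

```python
def recommended_concurrent_reads(num_data_drives: int) -> int:
    """Hypothetical helper: rule of thumb from the concurrent_reads comment
    in cassandra.yaml (16 * number of drives)."""
    return 16 * num_data_drives


def max_reads_per_second(concurrency: int, service_time_ms: float) -> float:
    """Little's law rearranged (throughput = in-flight / time per request):
    with `concurrency` threads each busy for `service_time_ms`, the pool
    finishes at most this many reads per second; more arrivals must queue."""
    return concurrency * 1000 / service_time_ms


print(recommended_concurrent_reads(2))  # 32
print(max_reads_per_second(32, 2))      # 16000.0
```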

frequently update/read table and level compaction

2014-10-20 Thread Jimmy Lin
Hi, I have a column family/table that has frequent updates on one of its columns and infrequent updates on another. The rest of the columns never change. Our application also reads frequently from this table. We have seen some read latency issues on this table and plan to switch to using level comp

Re: row caching for frequently updated column

2014-04-29 Thread Jimmy Lin
Thanks all for the pointers. Let me see if I can put the sequence of events together: 1.2 people misunderstand/misuse the row cache, in that Cassandra caches the entire row of data even if you are only looking for a small subset of it, e.g. select single_column from a_wide_row_table will r

Re: row caching for frequently updated column

2014-04-29 Thread Jimmy Lin
y to "preheat" key and page cache, but I don't believe this is possible for row cache. > Hope that helps. > Jonathan

row caching for frequently updated column

2014-04-28 Thread Jimmy Lin
I am wondering whether there is any negative impact on Cassandra write operations if I turn on row caching for a table that has mostly 'static columns' but a few frequently written columns (like a timestamp). The application will frequently write to a few columns, and it will also frequently que

fixed size collection possible?

2014-04-21 Thread Jimmy Lin
Hi, looking at the collection type support in CQL3, e.g. http://www.datastax.com/documentation/cql/3.0/cql/cql_using/use_list_t.html, we can append or remove using the "+" and "-" operators: UPDATE users SET top_places = top_places + [ 'mordor' ] WHERE user_id = 'frodo'; UPDATE users SET top_places = t
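As far as I know, CQL lists have no built-in maximum length, so a fixed-size list has to be maintained by the application: read the list, append, trim the oldest entries, and write the whole list back (for example with `UPDATE users SET top_places = ? WHERE user_id = ?`). A minimal sketch of the trimming step; `append_capped` is a hypothetical helper.

```python
def append_capped(items, new_item, max_size):
    """Hypothetical helper: return a new list with `new_item` appended,
    keeping only the newest `max_size` entries (the oldest are dropped first)."""
    if max_size <= 0:
        return []
    return (list(items) + [new_item])[-max_size:]


top_places = ["shire", "rivendell", "moria"]
top_places = append_capped(top_places, "mordor", 3)
print(top_places)  # ['rivendell', 'moria', 'mordor']
```

Because this is read-then-write, two concurrent writers can overwrite each other's appends; a clustering-ordered table read with `LIMIT` is an alternative when that matters.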

cql IN clause question

2014-01-29 Thread Jimmy Lin
select * from mytable where mykey IN ('xxx', 'yyy', 'zzz', '111', '222', '333') Is there a limit on how many items you can specify inside an IN clause? A CQL IN clause will help reduce the round-trip traffic otherwise needed when using multiple select statements, correct? But how about the coordinator node that
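A commonly suggested alternative to a long IN list is to issue one single-partition query per key in parallel, so that no single coordinator has to gather every partition. The sketch uses Python threads; `fetch_partition` is a hypothetical stand-in for a real driver call (a real application would more likely use the driver's async API).

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_partition(key):
    """Hypothetical stand-in for SELECT * FROM mytable WHERE mykey = ?
    (a real client would call its driver here)."""
    return {"mykey": key}


def fetch_many(keys, max_workers=8):
    """Run one single-partition query per key concurrently; results come
    back in the same order as `keys`."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_partition, keys))


keys = ["xxx", "yyy", "zzz", "111", "222", "333"]
rows = fetch_many(keys)
print([row["mykey"] for row in rows])
```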

Re: question about secondary index or not

2014-01-28 Thread Jimmy Lin
>> Generally indexes on binary fields (true/false, male/female) are not terribly effective. >> On Tue, Jan 28, 2014 at 12:40 PM, Jimmy Lin wrote: >>> I have a simple column family like the following

question about secondary index or not

2014-01-28 Thread Jimmy Lin
I have a simple column family like the following: create table people( company_id text, employee_id text, gender text, primary key(company_id, employee_id) ); If I want to find all the "male" employees for a given company id, I can do 1/ select * from people where company_id=' and loop through
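Another option for this schema, offered as an assumption rather than something from the thread, is to make gender a clustering column, `PRIMARY KEY (company_id, gender, employee_id)`, so that all male employees of one company form one contiguous slice of the partition and can be read with `WHERE company_id = ? AND gender = 'male'`. The sketch models that slice over a sorted list of clustering keys; `by_gender` is hypothetical.

```python
from bisect import bisect_left


def by_gender(partition_rows, gender):
    """Hypothetical helper. partition_rows: one company's clustering keys as a
    sorted list of (gender, employee_id) tuples. Returns that gender's employee
    ids by locating one contiguous slice, as a clustering-column restriction does."""
    start = bisect_left(partition_rows, (gender,))
    # (gender + "\0",) sorts after every (gender, employee_id) tuple and
    # before any tuple whose first element is a different, larger string.
    end = bisect_left(partition_rows, (gender + "\0",))
    return [employee_id for _, employee_id in partition_rows[start:end]]


rows = sorted([("female", "e2"), ("male", "e1"), ("male", "e3"), ("female", "e4")])
print(by_gender(rows, "male"))  # ['e1', 'e3']
```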

Re: paging through a table with timeuuid primary key

2013-10-01 Thread Jimmy Lin
e from the last key, but doesn't do anything useful for the token function. The argument to token() should really be the actual key value. On Tue, Oct 1, 2013 at 9:32 AM, Jimmy Lin wrote: > thanks, yeah, I am aware of that and have already taken care of it. > I also just found a similar

Re: paging through a table with timeuuid primary key

2013-10-01 Thread Jimmy Lin
ages' can get truncated in the middle of a wide row. > See https://groups.google.com/a/lists.datastax.com/d/msg/java-driver-user/lHQ3wKAZgM4/DnlXT4IzqsQJ > Jan > On 01.10.2013, at 18:12, Jimmy Lin wrote: >> unfortunately, i have to stick

Re: paging through a table with timeuuid primary key

2013-10-01 Thread Jimmy Lin
:30 AM, Jan Algermissen wrote: >> Jimmy, >> On 01.10.2013, at 17:26, Jimmy Lin wrote: >>> i have a table like the following: CREATE TABLE log ( mykey timeuuid,

paging through a table with timeuuid primary key

2013-10-01 Thread Jimmy Lin
I have a table like the following: CREATE TABLE log ( mykey timeuuid, type text, msg text, primary key(mykey, type) ); I want to page through all the results from the table using select * from log where token(mykey) > token(maxTimeuuid(x)) limit 100; (where x is 0 for the first query, and
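The key point from the replies is that `token()` must be applied to the last key actually returned, because token order has nothing to do with timeuuid order, so `token(maxTimeuuid(x))` does not mark a position in the scan. A simplified sketch of that loop: the MD5-based `token` is a hypothetical stand-in (Murmur3Partitioner, the default since Cassandra 1.2, uses a different hash), and it pages whole partitions, so it ignores the mid-partition truncation issue that the clustering column `type` raises.

```python
import hashlib


def token(key: str) -> int:
    """Hypothetical stand-in partitioner token: MD5 of the key as an integer
    (similar in spirit to RandomPartitioner, not Murmur3Partitioner)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


def pages(keys, page_size):
    """Yield pages of partition keys in token order. Each page starts after
    the token of the last key returned by the previous page, like
    SELECT ... WHERE token(mykey) > token(<last mykey>) LIMIT page_size."""
    ordered = sorted(keys, key=token)
    last_token = None
    while True:
        page = [k for k in ordered
                if last_token is None or token(k) > last_token][:page_size]
        if not page:
            return
        yield page
        last_token = token(page[-1])


keys = [f"key{i}" for i in range(25)]
for page in pages(keys, 10):
    print(len(page), page[0])
```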

changing the primary key type of a table

2013-09-27 Thread Jimmy Lin
Hi, we have a table whose primary key is of type uuid. Now we have decided that we need to use the text type, as it is more flexible for our application. #1 Is there any downside to using text as the primary key? Any performance impact on partitioning? #2 There is no way to alter a table's primary key with a cql

CQL consistency level using astyanax

2013-09-20 Thread Jimmy Lin
Hi, I am using Astyanax to access a multi-node Cassandra cluster. In my connection configuration setup, I already declared a global read/write consistency level by setting: AstyanaxConfiguration.setDefaultWriteConsistencyLevel() AstyanaxConfiguration.setDefaultReadConsistencyLevel() However, fro

read consistency and clock drift and ntp

2013-09-10 Thread Jimmy Lin
Hi, I have a few questions about how Cassandra uses a record's timestamp to determine which version to return from its replica nodes... - A record's timestamp is determined by the Cassandra server node's system time when the request arrives at the server, and NOT by the timestamp of the client who ma
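Whichever clock assigns the timestamp (client or coordinator, depending on driver and protocol version), replicas reconcile by last-write-wins, which is why clock drift and NTP matter. A minimal sketch with hypothetical names: a node whose clock runs 3 seconds fast makes an earlier write beat a later one.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    # Hypothetical model of one column value and its write timestamp.
    value: str
    timestamp_us: int  # microseconds, from whichever clock set it


def reconcile(*versions: Cell) -> Cell:
    """Last-write-wins: return the version with the highest timestamp.
    Cassandra also breaks exact timestamp ties deterministically; this
    sketch does not model that."""
    return max(versions, key=lambda cell: cell.timestamp_us)


# Node A's clock runs 3 s fast; it writes "v1" at true time t = 1000 s.
stale = Cell("v1", 1_003_000_000)
# Node B's clock is correct; it writes "v2" later, at true time t = 1002 s.
fresh = Cell("v2", 1_002_000_000)
print(reconcile(stale, fresh).value)  # v1: the older write wins
```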

Re: get all row keys of a table using CQL3

2013-07-23 Thread Jimmy Lin
> Check out the token function: http://www.datastax.com/docs/1.1/dml/using_cql#paging-through-non-ordered-partitioner-results > You can use it to page through your rows. > Blake > On Jul 23, 2013, at 10:18 PM, Jimmy Lin wrote: >> hi, I want to fetch all

get all row keys of a table using CQL3

2013-07-23 Thread Jimmy Lin
Hi, I want to fetch all the row keys of a table using CQL3, e.g. select id from mytable limit 999 #1 For this query, does the node need to wait for all rows to return from all other nodes before returning the data to the client (I am using Astyanax)? In other words, will this operation create a

Re: data model question : finding out the n most recent changes items

2013-07-11 Thread Jimmy Lin
> -Original Message- > From: Jimmy Lin > Sent: Thu 11-Jul-13 13:09 > To: user@cassandra.apache.org > Subject: Re: data model question : finding out the n most recent changes items > what I mean is, I really just w

Re: data model question : finding out the n most recent changes items

2013-07-11 Thread Jimmy Lin
hanges. I basically end up pulling out a series of modification timestamps for the same directory. > Not sure I understand the problem. > Cheers > Aaron Morton

data model question : finding out the n most recent changes items

2013-07-09 Thread Jimmy Lin
I have an application that needs to find the n most recently modified files for a given user id. I started out with a few tables but still couldn't get what I want; I hope someone can point me in the right direction... See my tables below. #1 won't work, because file_id's timeuuid contains the creation time,
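A common pattern for "n most recently modified per user", offered here as a sketch rather than the thread's conclusion: cluster a per-user table by modification time, newest first, and on every modification delete the file's previous entry before inserting the new one, using a second lookup table to find that previous timestamp. The table and class names are hypothetical; the model is in-memory Python.

```python
from collections import defaultdict


class RecentFiles:
    """In-memory model of two hypothetical tables:
      recent_by_user: PRIMARY KEY (user_id, mtime, file_id), newest first
      file_mtime:     PRIMARY KEY (file_id), holding the file's last mtime
    Assumes each file_id belongs to exactly one user."""

    def __init__(self):
        self.recent_by_user = defaultdict(set)  # user_id -> {(mtime, file_id)}
        self.file_mtime = {}                    # file_id -> last mtime

    def touch(self, user_id, file_id, mtime):
        """Record a modification: drop the old clustering entry, insert the new one."""
        old = self.file_mtime.get(file_id)
        if old is not None:
            self.recent_by_user[user_id].discard((old, file_id))
        self.recent_by_user[user_id].add((mtime, file_id))
        self.file_mtime[file_id] = mtime

    def most_recent(self, user_id, n):
        """Like SELECT file_id FROM recent_by_user WHERE user_id = ? LIMIT n."""
        rows = sorted(self.recent_by_user[user_id], reverse=True)
        return [file_id for _, file_id in rows[:n]]


store = RecentFiles()
for file_id, mtime in [("f1", 1), ("f2", 2), ("f3", 3), ("f1", 4)]:
    store.touch("u1", file_id, mtime)
print(store.most_recent("u1", 2))  # ['f1', 'f3']
```

The delete-then-insert step is a read-before-write, so concurrent modifications of the same file can leave a stale duplicate that readers may need to filter out.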