hi,
If I have a row in a table that contains a lot of data (not necessarily a
super wide row), say 10 GB, and a replication factor of 3:
during a repair, if the row's data differs between the nodes by just
1 byte, is Cassandra smart enough to stream only part of the data
(maybe based on a range o
hi all,
I have some customized retry policies that I want to test.
In my single-node local cluster, is there any way to simulate the read/write
timeout and/or unavailable exceptions?
I tried killing the Cassandra process, but that does not result in an
unavailable exception; it gives a no host available exception instead and s
hi all,
we would like to consider using lightweight transactions like the following:
begin batch:
update table set x=y where id=A if not exists;
update table set x=y where id=B if not exists;
update table set x=y where id=C if not exists;
update table set x=y where id=D if not exists;
apply batch
(using
I have the following table (using the default size-tiered compaction) whose
columns are TTLed every hour (as we want to keep only the last hour of events).
And I do
Select * from mytable where object_id = '' LIMIT 1;
And since the query is only interested in the latest value, will Cassandra
need to scan m
Hi all,
What is the difference between the DataStax driver's Batch and BatchStatement?
In particular, BatchStatement calls out that it needs native protocol
version 2 or above.
What is the advantage of using native protocol 2.0 for batch execution?
Will either of these two APIs be smart enough to split a big b
* from repair_history where keyspace = 'ks' columnfamily_name =
> 'cf' and id > mintimeuuid(now() - gc_grace_seconds/2) AND participants
> CONTAINS 'node_IP';
>
>
>
> 2016-02-25 16:22 GMT-03:00 Jimmy Lin :
>
>> hi Paulo,
>>
>> one more fo
nd updates _on
> the query being performed_.
> 3) Repair.
>
> If a machine goes down for longer than max_hint_window_in_ms, AFAIK you
> _will_ have missing data. If you cannot tolerate this situation, you need
> to take a look at your tunable consistency and/or trigger a repair.
>
>
> On Thu, Feb 25, 2016 at 11:36 AM, Jimmy Lin wrote:
>
>> hi all,
>>
>> what are the better ways to check repl
hi all,
what are the better ways to check the overall replication status of a
Cassandra cluster?
Within a single DC, unless a node is down for a long time, I feel it is
mostly a non-issue and things are replicated pretty fast. But
when a node comes back from a long time offline, is there
different keyspaces will have different
> repair session ids.
>
> 2016-02-25 15:04 GMT-03:00 Jimmy Lin :
>> hi Paulo,
>> following up on the # of entries question...
>> why will each repair job execution have 2 entries? I thought it would be one
>> entry, begin
> cluster?
>
> Check if repair is being executed on all nodes within gc_grace_seconds, and
> tune that value or troubleshoot problems otherwise.
>
> > Scanning through parent_repair_history and making sure all the known
> > keyspaces has a good repair run in recent days?
>
> Sounds good.
>
> You can check https://
hi all,
a few questions regarding how to read or digest the
system_distributed.parent_repair_history CF, which I am very interested in
using to find out our repair status...
-
Will every invocation of nodetool repair be recorded as one
entry in the parent_repair_history CF, regardless of whether it is a
com> wrote:
> That is a problem, you should not have RF > N.
>
> Do an alter table to fix it.
>
> This will affect your reads and writes if you're doing anything > CL 1 -->
> timeouts.
> On Apr 23, 2015 4:35 AM, "Jimmy Lin" wrote:
>
>> Als
Also, I am not sure it matters, but I just realized the keyspace was created
with a replication factor of 2 when my Cassandra is really just a single node.
Is Cassandra smart enough to ignore the RF of 2 and work with only a single
node?
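
For what it's worth, here is a rough sketch of the arithmetic behind this question (Python; the function names are made up for illustration and are not part of any driver). It assumes the usual rule that QUORUM needs floor(RF / 2) + 1 replicas, so with RF = 2 on a single node only ONE can be met:

```python
def replicas_required(consistency_level: str, replication_factor: int) -> int:
    """Replica acknowledgements needed for a consistency level (illustrative helper)."""
    level = consistency_level.upper()
    if level == "ONE":
        return 1
    if level == "TWO":
        return 2
    if level == "QUORUM":
        # Majority of the replicas: floor(RF / 2) + 1
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {consistency_level}")


def can_satisfy(consistency_level: str, replication_factor: int, live_replicas: int) -> bool:
    """True if enough replicas are alive to meet the consistency level."""
    return live_replicas >= replicas_required(consistency_level, replication_factor)


# Single node, keyspace created with RF = 2
for cl in ("ONE", "QUORUM", "ALL"):
    print(cl, can_satisfy(cl, replication_factor=2, live_replicas=1))
```

So anything above ONE cannot be satisfied in that setup, and altering the keyspace to RF = 1 removes the mismatch.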
On Mon, Apr 20, 2015 at 8:23 PM, Jimmy Lin wrote:
> hi,
>
> Software Engineer in Test | jim.witsc...@datastax.com
>
> On Sun, Apr 19, 2015 at 7:13 PM, Jimmy Lin wrote:
> > hi,
> > we have some unit tests that run parallel that will create tmp keyspace,
> and
> > tables and then drop them after tests are done.
hi,
we have some unit tests that run in parallel; they create temporary keyspaces
and tables and then drop them after the tests are done.
From time to time, our create table statement runs into an "All hosts(s) for
query failed... Timeout during read" error (from the DataStax driver).
We later turned on tracing, a
Hi,
I ran into an RPC timeout exception when executing a query that involves a
secondary index on a Boolean column when, for example, the company has more
than 1k persons.
select * from company where company_id= and isMale = true;
such extremely low cardinality of a secondary index, like the other docs
stat
s wrote:
>
> On Sun, Nov 16, 2014 at 5:13 PM, Jimmy Lin wrote:
>
>> I have read that read repair is supposed to run in the background, but
>> does the coordinator node need to wait for the response (along with other
>> normal read tasks) before returning the entire result
I have a CF that uses the default read_repair_chance (0.1) and
dc_read_repair_chance (0).
Our reads and writes are all local_quorum, on one of the 2 DCs, with a
replication factor of 3.
So a read will have a 10% chance of triggering a read repair to the other DC.
#
I have read that read repair is supposed to be running as back
>
>
> On Sat, Nov 15, 2014 at 9:40 AM, Jimmy Lin wrote:
>
>> Well we are able to do the tracing under normal load, but not yet able
>> to turn on tracing on demand during heavy load from client side(due to hard
>> to
>> wrote:
>>
>>> It saves a lot of information for each request that's traced so there is
>>> significant overhead. If you start at a low probability and move it up
>>> based on the load impact it will provide a lot of insight and you can
>>> control
is there any significant performance penalty if one turns on Cassandra
query tracing through the DataStax Java driver (say, for every request
of some troublesome query)?
More sampling seems better, but doing so may also slow down the system
in some other ways?
thanks
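
To make the sampling idea concrete, here is a minimal client-side sketch (Python, standard library only). The name should_trace and the 1% starting rate are assumptions for illustration, not anything the driver provides; the returned flag would only decide whether tracing gets enabled on a given request:

```python
import random


def should_trace(probability: float, rng: random.Random) -> bool:
    """Return True for roughly `probability` of the calls (hypothetical helper)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0.0 and 1.0")
    # rng.random() is uniform in [0.0, 1.0), so this is True about
    # `probability` of the time.
    return rng.random() < probability


# Simulate 100,000 requests traced at a 1% sampling rate.
rng = random.Random(42)
traced = sum(should_trace(0.01, rng) for _ in range(100_000))
print(f"traced {traced} of 100000 requests")
```

Starting low and raising the probability while watching the load keeps the tracing overhead bounded.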
I see, thanks for explaining what that means.
If we are using SSDs, then reordering/merging has less impact than on
traditional mechanical hard disks, so SSD drives can probably deal
with an increased concurrent_reads better. (?)
are actually all
> busy or not. If it's near 32 (or whatever you set it at) all the time it
> may be a bottleneck.
>
> ---
> Chris Lohfink
>
> On Wed, Oct 29, 2014 at 10:41 PM, Jimmy Lin wrote:
>
>> Hi,
>> looking at the docs, the default value for concurrent_re
Hi,
looking at the docs, the default value for concurrent_reads is 32, which
seems a bit small to me (compared to, say, an HTTP server)? Because if my node
is receiving even slight traffic, any more than 32 concurrent read queries
will have to wait. (?)
The recommended rule is 16 * number of drives. Would that be dif
Hi,
I have a column family/table that has frequent updates on one of the
columns, and one column that has infrequent updates. The rest of the columns
never change. Our application also reads frequently from this table.
We have seen some read latency issues on this table and plan to switch to
use level comp
thanks all for the pointers.
let me see if I can put the sequence of events together
1.2
people misunderstand/misuse the row cache, not realizing that Cassandra
caches the entire row of data even if you are only looking for a small
subset of the row data.
e.g.
select single_column from a_wide_row_table
will r
y to "preheat" key and page cache, but I
> don't believe this is possible for row cache.
>
> Hope that helps.
>
> Jonathan
>
>
> Jonathan Lacefield
> Solutions Architect, DataStax
> (404) 822 3487
> <http://www.linkedin.com/in/jlacefield>
>
>
I am wondering if there is any negative impact on Cassandra write
operations if I turn on row caching for a table that has mostly 'static
columns' but a few frequently written columns (like a timestamp).
The application will frequently write to a few columns, and the application
will also frequently que
hi,
looking at the collection type support in cql3,
e.g.
http://www.datastax.com/documentation/cql/3.0/cql/cql_using/use_list_t.html
we can append or remove using the "+" and "-" operators
UPDATE users
SET top_places = top_places + [ 'mordor' ] WHERE user_id = 'frodo';
UPDATE users
SET top_places = t
select * from mytable where mykey IN ('xxx', 'yyy', 'zzz', '111', '222', '333')
is there a limit on how many items you can specify inside an IN clause?
The CQL IN clause helps reduce the round-trip traffic otherwise needed when
using multiple select statements, correct?
but how about the coordinator node that
> wrote:
>
>> Generally indexes on binary fields true/false male/female are not
>> terribly effective.
>>
>>
>> On Tue, Jan 28, 2014 at 12:40 PM, Jimmy Lin wrote:
>>
>>> I have a simple column family like the following
>>>
>
I have a simple column family like the following
create table people(
company_id text,
employee_id text,
gender text,
primary key(company_id, employee_id)
);
if I want to find all the "male" employees given a company id, I can do
1/
select * from people where company_id='
and loop through
e from last key, but doesn't do
anything good to the token function. The argument to the token should
really be the actual key value.
On Tue, Oct 1, 2013 at 9:32 AM, Jimmy Lin wrote:
> thanks, yea i am aware of that, and have already taken care.
>
> I just also found out a similar
ages' can get truncated in
> the middle of a wide row.
>
> See
> https://groups.google.com/a/lists.datastax.com/d/msg/java-driver-user/lHQ3wKAZgM4/DnlXT4IzqsQJ
>
> Jan
>
>
>
> On 01.10.2013, at 18:12, Jimmy Lin wrote:
>
> > unfortunately, i have to stick
:30 AM, Jan Algermissen <
> jan.algermis...@nordsc.com> wrote:
>
>> Jimmy,
>>
>> On 01.10.2013, at 17:26, Jimmy Lin wrote:
>>
>> > i have a table like the following:
>> >
>> > CREATE TABLE log (
>> > mykey timeuuid,
>> >
i have a table like the following:
CREATE TABLE log (
mykey timeuuid,
type text,
msg text,
primary key(mykey, type)
);
I want to page through all the results from the table using
select * from log where token(mykey) > token(maxTimeuuid(xxx)) limit 100;
(where xxx is 0 for the first query, and
hi,
we have a table whose primary key is of uuid type. Now we have decided that
we need to use the text type as it is more flexible for our application.
#1
is there any downside to using text as the primary key? any performance
impact on the partition?
#2
There is no way to alter a table's primary key with a cql
hi,
i am using astyanax to access a multi-node cassandra cluster.
In my connection configuration setup, i already declared a global
read/write consistency level by setting:
AstyanaxConfiguration.setDefaultWriteConsistencyLevel()
AstyanaxConfiguration.setDefaultReadConsistencyLevel()
however, fro
hi,
I have a few questions about how Cassandra uses a record's timestamp to
determine which one to return from its replica nodes ...
-
A record's timestamp is determined by the Cassandra server node's system
timestamp when the request arrives at the server and NOT by the timestamp
of the client who ma
> Check out the token function:
>
>
> http://www.datastax.com/docs/1.1/dml/using_cql#paging-through-non-ordered-partitioner-results
>
> You can use it to page through your rows.
>
> Blake
>
>
> On Jul 23, 2013, at 10:18 PM, Jimmy Lin wrote:
>
> hi,
> I want to fetch all
hi,
I want to fetch all the row keys of a table using CQL3:
e.g
select id from mytable limit 999
#1
For this query, does the node need to wait for all rows to be returned from
all other nodes before returning the data to the client (I am using astyanax)?
In other words, will this operation create a
>
>
>
> -Original Message-
> From: y2k...@gmail.com on behalf of Jimmy Lin
> Sent: Thu 11-Jul-13 13:09
> To: user@cassandra.apache.org
> Subject: Re: data model question : finding out the n most recent changes
> items
>
> what I mean is, I really just w
hanges. I basically end up pulling out series of
> modification timestamp for the same directory.
> Not sure I understand the problem.
>
> Cheers
>
>
> -
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http:/
I have an application that needs to find the n most recently modified
files for a given user id. I started out with a few tables but still couldn't
get what I want; I hope someone can point me in the right direction...
See my tables below.
#1 won't work, because file_id's timeuuid contains creation time,