Re: Looking for pointers about replication internal working

2021-09-02 Thread DuyHai Doan
As far as I remember, Apache Cassandra wanted to be self-sufficient and avoid pulling in yet another piece of external software for its internal work. With lightweight transactions since 3.0, it has a sufficient primitive for some scenarios that require linearizability. My 2 cents, Duy Hai DOAN On

Re: What does the community think of the DataStax 4.x Java driver changes?

2020-10-29 Thread DuyHai Doan
Just my 2 cents Because of the tremendous breaking changes in terms of API as well as public facing classes (QueryBuilder for ex) I have stopped the development of the Achilles framework. Migrating to the 4.x version would require almost the complete rewrite of the framework, an effort which I ca

Re: Dynamo autoscaling: does it beat cassandra?

2019-12-09 Thread DuyHai Doan
Out of curiosity, does DynamoDB autoscaling allow you to exceed the partition limits (e.g. push more data than is allowed for some outlier heavy partitions)? If yes, it can be interesting (I guess DynamoDB is doing some kind of rebalancing behind the scenes). If no, it's just an artificial cap

Re: TTL on UDT

2019-12-09 Thread DuyHai Doan
It depends. The latest version of Cassandra allows unfrozen UDTs. The individual fields of a UDT are updated atomically and they are effectively stored in distinct physical columns inside the partition, so applying ttl() on them makes sense. I'm not sure, however, if the CQL parser allows this syntax

Re: Cluster sizing for huge dataset

2019-10-04 Thread DuyHai Doan
ution and limitations. > > Note: that would also probably help you with your init-load/TWCS issue . > > My2c. > Cedrick > > On Tue, Oct 1, 2019 at 11:49 PM DuyHai Doan wrote: > >> The client wants to be able to access cold data (2 years old) in the >> same cluster so

Re: Challenge with initial data load with TWCS

2019-10-01 Thread DuyHai Doan
Thanks Alex for confirming On 30 Sep 2019 09:17, "Oleksandr Shulgin" wrote: > On Sun, Sep 29, 2019 at 9:42 AM DuyHai Doan wrote: > >> Thanks Jeff for sharing the ideas. I have some question though: >> >> - CQLSSTableWriter and explicitly break betwee

Re: Cluster sizing for huge dataset

2019-10-01 Thread DuyHai Doan
> Regards > Julien > > Le lun. 30 sept. 2019 à 22:03, DuyHai Doan a écrit : >> >> Thanks all for your reply >> >> The target deployment is on Azure so with the Nice disk snapshot feature, >> replacing a dead node is easier, no streaming from Cassandra &

Re: Cluster sizing for huge dataset

2019-09-30 Thread DuyHai Doan
Thanks all for your reply The target deployment is on Azure so with the Nice disk snapshot feature, replacing a dead node is easier, no streaming from Cassandra About compaction overhead, using TwCs with a 1 day bucket and removing read repair and subrange repair should be sufficient Now the onl

Re: Challenge with initial data load with TWCS

2019-09-29 Thread DuyHai Doan
u do your > historical load using this method > > > > > On Sep 28, 2019, at 1:31 PM, DuyHai Doan wrote: > > > > Hello users > > > > TWCS works great for permanent state. It creates SSTables of roughly > > fixed size if your insertion rate is pretty const

Re: Cluster sizing for huge dataset

2019-09-29 Thread DuyHai Doan
sor data is similar / compressible. > > > On Sep 28, 2019, at 1:23 PM, DuyHai Doan wrote: > > > > Hello users > > > > I'm facing with a very challenging exercise: size a cluster with a huge > > dataset. > > > > Use-case = IoT > > >

Challenge with initial data load with TWCS

2019-09-28 Thread DuyHai Doan
Hello users TWCS works great for permanent state. It creates SSTables of roughly fixed size if your insertion rate is pretty constant. Now the big deal is about the initial load. Let's say we configure a TWCS with window unit = day and window size = 1, we would have 1 SSTable per day and with TT

Cluster sizing for huge dataset

2019-09-28 Thread DuyHai Doan
Hello users, I'm facing a very challenging exercise: sizing a cluster for a huge dataset. Use case: IoT. Number of sensors: 30 million. Frequency of data: every 10 minutes. Estimated size of a data point: 100 bytes (including clustering columns). Data retention: 2 years. Replication factor: 3 (pretty s
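
The figures quoted in this post are enough for a back-of-the-envelope raw-size estimate. The sketch below only multiplies them out; it assumes 365-day years and deliberately ignores compression, compaction headroom, and per-row storage overhead, all of which would change the real number considerably.

```python
# Rough sizing from the figures in the post; not a capacity plan.
SENSORS = 30_000_000
MINUTES_BETWEEN_READINGS = 10
BYTES_PER_READING = 100          # includes clustering columns, per the post
RETENTION_DAYS = 2 * 365         # assumption: leap days ignored
REPLICATION_FACTOR = 3

readings_per_sensor_per_day = 24 * 60 // MINUTES_BETWEEN_READINGS
readings_per_day = SENSORS * readings_per_sensor_per_day
total_readings = readings_per_day * RETENTION_DAYS

raw_bytes = total_readings * BYTES_PER_READING
replicated_bytes = raw_bytes * REPLICATION_FACTOR

TB = 10**12  # decimal terabytes
print(f"readings per day:        {readings_per_day:,}")
print(f"raw data, 2 years:       {raw_bytes / TB:.2f} TB")
print(f"with replication (RF=3): {replicated_bytes / TB:.2f} TB")
```

Under these assumptions the raw dataset is roughly 315 TB before replication and roughly 946 TB after it, which is why the thread goes on to discuss compression and disk density per node.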

Re: Is it possible to build multi cloud cluster for Cassandra

2019-09-05 Thread DuyHai Doan
Hello all I've given a thought to this multi-cloud marketing buzz with Cassandra Theoretically feasible (with GossipingPropertyFileSnitch) but practically a headache if you want a minimum of performance and security The problem comes from the network "devils in the details" Suppose DC1 in AWS i

Re: Using Cassandra as an object store

2019-04-19 Thread DuyHai Doan
Idea: To guarantee data integrity, you can store an MD5 of all chunks data as static column in the partition that contains the chunks On Fri, Apr 19, 2019 at 9:18 AM cclive1601你 wrote: > we have use cassandra as object store for some years, you can just split > the object into some small pieces
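
The integrity idea in this reply can be sketched on the client side as follows. This is only an illustration: the chunk size and helper names are hypothetical, and the actual write of the digest into a static column is left out, since the table schema is not given in the thread.

```python
import hashlib

CHUNK_SIZE = 64 * 1024  # hypothetical chunk size, not taken from the thread


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Split an object into fixed-size pieces (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def digest_of_chunks(chunks) -> str:
    """MD5 over the chunk contents, fed in order."""
    md5 = hashlib.md5()
    for chunk in chunks:
        md5.update(chunk)
    return md5.hexdigest()


# Writer side: compute the digest that would be stored once per partition.
payload = bytes(range(256)) * 1000          # 256,000 bytes of sample data
chunks = split_into_chunks(payload)
stored_digest = digest_of_chunks(chunks)    # value for the static column

# Reader side: recompute after fetching all chunks and compare.
fetched = list(chunks)
is_intact = digest_of_chunks(fetched) == stored_digest
print(len(chunks), is_intact)
```

A missing or reordered chunk changes the recomputed digest, so the reader can detect a partially written object without a transaction.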

Re: Usage of allocate_tokens_for_keyspace for a new cluster

2019-02-14 Thread DuyHai Doan
Ok thanks John On Thu, Feb 14, 2019 at 8:51 PM Jonathan Haddad wrote: > Create the first node, setting the tokens manually. > Create the keyspace. > Add the rest of the nodes with the allocate tokens uncommented. > > On Thu, Feb 14, 2019 at 11:43 AM DuyHai Doan wrote: > >&

Usage of allocate_tokens_for_keyspace for a new cluster

2019-02-14 Thread DuyHai Doan
Hello users By looking at the mailing list archive, there was already some questions about the flag "allocate_tokens_for_keyspace" from cassandra.yaml I'm starting a fresh new cluster (with 0 data). The keyspace used by the project is raw_data so I set allocate_tokens_for_keyspace = raw_data in

Re: Make large partitons lighter on select without changing primary partition formation.

2019-02-13 Thread DuyHai Doan
The plain answer is NO. There is a slight hope that the JIRA https://issues.apache.org/jira/browse/CASSANDRA-9754 gets into the 4.0 release. But right now there seems to be little interest in this ticket; the last comment dates from 23/Feb/2017 ... On Wed, Feb 13, 2019 at 1:18 PM Vsevolod Filaretov wrote: > Hi

Re: Max number of windows when using TWCS

2019-02-11 Thread DuyHai Doan
Doesn’t the old sstables drop by itself? One ttl and gc grace seconds >> past whole sstable will have only tombstones. >> >> >> Regards, >> >> Nitan >> >> Cell: 510 449 9629 >> >> On Feb 11, 2019, at 2:23 PM, DuyHai Doan wrote: >> >> Purging data is also straightforward, just dropping SSTables (by a >> script) where create date is older than a threshold, we don't even need to >> rely on TTL >> >>

Re: Max number of windows when using TWCS

2019-02-11 Thread DuyHai Doan
ws and every read touches all of the windows, > you’re going to have a bad time. > > -- > Jeff Jirsa > > > On Feb 11, 2019, at 12:12 PM, DuyHai Doan wrote: > > Hello users > > On the official documentation for TWCS ( > http://cassandra.apache.org/doc/latest/oper

Max number of windows when using TWCS

2019-02-11 Thread DuyHai Doan
Hello users On the official documentation for TWCS ( http://cassandra.apache.org/doc/latest/operating/compaction.html#time-window-compactionstrategy) it is advised to select the windows unit and size so that the total number of windows intervals is around 20-30. Is there any explanation for this

Re: [EXTERNAL] Howto avoid tombstones when inserting NULL values

2019-01-04 Thread DuyHai Doan
rk with " >>>> spark.cassandra.output.ignoreNulls=true" >>>> This will not cover the situation when a value have to be overwriten >>>> with null. >>>> >>>> I found one possible solution - change the schema to keep only primary >

Re: [EXTERNAL] Howto avoid tombstones when inserting NULL values

2019-01-04 Thread DuyHai Doan
"The problem is I can't know the combination of set/unset values" --> Just for this requirement, Achilles has a working solution for many years using INSERT_NOT_NULL_FIELDS strategy: https://github.com/doanduyhai/Achilles/wiki/Insert-Strategy Or you can use the Update API that by design only perf

Re: Tombstone removal optimization and question

2018-11-06 Thread DuyHai Doan
Thanks for the confirmation Kurt On 6 Nov 2018 11:59, "kurt greaves" wrote: > Yes it does. Consider if it didn't and you kept writing to the same > partition, you'd never be able to remove any tombstones for that partition. > > On Tue., 6 Nov. 2018, 19:40 D

Re: Query With Limit Clause

2018-11-06 Thread DuyHai Doan
Cassandra will execute such request using a Partition Range Scan. See more details here http://www.doanduyhai.com/blog/?p=13191, chapter E Cluster Read Path (look at the formula of Concurrency Factor) On Tue, Nov 6, 2018 at 8:21 AM shalom sagges wrote: > Hi All, > > If I run for example: > se

Tombstone removal optimization and question

2018-11-06 Thread DuyHai Doan
Hello all I have tried to sum up all rules related to tombstone removal: -- Given a tombstone written at timestamp (t) for a partition key (P) in SSTable (S1). This tombstone will be removed: 1) after gc_grace_secon

Re: comprehensive list of checks before rolling version upgrades

2018-10-30 Thread DuyHai Doan
To add to your excellent list: - no topology change (joining/leaving/decommissioning) nodes - no rebuild of index/MV under way On Tue, Oct 30, 2018 at 4:35 PM Carl Mueller wrote: > Does anyone have a pretty comprehensive list of these? Many that I don't > currently know how to check but I'm res

Re: Aggregation of Set Data Type

2018-10-23 Thread DuyHai Doan
You will need to use user defined aggregates for this On 23 Oct 2018 16:46, "Joseph Wonesh" wrote: > Hello all, > > I am trying to aggregate rows which each contain a column of Set. > I would like the result to contain the sum of all sets, where null would be > equivalent to the empty set.

Re: Released an ACID-compliant transaction library on top of Cassandra

2018-10-16 Thread DuyHai Doan
I think it does use LWT under the hood: https://github.com/scalar-labs/scalardb/blob/master/src/main/java/com/scalar/database/transaction/consensuscommit/CommitMutationComposer.java#L74-L79 return new Put(base.getPartitionKey(), getClusteringKey(base, result).orElse(null)) .forNamespace(b

Re: About UDF/UDA

2018-09-27 Thread DuyHai Doan
> -- > {'': (365, 870, 617, 2), ''': (381, 11668, 6024, 2)} > > I would like to have something lke: > | item| min | max| average | count | > -

Re: About UDF/UDA

2018-09-26 Thread DuyHai Doan
A hint to answer your Q3 is to use a final function to perform the flattening or transformation on the result of the aggregation The syntax of an UDA is: CREATE [OR REPLACE] AGGREGATE [IF NOT EXISTS] aggregateName(type1, type2, …) SFUNC accumulatorFunction STYPE stateType [FINALFUNC finalFunction

Re: [EXTERNAL] Re: cold vs hot data

2018-09-17 Thread DuyHai Doan
Also for the record, I remember Datastax having something called Tiered Storage that does move data around (folders/disk volume) based on data age. To be checked On Mon, Sep 17, 2018 at 10:23 PM, DuyHai Doan wrote: > Sean > > Without transactions à la SQL, how can you guarantee

Re: [EXTERNAL] Re: cold vs hot data

2018-09-17 Thread DuyHai Doan
Sean Without transactions à la SQL, how can you guarantee atomicity between both tables for upserts ? I mean, one write could succeed with hot table and fail for cold table The only solution I see is using logged batch, with a huge overhead and perf hit on for the writes On Mon, Sep 17, 2018 at

Re: Using CDC Feature to Stream C* to Kafka (Design Proposal)

2018-09-12 Thread DuyHai Doan
ahul Singh >> wrote: >> >>> You know what they say: Go big or go home. >>> >>> Right now candidates are Cassandra itself but embedded or on the side >>> not on the actual data clusters, zookeeper (yuck) , Kafka (which needs >>> zookeepe

Re: Using CDC Feature to Stream C* to Kafka (Design Proposal)

2018-09-10 Thread DuyHai Doan
Also using Calvin means having to implement a distributed monotonic sequence as a primitive, not trivial at all ... On Mon, Sep 10, 2018 at 3:08 PM, Rahul Singh wrote: > In response to mimicking Advanced replication in DSE. I understand the > goal. Although DSE advanced replication does one way,

Re: A blog about Cassandra in the IoT arena

2018-08-24 Thread DuyHai Doan
ombstones and a threshold, it would be dedicated to deletion. It may be an > edge case , but people face issues with tombstones all the time because > they don’t know better. > > Rahul > On Aug 23, 2018, 11:50 AM -0500, DuyHai Doan , > wrote: > > As I used to tell some people, th

Re: A blog about Cassandra in the IoT arena

2018-08-23 Thread DuyHai Doan
As I used to tell some people, the day we make : 1. partition size unlimited, or at least huge partition easily manageable (compaction, repair, streaming, partition index file) 2. tombstone a non-issue that day, Cassandra will dominate any other IoT technology out there Until then ... On Thu, A

Re: full text search on some text columns

2018-07-31 Thread DuyHai Doan
I had SASI in mind before stopping myself from replying to this thread. Actually the OP needs to index clustering column and partition key, and as far as I remember, I've myself opened a JIRA and pushed a patch for SASI to support indexing composite partition key but there are some issues so far pr

Re: which driver to use with cassandra 3

2018-07-20 Thread DuyHai Doan
Spring Data Cassandra is so-so ... It has fewer features (at least at the time I looked at it) than the default Java driver. As for drivers, right now most people are using DataStax's. On Fri, Jul 20, 2018 at 3:36 PM, Vitaliy Semochkin wrote: > Hi, > > Which driver to use with cassandra 3 > > t

Re: default_time_to_live vs TTL on insert statement

2018-07-12 Thread DuyHai Doan
for an entire table by setting the table's >>> default_time_to_live >>> <https://docs.datastax.com/en/cql/3.3/cql/cql_reference/cqlCreateTable.html#tabProp__cqlTableDefaultTTL> >>> property. If you try to set a TTL for a specific column that is longer >>> than the

Re: default_time_to_live vs TTL on insert statement

2018-07-11 Thread DuyHai Doan
default_time_to_live property applies if you don't specify any TTL on your CQL statement However you can always override the default_time_to_live

Re: [ANNOUNCE] LDAP Authenticator for Cassandra

2018-07-05 Thread DuyHai Doan
Super great, thank you for this contribution Kurt! On Thu, Jul 5, 2018 at 1:49 PM, kurt greaves wrote: > We've seen a need for an LDAP authentication implementation for Apache > Cassandra so we've gone ahead and created an open source implementation > (ALv2) utilising the pluggable auth support

Re: Write performance degradation

2018-06-18 Thread DuyHai Doan
Maybe the disk I/O cannot keep up with the high mutation rate ? Check the number of pending compactions On Sun, Jun 17, 2018 at 9:24 AM, onmstester onmstester wrote: > Hi, > > I was doing 500K inserts + 100K counter update in seconds on my cluster of > 12 nodes (20 core/128GB ram/4 * 600 HDD 10

Re: Data Proxy for Cassandra

2018-06-11 Thread DuyHai Doan
Hello Chidamber When you said "In addition, the data proxy is distributed based on consistent hashing and using gossip between data proxy nodes to keep the cached data unique (per node) and consistent", did you re-implement Consistent hashing and gossip algorithm from scratch in your proxy layer ?

Re: what's the read cl of list read-on-write operations?

2018-04-20 Thread DuyHai Doan
the item to set finally is not b but d, > which is unexpected from the perspective of the previous read. > > Why Cassandra do not read from cluster with somehow read CL before > updating the list? > > > 2018-04-20 16:12 GMT+08:00 DuyHai Doan : > > The read operation on the

Re: what's the read cl of list read-on-write operations?

2018-04-20 Thread DuyHai Doan
The read operation on the list column is done locally on each replica so replication factor does not really apply here On Fri, Apr 20, 2018 at 7:37 AM, Jinhua Luo wrote: > Hi All, > > Some list operations, like set by index, needs to read the whole list > before update. > So what's the read cons

Re: Does Cassandra supports ACID txn

2018-04-19 Thread DuyHai Doan
No ACID transactions anytime soon in Cassandra. On Thu, Apr 19, 2018 at 7:35 AM, Rajesh Kishore wrote: > Hi, > > I am bit confused by reading different articles, does recent version of > Cassandra supports ACID transaction ? > > I found BATCH command , but not sure if it supports rollback, consider >

Re: where does c* store the schema?

2018-04-16 Thread DuyHai Doan
There is a system_schema keyspace to store all the schema information https://docs.datastax.com/en/cql/3.3/cql/cql_using/useQuerySystem.html#useQuerySystem__table_bhg_1bw_4v On Mon, Apr 16, 2018 at 10:48 AM, Jinhua Luo wrote: > Hi All, > > Does c* use predefined keyspace/tables to store the use

Re: Can I sort it as a result of group by?

2018-04-09 Thread DuyHai Doan
No, sorting by column other than clustering column is not possible On Mon, Apr 9, 2018 at 11:42 AM, Eunsu Kim wrote: > Hello, everyone. > > I am using 3.11.0 and I have the following table. > > CREATE TABLE summary_5m ( > service_key text, > hash_key int, > instance_hash int, > c

Re: Text or....

2018-04-04 Thread DuyHai Doan
Compressing client-side is better because it will save: 1) a lot of bandwidth on the network 2) a lot of Cassandra CPU because no decompression server-side 3) a lot of Cassandra HEAP because the compressed blob should be relatively small (text data compress very well) compared to the raw size On
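
A minimal sketch of the client-side approach described here, using Python's standard zlib module. The sample text is synthetic and highly repetitive, so the ratio it prints is only illustrative; the 55,000-character length mirrors the figure from the original question in this thread, and binding the resulting bytes to a blob column through a driver is left out.

```python
import zlib

# Synthetic stand-in for the large text value; real text will compress
# less well than this repetitive sample.
line = "sensor=42 status=OK temperature=21.5 humidity=40\n"
text = (line * 1200)[:55_000]

raw = text.encode("utf-8")
blob = zlib.compress(raw, 6)        # bytes to store in a blob column

# Reader side: decompress on the client, not on the Cassandra node.
restored = zlib.decompress(blob).decode("utf-8")

print(f"raw: {len(raw)} bytes, compressed: {len(blob)} bytes")
```

Only the compressed bytes cross the network and sit on the server heap, which is the saving the reply lists.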

Re: Text or....

2018-04-04 Thread DuyHai Doan
Compress it and store it as a blob, unless you ever need to index it. But I guess even with SASI, indexing such a huge text block is not a good idea. On Wed, Apr 4, 2018 at 2:25 PM, shalom sagges wrote: > Hi All, > > A certain application is writing ~55,000 characters for a single row. Most > of the

Re: Cassandra filter with ordering query modeling

2018-03-01 Thread DuyHai Doan
https://www.slideshare.net/doanduyhai/datastax-day-2016-cassandra-data-modeling-basics On Thu, Mar 1, 2018 at 3:48 PM, Valentina Crisan wrote: > 1) I created another table for Query#2/3. The partition Key was StartTime > and clustering key was name. When I execute my queries, I get an exception

Re: Secondary Indexes C* 3.0

2018-02-22 Thread DuyHai Doan
Read this: http://www.doanduyhai.com/blog/?p=13191 On Thu, Feb 22, 2018 at 6:44 PM, Akash Gangil wrote: > To provide more context, I was going through this > https://docs.datastax.com/en/cql/3.3/cql/cql_using/useWhenIndex.html# > useWhenIndex__highCardCol > > On Thu, Feb 22, 2018 at 9:35 AM,

Re: Cassandra Needs to Grow Up by Version Five!

2018-02-21 Thread DuyHai Doan
So before buying any marketing claims from Microsoft or whoever, maybe you should try to use it extensively? And talking about backup, have a look at DynamoDB: http://i68.tinypic.com/n1b6yr.jpg From my POV, if a multi-billion company like Amazon doesn't get it right or can't make it easy for e

Re: Cassandra Needs to Grow Up by Version Five!

2018-02-21 Thread DuyHai Doan
For UI and interactive data exploration there is already the Cassandra interpreter for Apache Zeppelin that is more than decent for the job On Wed, Feb 21, 2018 at 9:19 AM, Daniel Hölbling-Inzko < daniel.hoelbling-in...@bitmovin.com> wrote: > But what does this video really show? That Microsoft m

Re: LWT broken?

2018-02-11 Thread DuyHai Doan
Mahdi, the issue in your code is here: else // we lost LWT, fetch the winning value existing_id = SELECT id FROM hash_id WHERE hash=computed_hash | consistency = ONE You lost the LWT, which means that a concurrent LWT has won the Paxos round and has applied the value using QUORUM/SE

Re: GDPR, Right to Be Forgotten, and Cassandra

2018-02-09 Thread DuyHai Doan
Or use the new user-defined compaction option recently introduced, provided you can determine over which SSTables a partition is spread On Fri, Feb 9, 2018 at 5:23 PM, Jon Haddad wrote: > Give this a read through: > > https://github.com/protectwise/cassandra-util/tree/master/deleting- > compacti

Re: group by select queries

2018-02-01 Thread DuyHai Doan
Worth digging into the source code of GROUP BY, but as far as I remember, using GROUP BY without any aggregation function will lead to C* picking just the first row (or maybe the last, not sure on this point) at hand. About ordering, since the grouping is on a component of partition key, do not exp

Re: Too many tombstones using TTL

2018-01-10 Thread DuyHai Doan
"The question is why Cassandra creates a tombstone for every column instead of single tombstone per row?" --> Simply because technically it is possible to set different TTL value on each column of a CQL row On Wed, Jan 10, 2018 at 2:59 PM, Python_Max wrote: > Hello, C* users and experts. > > I

Re: CQL Map vs clustering keys

2017-11-15 Thread DuyHai Doan
Yes, your remark is correct. However, once CASSANDRA-7396 (right now in 4.0 trunk) get released, you will be able to get a slice of map values using their (sorted) keys SELECT map[fromKey ... toKey] FROM TABLE ... Needless to say, it will be also possible to get a single element from the map by

Re: Securing Cassandra database

2017-11-13 Thread DuyHai Doan
You can pass in login/password from the client side and encrypt the client / cassandra connection... On 13 Nov 2017 12:16, "Mokkapati, Bhargav (Nokia - IN/Chennai)" < bhargav.mokkap...@nokia.com> wrote: Hi Team, We are using Apache Cassandra 3.0.13 version. As part of Cassandra database

Re: Cassandra using a ton of native memory

2017-11-03 Thread DuyHai Doan
8Gb of RAM being a recommended production setting for most of the workload out there. Having only 16Gb of RAM, and because Cassandra is relying a lot on system page cache, there should be no surprise that your 16Gb being eaten up. On Fri, Nov 3, 2017 at 5:40 PM, Austin Sharp wrote: > I’ve invest

Re: Would User Defined Type(UDT) nested in a LIST collections column type give good read performance

2017-10-30 Thread DuyHai Doan
Hello Bill First if you don't care about insertion order it's better to use Set rather than list. List implementation requires read before write for some operations. Second, the read performance of the collection itself depends on 2 factors : 1) collection cardinality e.g. the number of elements

Re: Golang + Cassandra + Text Search

2017-10-24 Thread DuyHai Doan
There is already a full text search index in Cassandra called SASI On Tue, Oct 24, 2017 at 6:50 AM, Ridley Submission < ridley.submission2...@gmail.com> wrote: > Hi, > > Quick question, I am wondering if anyone here who works with Go has > specific recommendations for as simple framework to add t

Re: Does NTP affects LWT's ballot UUID?

2017-10-10 Thread DuyHai Doan
The ballot UUID is obtained using QUORUM agreement between replicas for a given partition key and we use this TimeUUID ballot as write-time for the mutation. The only scenario where I can see a problem is that NTP goes backward in time on a QUORUM of replicas, which would break the contract of mon

Re: new question ;-) // RE: understanding batch atomicity

2017-09-29 Thread DuyHai Doan
ns (A) & (B) done in an atomic way (all or nothing) ? > > Thanks. > > Dominique > > > > [@@ THALES GROUP INTERNAL @@] > > *From:* DuyHai Doan [mailto:doanduy...@gmail.com ] > *Sent:* Friday, 29 September 2017 17:10 > *To:* user > *Subject:* Re:

Re: understanding batch atomicity

2017-09-29 Thread DuyHai Doan
All updates here means all mutations == INSERT/UPDATE or DELETE On Fri, Sep 29, 2017 at 5:07 PM, DE VITO Dominique < dominique.dev...@thalesgroup.com> wrote: > Hi, > > > > About BATCH, the Apache doc https://cassandra.apache.org/ > doc/latest/cql/dml.html?highlight=atomicity says : > > > > “*Th

Re: data loss in different DC

2017-09-28 Thread DuyHai Doan
If you're writing into DC1 with CL = LOCAL_xxx, there is no guarantee to be sure to read the same data in DC2. Only repair will help you On Thu, Sep 28, 2017 at 11:41 AM, Peng Xiao <2535...@qq.com> wrote: > Dear All, > > We have a cluster with one DC1:RF=3,another DC DC2:RF=1 only for ETL,but > w

Re: Datastax Driver Mapper & Secondary Indexes

2017-09-26 Thread DuyHai Doan
If you're looking for schema generation from Bean annotations: https://github.com/doanduyhai/Achilles/wiki/DDL-Scripts-Generation On Tue, Sep 26, 2017 at 2:50 PM, Daniel Hölbling-Inzko < daniel.hoelbling-in...@bitmovin.com> wrote: > Hi, I also just figured out that there is no schema generation o

Re: Self-healing data integrity?

2017-09-11 Thread DuyHai Doan
; Jeff Jirsa > > > On Sep 9, 2017, at 12:59 PM, Jeff Jirsa wrote: > > There is, but they aren't consulted on the streaming paths (only on normal > reads) > > > -- > Jeff Jirsa > > > On Sep 9, 2017, at 12:02 PM, DuyHai Doan wrote: > > Jeff, > &g

Re: Self-healing data integrity?

2017-09-09 Thread DuyHai Doan
Jeff, With default compression enabled on each table, isn't there CRC files created along side with SSTables that can help detecting bit-rot ? On Sat, Sep 9, 2017 at 7:50 PM, Jeff Jirsa wrote: > Cassandra doesn't do that automatically - it can guarantee consistency on > read or write via Cons

Re: Lightweight transaction in Multi DC

2017-09-09 Thread DuyHai Doan
; > > > On Fri, Sep 8, 2017 at 2:33 PM, Charulata Sharma (charshar) < > chars...@cisco.com> wrote: > > Yes …it is with LOCAL_SERIAL. Should I be using SERIAL ? > > > > Thanks, > > Charu > > > > *From: *DuyHai Doan > *Reply-To: *"user@cassa

Re: Lightweight transaction in Multi DC

2017-09-08 Thread DuyHai Doan
Are you using CAS with SERIAL consistency level for your multi-DC setup ? On Fri, Sep 8, 2017 at 9:27 PM, Charulata Sharma (charshar) < chars...@cisco.com> wrote: > Hi, > > We are facing a serious issue with CAS in a multi DC setup and I > wanted to get some input on it from the forum. > >

Re: No columns are defined for Materialized View other than primary key

2017-09-07 Thread DuyHai Doan
s one more column "data" here in MView? >> >> On 7 Sep 2017 7:49 p.m., "DuyHai Doan" wrote: >> >>> The answer of your question is in the error message. For once it's very >>> clear. The primary key of your materialized view is EXACTLY t

Re: No columns are defined for Materialized View other than primary key

2017-09-07 Thread DuyHai Doan
The answer of your question is in the error message. For once it's very clear. The primary key of your materialized view is EXACTLY the same as for your base table. So the question is what's the point creating this materialized view ... On Thu, Sep 7, 2017 at 4:01 PM, Alex Kotelnikov < alex.kot

[ANNOUNCE] Achilles 5.3.0

2017-08-26 Thread DuyHai Doan
Hello Cassandra users I'm happy to announce the release of Achilles 5.3.0 The new added features are - Support for Cassandra up to 3.11.0 and Datastax Enterprise up to 5.1.2 - Support for new Duration type (CASSANDRA-11873) - Support for literal value in (CASSANDRA-10783) - Support for GROUP BY

Re: SASI and secondary index simultaniously

2017-07-12 Thread DuyHai Doan
In the original source code SASI will be chosen instead of the secondary index On 12 Jul 2017 09:13, "Vlad" wrote: > Hi, > > it's possible to create both regular secondary index and SASI on the same > column: > > > > > *CREATE TABLE ks.tb (id int PRIMARY KEY, name text);CREATE CUSTOM INDEX > t

Re: timeoutexceptions with UDF causing cassandra forceful exits

2017-07-03 Thread DuyHai Doan
Beside the config of user_function_timeout_policy, I would say having an UDF that times out badly is generally an indication that you should review your UDF code On Mon, Jul 3, 2017 at 7:58 PM, Jeff Jirsa wrote: > > > On 2017-06-29 17:00 (-0700), Akhil Mehra wrote: > > By default user_function_

Re: UDF for sorting

2017-07-03 Thread DuyHai Doan
Plain answer is no you can't The reason is that UDF only transform column values on each row but does not have the ability to modify rows ordering On Mon, Jul 3, 2017 at 10:14 PM, techpyaasa . wrote: > Hi all, > > I have a table like > > CREATE TABLE ks.cf ( pk1 bigint, cc1 bigint, disp_name te

Re: SASI index on datetime column does not filter on minutes

2017-06-19 Thread DuyHai Doan
The + in the date format is necessary to specify timezone On Mon, Jun 19, 2017 at 5:38 PM, Hannu Kröger wrote: > Hello, > > I tried the same thing with 3.10 which I happened to have at hand and that > seems to work. > > cqlsh:test> select lastname,firstname,dateofbirth from individuals where

Re: Reg:- Cassandra Data modelling for Search

2017-06-12 Thread DuyHai Doan
12, 2017 at 10:03 AM DuyHai Doan wrote: > >> For all those promoting ES as a PRIMARY datastore, please read this >> before: >> >> https://discuss.elastic.co/t/elasticsearch-as-a-primary-database/85733/13 >> >> There are a lot of warning before recomme

Re: Reg:- Cassandra Data modelling for Search

2017-06-12 Thread DuyHai Doan
For all those promoting ES as a PRIMARY datastore, please read this before: https://discuss.elastic.co/t/elasticsearch-as-a-primary-database/85733/13 There are a lot of warning before recommending ES as a datastore. The answer from Pilato, ES official evangelist: - You absolutely care about

Re: Cassandra & Spark

2017-06-08 Thread DuyHai Doan
Interesting, Tobias. When you said "Instead we transferred the data to Apache Kudu", did you transfer all Cassandra data into Kudu with a single migration and then tap into Kudu for aggregation, or did you run a data import every day/week/month from Cassandra into Kudu? From my point of view,

Re: Understanding the limitation to only one non-PK column in MV-PK

2017-06-06 Thread DuyHai Doan
All the explanation for why just 1 non PK column can be used as PK for MV is here: https://skillsmatter.com/skillscasts/7446-cassandra-udf-and-materialised-views-in-depth Skip to 19:18 for the explanation On Mon, May 8, 2017 at 8:08 PM, Fridtjof Sander < fridtjof.san...@googlemail.com> wrote: >

Re: Order by for aggregated values

2017-06-06 Thread DuyHai Doan
econds. > > > > My impression was that Spark is aimed at larger scale analytics. > > > > I am ok with the limitation on “group by”. I am intending to use async > queries and token-aware load balancing to partition the query and execute > it in parallel on each nod

Re: Order by for aggregated values

2017-06-06 Thread DuyHai Doan
First, GROUP BY is only allowed on partition keys and clustering columns, not on arbitrary columns. The internal implementation of GROUP BY tries to fetch data in clustering order to avoid having to "re-sort" them in memory, which would be very expensive. Second, GROUP BY works best when restricted to

Re: Reg:- Generate dummy data for Cassandra Tables

2017-06-04 Thread DuyHai Doan
Personally I'm using https://github.com/Marak/faker.js/ to generate various kind of dataset. That's the most comprehensive "free" data generator I've found so far but it's in JS. On Mon, Jun 5, 2017 at 7:13 AM, Jeff Jirsa wrote: > On 2017-06-04 20:03 (-0700), "@Nandan@" > wrote: > > Hi All, >

Re: Apache Cassandra - Configuration Management

2017-05-17 Thread DuyHai Doan
For configuration management there are tons of tools out there: - ansible - chef - puppet - saltstack I surely forgot a few others On Wed, May 17, 2017 at 6:33 PM, ZAIDI, ASAD A wrote: > Good Morning Folks – > > > > I’m running 14 nodes Cassandra cluster in two data centers , each node is > h

Re: Reg:- DSE 5.1.0 Issue

2017-05-16 Thread DuyHai Doan
Nandan Since you have asked many times questions about DSE on this OSS mailing list, I suggest you to contact directly Datastax if you're using their enterprise edition. Every Datastax customer has access to their support. If you're a sub-contractor for a final customer that is using DSE, ask your

Re: Testing Stratio Index Queries with Cassandra-Stress Tool

2017-04-25 Thread DuyHai Doan
Use Gatling with the CQL plugin: https://github.com/gatling-cql/GatlingCql On Tue, Apr 25, 2017 at 2:36 PM, Akshay Suresh < akshay.sur...@unotechsoft.com> wrote: > Hi > > I have a set of tables with Stratio Index. > > Is there anyway to test Stratio based SELECT queries using the > cassandra-stre

Re: sasi index question (read timeout on many selects)

2017-02-16 Thread DuyHai Doan
Using MV and put id as partition key is your best bet right now. SASI will be too expensive for this simple use case On Thu, Feb 16, 2017 at 3:21 PM, Micha wrote: > > > it's like having a table (sha256 blob primary key, id timeuuid, data1 > text, ., ) > > So both, sha256 and id are unique. >

Re: sasi index question (read timeout on many selects)

2017-02-16 Thread DuyHai Doan
[image: Inline image 1] On Thu, Feb 16, 2017 at 3:08 PM, Micha wrote: > > > On 16.02.2017 14:30, DuyHai Doan wrote: > > Why indexing BLOB data ? It does not make any sense > > My partition key is a secure hash sum, I don't index a blob. > > > > >

Re: sasi index question (read timeout on many selects)

2017-02-16 Thread DuyHai Doan
Why indexing BLOB data ? It does not make any sense "I thought sasi index is globally held, in contrast to the normal secondary index.." --> Who said that ? It's just wrong On Thu, Feb 16, 2017 at 1:50 PM, Micha wrote: > Hi, > > > my table has (among others) three columns, which are unique blob

Re: Time series data model and tombstones

2017-02-08 Thread DuyHai Doan
now, and things are stable so far. I also had a patch to > the application code to implement date partitioning ready to go, but I > wanted to see how things went with only making the compaction changes. > > On Sun, Jan 29, 2017 at 4:05 PM, DuyHai Doan wrote: > >> In theory, you

Re: Why does CockroachDB github website say Cassandra has no Availability on datacenter failure?

2017-02-07 Thread DuyHai Doan
The link you posted doesn't say anything about Cassandra On 7 Feb 2017 11:41, "Kant Kodali" wrote: > Why does CockroachDB github website say Cassandra has no Availability on > datacenter failure? > > https://github.com/cockroachdb/cockroach >

Re: Global TTL vs Insert TTL

2017-02-01 Thread DuyHai Doan
rustyrazorblade ~/dev/cassandra/data/data/test$ > ../../../tools/bin/sstablemetadata a-7bca6b50e8a511e6869a5596edf4dd > 35/mc-1-big-Data.db > . > SSTable max local deletion time: 1485980862 > > On Wed, Feb 1, 2017 at 6:59 AM DuyHai Doan wrote: > >> Global TTL is better than dynamic

Re: Global TTL vs Insert TTL

2017-02-01 Thread DuyHai Doan
Global TTL is better than dynamic runtime TTL. Why? Because Global TTL is a table property and Cassandra can perform optimization when compacting. For example, if it can see that the maxTimestamp of an SSTable is older than the table Global TTL, the SSTable can be entirely dropped during compact

Re: Time series data model and tombstones

2017-01-29 Thread DuyHai Doan
enarios in which Cassandra will have to read cells where time < 50? In > particular I am wondering if compression might have any affect. > > On Sun, Jan 29, 2017 at 3:01 PM DuyHai Doan wrote: > >> "Should the data be sorted by my time column regardless of the >>

Re: Time series data model and tombstones

2017-01-29 Thread DuyHai Doan
en able to review some SSTables with > sstablemetadata and I can see that old/expired data is definitely living > with live data. > > > On Sun, Jan 29, 2017 at 2:38 PM, DuyHai Doan wrote: > >> Ok so give it a try with TWCS. Since STCS does not sort data based on >> t

Re: Time series data model and tombstones

2017-01-29 Thread DuyHai Doan
has to scan over a lot tombstones to fetch the correct range of data thus your issue On Sun, Jan 29, 2017 at 8:19 PM, John Sanda wrote: > It was with STCS. It was on a 2.x version before TWCS was available. > > On Sun, Jan 29, 2017 at 10:58 AM DuyHai Doan wrote: > >> Did you get

Re: Time series data model and tombstones

2017-01-29 Thread DuyHai Doan
Did you get this overwhelming tombstone behavior with STCS or with TWCS? If you're using DTCS, beware of its weird behavior and tricky configuration. On Sun, Jan 29, 2017 at 3:52 PM, John Sanda wrote: > Your partitioning key is text. If you have multiple entries per id you are >> likely hitti
