Re: *URGENT* Migration across different Cassandra cluster few having same keyspace/table names

2020-01-17 Thread Erick Ramirez
> > *Thanks but there’s no DSE License.* FWIW it was announced just before Christmas that both DSBulk (DataStax Bulk Loader) and the DataStax Apache Kafka connector are now freely available to all developers and will work with open-source Apache Cassandra. For details, see https://www.datast

Re: [EXTERNAL] Re: *URGENT* Migration across different Cassandra cluster few having same keyspace/table names

2020-01-17 Thread Dor Laor
Another option instead of raw sstables is to use the Spark Migrator [1]. It reads a source cluster, can make some transformations (like table/column naming) and writes to a target cluster. It's a very convenient tool, OSS and free of charge. [1] https://github.com/scylladb/scylla-migrator On Fri,
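
As a sketch of how the migrator is typically launched (class name and config key as documented in the project README; verify against the version you use, and the config.yaml describes source and target clusters):

    spark-submit --class com.scylladb.migrator.Migrator \
      --master spark://spark-master:7077 \
      --conf spark.scylla.config=config.yaml \
      scylla-migrator-assembly.jar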

Re: [EXTERNAL] Re: *URGENT* Migration across different Cassandra cluster few having same keyspace/table names

2020-01-17 Thread Erick Ramirez
> > > *In terms of speed, the sstableloader should be faster, correct? Maybe the DSE BulkLoader finds application when you want a slice of the data and not the entire cake. Is it correct?* There's no real direct comparison because DSBulk is designed for operating on data in CSV or JSON as a rep
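
For illustration only, a typical dsbulk round trip goes through CSV or JSON files on disk; a minimal sketch, with hosts, keyspace, and table names as placeholders:

    # unload a table to CSV files under /tmp/export
    dsbulk unload -h 10.0.0.1 -k my_keyspace -t my_table -url /tmp/export

    # load those CSV files into the same table on another cluster
    dsbulk load -h 10.0.1.1 -k my_keyspace -t my_table -url /tmp/export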

Re: [EXTERNAL] Re: *URGENT* Migration across different Cassandra cluster few having same keyspace/table names

2020-01-17 Thread Sergio
Hi everyone, Is the DSE BulkLoader faster than the sstableloader? Sometimes I need to take a cluster snapshot and replicate Cluster A to a Cluster B that has lower performance capabilities but the same data size. In terms of speed, the sstableloader should be faster, correct? Maybe the DSE BulkLo

RE: [EXTERNAL] Re: COPY command with where condition

2020-01-17 Thread Durity, Sean R
sstablekeys (in the tools directory?) can extract the actual keys from your sstables. You have to run it on each node and then combine and de-dupe the final results, but I have used this technique with a query generator to extract data more efficiently. Sean Durity From: Chris Splinter Sent:
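
A rough sketch of that technique (data paths and the tool's location vary by install; merging is assumed to happen after copying the per-node files to one machine):

    # on each node: dump the partition keys of every sstable of the table
    for f in /var/lib/cassandra/data/my_keyspace/my_table-*/*-Data.db; do
        sstablekeys "$f"
    done > keys_$(hostname).txt

    # on one machine, after collecting all per-node files: combine and de-dupe
    sort -u keys_*.txt > all_keys.txt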

RE: [EXTERNAL] Re: *URGENT* Migration across different Cassandra cluster few having same keyspace/table names

2020-01-17 Thread Durity, Sean R
Not sure what you mean by “online” migration. You can load data into the same-name table in cluster B. If the primary keys match, data will be overwritten (effectively, not actually on disk). I think you can pipe the output of a dsbulk unload to a dsbulk load and make the data transfer very quick
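
A sketch of that pipe, relying on dsbulk writing to stdout on unload and reading from stdin on load when no -url is given (host names are placeholders):

    dsbulk unload -h cluster_a_node -k my_keyspace -t my_table | \
      dsbulk load -h cluster_b_node -k my_keyspace -t my_table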

Re: COPY command with where condition

2020-01-17 Thread Chris Splinter
Do you know your partition keys? One option could be to enumerate that list of partition keys in separate commands to make the individual operations less expensive for the cluster. For example: say your partition key column is called id and the ids in your database are [1,2,3]. You could do ./dsbulk
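
Continuing that example as a sketch, with the table name taken from this thread and the ids hard-coded for illustration:

    # one single-partition unload per id keeps each query cheap
    for id in 1 2 3; do
        ./dsbulk unload \
          -query "SELECT * FROM dev_keyspace.probe_sensors WHERE id = $id" \
          -url /home/dump/$id
    done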

RE: COPY command with where condition

2020-01-17 Thread adrien ruffie
I don't really know for the moment in the production environment, but for the development environment the table contains more than 10,000,000 rows. But we need just a subset of this table, not the entirety ... From: Chris Splinter Sent: Friday, 17 January 2020 17:

Re: [EXTERNAL] Re: *URGENT* Migration across different Cassandra cluster few having same keyspace/table names

2020-01-17 Thread Ankit Gadhiya
Hi Sean, You've got all valid points. Please see my answers below -
1. The reason we want to move from 'A' to 'B' is to get rid of the 'A' Azure region completely.
2. The cluster names in 'A' and 'B' are different.
3. DSBulk - Is there any way I can do an online migration? - I still need to get clarity on whethe

Re: COPY command with where condition

2020-01-17 Thread Michael Shuler
On 1/17/20 9:50 AM, adrien ruffie wrote: Thank you very much, so I do this request with for example --> ./dsbulk unload --dsbulk.schema.keyspace 'dev_keyspace' -query "SELECT * FROM probe_sensors WHERE localisation_id = 208812 ALLOW FILTERING" -url /home/dump But I get the following error

Re: COPY command with where condition

2020-01-17 Thread Chris Splinter
What you are seeing there is a standard read timeout. How many rows do you expect back from that query? On Fri, Jan 17, 2020 at 9:50 AM adrien ruffie wrote: > Thank you very much, > > so I do this request with for example --> > > ./dsbulk unload --dsbulk.schema.keyspace 'dev_keyspace' -query "S
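
Not from this thread, but a common mitigation when such an unload legitimately returns many rows is to raise the driver request timeout; recent dsbulk releases pass Java driver settings straight through on the command line (verify the exact setting against your version's docs):

    ./dsbulk unload --dsbulk.schema.keyspace 'dev_keyspace' \
      -query "SELECT * FROM probe_sensors WHERE localisation_id = 208812 ALLOW FILTERING" \
      -url /home/dump \
      --datastax-java-driver.basic.request.timeout "5 minutes"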

RE: COPY command with where condition

2020-01-17 Thread adrien ruffie
Thank you very much, so I do this request with for example --> ./dsbulk unload --dsbulk.schema.keyspace 'dev_keyspace' -query "SELECT * FROM probe_sensors WHERE localisation_id = 208812 ALLOW FILTERING" -url /home/dump But I get the following error com.datastax.dsbulk.executor.api.exception.B

Re: [EXTERNAL] Re: *URGENT* Migration across different Cassandra cluster few having same keyspace/table names

2020-01-17 Thread Jeff Jirsa
The migration requirements are impossible given the current state of the database. You probably can’t join two distinct clusters without app changes and without downtime unless you’re very lucky (same cluster name, app using quorum but not local quorum, both clusters using NetworkTopologyStrategy

Re: COPY command with where condition

2020-01-17 Thread Chris Splinter
DSBulk has an option that lets you specify the query (including a WHERE clause). See Example 19 in this blog post for details: https://www.datastax.com/blog/2019/06/datastax-bulk-loader-unloading On Fri, Jan 17, 2020 at 7:34 AM Jean Tremblay <jean.tremb...@zen-innovations.com> wrote: > Did you
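
A sketch of such a filtered unload for the table discussed in this thread, assuming localisation_id is the partition key (if it is not, the query needs ALLOW FILTERING, as later messages in the thread show):

    ./dsbulk unload \
      -query "SELECT * FROM dev_keyspace.probe_sensors WHERE localisation_id = 208812" \
      -url /home/dump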

RE: [EXTERNAL] Re: *URGENT* Migration across different Cassandra cluster few having same keyspace/table names

2020-01-17 Thread Durity, Sean R
A couple of things to consider:
* A separation of apps into their own clusters is typically a better model to avoid later entanglements.
* DSBulk (1.4.1) is now also available for open-source clusters. It is a great tool for unloading/loading.
* What data problem are you trying to solve w

Re: *URGENT* Migration across different Cassandra cluster few having same keyspace/table names

2020-01-17 Thread Ankit Gadhiya
Hi Upasana, Thanks for your response. I’d love to do that as a first strategy, but since they are both separate clusters, how would I do that? The keyspaces already have NetworkTopologyStrategy with RF=3. — Ankit On Fri, Jan 17, 2020 at 8:45 AM Upasana Sharma <028upasana...@gmail.com> wrote: > Hi,

Re: *URGENT* Migration across different Cassandra cluster few having same keyspace/table names

2020-01-17 Thread Upasana Sharma
Hi, Did you consider adding the Cassandra nodes from cluster B into cluster A as a different data center? Your keyspace would then be using NetworkTopologyStrategy. In this case, all data can be synced between both data centers by Cassandra using rebalancing. At the client/application level you
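
A rough outline of that approach (keyspace and data center names are placeholders; each keyspace being migrated gets the same treatment):

    -- replicate the keyspace into the new data center
    ALTER KEYSPACE my_keyspace
      WITH replication = {'class': 'NetworkTopologyStrategy',
                          'DC_A': 3, 'DC_B': 3};

    # then, on every node in the new data center, stream the existing data:
    nodetool rebuild -- DC_A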

Re: COPY command with where condition

2020-01-17 Thread Jean Tremblay
Did you think about using a Materialised View to generate what you want to keep, and then use DSBulk to extract the data? > On 17 Jan 2020, at 14:30, adrien ruffie wrote: > > Sorry, I come back to a quick question about the bulk loader ... > > https://www.datastax.com/blog/2018/05/introducing-
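
For illustration, with an assumed base table dev_keyspace.probe_sensors keyed by sensor_id, a view re-keyed by localisation_id would let dsbulk read one partition instead of filtering the whole table:

    CREATE MATERIALIZED VIEW dev_keyspace.sensors_by_localisation AS
      SELECT * FROM dev_keyspace.probe_sensors
      WHERE localisation_id IS NOT NULL AND sensor_id IS NOT NULL
      PRIMARY KEY (localisation_id, sensor_id);

    -- then extract just the wanted slice, no ALLOW FILTERING needed:
    -- ./dsbulk unload -query "SELECT * FROM dev_keyspace.sensors_by_localisation
    --                         WHERE localisation_id = 208812" -url /home/dump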

RE: COPY command with where condition

2020-01-17 Thread adrien ruffie
Sorry, I come back to a quick question about the bulk loader ... https://www.datastax.com/blog/2018/05/introducing-datastax-bulk-loader I read this: "Operations such as converting strings to lowercase, arithmetic on input columns, or filtering out rows based on some criteria, are not supported.

Re: *URGENT* Migration across different Cassandra cluster few having same keyspace/table names

2020-01-17 Thread Ankit Gadhiya
Thanks but there’s no DSE License. Wondering how sstableloader will help as some of the keyspace and table names are the same. Also, how do I sync a few system keyspaces? Thanks & Regards, Ankit On Fri, Jan 17, 2020 at 1:11 AM Vova Shelgunov wrote: > Loader* > > https://www.datastax.com/blog/2018/05

RE: COPY command with where condition

2020-01-17 Thread adrien ruffie
Thanks a lot! It's good news for DSBulk! I will take a look at this solution. Best regards, Adrian From: Erick Ramirez Sent: Friday, 17 January 2020 10:02 To: user@cassandra.apache.org Subject: Re: COPY command does

Re: Cassandra failing with "Local host name unknown" even when specifying IP's for listen and rpc addresses

2020-01-17 Thread Erick Ramirez
FWIW there was a long discussion on ASF Slack about this topic earlier this week (starting here) with driftx, exlt & myself, and the recommendation was to make the hostname resolve locally as best practice. Cheers! On Wed, Jan 15, 202
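
A minimal illustration of that recommendation (hostname and IP invented for the example): make the machine's own hostname resolve via /etc/hosts.

    # /etc/hosts on the Cassandra node itself
    127.0.0.1    localhost
    10.20.30.40  cassandra-node1.example.com cassandra-node1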

Re: COPY command with where condition

2020-01-17 Thread Erick Ramirez
The COPY command doesn't support filtering and it doesn't perform well for large tables. Have you considered the DSBulk tool from DataStax? Previously, it only worked with DataStax Enterprise but a few weeks ago, it was made free and works with open-source Apache Cassandra. For details, see this b
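
To make the limitation concrete: cqlsh's COPY TO can only export a whole table, with no WHERE clause (table name reused from the related thread):

    -- exports every row; filtering must happen elsewhere
    COPY dev_keyspace.probe_sensors TO '/home/dump/probe_sensors.csv' WITH HEADER = true;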