Re: Suggestions for migrating data from cassandra

2018-05-16 Thread Jing Meng
We would try migration for some small keyspaces (with several gigabytes of data across a DC) first, but ultimately migration of several large keyspaces, with data sizes ranging from 100G to 5T and some tables holding >1T of data, would be scheduled too. As for StreamSets/Talend, personally I doubt if usin

Re: Suggestions for migrating data from cassandra

2018-05-15 Thread Joseph Arriola
Hi Jing. How much information do you need to migrate, in volume and number of tables? With Spark you could do the following: - Read the data and export directly to MySQL. - Read the data, export it to CSV files, and then load them into MySQL. You could also use other paths such as: - StreamSets

Re: Suggestions for migrating data from cassandra

2018-05-15 Thread Arbab Khalil
Both C* and MySQL support is available in Spark. For C*, datastax:spark-cassandra-connector is needed. It is very simple to read and write data in Spark. To read a C* table use: df = spark.read.format("org.apache.spark.sql.cassandra").options(keyspace='test', table='test_table').load() a
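
A minimal sketch of the read-then-write path described in this message, assuming PySpark with the Cassandra connector and a MySQL JDBC driver on the classpath. The helper names (cassandra_options, mysql_options, copy_table), the host, port 3306, the driver class and the batch size are illustrative assumptions, not something stated in the thread; only the Cassandra read call is taken from the message itself.

```python
def cassandra_options(keyspace, table):
    """Options for the Cassandra source, as in the read call quoted above."""
    return {"keyspace": keyspace, "table": table}


def mysql_options(host, database, table, user, password):
    """Options for Spark's built-in JDBC writer (all values are assumptions)."""
    return {
        # rewriteBatchedStatements is a Connector/J URL property for batched inserts.
        "url": f"jdbc:mysql://{host}:3306/{database}?rewriteBatchedStatements=true",
        "dbtable": table,
        "user": user,
        "password": password,
        # Assumed Connector/J 5.x class; newer drivers use com.mysql.cj.jdbc.Driver.
        "driver": "com.mysql.jdbc.Driver",
        # Rows per JDBC batch; tune for the target server.
        "batchsize": "5000",
    }


def copy_table(spark, source, target):
    """Read one Cassandra table into a DataFrame and append it to a MySQL table."""
    df = spark.read.format("org.apache.spark.sql.cassandra").options(**source).load()
    df.write.format("jdbc").options(**target).mode("append").save()
```

With an existing SparkSession named spark, a call might look like copy_table(spark, cassandra_options("test", "test_table"), mysql_options("mysql-host", "test", "test_table", "user", "secret")). The target table is assumed to exist already with a compatible schema, and for the larger tables mentioned in the thread the read throughput would likely need to be throttled so the overloaded cluster is not pushed further.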

Re: Suggestions for migrating data from cassandra

2018-05-15 Thread kurt greaves
COPY might work but over hundreds of gigabytes you'll probably run into issues if you're overloaded. If you've got access to Spark that would be an efficient way to pull down an entire table and dump it out using the spark-cassandra-connector. On 15 May 2018 at 10:59, Jing Meng wrote: > Hi guys,

Re: Suggestions for migrating data from cassandra

2018-05-15 Thread Michael Dykman
I don't know that there are any projects out there addressing this but I advise you to study LOAD ... INFILE in the MySQL manual specific to your target version. It basically describes a CSV format, where a given file represents a subset of data for a specific table. It is far and away the fastest
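
As a rough illustration of the CSV-per-table approach described in this message, the sketch below writes rows in a form that LOAD DATA can read and builds the matching statement. It uses only the Python standard library; the helper names, the file path and the column names are hypothetical, and the exact clauses should be checked against the manual for the target MySQL version, as the message advises.

```python
import csv
import io


def to_mysql_field(value):
    """Render one value: None becomes \\N (NULL), backslashes are doubled."""
    if value is None:
        return r"\N"
    return str(value).replace("\\", "\\\\")


def write_csv(rows, out):
    """Write rows as comma-separated lines, quoting only where needed."""
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        writer.writerow([to_mysql_field(v) for v in row])


def load_statement(path, table, columns):
    """Build a LOAD DATA statement matching the format produced by write_csv."""
    cols = ", ".join(f"`{c}`" for c in columns)
    return (
        f"LOAD DATA LOCAL INFILE '{path}' INTO TABLE `{table}` "
        "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' ESCAPED BY '\\\\' "
        "LINES TERMINATED BY '\\n' "
        f"({cols})"
    )


if __name__ == "__main__":
    buf = io.StringIO()
    write_csv([(1, "a,b", None)], buf)
    print(buf.getvalue(), end="")
    print(load_statement("/tmp/test_table.csv", "test_table", ["id", "name", "note"]))
```

Splitting a large table into several such files, each holding a subset of the rows, keeps individual loads restartable. Whether LOCAL is permitted depends on server and client settings, so that keyword is also an assumption here.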

Suggestions for migrating data from cassandra

2018-05-15 Thread Jing Meng
Hi guys, for some historical reason, our Cassandra cluster is currently overloaded, and operating on it has become a nightmare. Anyway, (sadly) we're planning to migrate the Cassandra data back to MySQL... So we're not quite clear how to migrate the historical data from Cassandra. While as I