Learn Apache Cassandra® 5.0 Data Modeling

2025-04-07 Thread Patrick McFadin
Tomorrow, I'm starting another data modeling series that assumes you're starting with version 5. The last time I did something this comprehensive was for Cassandra 3. Needless to say, there have been a LOT of updates since then. There will be five parts, and each one has its own signup for the

RE: data modeling qu: use a Map datatype, or just simple rows... ?

2020-10-01 Thread Durity, Sean R
-up would require a separate process, too. I don’t think you can expire rows within a map column using TTL. Sean Durity From: Rahul Singh Sent: Saturday, September 19, 2020 10:41 AM To: user@cassandra.apache.org; Attila Wind Subject: [EXTERNAL] Re: data modeling qu: use a Map datatype, or just
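One nuance worth noting on the TTL point: in CQL, each map entry is stored as its own cell, so a TTL applied when writing an entry expires just that entry. A hedged sketch (table and column names are hypothetical):

```cql
-- Each map entry is its own cell, so a per-write TTL
-- expires only the entry being written:
UPDATE user_sessions USING TTL 86400
SET tokens['device-1'] = 'abc123'
WHERE user_id = 5;
```

Expiring on a schedule derived from some other condition, as discussed above, would still need a separate process.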

Re: data modeling qu: use a Map datatype, or just simple rows... ?

2020-09-19 Thread Rahul Singh
with natural string keys like “email.” Best regards, Rahul Singh From: Sagar Jambhulkar Sent: Saturday, September 19, 2020 6:45:25 AM To: user@cassandra.apache.org ; Attila Wind Subject: Re: data modeling qu: use a Map datatype, or just simple rows... ? Don't really see a difference in two op

Re: data modeling qu: use a Map datatype, or just simple rows... ?

2020-09-19 Thread Sagar Jambhulkar
Don't really see a difference in two options. Won't the partitioner run on user id and create a hash for you? Unless your hash function is better than partitioner. On Fri, 18 Sep 2020, 21:33 Attila Wind, wrote: > Hey guys, > > I'm curious about your experiences r

Re: data modeling qu: use a Map datatype, or just simple rows... ?

2020-09-18 Thread onmstester onmstester
wrote Hey guys, I'm curious about your experiences regarding a data modeling question we are facing with. At the moment we see 2 major different approaches in terms of how to build the tables But I'm googling around already for days with no luck to find any usefu

data modeling qu: use a Map datatype, or just simple rows... ?

2020-09-18 Thread Attila Wind
Hey guys, I'm curious about your experiences regarding a data modeling question we are facing with. At the moment we see 2 major different approaches in terms of how to build the tables But I'm googling around already for days with no luck to find any useful material explaining t
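The two approaches being compared can be sketched in CQL roughly as follows (hypothetical names; a sketch, not a recommendation):

```cql
-- Option A: one row per user, attributes collected in a map
CREATE TABLE user_attrs_map (
    user_id    bigint PRIMARY KEY,
    attrs      map<text, text>
);

-- Option B: one clustering row per attribute
CREATE TABLE user_attrs_rows (
    user_id    bigint,
    attr_name  text,
    attr_value text,
    PRIMARY KEY (user_id, attr_name)
);
```

Both keep a user's attributes in a single partition; the map is read as a whole, while clustering rows can be sliced, paged, and given per-row TTLs.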

Re: data modeling appointment scheduling

2018-11-04 Thread Jonathan Haddad
Well, generally speaking I like to understand the problem before trying to fit a solution. If you're looking to set up millions of appointments for a business, that might qualify for some amount of partitioning / bucketing. That said, you might be better off using time based buckets, say monthly o

Re: data modeling appointment scheduling

2018-11-04 Thread I PVP
For people (invitees), you are correct. They will not have millions of appointments. But the organizer is a business.. a chain of businesses (Franchisor and Franchisees) that together across the country have tens of thousands of appointments per day. Do you suggest removing the bucket, making

Re: data modeling appointment scheduling

2018-11-04 Thread Jonathan Haddad
Maybe I’m missing something, but it seems to me that the bucket might be a little overkill for a scheduling system. Do you expect people to have millions of appointments? On Sun, Nov 4, 2018 at 12:46 PM I PVP wrote: > Could you please provide advice on the modeling approach for the following >

data modeling appointment scheduling

2018-11-04 Thread I PVP
Could you please provide advice on the modeling approach for the following appointment scheduling scenario? I am struggling to model in a way that satisfies the requirement to be able to update an appointment, especially to be able to change the start datetime and consequently the buc
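One possible shape for the table under discussion, with a monthly time bucket in the partition key (all names hypothetical). Rescheduling then becomes a delete plus an insert, which a logged batch can keep atomic even when the new start time lands in a different bucket:

```cql
CREATE TABLE appointments_by_organizer (
    organizer_id   uuid,
    month_bucket   text,      -- e.g. '2018-11'
    start_time     timestamp,
    appointment_id uuid,
    invitee_id     uuid,
    PRIMARY KEY ((organizer_id, month_bucket), start_time, appointment_id)
);

-- Changing the start datetime (and hence possibly the bucket):
BEGIN BATCH
  DELETE FROM appointments_by_organizer
   WHERE organizer_id = ? AND month_bucket = '2018-11'
     AND start_time = ? AND appointment_id = ?;
  INSERT INTO appointments_by_organizer
         (organizer_id, month_bucket, start_time, appointment_id, invitee_id)
  VALUES (?, '2018-12', ?, ?, ?);
APPLY BATCH;
```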

Re: C* data modeling for time series

2018-06-18 Thread mm
meaning C* will read all the data and then filter for time. Spark jobs runs for hours even for smaller time frames. what is the right approach for data modeling for such queries?. I want to get a general idea of things to look for when modeling such data. really appreciate all the help from th

Re: C* data modeling for time series

2018-06-18 Thread Affan Syed
understanding this runs a full table scan. as shown in spark UI >> (from DAG visualization "Scan org.apache.spark.sql.cassandra >> .CassandraSourceRelation@32bb7d65") meaning C* will read all the data >> and then filter for time. Spark jobs runs for hours even for smaller time >> frames. >> >> what is the right approach for data modeling for such queries?. I want to >> get a general idea of things to look for when modeling such data. >> really appreciate all the help from this community :). if you need any >> extra details please ask me here. >> >> Regards, >> Junaid >> >> >> > >

Reg:- limitation as PROS and CONS of Using Collections in Data modeling

2018-01-01 Thread @Nandan@
Hi All, I want to know what the limitations are when using Collections such as SET, LIST, MAP. In my case, while inserting video details, I have to insert titles per language, such as Language:- English Title:- Video Name Language:- Hindi Title:- Video_name in Hindi Language:- Chinese T
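For the per-language title case, a map keyed by language is a natural fit, with the usual collection caveats: a collection is read in full, cannot be paged independently, and is intended for small, bounded sets. A hedged sketch (hypothetical names):

```cql
CREATE TABLE videos (
    video_id uuid PRIMARY KEY,
    titles   map<text, text>   -- language -> title
);

UPDATE videos SET titles['English'] = 'Video Name'          WHERE video_id = ?;
UPDATE videos SET titles['Hindi']   = 'Video_name in Hindi' WHERE video_id = ?;
```

A large or unbounded number of languages would argue for a clustering column instead of a map.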

Re: C* data modeling for time series

2017-07-26 Thread Jeff Jirsa
a full table scan. as shown in spark UI > (from DAG visualization "Scan > org.apache.spark.sql.cassandra.CassandraSourceRelation@32bb7d65") meaning > C* will read all the data and then filter for time. Spark jobs runs for > hours even for smaller time frames. > > what is the right approach for data modeling for s

Re: C* data modeling for time series

2017-07-26 Thread CPC
d65") meaning C* will read all the data and > then filter for time. Spark jobs runs for hours even for smaller time > frames. > > what is the right approach for data modeling for such queries?. I want to > get a general idea of things to look for when modeling such data. > really appreciate all the help from this community :). if you need any > extra details please ask me here. > > Regards, > Junaid > > >

Re: C* data modeling for time series

2017-07-26 Thread Junaid Nasir
a full table scan. as shown in spark UI > (from DAG visualization "Scan org.apache.spark.sql.cassandra > .CassandraSourceRelation@32bb7d65") meaning C* will read all the data and > then filter for time. Spark jobs runs for hours even for smaller time > frames. > > what is t

Re: C* data modeling for time series

2017-07-26 Thread CPC
from DAG visualization "Scan org.apache.spark.sql.cassandra. CassandraSourceRelation@32bb7d65") meaning C* will read all the data and then filter for time. Spark jobs runs for hours even for smaller time frames. what is the right approach for data modeling for such queries?. I want to get a g

C* data modeling for time series

2017-07-26 Thread Junaid Nasir
er for time. Spark jobs runs for hours even for smaller time frames. what is the right approach for data modeling for such queries?. I want to get a general idea of things to look for when modeling such data. really appreciate all the help from this community :). if you need any extra details
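The fix this thread converges on is to put a time bucket in the partition key, so a time-range query touches one or a few partitions instead of scanning the table. A hedged CQL sketch (hypothetical names):

```cql
CREATE TABLE metrics_by_day (
    sensor_id text,
    day       date,       -- time bucket, part of the partition key
    ts        timestamp,
    value     double,
    PRIMARY KEY ((sensor_id, day), ts)
) WITH CLUSTERING ORDER BY (ts DESC);

-- Served from a single partition; no full scan, in Spark or otherwise:
SELECT ts, value FROM metrics_by_day
 WHERE sensor_id = 's1' AND day = '2017-07-26'
   AND ts >= '2017-07-26 00:00:00' AND ts < '2017-07-26 06:00:00';
```

Queries spanning many days fan out to one partition per day, which is still far cheaper than a full table scan.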

Re: Help on temporal data modeling

2016-09-23 Thread Peter Lin
yes it would. Whether next_billing_date is timestamp or date wouldn't make any difference on scanning all partitions. If you want them to be on the same node, you can use a composite key, but there's a trade off. The nodes may get unbalanced, so you have to do the math to figure out if your specif

Re: Help on temporal data modeling

2016-09-23 Thread Denis Mikhaylov
> health insurance is with effective/expiration day. Commonly called > bi-temporal data modeling. > > How people model bi-temporal models varies quite a bit from first hand > experience, but the common thing is to have transaction timestamp, effective > day and expiration day. Thi

Re: Help on temporal data modeling

2016-09-23 Thread Peter Lin
Ignoring noSql for a minute, the standard way of modeling this in car and health insurance is with effective/expiration day. Commonly called bi-temporal data modeling. How people model bi-temporal models varies quite a bit from first hand experience, but the common thing is to have transaction

Re: Help on temporal data modeling

2016-09-23 Thread Alain RODRIGUEZ
Hi Denis, You might want to have a look at - Materialized views http://www.datastax.com/dev/blog/new-in-cassandra-3-0-materialized-views - Secondary index https://docs.datastax.com/en/cql/3.3/cql/cql_using/useWhenIndex.html My 2 cents: make sure to understand the implications before moving forwa

Help on temporal data modeling

2016-09-23 Thread Denis Mikhaylov
Hi! I have a question regarding data modelling. Let’s say that I have a `subscriptions` table with two columns `subscription_id text` and `next_billing_date timestamp`. How do I model a table to efficiently query all subscriptions due today (something like `where next_billing_date <= today`)
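The usual Cassandra answer is to make the query's predicate the partition key. A hedged sketch (hypothetical names):

```cql
CREATE TABLE subscriptions_by_billing_date (
    billing_date    date,   -- the day billing is due
    subscription_id text,
    PRIMARY KEY (billing_date, subscription_id)
);

SELECT subscription_id
  FROM subscriptions_by_billing_date
 WHERE billing_date = '2016-09-23';
```

The trade-offs raised in the thread apply: the row must be moved when next_billing_date changes, `<= today` becomes one query per day (plus a catch-up pass over past days), and a single day is a single partition, which can run hot.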

Re: Cassandra data modeling for a social network

2016-05-31 Thread vincent gromakowski
Or use graphframes (Spark) over cassandra to store separately a graph of users and followers and next a table of tweet. You will be able to join data between those 2 structures using spark. 2016-05-31 14:27 GMT+02:00 : > Hello, > > >* First, Is this data modeling correct for

RE: Cassandra data modeling for a social network

2016-05-31 Thread aeljami.ext
Hello, > First, Is this data modeling correct for follow base (follower, following actions) social network? For social network, I advise you to see Graph Databases, over Cassandra Example : https://academy.datastax.com/resources/getting-started-graph-databases De : Mohammad Kerm

Cassandra data modeling for a social network

2016-05-30 Thread Mohammad Kermani
We are using Cassandra for our social network and we are designing/data modeling tables we need, it is confusing for us and we don't know how to design some tables and we have some little problems! *As we understood for every query we have to have different tables*, and for example user
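For the follower/following case, the table-per-query principle mentioned above leads to a layout like this (hypothetical names; every follow action writes to both tables):

```cql
CREATE TABLE followers (       -- who follows this user?
    user_id     uuid,
    follower_id uuid,
    PRIMARY KEY (user_id, follower_id)
);

CREATE TABLE following (       -- whom does this user follow?
    user_id     uuid,
    followed_id uuid,
    PRIMARY KEY (user_id, followed_id)
);
```

Each list is a single-partition read, at the cost of double-writing each follow/unfollow.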

Re: Data Modeling: Partition Size and Query Efficiency

2016-01-06 Thread Jim Ancona
On Tue, Jan 5, 2016 at 5:52 PM, Jonathan Haddad wrote: > You could keep a "num_buckets" value associated with the client's account, > which can be adjusted accordingly as usage increases. > Yes, but the adjustment problem is tricky when there are multiple concurrent writers. What happens when yo

Re: Data Modeling: Partition Size and Query Efficiency

2016-01-05 Thread Jonathan Haddad
You could keep a "num_buckets" value associated with the client's account, which can be adjusted accordingly as usage increases. On Tue, Jan 5, 2016 at 2:17 PM Jim Ancona wrote: > On Tue, Jan 5, 2016 at 4:56 PM, Clint Martin < > clintlmar...@coolfiretechnologies.com> wrote: > >> What sort of dat

Re: Data Modeling: Partition Size and Query Efficiency

2016-01-05 Thread Jim Ancona
On Tue, Jan 5, 2016 at 4:56 PM, Clint Martin < clintlmar...@coolfiretechnologies.com> wrote: > What sort of data is your clustering key composed of? That might help some > in determining a way to achieve what you're looking for. > Just a UUID that acts as an object identifier. > > Clint > On Jan

Re: Data Modeling: Partition Size and Query Efficiency

2016-01-05 Thread Clint Martin
What sort of data is your clustering key composed of? That might help some in determining a way to achieve what you're looking for. Clint On Jan 5, 2016 2:28 PM, "Jim Ancona" wrote: > Hi Nate, > > Yes, I've been thinking about treating customers as either small or big, > where "small" ones have

Re: Data Modeling: Partition Size and Query Efficiency

2016-01-05 Thread Jim Ancona
Hi Nate, Yes, I've been thinking about treating customers as either small or big, where "small" ones have a single partition and big ones have 50 (or whatever number I need to keep sizes reasonable). There's still the problem of how to handle a small customer who becomes too big, but that will hap

Re: Data Modeling: Partition Size and Query Efficiency

2016-01-05 Thread Jim Ancona
Hi Jack, Thanks for your response. My answers inline... On Tue, Jan 5, 2016 at 11:52 AM, Jack Krupansky wrote: > Jim, I don't quite get why you think you would need to query 50 partitions > to return merely hundreds or thousands of rows. Please elaborate. I mean, > sure, for that extreme 100th

Re: Data Modeling: Partition Size and Query Efficiency

2016-01-05 Thread Nate McCall
> > > In this case, 99% of my data could fit in a single 50 MB partition. But if > I use the standard approach, I have to split my partitions into 50 pieces > to accommodate the largest data. That means that to query the 700 rows for > my median case, I have to read 50 partitions instead of one. >

Re: Data Modeling: Partition Size and Query Efficiency

2016-01-05 Thread Jack Krupansky
Jim, I don't quite get why you think you would need to query 50 partitions to return merely hundreds or thousands of rows. Please elaborate. I mean, sure, for that extreme 100th percentile, yes, you would query a lot of partitions, but for the 90th percentile it would be just one. Even the 99th per

Re: Data Modeling: Partition Size and Query Efficiency

2016-01-05 Thread Jim Ancona
Thanks for responding! My natural partition key is a customer id. Our customers have widely varying amounts of data. Since the vast majority of them have data that's small enough to fit in a single partition, I'd like to avoid imposing unnecessary overhead on the 99% just to avoid issues with the

Re: Data Modeling: Partition Size and Query Efficiency

2016-01-04 Thread Clint Martin
You should endeavor to use a repeatable method of segmenting your data. Swapping partitions every time you "fill one" seems like an anti pattern to me. but I suppose it really depends on what your primary key is. Can you share some more information on this? In the past I have utilized the consiste

Data Modeling: Partition Size and Query Efficiency

2016-01-04 Thread Jim Ancona
A problem that I have run into repeatedly when doing schema design is how to control partition size while still allowing for efficient multi-row queries. We want to limit partition size to some number between 10 and 100 megabytes to avoid operational issues. The standard way to do that is to figur
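The per-customer bucket-count idea discussed in this thread can be sketched as follows (hypothetical names; the bucket count for each customer is stored elsewhere, e.g. in an account table):

```cql
CREATE TABLE data_by_customer (
    customer_id text,
    bucket      int,      -- hash(object_id) % num_buckets for this customer
    object_id   timeuuid, -- the clustering key mentioned in the thread
    payload     blob,
    PRIMARY KEY ((customer_id, bucket), object_id)
);
```

A small customer uses num_buckets = 1 (one partition, one read); a large one uses 50, at the cost of 50 queries for a full read. The hard part, as the thread notes, is safely re-bucketing a customer whose data grows, especially with concurrent writers.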

RE: Data Modeling for 2.1 Cassandra

2015-04-30 Thread Peer, Oded
of parentheses ); INSERT INTO users(first_name, last_name, id) values (‘neha’, ‘dave’, 1); SELECT * FROM users where first_name = 'rob' and last_name = 'abb'; From: Neha Trivedi [mailto:nehajtriv...@gmail.com] Sent: Thursday, April 30, 2015 10:16 AM To: user@cassandra.apac

Data Modeling for 2.1 Cassandra

2015-04-30 Thread Neha Trivedi
Hello all, I was wondering which of the three data models described below is better in terms of performance. Seems 3 is good. *#1. log with 3 Index* CREATE TABLE log ( id int PRIMARY KEY, first_name set, last_name set, dob set ); CREATE INDEX log_firstname_index ON test.log

Re: Queries required before data modeling?

2015-01-06 Thread Srinivasa T N
wrote: > >> Hi All, >>I was just googling around and reading the various articles on data >> modeling in cassandra. All of them talk about working backwards, i.e., >> first know what type of queries you are going to make and select a right >> data model which can suppor

Re: Queries required before data modeling?

2015-01-06 Thread Ryan Svihla
e going to need to model in a scalable way. On Tue, Jan 6, 2015 at 11:47 AM, Srinivasa T N wrote: > Hi All, >I was just googling around and reading the various articles on data > modeling in cassandra. All of them talk about working backwards, i.e., > first know what type of qu

Re: Queries required before data modeling?

2015-01-06 Thread James Rothering
tional approach where you get whatever you want, but your performance may degrade when you do the costly joins. Regards, James On Tue, Jan 6, 2015 at 9:47 AM, Srinivasa T N wrote: > Hi All, >I was just googling around and reading the various articles on data > modeling in cassandra.

Queries required before data modeling?

2015-01-06 Thread Srinivasa T N
Hi All, I was just googling around and reading the various articles on data modeling in cassandra. All of them talk about working backwards, i.e., first know what type of queries you are going to make and select a right data model which can support those queries efficiently. But one thing I

Re: Data modeling for Pinterest-like application

2014-05-17 Thread ziju feng
View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Data-modeling-for-Pinterest-like-application-tp7594481p7594539.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.

Re: Data modeling for Pinterest-like application

2014-05-17 Thread DuyHai Doan
ds of user, if we also > denormalize the like board, everytime that pin is liked by another user we > would have to update the like count in thousands of like boards. > > Does normalize work better in this case or cassandra can handle this kind > of > write load? > > > >

Re: Data modeling for Pinterest-like application

2014-05-16 Thread ziju feng
://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Data-modeling-for-Pinterest-like-application-tp7594481p7594517.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.

Re: Data modeling for Pinterest-like application

2014-05-16 Thread DuyHai Doan
QRS pattern will help you to separate the write and read stages, also heavy unit and integration testing. On Fri, May 16, 2014 at 5:14 AM, ziju feng wrote: > Hello, > > I'm working on data modeling for a Pinterest-like project. There are > basically two main concepts: Pin a

Data modeling for Pinterest-like application

2014-05-16 Thread ziju feng
Hello, I'm working on data modeling for a Pinterest-like project. There are basically two main concepts: Pin and Board, just like Pinterest, where pin is an item containing an image, description and some other information such as a like count, and each board should contain a sorted list of
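A first-pass sketch of the two concepts (hypothetical names): pins are denormalized into each board for the sorted listing, while the frequently-changing like count is split into a counter table so a like does not have to touch thousands of boards:

```cql
CREATE TABLE pins_by_board (
    board_id    uuid,
    sort_key    timeuuid,
    pin_id      uuid,
    image_url   text,
    description text,
    PRIMARY KEY (board_id, sort_key)
);

CREATE TABLE pin_like_count (
    pin_id uuid PRIMARY KEY,
    likes  counter
);

UPDATE pin_like_count SET likes = likes + 1 WHERE pin_id = ?;
```

Reading a board then means one partition read plus per-pin counter lookups (or a periodically refreshed denormalized count).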

Re: Data modeling users table with CQL

2014-01-21 Thread Drew Kutcharian
You’re right. I didn’t catch that. No need to have email in the PRIMARY KEY. On Jan 21, 2014, at 5:11 PM, Jon Ribbens wrote: > On Tue, Jan 21, 2014 at 10:40:39AM -0800, Drew Kutcharian wrote: >> Thanks, I was actually thinking of doing that. Something along the lines >> of >> CREATE TABLE

Re: Data modeling users table with CQL

2014-01-21 Thread Jon Ribbens
On Tue, Jan 21, 2014 at 10:40:39AM -0800, Drew Kutcharian wrote: >Thanks, I was actually thinking of doing that. Something along the lines >of >CREATE TABLE user ( > id timeuuid PRIMARY KEY, > email text, > name text, > ... >); >CREATE TABLE user_ema

Re: Data modeling users table with CQL

2014-01-21 Thread Tupshin Harper
It's a broad topic, but I mean all of the best practices alluded to by writeups like this. http://www.technicalinfo.net/papers/WebBasedSessionManagement.html -Tupshin On Jan 21, 2014 11:37 AM, "Drew Kutcharian" wrote: > Cool. BTW, what do you mean by have additional session tracking ids? > What

Re: Data modeling users table with CQL

2014-01-21 Thread Drew Kutcharian
Cool. BTW, what do you mean by have additional session tracking ids? What’d that be for? - Drew On Jan 21, 2014, at 10:48 AM, Tupshin Harper wrote: > It does sound right. > > You might want to have additional session tracking id's, separate from the > user id, but that is an additional imp

Re: Data modeling users table with CQL

2014-01-21 Thread Tupshin Harper
It does sound right. You might want to have additional session tracking id's, separate from the user id, but that is an additional implementation detail, and could be external to Cassandra. But the approach you describe accurately describes what I would do as a first pass, at least. -Tupshin On

Re: Data modeling users table with CQL

2014-01-21 Thread Drew Kutcharian
Thanks, I was actually thinking of doing that. Something along the lines of CREATE TABLE user ( id timeuuid PRIMARY KEY, email text, name text, ... ); CREATE TABLE user_email_index ( email text, id timeuuid, PRIMARY KEY (email, id) ); And during registration, I would ju

Re: Data modeling users table with CQL

2014-01-21 Thread Tupshin Harper
One CQL row per user, keyed off of the UUID. Another table keyed off of email, with another column containing the UUID for lookups in the first table. Only registration will require a lightweight transaction, and only for the purpose of avoiding duplicate email registration race conditions. -Tup

Re: Data modeling users table with CQL

2014-01-21 Thread Drew Kutcharian
A shameful bump ;) > On Jan 20, 2014, at 2:14 PM, Drew Kutcharian wrote: > > Hey Guys, > > I’m new to CQL (but have been using C* for a while now). What would be the > best way to model a users table using CQL/Cassandra 2.0 Lightweight > Transactions where we would like to have: > - A unique

Data modeling users table with CQL

2014-01-20 Thread Drew Kutcharian
Hey Guys, I’m new to CQL (but have been using C* for a while now). What would be the best way to model a users table using CQL/Cassandra 2.0 Lightweight Transactions where we would like to have: - A unique TimeUUID as the primary key of the user - A unique email address used for logging in In t

Data Modeling help for representing a survey form.

2013-08-30 Thread John Anderson
on I would like to get it on cassandra is because currently at peak times this is an extremely write heavy application since people are registering for a conference that launched or filling out a new survey, so everyone comes in all at once. Also, if anyone is in the bay area and wants to discus

Re: data modeling from batch_mutate point of view

2013-04-11 Thread aaron morton
> b) the "batch_mutate" advantages are better, for the communication > "client<=>coordinator node" __and__ for the communications "coordinator > node<=>replicas". Yes. A single row mutation can write to many CFs. > Is there any expe

RE: data modeling from batch_mutate point of view

2013-04-09 Thread DE VITO Dominique
Thanks Aaron. It helped. Let's me rephrase a little bit my questions. It's about data modeling impact on "batch_mutate" advantages. I have one CF for storing data, and ~10 (all different) CF used for indexing that data. when adding a piece of data, I need to add indexes t

Re: data modeling from batch_mutate point of view

2013-04-09 Thread aaron morton
> So, one alternative design for indexing CF could be: > rowkey = folder_id > colname = (indexed value, timestamp, file_id) > colvalue = "" > If you always search in a folder what about rowkey = colname = (That's closer to secondary indexes in cassandra with the addition of the folder_id) >

Re: Data Modeling: How to keep track of arbitrarily inserted column names?

2013-04-08 Thread aaron morton
If you create a reverse index on all column names, where the single row has a key something like "the_index" and each column name is the column name that has been used else where, you are approaching the "twitter global timeline anti pattern"(™). Basically you will end up with a hot row that h

data modeling from batch_mutate point of view

2013-04-08 Thread DE VITO Dominique
Hi, I have a use case that sounds like storing data associated with files. So, I store them with the CF: rowkey = (folder_id, file_id) colname = property name (about the file corresponding to file_id) colvalue = property value And I have CF for "manual" indexing: rowkey = (folder_id, indexed val
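Rendered as CQL tables, the data CF and one of the index CFs described above might look like this (hypothetical names):

```cql
CREATE TABLE file_properties (
    folder_id  uuid,
    file_id    uuid,
    prop_name  text,
    prop_value text,
    PRIMARY KEY ((folder_id, file_id), prop_name)
);

CREATE TABLE file_index_by_value (
    folder_id     uuid,
    indexed_value text,
    file_ts       timestamp,
    file_id       uuid,
    PRIMARY KEY (folder_id, indexed_value, file_ts, file_id)
);
```

Grouping the data write and its ~10 index writes into one batch is the CQL analogue of the batch_mutate question: a single client call carrying mutations for several tables.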

Re: Data Modeling: How to keep track of arbitrarily inserted column names?

2013-04-05 Thread Drew Kutcharian
One thing I can do is to have a client-side cache of the keys to reduce the number of updates. On Apr 5, 2013, at 6:14 AM, Edward Capriolo wrote: > Since there are few column names what you can do is this. Make a reverse > index, low read repair chance, Be aggressive with compaction. It will

Re: Data Modeling: How to keep track of arbitrarily inserted column names?

2013-04-05 Thread Edward Capriolo
Since there are few column names what you can do is this. Make a reverse index, low read repair chance, Be aggressive with compaction. It will be many extra writes but that is ok. Other option is turn on row cache and try read before write. It is a good case for row cache because it is a very smal

Re: Data Modeling: How to keep track of arbitrarily inserted column names?

2013-04-04 Thread Drew Kutcharian
I don't really need to answer "what rows contain column named X", so no need for a reverse index here. All I want is a distinct set of all the column names, so I can answer "what are all the available column names" On Apr 4, 2013, at 4:20 PM, Edward Capriolo wrote: > Your reverse index of "wh

Re: Data Modeling: How to keep track of arbitrarily inserted column names?

2013-04-04 Thread Edward Capriolo
Your reverse index of "which rows contain a column named X" will have very wide rows. You could look at cassandra's secondary indexing, or possibly look at a solandra/solr approach. Another option is you can shift the problem slightly, "which rows have column X that was added between time y and tim

Re: Data Modeling: How to keep track of arbitrarily inserted column names?

2013-04-04 Thread Drew Kutcharian
Hi Edward, I anticipate that the column names will be reused a lot. For example, key1 will be in many rows. So I think the number of distinct column names will be much much smaller than the number of rows. Is there a way to have a separate CF that keeps track of the column names? What I was t

Re: Data Modeling: How to keep track of arbitrarily inserted column names?

2013-04-04 Thread Edward Capriolo
You can not get only the column name (which you are calling a key) you can use get_range_slice which returns all the columns. When you specify an empty byte array (new byte[0]) as the start and finish you get back all the columns. From there you can return only the columns to the user in a format

Data Modeling: How to keep track of arbitrarily inserted column names?

2013-04-04 Thread Drew Kutcharian
Hey Guys, I'm working on a project and one of the requirements is to have a schema free CF where end users can insert arbitrary key/value pairs per row. What would be the best way to know what are all the "keys" that were inserted (preferably w/o any locking). For example, Row1 => key1 -> XXX,
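A CQL shape for the requirement — arbitrary per-row key/value pairs plus a queryable set of all distinct key names — could be (hypothetical names; the shard column spreads out the hot-row problem the thread warns about):

```cql
CREATE TABLE row_attrs (
    row_id text,
    key    text,
    value  text,
    PRIMARY KEY (row_id, key)
);

CREATE TABLE known_keys (
    shard int,   -- e.g. hash(key) % 16, to avoid one hot partition
    key   text,
    PRIMARY KEY (shard, key)
);
```

Each write inserts the attribute and (idempotently, no locking needed) re-inserts the key name; listing all keys is 16 small partition reads, optionally backed by a client-side cache of known keys as suggested in the thread.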

RE: Data Modeling: Comments with Voting

2012-10-01 Thread Roshni Rajagopal
my suggestions. From: aa...@thelastpickle.com Subject: Re: Data Modeling: Comments with Voting Date: Tue, 2 Oct 2012 10:39:42 +1300 To: user@cassandra.apache.org You cannot (and probably do not want to) sort continually when the voting is going on. You can store the votes using CounterColumnTypes in

Re: Data Modeling: Comments with Voting

2012-10-01 Thread aaron morton
y with a >> dummy row id 'sort_by_votes_list' and column names can be a composite of >> number of votes , and comment id ( as more than 1 comment can have the same >> votes) >> >> >> Regards, >> Roshni >> >> > Date: Wed, 26 Sep 20

Re: Data Modeling: Comments with Voting

2012-09-29 Thread Drew Kutcharian
t can have the same > votes) > > > Regards, > Roshni > > > Date: Wed, 26 Sep 2012 17:36:13 -0700 > > From: k...@mustardgrain.com > > To: user@cassandra.apache.org > > CC: d...@venarc.com > > Subject: Re: Data Modeling: Comments with Voting >

RE: Data Modeling: Comments with Voting

2012-09-27 Thread Roshni Rajagopal
che.org > CC: d...@venarc.com > Subject: Re: Data Modeling: Comments with Voting > > Depending on your needs, you could simply duplicate the comments in two > separate CFs with the column names including time in one and the vote in > the other. If you allow for updates to the comment

Re: Data Modeling: Comments with Voting

2012-09-26 Thread Kirk True
Depending on your needs, you could simply duplicate the comments in two separate CFs with the column names including time in one and the vote in the other. If you allow for updates to the comments, that would pose some issues you'd need to solve at the app level. On 9/26/12 4:28 PM, Drew Kutch

Data Modeling: Comments with Voting

2012-09-26 Thread Drew Kutcharian
Hi Guys, Wondering what would be the best way to model a flat (no sub comments, i.e. twitter) comments list with support for voting (where I can sort by create time or votes) in Cassandra? To demonstrate: Sorted by create time: - comment 1 (5 votes) - comment 2 (1 votes) - comment 3 (no votes)
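A two-table sketch of this model (hypothetical names): comments clustered by creation time for the time-sorted view, votes kept in a counter table. The by-votes ordering is computed at read time or rebuilt periodically, since counter updates cannot reorder a clustering key:

```cql
CREATE TABLE comments_by_time (
    topic_id   uuid,
    created    timeuuid,
    comment_id uuid,
    body       text,
    PRIMARY KEY (topic_id, created)
) WITH CLUSTERING ORDER BY (created DESC);

CREATE TABLE comment_votes (
    comment_id uuid PRIMARY KEY,
    votes      counter
);

UPDATE comment_votes SET votes = votes + 1 WHERE comment_id = ?;
```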

Re: Data Modeling - JSON vs Composite columns

2012-09-21 Thread Bill
wants to update individual fields, while B is better if one wants easier paging, reading multiple items at once in one read, etc. The details are in this discussion thread http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Data-Modeling-another-question-td7581967.html I had an a

Re: Data Modeling - JSON vs Composite columns

2012-09-20 Thread Sylvain Lebresne
On Wed, Sep 19, 2012 at 3:32 PM, Brian O'Neill wrote: > That said, I'm keeping a close watch on: > https://issues.apache.org/jira/browse/CASSANDRA-3647 > > But if this is CQL only, I'm not sure how much use it will be for us > since we're coming in from different clients. > Anyone know how/if coll

Re: Data Modeling - JSON vs Composite columns

2012-09-20 Thread Sylvain Lebresne
On Wed, Sep 19, 2012 at 2:00 PM, Roshni Rajagopal wrote: > Hi, > > There was a conversation on this some time earlier, and to continue it > > Suppose I want to associate a user to an item, and I want to also store 3 > commonly used attributes without needing to go to an entity item column > famil

Re: Data Modeling - JSON vs Composite columns

2012-09-19 Thread Michael Kjellman
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Data-Modeling-another-question-td7581967.html I had an additional question: as it's being said that CQL is the direction in which cassandra is moving, and there's a lot of effort in making CQL the standard, How does approach

Re: Data Modeling - JSON vs Composite columns

2012-09-19 Thread Brian O'Neill
> = {name: Betty Crocker, descr: Cake, Qty: 5}, > = {name: Nutella, descr: Choc spread, Qty: 15} > } > > Essentially A is better if one wants to update individual fields, while B > is better if one wants easier paging, reading multiple items at once in one > read. etc. The details are in this

Re: Data Modeling- another question

2012-08-28 Thread samal
yes, you are right, it depends on the use case. I suggested it as a better choice, not the only choice. With JSON, any field change re-writes the whole data without reading. I tend to use JSON more where my data does not change or changes very rarely, like storing denormalized JSON data for analytics purposes.

Re: Data Modeling- another question

2012-08-27 Thread Guy Incognito
i would respectfully disagree, what you have said is true but it really depends on the use case. 1) do you expect to be doing updates to individual fields of an item, or will you always update all fields at once? if you are doing separate updates then the first is definitely easier to handle

Re: Data Modeling- another question

2012-08-24 Thread samal
The first is the better choice; each field can be updated separately (write only). With the second you have to take care of the JSON yourself (read first, modify, then write). On Fri, Aug 24, 2012 at 5:45 PM, Roshni Rajagopal < roshni.rajago...@wal-mart.com> wrote: > Hi, > > Suppose I have a column family to associate a us

Data Modeling- another question

2012-08-24 Thread Roshni Rajagopal
Hi, Suppose I have a column family to associate a user to a dynamic list of items. I want to store 5-10 key information about the item, & no specific sorting requirements are there. I have two options A) use composite columns UserId1 : { : = Betty Crocker, : = Cake : = 5 : = Nutella, : = C

Re: Data modeling question

2012-06-29 Thread Peter Hsu
Just read up on composite keys and what looks like future deprecation of super column families. I guess Option 2 would now be: - column family with composite key from grouping and location > e.g. > '0:0,0': { meta } > ... > '0:10,10' : { meta } > '1:10,0' : {meta} > … > '1:20, 10': {meta}

Data modeling question

2012-06-29 Thread Peter Hsu
I have a question on what the best way is to store the data in my schema. I have millions of nodes, each with a different cartesian coordinate. The keys for the nodes are hashed based on the coordinate. My search is a proximity search: I'd like to find all the nodes within a given di

Re: Data modeling for read performance

2012-05-20 Thread aaron morton
I would bucket the time stats as well. If you write all the attributes at the same time, and always want to read them together, storing them in something like a JSON blob is a legitimate approach. Other Aaron, can you elaborate on > I'm not using composite row keys (it's just > AsciiType) as th

Re: Data modeling for read performance

2012-05-17 Thread Aaron Turner
On Thu, May 17, 2012 at 8:55 AM, jason kowalewski wrote: > We have been attempting to change our data model to provide more > performance in our cluster. > > Currently there are a couple ways to model the data and i was > wondering if some people out there could help us out. > > We are storing tim

Re: Data modeling advice (time series)

2012-05-02 Thread Aaron Turner
On Wed, May 2, 2012 at 8:22 AM, Tim Wintle wrote: > On Tue, 2012-05-01 at 11:00 -0700, Aaron Turner wrote: >> Tens or a few hundred MB per row seems reasonable.  You could do >> thousands/MB if you wanted to, but that can make things harder to >> manage. > > thanks (Both Aarons) > >> Depending on

Re: Data modeling advice (time series)

2012-05-02 Thread Tim Wintle
On Tue, 2012-05-01 at 11:00 -0700, Aaron Turner wrote: > Tens or a few hundred MB per row seems reasonable. You could do > thousands/MB if you wanted to, but that can make things harder to > manage. thanks (Both Aarons) > Depending on the size of your data, you may find that the overhead of > ea

Re: Data modeling advice (time series)

2012-05-01 Thread aaron morton
I would try to avoid 100's of MB's per row. It will take longer to compact and repair. 10's is fine. Take a look at in_memory_compaction_limit and thrift_frame_size in the yaml file for some guidance. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpi

Re: Data modeling advice (time series)

2012-05-01 Thread Aaron Turner
On Tue, May 1, 2012 at 10:20 AM, Tim Wintle wrote: > I believe that the general design for time-series schemas looks > something like this (correct me if I'm wrong): > > (storing time series for X dimensions for Y different users) > > Row Keys: "{USER_ID}_{TIMESTAMP/BUCKETSIZE}" > Columns: "{DIME

Data modeling advice (time series)

2012-05-01 Thread Tim Wintle
I believe that the general design for time-series schemas looks something like this (correct me if I'm wrong): (storing time series for X dimensions for Y different users) Row Keys: "{USER_ID}_{TIMESTAMP/BUCKETSIZE}" Columns: "{DIMENSION_ID}_{TIMESTAMP%BUCKETSIZE}" -> {Counter} But I've not fou
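Tim's key scheme is easy to sketch concretely (the one-hour bucket size and the helper names are illustrative assumptions): the row key carries the bucket number, the column name carries the offset within the bucket, and a range read just walks the bucket rows the range spans.

```python
BUCKET_SIZE = 3600  # hypothetical: one row per user per hour (seconds)

def row_key(user_id, ts):
    # "{USER_ID}_{TIMESTAMP/BUCKETSIZE}" from the schema above
    return f"{user_id}_{ts // BUCKET_SIZE}"

def column_name(dimension_id, ts):
    # "{DIMENSION_ID}_{TIMESTAMP%BUCKETSIZE}" from the schema above
    return f"{dimension_id}_{ts % BUCKET_SIZE}"

def row_keys_for_range(user_id, start_ts, end_ts):
    # All bucket rows a range query over [start_ts, end_ts] must touch.
    return [f"{user_id}_{b}"
            for b in range(start_ts // BUCKET_SIZE,
                           end_ts // BUCKET_SIZE + 1)]
```

The bucket size trades row width (discussed downthread in terms of MB per row) against the number of rows a range query has to fetch.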

Re: Data Modeling

2012-02-21 Thread aaron morton
9A0C-0305E82C3301") = "FR", >>> ("country", "21EC2020-3AEA-1069-A2DD-08002B30309D") = "EN", >>> ("firstName", "3F2504E0-4F89-11D3-9A0C-0305E82C3301") = "Carl", >>> ("firstName

Re: Data Modeling

2012-02-20 Thread alexis coudeyras
EC2020-3AEA-1069-A2DD-08002B30309D") = "Doe", >> ... >> }] >> >> >> As far as I understand, it seems to be the fastest way to retrieve all values >> of a field in the same order. >> To update, I don't need to read before writing. >> >> Problem: the row will be very large: 300,000,000 columns. I can split >> it into different rows based on the value of a specific field, for example >> country. >> >> --- >> Solution 3: >> >> Wide row by field >> >> Column Family: customers >> One row per field: so 300 rows >> Columns: ID = FieldValue >> >> Benefits: >> The row will be smaller: 1,000,000 columns. >> >> Problem: >> Updates seem more expensive; for every customer to update, I need to update >> 300 rows. >> >> --- >> >> Which solution seems to be the good one? Is Cassandra really a good >> fit for this use case? >> >> Thanks >> >> Alexis Coudeyras >> >> -- >> View this message in context: >> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Data-Modeling-tp7300846p7300846.html >> Sent from the cassandra-u...@incubator.apache.org mailing list archive at >> Nabble.com. >

Re: Data Modeling

2012-02-20 Thread aaron morton
nt rows based on the value of a specific field, for example > country. > > --- > Solution 3: > > Wide row by field > > Column Family: customers > One row per field: so 300 rows > Columns: ID = FieldValue > > Benefits: > The row will be smaller: 1,000,000 columns. > > Problem: > Updates seem more expensive; for every customer to update, I need to update > 300 rows. > > --- > > Which solution seems to be the good one? Is Cassandra really a good > fit for this use case? > > Thanks > > Alexis Coudeyras > > -- > View this message in context: > http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Data-Modeling-tp7300846p7300846.html > Sent from the cassandra-u...@incubator.apache.org mailing list archive at > Nabble.com.
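The write-amplification claim in the quoted message (one row mutation per full-customer update in solution 2 versus 300 row mutations in solution 3) can be made concrete with a tiny accounting sketch; the helper names are illustrative, not from the thread.

```python
def mutations_solution2(fields):
    # One wide row keyed by (field, customer_id) composite columns:
    # a full-customer update is a single row mutation carrying one
    # column per field.
    return {"rows_touched": 1, "columns_written": len(fields)}

def mutations_solution3(fields):
    # One row per field: a full-customer update must touch every
    # field's row, one column in each.
    return {"rows_touched": len(fields), "columns_written": len(fields)}
```

Both layouts write the same number of columns; the difference is how many row mutations (and partitions) each update fans out to.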

Re: data modeling question

2011-11-30 Thread Deno Vichas
here's what I ended up with; this seems to work for me. @Test public void readAndWriteSettingTTL() throws Exception { int ttl = 2; String columnFamily = "Quote"; Set symbols = new HashSet(){{ add("appl");
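Deno's (truncated) Java test exercises column TTLs; the same idea can be shown with a small Python stand-in, where each write records an expiry timestamp and reads treat expired entries as absent (illustrative only, not the Hector or driver API).

```python
import time

class TTLStore:
    # Minimal stand-in for TTL'd columns: each write carries an expiry
    # time, and reads filter out anything past it.
    def __init__(self):
        self._data = {}

    def put(self, key, value, ttl_seconds, now=None):
        now = time.time() if now is None else now
        self._data[key] = (value, now + ttl_seconds)

    def get(self, key, now=None):
        now = time.time() if now is None else now
        entry = self._data.get(key)
        if entry is None or entry[1] <= now:
            return None  # expired entries are simply gone, as with TTL'd columns
        return entry[0]
```

Passing `now` explicitly makes expiry deterministic to test, instead of sleeping past the TTL as the Java test does.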

Re: data modeling question

2011-11-30 Thread David McNelis
You wouldn't query for all the keys that have a column name x exactly. Instead, for sector x you would grab your list of symbols S. Then you would get the last column for each of those symbols (which you do in different ways depending on the API), and then test if that date is within yo
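David's suggested approach, fetching the last column per symbol and filtering client-side, reduces to a simple timestamp comparison once the per-symbol last-fetch times are in hand. A minimal sketch (the dict stands in for the per-symbol reads described above; names are illustrative):

```python
import time

def stale_symbols(last_fetch, symbols, max_age_seconds, now=None):
    # last_fetch: symbol -> unix timestamp of its most recent quote column,
    # as would be obtained by reading the last column per symbol.
    # Symbols never fetched (absent from last_fetch) count as stale.
    now = time.time() if now is None else now
    cutoff = now - max_age_seconds
    return [s for s in symbols if last_fetch.get(s, 0) < cutoff]
```

The per-sector symbol list comes from the sector CF, so the whole check is one read per symbol plus this in-memory filter.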

Re: data modeling question

2011-11-30 Thread Deno Vichas
with the quote CF below, how would one query for all keys that have a column name value with a timeuuid later than x minutes ago? I need to be able to find all symbols that have not been fetched in x minutes, by sector. I know I can get the list of symbols by sector from my sector CF. thanks, deno O
