Apache Cassandra - Question about data model
Hi,

I just got your email address from the #cassandra channel on the web chat, because I couldn't get an answer there.

I have a question, and I'd be glad if you could help me or point me in a direction.

I have an activity feed like the activity feed on Instagram. When a user (let's say UserA) opens his page, he can see all the activities that are related to him, for example "user B liked your post", "user C commented on your post", etc.

The Cassandra data model that I thought about is:

userID UDID (partition key)
datetimeadded timestamp (clustering column DESC)
userID_Name text
userID_Picture_URL text
userID_From UDID (this is userB from the example)
userID_From_Name text
userID_From_Picture_URL

With this structure I can get the different activities for a user, and it works just fine. My problem is that userID_From can change his name and his picture, and I need this data to be updated across the different tables because I want to show the current values.

The problem is that the update is a table scan, which is not efficient. Should I hold only the ID and, every time I select a slice of the data and get several IDs, run another query for the users' names and picture paths? Should I do something else?

Best regards,
Lior
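A minimal sketch of why the update is expensive in the denormalized model, using plain Python dictionaries as a stand-in for the table. Nothing here talks to Cassandra; the function names, user names, and picture paths are hypothetical and only illustrate the shape of the problem: the name is copied into every activity row, so a rename has to inspect every row in every partition.

```python
from collections import defaultdict
from datetime import datetime, timezone
from uuid import uuid4

# In-memory stand-in for the denormalized feed table:
# partition key (userID) -> list of activity rows.
feed = defaultdict(list)

def add_activity(user_id, from_id, from_name, from_picture_url):
    """Store one activity together with a copy of the acting user's profile."""
    feed[user_id].append({
        "datetimeadded": datetime.now(timezone.utc),
        "userID_From": from_id,
        "userID_From_Name": from_name,
        "userID_From_Picture_URL": from_picture_url,
    })

def rename_everywhere(from_id, new_name):
    """Fix every copied name and return how many rows had to be inspected."""
    inspected = 0
    for rows in feed.values():      # every partition
        for row in rows:            # every row in that partition
            inspected += 1
            if row["userID_From"] == from_id:
                row["userID_From_Name"] = new_name
    return inspected

# Hypothetical users: B acts on A's and D's posts, C acts on A's post.
user_a, user_b, user_c, user_d = (uuid4() for _ in range(4))
add_activity(user_a, user_b, "Bob", "/pics/bob.png")
add_activity(user_a, user_c, "Carol", "/pics/carol.png")
add_activity(user_d, user_b, "Bob", "/pics/bob.png")

inspected = rename_everywhere(user_b, "Robert")
```

In this toy data set all three rows are inspected to change two of them, and the cost grows with the total number of activities rather than with the number of users being renamed.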
Re: Apache Cassandra - Question about data model
Hi Lior,

How about something like this, where you separate the user fields into a separate USER_TABLE:

FEED_TABLE
userID UDID (partition key)
datetimeadded timestamp (clustering column DESC)
userID_From UDID (this is userB from the example)

USER_TABLE
userID UDID (partition key)
userID_Name text
userID_Picture_URL text

You have an extra query, but you can change the name and picture in one place.

Matthias

On Thu, Dec 31, 2015 at 5:36 AM, Lior Menashe wrote:
> [...]
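A minimal sketch of the two-table layout, again with in-memory Python dictionaries standing in for FEED_TABLE and USER_TABLE. The helper names (`add_user`, `add_activity`, `render_feed`), the sample users, and the picture paths are assumptions made for illustration; the point is only that the feed stores IDs and the name is resolved at read time, so a rename touches a single row.

```python
from datetime import datetime, timezone
from uuid import uuid4

feed_table = {}  # userID -> list of (datetimeadded, userID_From), newest first
user_table = {}  # userID -> {"userID_Name": ..., "userID_Picture_URL": ...}

def add_user(name, picture_url):
    """Create a user row and return its ID."""
    user_id = uuid4()
    user_table[user_id] = {"userID_Name": name, "userID_Picture_URL": picture_url}
    return user_id

def add_activity(user_id, from_id, when):
    """Append an activity that references the acting user by ID only."""
    rows = feed_table.setdefault(user_id, [])
    rows.append((when, from_id))
    rows.sort(key=lambda row: row[0], reverse=True)  # mimic clustering DESC

def render_feed(user_id, limit=30):
    """Read a slice of the feed, then look up each acting user's current profile."""
    rendered = []
    for when, from_id in feed_table.get(user_id, [])[:limit]:
        profile = user_table[from_id]  # the extra query
        rendered.append((when, profile["userID_Name"], profile["userID_Picture_URL"]))
    return rendered

# Hypothetical data: Bob acts twice on Alice's posts.
alice = add_user("Alice", "/pics/alice.png")
bob = add_user("Bob", "/pics/bob.png")
add_activity(alice, bob, datetime(2015, 12, 30, tzinfo=timezone.utc))
add_activity(alice, bob, datetime(2015, 12, 31, tzinfo=timezone.utc))

# The rename is a single-row update in USER_TABLE.
user_table[bob]["userID_Name"] = "Robert"
names = [name for _, name, _ in render_feed(alice)]
```

Both feed entries show the new name after the rename, and no feed row was rewritten.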
Re: Apache Cassandra - Question about data model
It's best to ask usage and data modeling questions on the user email list - this list is the dev list, for development of Cassandra itself, not for development of applications.

See: http://cassandra.apache.org/

-- Jack Krupansky

On Thu, Dec 31, 2015 at 8:36 AM, Lior Menashe wrote:
> [...]
Re: Apache Cassandra - Question about data model
Hi Matthias,

Thanks for your answer. According to what you wrote, if I select the first 30 rows from the feed table for a user, I'll need to perform up to 30 more queries against the user table in order to get the users' data. Isn't it better to use Cassandra for the feed and some SQL server to get the users' data in one query?

BR,
Lior

2015-12-31 17:58 GMT+02:00 Matthias Eichstaedt <matthias.eichsta...@gmail.com>:
> [...]

-- ליאור מנשה
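A minimal sketch of how the "up to 30 more queries" can be bounded, assuming the application resolves users itself. `fetch_user` is a hypothetical stand-in for one single-partition read of the user table (here it reads an in-memory dictionary and records each call); the user names and the 30-row page are invented sample data. The sketch deduplicates the `userID_From` values in the page and issues the remaining lookups concurrently instead of one after another. Whether this is preferable to a separate SQL store depends on the workload and is not settled by this example.

```python
from concurrent.futures import ThreadPoolExecutor
from uuid import uuid4

# Hypothetical user table with three users.
ids = [uuid4() for _ in range(3)]
USER_TABLE = {
    ids[0]: {"userID_Name": "Bob", "userID_Picture_URL": "/pics/bob.png"},
    ids[1]: {"userID_Name": "Carol", "userID_Picture_URL": "/pics/carol.png"},
    ids[2]: {"userID_Name": "Dan", "userID_Picture_URL": "/pics/dan.png"},
}

lookups = []  # records every simulated user-table query

def fetch_user(user_id):
    """Hypothetical single-partition read of the user table."""
    lookups.append(user_id)
    return USER_TABLE[user_id]

def resolve_users(feed_rows, max_workers=8):
    """Return {userID: profile} using one lookup per distinct userID_From."""
    # dict.fromkeys removes duplicates while keeping first-seen order.
    distinct_ids = list(dict.fromkeys(row["userID_From"] for row in feed_rows))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        profiles = list(pool.map(fetch_user, distinct_ids))
    return dict(zip(distinct_ids, profiles))

# A page of 30 feed rows produced by only three distinct users.
feed_rows = [{"userID_From": ids[i % 3]} for i in range(30)]
profiles = resolve_users(feed_rows)
```

Here 30 feed rows cost three lookups, because 30 is only the worst case in which every activity comes from a different user.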