Tomorrow, I'm starting another data modeling series, this time assuming you are
starting with Cassandra 5. The last time I did something this comprehensive was for
Cassandra 3. Needless to say, there have been a LOT of updates since then.
There will be five parts, and each one has its own signup for the
-up would require a separate process,
too. I don’t think you can expire rows within a map column using TTL.
Sean Durity
From: Rahul Singh
Sent: Saturday, September 19, 2020 10:41 AM
To: user@cassandra.apache.org; Attila Wind
Subject: [EXTERNAL] Re: data modeling qu: use a Map datatype, or just
with natural string keys like “email.”
Best regards,
Rahul Singh
From: Sagar Jambhulkar
Sent: Saturday, September 19, 2020 6:45:25 AM
To: user@cassandra.apache.org ; Attila Wind
Subject: Re: data modeling qu: use a Map datatype, or just simple rows... ?
I don't really see a difference between the two options. Won't the partitioner run on
the user id and create a hash for you anyway? Unless your hash function is better than
the partitioner's.
On Fri, 18 Sep 2020, 21:33 Attila Wind, wrote:
> Hey guys,
>
> I'm curious about your experiences r
Hey guys,
I'm curious about your experiences regarding a data modeling question we are
facing.
At the moment we see two quite different approaches to how the tables could be
built.
But I've been googling around for days with no luck finding any useful
material explaining t
Well, generally speaking I like to understand the problem before trying to
fit a solution. If you're looking to set up millions of appointments for a
business, that might qualify for some amount of partitioning / bucketing.
That said, you might be better off using time-based buckets, say monthly o
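A minimal sketch of what such a month-bucketed appointment table could look like (the table and column names are invented for illustration, not taken from this thread):

CREATE TABLE appointments_by_organizer (
    organizer_id   uuid,
    month_bucket   text,       -- e.g. '2018-11', derived from the start time
    start_time     timestamp,
    appointment_id uuid,
    invitee_id     uuid,
    details        text,
    PRIMARY KEY ((organizer_id, month_bucket), start_time, appointment_id)
) WITH CLUSTERING ORDER BY (start_time ASC, appointment_id ASC);

Moving an appointment to a new start time then means deleting the old row and inserting a new one, possibly in a different bucket, which is exactly the update problem raised in the original question.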
For people (invitees), you are correct. They will not have millions of
appointments. But the organizer is a business: a chain of businesses
(franchisor and franchisees) that together, across the country, have tens of
thousands of appointments per day.
Do you suggest removing the bucket, making
Maybe I’m missing something, but it seems to me that the bucket might be a
little overkill for a scheduling system. Do you expect people to have
millions of appointments?
On Sun, Nov 4, 2018 at 12:46 PM I PVP wrote:
> Could you please provide advice on the modeling approach for the following
>
Could you please provide advice on the modeling approach for the following
appointment scheduling scenario?
I am struggling to model it in a way that satisfies the requirement of being
able to update an appointment, especially to change the start
datetime and consequently the buc
understanding this runs a full table scan. as shown in spark UI
>> (from DAG visualization "Scan org.apache.spark.sql.cassandra
>> .CassandraSourceRelation@32bb7d65") meaning C* will read all the data
>> and then filter for time. Spark jobs runs for hours even for smaller time
>> frames.
>>
>> what is the right approach for data modeling for such queries?. I want to
>> get a general idea of things to look for when modeling such data.
>> really appreciate all the help from this community :). if you need any
>> extra details please ask me here.
>>
>> Regards,
>> Junaid
>>
>>
>>
>
>
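A common way to avoid that full scan is to make time part of the primary key so the predicate can be pushed down to Cassandra instead of filtered in Spark. A minimal sketch under that assumption (table and column names here are hypothetical, not from the thread):

CREATE TABLE events_by_day (
    day        date,          -- one bounded partition per source per day
    source_id  uuid,
    event_time timestamp,
    payload    text,
    PRIMARY KEY ((day, source_id), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);

-- A time-range query now touches only the relevant partitions:
SELECT * FROM events_by_day
WHERE day = '2018-06-01' AND source_id = ?
  AND event_time >= '2018-06-01 00:00:00' AND event_time < '2018-06-01 06:00:00';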
Hi All,
I want to know: what are the limitations of using collections such
as SET, LIST, and MAP?
In my case, when inserting video details, I have to insert them
language-based, such as
Language:- English
Title:- Video Name
Language:- Hindi
Title:- Video_name in Hindi
Language:- Chinese
T
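Two common shapes for this kind of per-language data, as a rough sketch (the table names and types are assumptions on my part): a map keyed by language, or one clustering row per language. The map is read back as a whole and is only meant for small, bounded data, which is the main limitation to keep in mind.

-- Option 1: one row per video, titles in a map keyed by language
CREATE TABLE videos (
    video_id uuid PRIMARY KEY,
    titles   map<text, text>    -- e.g. {'English': 'Video Name', 'Hindi': '...'}
);

-- Option 2: one clustering row per language, no collection needed
CREATE TABLE video_titles (
    video_id uuid,
    language text,
    title    text,
    PRIMARY KEY (video_id, language)
);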
Yes, it would. Whether next_billing_date is a timestamp or a date wouldn't make
any difference to scanning all partitions. If you want them to be on the
same node, you can use a composite key, but there's a trade-off. The nodes
may get unbalanced, so you have to do the math to figure out if your
specif
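As a rough sketch of the composite-key idea (my guess at the table shape, not something posted in the thread), the billing date becomes the partition key so that "due today" is a single-partition read:

CREATE TABLE subscriptions_by_billing_date (
    next_billing_date date,
    subscription_id   text,
    PRIMARY KEY (next_billing_date, subscription_id)
);

SELECT subscription_id FROM subscriptions_by_billing_date
WHERE next_billing_date = '2016-06-01';

The trade-off mentioned above is real: every subscription due on the same day lands in one partition, so a large install base may need an extra bucket column in the partition key.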
Ignoring NoSQL for a minute, the standard way of modeling this in car and
health insurance is with effective/expiration days. It is commonly called
bi-temporal data modeling.
How people model bi-temporal data varies quite a bit, from first-hand
experience, but the common thing is to have a transaction timestamp, an
effective day, and an expiration day.
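A minimal CQL sketch of such a bi-temporal row, assuming a policy-style entity (the table and column names here are hypothetical):

CREATE TABLE policy_versions (
    policy_id      uuid,
    effective_day  date,
    expiration_day date,
    recorded_at    timestamp,   -- transaction time: when this version was written
    details        text,
    PRIMARY KEY (policy_id, effective_day, recorded_at)
) WITH CLUSTERING ORDER BY (effective_day DESC, recorded_at DESC);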
Hi Denis,
You might want to have a look at
- Materialized views
http://www.datastax.com/dev/blog/new-in-cassandra-3-0-materialized-views
- Secondary index
https://docs.datastax.com/en/cql/3.3/cql/cql_using/useWhenIndex.html
My 2 cents: make sure to understand the implications before moving forward.
Hi!
I have a question regarding data modelling.
Let's say that I have a `subscriptions` table with two columns, `subscription_id
text` and `next_billing_date timestamp`.
How do I model a table to efficiently query all subscriptions due today
(something like `where next_billing_date <= today`)?
Or use GraphFrames (Spark) over Cassandra to store a graph of
users and followers separately, and next to it a table of tweets. You will be able to join
data between those two structures using Spark.
2016-05-31 14:27 GMT+02:00 :
> Hello,
>
> >* First, Is this data modeling correct for
Hello,
> First, is this data modeling correct for a follow-based (follower,
following actions) social network?
For a social network, I advise you to look at graph databases over Cassandra.
Example: https://academy.datastax.com/resources/getting-started-graph-databases
From: Mohammad Kerm
We are using Cassandra for our social network and we are designing/modeling
the tables we need. It is confusing for us: we don't know how to
design some of the tables and we have some small problems!
*As we understood it, for every query we have to have a different table*, and
for example user
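For the follower/following part specifically, the usual query-first shape is one table per direction. A sketch under that assumption (the table names are mine, not from the thread):

-- Who follows a given user?
CREATE TABLE followers_by_user (
    user_id     uuid,
    follower_id uuid,
    followed_at timestamp,
    PRIMARY KEY (user_id, follower_id)
);

-- Whom does a given user follow?
CREATE TABLE following_by_user (
    user_id      uuid,
    following_id uuid,
    followed_at  timestamp,
    PRIMARY KEY (user_id, following_id)
);

A "follow" action writes to both tables (often in a logged batch) so that each read is a single partition.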
On Tue, Jan 5, 2016 at 5:52 PM, Jonathan Haddad wrote:
> You could keep a "num_buckets" value associated with the client's account,
> which can be adjusted accordingly as usage increases.
>
Yes, but the adjustment problem is tricky when there are multiple
concurrent writers. What happens when yo
You could keep a "num_buckets" value associated with the client's account,
which can be adjusted accordingly as usage increases.
On Tue, Jan 5, 2016 at 2:17 PM Jim Ancona wrote:
> On Tue, Jan 5, 2016 at 4:56 PM, Clint Martin <
> clintlmar...@coolfiretechnologies.com> wrote:
>
>> What sort of dat
On Tue, Jan 5, 2016 at 4:56 PM, Clint Martin <
clintlmar...@coolfiretechnologies.com> wrote:
> What sort of data is your clustering key composed of? That might help some
> in determining a way to achieve what you're looking for.
>
Just a UUID that acts as an object identifier.
>
> Clint
> On Jan
What sort of data is your clustering key composed of? That might help some
in determining a way to achieve what you're looking for.
Clint
On Jan 5, 2016 2:28 PM, "Jim Ancona" wrote:
> Hi Nate,
>
> Yes, I've been thinking about treating customers as either small or big,
> where "small" ones have
Hi Nate,
Yes, I've been thinking about treating customers as either small or big,
where "small" ones have a single partition and big ones have 50 (or
whatever number I need to keep sizes reasonable). There's still the problem
of how to handle a small customer who becomes too big, but that will hap
Hi Jack,
Thanks for your response. My answers inline...
On Tue, Jan 5, 2016 at 11:52 AM, Jack Krupansky
wrote:
> Jim, I don't quite get why you think you would need to query 50 partitions
> to return merely hundreds or thousands of rows. Please elaborate. I mean,
> sure, for that extreme 100th
>
>
> In this case, 99% of my data could fit in a single 50 MB partition. But if
> I use the standard approach, I have to split my partitions into 50 pieces
> to accommodate the largest data. That means that to query the 700 rows for
> my median case, I have to read 50 partitions instead of one.
>
Jim, I don't quite get why you think you would need to query 50 partitions
to return merely hundreds or thousands of rows. Please elaborate. I mean,
sure, for that extreme 100th percentile, yes, you would query a lot of
partitions, but for the 90th percentile it would be just one. Even the 99th
per
Thanks for responding!
My natural partition key is a customer id. Our customers have widely
varying amounts of data. Since the vast majority of them have data that's
small enough to fit in a single partition, I'd like to avoid imposing
unnecessary overhead on the 99% just to avoid issues with the
You should endeavor to use a repeatable method of segmenting your data.
Swapping partitions every time you "fill one" seems like an anti-pattern to
me, but I suppose it really depends on what your primary key is. Can you
share some more information on this?
In the past I have utilized the consiste
A problem that I have run into repeatedly when doing schema design is how
to control partition size while still allowing for efficient multi-row
queries.
We want to limit partition size to some number between 10 and 100 megabytes
to avoid operational issues. The standard way to do that is to figur
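The standard bucketing approach being discussed here might look roughly like this (an illustrative sketch, not the poster's actual schema):

CREATE TABLE items_by_customer (
    customer_id uuid,
    bucket      int,      -- 0 .. num_buckets-1, e.g. hash(item_id) % num_buckets
    item_id     uuid,
    payload     text,
    PRIMARY KEY ((customer_id, bucket), item_id)
);

-- Per-customer bucket count, as suggested elsewhere in the thread
CREATE TABLE customer_meta (
    customer_id uuid PRIMARY KEY,
    num_buckets int
);

Reading "all items for a customer" then means one query per bucket, which is exactly the 50-partition overhead the poster wants to avoid for small customers.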
of parentheses
);
INSERT INTO users(first_name, last_name, id) values (‘neha’, ‘dave’, 1);
SELECT * FROM users where first_name = 'rob' and last_name = 'abb';
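A hedged guess at what the (truncated) CREATE TABLE above probably resembled, with the doubled parentheses denoting a composite partition key that supports the SELECT shown:

CREATE TABLE users (
    first_name text,
    last_name  text,
    id         int,
    PRIMARY KEY ((first_name, last_name), id)
);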
From: Neha Trivedi [mailto:nehajtriv...@gmail.com]
Sent: Thursday, April 30, 2015 10:16 AM
To: user@cassandra.apac
Hello all,
I was wondering which of the three data models described below is better in
terms of performance. Model 3 seems good.
*#1. log with 3 Index*
CREATE TABLE log (
id int PRIMARY KEY,
first_name set,
last_name set,
dob set
);
CREATE INDEX log_firstname_index ON test.log
>
>> Hi All,
>>I was just googling around and reading the various articles on data
>> modeling in cassandra. All of them talk about working backwards, i.e.,
>> first now what type of queries you are going to make and select a right
>> data model which can suppor
e going to need to model in a scalable way.
On Tue, Jan 6, 2015 at 11:47 AM, Srinivasa T N wrote:
> Hi All,
>I was just googling around and reading the various articles on data
> modeling in cassandra. All of them talk about working backwards, i.e.,
> first now what type of qu
tional approach where you get whatever you
want, but your performance may degrade when you do the costly joins.
Regards,
James
On Tue, Jan 6, 2015 at 9:47 AM, Srinivasa T N wrote:
> Hi All,
>I was just googling around and reading the various articles on data
> modeling in cassandra.
Hi All,
I was just googling around and reading the various articles on data
modeling in Cassandra. All of them talk about working backwards, i.e.,
first know what type of queries you are going to make and then select the right
data model which can support those queries efficiently. But one thing I
ds of user, if we also
> denormalize the like board, every time that pin is liked by another user we
> would have to update the like count in thousands of like boards.
>
> Does normalization work better in this case, or can Cassandra handle this kind
> of
> write load?
>
>
>
>
CQRS pattern will help you to separate the write and read stages; also do
heavy unit and integration testing.
On Fri, May 16, 2014 at 5:14 AM, ziju feng wrote:
> Hello,
>
> I'm working on data modeling for a Pinterest-like project. There are
> basically two main concepts: Pin a
Hello,
I'm working on data modeling for a Pinterest-like project. There are
basically two main concepts: Pin and Board, just like Pinterest, where a pin
is an item containing an image, a description and some other information such
as a like count, and each board should contain a sorted list of
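One common shape for this, sketched here as an illustration only (keeping the like count in a single counter table rather than denormalizing it into every board, which is the write-amplification worry quoted above):

CREATE TABLE pins (
    pin_id      uuid PRIMARY KEY,
    image_url   text,
    description text
);

-- One counter row per pin, incremented on each like
CREATE TABLE pin_like_counts (
    pin_id uuid PRIMARY KEY,
    likes  counter
);

-- Pins on a board, sorted by when they were added
CREATE TABLE board_pins (
    board_id uuid,
    added_at timeuuid,
    pin_id   uuid,
    PRIMARY KEY (board_id, added_at)
) WITH CLUSTERING ORDER BY (added_at DESC);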
You’re right. I didn’t catch that. No need to have email in the PRIMARY KEY.
On Jan 21, 2014, at 5:11 PM, Jon Ribbens
wrote:
> On Tue, Jan 21, 2014 at 10:40:39AM -0800, Drew Kutcharian wrote:
>> Thanks, I was actually thinking of doing that. Something along the lines
>> of
>> CREATE TABLE
On Tue, Jan 21, 2014 at 10:40:39AM -0800, Drew Kutcharian wrote:
>Thanks, I was actually thinking of doing that. Something along the lines
>of
>CREATE TABLE user (
>    id    timeuuid PRIMARY KEY,
>    email text,
>    name  text,
>    ...
>);
>CREATE TABLE user_ema
It's a broad topic, but I mean all of the best practices alluded to by
writeups like this.
http://www.technicalinfo.net/papers/WebBasedSessionManagement.html
-Tupshin
On Jan 21, 2014 11:37 AM, "Drew Kutcharian" wrote:
> Cool. BTW, what do you mean by have additional session tracking ids?
> What
Cool. BTW, what do you mean by have additional session tracking ids? What’d
that be for?
- Drew
On Jan 21, 2014, at 10:48 AM, Tupshin Harper wrote:
> It does sound right.
>
> You might want to have additional session tracking id's, separate from the
> user id, but that is an additional imp
It does sound right.
You might want to have additional session tracking id's, separate from the
user id, but that is an additional implementation detail, and could be
external to Cassandra. But the approach you describe accurately describes
what I would do as a first pass, at least.
-Tupshin
On
Thanks, I was actually thinking of doing that. Something along the lines of
CREATE TABLE user (
    id    timeuuid PRIMARY KEY,
    email text,
    name  text,
    ...
);
CREATE TABLE user_email_index (
email text,
id timeuuid,
PRIMARY KEY (email, id)
);
And during registration, I would ju
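A sketch of that registration step (illustrative values; note that the lookup table here is keyed by email alone, an assumption on my part, so that the IF NOT EXISTS check actually enforces one id per email):

CREATE TABLE user_email_index (
    email text PRIMARY KEY,
    id    timeuuid
);

-- Reserve the email; this is the only lightweight transaction needed
INSERT INTO user_email_index (email, id)
VALUES ('someone@example.com', f81d4fae-7dec-11d0-a765-00a0c91e6bf6)
IF NOT EXISTS;

-- Only if [applied] = true, write the user row itself (no LWT required)
INSERT INTO user (id, email, name)
VALUES (f81d4fae-7dec-11d0-a765-00a0c91e6bf6, 'someone@example.com', 'Some One');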
One CQL row per user, keyed off of the UUID.
Another table keyed off of email, with another column containing the UUID
for lookups in the first table. Only registration will require a
lightweight transaction, and only for the purpose of avoiding duplicate
email registration race conditions.
-Tup
A shameful bump ;)
> On Jan 20, 2014, at 2:14 PM, Drew Kutcharian wrote:
>
> Hey Guys,
>
> I’m new to CQL (but have been using C* for a while now). What would be the
> best way to model a users table using CQL/Cassandra 2.0 Lightweight
> Transactions where we would like to have:
> - A unique
Hey Guys,
I’m new to CQL (but have been using C* for a while now). What would be the best
way to model a users table using CQL/Cassandra 2.0 Lightweight Transactions
where we would like to have:
- A unique TimeUUID as the primary key of the user
- A unique email address used for logging in
In t
The reason I would like to get it onto Cassandra is that currently, at peak
times, this is an extremely write-heavy application, since people are
registering for a conference that launched or filling out a new survey, so
everyone comes in all at once.
Also, if anyone is in the bay area and wants to discus
> b) the "batch_mutate" advantages are better, for the communication
> "client<=>coordinator node" __and__ for the communications "coordinator
> node<=>replicas".
Yes. A single row mutation can write to many CFs.
> Is there any expe
Thanks Aaron.
It helped.
Let me rephrase my questions a little bit. It's about the impact of data modeling on
the "batch_mutate" advantages.
I have one CF for storing data, and ~10 (all different) CFs used for indexing
that data.
When adding a piece of data, I need to add indexes t
> So, one alternative design for indexing CF could be:
> rowkey = folder_id
> colname = (indexed value, timestamp, file_id)
> colvalue = ""
>
If you always search in a folder what about
rowkey =
colname =
(That's closer to secondary indexes in cassandra with the addition of the
folder_id)
>
If you create a reverse index on all column names, where the single row has a
key something like "the_index" and each column name is a column name that has
been used elsewhere, you are approaching the "twitter global timeline
anti-pattern" (™).
Basically you will end up with a hot row that h
Hi,
I have a use case that sounds like storing data associated with files. So, I
store them with the CF:
rowkey = (folder_id, file_id)
colname = property name (about the file corresponding to file_id)
colvalue = property value
And I have CF for "manual" indexing:
rowkey = (folder_id, indexed val
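In CQL terms, my restatement of that Thrift-style layout (the types and the prop_name column are guesses, not from the thread) might be:

CREATE TABLE file_properties (
    folder_id  uuid,
    file_id    uuid,
    prop_name  text,
    prop_value text,
    PRIMARY KEY ((folder_id, file_id), prop_name)
);

CREATE TABLE files_by_property (
    folder_id  uuid,
    prop_name  text,
    prop_value text,
    added_at   timeuuid,
    file_id    uuid,
    PRIMARY KEY (folder_id, prop_name, prop_value, added_at, file_id)
);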
One thing I can do is to have a client-side cache of the keys to reduce the
number of updates.
On Apr 5, 2013, at 6:14 AM, Edward Capriolo wrote:
> Since there are few column names what you can do is this. Make a reverse
> index, low read repair chance, Be aggressive with compaction. It will
Since there are few column names, what you can do is this: make a reverse
index, with a low read repair chance, and be aggressive with compaction. It will be
many extra writes, but that is OK.
The other option is to turn on the row cache and try read-before-write. It is a good
case for the row cache because it is a very smal
I don't really need to answer "what rows contain column named X", so no need
for a reverse index here. All I want is a distinct set of all the column names,
so I can answer "what are all the available column names"
On Apr 4, 2013, at 4:20 PM, Edward Capriolo wrote:
> Your reverse index of "wh
Your reverse index of "which rows contain a column named X" will have very
wide rows. You could look at cassandra's secondary indexing, or possibly
look at a solandra/solr approach. Another option is you can shift the
problem slightly, "which rows have column X that was added between time y
and tim
Hi Edward,
I anticipate that the column names will be reused a lot. For example, key1 will
be in many rows. So I think the number of distinct column names will be much
much smaller than the number of rows. Is there a way to have a separate CF that
keeps track of the column names?
What I was t
You cannot get only the column name (which you are calling a key); you can
use get_range_slice, which returns all the columns. When you specify an
empty byte array (new byte[0]) as the start and finish, you get back all
the columns. From there you can return only the columns to the user in a
format
Hey Guys,
I'm working on a project and one of the requirements is to have a schema free
CF where end users can insert arbitrary key/value pairs per row. What would be
the best way to know what are all the "keys" that were inserted (preferably w/o
any locking). For example,
Row1 => key1 -> XXX,
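In CQL terms, one way to restate this (a sketch under my own assumptions; write-heavy on purpose, and the registry stays tiny because key names repeat a lot, as noted elsewhere in the thread):

CREATE TABLE row_data (
    row_id uuid,
    key    text,
    value  text,
    PRIMARY KEY (row_id, key)
);

-- One row per distinct key name ever used, upserted alongside every write
CREATE TABLE known_keys (
    key text PRIMARY KEY
);

INSERT INTO row_data (row_id, key, value) VALUES (uuid(), 'key1', 'XXX');
INSERT INTO known_keys (key) VALUES ('key1');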
my suggestions.
From: aa...@thelastpickle.com
Subject: Re: Data Modeling: Comments with Voting
Date: Tue, 2 Oct 2012 10:39:42 +1300
To: user@cassandra.apache.org
You cannot (and probably do not want to) sort continually when the voting is
going on.
You can store the votes using CounterColumnTypes in
y with a
>> dummy row id 'sort_by_votes_list' and column names can be a composite of
>> number of votes , and comment id ( as more than 1 comment can have the same
>> votes)
>>
>>
>> Regards,
>> Roshni
>>
Depending on your needs, you could simply duplicate the comments in two
separate CFs with the column names including time in one and the vote in
the other. If you allow for updates to the comments, that would pose
some issues you'd need to solve at the app level.
On 9/26/12 4:28 PM, Drew Kutch
Hi Guys,
Wondering what would be the best way to model a flat (no sub comments, i.e.
twitter) comments list with support for voting (where I can sort by create time
or votes) in Cassandra?
To demonstrate:
Sorted by create time:
- comment 1 (5 votes)
- comment 2 (1 vote)
- comment 3 (no votes)
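A sketch of the two-view shape suggested in the replies (names are illustrative; as noted there, a partition cannot stay continuously sorted by a live counter, so the vote-ordered table has to be rewritten when vote totals change):

-- Comments in creation order
CREATE TABLE comments_by_time (
    topic_id   uuid,
    created_at timeuuid,
    comment_id uuid,
    body       text,
    PRIMARY KEY (topic_id, created_at)
) WITH CLUSTERING ORDER BY (created_at DESC);

-- Live vote counts
CREATE TABLE comment_votes (
    comment_id uuid PRIMARY KEY,
    votes      counter
);

-- Vote-ordered view: composite clustering of (votes, comment_id), rewritten as counts change
CREATE TABLE comments_by_votes (
    topic_id   uuid,
    votes      int,
    comment_id uuid,
    body       text,
    PRIMARY KEY (topic_id, votes, comment_id)
) WITH CLUSTERING ORDER BY (votes DESC, comment_id ASC);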
wants to update individual fields , while
B is better if one wants easier paging, reading multiple items at once
in one read. etc. The details are in this discussion thread
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Data-Modeling-another-question-td7581967.html
I had an a
On Wed, Sep 19, 2012 at 3:32 PM, Brian O'Neill wrote:
> That said, I'm keeping a close watch on:
> https://issues.apache.org/jira/browse/CASSANDRA-3647
>
> But if this is CQL only, I'm not sure how much use it will be for us
> since we're coming in from different clients.
> Anyone know how/if coll
On Wed, Sep 19, 2012 at 2:00 PM, Roshni Rajagopal
wrote:
> Hi,
>
> There was a conversation on this some time earlier, and to continue it
>
> Suppose I want to associate a user to an item, and I want to also store 3
> commonly used attributes without needing to go to an entity item column
> famil
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Data-Modeling-another-question-td7581967.html
I had an additional question:
as it's being said that CQL is the direction in which Cassandra is moving, and
there's a lot of effort in making CQL the standard,
how does approach
> = {name: Betty Crocker, descr: Cake, Qty: 5},
> = {name: Nutella, descr: Choc spread, Qty: 15}
> }
>
> Essentially A is better if one wants to update individual fields, while B
> is better if one wants easier paging, reading multiple items at once in one
> read, etc. The details are in this
Yes, you are right, it depends on the use case.
I suggested it as a better choice, not the only choice. JSON will be better if
any field change rewrites the whole data without reading first.
I tend to use JSON more where my data does not change, or changes very rarely, like
storing denormalized JSON data for analytics purposes.
I would respectfully disagree; what you have said is true, but it really
depends on the use case.
1) Do you expect to be doing updates to individual fields of an item, or
will you always update all fields at once? If you are doing separate
updates then the first is definitely easier to handle
The first is the better choice; each field can be updated separately (write only).
With the second you have to take care of the JSON yourself (read first, modify, then write).
On Fri, Aug 24, 2012 at 5:45 PM, Roshni Rajagopal <
roshni.rajago...@wal-mart.com> wrote:
> Hi,
>
> Suppose I have a column family to associate a us
Hi,
Suppose I have a column family to associate a user with a dynamic list of items.
I want to store 5-10 key pieces of information about the item, and there are no
specific sorting requirements.
I have two options
A) use composite columns
UserId1 : {
: = Betty Crocker,
: = Cake
: = 5
: = Nutella,
: = C
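In rough CQL terms the two options usually come out like this (my sketch; the column names are guessed from the example values above):

-- Option A: one clustering row per item, fields as real columns
CREATE TABLE user_items (
    user_id uuid,
    item_id uuid,
    name    text,
    descr   text,
    qty     int,
    PRIMARY KEY (user_id, item_id)
);

-- Option B: the whole item serialized as a single JSON text value
CREATE TABLE user_items_json (
    user_id   uuid,
    item_id   uuid,
    item_json text,
    PRIMARY KEY (user_id, item_id)
);

Option A lets you update qty alone with a plain write; option B forces a read-modify-write of the blob, which is the trade-off debated above.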
Just read up on composite keys and what looks like future deprecation of super
column families.
I guess Option 2 would now be:
- column family with composite key from grouping and location
> e.g.
> '0:0,0': { meta }
> ...
> '0:10,10' : { meta }
> '1:10,0' : {meta}
> …
> '1:20, 10': {meta}
I have a question on what the best way is to store the data in my schema.
The data:
I have millions of nodes, each with a different Cartesian coordinate. The keys
for the nodes are hashed based on the coordinate.
My search is a proximity search. I'd like to find all the nodes within a given
di
I would bucket the time stats as well.
If you write all the attributes at the same time, and always want to read them
together, storing them in something like a JSON blob is a legitimate approach.
Other Aaron, can you elaborate on
> I'm not using composite row keys (it's just
> AsciiType) as th
On Thu, May 17, 2012 at 8:55 AM, jason kowalewski
wrote:
> We have been attempting to change our data model to provide more
> performance in our cluster.
>
> Currently there are a couple ways to model the data and i was
> wondering if some people out there could help us out.
>
> We are storing tim
On Wed, May 2, 2012 at 8:22 AM, Tim Wintle wrote:
> On Tue, 2012-05-01 at 11:00 -0700, Aaron Turner wrote:
>> Tens or a few hundred MB per row seems reasonable. You could do
>> thousands/MB if you wanted to, but that can make things harder to
>> manage.
>
> thanks (Both Aarons)
>
>> Depending on
On Tue, 2012-05-01 at 11:00 -0700, Aaron Turner wrote:
> Tens or a few hundred MB per row seems reasonable. You could do
> thousands/MB if you wanted to, but that can make things harder to
> manage.
thanks (Both Aarons)
> Depending on the size of your data, you may find that the overhead of
> ea
I would try to avoid hundreds of MBs per row. It will take longer to compact and
repair.
Tens of MBs is fine. Take a look at in_memory_compaction_limit and thrift_frame_size
in the yaml file for some guidance.
Cheers
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpi
On Tue, May 1, 2012 at 10:20 AM, Tim Wintle wrote:
> I believe that the general design for time-series schemas looks
> something like this (correct me if I'm wrong):
>
> (storing time series for X dimensions for Y different users)
>
> Row Keys: "{USET_ID}_{TIMESTAMP/BUCKETSIZE}"
> Columns: "{DIME
I believe that the general design for time-series schemas looks
something like this (correct me if I'm wrong):
(storing time series for X dimensions for Y different users)
Row Keys: "{USET_ID}_{TIMESTAMP/BUCKETSIZE}"
Columns: "{DIMENSION_ID}_{TIMESTAMP%BUCKETSIZE}" -> {Counter}
But I've not fou
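The same layout restated in CQL, as a sketch (the bucket arithmetic and names are illustrative):

CREATE TABLE metrics (
    user_id      uuid,
    bucket_start timestamp,   -- TIMESTAMP / BUCKETSIZE, i.e. the start of the bucket
    dimension_id int,
    offset       int,         -- TIMESTAMP % BUCKETSIZE
    value        counter,
    PRIMARY KEY ((user_id, bucket_start), dimension_id, offset)
);

UPDATE metrics SET value = value + 1
WHERE user_id = ? AND bucket_start = ? AND dimension_id = ? AND offset = ?;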
9A0C-0305E82C3301") = "FR",
>>> ("country", "21EC2020-3AEA-1069-A2DD-08002B30309D") = "EN",
>>> ("firstName", "3F2504E0-4F89-11D3-9A0C-0305E82C3301") = "Carl",
>>> ("firstName&qu
EC2020-3AEA-1069-A2DD-08002B30309D") = "Doe",
>> ...
>> }]
>>
>>
>> As far as i understand it seems to be the fastest way to retrieve all values
>> of a field in the same order.
>> To update, i don't need to read before writing.
>>
>> Problem : the row will be very large : 300 000 000 of columns. I can split
>> it in different rows based on the value of the specific field, for example
>> country.
>>
>> ---
>> Solution 3:
>>
>> Wide Row by field
>>
>> Column Family : customers
>> One row by field : so 300 rows
>> Columns : ID = FieldValue
>>
>> Benefits :
>> The row will be smaller, 1 000 000 columns.
>>
>> Problem :
>> Update seems more expensive, for every customer to update, i need to update
>> 300 rows.
>>
>> ---
>>
>> Which solution seems to be the good one? Is Cassandra really a good
>> fit for this use case?
>>
>> Thanks
>>
>> Alexis Coudeyras
>>
Here's what I ended up with; this seems to work for me.
@Test
public void readAndWriteSettingTTL() throws Exception {
int ttl = 2;
String columnFamily = "Quote";
Set<String> symbols = new HashSet<String>(){{
add("appl");
You wouldn't query for all the keys that have a column name x exactly.
Instead what you would do is for sector x grab your list of symbols S.
Then you would get the last column for each of those symbols (which you do
in different ways depending on the API), and then test if that date is
within yo
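Restated in CQL as a sketch (my assumptions about the quote CF, not the poster's actual schema), "last fetch per symbol" becomes a LIMIT 1 read per symbol from the sector's list, with the staleness check done client-side:

CREATE TABLE quotes (
    symbol     text,
    fetched_at timeuuid,
    price      decimal,
    PRIMARY KEY (symbol, fetched_at)
) WITH CLUSTERING ORDER BY (fetched_at DESC);

-- Newest quote for one symbol; compare fetched_at against "now minus x minutes" in the client
SELECT symbol, fetched_at FROM quotes WHERE symbol = 'appl' LIMIT 1;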
With the quote CF below, how would one query for all keys that have a
column name value with a timeuuid later than x minutes ago? I need
to be able to find all symbols that have not been fetched in x minutes, by
sector. I know I can get the list of symbols for a sector from my sector CF.
thanks,
deno