ained.
--
Aaron Turner
http://synfin.net/ Twitter: @synfinatic
https://github.com/synfinatic/tcpreplay - Pcap editing and replay
tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
-- Benjamin
xample). Cloud can work (Netflix
uses Cassandra on AWS), but your performance will be a lot more consistent
on physical hardware, and Cassandra, like all databases, likes lots of RAM
(although SSDs can offset this somewhat), which tends to be expensive
in the cloud.
On Wed, Sep 11, 2013 at 4:40 PM, Shahab Yunus wrote:
> Thanks Aaron for the reply. Yes, VMs or the nodes will be in cloud if we
> don't go the physical route.
>
> " Look how Cassandra scales and provides redundancy. "
> But how does it differ for physical machines or VMs (in cloud.) Or after
> yo
Physical machines unless you're running your cluster in the cloud (AWS/etc).
Reason is simple: Look how Cassandra scales and provides redundancy.
Have you tried running your code in GDB to find which line is causing the
error? That would be what I'd do first.
one socket is just
> as fast as 10/20…..I would love to know the truth/answer to that though.
>
> Later,
> Dean
>
>
> From: Aaron Turner mailto:synfina...@gmail.com>>
> Reply-To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" <
> us
e new java driver is, but have not verified(I
> hope it is)
>
> Dean
>
> From: Aaron Turner mailto:synfina...@gmail.com>>
> Reply-To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" <
> user@cassandra.apache.org<mailto:user@cassandr
t Thrift a binary protocol?
--
Aaron Turner
http://synfin.net/ Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix &
Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
-- Benj
ts are *cheap*. There's almost literally zero I/O
associated with a snapshot. Backing up all that data off the system
is a different story, but at least it's large sequential reads which
is pretty well optimized.
build a 4TB disk
array, doesn't mean you can have a single Cassandra node with 4TB of
data. Typically, people around here seem to recommend ~400GB, but
that depends on hardware.
Honestly, for the price of a single computer you could test this
pretty easily. That's what I'd do.
l it work? Possibly. What are the disadvantages? Well
it depends on a bunch of things you haven't told us.
you care
about the facility and priority, then you'll need to somehow encode
that in the row/column name. Otherwise you'll have to filter out
records post-query. So for read performance, chances are you'll have
to insert the information multiple times depending on your search
par
ere
> http://code.google.com/a/apache-extras.org/p/cassandra-jdbc/
>
> Thanks,
> Matt
>
>
hat is this
> specifically for?
> Thanks again for the help!
>
>
> Renato M.
>
> 2013/1/15 Aaron Turner :
>> I don't think so. Usually you'd use either a Time-UUID or something
>> like epoch time as the column name to get a range of columns by time
>
ave two questions here:
> 1) What is the timestamp column used for?
> 2) How can I retrieve this timestamp column using Hector client?
>
> Thanks in advance!
>
>
> Renato M.
a Aleixo
>> Bacharel em Ciência da Computação pela UFG
>> Mestrando em Ciência da Computação pela UFG
>> Programador no LUPA
>>
>
>
>
> --
> Everton Lima Aleixo
> Bacharel em Ciência da Computação pela UFG
> Mestrando em Ciência da Computação pela UFG
> Program
>> sort of EOL issues you're referring to. Unfortunately previous
>> requests on this list for such a statement have gone unanswered.
>>
>> The non-official response is that various people run in production
>> with Java 7 and it seems to work. :
>1) run a major compaction
>> >2) code up sstablesplit
>> >3) profit!
>> >
>> >This method incurs a management penalty if not automated, but is
>> >otherwise the most efficient way to deal with tombstones and obsolete
>> >data.. :D
>> >
>
> leveled compaction will kill your performance. get patch from jira for
> > maximum sstable size per CF and force cassandra to make smaller tables,
> they
> > expire faster.
> >
>
eparating my read heavy from write heavy CF's because generally
speaking they benefit from different compaction methods. But don't go
crazy creating 1000's of CF's either.
Hope that gives you some ideas to investigate further!
h isn't replicated to all the nodes
for whatever reason, then the data can come back. Repair just
guarantees that all the nodes that should have gotten the tombstones got
them.
ated tool for running
> repairs every X days(this should really be an automated/schedulable
> thing)???
I use a cron job. It's a good idea to use the '-pr' flag btw. Also,
you only need to run repair against CF's which actually have deletes.
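
A minimal sketch of what such a cron-driven script might look like, in Python. The host names, keyspace and column family names below are made up for illustration; the only Cassandra-specific piece is `nodetool -h <host> repair -pr`, which restricts each run to that node's primary range. The sketch only builds the command lines; actually executing them (for example with subprocess from a cron entry) is left out on purpose.

```python
def repair_commands(hosts, keyspace, column_families):
    """Build one 'nodetool repair -pr' command line per host.

    hosts, keyspace and column_families are placeholders supplied by the
    caller; nothing here talks to a real cluster.
    """
    commands = []
    for host in hosts:
        # -pr repairs only the primary range owned by this node, so looping
        # over every host covers the ring without duplicated work.
        cmd = ["nodetool", "-h", host, "repair", "-pr", keyspace]
        # Only list the CFs that actually see deletes.
        cmd.extend(column_families)
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    # Hypothetical cluster layout, for illustration only.
    for cmd in repair_commands(["cass1", "cass2", "cass3"], "stats", ["rolling48"]):
        print(" ".join(cmd))
```

Passing only the column families that have deletes keeps each run short, which matters if the job is scheduled every few days.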
meone has to already have an automated project for this, anyone
> know of one??
>
> Thanks,
> Dean
>
e applications
> which would be huge and then all the tables which is large, it just keeps
> growing. It is a very nice concept(all data in one location), though we
> will see how implementing it goes.
>
> How much overhead per column family in RAM? So far we have around 4000
>
On Thu, Sep 27, 2012 at 7:35 PM, Marcelo Elias Del Valle
wrote:
>
>
> 2012/9/27 Aaron Turner
>>
>> How strict are your security requirements? If it wasn't for that,
>> you'd be much better off storing data on a per-statistic basis than
>> per-dev
Yeah, there isn't a hard limit on the number of CFs, but there
is overhead associated with each one and so I wouldn't consider your
design as scalable. Generally speaking, hundreds are ok, but
thousands is pushing it.
On Tue, Sep 25, 2012 at 10:36 AM, Віталій Тимчишин wrote:
> See my comments inline
>
> 2012/9/25 Aaron Turner
>>
>> On Mon, Sep 24, 2012 at 10:02 AM, Віталій Тимчишин
>> wrote:
>> > Why so?
>> > What are pluses and minuses?
>> > As
ble,
but even then I'd guesstimate 50MB is far more reasonable than 512MB.
-Aaron
> 2012/9/23 Aaron Turner
>>
>> On Sun, Sep 23, 2012 at 8:18 PM, Віталій Тимчишин
>> wrote:
>> > If you think about space, use Leveled compaction! This won't only allow
>> >
of data. I'm thinking about repairing the rolling 48 hours CF more
often and reducing the gc_grace time so that compaction has a better
chance of removing stale data from disk.
's freaking huge. From my conversations
with various developers 5-10MB seems far more reasonable. I guess it
really depends on your usage patterns, but that seems excessive to me-
especially as sstables are promoted.
e so that compactions take less space in the
> future meaning we can buy less nodes?
>
> Thanks,
> Dean
On Mon, Sep 10, 2012 at 10:17 PM, Morantus, James (PCLN-NW)
wrote:
> Hey folks,
>
>
>
> Can you recommend any tools to pull data from MySQL and pump it to
> Cassandra?
This: http://www.datastax.com/dev/blog/bulk-loading
or
read & writes. I would strongly suggest 3 nodes per DC if you care
about consistent reads. Generally speaking, 3 nodes per-DC is
considered the recommended minimum number of nodes for a production
system.
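
The reasoning behind the three-node minimum can be sketched with the usual quorum arithmetic. This assumes the replication factor in each DC equals the node count and that reads and writes use QUORUM; the function names are made up for this example.

```python
def quorum(replication_factor):
    """Replicas that must answer for a QUORUM read or write."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor):
    """Replicas that can be down while QUORUM still succeeds."""
    return replication_factor - quorum(replication_factor)


if __name__ == "__main__":
    for rf in (1, 2, 3):
        print(rf, quorum(rf), tolerated_failures(rf))
```

With two replicas a quorum needs both of them, so a single node outage blocks quorum operations; three is the smallest count that survives losing one node.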
ep.
> Secondly, what's the need for sleep 120?
Just to give the cluster a chance to settle down between repairs...
there's no real need for it; it's just there "because".
ue, Aug 28, 2012 at 7:03 AM, Edward Capriolo wrote:
> You can consider adding -pr. When iterating through all your hosts
> like this. -pr means primary range, and will do less duplicated work.
>
> On Mon, Aug 27, 2012 at 8:05 PM, Aaron Turner wrote:
>> I use cron. On one box I j
just under 4 months of data is less than 2GB! I'm pretty
thrilled.
. Most
people don't use them because of the rather poor performance
characteristics SC's have.
er
or subsequent compaction activity? All my CF's I'll be writing to
use compression and leveled compaction.
Right now my Cassandra data store has about 4 months of data and we
have 5 years of historical data (not sure yet how much we'll actually load,
but minimally 1 year's worth).
caching, pooling, etc) of Cassandra 1.X.
> Right now i come to know that following client exists:
>
> 1) Hector(Java)
> 2) Thrift (Java)
> 3) Kundera (Java)
>
>
> With Regards,
> Amit
ould
>> you select to serve HTTP requests to ensure you get:
>>
>> a) The best support from the cassandra community (e.g. timely updates
>> of drivers, better stability)
>> b) Optimal efficiency between webservers and cassandra cluster, in
>> terms of the pe
table.
>
> See http://wiki.apache.org/cassandra/MemtableSSTable
>
> On Sat, Aug 11, 2012 at 11:03 AM, Aaron Turner wrote:
>> So how does that work? An sstable is for a single CF, but it can and
>> likely will have multiple rows. There is no read to write and as I
>> unde
> That is, in the special case where you get an sstable file per column/value, you
> are correct, but normally, I guess most of us are storing more per key.
>
> Regards,
> Terje
>
> On 11 Aug 2012, at 10:34, Aaron Turner wrote:
>
>> Curious, but does cassandra store the row
Curious, but does cassandra store the rowkey along with every
column/value pair on disk (pre-compaction) like Hbase does? If so
(which makes the most sense), I assume that's something that is
optimized during compaction?
t that information? do we scan through each node row as we will have
> row for each node?
>
> thanks
>
> -Aaron Turner wrote: -
> To: user@cassandra.apache.org
> From: Aaron Turner
> Date: 08/09/2012 07:38PM
> Subject: Re: Cassandra data model help
>
> On Thu,
in the new CF and then deletes the original row. By doing
this, my disk space requirements (before replication) went from over
1.1TB/year to 305GB/year.
on Hector/pycassa/etc. Of course, you still need to
write code around it, and if that's Java I'm not sure how much it
matters.
t to write a recipe. Several
> people added content to the first edition and it would be great to see
> that type of participation again.
>
> Thank you,
> Edward
debug this further to see what is causing this?
for a given user/stat combination.
If I need to get multiple stats per user, I just use more threads on
the client side. I'm not using composite row keys (it's just
AsciiType) as that can lead to hotspots on disk. My timestamps are
also just plain Unix epochs as that takes less space
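
To illustrate why plain epoch column names are convenient: columns in a row are kept sorted by name, so a time range is just a contiguous slice. The sketch below models one row as a sorted Python list rather than talking to Cassandra; `slice_by_time` and the sample data are invented for this example.

```python
from bisect import bisect_left, bisect_right


def slice_by_time(columns, start, end):
    """Return the (epoch, value) pairs with start <= epoch <= end.

    `columns` stands in for one row: a list of (epoch_seconds, value)
    tuples already sorted by epoch, the way a column-name comparator
    would keep them.
    """
    names = [name for name, _ in columns]
    lo = bisect_left(names, start)   # first column at or after start
    hi = bisect_right(names, end)    # one past the last column at or before end
    return columns[lo:hi]


if __name__ == "__main__":
    row = [(1000, 5), (1300, 7), (1600, 9), (1900, 4)]
    print(slice_by_time(row, 1300, 1600))
```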
he performance? Thanks!
Have you tried using more threads on the client side? Generally
speaking, when I need faster read/write performance I look for ways to
parallelize my requests and it scales pretty much linearly.
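
A rough sketch of that client-side parallelism in Python. `fetch_stat` is a hypothetical stand-in for a single blocking read against the cluster; swap in whatever client call you actually use. The pool size of 8 is an arbitrary starting point, not a recommendation.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_stat(key):
    """Hypothetical stand-in for one blocking Cassandra read.

    Here it just returns the key and its length so the sketch runs
    without a cluster.
    """
    return key, len(key)


def fetch_all(keys, max_workers=8):
    """Issue the reads concurrently and collect the results by key."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order and re-raises the first error it hits.
        return dict(pool.map(fetch_stat, keys))


if __name__ == "__main__":
    print(fetch_all(["user1_ifInOctets", "user1_ifOutOctets"]))
```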
On Wed, May 2, 2012 at 8:22 AM, Tim Wintle wrote:
> On Tue, 2012-05-01 at 11:00 -0700, Aaron Turner wrote:
>> Tens or a few hundred MB per row seems reasonable. You could do
>> thousands/MB if you wanted to, but that can make things harder to
>> manage.
>
> thanks (Bot
one tombstone for the row delete, rather
than 288, one for each column deleted.
I don't use compression.
ng what parameters I
> could tweak to improve the performance.
>
Is your client multi-threaded? The single-threaded performance of
Cassandra isn't at all impressive and it really is designed for
dealing with a lot of simultaneous requests.
readed performance of Cassandra isn't anything to write home about.
Anyways, I'm not sure I would recommend JRuby+Hector if this is the
only reason you'd use JRuby over MRI, but if you might find the
plethora of Java libraries useful it's definitely worth looking into.
On Wed, Dec 7, 2011 at 3:59 PM, Christof Bornhoevd wrote:
> Hi All,
>
> I'm using Cassandra 1.0.3. Can I have (simple) Columns and SuperColumns
> within the same row of a SuperColumnFamily?
Nope. Personally, I avoid super columns altogether.
n
the Rails side, not Hector/Cassandra which has been pretty rock solid
so far in my testing).
I basically wrote my own custom ORM on top of Hector. It's not AR
compliant or anything like that and pretty application specific.
Mostly it just tries to simplify the Hector API.
he right way to go?
>>>>> That is, the requirement is for a large data store, that can move with
>>>>> product changes and requirements swiftly.
>>>>> Given that in Cassandra one thinks hard about the queries, and then
>>>>> builds a model
ry real
> and serious performance loss, I'm working on a strategy of moving forward.
>
> If the tombstones do cause such problem, where should I be looking for
> performance bottlenecks?
> Is it disk, CPU or something else? Thing is, I don't see anything
> outstanding in
1, 2011 at 10:29 AM, Aaron Turner wrote:
> Lately I've been working on some data processing code in Cassandra and
> apparently I don't write bug-free code the very first time. :) Hence,
> while debugging, I often need to look at data in Cassandra to see what
> my code is do
't efficient for the server to do, but
the client could do that. I really don't care too much about
performance since this is a debugging/diagnostics tool.
ct Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>
Haven't found this in the docs yet, but is the TTL the number of
seconds in the future to expire? Unix epoch time to expire?
something else?
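
For what it's worth, Cassandra treats the TTL as a relative number of seconds counted from the time of the write, not as an absolute epoch timestamp. A tiny sketch of the arithmetic (the function name is made up here and is not part of any client API):

```python
import time


def expires_at(ttl_seconds, written_at=None):
    """Epoch time at which a column written with this TTL stops being live.

    written_at defaults to "now"; pass it explicitly to reason about a
    column written in the past.
    """
    if written_at is None:
        written_at = time.time()
    return written_at + ttl_seconds


if __name__ == "__main__":
    # A column written at epoch 1320000000 with a one-day TTL.
    print(expires_at(86400, written_at=1320000000))
```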
ed to leave such utilities external. At its core was "get
> and put".
> Did I miss something in my reading of intent?
> -Sarah
>
> -Original Message-
> From: Aaron Turner [mailto:synfina...@gmail.com]
> Sent: Sunday, November 06, 2011 8:25 AM
> To: user@cass
1. Basic SQL-like summary transforms for both CQL and Thrift API clients like:
SUM
AVG
MIN
MAX
2. Native 64bit UNsigned datatype
3. Add support for matching column names via LIKE (% and _ wildcards)
for ascii type
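
Until something like item 3 exists server-side, the matching can be approximated in the client after fetching the column names. This is only a sketch of SQL-style LIKE semantics (`%` for any run of characters, `_` for exactly one) built on Python's `re` module; the helper names are invented and nothing here is a Cassandra or CQL API.

```python
import re


def like_to_regex(pattern):
    """Translate a LIKE pattern with % and _ wildcards into a compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")           # any run of characters, including none
        elif ch == "_":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else matches literally
    return re.compile("".join(parts), re.DOTALL)


def filter_columns(names, pattern):
    """Keep only the column names that match the whole LIKE pattern."""
    rx = like_to_regex(pattern)
    return [name for name in names if rx.fullmatch(name)]


if __name__ == "__main__":
    cols = ["ifInOctets", "ifOutOctets", "sysUpTime"]
    print(filter_columns(cols, "if%Octets"))
```

Filtering on the client means every column name still crosses the wire, so this only makes sense for debugging or for rows of modest width.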
guration. I am learning on
> the job so to speak.
>
> Thank you kindly for any comments or pointers.
>
> # Cassandra store properties
> # keyspace=
> # name=
> # class=
> # qualifier=
> # family=
> # type=
> # cluster=
> # host=
>
> --
> Lewis
>
(since 2^16/8 = 8K)
Alternatively, you could store 16K columns per row (each column is a
/24) and each column would have 8 bytes. Off the top of my head I'm
not sure which would be faster, but the first solution would be more
disk space efficient. If you need to update your bitmasks regul
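
The 2^16/8 = 8K figure can be checked with a small bitmap sketch. It reads the arithmetic as 2^16 one-bit flags packed eight to a byte; what each bit stands for is not recoverable from this excerpt, so the helper names and the sample offset are purely illustrative.

```python
BITS = 2 ** 16  # number of one-bit flags


def new_bitmap(bits=BITS):
    """Allocate a zeroed bitmap; 2**16 bits packs into 8192 bytes."""
    return bytearray(bits // 8)


def set_bit(bitmap, offset):
    """Turn on the flag at the given bit offset."""
    bitmap[offset // 8] |= 1 << (offset % 8)


def get_bit(bitmap, offset):
    """Report whether the flag at the given bit offset is on."""
    return bool(bitmap[offset // 8] & (1 << (offset % 8)))


if __name__ == "__main__":
    bm = new_bitmap()
    set_bit(bm, 513)
    print(len(bm), get_bit(bm, 513), get_bit(bm, 512))
```

Storing the whole bitmap as one value means any single-bit change rewrites all 8K, which is the trade-off against the many-small-columns layout.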
Seems fine now.
2011/10/13 Patricio Echagüe :
> Hi Aaron. does it still happen ? We didn't set up any password on the page.
>
> On Tue, Oct 11, 2011 at 9:15 AM, Aaron Turner wrote:
>>
>> Just a FYI:
>>
>> http://hector-client.org is requesting a username/p
you have a personal blog and want us to include the link, let us know.
> Feedback is always welcome.
> Thanks!
> Hector Team.
>
use SuperColumns since
Cassandra has to read all the supercolumns anyways, so storing as json
requires less overhead.
On Mon, Jul 25, 2011 at 11:24 AM, Sylvain Lebresne wrote:
> On Mon, Jul 25, 2011 at 7:35 PM, Aaron Turner wrote:
>> On Sun, Jul 24, 2011 at 3:36 PM, aaron morton
>> wrote:
>>> What's your use case ? There are people out there having good times with
>
on the client and try
again later, but that's not what timeout means. Without any means to
recover I've actually lost a lot of reliability that I currently have
with my single PostgreSQL server backed data store.
Right now I'm trying to come up with a way that my distributed snmp
po
> yes, but with regular columns, retry is OK, while counter is not.
I know I've heard that fixing this issue is "hard". I've assumed this
to mean "don't expect a fix anytime soon". Is that accurate?
I'm beginning to have second thoughts that Cassandra is
EY =
> '1_20110728_ifoutmulticastpkts';
> cqlsh>
> _
> [default@test] list counts;
> Using default limit of 100
> ---
> RowKey: 1_20110728_ifoutmulticastpkts
> => (counter=12, value=16)
> => (counter=1310367600,
ve at character '+'
Frankly, I'm about ready to open a ticket against 0.8.1 saying
CQL/Counter support does not work at all.
Or is there a trick which isn't documented in the ticket? I tried
reading the Java code referred to in ticket #2473, but I'm in over my
head.
On Tue, Jul
line 1:53 no viable alternative at character '+'
On Tue, Jul 12, 2011 at 5:35 PM, Jonathan Ellis wrote:
> Try quoting the column name.
>
> On Tue, Jul 12, 2011 at 5:30 PM, Aaron Turner wrote:
>> Using Cassandra 0.8.1 and cql 1.0.3 and following the syntax mentioned
>
at character '+'
Column names are Longs, hence the INT = INT + INT
Ideas?
to 0.7.4 and ran scrub without
>>> > any error. Now 'list CF' in CLI does not return any data as followings:
>>> >
>>> > list User;
>>> > Using default limit of 100
>>> > Input length = 1
>>> >
>>> > I
Note: it looks like the Perl API isn't being well maintained...
How's the Ruby API overall? Is it stable? How's the performance?
Thanks!
ly
rollups. Perhaps there's even an open source project or two
implementing this sorta thing? I've found flewton
(https://github.com/flewton/flewton), which is possibly relevant, but
my Java skills are pretty non-existent so I'm having a hard time
figuring it out.
Thanks,
Aar