You define the max size of your heap (-Xmx), but you do not define the max
size of your off-heap (MaxMetaspaceSize for JDK 8, MaxPermSize for JDK 7), so
you could occupy all of the memory on the instance. Your system killed the
process to preserve itself. You should also take into account that the
mem
Looks like you're connecting to a service listening on SSL, but you don't
have the CA it uses in your truststore.
On Thu, May 24, 2018 at 1:58 PM, Surbhi Gupta wrote:
> Getting below error:
>
> Caused by: sun.security.validator.ValidatorException: PKIX path building
> failed: sun.security.provider.ce
https://github.com/TargetHolding/pyspark-cassandra
On Mon, Jun 20, 2016 at 1:47 PM, Joaquin Alzola wrote:
> Hi List
>
> Is there a Spark Cassandra connector in python? Of course there is the one
> for scala ...
>
> BR
>
> Joaquin
A snapshot would flush your memtable to disk and you could stream your
sstables out. Incremental backups would be the differences that have
occurred since your last snapshot, as far as I'm aware. Since it's
fairly infeasible to constantly stream out full snapshots (depending on
the density of yo
Periodic snapshots + incremental backups are, I think, pretty good in terms
of restoring to a point in time. But you must manage cleaning up your
snapshots + incremental backups on your own. I believe that tablesnap
(https://github.com/JeremyGrosser/tablesnap) is a pretty decent approach in
terms of
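As a rough illustration of the periodic-snapshot-plus-cleanup idea (not tablesnap itself; the keyspace name is hypothetical, and it assumes nodetool is on the PATH and supports the -t tag flag):

    import datetime
    import subprocess

    KEYSPACE = "my_keyspace"   # hypothetical keyspace name

    def take_snapshot():
        # flushes memtables and hard-links the current sstables under a date tag
        tag = datetime.date.today().isoformat()
        subprocess.check_call(["nodetool", "snapshot", "-t", tag, KEYSPACE])
        return tag

    def clear_snapshot(tag):
        # drop an old snapshot once it has been shipped off the node
        subprocess.check_call(["nodetool", "clearsnapshot", "-t", tag, KEYSPACE])

    if __name__ == "__main__":
        print("created snapshot", take_snapshot())

Something like tablesnap then watches the data directories and uploads new files as they appear, which is what makes the incremental part practical.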
-archive.com/user@spark.apache.org/msg44793.html
More info on tuning shuffle behavior:
https://spark.apache.org/docs/1.5.1/configuration.html#shuffle-behavior
On Thu, Jun 16, 2016 at 1:57 PM, Cassa L wrote:
> Hi Dennis,
>
> On Wed, Jun 15, 2016 at 11:39 PM, Dennis Lovely wrote:
>
>
You could try tuning spark.shuffle.memoryFraction and
spark.storage.memoryFraction (both of which have been deprecated in 1.6),
but ultimately you need to find out where you are bottlenecked and address
that, as adjusting memoryFraction will only be a stopgap. Both shuffle and
storage memoryFractio
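For what it's worth, a minimal PySpark sketch of setting those knobs while you track down the real bottleneck (the values here are made up, and both settings are deprecated from 1.6 on):

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("shuffle-tuning-example")        # hypothetical app name
            .set("spark.shuffle.memoryFraction", "0.4")  # deprecated in 1.6+
            .set("spark.storage.memoryFraction", "0.4")) # deprecated in 1.6+
    sc = SparkContext(conf=conf)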
wait a few more days before I am
sure of that.
Kind regards,
Dennis
On 19.01.2016 at 19:39, Femi Anthony wrote:
So is the logging to Cassandra being done via Spark?
On Wed, Jan 13, 2016 at 7:17 AM, Dennis Birkholz <birkh...@pubgrade.com> wrote:
Hi all,
we Cassan
ld really appreciate it if someone could give me a hint on how to fix
this problem, thanks!
Greets,
Dennis
P.S.:
some information about our setup:
Cassandra 2.1.12 in a two Node configuration with replication factor=2
Spark 1.5.1
Cassandra Java Driver 2.2.0-rc3
Spark Cassandra Java Connector 2.10-1.5.0-M2
- "^org.apache.cassandra.metrics.ClientRequest.+"
- "^org.apache.cassandra.metrics.Storage.+"
- "^org.apache.cassandra.metrics.ThreadPools.+"
prefix: "servers.<%= hostname %>"
HTH,
-Dennis
On Wed, Dec 17, 2014 at 9:16 AM, Karl Rieb wrote:
described but I think there should be a little
more automation in it.
Thanks all,
Dennis
On 11.04.2014 at 21:11, Robert Coli wrote:
On Fri, Apr 11, 2014 at 1:21 AM, Dennis Schwan <dennis.sch...@1und1.de> wrote:
The archived commitlogs are copied to the restore directory and afte
we only see the data from the
snapshot, not the commitlogs.
Regards,
Dennis
P.S.: Cassandra 2.0.6
On 10.04.2014 at 23:17, Robert Coli wrote:
On Thu, Apr 10, 2014 at 1:19 AM, Dennis Schwan <dennis.sch...@1und1.de> wrote:
do you know any description how to perform a point-in-time re
Hey there,
do you know of any description of how to perform a point-in-time recovery
using the archived commitlogs?
We have already tried several things but it just did not work.
We have a 20-node cluster (10 in each DC).
Thanks in advance,
Dennis
--
Dennis Schwan
Oracle DBA
Mail Core
1&1
Hi Yuki,
thanks for your answer. I still do not know whether it is expected behaviour
that Cassandra tries to repair these 1280 ranges every time I run a
nodetool repair on every node.
Regards,
Dennis
On 03.11.2013 at 03:27, Yuki Morishita wrote:
Hi Dennis,
As you can see in the output,
[2013
do at all.
Thanks for your help!
Dennis
--
Dennis Schwan
Oracle DBA
Mail Core
1&1 Internet AG | Brauerstraße 48 | 76135 Karlsruhe | Germany
Phone: +49 721 91374-8738
E-Mail: dennis.sch...@1und1.de | Web: www.1und1.de
Hauptsitz Montabaur, Amtsgericht Montabaur, HRB 6484
Vorstand: Ralph Dom
>
> On Wednesday, 23 February 2011 at 8:38 PM, Matthew Dennis wrote:
>
> The map returned by multiget_slice (what I suspect is the underlying thrift
> call for getColumnsFromRows) is not a order preserving map, it's a HashMap
> s
The map returned by multiget_slice (what I suspect is the underlying thrift
call for getColumnsFromRows) is not an order-preserving map, it's a HashMap,
so the order of the returned results cannot be depended on. Even if it were
an order-preserving map, not all languages would be able to make use of t
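If the caller needs the rows back in request order, it has to re-impose that order itself; a tiny generic sketch of the idea (variable names are made up):

    # keys in the order the caller asked for them
    requested_keys = ["row3", "row1", "row2"]

    # result of a multiget-style call: an unordered mapping of key -> columns
    unordered = {"row1": {"col": "a"}, "row2": {"col": "b"}, "row3": {"col": "c"}}

    # re-impose the requested order on the client side
    ordered = [(k, unordered[k]) for k in requested_keys if k in unordered]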
Data is in Memtables from writes before they get flushed (based on the first
of the ops/size/time thresholds exceeded; all are configurable) to SSTables on
disk.
There is a keycache and a rowcache. The keycache caches offsets into
SSTables for the rows. The rowcache caches the entire row. There is also
+1 on avoiding OPP
On Wed, Feb 16, 2011 at 3:27 PM, Tyler Hobbs wrote:
>
> Thanks for you input, but we have a set key that consists of name:timestamp
>> that we are using.. and we need to also retrieve the oldest data as well..
>>
>
> Then you'll need to denormalize and store every row three wa
Assuming you aren't changing the RC, the normal bootstrap process takes care
of all the problems like that, making sure things work correctly.
Most importantly, if something fails (either the new node or any of the
existing nodes) you can recover from it.
Just don't connect clients directly to th
You have a single HAProxy node in front of the cluster or you have a HAProxy
node on each machine that is a client of Cassandra that points at all the
nodes in the cluster?
The former has a SPOF and bottleneck (the HAProxy instance), the latter does
not (and is somewhat common, especially for thin
tor node could have been avoided somehow.
> Does the write on the coordinator node (incase it is not part of the N
> replica nodes for that key) get deleted before response of the write is
> returned back to the client ?
>
>
> On Tue, Feb 15, 2011 at 4:40 PM, Matthew Dennis wrote:
1. Yes, the coordinator node propagates requests to the correct nodes.
2. Most (all?) higher-level clients (pycassa, hector, etc.) load balance for
you. In general your client and/or the caller of the client needs to catch
exceptions and retry. If you're using RRDNS and some of the nodes are
temp
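As an illustration of letting the client library spread load across nodes, a pycassa-style sketch (hostnames, keyspace and column family are hypothetical; exact constructor arguments may differ between versions):

    import pycassa

    # the pool round-robins over the listed servers and retries failed operations
    pool = pycassa.ConnectionPool("MyKeyspace",
                                  server_list=["cass1:9160", "cass2:9160", "cass3:9160"])
    users = pycassa.ColumnFamily(pool, "Users")
    users.insert("some_row_key", {"name": "example"})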
But you cannot depend on such behavior. If you do a write and you get an
unavailable exception, the only thing you know is that at that time it was not
able to be placed on all the nodes required to meet your CL. It may
eventually end up on all those nodes, it may not be on any of the nodes, or
at the
0.7.1 is what I would go with right now. It's likely you'll eventually have
to upgrade that as well, but moving to other 0.7.x releases should be fairly
painless. Most development is happening on the 0.7 releases, which already
have lots of fixes over the 0.6 series (not to mention performance
im
Regardless of increasing RF or not, RR happens based on the
read_repair_chance setting. RR happens after the request has been replied
to, though, so it's possible that if you increase the RF and then read,
the read might get stale/missing data. RR would then put the correct value
on all the co
No, it's actually worse to do that.
1) you're introducing single points of failure (your array).
2) you're introducing complexity and expense
3) you're introducing latency
4) you're introducing bottlenecks
5) some other reasons...
You do want your commit log on a separate disk though. The o
On Mon, Feb 14, 2011 at 6:58 PM, Dan Hendry wrote:
> > 1) If I insert a key and want to verify which node it went to then how do
> I
> > do that?
>
> I don't think you can and there should be no reason to care. Cassandra
> abstracts where data is being stored, think in terms of consistency levels
> > Write Latency: NaN ms.
> > Pending Tasks: 0
> > Key cache capacity: 20
> > Key cache size: 0
> > Key cache hit rate: NaN
> > Row cache: disabled
> > Compacted row minimum size: 0
> > Compacted row maximum size: 0
> > Compacted row mean size: 0
On Mon, Feb 14, 2011 at 2:54 PM, Robert Coli wrote:
> Regarding very large memtables, it is important to recognize that
> throughput refers only to the size of the COLUMN VALUES, and not, for
> example, their names.
>
That would be a bug in its own right. There are lots of use cases that
only
On Mon, Feb 14, 2011 at 6:28 PM, Aaron Morton wrote:
> Will take a closer look at the code tonight; perhaps we should return an
> error if you try to use Network Topology and it cannot detect any DCs.
>
>
+1
Is your ReplicationFactor (RF) really set to 0? Don't do that; it needs to
be at least 1 and probably needs to be 3 in production if you care about
your data. It must be greater than 0 and no more than the number of nodes in
your ring. It represents the number of nodes to copy/replicate data to.
An
nodes contain data for (prevTokenInRing, nodesOwnToken] (i.e. exclusive of the
previous token up to and including the node's own token). So .179 will contain
things that hash in the range (152896308109140433971537345591636551711,0]
and .12 will contain things that hash in range
(0,152896308109140433971537345
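In other words, a key's hash belongs to the first node token that is >= the hash, wrapping back to the lowest token at the top of the ring. A small sketch of that lookup (the tokens are just the two from the example above):

    def owner_token(key_hash, ring_tokens):
        # the first token >= the hash owns it ...
        for t in sorted(ring_tokens):
            if key_hash <= t:
                return t
        # ... wrapping around past the highest token
        return min(ring_tokens)

    tokens = [0, 152896308109140433971537345591636551711]
    print(owner_token(42, tokens))  # owned by the node with the large token (.12 above)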
You need to specify your initial tokens. LoadBalance really doesn't do a
good job of balancing the load. Take a look at "Load Balancing" in
http://wiki.apache.org/cassandra/Operations, which has a little Python script
to help you pick tokens for a given cluster size.
If you don't want to
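That script essentially just divides the RandomPartitioner token space evenly; a sketch of the same calculation (assuming the default RandomPartitioner and its 2**127 token space):

    def initial_tokens(num_nodes, ring_max=2 ** 127):
        # evenly spaced initial tokens around the ring
        return [i * ring_max // num_nodes for i in range(num_nodes)]

    for i, t in enumerate(initial_tokens(4)):
        print("node %d: initial_token = %d" % (i, t))

Each node then gets one of these values as its initial_token in the configuration before it bootstraps.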
2 GiB is pretty small for a C* node. You can also try reducing all the
caching to zero with so little memory. If you have lots of CFs you probably
want to reduce the memtable throughput too.
On Wed, Oct 27, 2010 at 12:43 PM, Koert Kuipers <koert.kuip...@diamondnotch.com> wrote:
> While bootst
Also, in general, you probably want to set Xms = Xmx (regardless of the
value you eventually decide on for that).
If you set them equal, the JVM will just go ahead and allocate that amount
on startup. If they're different, then when you grow above Xms it has to
allocate more and move a bunch of s
Allan,
I'm confused about why removetoken doesn't do anything and would be interested
in finding out why, but to answer your question:
You can shut down your last node, nuke the system directory (make a
backup just in case), restart the node, load the schema (export it first if
need be) and be o
n CL.ONE.
On Thu, Oct 7, 2010 at 7:11 PM, David McIntosh wrote:
> Are there any data loss concerns if you have the commit log sync set to
> periodic and are writing with CL One or Any?
>
>
>
> *From:* Matthew Dennis [mailto:mden...@riptano.com]
> *Sent:* We
+1 on disabling swap
On Oct 7, 2010 3:27 PM, "Peter Schuller" wrote:
>> The nodes are still swapping, even though the swappiness is set to zero
>> right now. After swapping comes the OOM.
>
> In addition to what's already been said, consider just flat out
> disabling swap completely, unless you ha
Keep in mind that .7 and on will have per-CF settings for most things so
there will be even more control over the tuning...
On Oct 7, 2010 3:10 PM, "Peter Schuller" wrote:
>> What if there is more than one keyspace in the system ? Assuming each
>> keyspace has the same number of column familie
If I remember correctly the only operator supported for secondary indexes
right now is EQ, not LTE (or the others).
On Thu, Oct 7, 2010 at 6:13 AM, Christian Decker wrote:
> I'm currently trying to get started on secondary indices in Cassandra
> 0.7.0svn, but without any luck so far. I have the
Creating indexes takes extra space (does in MySQL, PGSQL, etc too).
https://issues.apache.org/jira/browse/CASSANDRA-749 has quite a bit of
detail about how the secondary indexes currently work.
On Wed, Oct 6, 2010 at 7:17 PM, Alvin UW wrote:
> Hello,
>
> Before 0.7, actually we can create an ex
Rob is correct.
drain is really only there for when you need the commit log to be empty (some
upgrades or a complete backup of a shutdown cluster).
There really is no point in using it to shut down C* normally, just kill it...
On Wed, Oct 6, 2010 at 4:18 PM, Rob Coli wrote:
> On 10/6/10 1:13 PM, Aar
The SCs are stored on disk in the order defined by the compareWith setting,
so if you want them back in a different order either someone is sorting them
(C*, which doesn't sort them right now, or the client, which doesn't make
much of a difference; it's just moving the load around) or you're
denorma
>
> PS. Are other ppl interested in this functionality ?
> I could file it to JIRA as well...
>
>
Yes, please file it to Jira. It seems like it would be pretty useful for
various things and fairly easy to change the code to move it to another
directory whenever C* thinks it should be deleted...
Some relevant reading if you're interested:
http://dslab.epfl.ch/pubs/crashonly/
http://web.archive.org/web/20060426230247/http://crash.stanford.edu/
On Wed, Oct 6, 2010 at 1:46 PM, Scott Mann wrote:
> Yes. ctrl-C if running in the foreground. Use kill , if running
> in the background (see the
uld my best bet be to simply get ALL of my users uuids and ages, then
> throw away all of those that do not meet the required test?
>
> Thank you.
>
> On Oct 6, 2010, at 2:09 PM, Matthew Dennis wrote:
>
> As Norman said, secondary indexes are only in .7 but you can create
> s
As Norman said, secondary indexes are only in .7, but you can create your own
indexes in both .6 and .7.
Basically, have an email_domain_idx CF where the row key is the domain and the
column names are the row ids of the users (the column value is unused in this
scenario). This sounds basically like wh
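A rough pycassa-flavored sketch of that manual index pattern (keyspace, CF and column names are all hypothetical):

    import pycassa

    pool = pycassa.ConnectionPool("MyKeyspace", server_list=["localhost:9160"])
    users = pycassa.ColumnFamily(pool, "Users")
    email_domain_idx = pycassa.ColumnFamily(pool, "email_domain_idx")

    def add_user(user_id, email):
        domain = email.split("@", 1)[1]
        users.insert(user_id, {"email": email})
        # index row key = the domain, column name = the user's row id, value unused
        email_domain_idx.insert(domain, {user_id: ""})

    def users_for_domain(domain, count=100):
        # the column names of the index row are the matching user ids
        return list(email_domain_idx.get(domain, column_count=count).keys())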
or d...@riptano.com
On Wed, Sep 29, 2010 at 11:43 AM, Jonathan Ellis wrote:
> We'll get those fixed.
>
> Here or tho...@riptano.com directly is fine.
>
> Thanks!
>
s should be set up on VMs, but
what about the Cassandra and Hadoop servers, should they be set up on VMs or
directly on physical machines? If they should be set up on VMs, should the data
of Cassandra and Hadoop be stored in local storage or a Storage Repository?
Thanks, Dennis