Hi,
I am interested in understanding the quantitative effect of adding a node
to a Cassandra cluster. From my tests I have observed that the CPU usage on
a newly added node is relatively high while data is being streamed into it
during bootstrapping.

For example, here is a graph for a node that was added to a 4-node cluster
(I can insert images, right?):

As you can see, the CPU usage stays pretty high while data is streamed into
the node.

I attached a JVM profiler, https://github.com/aragozin/jvm-tools (recommended
by one of the members of the Cassandra user mailing list), to try to figure
out what exactly was taking up so much CPU.

Here is a snapshot of the profiler while CPU usage was high:

The "STREAM-IN-/IP_ADDRESS" threads seem to be the most CPU-intensive.
Why are these threads so CPU-intensive?
Can you point me to the source code where these threads are implemented?

(My Cassandra version is 2.1.7.)

-- 

Thank you,
Aadil Ahamed
