Hi,

I am interested in understanding the quantitative effect of adding a node to a Cassandra cluster. From my tests, I have observed that CPU usage on a newly added node is relatively high while data is being streamed into it during bootstrap.
For example, here is a graph for a node that was added to a 4-node cluster (I can insert images, right?):

As you can see, the CPU usage stays pretty high while data is streamed into the node. I attached a JVM profiler, https://github.com/aragozin/jvm-tools (recommended by one of the members of the Cassandra user mailing list), to try to figure out what exactly was taking up so much CPU. Here is a snapshot of the profiler when CPU usage was high:

The threads "STREAM-IN-/IP_ADDRESS" seem to be the most CPU intensive. Why are these threads so CPU intensive? Can you point me to the source code where these threads are implemented? (My Cassandra version is 2.1.7.)

--
Thank you,
Aadil Ahamed