streaming_connections_per_host - speeding up CPU bound bootstrap
Currently the StreamPlan created for bootstrap (and rebuild) will only create one connection per host. If you have fewer nodes than cores, this is likely to be CPU-bound (a CPU seems to be able to process ~5MB/s).

Is there any reason why something naive like
https://github.com/iksaif/cassandra/commit/8352c21284811ca15d63183ceae0b11586623f31
would not work?

I believe this is what https://issues.apache.org/jira/browse/CASSANDRA-4663 is about.
See also: https://issues.apache.org/jira/browse/CASSANDRA-12229, but I don't believe non-blocking I/O would change anything here.

--
Corentin Chary
http://xf.iksaif.net
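A rough way to see why one connection per host hurts when there are fewer source nodes than cores. This is only a back-of-the-envelope sketch: the ~5MB/s per CPU figure comes from the message above, while the function names, the "one connection keeps one core busy" model and the linear scaling are assumptions made for illustration, not measurements or Cassandra code.

```python
# Back-of-the-envelope model, not a benchmark.
# From the message above: one CPU processes roughly 5 MB/s of streamed data.
# Assumed here: each connection keeps exactly one core busy and throughput
# scales linearly until the receiving node runs out of cores.
PER_CORE_MB_S = 5.0


def estimated_bootstrap_mb_s(source_hosts, connections_per_host, cores):
    """Estimate the receive throughput of a bootstrapping node in MB/s."""
    busy_cores = min(source_hosts * connections_per_host, cores)
    return busy_cores * PER_CORE_MB_S


def estimated_hours(data_gb, source_hosts, connections_per_host, cores):
    """Estimate how long streaming data_gb gigabytes would take, in hours."""
    rate = estimated_bootstrap_mb_s(source_hosts, connections_per_host, cores)
    return (data_gb * 1024) / rate / 3600


if __name__ == "__main__":
    # Hypothetical cluster: 3 source nodes, 16 cores on the joining node.
    for connections in (1, 4, 8):
        rate = estimated_bootstrap_mb_s(3, connections, 16)
        print(f"{connections} connection(s) per host: ~{rate:.0f} MB/s")
```

Under these assumptions, 3 source nodes and 16 cores give about 15 MB/s with one connection per host, 60 MB/s with four, and 80 MB/s with eight (at which point the core count becomes the limit). Real numbers will depend on disk, network and compaction.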
Re: streaming_connections_per_host - speeding up CPU bound bootstrap
Hard to say, because that comment doesn't show the code that was tried. My proposed change
(https://issues.apache.org/jira/secure/attachment/12842717/0001-streaming-add-a-way-to-configure-the-number-of-conne.patch)
should open multiple connections per host: this fixes bottlenecks from blocking writes on a single connection or from CPU-bound (de)serialization.

On Mon, Dec 12, 2016 at 2:04 AM, Nate McCall wrote:
> I have not dug too deeply yet, but how would you compare/reconcile
> your proposed changes with this comment:
> https://issues.apache.org/jira/browse/CASSANDRA-4663?focusedCommentId=15342248&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15342248

--
Corentin Chary
http://xf.iksaif.net
Re: tracing improvements
On Wed, Jan 25, 2017 at 9:55 PM, Sam Overton wrote:
> Hello cassandra-dev,
>
> I would like to continue the momentum on improving Cassandra's tracing,
> following Mick's excellent work on pluggable tracing and Zipkin support.
>
> There are a couple of areas we can improve that would make tracing an even
> more useful tool for cluster operators to diagnose ongoing issues.
>
> The control we currently have over tracing is coarse and somewhat
> cumbersome. Enabling tracing from the client for a specific query is fine
> for application developers, particularly in an environment where Zipkin is
> being used to trace all parts of the system and show an aggregated view.
> For an operator investigating an issue however, this does not always give
> us the control that we need in order to obtain relevant data. We often need
> to diagnose an issue without the possibility of making any changes in the
> client, and often without the prior knowledge of which queries at the
> application level are experiencing poor performance.
>
> Our only other instigator of tracing is nodetool settraceprobability which
> only affects a single node and gives us no control over precisely which
> queries get traced. In practise, it is very difficult to find the relevant
> queries that we want to investigate, so we have often resorted to bulk
> loading the traces into an external tool for analysis, and this seems
> sub-optimal when cassandra could reduce much of the friction.
>
> I have a few proposals to improve tracing that I'd like to throw out to
> the mailing list to get feedback before I start implementing.
>
> 1. Include trace_probability as a CF level property, so sampled tracing can
> be enabled on a per-CF basis, cluster-wide, by changing the CF property.
> https://issues.apache.org/jira/browse/CASSANDRA-13154
>
> 2. Allow tracing at the CFS level. If we have a misbehaving host, then it
> would be useful to enable sampled tracing at the CFS layer on just that
> host so that we can investigate queries landing on that replica, rather
> than just queries passing through as a coordinator as is currently possible.
> https://issues.apache.org/jira/browse/CASSANDRA-13155
>
> 3. Add an interface allowing for custom filters which can decide whether
> tracing should be enabled for a given query. This is a similar idea to
> CASSANDRA-9193 [1] but following the same pattern that we have for
> IAuthenticator, IEndpointSnitch, ConfigurationLoader et al. where the
> intention is that useful default implementations are provided, but
> abstracted in such a way that custom implementations can be written for
> deployments where a specific type of functionality is required. This would
> then allow solutions such as CASSANDRA-11012 [2] without any specific
> support needing to be written in Cassandra.
> https://issues.apache.org/jira/browse/CASSANDRA-13156
>
> Thanks for reading!
> Regards,
>
> Sam
>
> [1] https://issues.apache.org/jira/browse/CASSANDRA-9193 Facility to write
> dynamic code to selectively trigger trace or log for queries
>
> [2] https://issues.apache.org/jira/browse/CASSANDRA-11012 Allow tracing CQL
> of a specific client only, based on IP (range)

Not directly related, but to make (3) more useful it would also be great to be able to list currently executing queries. I've had multiple cases where read queries would just use all my slots and never finish, and it was quite painful to discover what the query was exactly (slow query logging doesn't help if the query never finishes).

--
Corentin Chary
http://xf.iksaif.net
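To make the filter idea in proposal (3) a bit more concrete, here is a minimal sketch of what a pluggable "should this query be traced?" interface could look like, with one implementation that samples per table (in the spirit of proposal 1) and one that matches a client IP range (in the spirit of CASSANDRA-11012). It is written in Python purely for illustration; every name in it (Query, TraceFilter, TableProbabilityFilter, ClientNetworkFilter, any_filter_matches) is hypothetical and does not correspond to an existing or proposed Cassandra API.

```python
import ipaddress
import random
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Query:
    """Hypothetical view of an incoming query, for illustration only."""
    keyspace: str
    table: str
    client_ip: str


class TraceFilter(Protocol):
    """Hypothetical plug-in point: decide whether a query gets traced."""

    def should_trace(self, query: Query) -> bool: ...


class TableProbabilityFilter:
    """Sample a configurable fraction of the queries hitting each table."""

    def __init__(self, probabilities, rng=None):
        # probabilities maps (keyspace, table) to a value in [0.0, 1.0];
        # tables that are not listed are never traced.
        self._probabilities = probabilities
        self._rng = rng or random.Random()

    def should_trace(self, query):
        p = self._probabilities.get((query.keyspace, query.table), 0.0)
        return self._rng.random() < p


class ClientNetworkFilter:
    """Trace only queries coming from a given client network (CIDR)."""

    def __init__(self, cidr):
        self._network = ipaddress.ip_network(cidr)

    def should_trace(self, query):
        return ipaddress.ip_address(query.client_ip) in self._network


def any_filter_matches(filters, query):
    """Trace the query as soon as one configured filter asks for it."""
    return any(f.should_trace(query) for f in filters)
```

A deployment could then combine filters, for example any_filter_matches([TableProbabilityFilter({("ks", "events"): 0.01}), ClientNetworkFilter("10.0.0.0/8")], query), without the server needing to know about either policy. Whether filters should be OR-ed like this, or chained some other way, is a design choice the proposal leaves open.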
Presubmit checks in github
Hello,

It looks like we currently don't run any checks automatically when a new pull request is created on GitHub.

It could be as simple as
https://github.com/criteo-forks/cassandra/blob/cassandra-3.11-criteo/.travis.yml
(minus the release part).

I guess CircleCI could be used too, since apparently it's already used for some things.

Thoughts?

--
Corentin Chary
http://xf.iksaif.net

-----
To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org
For additional commands, e-mail: dev-h...@cassandra.apache.org
Re: Presubmit checks in github
I guess adding a hook for circle would work as well :)

On Wed, Sep 20, 2017 at 5:04 PM, Jeff Jirsa wrote:
> At one point we tested with a travis config very similar to what circle does
> (run all the unit tests, etc). The free options for Travis can’t run the
> basic unit test suite reliably, where circle could, so we put the circle yaml
> in - no reason we can’t ADD Travis as well
>
> The ASF has a paid Travis account so we can get some basic builds and badges
> on pull request, but I’m not convinced we’ll ever get reliable unit test runs
> based on conversations with their (Travis) sales team, so the Travis badges
> wouldn’t really be meaningful.
>
> Some of us are actively working on making this less friction for contributors
> (by making test builds trigger on pull requests), but it takes time to get
> all that sorted out.
>
> --
> Jeff Jirsa

--
Corentin Chary
http://xf.iksaif.net