Hi,
Sorry for my long silence. Kazuhiro, thank you for your answer. The
situation is clearer now. I think I need to add more nodes to my Riak
cluster to increase performance. My reasoning is based on these HAProxy log entries:
Nov 19 11:42:40 localhost haproxy[24678]: 172.18.103.31:49608 [19/Nov/2015:11:42:40.137] riak riak_backend/viper 3/5/113 1471 -- 8191/2594/2594/138/0 0/0
Nov 19 11:42:41 localhost haproxy[24678]: 172.18.108.170:44517 [19/Nov/2015:11:41:42.264] riak riak_backend/serpent 1/0/58806 5982 cD 8191/2849/2849/155/0 0/0
Nov 19 11:59:46 localhost haproxy[24678]: 172.18.102.39:42919 [19/Nov/2015:11:42:14.566] riak riak_backend/mussurana 1/0/1052250 1484789 -- 3134/2888/2888/154/0 0/0
Nov 19 12:07:26 localhost haproxy[24678]: 172.18.40.2:44946 [19/Nov/2015:11:42:14.508] riak riak_backend/rattler 1/0/1511814 2471638 cD 3172/2888/2888/161/0 0/0
Nov 19 12:17:56 localhost haproxy[24678]: 172.18.103.30:58654 [19/Nov/2015:11:42:40.141] riak riak_backend/mamba 3/1/2116572 3383878 cD 2988/2886/2886/166/0 0/0
Nov 19 12:23:55 localhost haproxy[24678]: 172.18.40.4:59089 [19/Nov/2015:11:41:39.831] riak riak_backend/eggeater 1/0/2535854 4109579 CD 3020/2888/2888/153/0 0/0
Nov 19 12:38:54 localhost haproxy[24678]: 172.18.40.4:37536 [19/Nov/2015:11:41:47.533] riak riak_backend/cobra 1/0/3427457 3387298 -- 2983/2886/2886/159/0 0/0
Nov 19 12:50:37 localhost haproxy[24678]: 172.18.102.39:51870 [19/Nov/2015:11:41:49.413] riak riak_backend/lora 1/0/4128262 6445878 -- 2989/2889/2889/164/0 0/0
I don't think this is caused by HAProxy timeouts. Am I right?
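One way to check this is to decode each entry using the default tcplog format [2] and the termination-state codes [1] from the HAProxy 1.5 documentation Kazuhiro linked. The script below is a minimal sketch that assumes the default `option tcplog` format (no custom `log-format`). `parse`, `TcpLogEntry` and the state descriptions are my own hypothetical helpers and paraphrases, not an HAProxy tool, and only a subset of the state codes is listed.

```python
import re
from dataclasses import dataclass

# Termination-state codes paraphrased from the HAProxy 1.5 documentation,
# section 8.5 "Session state at disconnection" (subset only).
FIRST_CHAR = {
    "C": "client aborted the session",
    "S": "server aborted or refused the session",
    "P": "proxy aborted the session",
    "R": "a proxy resource was exhausted",
    "I": "internal proxy error",
    "D": "session killed because the server was detected as down",
    "K": "session killed by an administrator",
    "c": "client-side timeout expired",
    "s": "server-side timeout expired",
    "-": "normal completion",
}
SECOND_CHAR = {
    "R": "waiting for a complete request",
    "Q": "waiting in a queue",
    "C": "waiting for the server connection",
    "H": "waiting for response headers",
    "D": "data transfer phase",
    "L": "transmitting last data to the client",
    "T": "request tarpitted",
    "-": "normal completion after data transfer",
}

# Default "option tcplog" fields after the syslog prefix and client address:
# [accept_date] frontend backend/server Tw/Tc/Tt bytes_read termination_state
# actconn/feconn/beconn/srv_conn/retries srv_queue/backend_queue
TCPLOG_RE = re.compile(
    r"\[(?P<accept>[^\]]+)\] (?P<frontend>\S+) (?P<backend>[^/\s]+)/(?P<server>\S+) "
    r"(?P<tw>-?\d+)/(?P<tc>-?\d+)/(?P<tt>\+?\d+) (?P<bytes>\+?\d+) (?P<state>\S\S) "
    r"(?P<actconn>\d+)/(?P<feconn>\d+)/(?P<beconn>\d+)/(?P<srv_conn>\d+)/(?P<retries>\+?\d+) "
    r"(?P<srv_queue>\d+)/(?P<backend_queue>\d+)"
)


@dataclass
class TcpLogEntry:
    server: str
    total_ms: int    # Tt: time from accept to close, in milliseconds
    bytes_read: int  # bytes sent from the server to the client
    state: str       # two-character termination state
    actconn: int     # concurrent connections on the whole HAProxy process
    feconn: int      # concurrent connections on this frontend

    def explain_state(self) -> str:
        first, second = self.state[0], self.state[1]
        return f"{FIRST_CHAR.get(first, '?')} / {SECOND_CHAR.get(second, '?')}"


def parse(line: str) -> TcpLogEntry:
    m = TCPLOG_RE.search(line)
    if m is None:
        raise ValueError(f"not a default tcplog line: {line!r}")
    return TcpLogEntry(
        server=m["server"],
        total_ms=int(m["tt"]),
        bytes_read=int(m["bytes"]),
        state=m["state"],
        actconn=int(m["actconn"]),
        feconn=int(m["feconn"]),
    )


if __name__ == "__main__":
    line = ("Nov 19 11:42:41 localhost haproxy[24678]: 172.18.108.170:44517 "
            "[19/Nov/2015:11:41:42.264] riak riak_backend/serpent 1/0/58806 5982 cD "
            "8191/2849/2849/155/0 0/0")
    entry = parse(line)
    print(entry.server, entry.total_ms / 1000, "s", entry.state, "->", entry.explain_state())
```

For the serpent entry this prints `serpent 58.806 s cD -> client-side timeout expired / data transfer phase`. Running the same function over every line gives a quick count of `cD`, `CD` and `--` sessions.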
Regarding HAProxy, here is my configuration for Riak PB:
frontend riak
    bind 172.18.108.170:8087
    mode tcp
    option tcplog
    option contstats
    timeout client 30s
    default_backend riak_backend

backend riak_backend
    mode tcp
    balance roundrobin
    option tcpka
    option srvtcpka
    option httpchk GET /ping
    timeout server 60s
    server rinkhals rinkhals.pleiad.uaprom:8087 weight 1 maxconn 1024 check port 8090
    server chuckwalla chuckwalla.pleiad.uaprom:8087 weight 1 maxconn 1024 check port 8090
and so on...
And the configuration for Riak CS:
frontend riakcs
    bind 193.34.169.1:80
    mode http
    option contstats
    option httplog
    option http-server-close
    timeout client 30s
    default_backend riakcs_backend

backend riakcs_backend
    mode http
    balance roundrobin
    option httpchk GET /riak-cs/ping
    option redispatch
    retries 3
    timeout server 60s
    timeout connect 60s
    timeout http-request 60s
    server rinkhals rinkhals.pleiad.uaprom:8080 weight 1 maxconn 1024 check port 8000
    server chuckwalla chuckwalla.pleiad.uaprom:8080 weight 1 maxconn 1024 check port 8000
and so on...
And the defaults section:
defaults
    log global
    option dontlognull
    retries 3
    option redispatch
    maxconn 8192
    timeout connect 5000
    timeout client 4h
    timeout server 4h
    balance leastconn
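In HAProxy, a `defaults` section applies to the sections that follow it, and any directive a frontend or backend sets itself overrides the default. So the 4h client/server timeouts in `defaults` are not what either flow actually uses. The Python sketch below resolves the effective values; `flow_timeouts`, `to_ms` and the dictionaries are hypothetical helpers typed in by hand from the config above, not an HAProxy tool. Only the timeout directives are modeled, and the remaining servers ("and so on...") are left out.

```python
# Only the timeout directives quoted above are modeled (hand-copied).
DEFAULTS = {
    "timeout connect": "5000",  # no unit: HAProxy reads this as milliseconds
    "timeout client": "4h",
    "timeout server": "4h",
}

SECTIONS = {
    "frontend riak": {"timeout client": "30s"},
    "backend riak_backend": {"timeout server": "60s"},
    "frontend riakcs": {"timeout client": "30s"},
    "backend riakcs_backend": {"timeout server": "60s", "timeout connect": "60s"},
}

# Time units accepted by HAProxy timeouts, expressed in milliseconds.
UNIT_MS = {"us": 0.001, "ms": 1, "s": 1000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}


def to_ms(value: str) -> float:
    number = value.rstrip("abcdefghijklmnopqrstuvwxyz")
    unit = value[len(number):] or "ms"
    return float(number) * UNIT_MS[unit]


def flow_timeouts(frontend: str, backend: str) -> dict:
    """Effective timeouts for traffic entering `frontend` and served by `backend`.

    Each section starts from the defaults and overrides what it sets itself;
    the client timeout comes from the frontend, server/connect from the backend.
    """
    fe = {**DEFAULTS, **SECTIONS[frontend]}
    be = {**DEFAULTS, **SECTIONS[backend]}
    return {
        "client": to_ms(fe["timeout client"]),
        "server": to_ms(be["timeout server"]),
        "connect": to_ms(be["timeout connect"]),
    }


if __name__ == "__main__":
    for fe, be in [("frontend riak", "backend riak_backend"),
                   ("frontend riakcs", "backend riakcs_backend")]:
        print(fe, "->", be, flow_timeouts(fe, be))
```

For the Riak PB flow this gives a 30 s client timeout, a 60 s server timeout and a 5 s connect timeout; for Riak CS, 30 s, 60 s and 60 s.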
I want to mention again that Riak CS receives about 1000 requests per second,
and the average object size is 10 KB.
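For scale, here is a rough sketch of what these numbers imply, assuming "10 KB" means 10 × 1024 bytes and ignoring headers and protocol overhead. The second part applies Little's law (average concurrency = arrival rate × average time in the system) to estimate how long requests would have to take, on average, to keep all 8192 connections allowed by `maxconn 8192` in the defaults section busy, assuming each request holds one client connection for its duration. The function names and the 50 ms figure are hypothetical, for illustration only.

```python
# Assumptions: 10 KB = 10 * 1024 bytes; headers and protocol overhead are
# ignored; traffic is spread evenly over time.
REQUESTS_PER_SECOND = 1000
AVG_OBJECT_BYTES = 10 * 1024
FRONTEND_MAXCONN = 8192  # "maxconn 8192" from the defaults section

bytes_per_second = REQUESTS_PER_SECOND * AVG_OBJECT_BYTES


def concurrent_connections(rate_per_s: float, avg_seconds: float) -> float:
    """Little's law: average number in the system = arrival rate * average time spent."""
    return rate_per_s * avg_seconds


def latency_to_saturate(rate_per_s: float, conn_limit: int) -> float:
    """Average request duration (s) at which `conn_limit` connections are busy on average."""
    return conn_limit / rate_per_s


if __name__ == "__main__":
    print(f"payload: {bytes_per_second / 2**20:.2f} MiB/s "
          f"({bytes_per_second * 8 / 1e6:.1f} Mbit/s)")
    # 50 ms per request is a hypothetical latency, not a measured value.
    print(f"connections at 50 ms per request: "
          f"{concurrent_connections(REQUESTS_PER_SECOND, 0.05):.0f}")
    print(f"average duration that fills maxconn: "
          f"{latency_to_saturate(REQUESTS_PER_SECOND, FRONTEND_MAXCONN):.3f} s")
```

This prints about 9.77 MiB/s (81.9 Mbit/s) of payload, 50 concurrent connections at a hypothetical 50 ms per request, and 8.192 s as the average request duration at which the 8192-connection limit would be full.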
Thank you.
On Mon, Nov 16, 2015 at 4:01 AM Kazuhiro Suzuki <[email protected]> wrote:
> Hi,
>
> HAProxy's timeout settings often cause disconnected errors in Riak CS
> deployments under high workload. The termination state [1] in the TCP
> log [2] tells you whether a timeout happened.
>
> > 2015-11-13 13:13:09.514 [error]
> > <0.11264.1387>@riak_cs_wm_common:maybe_create_user:222 Retrieval of user
> > record for s3 failed. Reason: disconnected
>
> This means Riak CS failed to read the user record from Riak for
> authentication due to a disconnected error.
>
> > Riak CS adds, removes, and gets properties through the Stanchion service.
> > Am I right? I can't tell exactly where my bottleneck is: Riak, Riak CS,
> > or Stanchion.
>
> Stanchion is mainly used only to update/delete user and bucket data.
> For inspecting a node, Riak S2/CS 2.1 introduced new metrics, including
> various latencies and counters, which help identify bottlenecks.
>
> > When we need authenticated access to read an object from a bucket, do we
> > need Stanchion? If not, I can't understand why I had so many errors while
> > getting objects from Riak CS.
>
> Authenticated access is always necessary, but the request that reads
> user data for authentication is issued by Riak CS directly to Riak, not
> through Stanchion.
>
> > P.S. Sometimes, when there are connectivity issues between Riak CS and
> > Stanchion, I need to restart Riak CS.
>
> Riak CS 1.5.0 has a connection pool leak problem [3]. You might be
> hitting that issue.
>
> [1]: https://cbonte.github.io/haproxy-dconv/configuration-1.5.html#8.5
> [2]: https://cbonte.github.io/haproxy-dconv/configuration-1.5.html#8.2.2
> [3]: http://docs.basho.com/riakcs/latest/cookbooks/Riak-CS-Release-Notes/#Riak-CS-1-5-2
>
> On Sat, Nov 14, 2015 at 2:04 AM, Vladyslav Zakhozhai <[email protected]> wrote:
> >
> > Hello.
> >
> > I have a Riak CS cluster with 18 nodes. Each node runs Riak CS and Riak,
> > and there is one Stanchion node.
> >
> > Versions:
> > Riak 1.4.12
> > Riak CS 1.5.0
> > Stanchion 1.5.0
> >
> > Riak CS and Riak are placed behind HAProxy load balancers:
> >
> > WAN -> HAProxy -> Riak CS nodes -> HAProxy -> Riak nodes
> > and
> > Stanchion -> HAProxy -> Riak
> >
> > Today, due to a traffic spike (about 1000 rps) on the cluster, 50% of
> > requests to Riak CS returned HTTP 500 and 503 (querying the /riak-cs/ping
> > resource also failed).
> >
> > In Riak CS logs I've seen the following messages:
> >
> > 2015-11-13 13:13:09.514 [error]
> > <0.11264.1387>@riak_cs_wm_common:maybe_create_user:222 Retrieval of user
> > record for s3 failed. Reason: disconnected
> >
> > I also see the following in the Riak CS logs:
> > 2015-11-13 17:31:52.995 [error] <0.11254.6534> Lager event handler
> > error_logger_lager_h exited with reason
> >
> > {'EXIT',{{badmatch,["/buckets/uaprom-image/objects/272547384_cid1322007_pid183135512-26a7c1f3.jpg",{error,{error,{badmatch,{error,closed}},[{webmachine_request,recv_unchunked_body,3,[{file,"src/webmachine_request.erl"},{line,471}]},{webmachine_request,call,2,[{file,"src/webmachine_request.erl"},{line,193}]},{wrq,stream_req_body,2,[{file,"src/wrq.erl"},{line,121}]},{riak_cs_wm_object,handle_normal_put,2,[{file,"src/riak_cs_wm_object.erl"},{line,341}]},{riak_cs_wm_common,accept_body,2,[{file,...},...]},...]}},...]},...}}
> >
> > I suspect there was a problem between Riak CS and Stanchion, or between
> > Stanchion and Riak. I have no clear idea how to troubleshoot Stanchion,
> > mainly for the following reason: Stanchion seems to work fine (the service
> > is up and answers the ping command), but it is very laconic: there is
> > almost nothing in the console and error logs (even at debug log level).
> >
> > Riak CS adds, removes, and gets properties through the Stanchion service.
> > Am I right? I can't tell exactly where my bottleneck is: Riak, Riak CS,
> > or Stanchion.
> >
> > When we need authenticated access to read an object from a bucket, do we
> > need Stanchion? If not, I can't understand why I had so many errors while
> > getting objects from Riak CS.
> >
> > Thank you in advance.
> >
> > P.S. Sometimes, when there are connectivity issues between Riak CS and
> > Stanchion, I need to restart Riak CS.
> >
> >
> > _______________________________________________
> > riak-users mailing list
> > [email protected]
> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> >
>
>
>
> --
> Kazuhiro Suzuki | Basho Japan KK
>