Re: [opensource-dev] SL server issues in the past 4 months

2019-04-10 Thread Henri Beauchamp
On Mon, 8 Apr 2019 10:49:42 -0400, Oz Linden (Scott Lawrence) wrote:

> On 2019-04-08 09:27 , Henri Beauchamp wrote:
> > In the hope my observations will help you guys to get things back on
> > track (because it's getting really badly needed).
> 
> Thank you Henri.
> 
> We are very much aware of these problems and trying hard to correct 
> them, but I believe you may have added some useful insights; I've 
> forwarded your message to the two teams I have attacking them.

Thank you!

In the meantime, I managed to diagnose the rebake issue and fix it.
What I found could perhaps also be a hint about what happens with
failed TPs.

The problem was a race condition (as is often the case in
server/viewer communication): my workaround for derezzing attachments
triggered a rebake in the arrival sim, but I did not check whether the
capabilities for that sim had been received yet. Until last week this
caused no issue (i.e. rebaking with the old capability URIs, cached in
my viewer, did not matter), but it does now. Whether that is the result
of a change in timing, algorithm or policy on the server side is of
course unknown to me.

In my new code, I simply flag the rebake as needed and actually perform
it only once the capabilities for the new agent region have been
received, and everything works like a charm. Well, not fully: that
workaround should not even be needed in the first place, and the bogus
kill-objects message on agent attachments is still wrongly sent by the
departure region.
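
As a rough illustration of the deferral described above, here is a
minimal sketch. It is not the actual viewer code: the class name
DeferredRebake, its methods and the region strings are all hypothetical,
and the real rebake call is replaced by a generic callback.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical helper (not taken from any viewer source): remembers that a
// rebake is wanted and runs it only once the capabilities for the current
// agent region have been received.
class DeferredRebake {
public:
    explicit DeferredRebake(std::function<void()> rebake)
        : mRebake(std::move(rebake)) {}

    // Arrival in a new region: the cached capabilities belong to the
    // departure region, so treat them as not yet received.
    void onRegionChanged(const std::string& region) {
        mRegion = region;
        mCapsReceived = false;
    }

    // Something (e.g. the attachment workaround) asks for a rebake.
    void requestRebake() {
        mPending = true;
        flush();
    }

    // The capabilities for `region` have arrived.
    void onCapabilitiesReceived(const std::string& region) {
        if (region != mRegion) {
            return;  // late reply for a region the agent already left
        }
        mCapsReceived = true;
        flush();
    }

    bool pending() const { return mPending; }

private:
    // Run the rebake only when it is both wanted and safe.
    void flush() {
        if (mPending && mCapsReceived) {
            mPending = false;
            mRebake();
        }
    }

    std::function<void()> mRebake;
    std::string mRegion;
    bool mCapsReceived = false;
    bool mPending = false;
};
```

With this shape, a rebake requested right after arrival stays pending
until onCapabilitiesReceived() is called for the arrival region, while a
request made later, with capabilities already in hand, runs immediately.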

Now, why might this be related to failed TPs at all? Well, if the
capabilities are received too late, and seeing how viewers implementing
region-Windlight are far more prone to timeouts than mine, it could
indeed be that those viewers are attempting to use a capability that is
not yet available, causing a timeout...

Regards,

Henri.
___
Policies and (un)subscribe information available here:
http://wiki.secondlife.com/wiki/OpenSource-Dev
Please read the policies before posting to keep unmoderated posting privileges


Re: [opensource-dev] SL server issues in the past 4 months

2019-04-10 Thread John Nagle
On 4/10/19 4:27 AM, Henri Beauchamp wrote:
> In the meantime, I managed to diagnose the rebake issue and fix it.
> What I found could perhaps also be a hint about what happens with
> failed TPs.
> 
> The problem was a race condition (as is often the case in
> server/viewer communication): my workaround for derezzing attachments
> triggered a rebake in the arrival sim, but I did not check whether the
> capabilities for that sim had been received yet. Until last week this
> caused no issue (i.e. rebaking with the old capability URIs, cached in
> my viewer, did not matter), but it does now. Whether that is the result
> of a change in timing, algorithm or policy on the server side is of
> course unknown to me.
> 
> In my new code, I simply flag the rebake as needed and actually perform
> it only once the capabilities for the new agent region have been
> received, and everything works like a charm. Well, not fully: that
> workaround should not even be needed in the first place, and the bogus
> kill-objects message on agent attachments is still wrongly sent by the
> departure region.
> 
> Now, why might this be related to failed TPs at all? Well, if the
> capabilities are received too late, and seeing how viewers implementing
> region-Windlight are far more prone to timeouts than mine, it could
> indeed be that those viewers are attempting to use a capability that is
> not yet available, causing a timeout...

Hm. On failed teleports, I see lots of capability retrieval
failures in the Firestorm log. Like this:

2019-04-09T19:57:01Z WARNING #CoreHttp#  llcorehttp/_httppolicy.cpp(434) stageAfterCompletion : HTTP request 0x7f81c4f9a5f0 failed after 5 retries.  Reason:  Not Found (Http_404)
2019-04-09T19:57:01Z WARNING #CoreHTTP# llmessage/llcorehttputil.cpp(282) onCompleted : Possible failure [Http_404] cannot POST url 'https://sim10658.agni.lindenlab.com:12043/cap/4d310fee-b2c7-cb62-357d-0317811990e7' because Not Found
2019-04-09T19:57:01Z INFO #  llcommon/llsdserialize_xml.cpp(417) parse : LLSDXMLParser::Impl::parse: XML_STATUS_ERROR parsing:cap not found: '4d310fee-b2c7-cb62-357d-0317811990e7'
2019-04-09T19:57:01Z WARNING #LLEventPollImpl# newview/lleventpoll.cpp(222) eventPollCoro : Canceling coroutine
2019-04-09T19:57:01Z WARNING #CoreHttp#  llcorehttp/_httppolicy.cpp(434) stageAfterCompletion : HTTP request 0x305e8e50 failed after 1 retries. Reason:  Not Found (Http_404)
2019-04-09T19:57:01Z WARNING #CoreHTTP# llmessage/llcorehttputil.cpp(282) onCompleted : Possible failure [Http_404] cannot POST url 'https://sim10412.agni.lindenlab.com:12043/cap/d3b0c971-112d-b7c7-00d9-17dcb92b5027' because Not Found
2019-04-09T19:57:01Z INFO #  llcommon/llsdserialize_xml.cpp(417) parse : LLSDXMLParser::Impl::parse: XML_STATUS_ERROR parsing:cap not found: 'd3b0c971-112d-b7c7-00d9-17dcb92b5027'
2019-04-09T19:57:01Z WARNING #LLEventPollImpl# newview/lleventpoll.cpp(222) eventPollCoro : Canceling coroutine
2019-04-09T19:57:02Z INFO #  newview/llviewerdisplay.cpp(239) display_stats : FPS: 21.70
2019-04-09T19:57:02Z WARNING #CoreHttp#  llcorehttp/_httppolicy.cpp(434) stageAfterCompletion : HTTP request 0x40ff54b0 failed after 1 retries. Reason:  Not Found (Http_404)
2019-04-09T19:57:02Z WARNING #CoreHTTP# llmessage/llcorehttputil.cpp(282) onCompleted : Possible failure [Http_404] cannot POST url 'https://sim10658.agni.lindenlab.com:12043/cap/11bb494c-1a04-24f1-8644-826273c548d8' because Not Found
2019-04-09T19:57:02Z INFO #  llcommon/llsdserialize_xml.cpp(417) parse : LLSDXMLParser::Impl::parse: XML_STATUS_ERROR parsing:cap not found: '11bb494c-1a04-24f1-8644-826273c548d8'
2019-04-09T19:57:02Z WARNING #LLEventPollImpl# newview/lleventpoll.cpp(222) eventPollCoro : Canceling coroutine

This was from yesterday, when Oz had people from Server User Group
TPing between three mostly empty Linden sims as a test.

I've seen occasional errors like that before. That there are ever
404 errors for a cap seems wrong. The sim told the viewer to fetch
that URL directly from the sim, and then the sim didn't have it
available.
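
To see which sims the 404s come from, the capability URLs in log lines
like the ones above can be split into host and cap ID. This is only a
sketch: the struct CapRef and the function findCapUrl are hypothetical
names, and the pattern is inferred from the few URLs in this log, not
from any documented URL format.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical result type: the two parts of a capability URL that matter
// when grouping failures by simulator.
struct CapRef {
    std::string host;   // e.g. "sim10658.agni.lindenlab.com"
    std::string capId;  // the 36-character UUID after "/cap/"
};

// Searches `text` (one log line) for a URL shaped like
// https://<host>:<port>/cap/<uuid>. Returns true and fills `out` on a
// match; returns false and leaves `out` untouched otherwise.
// Assumption: the URL shape is guessed from the log excerpt only.
bool findCapUrl(const std::string& text, CapRef& out) {
    static const std::regex re(
        R"(https://([A-Za-z0-9.-]+):\d+/cap/([0-9a-fA-F-]{36}))");
    std::smatch m;
    if (!std::regex_search(text, m, re)) {
        return false;
    }
    out.host = m[1].str();
    out.capId = m[2].str();
    return true;
}
```

Feeding each "cannot POST url" line through findCapUrl() and counting
results per host would show whether the failures cluster on particular
sims.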

John Nagle

