His host seems to be losing track of RPC sequence numbers. Loss of cached writes on restart?
2014-08-08 07:13:53.1883 [PID=28339] [HOST#6960982] [USER#8522684] RPC seqno 59642 less than expected 59643; creating new host
2014-08-08 07:13:53.1896 [PID=28339] [HOST#6960982] [USER#8522684] Found similar existing host for this user - assigned.
2014-08-08 07:13:53.1932 [PID=28339] [CRITICAL] [HOST#6960982] [RESULT#3670788988] [WU#1562416658] changed CPID: marking in-progress result 03se08ad.16169.8252.438086664200.12.220_0 as client error!
2014-08-08 07:13:53.1932 [PID=28339] Request: [USER#8522684] [HOST#6960982] [IP 41.79.224.134] client 7.2.42

On Fri, Aug 8, 2014 at 9:17 AM, Richard Haselgrove <[email protected]> wrote:

> The same user appears to have suffered another 'abandon' event today:
>
> http://setiathome.berkeley.edu/results.php?hostid=6960982&state=6
>
> The reasons mentioned by Eric are all valid, but there appears to be an
> irreducible core of sporadic events which cannot be ascribed to user
> malfeasance. In earlier reports like this, many (but not all) of the cases
> appeared to be associated with long-distance and/or poor quality internet
> connections - again, like this one.
>
> ------------------------------
> *From:* Eric J Korpela <[email protected]>
> *To:* "McLeod, John" <[email protected]>
> *Cc:* "[email protected]" <[email protected]>; Richard
> Haselgrove <[email protected]>
> *Sent:* Friday, 8 August 2014, 16:56
>
> *Subject:* Re: [boinc_dev] astropulse robustness / abandonned tasks
>
> Astropulse does checkpoint quite frequently, and restarts without problem
> most of the time. "Abandoned" is definitely a server side decision that
> indicates a client detach or a reset or some sort of confusion as to the
> identity of a host and whether it was working on those results.
> (Other possibilities include multiple hosts using a copied or shared BOINC
> directory, multiple copies of BOINC on one host using the same BOINC client
> directory, deletion or corruption or bad permissions on files in the BOINC
> client directory, any of which could confuse client or server).
>
>
> Which client version and OS are you using?
>
>
> On Fri, Aug 8, 2014 at 5:55 AM, McLeod, John <[email protected]> wrote:
>
> > BOINC has a checkpointing mechanism built in, but it requires that the
> > project developers write checkpoint code. Some projects can checkpoint
> > almost any time, and others can checkpoint only every few minutes, and some
> > cannot checkpoint at all. SETI can checkpoint frequently (and instigated
> > the mechanism to NOT do every possible checkpoint, but only once every X
> > minutes). CPDN always checkpoints every time it can (typically this is
> > several minutes). I cannot remember an example of one that cannot
> > checkpoint at all, but they exist.
> >
> > -----Original Message-----
> > From: boinc_dev [mailto:[email protected]] On Behalf Of
> > Richard Haselgrove
> > Sent: Friday, August 08, 2014 4:48 AM
> > To: Luc A. Germain; [email protected]
> > Subject: Re: [boinc_dev] astropulse robustness / abandonned tasks
> >
> > The abandoning of tasks happens when the BOINC server 'thinks' that it has
> > 'evidence' that the client has detached from the project and then
> > re-attached again. This has affected a number of users in the past, but has
> > proved extremely tricky to diagnose and resolve: not least, because most of
> > the evidence resides in the server logs.
> >
> > We did investigate one suspected case at Albert during credit testing, but
> > that turned out to be a genuine 'detach' caused by hard disk failure - it
> > is distinguished from reports like this one because no running tasks were
> > left on the host computer (they were on the drive that failed...) to waste
> > time and electricity.
> >
> > I would certainly welcome it if we could pair up a developer and a project
> > administrator with access to server logs to investigate this problem and
> > cure it at source.
> >
> > The checkpointing question is a matter for the project developers, and
> > I'll leave it to them to respond via this list.
> >
> >
> > >________________________________
> > > From: Luc A. Germain <[email protected]>
> > >To: [email protected]
> > >Sent: Friday, 8 August 2014, 9:41
> > >Subject: [boinc_dev] astropulse robustness / abandonned tasks
> > >
> > >
> > >Hi,
> > >Two things:
> > >1) A suggestion here for you developers ;-) As Astropulse tasks take
> > >"some" time to complete, they are more prone to power failures such as we
> > >have in the third world. When that happens, most of the time the task
> > >restarts computing from the start (this is even more frustrating when the
> > >task is near completion). Would it be possible to introduce regular
> > >checkpoints by saving intermediate data, or work files, from which the
> > >task could restart, so saving a lot of computing time? Maybe this could be
> > >an option in the user profile, as I guess not everyone needs this.
> > >
> > >2) Two days ago I sent a message about abandoned tasks. Since then, all my
> > >computing goes to the garbage bin, as the results are not taken into
> > >account. Which procedure should/could I try to solve this problem? Could
> > >uninstalling/reinstalling the application on my computers be a solution?
> > >Should I wait until the problem solves itself (and would this not take
> > >ages)?
> > >
> > >An answer would be highly appreciated.
> > >
> > >Best regards and thanks for your work,
> > >Luc
> > >_______________________________________________
> > >boinc_dev mailing list
> > >[email protected]
> > >http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
> > >To unsubscribe, visit the above URL and
> > >(near bottom of page) enter your email address.
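
As an illustration of the "checkpoint only once every X minutes" idea discussed in this thread, here is a minimal sketch. It is not the BOINC API or Astropulse code: the function name `run_with_checkpoints`, the JSON state file, and the toy workload (summing integers) are all assumptions made for the example.

```python
import json
import os
import time


def run_with_checkpoints(total_steps, state_path, min_interval_s=60.0,
                         clock=time.monotonic):
    """Sum the integers 0..total_steps-1, saving progress at most once
    per min_interval_s seconds (hypothetical toy workload)."""
    step, acc = 0, 0
    # Resume from a previous checkpoint if one exists.
    if os.path.exists(state_path):
        with open(state_path) as f:
            saved = json.load(f)
        step, acc = saved["step"], saved["acc"]

    last_save = clock()
    while step < total_steps:
        acc += step
        step += 1
        # Throttle: skip possible checkpoints until the interval has elapsed.
        if clock() - last_save >= min_interval_s:
            tmp = state_path + ".tmp"
            with open(tmp, "w") as f:
                json.dump({"step": step, "acc": acc}, f)
            # Replace the old checkpoint in one step so a power failure
            # during the write does not leave a half-written state file.
            os.replace(tmp, state_path)
            last_save = clock()
    return acc
```

After an interruption, calling the function again with the same `state_path` resumes from the last saved step instead of from zero; at most `min_interval_s` seconds of work are lost.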
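
The "RPC seqno 59642 less than expected 59643" log line at the top of this thread suggests a server-side check roughly like the one sketched below. This is a guess at the general shape only, not the actual BOINC scheduler code: `HostRecord`, `expected_seqno`, and `check_rpc_seqno` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class HostRecord:
    host_id: int
    # Lowest sequence number the server will accept from this host
    # (hypothetical field, named after the wording in the log).
    expected_seqno: int


def check_rpc_seqno(host: HostRecord, request_seqno: int) -> str:
    """Return 'ok' if the request is in sequence, 'stale' otherwise."""
    if request_seqno < host.expected_seqno:
        # The client presented an older number than the server has on
        # record, e.g. because its saved state was copied or rolled back.
        return "stale"
    host.expected_seqno = request_seqno
    return "ok"


host = HostRecord(host_id=6960982, expected_seqno=59643)
print(check_rpc_seqno(host, 59642))  # stale
```

Under this model, a client that loses its most recent state write on restart would present an old sequence number and be treated as a different host, which is consistent with the "creating new host" message in the log.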
