Hello,
Thanks in advance for your interest. I'm using version 7.2.42 (x64) under
Windows 7.
The problem of the abandoned tasks is, to my mind, unrelated to the
Astropulse suggestion. Let me take them one at a time to be more precise.
Last week (during the weekend) computed tasks were accumulating and I
was wondering why. I checked with tracert: the trace got no further than
one node in the US, while other IP addresses in the US were accessible
as normal.
On Sunday evening (my time, UT+2), after I kept pressing "retry now",
some tasks finally went through, and I spent more than 3 hours doing so
to clear the backlog (transfer-wise and project-wise). This seems to
have been the starting point of my problem. On Monday during the day (my
time) everything seemed to return to normal, but that was the start of
the "abandoned tasks" situation, and of course no credit. I guess I
would have been wiser not to insist on getting the credits I was hoping
for as a reward ;-).
What would happen if I completely uninstalled BOINC and reinstalled it?
Would the new installation of the client get the same computer number
(6960982)? I ask because it "seems" that my other machine still produces
correct results (but I could be wrong): I deduce this from the fact that
only the above-mentioned computer number appears in the more than 200
tasks marked as errors. I will wait for your advice on this potential
workaround so as not to make the situation worse.
About the Astropulse checkpoint suggestion: it's true that when I shut
down my machine properly I have never seen any problem. But here in
Burundi we get regular power cuts due to electricity failures; they can
happen at any time and sometimes last 12 to 24 hours. When I'm not at
home, my machine crashes once my backup power supply runs out. The
shutdown is then not controlled and, most of the time, tasks rerun from
scratch when I restart my machines. Of course this is not a problem for
setiathome v7 or even astropulse opencl_nvidia_100, as they don't take
too long (generally not much more than 3 hours), but for astropulse v6
tasks, which take 25 to 30 hours on my hardware, there is a large risk
of an unplanned power failure. If I am near the end of the computation
when it happens, you can understand my frustration.
Sorry for having been so long, but I tried to be as descriptive as
possible to give you as much information as possible.
Excuse my bad English; I am French-speaking and do my best to be clear.
I hope that this will help not only with my little personal problem but
that other users will benefit from it too.
Again, best regards to all of you, and thanks for your fantastic project,
which I have been following since the '90s.
Luc
On 8/08/2014 17:56, Eric J Korpela wrote:
Astropulse does checkpoint quite frequently, and restarts without
problem most of the time. "Abandoned" is definitely a server side
decision that indicates a client detach or a reset or some sort of
confusion as to the identity of a host and whether it was working on
those results. (Other possibilities include multiple hosts using a
copied or shared BOINC directory, multiple copies of BOINC on one host
using the same BOINC client directory, deletion or corruption or bad
permissions on files in the BOINC client directory, any of which could
confuse client or server).
Which client version and OS are you using?
On Fri, Aug 8, 2014 at 5:55 AM, McLeod, John <[email protected]> wrote:
BOINC has a checkpointing mechanism built in, but it requires that
the project developers write checkpoint code. Some projects can
checkpoint almost any time, and others can checkpoint only every few
minutes, and some cannot checkpoint at all. SETI can checkpoint
frequently (and instigated the mechanism to NOT do every possible
checkpoint, but only once every X minutes). CPDN always checkpoints
every time it can (typically this is several minutes). I cannot
remember an example of one that cannot checkpoint at all, but they
exist.
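The "only once every X minutes" idea described above can be illustrated with a small sketch. This is not SETI@home or Astropulse code, and it does not use the BOINC API (real BOINC applications are written against the C/C++ API, where functions such as boinc_time_to_checkpoint() and boinc_checkpoint_completed() play this role). All names below (save_checkpoint, load_checkpoint, run_task, min_interval_s, stop_after) are hypothetical and chosen only for illustration. The sketch assumes the task state fits in a small JSON file and that the work is a simple loop.

```python
import json
import os
import tempfile
import time


def save_checkpoint(path, state):
    """Write the state atomically: temp file in the same directory, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # push the data to disk before renaming
        os.replace(tmp_path, path)  # atomic when both paths share a filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_checkpoint(path):
    """Return the saved state, or a fresh state if none is usable."""
    try:
        with open(path) as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return {"next_step": 0, "total": 0}


def run_task(path, n_steps, min_interval_s=60.0, stop_after=None,
             clock=time.monotonic):
    """Sum the squares of 0..n_steps-1, checkpointing at most every min_interval_s.

    stop_after simulates a power cut: the function returns None at that
    step without saving anything.
    """
    state = load_checkpoint(path)
    last_saved = clock()
    for step in range(state["next_step"], n_steps):
        if stop_after is not None and step >= stop_after:
            return None  # simulated power failure, nothing is written
        state["total"] += step * step
        state["next_step"] = step + 1
        # Throttle: a checkpoint is possible after every step, but it is
        # only taken when enough time has passed since the last one.
        if clock() - last_saved >= min_interval_s:
            save_checkpoint(path, state)
            last_saved = clock()
    save_checkpoint(path, state)
    return state["total"]
```

Calling run_task again with the same path after an interruption resumes from the last saved step instead of step 0. The trade-off is the one discussed in this thread: a longer interval means less disk activity, but more work is lost when the power fails.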
-----Original Message-----
From: boinc_dev [mailto:[email protected]] On Behalf Of Richard Haselgrove
Sent: Friday, August 08, 2014 4:48 AM
To: Luc A. Germain; [email protected]
Subject: Re: [boinc_dev] astropulse robustness / abandonned tasks
The abandoning of tasks happens when the BOINC server 'thinks' that
it has 'evidence' that the client has detached from the project and
then re-attached again. This has affected a number of users in the
past, but has proved extremely tricky to diagnose and resolve: not
least, because most of the evidence resides in the server logs.
We did investigate one suspected case at Albert during credit
testing, but that turned out to be a genuine 'detach' caused by hard
disk failure - it is distinguished from reports like this one
because no running tasks were left on the host computer (they were
on the drive that failed...) to waste time and electricity.
I would certainly welcome it if we could pair up a developer and a
project administrator with access to server logs to investigate this
problem and cure it at source.
The checkpointing question is a matter for the project developers,
and I'll leave it to them to respond via this list.
>________________________________
>From: Luc A. Germain <[email protected]>
>To: [email protected]
>Sent: Friday, 8 August 2014, 9:41
>Subject: [boinc_dev] astropulse robustness / abandonned tasks
>
>
>Hi,
>Two things:
>1) A suggestion here for you developers ;-) As Astropulse tasks
>take "some" time to complete, they are more exposed to power failures
>such as we have in the third world. When one happens, most of the time
>the task restarts computing from the start (this is even more
>frustrating when the task is near completion). Would it be possible to
>introduce regular checkpoints by saving intermediate data, or work
>files, from which the task could restart, thus saving a lot of
>computing time? Maybe this could be an option in the user profile, as
>I guess not everyone needs this.
>
>2) Two days ago I sent a message about abandoned tasks. Since then,
>all my computing goes to the garbage bin, as the results are not taken
>into account. Which procedure should/could I try to solve this problem?
>Could uninstalling/reinstalling the application on my computers be a
>solution? Should I wait until the problem solves itself (and would
>this not take ages)?
>
>An answer would be highly appreciated.
>
>Best regards and thanks for your work,
>Luc
>_______________________________________________
>boinc_dev mailing list
>[email protected]
>http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
>To unsubscribe, visit the above URL and
>(near bottom of page) enter your email address.