Hello,
Thanks in advance for your interest. I'm using version 7.2.42 (x64) under Windows 7. The abandoned-task problem is, to my mind, unrelated to the Astropulse suggestion. Let me take the two separately to be more precise.

Last weekend, completed tasks were accumulating and I wondered why. I checked, and the last hop I could reach with tracert was a node in the US; other IP addresses in the US were reachable as normal. On Sunday evening (my time, UT+2), after I kept clicking "retry now", some tasks finally went through, and I spent more than 3 hours doing this to clear the backlog (transfer-wise and project-wise). This seems to have been the starting point of my problem. On Monday during the day (my time) everything seemed to return to normal, but that was the start of the "abandoned tasks" situation, and of course no credit. I guess I would have been wiser not to insist on getting the credits I was hoping for as a reward ;-).

What would happen if I completely uninstalled BOINC and reinstalled it? Would the new installation of the client get the same computer number (6960982)? It "seems" that my other machine still produces correct results (but I could be wrong): I infer this from the fact that only the above-mentioned computer number appears in the more than 200 tasks marked as errors. I will wait for your advice on this potential workaround so as not to make the situation worse.


About the Astropulse checkpoint suggestion: it's true that when I shut my machine down properly I have never seen any problem. But here in Burundi we get regular power cuts, which can happen at any time and sometimes last 12 to 24 hours. When I'm not at home, my machine then crashes once my backup power supply runs out. The shutdown is therefore not controlled and, most of the time, tasks rerun from scratch when I restart my machines. Of course this is not a problem for setiathome v7 or even astropulse opencl_nvidia_100, as they don't take too long (generally not much more than 3 hours), but astropulse v6 tasks take 25 to 30 hours on my hardware, so there is a large risk of an unplanned power failure during a task. If a task is near completion when that happens, you can understand my frustration.

Sorry for writing at such length, but I tried to be as descriptive as possible to give you as much information as I can.

Excuse my bad English; I am a French speaker and do my best to be clear. I hope this will help not only with my little personal problem but also benefit other users.

Again, best regards to all of you, and thanks for your fantastic project, which I have been following since the 90s.

Luc


On 8/08/2014 17:56, Eric J Korpela wrote:
Astropulse does checkpoint quite frequently, and restarts without
problem most of the time.  "Abandoned" is definitely a server side
decision that indicates a client detach or a reset or some sort of
confusion as to the identity of a host and whether it was working on
those results.  (Other possibilities include multiple hosts using a
copied or shared BOINC directory, multiple copies of BOINC on one host
using the same BOINC client directory, deletion or corruption or bad
permissions on files in the BOINC client directory, any of which could
confuse client or server).


Which client version and OS are you using?


On Fri, Aug 8, 2014 at 5:55 AM, McLeod, John <[email protected]> wrote:

    BOINC has a checkpointing mechanism built in, but it requires that
    the project developers write checkpoint code.  Some projects can
    checkpoint almost any time, and others can checkpoint only every few
    minutes, and some cannot checkpoint at all.  SETI can checkpoint
    frequently (and instigated the mechanism to NOT do every possible
    checkpoint, but only once every X minutes).  CPDN always checkpoints
    every time it can (typically this is several minutes).  I cannot
    remember an example of one that cannot checkpoint at all, but they
    exist.
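
The throttling described above ("NOT do every possible checkpoint, but only once every X minutes") can be illustrated with a small stand-alone sketch. This is not BOINC or Astropulse code: the function names, the JSON state file, and the `stop_at` parameter (which simulates a power cut) are all hypothetical, chosen only to show the idea. A real BOINC application would normally ask the client through the API calls `boinc_time_to_checkpoint()` and `boinc_checkpoint_completed()` instead of keeping its own timer.

```python
import json
import os
import tempfile
import time


def save_checkpoint(path, state):
    """Write state to a temp file, then atomically rename it over the old checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # push the data to disk before the rename
    os.replace(tmp, path)     # a power cut leaves either the old or the new file


def load_checkpoint(path):
    """Return the saved state, or a fresh one if no usable checkpoint exists."""
    try:
        with open(path) as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return {"next_step": 0, "total": 0}


def run(path, n_steps, min_interval, clock=time.monotonic, stop_at=None):
    """Sum 0..n_steps-1, checkpointing at most once per min_interval seconds.

    stop_at (hypothetical) simulates a power cut: the loop exits before
    that step without saving. Returns (state, checkpoints_written).
    """
    state = load_checkpoint(path)
    last_saved = clock()
    written = 0
    for step in range(state["next_step"], n_steps):
        if stop_at is not None and step == stop_at:
            return state, written
        state["total"] += step
        state["next_step"] = step + 1
        # Every step is a possible checkpoint; take it only if enough time has passed.
        if clock() - last_saved >= min_interval:
            save_checkpoint(path, state)
            last_saved = clock()
            written += 1
    save_checkpoint(path, state)  # final state at completion
    return state, written + 1
```

With `min_interval=0` every step is saved and an interrupted run resumes where it stopped; with a very large interval nothing is saved mid-run and the work restarts from the beginning, which is the trade-off between disk writes and lost computing time discussed in this thread.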

    -----Original Message-----
    From: boinc_dev [mailto:[email protected]] On Behalf Of Richard Haselgrove
    Sent: Friday, August 08, 2014 4:48 AM
    To: Luc A. Germain; [email protected]
    Subject: Re: [boinc_dev] astropulse robustness / abandonned tasks

    The abandoning of tasks happens when the BOINC server 'thinks' that
    it has 'evidence' that the client has detached from the project and
    then re-attached again. This has affected a number of users in the
    past, but has proved extremely tricky to diagnose and resolve: not
    least, because most of the evidence resides in the server logs.

    We did investigate one suspected case at Albert during credit
    testing, but that turned out to be a genuine 'detach' caused by hard
    disk failure - it is distinguished from reports like this one
    because no running tasks were left on the host computer (they were
    on the drive that failed...) to waste time and electricity.

    I would certainly welcome it if we could pair up a developer and a
    project administrator with access to server logs to investigate this
    problem and cure it at source.

    The checkpointing question is a matter for the project developers,
    and I'll leave it to them to respond via this list.



     >________________________________
     > From: Luc A. Germain <[email protected]>
     >To: [email protected]
     >Sent: Friday, 8 August 2014, 9:41
     >Subject: [boinc_dev] astropulse robustness / abandonned tasks
     >
     >
     >Hi,
     >Two things:
     >1) A suggestion for you developers ;-) As Astropulse tasks
    take "some" time to complete, they are more exposed to power
    failures such as those we have in the third world. When one happens,
    most of the time the task restarts computing from the beginning
    (this is even more frustrating when the task is near completion).
    Would it be possible to introduce regular checkpoints by saving
    intermediate data, or work files, from which the task could restart,
    thus saving a lot of computing time? Maybe this could be an option
    in the user profile, as I guess not everyone needs it.
     >
     >2) Two days ago I sent a message about abandoned tasks. Since then,
    all my computing goes to the garbage bin, as the results are not
    taken into account. Which procedure should/could I try to solve this
    problem? Could uninstalling/reinstalling the application on my
    computers be a solution? Should I wait until the problem resolves by
    itself (and would this not take ages)?
     >
     >An answer would be highly appreciated.
     >
     >Best regards and thanks for your work,
     >Luc
     >_______________________________________________
     >boinc_dev mailing list
     >[email protected] <mailto:[email protected]>
     >http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
     >To unsubscribe, visit the above URL and
     >(near bottom of page) enter your email address.
     >
     >
     >



