I had a similar issue with Collatz a while back. Instead of the normal 500
WUs ready to send, there were over 50,000. It took several weeks to clear
them out, and the nightly maintenance, backup time, disk space, etc. all
increased by about 100 times until the WUs were completed. I believe the
issue was a hung process, left behind by a daemon erroring out, that caused
the shared memory reads/writes to fail. It couldn't access the shared memory,
so it decided it should generate more work. I would have preferred that it
log an error and do nothing else. Something more in line with:

if ((available_wus < cache_size) &&
    (no_errors_occurred_getting_available_wu_count))
        generate_more_work

Note: This happened a couple of months ago and I don't recall the exact
details, so I think the above is more or less what happened, but I'm not
100% sure.
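
To make the guard concrete, here is a minimal Python sketch of the idea. The
names count_unsent_wus, wus_to_generate and the query callable are
hypothetical and not part of BOINC. The only point is that a failed count
lookup is treated as "unknown" rather than as "zero", so nothing is
generated when the count can't be read.

```python
import logging
from typing import Callable, Optional

log = logging.getLogger("work_gen")


def count_unsent_wus(query: Callable[[], int]) -> Optional[int]:
    """Return the number of unsent WUs, or None if the lookup failed.

    `query` is a stand-in (assumption) for whatever reads the count,
    e.g. a shared memory read or a database query.
    """
    try:
        return query()
    except Exception as exc:
        # Log the failure and report "unknown" instead of pretending
        # the queue is empty.
        log.error("could not read unsent WU count: %s", exc)
        return None


def wus_to_generate(available: Optional[int], cache_size: int) -> int:
    """Decide how many WUs to create to refill the cache."""
    if available is None:
        # The count is unknown, so do nothing rather than flood the database.
        return 0
    # Top up to cache_size, never below zero.
    return max(0, cache_size - available)
```

With a cache size of 500, wus_to_generate(120, 500) asks for 380 more WUs,
while wus_to_generate(None, 500) asks for none.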

Jon Sonntag

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Travis Desell
Sent: Tuesday, October 11, 2011 10:57 AM
To: [email protected]
Cc: Astro Research
Subject: [boinc_dev] bug with the work generation system

Is it possible for the work generator to go a bit haywire?  Suppose we're
using a policy which checks the number of unsent workunits.  If one of the
feeder queries gets hung up, could the work generator keep thinking that
there aren't enough unsent workunits because the feeder hasn't updated
things, and then repeatedly generate more workunits until it floods the
database?

I'm wondering if this is a possibility because I think it's what might have
happened with the most recent MilkyWay@Home crash...

thanks,
--Travis


---------------------------------------------------------------------------
Travis Desell,  Assistant Professor
University of North Dakota - Dept. of Computer Science
[email protected] - cell: 518-867-1054 Streibel Hall Room 220 -
office: 777-701-3477
3950 Campus Road Stop 9015
Grand Forks, North Dakota 52802-9015

Homepage ( http://people.cs.und.edu/~tdesell/ ) MilkyWay@Home (
http://milkyway.cs.rpi.edu/ ) DNA@Home ( http://dnahome.cs.rpi.edu/ )
Worldwide Computing Laboratory ( http://wcl.cs.rpi.edu/ )
----------------------------------------------------------------------------

_______________________________________________
boinc_dev mailing list
[email protected]
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and
(near bottom of page) enter your email address.

