I don't think we are using the latest version, but I can do an upgrade.
The problem seems to occur in rapid succession, and then disappears for
a while. Here is a computer with a few such cases. You can see how the
WU name differs from the input file name:
http://aqua.dwavesys.com/results.php?hostid=60

-Kamran


-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of David Anderson
Sent: Tuesday, February 01, 2011 2:30 PM
To: [email protected]
Subject: Re: [boinc_dev] Error when submitting work to the DB (corrupted
command line arguments)

Hmmm.  I can't reproduce this, using the same WU template file.
Are you using the newest server software (i.e., trunk)?

-- David

On 01-Feb-2011 2:06 PM, Kamran Karimi wrote:
> Hi all,
>
> We have been encountering strange errors recently: A rather large
number of our tasks fail because the command line arguments are
incorrect. We traced one specific failed task and were led to the
bin/create_work program. Here is the trace:
>
> 1) These commands are used to submit work to BOINC's DB:
>
> ln -s `pwd`/ibadfp_p1/instance.txt `bin/dir_hier_path
instance.txt_31jan11_am_16_030_200_000_1_235`
> ln -s `pwd`/ibadfp_p1/ipfile.bin `bin/dir_hier_path
instance.txt_31jan11_am_16_030_200_000_1_235.ip`
> ln -s `pwd`/ibadfp_p1/qubit.param `bin/dir_hier_path
instance.txt_31jan11_am_16_030_200_000_1_235.param`
> bin/create_work -appname fokker_planck -wu_name
31jan11_am_16_030_200_000_1_235 \
> -wu_template ibadfp_p1/31jan11_am_16_030_200_000_1_235_wu \
> -result_template ibadfp_p1/31jan11_am_16_030_200_000_1_235_result
instance.txt_31jan11_am_16_030_200_000_1_235
instance.txt_31jan11_am_16_030_200_000_1_235.ip
instance.txt_31jan11_am_16_030_200_000_1_235.param
>
>
> 2) These are the contents of the file
ibadfp_p1/31jan11_am_16_030_200_000_1_235_wu:
>
> <file_info>
>       <number>0</number>
> </file_info>
> <file_info>
>       <number>1</number>
> </file_info>
> <file_info>
>       <number>2</number>
> </file_info>
> <workunit>
>       <file_ref>
>               <file_number>0</file_number>
>               <open_name>instance.txt_31jan11_am_16_030_200_000_1_235</open_name>
>               <copy_file/>
>       </file_ref>
>       <file_ref>
>               <file_number>1</file_number>
>               <file_name>instance.txt_31jan11_am_16_030_200_000_1_235.ip</file_name>
>               <open_name>ipfile.bin</open_name>
>               <copy_file/>
>       </file_ref>
>       <file_ref>
>               <file_number>2</file_number>
>               <file_name>instance.txt_31jan11_am_16_030_200_000_1_235.param</file_name>
>               <open_name>qubit.param</open_name>
>               <copy_file/>
>       </file_ref>
>       <command_line>  --T 0.03 --t_f 0.0002 --n_particles_per 2
--h_ramp_mid 0.000 --h_ramp_width 0.0025 --time_step_factor 48.0
--gamma_frac 1.0 --t_fraci 0.38 --t_fracf 0.88  --input_file
instance.txt_31jan11_am_16_030_200_000_1_235</command_line>
>       <target_nresults>1</target_nresults>
>       <max_success_results>1</max_success_results>
>       <min_quorum>1</min_quorum>
>       <rsc_fpops_est>5e13</rsc_fpops_est>
>       <rsc_memory_bound>5e7</rsc_memory_bound>
>       <rsc_fpops_bound>1e20</rsc_fpops_bound>
>       <rsc_disk_bound>1e8</rsc_disk_bound>
>       <delay_bound>1728000</delay_bound>
> </workunit>
>
>
>   Please note the command line argument "--input_file"
>
>
> 3) This is what we see in the database:
>
> <workunit>
> <file_ref>
>               <file_name>instance.txt_31jan11_am_16_030_200_000_1_235</file_name>
>               <open_name>instance.txt_31jan11_am_16_030_200_000_1_235</open_name>
>               <copy_file/>
> </file_ref>
> <file_ref>
>      <file_name>instance.txt_31jan11_am_16_030_200_000_1_235.ip</file_name>
>      <file_name>instance.txt_31jan11_am_16_030_200_000_1_235.ip</file_name>
>      <open_name>ipfile.bin</open_name>
>               <copy_file/>
> </file_ref>
> <file_ref>
>      <file_name>instance.txt_31jan11_am_16_030_200_000_1_235.param</file_name>
>      <file_name>instance.txt_31jan11_am_16_030_200_000_1_235.param</file_name>
>      <open_name>qubit.param</open_name>
>               <copy_file/>
> </file_ref>
> <command_line>
> --T 0.03 --t_f 0.0002 --n_particles_per 2 --h_ramp_mid 0.000
--h_ramp_width 0.0025 --time_step_factor 48.0 --gamma_frac 1.0 --t_fraci
0.38 --t_fracf 0.88  --input_file
instance.txt_31jan11_am_16_030_200_00_11_235
> </command_line>
> </workunit>
>
>
>   Please note how the last few characters corresponding to the
"--input_file" parameter have changed from "200_000_1_235" to
"200_00_11_235"
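
A mismatch of this kind can be caught mechanically before a task is sent
out. The following is only a minimal sketch, not part of BOINC: the
function names are hypothetical, it assumes the workunit XML is already
available as a string (however it is fetched from the database), and it
assumes the file is passed with the "--input_file" flag as in the
template above. It detects the corruption; it does not explain it.

```python
import re

# Matches <file_name>...</file_name> and <open_name>...</open_name>,
# tolerating a closing tag that was wrapped as "</file_name\n>".
_NAME_RE = re.compile(
    r"<(?:file_name|open_name)>\s*([^<\s]+)\s*</(?:file_name|open_name)\s*>"
)
_CMD_RE = re.compile(r"<command_line>(.*?)</command_line>", re.S)


def input_file_arg(command_line, flag="--input_file"):
    """Return the token that follows `flag`, or None if the flag is absent."""
    tokens = command_line.split()
    for i, tok in enumerate(tokens[:-1]):
        if tok == flag:
            return tokens[i + 1]
    return None


def referenced_names(workunit_xml):
    """Collect every <file_name> and <open_name> value in the XML text."""
    return set(_NAME_RE.findall(workunit_xml))


def check_workunit(workunit_xml):
    """Return (ok, arg). ok is False when the --input_file argument does
    not match any file name referenced by the workunit."""
    m = _CMD_RE.search(workunit_xml)
    if m is None:
        return True, None
    arg = input_file_arg(m.group(1))
    if arg is None:
        return True, None
    return arg in referenced_names(workunit_xml), arg


# Sample data modelled on the workunit quoted in this message.
NAME = "instance.txt_31jan11_am_16_030_200_000_1_235"
GOOD = (
    "<workunit>\n<file_ref>\n"
    f"<file_name>{NAME}</file_name>\n<open_name>{NAME}</open_name>\n"
    "</file_ref>\n"
    f"<command_line>--T 0.03 --input_file {NAME}</command_line>\n"
    "</workunit>"
)
# Same workunit with the corrupted argument seen in the database.
BAD = GOOD.replace(
    f"--input_file {NAME}",
    "--input_file instance.txt_31jan11_am_16_030_200_00_11_235",
)
```

Calling check_workunit(BAD) flags the workunit because the argument no
longer matches any referenced file name, while check_workunit(GOOD)
passes. Running such a check right after bin/create_work would at least
show whether the corruption is already present at submission time.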
>
>   The result is a failure on a volunteer computer, with the app
complaining that it can't open the input file.
>
>   How could this happen? Any help in resolving this issue is
appreciated.
>
> -Kamran
> _______________________________________________
> boinc_dev mailing list
> [email protected]
> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
> To unsubscribe, visit the above URL and
> (near bottom of page) enter your email address.