Hi all,

We have been encountering strange errors recently: a rather large number of our
tasks fail because their command-line arguments are incorrect. We traced one
specific failed task, which led us to the bin/create_work program. Here is the
trace:

1) These commands are used to submit work to BOINC's DB:

ln -s `pwd`/ibadfp_p1/instance.txt `bin/dir_hier_path instance.txt_31jan11_am_16_030_200_000_1_235`
ln -s `pwd`/ibadfp_p1/ipfile.bin `bin/dir_hier_path instance.txt_31jan11_am_16_030_200_000_1_235.ip`
ln -s `pwd`/ibadfp_p1/qubit.param `bin/dir_hier_path instance.txt_31jan11_am_16_030_200_000_1_235.param`
bin/create_work -appname fokker_planck -wu_name 31jan11_am_16_030_200_000_1_235 \
    -wu_template ibadfp_p1/31jan11_am_16_030_200_000_1_235_wu \
    -result_template ibadfp_p1/31jan11_am_16_030_200_000_1_235_result \
    instance.txt_31jan11_am_16_030_200_000_1_235 \
    instance.txt_31jan11_am_16_030_200_000_1_235.ip \
    instance.txt_31jan11_am_16_030_200_000_1_235.param


2) These are the contents of the file ibadfp_p1/31jan11_am_16_030_200_000_1_235_wu:

<file_info>
        <number>0</number>
</file_info>
<file_info>
        <number>1</number>
</file_info>
<file_info>
        <number>2</number>
</file_info>
<workunit>
        <file_ref>
                <file_number>0</file_number>
                <open_name>instance.txt_31jan11_am_16_030_200_000_1_235</open_name>
                <copy_file/>
        </file_ref>
        <file_ref>
                <file_number>1</file_number>
                <file_name>instance.txt_31jan11_am_16_030_200_000_1_235.ip</file_name>
                <open_name>ipfile.bin</open_name>
                <copy_file/>
        </file_ref>
        <file_ref>
                <file_number>2</file_number>
                <file_name>instance.txt_31jan11_am_16_030_200_000_1_235.param</file_name>
                <open_name>qubit.param</open_name>
                <copy_file/>
        </file_ref>
        <command_line> --T 0.03 --t_f 0.0002 --n_particles_per 2 --h_ramp_mid 0.000 --h_ramp_width 0.0025 --time_step_factor 48.0 --gamma_frac 1.0 --t_fraci 0.38 --t_fracf 0.88  --input_file instance.txt_31jan11_am_16_030_200_000_1_235</command_line>
        <target_nresults>1</target_nresults>
        <max_success_results>1</max_success_results>
        <min_quorum>1</min_quorum>
        <rsc_fpops_est>5e13</rsc_fpops_est>
        <rsc_memory_bound>5e7</rsc_memory_bound>
        <rsc_fpops_bound>1e20</rsc_fpops_bound>
        <rsc_disk_bound>1e8</rsc_disk_bound>
        <delay_bound>1728000</delay_bound>
</workunit>

 
 Please note the command line argument "--input_file"
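
 For anyone who wants to check a template mechanically, here is a minimal
 Python sketch (not part of our submission scripts; the function names are
 made up for illustration). It pulls the value after "--input_file" out of
 <command_line> and checks that it matches one of the names declared in the
 <file_ref> blocks. The template text in the sketch is an abbreviated copy of
 the one above, and the sketch assumes the template is well-formed XML once
 it is wrapped in a dummy root element.

```python
import xml.etree.ElementTree as ET


def _parse(template_text):
    # The template has several top-level elements, so wrap it in a dummy root.
    return ET.fromstring("<root>" + template_text + "</root>")


def input_file_arg(template_text):
    """Return the token following --input_file in <command_line>, or None."""
    root = _parse(template_text)
    tokens = root.findtext("workunit/command_line", default="").split()
    if "--input_file" in tokens:
        i = tokens.index("--input_file")
        if i + 1 < len(tokens):
            return tokens[i + 1]
    return None


def declared_names(template_text):
    """Collect every <file_name> and <open_name> value found in <file_ref> blocks."""
    names = set()
    for ref in _parse(template_text).iter("file_ref"):
        for tag in ("file_name", "open_name"):
            value = ref.findtext(tag)
            if value:
                names.add(value.strip())
    return names


# Abbreviated copy of the template shown above (command line shortened).
template = """
<file_info><number>0</number></file_info>
<workunit>
    <file_ref>
        <file_number>0</file_number>
        <open_name>instance.txt_31jan11_am_16_030_200_000_1_235</open_name>
        <copy_file/>
    </file_ref>
    <command_line> --T 0.03 --t_f 0.0002 --input_file instance.txt_31jan11_am_16_030_200_000_1_235</command_line>
</workunit>
"""

arg = input_file_arg(template)
print("input file argument:", arg)
print("declared in a file_ref:", arg in declared_names(template))
```

 The same two functions can be pointed at the XML stored in the database,
 which makes it possible to scan many workunits for a mismatch between the
 command line and the declared file names.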


3) This is what we see in the database:

<workunit>
<file_ref>
    <file_name>instance.txt_31jan11_am_16_030_200_000_1_235</file_name>
    <open_name>instance.txt_31jan11_am_16_030_200_000_1_235</open_name>
                <copy_file/>
</file_ref>
<file_ref>
    <file_name>instance.txt_31jan11_am_16_030_200_000_1_235.ip</file_name>
                <file_name>instance.txt_31jan11_am_16_030_200_000_1_235.ip</file_name>
    <open_name>ipfile.bin</open_name>
                <copy_file/>
</file_ref>
<file_ref>
    <file_name>instance.txt_31jan11_am_16_030_200_000_1_235.param</file_name>
                <file_name>instance.txt_31jan11_am_16_030_200_000_1_235.param</file_name>
    <open_name>qubit.param</open_name>
                <copy_file/>
</file_ref>
<command_line>
--T 0.03 --t_f 0.0002 --n_particles_per 2 --h_ramp_mid 0.000 --h_ramp_width 0.0025 --time_step_factor 48.0 --gamma_frac 1.0 --t_fraci 0.38 --t_fracf 0.88  --input_file instance.txt_31jan11_am_16_030_200_00_11_235
</command_line>
</workunit>


 Please note how the last few characters of the value passed to the
"--input_file" parameter have changed from "200_000_1_235" to "200_00_11_235".
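
 To pin down exactly which characters changed, a small Python sketch like the
 following can compare the two names position by position. The helper name is
 hypothetical; the two strings are copied from the template and from the
 database record above.

```python
def differing_positions(expected, actual):
    """Return the 0-based indices at which the two strings differ.

    Only the overlapping part is compared; a length difference is
    reported separately by the caller.
    """
    return [i for i, (a, b) in enumerate(zip(expected, actual)) if a != b]


expected = "instance.txt_31jan11_am_16_030_200_000_1_235"  # from the template
actual = "instance.txt_31jan11_am_16_030_200_00_11_235"    # from the database

positions = differing_positions(expected, actual)
print("lengths:", len(expected), len(actual))
print("differing positions:", positions)
for i in positions:
    print(i, repr(expected[i]), "->", repr(actual[i]))
```

 For this pair the two strings have the same length and differ only in two
 neighbouring characters, so the name was altered in place rather than
 truncated or extended.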

 The result is a failure on a volunteer computer, with the app complaining that 
it can't open the input file.

 How could this happen? Any help in resolving this issue is appreciated.

-Kamran
