I noticed this thread. The example given uses a rather old version, but the
same error is also happening on all my AQUA machines running v6.12.12. Error
information:

<core_client_version>6.12.12</core_client_version>
<![CDATA[
<stderr_txt>
ERROR! Cannot open input file:
instance.txt_21jan11_hm_16_038_200_00__1_238. Exiting
23:34:45 (572): called boinc_finish

</stderr_txt>
<message>
<file_xfer_error>
  <file_name>21jan11_hm_16_038_200_000_1_238_1_0</file_name>
  <error_code>-161</error_code>
</file_xfer_error>

</message>
]]>
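
For anyone who wants to check their own workunits for the mismatch Kamran
describes below, here is a minimal Python sketch. It assumes the workunit XML
has already been pulled out of the database as a string. The function name
input_file_mismatch and the sample record (abridged from the database entry
quoted below) are illustrative only and are not part of BOINC.

```python
import re
import xml.etree.ElementTree as ET

# Abridged copy of the corrupted database record quoted later in this thread.
SAMPLE_WORKUNIT = """<workunit>
<file_ref>
  <file_name>instance.txt_31jan11_am_16_030_200_000_1_235</file_name>
  <open_name>instance.txt_31jan11_am_16_030_200_000_1_235</open_name>
  <copy_file/>
</file_ref>
<command_line>
--T 0.03 --t_f 0.0002 --input_file
instance.txt_31jan11_am_16_030_200_00_11_235
</command_line>
</workunit>"""


def input_file_mismatch(workunit_xml):
    """Return the --input_file value if no <file_name> matches it, else None.

    Hypothetical helper: it only compares strings inside one workunit record.
    """
    root = ET.fromstring(workunit_xml)
    # Collect every file name the workunit references.
    file_names = {el.text.strip() for el in root.iter("file_name") if el.text}
    command_line = root.findtext("command_line") or ""
    match = re.search(r"--input_file\s+(\S+)", command_line)
    if match is None:
        return None  # no --input_file argument to check
    argument = match.group(1)
    return None if argument in file_names else argument


if __name__ == "__main__":
    print(input_file_mismatch(SAMPLE_WORKUNIT))
```

Run against every recently created workunit, a check like this would flag the
bad records before they reach volunteer machines, though it does not explain
where the corruption comes from.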

Regards/Ed



On Tue, Feb 1, 2011 at 5:03 PM, Kamran Karimi <[email protected]> wrote:

> I don't think we are using the latest version, but I can do an upgrade.
> The problem seems to occur in rapid succession, and then disappears for
> a while. Here is a computer with a few such cases. You can see how the
> WU name differs from the input file name:
> http://aqua.dwavesys.com/results.php?hostid=60
>
> -Kamran
>
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of David Anderson
> Sent: Tuesday, February 01, 2011 2:30 PM
> To: [email protected]
> Subject: Re: [boinc_dev] Error when submitting work to the DB (corrupted
> command line arguments)
>
> Hmmm.  I can't reproduce this, using the same WU template file.
> Are you using the newest server software (i.e., trunk)?
>
> -- David
>
> On 01-Feb-2011 2:06 PM, Kamran Karimi wrote:
> > Hi all,
> >
> > We have been encountering strange errors recently: A rather large
> number of our tasks fail because the command line arguments are
> incorrect. We traced one specific failed task and were led to the
> bin/create_work program. Here is the trace:
> >
> > 1) These commands are used to submit work to BOINC's DB:
> >
> > ln -s `pwd`/ibadfp_p1/instance.txt `bin/dir_hier_path
> instance.txt_31jan11_am_16_030_200_000_1_235`
> > ln -s `pwd`/ibadfp_p1/ipfile.bin `bin/dir_hier_path
> instance.txt_31jan11_am_16_030_200_000_1_235.ip`
> > ln -s `pwd`/ibadfp_p1/qubit.param `bin/dir_hier_path
> instance.txt_31jan11_am_16_030_200_000_1_235.param`
> > bin/create_work -appname fokker_planck -wu_name
> 31jan11_am_16_030_200_000_1_235 \
> > -wu_template ibadfp_p1/31jan11_am_16_030_200_000_1_235_wu \
> > -result_template ibadfp_p1/31jan11_am_16_030_200_000_1_235_result
> instance.txt_31jan11_am_16_030_200_000_1_235
> instance.txt_31jan11_am_16_030_200_000_1_235.ip
> instance.txt_31jan11_am_16_030_200_000_1_235.param
> >
> >
> > 2) These are the contents of the file
> ibadfp_p1/31jan11_am_16_030_200_000_1_235_wu:
> >
> > <file_info>
> >       <number>0</number>
> > </file_info>
> > <file_info>
> >       <number>1</number>
> > </file_info>
> > <file_info>
> >       <number>2</number>
> > </file_info>
> > <workunit>
> >       <file_ref>
> >               <file_number>0</file_number>
> >
> <open_name>instance.txt_31jan11_am_16_030_200_000_1_235</open_name>
> >               <copy_file/>
> >       </file_ref>
> >       <file_ref>
> >               <file_number>1</file_number>
> >
> <file_name>instance.txt_31jan11_am_16_030_200_000_1_235.ip</file_name>
> >               <open_name>ipfile.bin</open_name>
> >               <copy_file/>
> >       </file_ref>
> >       <file_ref>
> >               <file_number>2</file_number>
> >
> <file_name>instance.txt_31jan11_am_16_030_200_000_1_235.param</file_name>
> >               <open_name>qubit.param</open_name>
> >               <copy_file/>
> >       </file_ref>
> >       <command_line>  --T 0.03 --t_f 0.0002 --n_particles_per 2
> --h_ramp_mid 0.000 --h_ramp_width 0.0025 --time_step_factor 48.0
> --gamma_frac 1.0 --t_fraci 0.38 --t_fracf 0.88  --input_file
> instance.txt_31jan11_am_16_030_200_000_1_235</command_line>
> >       <target_nresults>1</target_nresults>
> >       <max_success_results>1</max_success_results>
> >       <min_quorum>1</min_quorum>
> >       <rsc_fpops_est>5e13</rsc_fpops_est>
> >       <rsc_memory_bound>5e7</rsc_memory_bound>
> >       <rsc_fpops_bound>1e20</rsc_fpops_bound>
> >       <rsc_disk_bound>1e8</rsc_disk_bound>
> >       <delay_bound>1728000</delay_bound>
> > </workunit>
> >
> >
> >   Please note the command line argument "--input_file"
> >
> >
> > 3) This is what we see in the database:
> >
> > <workunit>
> > <file_ref>
> >
> <file_name>instance.txt_31jan11_am_16_030_200_000_1_235</file_name>
> >
> <open_name>instance.txt_31jan11_am_16_030_200_000_1_235</open_name>
> >               <copy_file/>
> > </file_ref>
> > <file_ref>
> >
> <file_name>instance.txt_31jan11_am_16_030_200_000_1_235.ip</file_name>
> >
> <file_name>instance.txt_31jan11_am_16_030_200_000_1_235.ip</file_name>
> >      <open_name>ipfile.bin</open_name>
> >               <copy_file/>
> > </file_ref>
> > <file_ref>
> >
> <file_name>instance.txt_31jan11_am_16_030_200_000_1_235.param</file_name>
> >
> <file_name>instance.txt_31jan11_am_16_030_200_000_1_235.param</file_name>
> >      <open_name>qubit.param</open_name>
> >               <copy_file/>
> > </file_ref>
> > <command_line>
> > --T 0.03 --t_f 0.0002 --n_particles_per 2 --h_ramp_mid 0.000
> --h_ramp_width 0.0025 --time_step_factor 48.0 --gamma_frac 1.0 --t_fraci
> 0.38 --t_fracf 0.88  --input_file
> instance.txt_31jan11_am_16_030_200_00_11_235
> > </command_line>
> > </workunit>
> >
> >
> >   Please note how the last few characters corresponding to the
> "--input_file" parameter have changed from "200_000_1_235" to
> "200_00_11_235"
> >
> >   The result is a failure on a volunteer computer, with the app
> complaining that it can't open the input file.
> >
> >   How could this happen? Any help in resolving this issue is
> appreciated.
> >
> > -Kamran
> > _______________________________________________
>
_______________________________________________
boinc_dev mailing list
[email protected]
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and
(near bottom of page) enter your email address.
