Your message dated Mon, 22 Apr 2013 10:17:42 +0000
with message-id <e1uudoc-0003uq...@franck.debian.org>
and subject line Bug#687266: fixed in aces3 3.0.6-7
has caused the Debian Bug report #687266,
regarding aces3: some jobs hang when run sequentially
to be marked as done.
This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.
(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact ow...@bugs.debian.org
immediately.)
--
687266: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=687266
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems
--- Begin Message ---
Package: aces3
Version: 3.0.6-1
Severity: important
When I run the same job as mentioned in #687264 with only one process, the
job hangs in tran_rhf_ao_sv1.sio:
Gather on company_rank succeeded.
Static pre-defined array # 2 is first used on line 328
Allocated 800 bytes for static arrays.
Allocated 896466560 bytes for blkmgr.
Total memory usage 855 MBytes.
Max. possible usage 900 MBytes
Total blocks used = 65759
At this point, no more output is written and no temporary files are
written or updated, while the xaces3 process spins at 100% CPU.
This is a representative backtrace:
0x000000000042e968 in one_pass_of_server () at sumz.c:439
439 MPI_Iprobe(MPI_ANY_SOURCE, readytag, newcomm, &flag, &status);
(gdb) bt
#0 0x000000000042e968 in one_pass_of_server () at sumz.c:439
#1 0x000000000042f73d in exec_thread_server_ (bflag=bflag@entry=0x729b20) at sumz.c:1248
#2 0x00000000004df2a4 in wait_on_block (array=23, block=1, blkndx=56362, type=201, request=4, instruction_timer=35, comm_timer=95) at wait_on_block.f:50
#3 0x000000000048b6a5 in compute_block (op=..., array_table=..., narray_table=198, index_table=..., nindex_table=32, block_map_table=..., nblock_map_table=55, segment_table=..., nsegment_table=43, scalar_table=..., nscalar_table=13, address_table=..., debugit=.FALSE., validate=.FALSE., flopcount=0, comm=3, comm_timer=95, instruction_timer=35) at compute_block.F:759
#4 0x00000000004d1e53 in optable_loop (optable=..., noptable=245, array_table=..., narray_table=198, array_labels=..., index_table=..., nindex_table=32, segment_table=..., nsegment_table=43, block_map_table=..., nblock_map_table=55, scalar_table=..., nscalar_table=13, proctab=..., address_table=..., debug=.FALSE., validate=.FALSE., comm=3, comm_timer=95, _array_labels=_array_labels@entry=10) at optable_loop.f:274
#5 0x00000000004423e5 in master.0.sip_fmain_init (__entry=1, ncompany_workers_min=<error reading variable: Cannot access memory at address 0x0>, ierr_return=<error reading variable: Cannot access memory at address 0x0>) at sip_fmain.F:582
#6 0x000000000042f8b8 in sumz_work_ (dryrun_flag=0x2, dryrun_flag@entry=0x7fff04aff4e8, fmbuffer=0xffff8002, fmbuffer@entry=0x23d0448c, dbg_flag=0x1, dbg_flag@entry=0x7fff04aff4e4, totalrecvbuffer=0x36c66) at sumz.c:1294
#7 0x0000000000423bea in worker_work () at worker_work.F:79
#8 0x000000000041a613 in aces3 () at beta.F:914
#9 0x000000000041959d in main (argc=<optimized out>, argv=<optimized out>) at beta.F:1014
#10 0x00007f0a0f6b4ead in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#11 0x00000000004195c9 in _start ()
Another:
0x00007f0a122f0f63 in PMPI_Iprobe () from /usr/lib/libmpi.so.0
(gdb) bt
#0 0x00007f0a122f0f63 in PMPI_Iprobe () from /usr/lib/libmpi.so.0
#1 0x000000000042e9d0 in one_pass_of_server () at sumz.c:445
#2 0x000000000042f73d in exec_thread_server_ (bflag=bflag@entry=0x729b20) at sumz.c:1248
#3 0x00000000004df2a4 in wait_on_block (array=23, block=1, blkndx=56362, type=201, request=4, instruction_timer=35, comm_timer=95) at wait_on_block.f:50
#4 0x000000000048b6a5 in compute_block [...] at compute_block.F:759
And another:
0x00007f0a10c4e369 in opal_progress () from /usr/lib/libopen-pal.so.0
(gdb) bt
#0 0x00007f0a10c4e369 in opal_progress () from /usr/lib/libopen-pal.so.0
#1 0x00007f0a122cd9c9 in ?? () from /usr/lib/libmpi.so.0
#2 0x00007f0a122f84e3 in PMPI_Test () from /usr/lib/libmpi.so.0
#3 0x00007f0a1110e122 in pmpi_test__ () from /usr/lib/libmpi_f77.so.0
#4 0x00000000004df2bd in wait_on_block (array=23, block=1, blkndx=56362, type=201, request=4, instruction_timer=35, comm_timer=95) at wait_on_block.f:48
#5 0x000000000048b6a5 in compute_block [...] at compute_block.F:759
I did not encounter any other backtraces after a few more tries.
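To make the symptom concrete, here is a minimal sketch of the kind of polling loop the backtraces suggest. It is not taken from the aces3 sources: probe_for_message() is a hypothetical stand-in for the MPI_Iprobe call, the "peer answers on the third poll" rule is invented for illustration, and run_job() with its needs_server flag only sketches one possible up-front guard. With a single process there is no peer that could ever send the awaited message, so an unbounded version of this loop would spin forever.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for MPI_Iprobe: reports whether a message is pending.
 * Assumption for illustration: with more than one process a peer answers on
 * the third poll; with a single process nobody can ever send anything. */
bool probe_for_message(int nprocs, int poll_count)
{
    return nprocs > 1 && poll_count >= 3;
}

/* Poll until a message shows up or max_polls is exhausted.
 * Returns the number of polls needed, or -1 if no message arrived.
 * Without the max_polls bound this is a busy-wait at 100% CPU. */
int wait_for_block(int nprocs, int max_polls)
{
    assert(max_polls > 0);
    for (int i = 1; i <= max_polls; i++) {
        if (probe_for_message(nprocs, i))
            return i;
    }
    return -1;
}

/* Sketch of a guard: refuse a job that needs a server process when only
 * one process is available, instead of entering the polling loop.
 * Returns 0 on success, 1 if the job was refused, 2 if polling gave up. */
int run_job(int nprocs, bool needs_server)
{
    if (needs_server && nprocs < 2) {
        fprintf(stderr, "error: this job type cannot be run sequentially\n");
        return 1;
    }
    return wait_for_block(nprocs, 1000) > 0 ? 0 : 2;
}
```

In this sketch, run_job(1, true) returns immediately with an error, whereas wait_for_block(1, ...) only terminates because of the artificial poll limit.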
Michael
--- End Message ---
--- Begin Message ---
Source: aces3
Source-Version: 3.0.6-7
We believe that the bug you reported is fixed in the latest version of
aces3, which is due to be installed in the Debian FTP archive.
A summary of the changes between this version and the previous one is
attached.
Thank you for reporting the bug, which will now be closed. If you
have further comments please address them to 687...@bugs.debian.org,
and the maintainer will reopen the bug report if appropriate.
Debian distribution maintenance software
pp.
Michael Banck <mba...@debian.org> (supplier of updated aces3 package)
(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing ftpmas...@debian.org)
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Format: 1.8
Date: Mon, 22 Apr 2013 10:44:02 +0200
Source: aces3
Binary: aces3
Architecture: source amd64
Version: 3.0.6-7
Distribution: unstable
Urgency: high
Maintainer: Michael Banck <mba...@debian.org>
Changed-By: Michael Banck <mba...@debian.org>
Description:
aces3 - Advanced Concepts in Electronic Structure III
Closes: 687264 687266
Changes:
aces3 (3.0.6-7) unstable; urgency=high
.
[ Michael Banck ]
* debian/patches/exit_impossible_seq_jobs_gracefully.patch: New patch,
prints a helpful error message and quits if a job type is requested which
cannot be run sequentially (Closes: #687266).
* debian/patches/ignore_invalid_message_nind.patch: New patch, if a message
of type ``server_barrier_signal'' or ``server_quit_msgtype'' arrives,
ignore if the value of the ``nind'' field is invalid (Closes: #687264).
Checksums-Sha1:
e67c825dfea5f57e5b0dfe9d3ec0127966713aee 1317 aces3_3.0.6-7.dsc
c1a678df89f7ab144e8405f1e59a9031e1c38906 10228 aces3_3.0.6-7.debian.tar.gz
890dd1ec585b1bcfefb7bf80c51c783978ac1dae 12616738 aces3_3.0.6-7_amd64.deb
Checksums-Sha256:
1200dba65209a07e644829ad8a8568c1a20c7cee6b06aa62b5df3c5ba5658036 1317
aces3_3.0.6-7.dsc
fc08b500f2feade518747a042f8543b711cbcf2aa5c95d4149337a5ec0f812da 10228
aces3_3.0.6-7.debian.tar.gz
75328de92a383f935729364fb3eafcef25e39a0993f565259dfdbd59445b4236 12616738
aces3_3.0.6-7_amd64.deb
Files:
0ae44c4cd02a66dfbe160215bf5d27d8 1317 science optional aces3_3.0.6-7.dsc
c5f8bab82e1821734b0b7698e80a810c 10228 science optional
aces3_3.0.6-7.debian.tar.gz
9a51ae2a71f434db4841220fe17628f5 12616738 science optional
aces3_3.0.6-7_amd64.deb
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iEYEARECAAYFAlF0/LYACgkQmHaJYZ7RAb/gvACglmlaHnS0JQ0cg0wzIVV0fB7c
WIkAnRjI6TELx0qMnQ7oMpnwkat6kf49
=FcAu
-----END PGP SIGNATURE-----
--- End Message ---