Package: dmtcp
Version: 1.2.4-1
Severity: normal

Setting --interval for dmtcp_coordinator has no effect (the same applies to
the DMTCP_CHECKPOINT_INTERVAL environment variable). This seems to be a new
bug; it worked prior to 1.2.4. The coordinator log below illustrates what
happens. Initially the coordinator has the correct checkpoint interval, but
as soon as a client connects the interval is reset to zero. Afterwards it
can be re-enabled via dmtcp_command, and checkpoints are then created at the
desired interval. Consequently, repeated automated checkpointing in Condor
also doesn't work with 1.2.4, because the interval is only given to the
coordinator once, at startup.

Here is the log:

michael@meiner ~/debian/dmtcptest % DMTCP_CHECKPOINT_INTERVAL=30 dmtcp_coordinator
dmtcp_coordinator starting...
    Port: 7779
    Checkpoint Interval: 30
    Exit on last client: 0
Type '?' for help.

i
Checkpoint Interval: 30
[30806] NOTE at dmtcp_coordinator.cpp:1020 in onConnect; REASON='worker connected'
     hello_remote.from = 1313a2c6-30822-4f57196d(-1)
[30806] NOTE at dmtcp_coordinator.cpp:1026 in onConnect; REASON='CheckpointInterval Updated'
     oldInterval = 30
     theCheckpointInterval = 0
i
Checkpoint Interval: Disabled (checkpoint manually instead)
<HERE I MANUALLY REENABLE THE INTERVAL VIA DMTCP_COMMAND>
Checkpoint Interval: 30
[30806] NOTE at dmtcp_coordinator.cpp:1294 in startCheckpoint; REASON='starting checkpoint, suspending all nodes'
     s.numPeers = 1
[30806] NOTE at dmtcp_coordinator.cpp:1296 in startCheckpoint; REASON='Incremented Generation'
     UniquePid::ComputationId().generation() = 1
[30806] NOTE at dmtcp_coordinator.cpp:630 in onData; REASON='locking all nodes'
[30806] NOTE at dmtcp_coordinator.cpp:665 in onData; REASON='draining all nodes'
[30806] NOTE at dmtcp_coordinator.cpp:671 in onData; REASON='checkpointing all nodes'
[30806] NOTE at dmtcp_coordinator.cpp:681 in onData; REASON='building name service database'
[30806] NOTE at dmtcp_coordinator.cpp:700 in onData; REASON='entertaining queries now'
[30806] NOTE at dmtcp_coordinator.cpp:705 in onData; REASON='refilling all nodes'
[30806] NOTE at dmtcp_coordinator.cpp:734 in onData; REASON='restarting all nodes'
[30806] NOTE at dmtcp_coordinator.cpp:905 in onDisconnect; REASON='client disconnected'
     client.identity() = 1313a2c6-30822-4f57196d

^C[30806] NOTE at dmtcp_coordinator.cpp:522 in handleUserCommand; REASON='killing all connected peers and quitting ...'
DMTCP coordinator exiting... (per request)
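
To spot this reset in longer coordinator logs, something like the following
Python sketch may help. It is not part of DMTCP; it only assumes that the
coordinator keeps printing the "oldInterval = N" and
"theCheckpointInterval = M" detail lines in the form shown in the log above.
The function name find_interval_resets is made up for this example.

```python
import re
import sys

# Patterns for the two detail lines printed after a
# 'CheckpointInterval Updated' note (format assumed from the log above).
OLD_RE = re.compile(r"^\s*oldInterval\s*=\s*(\d+)\s*$")
NEW_RE = re.compile(r"^\s*theCheckpointInterval\s*=\s*(\d+)\s*$")


def find_interval_resets(log_text):
    """Return (old, new) pairs where a non-zero interval was set to zero."""
    resets = []
    old = None
    for line in log_text.splitlines():
        m = OLD_RE.match(line)
        if m:
            # Remember the old value until the matching new value shows up.
            old = int(m.group(1))
            continue
        m = NEW_RE.match(line)
        if m and old is not None:
            new = int(m.group(1))
            if old != 0 and new == 0:
                resets.append((old, new))
            old = None
    return resets


if __name__ == "__main__":
    # Reads saved coordinator output from standard input.
    for old, new in find_interval_resets(sys.stdin.read()):
        print("interval reset: %d -> %d" % (old, new))
```

Feeding it saved coordinator output on standard input prints one line per
detected reset; it prints nothing if the interval was never zeroed.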



-- System Information:
Debian Release: wheezy/sid
  APT prefers testing
  APT policy: (500, 'testing'), (1, 'experimental')
Architecture: i386 (i686)

Kernel: Linux 3.1.0-1-686-pae (SMP w/2 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages dmtcp depends on:
ii  libc6       2.13-26
ii  libgcc1     1:4.6.1-4
ii  libmtcp1    1.2.4-1
ii  libstdc++6  4.6.1-4

dmtcp recommends no packages.

dmtcp suggests no packages.

-- no debconf information


