Had to do home directory migrations a couple of times without 'full'
downtimes. Similar process, only I don't think we ever bothered
disabling users in LDAP or blocking their jobs. Generally, we told them
we'd move their directory at time X and would they please log out
everywhere; at time X, we killed their jobs & sessions (if any),
migrated everything (including automount information), and let them know
they could log in again.

Having said that, clearing sssd etc. caches sounds like a very good idea :)

Two suggestions to add:
- Make the old home directories read-only/immutable directly after
migration, so that forgotten sessions, or ones picking up the wrong
automount information, throw errors when trying to use them.
- I'd rsync the whole file system across to the new machines way ahead
of 'migration day', so that during migration only a 'last pass' sort of
sync is required - generally much faster if most of the files are
already there.
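
A rough sketch of those two suggestions as command lines, built in Python so they can be inspected before anything is run. The function names and paths are made up for illustration. The rsync flags are just a common choice for preserving ownership, hard links, ACLs and xattrs, and `chattr -R +i` is only one way of locking the old tree: it needs root and a local file system that supports the immutable attribute, so it would have to run on the file server rather than on an NFS client.

```python
from __future__ import annotations


def rsync_cmd(src: str, dst: str, final_pass: bool = False) -> list[str]:
    """Build an rsync command line for copying one home directory."""
    # -a archive, -H hard links, -A ACLs, -X xattrs; numeric IDs avoid
    # UID/GID remapping between the old and new hosts.
    cmd = ["rsync", "-aHAX", "--numeric-ids"]
    if final_pass:
        # On migration day, also drop files the user deleted since the pre-sync.
        cmd.append("--delete")
    # A trailing slash on the source copies the directory contents.
    cmd += [src.rstrip("/") + "/", dst.rstrip("/") + "/"]
    return cmd


def lock_old_home_cmd(old_home: str) -> list[str]:
    """Build the command that makes the old home immutable after migration."""
    return ["chattr", "-R", "+i", old_home]


if __name__ == "__main__":
    # Hypothetical paths, printed rather than executed.
    print(" ".join(rsync_cmd("/old/home/alice", "/new/home/alice")))
    print(" ".join(rsync_cmd("/old/home/alice", "/new/home/alice", final_pass=True)))
    print(" ".join(lock_old_home_cmd("/old/home/alice")))
```

The pre-sync can be repeated as often as you like in the days before; only the final pass needs the user logged out.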
Tina
On 16/04/2021 14:20, Ward Poelmans wrote:
Hi Ole,
On 16/04/2021 14:23, Ole Holm Nielsen wrote:
Question: Does anyone have experiences with this type of scenario? Any
good ideas or suggestions for other methods for data migration?
We once did something like that.
Basically, it went like this:
- Process is kicked off per user by some trigger
- Block all new jobs of the given user
- Wait until all currently running jobs have finished
- Disable the user in LDAP and wipe the sssd cache for the user
- Kill all their processes on the login nodes
- Move the data
- Re-enable the user in LDAP
- Remove any blocks/limits on the user so they can start new jobs
- Mail the user that they can continue working again
The whole process went pretty smoothly.
Ward
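
For reference, the per-user sequence Ward lists above could be sketched roughly like this. It only builds an ordered plan and executes nothing. It assumes Slurm as the scheduler (`sacctmgr`, `squeue`), which the thread does not actually state, and it leaves the LDAP and mail steps as placeholders because those are site-specific. The `Step` class, the function name, the user and the paths are all made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Step:
    description: str
    command: list[str] | None  # None marks a site-specific step


def migration_plan(user: str, old_home: str, new_home: str) -> list[Step]:
    """Return the per-user steps in order; nothing is executed here."""
    # Assumption: Slurm with accounting, so submit limits can be set per user.
    limit = ["sacctmgr", "-i", "modify", "user", "where", f"name={user}", "set"]
    return [
        Step("block new job submissions", limit + ["MaxSubmitJobs=0"]),
        Step("poll until this prints nothing", ["squeue", "-h", "-u", user]),
        Step("disable the user in LDAP", None),
        Step("invalidate the user's sssd cache entry", ["sss_cache", "-u", user]),
        Step("kill leftover processes on each login node", ["pkill", "-KILL", "-u", user]),
        Step("move the data", ["rsync", "-aHAX", "--numeric-ids", f"{old_home}/", f"{new_home}/"]),
        Step("re-enable the user in LDAP", None),
        Step("lift the submission block", limit + ["MaxSubmitJobs=-1"]),
        Step("mail the user", None),
    ]


if __name__ == "__main__":
    plan = migration_plan("alice", "/old/home/alice", "/new/home/alice")
    for number, step in enumerate(plan, 1):
        shown = " ".join(step.command) if step.command else "(site-specific)"
        print(f"{number}. {step.description}: {shown}")
```

Keeping the plan as data makes it easy to review the order, or do a dry run, before wiring any of it to a real trigger.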
--
Tina Friedrich, Advanced Research Computing Snr HPC Systems Administrator
Research Computing and Support Services
IT Services, University of Oxford
http://www.arc.ox.ac.uk http://www.it.ox.ac.uk