Had to do home directory migrations a couple of times without 'full' downtimes. Similar process, only I don't think we ever bothered disabling users in LDAP or blocking their jobs. Generally, we told them we'd move their directory at time X and asked them to log out everywhere beforehand; at time X, we killed their jobs & sessions (if any), migrated everything (including automount information), and let them know they could log in again.

That said, clearing the sssd (etc.) caches sounds like a very good idea :)

Two suggestions to add:

- Make the old home directories read-only/immutable directly after migration, so that forgotten sessions, or sessions picking up the wrong automount information, throw errors when they try to use them.

- I'd rsync the whole file system across to the new machines well ahead of 'migration day', so that during the migration itself only a 'last pass' sort of sync is required - generally much faster if most of the files are already there.
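
For illustration, the two suggestions above could be sketched roughly as below. This is only a sketch, not what we actually ran: the helper names and the example paths are made up, the rsync flags are just a common choice for preserving hard links, ACLs and xattrs, and `chattr +i` assumes a local file system that supports the immutable flag (it will not work through an NFS mount). The functions only build the command lines, they do not execute anything.

```python
from typing import List


def _with_slash(path: str) -> str:
    # A trailing slash makes rsync copy the directory's contents
    # rather than the directory itself.
    return path.rstrip("/") + "/"


def presync_cmd(old_home: str, new_home: str) -> List[str]:
    """Bulk copy, run well ahead of migration day (can be repeated)."""
    return ["rsync", "-aHAX", "--numeric-ids",
            _with_slash(old_home), _with_slash(new_home)]


def final_sync_cmd(old_home: str, new_home: str) -> List[str]:
    """'Last pass' on migration day: only changes since the pre-sync are
    transferred, and --delete removes files the user deleted in the meantime."""
    return ["rsync", "-aHAX", "--numeric-ids", "--delete",
            _with_slash(old_home), _with_slash(new_home)]


def lock_old_home_cmd(old_home: str) -> List[str]:
    """Mark the old copy immutable so stray sessions get errors on write.
    Assumption: run as root on the old file server, on a file system that
    supports the immutable attribute."""
    return ["chattr", "-R", "+i", old_home.rstrip("/")]
```

Each returned list can be handed to `subprocess.run(cmd, check=True)` once you are happy with what it would do.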

Tina

On 16/04/2021 14:20, Ward Poelmans wrote:
Hi Ole,

On 16/04/2021 14:23, Ole Holm Nielsen wrote:
Question:  Does anyone have experiences with this type of scenario?  Any
good ideas or suggestions for other methods for data migration?

We once did something like that.

Basically, it did something like this:
- Process is kicked off per user by some trigger
- Block all new jobs of the given user
- Wait until all currently running jobs have finished
- Disable the user in the LDAP and wipe the sssd cache for the user.
- Kill all their processes on the login nodes
- Move the data
- Re-enable the user in the LDAP
- Remove any blocks/limits on the user so they can start new jobs again
- Mail the user that he/she can continue working again.
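
As a rough sketch only (not the tooling described here), the steps above could be laid out as a per-user plan like the one below. Assumptions: Slurm is the scheduler and new jobs are blocked by setting `MaxSubmitJobs=0` on the user's association, which does not by itself hold jobs that are already pending; the LDAP and mail steps are site-specific and left as placeholders; `sss_cache` and `pkill` would have to run on every relevant node; the function name and paths are hypothetical. Nothing is executed, the function just returns the steps.

```python
from typing import List, Optional, Tuple

# (description, command); command is None where the step is site-specific.
Step = Tuple[str, Optional[List[str]]]


def migration_plan(user: str, old_home: str, new_home: str) -> List[Step]:
    src = old_home.rstrip("/") + "/"
    dst = new_home.rstrip("/") + "/"
    return [
        ("block new job submissions",
         ["sacctmgr", "-i", "modify", "user", "where", f"name={user}",
          "set", "MaxSubmitJobs=0"]),
        ("poll until this prints nothing (no running jobs left)",
         ["squeue", "-h", "-u", user, "-t", "RUNNING"]),
        ("disable the user in LDAP (site-specific)", None),
        ("invalidate the user's sssd cache entry (on each node)",
         ["sss_cache", "-u", user]),
        ("kill remaining processes on the login nodes",
         ["pkill", "-KILL", "-u", user]),
        ("move the data",
         ["rsync", "-aHAX", "--numeric-ids", "--delete", src, dst]),
        ("re-enable the user in LDAP (site-specific)", None),
        ("remove the submission block (-1 clears the limit)",
         ["sacctmgr", "-i", "modify", "user", "where", f"name={user}",
          "set", "MaxSubmitJobs=-1"]),
        ("mail the user that they can continue working (site-specific)", None),
    ]
```

Printing the plan for one user first, e.g. `for desc, cmd in migration_plan("alice", "/old/home/alice", "/new/home/alice"): print(desc, cmd)`, is a cheap dry run before wiring any of it to `subprocess`.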

The whole process went pretty smoothly.

Ward


--
Tina Friedrich, Advanced Research Computing Snr HPC Systems Administrator

Research Computing and Support Services
IT Services, University of Oxford
http://www.arc.ox.ac.uk http://www.it.ox.ac.uk
