Hi there,
We have a CI build framework configured such that many machines are
concurrently building and sharing an SCons cache. This cache lives on an
Amazon EFS filesystem, mounted over NFS.
In general this has been spectacularly successful, but every once in a
while corrupted files start coming out of the cache. Our theory is that
the EFS + NFS locking guarantees aren't good enough for the SCons temp name
collision detection algorithm. The temp name is only suffixed with the
process ID, which is unique on one machine but not across machines. Attached
is a patch we are going to try running with to see if it improves things.
In addition to hoping a formalized version of this will be considered for
SCons, I'm curious if anyone sees a more likely explanation for the
symptoms described above.
--- CacheDir.py 2020-08-19 12:59:25.790302000 -0700
+++ CacheDir.py.uuid 2020-08-19 14:00:29.693749695 -0700
@@ -32,6 +32,7 @@
import os
import stat
import sys
+import uuid
import SCons.Action
import SCons.Warnings
@@ -100,7 +101,11 @@
cd.CacheDebug('CachePush(%s): pushing to %s\n', t, cachefile)
- tempfile = cachefile+'.tmp'+str(os.getpid())
+ # UUID in case filesystem doesn't support file operations well enough to deal with multiple
+ # machines sharing a cache and attempting to write the same file at the same time (NFS mount of
+ # AWS EFS?).
+ # TODO: Long filename concern on Windows?
+ tempfile = cachefile+'.tmp'+str(os.getpid()) + '_' + str(uuid.uuid1())
errfmt = "Unable to copy %s to cache. Cache file is %s"
if not fs.isdir(cachedir):
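
For reference, the idea behind the patch can be sketched outside of SCons
roughly as follows. This is a minimal sketch under stated assumptions, not the
actual CacheDir.py code: unique_temp_name and push_to_cache are hypothetical
helpers, uuid4 is swapped in for the uuid1 used in the patch, and whether the
final rename behaves atomically on an NFS-mounted EFS volume is exactly the
part that would need verifying.

```python
import os
import shutil
import uuid


def unique_temp_name(cachefile):
    """Return a temp path next to cachefile (hypothetical helper).

    The PID alone is only unique on one machine, so a random UUID is
    appended to make collisions between hosts very unlikely.
    """
    return "%s.tmp%d_%s" % (cachefile, os.getpid(), uuid.uuid4().hex)


def push_to_cache(src, cachefile):
    """Copy src to a uniquely named temp file, then rename it into place.

    Assumption: the rename is the only step other writers can observe,
    so readers never see a partially written cache entry.
    """
    tmp_path = unique_temp_name(cachefile)
    try:
        shutil.copy2(src, tmp_path)
        # Replaces any existing entry; atomic on local POSIX filesystems,
        # assumed (not verified) to be safe enough on NFS/EFS.
        os.replace(tmp_path, cachefile)
    except OSError:
        # Do not leave a partial temp file behind in the shared cache.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With this shape, two machines pushing the same cache entry each write their
own temp file, and whichever rename lands last wins with a complete file.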
Cheers,
--
*Raven Kopelman* | Team Lead, Senior Developer
Safe Software Inc.
*T* 604.501.9985 x 331 | *F* 604.501.9965
[email protected] | www.safe.com
<http://www.safe.com/emailsignature>