On Wed, Nov 07, 2012 at 11:05:23AM +0000, Laurence Tratt wrote:
> On Mon, Nov 05, 2012 at 02:15:53PM +0100, Marc Espie wrote:
>
> > Turns out python is stupid enough to store path+timestamp in its compiled
> > *.pyc files to know when to recompile.
>
> The "auto-recompile everything which is out of date" feature is ingenious but
> there are at least two different ways of implementing it. One way is to check
> the timestamps on X.py and X.pyc; if the latter is older than the former,
> recompile, and (try to) cache the output to X.pyc. [This is how Converge
> works, if anyone cares.] Judging by your comment, Python stores the timestamp
> in the file itself (probably for portability/performance reasons) rather than
> checking the timestamp from the filesystem. [I can't remember off-hand, nor
> can I remember how PyPy does it either.]
>
> I assume the only people who are having pkg_delete problems are running a
> Python program as root? If so, in one sense, they're the lucky ones. The
> unlucky ones would then be those running as non-root where the "is X.py newer
> than X.pyc" check fails, forcing a recompilation, but which can't then save
> out a cached file. So every time they run Python they'll be paying a
> recompilation cost for such files.
>
> The question this raises in my mind is the following. During an update, can
> the timestamp of X.py be updated, but not X.pyc?
>
> If so, then I think that would be a problem that pkg_update needs to fix:
> timestamps (particularly the relative timestamps of files i.e. "X is older
> than Y") are an important piece of meta-data.
>
> If not, then it might be possible to make a case to the Python developers
> that rather than storing timestamps in the pyc file, they should simply read
> it from the filesystem. That would mean that both X.py and X.pyc could be
> updated, but providing X.pyc is newer than X.py, Python would not try to
> write out a cached file.

Actually, we discussed a possible approach.

It looks reasonable to have a "compile as package" mode (say, through an env
variable) that would disable the check and leave a mark in the compiled file
that says the check is not relevant. It's as simple as 'store timestamp as 0000
in the .pyc file'...

Apparently, there are other tendrils elsewhere that make this a bit more
complicated, but for a package system, requiring timestamps to be consistent
makes things MORE brittle in the presence of NFS, clock skew, and various
other issues. For instance, one prominent member of the project does not
believe in clock synchronization and does builds over NFS on machines with
wildly inaccurate clocks...

So I'm rather happy the pkg* system fudges some timestamps already (linked to
the concept of tie()ing files, which allows some speed/space optimization at
the expense of some irrelevant metadata)...
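
To make the idea concrete, here is a rough sketch of the two staleness checks
being compared, plus the "timestamp 0 means don't check" mark. It assumes the
current .pyc layout (4 bytes of magic followed by a 32-bit little-endian
mtime) and, as far as I understand it, an equality test against the source
mtime. The NO_CHECK sentinel and all the function names are made up for
illustration; nothing like this exists in Python today.

```python
import os
import struct

# Hypothetical sentinel: a stored mtime of 0 would mean "compiled as part of
# a package, never compare against the source". Assumption, not real Python.
NO_CHECK = 0


def read_stored_mtime(pyc_path, magic_len=4):
    """Return the 32-bit little-endian mtime stored right after the magic."""
    with open(pyc_path, "rb") as f:
        header = f.read(magic_len + 4)
    if len(header) < magic_len + 4:
        raise ValueError("truncated .pyc header")
    (mtime,) = struct.unpack("<I", header[magic_len:])
    return mtime


def is_stale_embedded(stored_mtime, source_mtime):
    """Embedded-timestamp check, with the proposed opt-out mark."""
    if stored_mtime == NO_CHECK:
        return False  # package mode: trust the .pyc unconditionally
    # The header only has room for 32 bits, so truncate before comparing.
    return stored_mtime != (int(source_mtime) & 0xFFFFFFFF)


def is_stale_filesystem(source_mtime, cache_mtime):
    """Filesystem check: recompile only if the cache is older than the source."""
    return cache_mtime < source_mtime


def needs_recompile(py_path, pyc_path):
    """Apply the embedded check to two real files."""
    return is_stale_embedded(read_stored_mtime(pyc_path),
                             os.stat(py_path).st_mtime)
```

With the mark in place, the package tools could set whatever timestamps they
like on X.py and X.pyc and the embedded check would never trigger. The
filesystem variant, by contrast, depends on the relative order of two mtimes,
which is exactly what goes wrong with NFS and skewed clocks.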