>> For all you guys, is the current caching - all filesystem based - useful
>> enough? I've been chewing on a network
>> based extension, for all those disposable builders that don't really have
>> great ways to cache
I am indeed finding that the built-in SCons caching isn’t conducive to network
caching. I was preparing a separate e-mail about it but I’ll just include it
here. Let me know if you want me to start a new thread for the discussion.
The basic summary is that the current cache implementation requests each file
only at the moment it needs it and has no way to frontload lookups in bulk, so
if I have 5000 targets, I have to make 5000 roundtrips to the server. That isn’t
going to work for network caching, especially since we want to integrate with
virtual filesystems so that we only hydrate the targets developers actually
use.
--- Background ---
One of the things we are working on at VMware is implementing remote caching
using SCons. We are hoping to upstream as many of our changes as possible, so I
am hoping to get ideas on how to do this right. I hope to send out a summary of
the plans we have soon (dependency enforcement, remote caching, and
platform-specific virtual filesystem integration), but for now I need help with
one specific problem in our prototype: bulk cache frontloading.
--- Background on existing caching mechanism ---
Currently the SCons caching mechanism (as implemented by the Taskmaster and
CacheDir classes) does just-in-time caching: SCons first asks a CacheDir object
whether it has a target file in the cache. If it does, SCons moves on to the
next target file in that action’s target list (if any). If it doesn’t, SCons
stops asking the cache about the remaining target files of that action and
simply runs the action. The action is skipped only if all of its targets are
retrieved from the cache.
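The per-action lookup described above can be modeled with the self-contained simulation below. It is not SCons code. `RemoteCache`, its `retrieve()` method and the `roundtrips` counter are hypothetical stand-ins for a CacheDir backed by a network server. They are there only to show where the roundtrips come from.

```python
class RemoteCache:
    """Hypothetical network-backed cache: every lookup costs one roundtrip."""

    def __init__(self, contents):
        self.contents = set(contents)  # paths the server can provide
        self.roundtrips = 0

    def retrieve(self, path):
        self.roundtrips += 1  # one request per target file
        return path in self.contents


def build_action(cache, targets, run_action):
    """Ask for targets in order and stop at the first miss.

    Returns True if every target came from the cache, in which case the
    action is skipped.
    """
    for target in targets:
        if not cache.retrieve(target):
            # One miss means the action must run, so the remaining targets
            # of this action are never queried.
            run_action(targets)
            return False
    return True
```

With a cache holding `a.obj` and `a.pdb`, building `["a.obj", "a.pdb"]` costs two roundtrips and skips the action. Building `["b.obj", "b.pdb"]` then costs one more roundtrip and runs the action.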
--- Downsides of existing caching mechanism ---
This mechanism doesn’t work well for remote caching because it is
latency-sensitive: every lookup is a separate request to the server. If I am
building 5000 files, retrieving them from the cache would require up to 5000
roundtrips to the cache server. With 100 ms of latency to the cache server,
that is an overhead of 5000 × 100 ms = 500 seconds.
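The arithmetic above is written out below as a small helper. It assumes that each request pays the full latency and that requests are issued one after another. It ignores transfer time and server-side work. `batch_size` is a hypothetical knob standing in for a bulk lookup API, which SCons does not have today.

```python
def lookup_overhead_seconds(num_targets, latency_ms, batch_size=1):
    """Latency-only overhead of cache lookups issued sequentially.

    batch_size=1 models one request per target. A larger batch_size models
    a hypothetical bulk lookup that answers many targets per roundtrip.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    roundtrips = -(-num_targets // batch_size)  # ceiling division
    return roundtrips * latency_ms / 1000.0
```

With the numbers above, `lookup_overhead_seconds(5000, 100)` gives 500.0 seconds. A single bulk request (`batch_size=5000`) pays the latency once, which is 0.1 seconds.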
--- What I’d like to do ---
I’d like to implement a --cache-frontload parameter that does two runs through
the node graph:
1. An initial dry run where we generate the content signatures of all target
files.
* This run culminates with a call to the CacheDir object, e.g.
retrieve_all(allNodes) where “allNodes” contains (in the example from the
previous section) 5000 entries, each of which has the result of
get_cachedir_csig and the full file path.
2. A second run where we run any actions that were not fulfilled from cache.
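A minimal sketch of the two-run flow follows. It assumes a hypothetical `retrieve_all()` that accepts `(csig, path)` pairs in one request and returns the set of paths it restored. `BulkCache`, `Action` and `frontload_build` are illustrative names, not existing SCons APIs. The sketch also sidesteps the real difficulty of reusing an SCons node graph between the two runs.

```python
from dataclasses import dataclass


@dataclass
class Action:
    """One build action and the (csig, path) pairs of the targets it produces."""
    targets: list


@dataclass
class BulkCache:
    """Hypothetical server that answers many lookups in a single roundtrip."""
    contents: dict  # csig -> cached artifact (payload elided)
    roundtrips: int = 0

    def retrieve_all(self, entries):
        self.roundtrips += 1
        return {path for csig, path in entries if csig in self.contents}


def frontload_build(cache, actions, run_action):
    # Run 1: gather every target's signature and path, then ask once.
    entries = [entry for action in actions for entry in action.targets]
    retrieved = cache.retrieve_all(entries)
    # Run 2: execute only the actions not fully satisfied from the cache.
    for action in actions:
        if not all(path in retrieved for _, path in action.targets):
            run_action(action)
    return retrieved
```

However many targets there are, run 1 costs one roundtrip. Run 2 then touches only the actions with at least one missing target.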
I tried implementing this but ran into problems resetting the node graph
between steps #1 and #2. Anything not retrieved from the cache needs to be
reset to “pending” or “no state”, but ideally the cached children should be
retained so we don’t need to scan the files again. The problems I am running
into with resetting the node graph include:
1. Easily and quickly accessing all nodes that were iterated over during the
dry run.
2. Resetting the state of all nodes not retrieved from cache.
3. Reversing seemingly destructive “end of lifecycle” actions performed on
objects.
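For the first problem, one option is to record the nodes while walking the graph rather than trying to recover them afterwards. The sketch below is a generic iterative traversal. It assumes only that each node’s dependencies are available through a `children` callable. SCons Node objects do have a `children()` method, but here a plain dict stands in for the graph. The traversal returns each reachable node once, in depth-first order.

```python
def collect_nodes(roots, children):
    """Return every node reachable from `roots` exactly once, depth-first.

    `children` maps a node to the list of its dependencies. An explicit
    stack avoids Python's recursion limit on deep dependency chains.
    """
    seen = set()
    order = []
    stack = list(reversed(roots))
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # shared dependencies (e.g. a common header) appear once
        seen.add(node)
        order.append(node)
        # Push children reversed so they are visited in their listed order.
        stack.extend(reversed(children(node)))
    return order


graph = {
    "app": ["a.obj", "b.obj"],
    "a.obj": ["a.c", "common.h"],
    "b.obj": ["b.c", "common.h"],
    "a.c": [], "b.c": [], "common.h": [],
}
print(collect_nodes(["app"], graph.__getitem__))
# → ['app', 'a.obj', 'a.c', 'common.h', 'b.obj', 'b.c']
```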
The best I could come up with was to remember all Node objects (including build
targets and containing directories) and then do the following on each object:
1. node.set_state(SCons.Node.no_state)
2. node.clear()
3. node.clear_memoized_values()
4. node.executor_cleanup()
But it seems like a hack and I haven’t been able to get it to work well.
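The reset steps above might look like the following when restricted to nodes that were not retrieved from the cache. The four method names are the Node methods listed above. `reset_for_second_pass` and the `RecordingNode` stub are hypothetical helpers for illustration. The caller is expected to pass the real state constant, which in SCons is `SCons.Node.no_state`.

```python
class RecordingNode:
    """Hypothetical stand-in for an SCons Node that only records reset calls."""

    def __init__(self, name):
        self.name = name
        self.calls = []

    def set_state(self, state):
        self.calls.append(("set_state", state))

    def clear(self):
        self.calls.append("clear")

    def clear_memoized_values(self):
        self.calls.append("clear_memoized_values")

    def executor_cleanup(self):
        self.calls.append("executor_cleanup")


def reset_for_second_pass(nodes, retrieved, no_state):
    """Apply the four reset calls to every node not served from the cache.

    Nodes in `retrieved` are left untouched so the second run can keep
    their state. Returns the nodes that were reset.
    """
    reset = []
    for node in nodes:
        if node in retrieved:
            continue
        node.set_state(no_state)
        node.clear()
        node.clear_memoized_values()
        node.executor_cleanup()
        reset.append(node)
    return reset
```

This only packages the attempt described above. It does not by itself make the second run behave correctly.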
Has anyone tried doing something like this before? Any recommendations where to
start?
From: Mats Wichmann <[email protected]>
Sent: Friday, May 24, 2019 5:51 PM
To: SCons developer list <[email protected]>; Andrew C. Morrow
<[email protected]>
Cc: Adrian Oney <[email protected]>; Adam Gross <[email protected]>
Subject: Re: [Scons-dev] Looking for help mapping Windows pdb semantics to SCons
For all you guys, is the current caching - all filesystem based - useful
enough? I've been chewing on a network based extension, for all those
disposable builders that don't really have great ways to cache
On May 24, 2019 3:45:01 PM MDT, "Andrew C. Morrow"
<[email protected]> wrote:
Hi Adam -
I'm working in this same area (caching and debug info handling) for the SCons
based MongoDB build system, right now.
Overall, I am trying to move to a model on Windows that is more like using
-gsplit-dwarf with the GNU tools, where every object file gets a (cacheable)
.pdb, and then we link with /DEBUG:fastlink and defer the final
per-library/per-executable PDB to a post-link step using mspdbcmf. This is
similar to using dwp to package up the .dwo files.
You can see some of my very much work-in-progress state here:
https://github.com/acmorrow/mongo/blob/SERVER-33661/site_scons/site_tools/separate_debug.py
Unfortunately, I've encountered one showstopper issue for us:
https://developercommunity.visualstudio.com/content/problem/573023/absolute-paths-for-associated-pdb-files-are-record.html,
and I'm waiting to hear back on it.
The next steps in my current approach would be to move the actions that produce
the finalized .dwp, .dSYM, or .pdb file into separate builders, rather than
adding them as actions to the Program and SharedLibrary builders. That would
allow the build tasks that finalize the debug information to be executed
separately, or not at all for developer builds where keeping the debug info in
separate per-object files is sufficient.
If you are interested, I'd be happy to collaborate (off-list initially?) to
discuss some of the issues we have encountered and find a way to avoid
duplication of effort. Improving the debug info handling situation is something
I'm keenly interested in, as it is a major bottleneck in our build performance.
Thanks,
Andrew
On Fri, May 24, 2019 at 3:45 PM Tomasz Gajewski
<[email protected]> wrote:
Adam Gross via Scons-dev <[email protected]>
writes:
> I am investigating better supporting caching with SCons at VMware and
> am trying to see if I can teach SCons about pdb files.
Is there any problem, for your use cases, with using the /Z7 option for
compilation? It tells the compiler to embed debug data in the .obj files,
as on Linux. The PDBs are then created during linking. It works at least
for shared libraries and executables.
Regards
Tomasz Gajewski
_______________________________________________
Scons-dev mailing list
[email protected]
https://pairlist2.pair.net/mailman/listinfo/scons-dev