george.karpenkov added inline comments.
================ Comment at: tools/scan-build-py/libscanbuild/analyze.py:44 +CTU_FUNCTION_MAP_FILENAME = 'externalFnMap.txt' +CTU_TEMP_FNMAP_FOLDER = 'tmpExternalFnMaps' ---------------- gerazo wrote: > george.karpenkov wrote: > > What would happen when multiple analyzer runs are launched concurrently in > > the same directory? Would they race on this file? > Yes and no. The 1st, collect part of CTU creates this file by aggregating all > data from the build system, while the 2nd part which does the analysis itself > only reads it. Multiple analysis can use this file simultaneously without > problem. However, multiple collect phases launched on a system does not make > sense. In this case, the later one would write over the previous results with > the same data. > However, multiple collect phases launched on a system does not make sense Why not? What about a system with multiple users, where code is in the shared directory? Or even simpler: I've launched a background process, forgot about it, and then launched it again? > In this case, the later one would write over the previous results with the > same data. That is probably fine, I am more worried about racing, where process B would be reading a partially overriden file (not completely sure whether it is possible) ================ Comment at: tools/scan-build-py/libscanbuild/analyze.py:103 +def prefix_with(constant, pieces): + """ From a sequence create another sequence where every second element ---------------- gerazo wrote: > george.karpenkov wrote: > > Actually I would prefer a separate NFC PR just moving this function. This > > PR is already too complicated as it is. > Yes. The only reason was for the move to make it testable. However, we need > more tests as you wrote down below. Sure, but that separate PR can also include tests. ================ Comment at: tools/scan-build-py/libscanbuild/analyze.py:120 + func_map_cmd=args.func_map_cmd) + if hasattr(args, 'ctu_phases') and hasattr(args.ctu_phases, 'dir') + else CtuConfig(collect=False, analyze=False, dir='', func_map_cmd='')) ---------------- gerazo wrote: > george.karpenkov wrote: > > Extensive `hasattr` usage is often a codesmell, and it hinders > > understanding. > > Firstly, shouldn't `args.ctu_phases.dir` be always available, judging from > > the code in `arguments.py`? > > Secondly, in what cases `args.ctu_phases` is not available when we are > > already going down the CTU code path? Shouldn't we throw an error at that > > point instead of creating a default configuration? > > (default-configurations-instead-of-crashing is also another way to > > introduce very subtle and hard-to-catch error modes) > It definitely needs more comments, so thanks, I will put them here. > There are two separate possibilities here for this code part: > 1. The user does not want CTU at all. In this case, no CTU phases are > switched on (collect and analyze), so no CTU code will run. This is why dir > has no importance in this case. > 2. CTU capabilities were not even detected, so the help and available > arguments about CTU are also missing. In this case we create a dummy config > telling that everything is switched off. > > The reason for using hasattr was that not having CTU capabilities is not an > exceptional condition, rather a matter of configuration. I felt that handling > this with an exception would be misleading. Right, but instead of doing (1) and (2) can we have a separate (maybe hidden from user) param called e.g. `ctu_enabled` which would explicitly communicate that? ================ Comment at: tools/scan-build-py/libscanbuild/analyze.py:144 + if len(ast_files) == 1: + mangled_ast_pairs.append((mangled_name, ast_files.pop())) + ---------------- gerazo wrote: > george.karpenkov wrote: > > Firstly, no need to modify the set in order to get the first element, just > > use `next(iter(ast_files))`. > > Secondly, when exactly do conflicting names happen? Is it a bug? Shouldn't > > the tool log such cases? > Nice catch, thanks. > For your second question: Unfortunately, it is not a bug, rather a misfeature > which we can't handle yet. There can be tricky build-systems where a single > file with different configurations is built multiple times, so the same > function signature will be present in multiple link units. The other option > is when two separate compile units have an exact same function signature, but > they are never linked together, so it is not a problem in the build itself. > In this case, the cross compilation unit functionality cannot tell exactly > which compilation unit to turn to for the function, because there are > multiple valid choices. In this case, we deliberately leave such functions > out to avoid potential problems. It deserves a comment I think. > In this case, we deliberately leave such functions out to avoid potential > problems We probably want to log it, don't we? ================ Comment at: tools/scan-build-py/libscanbuild/analyze.py:145 + mangled_ast_pairs.append((mangled_name, ast_files.pop())) + + return mangled_ast_pairs ---------------- gerazo wrote: > george.karpenkov wrote: > > Overall, instead of creating a dictionary with multiple elements, and then > > converting to a list, it's much simper to only add an element to > > `mangled_to_asts` when it is not already mapped to something (probably > > logging a message otherwise), and then just return `mangled_to_asts.items()` > The reason for the previous is that we need to count the occurence number > ofdifferent mappings only let those pass through which don't have multiple > variations. ah, OK! ================ Comment at: tools/scan-build-py/libscanbuild/analyze.py:189 + # Remove all temporary files + shutil.rmtree(fnmap_dir, ignore_errors=True) + ---------------- gerazo wrote: > gerazo wrote: > > george.karpenkov wrote: > > > Having an analysis tool remove files is scary, what if (maybe not in this > > > revision, but in a future iteration) a bug is introduced, and the tool > > > removes user code instead? > > > Why not just create a temporary directory with `tempfile.mkdtemp`, put > > > all temporary files there, and then simply iterate through them? > > > Then you would be able to get rid of the constant `CPU_TEMP_FNMAP_FOLDER` > > > entirely, and OS would be responsible for cleanup. > > Yes, you are right. We are essentially using a temp dir. Because of the > > size we first had to put it next to the project (not on tmp drive for > > instance) and for debugging purposes we gave a name to it. Still it can be > > done with mkdtemp as well. > Finally, I came to the conclusion that mkdtemp would not be better than the > current solution. In order to find our created dir by other threads, we need > a designated name. Suffixing it by generated name would further complicate > things as we need not to allow multiple concurrent runs here. The current > solution is more robust from this point of view. OK ================ Comment at: tools/scan-build-py/libscanbuild/analyze.py:230 + if ctu_config.collect: + shutil.rmtree(ctu_config.dir, ignore_errors=True) + if ctu_config.collect and ctu_config.analyze: ---------------- gerazo wrote: > george.karpenkov wrote: > > Similarly to the comment above, I would prefer if analysis tool would not > > remove files (and I assume those are not huge ones?) > > Can we just use temporary directories? > Unlike above, here we do remove non-temporary data intentionally. The user > asks here to do the recollection of CTU data for a fresh start. Because there > is no "clean" functionality in the analyzer interface itself, this seemed to > be the easiest-on-user solution to save him/her an extra effort or a new > command. OK! ================ Comment at: tools/scan-build-py/libscanbuild/analyze.py:296 + 'command': [execution.cmd[0], '-c'] + compilation.flags, + 'ctu': json.loads(os.getenv('ANALYZE_BUILD_CTU')) } ---------------- gerazo wrote: > george.karpenkov wrote: > > Again, is it possible to avoid JSON-over-environment-variables? > There is an other thing against changing this. Currently the interface here > using env variables is used by intercept-build, analyze-build and scan-build > tool as well. In order to drop json, we need to change those tools too. It > would be a separate patch definitely. OK I didn't know that the JSON interface was used by other tools. In that case, ignore my comment. ================ Comment at: tools/scan-build-py/libscanbuild/analyze.py:567 + ast_command.append('-o') + ast_command.append(ast_path) + logging.debug("Generating AST using '%s'", ast_command) ---------------- gerazo wrote: > george.karpenkov wrote: > > The above can be written more succinctly as: > > `ast_command = [opts['clang'], ...] + args + ['-w', ...]` > After several iterations of the code, I find it easier to version control > such multiline constructs. If someone changes a data source, it is clear > which one (which line) was modified. The succint notation does not allow > clean VCS annotations. OK. Though you could still use split addition across multiple lines with `\` ================ Comment at: tools/scan-build-py/libscanbuild/analyze.py:609 + # Recover namedtuple from json when coming from analyze_cc + if not hasattr(ctu_config, 'collect'): + ctu_config = CtuConfig(collect=ctu_config[0], ---------------- gerazo wrote: > george.karpenkov wrote: > > In which case is this branch hit? Isn't improperly formed input argument > > indicative of an internal error at this stage? > An other part of scan-build-py, analyze_cc uses namedtuple to json format to > communicate. However, the names are not coming back from json, so this code > helps in this. This is the case when someone uses the whole toolset with > compiler wrapping. All the environment variable hassle is also happening > because of this. So these env vars are not for user modification (as you've > suggested earlier). OK so `opts['ctu']` is a tuple or a named tuple depending on how this function is entered? BTW could you point me to the `analyze_cc` entry point? For the purpose of having more uniform code with less cases to care about, do you think we could just use ordinary tuples instead of constructing a named one, since we have to deconstruct an ordinary tuple in any case? ================ Comment at: tools/scan-build-py/libscanbuild/arguments.py:376 + metavar='<ctu-dir>', + default='ctu-dir', + help="""Defines the temporary directory used between ctu ---------------- BTW can we also explicitly add `dest='ctu_dir'` here, as otherwise I was initially very confused as to where the variable is set. ================ Comment at: tools/scan-build-py/libscanbuild/clang.py:168 + return False + return True + ---------------- gerazo wrote: > george.karpenkov wrote: > > I might be missing something here, but why is the ability to call > > `--version` indicative of CTU support? > > At worst, this can lead to obscuring real bugs: imagine if the user has > > `args.clang` pointed to broken/non-existent binary, then `is_ctu_capable` > > would simply return `False` (hiding the original error!), which would show > > a completely misleading error message. > > > > Just checking `func_map_cmd` seems better, but even in this case we should > > probably log any errors occurring on `-version` call (such messages would > > really aid debugging) > The original idea was that clang can give information about CTU support > itself. However, it never happened because the analyzer is so deep down in > the system. So I am open to remove the clang binary check here. However, > clang binary is needed anyway, so the whole toolset will still throw an error > later on not having a clang binary. > so the whole toolset will still throw an error later on not having a clang > binary. Of course, but I think that would be easier to debug, and the error would mean that Clang is not available, not that CTU is not working. https://reviews.llvm.org/D30691 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits