On Tue, Jun 14, 2016 at 6:09 PM Adrian Prantl <apra...@apple.com> wrote:
> On Jun 13, 2016, at 9:37 AM, Steven Wu <steve...@apple.com> wrote: > > Thanks for the feedback! Replies inline. > > On Jun 12, 2016, at 11:44 PM, Eric Christopher <echri...@gmail.com> wrote: > > Hi Steven, > > Great to see the commentary and updates here. I've got a few questions > about some of this work. It might be nice to see some separate RFCs for a > couple of things, but we'll figure that out after you send out patches > probably :) > > What needs to be improved: >> 1. Whitelist for command line options that can be used with bitcode: >> Current trunk implementation embeds all the cc1 command line options >> (that includes header include paths, warning flags and other front-end >> options) in the command line section. That is lot of redundant information. >> To re-create the object file from the embedded optimized bitcode, most of >> these options are useless. On the other hand, they can leak information of >> the source code. One solution will be keeping a list of all the options >> that can affect code generation but not encoded in the bitcode. I have >> internally prototyped with disallowing these options explicitly and allowed >> only the reminder of the options to be embedded ( >> http://reviews.llvm.org/D17394). A better solution might be encoding >> that information in "Options.td" as specific group. >> > > This is really interesting. I'm not a particularly security minded person > so I don't have a lot of commentary there. An explicit whitelist sounds a > bit painful to keep maintained, explicitly having a group in Options.td > sounds pretty nice. You'll need to add them to multiple groups, but it > seems pretty nice. > > > I have already implemented the new approach in > http://reviews.llvm.org/D21230. It creates a new group for all the cc1 > options that can affect codegen but not having a corresponding attribute in > the bitcode. When I wrote up this patch, I think it is also a good idea to > extend the group to driver flags so clang driver can issue warnings when > using these flags with LTO because they are likely to be dropped in the > process. That is my next thing to do if someone reviews my patch and agrees > that is right thing to do. > > > >> 2. Assembly input handling: >> This is a workaround to allow source code written in assembly to work >> with "-fembed-bitcode" options. When compiling assembly source code with >> "-fembed-bitcode", clang-as creates an empty section "__LLVM, __asm" in the >> object file. That is just a way to distinguish object files compiled from >> assembly source from those compiled from higher level source code but >> forgot to use "-fembed-bitcode" options. Linker can use this section to >> diagnose if "-fembed-bitcode" is consistently used on all the object files >> participated in the linking. >> > > I'm surprised you want a separate and empty section and not a header flag > as those are easier to keep around and won't take up a precious mach-o > section. There are probably other options here as well. There are probably > other options or concerns that someone shipping bitcode might have here as > well, but I'm sure those are being talked about - doesn't have too much > affect on the community though. > > > I suppose you mean the alternative is to burn a macho command for that. > Well, that is a limited resource and we don't have much left. Plus, using > empty section will make this accessible to other binary format, not only > macho files. I also have an interesting thought about handle the assembly, > that is to wrap it in module assembly in a bitcode file. I am not sure it > would preserve the all semantics of the original assembly and that would > mean I need to somehow teach the assembler about bitcode (which might make > this not very attractive). Yes, you might be right this doesn't affect the > community, if no one else is interesting in a solution for the problem we > have, then this might not be suitable for contributing. I am happy to keep > it downstream. > > > 3. Bitcode symbol hiding: >> There was some concerns for leaking source code information when using >> bitcode feature. One approach to avoid the leak is to add a pass which >> renames all the globals and metadata strings. The also keeps a reverse map >> in case the original name needs to be recovered. The final bitcode should >> contain no more symbols or debug info than a stripped binary. To make sure >> modified bitcode can still be linked correctly, the renaming need to be >> consistent across all bitcode participated in the linking and everything >> that is external of the linkage unit need to be preserved. This means the >> pass can only be run during the linking and requires some LTO api. >> > > How are you planning to ensure the safety of the reverse map? Seems that > requiring linking is a bit icky, but might work. Are you mostly worried > about function names that could be stripped out? What LTO api are you > envisioning here? > > > The reverse map is emitted as a separate file from the output > binary/bitcode. It should not be shipped together with the binary output, > just like dSYM bundle. > The reason it needs to be done after linking is a limitation of the symbol > hiding technique. It requires that the symbols must be resolved. Think > about the following case: > a.o: > T export_symbol > T global_symbol > t local_symbol > b.o: > U global_symbol > > To make sure the bitcode after symbol hiding pass can still link and > produce the same output, the pass need to rename them: > a.o: > T export_symbol --> export_symbol (preserve) > T global_symbol --> hidden_symbol_1 (rename, but need to have the same > name as the one in b.o) > t local_symbol --> hidden_symbol_2 (rename, but don't care what it > becomes) > b.o: > U global_symbol --> hidden_symbol_1 > The pass need to know what symbols to keep and a global renaming table so > the names after renaming are consistent across all the modules. > > > > >> 4. Debug info strip to line-tables pass: >> As the name suggested, this pass strip down the full debug info to >> line-tables only. This is also one of the steps we took to prevent the leak >> of source code information in bitcode. >> > > I'm very curious about what's going on here. Could you elaborate? :) > > > Cc Adrian > He would know more about it. I would only know that it can reconstruct > -gline-tables-only debug info from full debug info. We use it as a part of > the bitcode pipeline because we don't want the bitcode file to be > exceedingly large but I can see this pass to be useful in other > circumstances. > > > It’s a pass that aims at downgrading full debug info to -gline-tables-only > debug info. It performs a deep copy of the debug metadata graph, removing > all types and variables while leaving all locations, subprograms, and > inline information intact. It then makes the IR point to the newly created > stripped subprograms, CU, and locations. > > Sounds like what I'd expect. How are you planning on hooking all of these up? -eric > -- adrian > > > Steven > > > Thanks a ton for the update - glad to see this being worked on! > > -eric > > >> >> Please let me know what do you think about the pieces above or if you >> have any concerns about the methodology. I will put up patches for review >> soon. >> >> Thanks >> >> Steven >> _______________________________________________ >> LLVM Developers mailing list >> llvm-...@lists.llvm.org >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > >
_______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits