[lldb-dev] "devirtualizing" files in the VFS

2018-11-15 Thread Sam McCall via lldb-dev
I'd like to get some more perspectives on the role of the VirtualFileSystem
abstraction in llvm/Support.
(The VFS layer has recently moved from Clang to LLVM, so crossposting to
both lists)

https://reviews.llvm.org/D54277 proposed adding a function to
VirtualFileSystem to get the underlying "real file" path from a VFS path.
LLDB is starting to use VFS for some filesystem interactions, but
wants/needs to keep using native IO (FILE*, file descriptors) for others.
There's some more context/discussion in the review.

My perspective is coloured by work on clang tooling, clangd etc. There we
rely on VFS to ensure code (typically clang library code) works in a
variety of environments, e.g:
in an IDE the edited file is consistently used rather than the one on disk
clang-tidy checks work on a local codebase, but our code review tool also
runs them as a service
This works because all IO goes through the VFS, so VFSes are substitutable.
We tend to rely on the static type system to ensure this (most people write
lit tests that use the real FS).

Adding facilities to use native IO together with VFS works against this,
e.g. a likely interface is
  // Returns the OS-native path to the specified virtual file.
  // Returns None if Path doesn't describe a native file, or its path is
unknown.
  Optional FileSystem::getNativePath(string Path)
Most potential uses of such a function are going to produce code that
doesn't work well with arbitrary VFSes.
Anecdotally, filesystems are confusing, and most features exposed by VFS
end up getting misused if possible.

So those are my reasons for pushing back on this change, but I'm not sure
how strong they are.
I think broadly the alternatives for LLDB are:
make a change like this to the VFS APIs
migrate to actually doing IO using VFS (likely a lot of work)
know which concrete VFSes they construct, and track the needed info
externally
stop using VFS, and build separate abstractions for tracking remapping of
native files etc
abandon the new features that depend on this file remapping

As a purist, 2 and 4 seem like the cleanest options, but that's easy to say
when it's someone else's work.
What path should we take here?

Cheers, Sam
___
lldb-dev mailing list
lldb-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev


Re: [lldb-dev] "devirtualizing" files in the VFS

2018-11-15 Thread Jonas Devlieghere via lldb-dev
HI Sam,

Thanks again for taking the time to discuss this. 

> On Nov 15, 2018, at 3:02 AM, Sam McCall  wrote:
> 
> I'd like to get some more perspectives on the role of the VirtualFileSystem 
> abstraction in llvm/Support.
> (The VFS layer has recently moved from Clang to LLVM, so crossposting to both 
> lists)
> 
> https://reviews.llvm.org/D54277  proposed 
> adding a function to VirtualFileSystem to get the underlying "real file" path 
> from a VFS path. LLDB is starting to use VFS for some filesystem 
> interactions, but wants/needs to keep using native IO (FILE*, file 
> descriptors) for others. There's some more context/discussion in the review.
> 
> My perspective is coloured by work on clang tooling, clangd etc. There we 
> rely on VFS to ensure code (typically clang library code) works in a variety 
> of environments, e.g:
> in an IDE the edited file is consistently used rather than the one on disk
> clang-tidy checks work on a local codebase, but our code review tool also 
> runs them as a service
> This works because all IO goes through the VFS, so VFSes are substitutable. 
> We tend to rely on the static type system to ensure this (most people write 
> lit tests that use the real FS).

I want to emphasize that I don't have any intention of breaking any of those or 
other existing use cases. I opted for the virtual file system because it 
provides 95% of the functionality that's needed for reproducers: the real 
filesystem and the redirecting file system. It has the yaml mapping writer and 
reader, the abstraction level above the two, etc. It feels silly to implement 
everything again in LLDB (actually it would be more like copy/pasting 
everything) just because we miss that 5%, so I'm really motivated to find a 
solution that works for all of us :-) 

> Adding facilities to use native IO together with VFS works against this, e.g. 
> a likely interface is
>   // Returns the OS-native path to the specified virtual file.
>   // Returns None if Path doesn't describe a native file, or its path is 
> unknown.
>   Optional FileSystem::getNativePath(string Path)
> Most potential uses of such a function are going to produce code that doesn't 
> work well with arbitrary VFSes.
> Anecdotally, filesystems are confusing, and most features exposed by VFS end 
> up getting misused if possible.

You're right and this is a problem/limitation for LLDB as well. This was the 
motivation for the `ExternalFileSystem` (please forgive me for the terrible 
name, just wanted to get the code up in phab) because it had "some" semantic 
meaning for both implementations. But I also understand your concerns there. 

> So those are my reasons for pushing back on this change, but I'm not sure how 
> strong they are.
> I think broadly the alternatives for LLDB are:
> make a change like this to the VFS APIs
> migrate to actually doing IO using VFS (likely a lot of work)
> know which concrete VFSes they construct, and track the needed info externally
> stop using VFS, and build separate abstractions for tracking remapping of 
> native files etc
> abandon the new features that depend on this file remapping

Can you elaborate on what you have in mind for (3) and how it differs from (4)?

> As a purist, 2 and 4 seem like the cleanest options, but that's easy to say 
> when it's someone else's work.
> What path should we take here?

I'll withhold from answering this as I'm one of the stakeholders ;-) 

> 
> Cheers, Sam

___
lldb-dev mailing list
lldb-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev


Re: [lldb-dev] [cfe-dev] "devirtualizing" files in the VFS

2018-11-15 Thread Jonas Devlieghere via lldb-dev


> On Nov 15, 2018, at 3:34 AM, Whisperity  wrote:
> 
> I am really not sure if adding real file system functionality strictly into 
> the VFS is a good approach. This "ExternalFileSystem" thing sounds weird to 
> me.

The `ExternalFileSystem` was an attempt to provide a more limited interface 
while exposing the "external" path in a way that made sense for the 
RedirectingFileSystem. Like Sam said in the review it's not great because it 
only does half of the work. 

> Does LLDB need to *write* the files through the VFS? I'm not sure perhaps a 
> "WritableVFS" could be implemented, or the necessary casting/conversion 
> options.

Most likely yes because of LLDB's design that abstracts over flies without 
prior knowledge about whether they'd only get read or written. However wouldn't 
it suffer form the exact same problems? 

> In case:
>  - there is a real path behind the file --- you could spawn an 
> llvm::RealFileSystem (the fqdn might not be this after the migration patch!) 
> and use that to obtain the file's buffer.

I'm not sure I follow what you have in mind here. Can you give a little more 
detail?

> How can you be sure the file actually exists on the FS? That's what the VFS 
> should be all about, hiding this abstraction... if you *are* sure it exists, 
> or want to make sure, you need to pull the appropriate realFS from the VFS 
> Overlay (most tools have an overlay of a memoryFS above the realFS).

That makes sense, for LLDB's use case we would be happy having just a real or 
redirecting filesystem (with fall through). 

> What I am not sure about is extending the general interface in a way that it 
> caters to a particular (or half of a particular) use case.

I totally understand this sentiment but I don't think that's totally fair. 
Finding files in different locations is an important feature of the VFS, when 
it was introduced in 2014 this was the only use case. The "devirtualization" 
aspect is unfortunate because native IO. 

> For example, in CodeCompass, we used a custom VFS implementation that 
> "hijacked" the overlay and included itself between the realFS and the 
> memoryFS. It obtains files from the database!
> 
> See:
> https://github.com/Ericsson/CodeCompass/blob/a1a7b10e3a9e2e4f493135ea68566cee54adc081/plugins/cpp_reparse/service/src/databasefilesystem.cpp#L191-L224
>  
> 
> 
> These files *do not necessarily* (in 99% of the cases, not at all) exist on 
> the hard drive at the moment of the code wanting to pull the file, hence why 
> we implemented this to give the file source buffer from DB. The ClangTool 
> that needs this still gets the memoryFS for its own purposes, and for the 
> clang libraries, the realFS is still under there.
> 
> Perhaps the "Status" type could be extended to carry extra information? 
> https://github.com/Ericsson/CodeCompass/blob/a1a7b10e3a9e2e4f493135ea68566cee54adc081/plugins/cpp_reparse/service/src/databasefilesystem.cpp#L85-L87
>  
> 

This sounds like an interesting idea. We already have the option to expose the 
external name here, would it be reasonable to also expose the external path 
here? (of course being an optional)

> 
> Sam McCall via cfe-dev  > ezt írta (időpont: 2018. nov. 15., Cs, 
> 12:02):
> I'd like to get some more perspectives on the role of the VirtualFileSystem 
> abstraction in llvm/Support.
> (The VFS layer has recently moved from Clang to LLVM, so crossposting to both 
> lists)
> 
> https://reviews.llvm.org/D54277  proposed 
> adding a function to VirtualFileSystem to get the underlying "real file" path 
> from a VFS path. LLDB is starting to use VFS for some filesystem 
> interactions, but wants/needs to keep using native IO (FILE*, file 
> descriptors) for others. There's some more context/discussion in the review.
> 
> My perspective is coloured by work on clang tooling, clangd etc. There we 
> rely on VFS to ensure code (typically clang library code) works in a variety 
> of environments, e.g:
> in an IDE the edited file is consistently used rather than the one on disk
> clang-tidy checks work on a local codebase, but our code review tool also 
> runs them as a service
> This works because all IO goes through the VFS, so VFSes are substitutable. 
> We tend to rely on the static type system to ensure this (most people write 
> lit tests that use the real FS).
> 
> Adding facilities to use native IO together with VFS works against this, e.g. 
> a likely interface is
>   // Returns the OS-native path to the specified virtual file.
>   // Returns None if Path doesn't describe a native file, or its path is 
> unknown.
>   Optional FileSystem::getNativePath(string P

Re: [lldb-dev] [CMake] LLDB framework / shared library

2018-11-15 Thread Stefan Gränitz via lldb-dev
Hi Alex, hi Pavel

Thanks for your feedback and sorry for the late reply, I was a little busy on 
the other sites.

Ok, so making the LLDB_FRAMEWORK_INSTALL_DIR vs. LLDB_FRAMEWORK_BUILD_DIR 
distinction and changing the approach for tools from symlink to copy sound like 
good tasks to start with.

I will see how far I get without changing the processing order. I am a little 
afraid about Pandora's box here. Anyway, thanks for the explanation and 
proposal!

Had a look at the history and it turned out the reason for the 3.7 requirement 
was a bug in CMake (https://gitlab.kitware.com/cmake/cmake/issues/16363). There 
was a workaround, but it was dropped (https://reviews.llvm.org/rLLDB289841). 
Policy-wise CMake always acts as if it was cmake_minimum_required, so I am 
always sceptic about version checks, but for a bug like this it seems fine. 
Let’s keep it as long as cmake_minimum_required is below.

Best
Stefan

> On 10. Nov 2018, at 11:22, Pavel Labath  wrote:
> 
> Hello Stefan,
> 
> first off, I want to point out that although I spent a lot of time
> digging through our cmake files, I am not using the framework build, nor
> am familiar with it's details.
> 
> I also don't know why the framework was split into multiple targets.
> What I do know is that we then ended up adding extra indirections so
> that we can make setting up dependencies easier (lldb can just depend on
> ${LLDB_SUITE_TARGET} without knowing whether framework was enabled or
> not). If you can merge everything into a single target, then this
> indirection could also go away.
> 
> Your ideas sound reasonable to me. As Alex points out, centralizing
> everything to API/CMakeLists.txt may be tricky, although I agree that is
> the natural place to do this sort of thing. The trickyness comes from
> the fact that our cmake files are not processed in dependency order.
> Although this would be somewhat unconventional, I wouldn't be opposed to
> changing the cmake processing order so that it more closely matches the
> dependencies. If that makes things easier for you, that is.
> 
> One way to achieve this would be to take out source/API and
> tools/debugserver invocations out of source and tools CMakeLists.txt
> files, respectively. and put them into the top level file. So then the
> top level file would do something like:
> ```
> add_subdirectory(source) # Build all consituent libraries
> 
> # Tools going into the lldb framework
> add_subdirectory(tools/debugserver)
> add_subdirectory(tools/argdumper)
> ...
> 
> add_subdirectory(source/API) # Build LLDB.framework (or liblldb) itself
> add_subdirectory(tools) # Build tools that depend on the framework
> ```
> 
> This could be cleaned up if we moved the source code around, but I'm not
> sure we want to take that hit (I don't think it's worth it). Also, it's
> just an idea, I don't know whether that will make your job easier or not.
> 
> Good luck,
> pavel
> 
> On 09/11/18 21:31, Alex Langford wrote:
>> Hi Stefan,
>> 
>> Thanks for taking the time to improve LLDB's CMake infrastructure!
>> 
>> (1) I don't entirely remember the reason I had separated them out into 
>> separate targets. A post-build step would be fine, so feel free to merge 
>> these.
>> (2) I have no problems with this. If I remember correctly, I wanted to put 
>> everything into LLDBFramework.cmake originally and include it if 
>> LLDB_BUILD_FRAMEWORK got set, but that proved more difficult than I 
>> realized. I think what you are suggesting is better than what I ended up 
>> doing. Feel free to make that change.
>> (3) I think this is a good idea. I found this confusing as well.
>> (4) If that is the case, I think I agree that the Xcode behavior is better. 
>> (5) I'm not sure that this is going to actually simplify things. I don't 
>> think it's possible to do in source/API/CMakeLists.txt because the tools 
>> haven't been added as targets yet. With the current CMake logic, you could 
>> pull this logic into its own section that happens after the tools are added 
>> as targets. I don't find this any better than what we have now.
>> (6) I'm not sure why this is the case actually. I believe this was added by 
>> beanz originally, I just moved this check to be closer to the beginning of 
>> the build. If it works with CMake versions less than 3.7 then I have no 
>> issues with removing it.
>> 
>> Let me know if something is unclear or you have further questions/concerns.
>> 
>> Alex
>> 
>> 
>> On 11/9/18, 9:38 AM, "sgraen...@apple.com on behalf of Stefan Gränitz" 
>>  wrote:
>> 
>>Hello Alex, hello Pavel
>> 
>>I spent some time creating/streamlining our CMake infrastructure 
>> downstream of LLDB and learned a few things about bundles, versions, 
>> code-signing etc. (mostly on the Darwin side of things). I am currently 
>> sorting out what can be upstreamed and prepare reviews.
>> 
>>Some work is still todo for the LLDB shared library/framework (for 
>> simplicity I will call it LLDB.framework). It would be great to kno