vedgy added a comment.

Another disadvantage of using `CINDEX_VERSION_MINOR` instead of the `sizeof` is 
that an older libclang version won't know about compatibility of newer versions 
with itself. But this reasoning brought me to a different thought: when a 
program is compiled against a libclang version X, it should not be expected to 
be runtime-compatible with a libclang version older than X. For example, 
KDevelop uses `CINDEX_VERSION_MINOR` to decide whether or not certain libclang 
API is available.

I suspect modifying the exposed struct's size will violate ODR in C++ programs 
that use libclang. Quote from the cppreference ODR page 
<https://en.cppreference.com/w/cpp/language/definition#One_Definition_Rule>:

> Note: in C, there is no program-wide ODR for types, and even extern 
> declarations of the same variable in different translation units may have 
> different types as long as they are compatible. In C++, the source-code 
> tokens used in declarations of the same type must be the same as described 
> above: if one .cpp file defines `struct S { int x; };` and the other .cpp 
> file defines `struct S { int y; };`, the behavior of the program that links 
> them together is undefined. This is usually resolved with unnamed namespaces.

I think the answer to this StackOverflow question 
<https://stackoverflow.com/questions/5140893/struct-defined-differently-for-c-and-c-is-it-safe-pc-lint-warns>
 confirms my suspicion.

In order to avoid the undefined behavior, libclang can use the same approach as 
for other structs/classes: make the struct opaque and provide functions 
`clang_createIndexOptions()`,  `clang_disposeIndexOptions()`, 
`clang_setTempDirPath(CXIndexOptions options, const char *path)`.

`int excludeDeclarationsFromPCH` and `int displayDiagnostics` currently passed 
to `clang_createIndex()` can also be included in the struct and acquire their 
own setter functions. Then only a single argument will be passed to 
`clang_createIndexWithOptions()`: `CXIndexOptions`.

--------------------------

In D139774#4094619 <https://reviews.llvm.org/D139774#4094619>, @aaron.ballman 
wrote:

> In D139774#4092553 <https://reviews.llvm.org/D139774#4092553>, @vedgy wrote:
>
>> In D139774#4091631 <https://reviews.llvm.org/D139774#4091631>, 
>> @aaron.ballman wrote:
>>
>>> My preference is still for specific API names. If we want to do something 
>>> general to all temp files, I think `FileSystemOptions` is the way to go.
>>>
>>> My concern with not using a constructor is that, because this is the C API, 
>>> we will be supporting the functionality for a long time even if we do 
>>> switch to using a global override in `FileSystemOptions` where you would 
>>> need to set the information up before executing the compiler. To me, these 
>>> paths are something that should not be possible to change once the compiler 
>>> has started executing. (That also solves the multithreading problem where 
>>> multiple threads are fighting over what the temp directory should be and 
>>> stepping on each other's toes.)
>>
>> As I understand, this leaves two choices:
>>
>> 1. Specific preamble API names, two separate non-constructor setters; the 
>> option values are stored in a separate struct (or even as two separate data 
>> members), not inside `FileSystemOptions`.
>> 2. General temporary-storage arguments (possibly combined in a struct) to a 
>> new overloaded constructor function; the option values are stored inside the 
>> `FileSystemOptions` struct.
>>
>> The second alternative is likely more difficult to implement, more risky and 
>> less convenient to use (the store-in-memory bool option cannot be modified 
>> at any time). Perhaps it should be delayed until (and if) we learn of other 
>> temporary files libclang creates? A downside of implementing the first 
>> option now is that the specific API would have to be supported for a long 
>> time, even after the general temporary-file API is implemented.
>
> I still think the general solution is where we ultimately want to get to and 
> that makes me hesitant to go with specific preamble API names despite that 
> being the direction you prefer. If this wasn't the C APIs, I'd be less 
> concerned because we make no stability guarantees about our C++ interfaces. 
> But we try really hard not to break the C APIs, so adding the specific names 
> is basically committing to not only the APIs but their semantics. I think 
> that makes implementing the general solution slightly more awkward because 
> these are weird special cases that barely get tested in-tree, so they'd be 
> very easy to overlook and accidentally break.
>
> Is there a middle ground where, instead of #2 for general temporary storage, 
> we went with #2 but with compiler-specific directories instead of system 
> directories. e.g., don't let the caller set the temp directory, but do let 
> the caller set the preamble directory (which defaults to the temp directory) 
> so long as it's set before invoking the compiler? This still won't let you 
> change options mid-run, but it also seems like it should have less risk of 
> affecting other components while still solving the thread safety issues. I'm 
> not certain if it's any easier to implement, but I think it does save you 
> from modifying `FileSystemOptions`. As a separate item, we could then 
> consider adding a new C API to let you toggle the in-memory vs on-disk 
> functionality after exploring that it won't cause other problems because 
> nobody considered the possibility that it's not a stable value for the 
> compiler invocation.

After thinking some more, I'd like to revisit this decision. You agree that the 
general solution `setTempDirPath()` is an ultimate goal. But the rejection of 
tampering with the return value of `system_temp_directory()` means that 
overriding the temporary path will likely never be perfectly reliable. So how 
about a more sincere general solution: `setPreferredTempDirPath()`? The 
documentation would say that libclang tries its best to place temporary files 
into the specified directory, but some might end up in the system temporary 
directory instead.

This non-strict `Preferred` contract opens a path to design simplification. 
[correct me if I am wrong] `clang_createIndex()` does not need and is not 
likely to need a temporary directory path itself. So the documentation can 
recommend calling a non-constructor `setPreferredTempDirPath(CXIndex, const 
char *path)` function right after the call to `clang_createIndex()`. And 
possibly caution that calling `setPreferredTempDirPath()` later can result in 
fewer temporary files ending up in the overridden directory. If 
`clang_createIndex()`**does** need the temporary directory path in the future 
(unlikely I think), then the discussed overload of the `CXIndex` constructor 
function can be added.

The `FileSystemOptions` struct is an implementation detail not exposed in 
libclang. So the temporary directory path can be stored outside of it for now, 
if it simplifies the implementation. If and when the temporary directory path 
becomes more widely used and has to be passed to more classes, it could be 
moved inside the `FileSystemOptions` struct without modifying the libclang API.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139774/new/

https://reviews.llvm.org/D139774

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to