On Sun, 18 Mar 2012 12:21:51 +0000, Iain Buclaw <ibuc...@ubuntu.com> wrote:
> On 18 March 2012 11:32, Johannes Pfau <nos...@example.com> wrote:
> > I thought about supporting emulated tls a little. The GCC emutls.c
> > implementation currently can't work with the gc, as every TLS
> > variable is allocated individually and therefore we don't have a
> > contiguous memory region for the gc. I think these are the possible
> > solutions:
> >
> > * Try to fix GCCs emutls to allocate all tls memory for a module
> >   (application/shared object) at once. That's the best solution
> >   and native TLS works this way, but I'm not sure if we can extract
> >   enough information from the runtime linker to make this work (we
> >   need at least the combined size of all tls variables).
> >
> > * Provide a callback in GCC's emutls which is called after every
> >   allocation. This could call GC.addRange for every variable, but I
> >   guess adding huge amounts of ranges is slow.
> >
>
> Painfully slow.
>
> >
> > * Make it possible to register a custom allocator for GCC's emutls
> >   (not sure if possible, as this would have to be set up very early in
> >   application startup). Then allocate the memory directly from the GC
> >   (but this memory should only be scanned, not collected)
> >
> > * Replace the calls to malloc in emutls.c with a custom, region
> >   based memory allocator. (This is not a perfect solution though, it
> >   can always happen that we'll need more memory)
> >
> > * Do not use GCC's emutls at all, roll a custom solution. This
> >   could be compatible with / based on dmd's tls emulation for OSX.
> >   Most of the implementation is in core.thread, all that's necessary
> >   is to group the tls data into a _tls_data_array and call
> >   ___tls_get_addr for every tls access. I'm not sure if this can be
> >   done in the 'middle-end' though and it doesn't support shared
> >   libraries yet.
> >
>
> If we are going to fix TLS, I'd rather it be in the most platform
> agnostic way possible, if it could be helped.
> That would mean also
> scrapping the current implementation on Linux (just tries to mimic
> what dmd does, and has corner cases where it doesn't always get it
> right).

You mean getting rid of __tls_beg and __tls_end? I'd also like to remove
those, but:

TLS is mostly object-format specific (not as much OS specific). The ELF
implementation lays out the TLS data for a module (module = shared
library or the application) in a contiguous way. The details are
described in "ELF Handling For Thread-Local Storage"
(www.akkadia.org/drepper/tls.pdf).

The GC requires the TLS blocks to be contiguous. This is not the case
for GCC's emulated TLS, which is what causes the issues there. For
native TLS/ELF this requirement is met, but the GC also has to know the
start and the size of the TLS sections. Although the runtime linker has
this information, there's no standard way to access it. So we could:

* Add a custom extension API to the C libraries. We'd need at least:
  a 'tls_range dl_get_tls_range(void *handle)' function related to the
  dl* set of functions in the runtime linker, and a
  'tls_range dl_get_tls_range2(struct dl_phdr_info *info)' to be used
  with dl_iterate_phdr. We also need some way to get the tls range for
  the application, 'get_app_tls_range' (although some libcs also return
  the application module in dl_iterate_phdr).

This seems to be the best way, but we'd have to patch every C library
and it would take some time till those updated C libraries are widely
deployed.

The other solution is to hook directly into each C library's non-public
(and maybe non-stable!) API.
For example, the structure returned by BSD libc's dl_iterate_phdr and
dlopen has these fields:

    int tlsindex;        /* Index in DTV for this module */
    void *tlsinit;       /* Base address of TLS init block */
    size_t tlsinitsize;  /* Size of TLS init block for this module */
    size_t tlssize;      /* Size of TLS block for this module */
    size_t tlsoffset;    /* Offset of static TLS block for this module */
    size_t tlsalign;     /* Alignment of static TLS block */

tlsindex gives us the start address of the TLS for every thread, as
long as we know how to compute the TLS address from the TP (thread
pointer) and the dtv index (there are basically 2 methods, described in
"ELF Handling For Thread-Local Storage"). tlssize gives us the size.
However, there doesn't seem to be a painless way to do this...
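To make the "2 methods" a bit more concrete, here is a rough sketch of the address arithmetic for the static TLS block only, following the two layouts (Variant I and Variant II) from "ELF Handling For Thread-Local Storage". Everything named here is made up for illustration: tls_module_info only mirrors the tlssize/tlsoffset fields quoted above, and tls_range, tls_variant and static_tls_range are hypothetical, not part of any libc. It assumes tlsoffset is the distance between the thread pointer and the start of the module's block, and it does not cover modules loaded later with dlopen, which have to be looked up through the dtv.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the two fields used below; not a real libc type. */
struct tls_module_info {
    size_t tlssize;    /* Size of TLS block for this module */
    size_t tlsoffset;  /* Offset of static TLS block for this module */
};

/* The two layouts described in "ELF Handling For Thread-Local Storage". */
enum tls_variant {
    TLS_VARIANT_I,   /* TLS blocks sit above the thread pointer */
    TLS_VARIANT_II   /* TLS blocks sit below the thread pointer */
};

/* Hypothetical result type: the half-open range [start, end). */
struct tls_range {
    uintptr_t start;
    uintptr_t end;
};

/*
 * Compute the range of one module's static TLS block for the thread
 * whose thread pointer is tp. Assumption: tlsoffset is the distance
 * between tp and the start of the block, added for Variant I and
 * subtracted for Variant II.
 */
struct tls_range
static_tls_range(uintptr_t tp, const struct tls_module_info *m,
                 enum tls_variant variant)
{
    struct tls_range r;

    assert(m != NULL);
    r.start = (variant == TLS_VARIANT_I) ? tp + m->tlsoffset
                                         : tp - m->tlsoffset;
    r.end = r.start + m->tlssize;
    return r;
}
```

Each thread would have to run this with its own thread pointer and then hand start and (end - start) to GC.addRange. Getting at the thread pointer and at the per-module fields is exactly the non-portable part.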