I decided to work on adding OpenMP support to the `graphics/dartktable`
starting at version `3.6.1`.

Here's the original revision with OpenMP disabled.

http://cvsweb.openbsd.org/cgi-bin/cvsweb/ports/graphics/darktable/Makefile?rev=1.113&content-type=text/x-cvsweb-markup

Initially I've rebuild gcc8 from ports enabling OpenMP for -main and
-libs packages.

This gcc with omp support seemed to work fine at first glance but I
quickly realized the makefiles generated for Darktable were
incompatible.

- OpenMP was detected, good part

- Issues with `external/rawspeed` cmake checks for l1d cache line size
  and page size querying, partially fixed (workaround)
  `_SC_LEVEL1_DCACHE_LINESIZE` being undefined on OpenBSD

- base `as(1)` and the ports-gcc where conflicting on instruction sets
  giving `Error: no such instruction: vmovlps [...]` and others for
  some C and C++ Darktables sources.

After seeing those `as(1)` conflicts, I decided to rebuild gcc-8 with

    CONFIGURE_ARGS +=   --with-as="${LOCALBASE}/bin/gas"

and errors were gone.

Others followed. Missing `graphite` and `ifunc`. This was too much time
spent on GCC, so I decided to switch back the integration to base-
clang.

With the help of two OpenBSD 6.8 unofficial ports for `stlibomp` and
`libomp` found there:

https://j-bm.github.io/on/onp.html

I upgraded them to llvm 13.0.0 locally and good results started to
show.

In `graphic/darktable`,

    -DUSE_OPENMP=ON

with

    COMPILER =  base-clang

made a working package.

I locally added two temporary patches covering the issues encountered
initially.  The first one is for the l1d cache line size detection
which
calls this for `defined(__OpenBSD__)` and others:

    if (sysctlbyname("hw.cachelinesize", &val, &size, NULL, 0)

This is not declared for OpenBSD. Maybe a good approach would be to
implement a `sysctl(2)` constant for `_SC_LEVEL1_DCACHE_LINESIZE` as
seen around, see:

https://github.com/darktable-org/rawspeed/blob/88e2b92a204f2d404d156b0ba11a66604685d5a0/cmake/Modules/cpu-cache-line-size.cpp

As I said on GH, sysctl(2) would be perfect except that this CPUID
information is not available on OpenBSD currently.  There's no
`_SC_LEVEL1_DCACHE_LINESIZE` as well as others 0x8000005/6 cache
informations defined.  For information, I did not see such definitions
for other BSD's. Maybe I've not looked so deep, but at first glance,
there's no hw.cachelinesize as well...

I then scratched this to get the required value:

#include <stdio.h>
// This returns 64 on my AMD Ryzen 7 PRO 3700U
// I'm not 100% sure this is a portable solution across Intel and AMD
// Also I'm no 100% sure how to get the right L1D value rn, still
studying it.
// Seems that on AMD I shall return ECX from bits 0-7, I think a proper
__get_cpuid
// (cpuid.h from gcc, also available for clang afaik) would be better.
int main()
{
                uint32_t linesize;
                __asm__("cpuid"
                                                : "=a" (linesize)
                                                : "0" (5));
                printf("%d\n",linesize);
                return 0;
}

Thanks to these references:

https://en.wikipedia.org/wiki/CPUID#CPU-specific_information_outside_x86
https://github.com/gcc-mirror/gcc/blob/master/gcc/config/i386/cpuid.h
http://www.flounder.com/cpuid_explorer2.htm#CPUID(0x80000005)

This allowed me to produce this _dirty_ local workaround:

--- src/external/rawspeed/cmake/Modules/cpu-cache-line-size.cpp.orig  
Thu Jan 27 03:13:57 2022
+++ src/external/rawspeed/cmake/Modules/cpu-cache-line-size.cpp Thu Jan
27 03:16:04 2022
@@ -14,7 +14,7 @@ int main() {
   return 0;
 }

-#elif defined(__FreeBSD__) || defined(__NetBSD__) ||
defined(__OpenBSD__) ||   \
+#elif defined(__FreeBSD__) || defined(__NetBSD__) || \
     defined(__DragonFly__) || defined(__APPLE__)

 #include <stddef.h> // for size_t
@@ -27,6 +27,13 @@ int main() {
     return 1;
   std::cout << val << std::endl;
   return 0;
+}
+
+#elif defined(__OpenBSD__)
+int main() {
+    size_t val = 64;
+       std::cout << val << std::endl;
+       return 0;
 }

 #elif defined(_WIN32) || defined(_WIN64)

The second one was simpler as it was just a missing
`defined(__OpenBSD__)` for getting the page size constant.

--- src/external/rawspeed/cmake/Modules/cpu-page-size.cpp.orig  Thu Jan
27 03:08:25 2022
+++ src/external/rawspeed/cmake/Modules/cpu-page-size.cpp       Thu Jan
27 03:10:39 2022
@@ -4,7 +4,8 @@
 #include <unistd.h> // for _POSIX_C_SOURCE, sysconf, _SC_PAGESIZE
 #endif

-#if (defined(_POSIX_C_SOURCE) && _POSIX_C_SOURCE >= 1) ||
defined(__APPLE__)
+#if (defined(_POSIX_C_SOURCE) && _POSIX_C_SOURCE >= 1) || \
+    defined(__APPLE__) || defined(__OpenBSD__)

 int main() {
   long val = ::sysconf(_SC_PAGESIZE);

At the end of the day, Darktable is now working _a lot_ faster with
OpenMP. I wish to be able to get `stlibomp` and `libomp` starting at
13.0.0 in ports and a proper workaround or the missing CPUID
definitions in sys.

Thanks for your reading, please let me know your thoughts.


Reply via email to