Hi Tung,

thanks to the very helpful and detailed response from Christian Seiler, I was able to build the latest iqtree version. Please read his response below in detail, since it explains the patches applied in the Debian packaging. You can find all patches that are applied to iqtree here:
https://anonscm.debian.org/cgit/debian-med/iqtree.git/tree/debian/patches

Please also note that I have updated the spelling patch to cover some
spelling mistakes (in code and comments).

Kind regards and thanks for your cooperation

      Andreas.

On Tue, Jun 28, 2016 at 08:09:15PM +0200, Christian Seiler wrote:
> On 06/28/2016 11:01 AM, Andreas Tille wrote:
> > I admit I can not answer the question asked by upstream.  The package in
> > question is iqtree[1] and they said that they have different
> > computational kernels implemented to respect different hardware.
> > Current Git[1] does not even build - may be due to some fine tuning of
> > gcc options needed???
>
> I've looked at this, and there are a couple of things going on here:
>
>  0. Debian's build flags by default assume a generic architecture, so
>     you don't have to do anything by yourself.
>
>  1. Upstream's build system supports multiple options for the entirety
>     of the code. So you can compile the entire code with the AVX or FMA
>     instruction set. You patch that out completely from the CMakeLists.txt
>     in sse.patch, but that isn't actually required. (IQTREE_FLAGS would
>     have to be explicitly set to enable this.)
>
>  2. Furthermore, upstream's build system provides SSE and AVX kernels
>     regardless of the build flags of the rest of the code, and they are
>     always compiled. (Well, you can disable compilation of the AVX kernel
>     if you add "novx" to IQTREE_FLAGS, but there's no reason to.) This
>     should work out of the box.
>
>     That said, the code doesn't support non-SSE at all, because it
>     hard-codes at least SSE2 intrinsics in a lot of places (and the one
>     part where it hardcodes SSE3, you already have a patch for). The code
>     can therefore not be compiled without SSE support enabled,
>     unfortunately, even on i386. If you want to support non-SSE at all on
>     i386, upstream (or yourself) needs to implement the routines in the
>     vectorclass/ directory (and possibly others) for non-SSE systems.
>     (The kernel that optionally uses AVX also exists in a non-SSE
>     variant, so upstream is not completely wrong about that, but there's
>     a lot of _other_ code that forces at least SSE2.)
>
>  3. pll/ has a bug that it calls posix_memalign with PLL_BYTE_ALIGNMENT.
>     However, according to the manpage, the alignment must be a multiple of
>     sizeof(void *) for posix_memalign to work (and a power of 2), but
>     PLL_BYTE_ALIGNMENT is 1 if SSE3 is not used. If you explicitly set it
>     to 8 (to catch both 32bit and 64bit), posix_memalign will not fail and
>     the program won't segfault anymore. (posix_memalign with a wrong
>     alignment argument just returns an error code (EINVAL) without
>     allocating a buffer; if that return value is not checked, the pointer
>     is left unset.)
>
>     Note though that if you don't compile with sse3 flags enabled, pll will
>     not use SSE code at all (other than that which the compiler generates),
>     which is probably slower. But it does work. (A grep for __SSE3
>     shows though that porting this would be a LOT of work.)
>
> Irrespective of the SSE-stuff, three things:
>
>  1. Your debian/rules calls dh_auto_clean/configure/build in
>     override_dh_auto_build to build two variants. This can be done in a
>     more elegant way, because CMake does support out of tree builds, and
>     you can have debhelper use a specific build directory by specifying
>     -Bdirname to dh_auto_*.
>
>  2. You might want to add --parallel to your dh call in debian/rules.
>     CMake-based projects tend to support parallel builds, and iqtree is
>     no exception to that rule. Would speed up build times quite a bit.
>
>  3. If you want to test the -omp binary as you do in debian/rules
>     currently, you have to pass -redo, otherwise the second call will
>     simply fail.
>
> I've updated your sse.patch to include the SSE-related fixes, and have
> updated debian/rules to incorporate the other things. Attached both
> to this email.
> The package now builds on amd64 and i386 (and probably
> will build on the kfreebsd and hurd variants thereof, though I haven't
> checked) and the test suite runs. The AVX/FMA checks in CMakeLists.txt
> are now not removed, because debian/rules never sets IQTREE_FLAGS to
> fma or avx. (On amd64 the AVX kernel is built separately with -mavx by
> the build system regardless, so that's also OK; and on my Haswell
> system the AVX detection works. On i386 the AVX kernel is never built,
> as per what the upstream build system decided.) However, even on i386,
> SSE2 support is required for this to work, otherwise the program will
> crash with either illegal instruction or a segfault at start. (I can
> provide you with a preinst script that checks for SSE2 support to show
> a nice error message at package installation time, if you so wish.)
>
> Additionally (what I've NOT done): please check the lintian info and
> pedantic messages of the package:
>
>  - out-of-date-standards-version 3.9.7
>  - a couple of spelling-error-in-binary
>  - hardening-no-pie/hardening-no-bindnow: consider enabling all
>    hardening flags (if that works, haven't checked)
>  - copyright-refers-to-symlink-license: you should refer to
>    /usr/share/common-licenses/GPL-3 and not .../GPL in the GPL-3
>    block of debian/copyright
>
> Hope that helps.
>
> Regards,
> Christian

> #!/usr/bin/make -f
>
> # DH_VERBOSE := 1
>
> pkg := $(shell dpkg-parsechangelog | sed -n 's/^Source: //p')
> version=$(shell dpkg-parsechangelog -ldebian/changelog | grep Version: | cut -f2 -d' ' | cut -f1 -d- )
> mandir=$(CURDIR)/debian/$(pkg)/usr/share/man/man1/
>
> %:
> 	dh $@ --parallel
>
> VARIANTS = omp serial
>
> override_dh_auto_configure: $(foreach variant,$(VARIANTS),dh_auto_configure_$(variant))
> override_dh_auto_build: $(foreach variant,$(VARIANTS),dh_auto_build_$(variant))
> override_dh_auto_install: $(foreach variant,$(VARIANTS),dh_auto_install_$(variant))
> override_dh_auto_clean: $(foreach variant,$(VARIANTS),dh_auto_clean_$(variant))
>
> dh_auto_configure_omp:
> 	dh_auto_configure -Bbuild.omp -- -DIQTREE_FLAGS="omp"
>
> dh_auto_configure_serial:
> 	dh_auto_configure -Bbuild.serial -- -DIQTREE_FLAGS=""
>
> dh_auto_build_%:
> 	dh_auto_build -Bbuild.$(subst dh_auto_build_,,$@)
>
> dh_auto_install_%:
> 	dh_auto_install -Bbuild.$(subst dh_auto_install_,,$@)
>
> dh_auto_clean_%:
> 	dh_auto_clean -Bbuild.$(subst dh_auto_clean_,,$@)
>
> override_dh_installexamples:
> 	dh_installexamples
> 	# remove example files in unusual dir
> 	rm -f debian/*/usr/models.nex
> 	rm -f debian/*/usr/example.[np][eh][xy]
>
> override_dh_installman:
> 	mkdir -p $(mandir)
> 	help2man --no-info --no-discard-stderr --help-option="-h" \
> 	   --name='efficient phylogenetic software by maximum likelihood' \
> 	   --version-string="$(version)" $(CURDIR)/debian/$(pkg)/usr/bin/iqtree > $(mandir)/iqtree.1
> 	help2man --no-info --no-discard-stderr --help-option="-h" \
> 	   --name='efficient phylogenetic software by maximum likelihood (multiprocessor version)' \
> 	   --version-string="$(version)" $(CURDIR)/debian/$(pkg)/usr/bin/iqtree-omp > $(mandir)/iqtree-omp.1
>
> override_dh_auto_test:
> 	# use only the first example for build time test to save time on autobuilders
> #	if [ "`find $(CURDIR) -name iqtree -type f -executable`" = "" ] ; then \
> #	    iqtreeomp=`find $(CURDIR) -name iqtree-omp -type f -executable` ; \
> #	    ln -s iqtree-omp `dirname $$iqtreeomp`/iqtree ; \
> #	fi
> 	sed '/ myprefix/,$$d' debian/Documents_source/example.sh > example.short
> 	echo 'time $(CURDIR)/build.omp/iqtree-omp -s example.phy -omp 2 -redo' >> example.short
> 	time sh example.short
> 	rm example.short

> Description: Do not use -m32 and -msse3 flags
> Bug-Debian: https://bugs.debian.org/813436
> Author: Andreas Tille <ti...@debian.org>
> Last-Update: Tue, 02 Feb 2016 08:41:45 +0100
>
> --- a/CMakeLists.txt
> +++ b/CMakeLists.txt
> @@ -1,4 +1,4 @@
> -##################################################################
> + ##################################################################
>  # IQ-TREE cmake build definition
>  # Copyright (c) 2012-2015 Bui Quang Minh, Lam Tung Nguyen
>  ##################################################################
> @@ -172,7 +172,7 @@ if(CMAKE_SIZEOF_VOID_P EQUAL 4 OR IQTREE
>  	endif()
>  	SET(EXE_SUFFIX "${EXE_SUFFIX}32")
>  	if (GCC OR CLANG)
> -		set(COMBINED_FLAGS "${COMBINED_FLAGS} -m32")
> +		set(COMBINED_FLAGS "${COMBINED_FLAGS}")
>  	endif()
>  	add_definitions(-DBINARY32)
>  else()
> @@ -237,7 +237,7 @@ SET(SSE_FLAGS "")
>  if (VCC)
>  	set(SSE_FLAGS "/arch:SSE2 -D__SSE3__")
>  elseif (GCC OR CLANG)
> -	set(SSE_FLAGS "-msse3")
> +	set(SSE_FLAGS "-msse2")
>  elseif (ICC)
>  	if (WIN32)
>  		set(SSE_FLAGS "/arch:SSE3")
> @@ -273,8 +273,7 @@ elseif (IQTREE_FLAGS MATCHES "avx") # AV
>
>  	SET(EXE_SUFFIX "${EXE_SUFFIX}-avx")
>  else() #SSE intruction set
> -	message("Vectorization : SSE3")
> -	add_definitions(-D__SSE3)
> +	message("Vectorization : SSE2")
>
>  endif()
>
> --- a/phylokernel.h
> +++ b/phylokernel.h
> @@ -15,6 +15,10 @@
>  inline Vec2d horizontal_add(Vec2d x[2]) {
>  #if INSTRSET >= 3  // SSE3
>      return _mm_hadd_pd(x[0],x[1]);
> +#elif INSTRSET >= 2  // SSE2
> +    Vec2d help0 = _mm_shuffle_pd(x[0], x[1], _MM_SHUFFLE2(0,0));
> +    Vec2d help1 = _mm_shuffle_pd(x[0], x[1], _MM_SHUFFLE2(1,1));
> +    return _mm_add_pd(help0, help1);
>  #else
>  #error "You must compile with SSE3 enabled!"
>  #endif
> --- a/pll/pll.h
> +++ b/pll/pll.h
> @@ -82,7 +82,7 @@ extern "C" {
>  #define PLL_VECTOR_WIDTH 2
>
>  #else
> -#define PLL_BYTE_ALIGNMENT 1
> +#define PLL_BYTE_ALIGNMENT 8
>  #define PLL_VECTOR_WIDTH 1
>  #endif
>

-- 
http://fam-tille.de