Control: tags -1 patch

Hi all,

On Monday, 08.06.2015, at 11:45 +0200, Fabian Greffrath wrote:
> So, in absence of a better approach, [...]

I think I have found another, maybe even the final, fix for this issue.
Remember that the memory operands of SSE instructions generally must be
aligned on 16-byte boundaries. In the init_xrpow_core_sse() function
these operands are on the stack. However, when the code is called from
the OCaml bindings, the stack is allocated by OCaml, which does not
adhere to the 16-byte boundary rule and thus causes the code to crash.
So what we really need
here is a means for the init_xrpow_core_sse() function to maintain its
own stack and align it according to its needs. Now, guess what compiler
flag I found yesterday? ;)

-mstackrealign
    Realign the stack at entry. On the x86, the -mstackrealign option
    generates an alternate prologue and epilogue that realigns the
    run-time stack if necessary. This supports mixing legacy codes that
    keep 4-byte stack alignment with modern codes that keep 16-byte
    stack alignment for SSE compatibility. See also the attribute
    force_align_arg_pointer, applicable to individual functions.

This flag applies per-file. If it is added to
liblamevectorroutines_la_CFLAGS (next to the -msse flag) in
libmp3lame/vector/Makefile.am, the crash does not occur anymore.
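
For illustration only (this is not part of the patch): a minimal sketch
of what "aligned on a 16-byte boundary" means for an address, and how
an address can be rounded up to the next such boundary. The helper
names is_aligned_16() and align_up_16() are hypothetical and do not
exist in LAME.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if the address is a multiple of 16,
 * which is what aligned SSE loads and stores require. */
static int is_aligned_16(const void *ptr)
{
    return ((uintptr_t)ptr & 15u) == 0;
}

/* Hypothetical helper: rounds an address up to the next 16-byte
 * boundary. The caller must make sure the underlying buffer has at
 * least 15 spare bytes. */
static void *align_up_16(void *ptr)
{
    assert(ptr != NULL);
    return (void *)(((uintptr_t)ptr + 15u) & ~(uintptr_t)15u);
}
```

A stack frame set up by a caller that only guarantees 4-byte alignment
can fail the first check, which is the situation -mstackrealign is
meant to repair in the function prologue.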

There is also a very similar per-function variant in the form of the
force_align_arg_pointer attribute, but in the case at hand, all
functions in the libmp3lame/vector/xmm_quantize_sub.c file call
SSE-related code and thus I think it is safe to apply this flag
file-wide.
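
To show roughly what the per-function variant looks like in isolation,
here is a small sketch. It is an assumption-laden stand-in, not code
from LAME: the REALIGN_STACK macro and the function
sum4_via_aligned_scratch() are hypothetical, and the macro expands to
nothing on compilers or targets where I would not expect the attribute
to be available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: force_align_arg_pointer is only meaningful for
 * GCC-compatible compilers targeting x86; elsewhere expand to nothing. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define REALIGN_STACK __attribute__((force_align_arg_pointer))
#else
#define REALIGN_STACK
#endif

/* Hypothetical stand-in for an SSE routine: it keeps a 16-byte aligned
 * scratch buffer on its own stack frame. With REALIGN_STACK the
 * prologue realigns the stack, so the buffer is meant to stay aligned
 * even if the caller's stack was not. */
REALIGN_STACK
static float sum4_via_aligned_scratch(const float *in, int *scratch_was_aligned)
{
    _Alignas(16) float scratch[4];
    float sum = 0.0f;
    int i;

    assert(in != NULL && scratch_was_aligned != NULL);

    for (i = 0; i < 4; i++)
        scratch[i] = in[i];

    /* Report whether the scratch buffer sits on a 16-byte boundary. */
    *scratch_was_aligned = (((uintptr_t)scratch & 15u) == 0);

    for (i = 0; i < 4; i++)
        sum += scratch[i];
    return sum;
}
```

The trade-off is the one described above: the attribute has to be
repeated on every affected function, whereas the compiler flag covers
the whole file at once.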

I would be glad to hear that this flag fixes the issue for you as well,
and to read your opinions on the per-function versus per-file variants.

Best regards,

Fabian
--- a/libmp3lame/vector/xmm_quantize_sub.c
+++ b/libmp3lame/vector/xmm_quantize_sub.c
@@ -52,6 +52,7 @@ static const FLOAT costab[TRI_SIZE * 2]
 
 
 
+__attribute__((force_align_arg_pointer))
 void
 init_xrpow_core_sse(gr_info * const cod_info, FLOAT xrpow[576], int upper, FLOAT * sum)
 {

--- a/libmp3lame/vector/Makefile.am
+++ b/libmp3lame/vector/Makefile.am
@@ -20,6 +20,7 @@ xmm_sources = xmm_quantize_sub.c
 
 if WITH_XMM
 liblamevectorroutines_la_SOURCES = $(xmm_sources)
+liblamevectorroutines_la_CFLAGS = -msse -mstackrealign
 endif
 
 noinst_HEADERS = lame_intrin.h
