use of preserve_temp_slots in copy_blkmode_from_reg

2005-05-10 Thread Jan Beulich
copy_blkmode_from_reg (called from expand_call) calls preserve_temp_slots 
without pushing a scope first. Hence it appears to be possible, and I'm 
appearantly running into such a case, that it calls this when temp_slot_level 
is zero, leading to array accesses with index -1 (resulting in ill behavior, a 
SEGV for me). Would it be the right thing to wrap this and the immediately 
preceding call to assign_temp into push_temp_slots/pop_temp_slots? Or is it the 
other way around and the call to preserve_temp_slots is improper here?

While looking at this, I found that expr.c always calls free_temp_slots 
immediately followed by pops_temp_slots. Isn't the call to free_temp_slots 
pointless there?

Thanks, Jan



building 3.4.x with in-tree binutils 2.16.*

2005-06-17 Thread Jan Beulich
These patches

http://gcc.gnu.org/ml/gcc-patches/2004-03/msg02319.html 
http://gcc.gnu.org/ml/gcc-patches/2004-10/msg00466.html 

are needed (the former in at least a rudimentary backport form) in order to be 
able to build a 3.4.x tree with an overlaid binutils 2.16 or newer tree. Are 
there any plans to backport these?

Jan



Bug or feature: symbol names of global/extern variables

2005-10-06 Thread Jan Beulich
I don't think this has anything to do with binutils; whether linking
succeeds exclusively depends on the mangling method used (VC does mangle
data object names, while g++ doesn't). AFAIK the standard only talks
about function signatures, meaning mangling data object names is neither
prohibited nor required (i.e. implementation defined) [7.5 clause 9
deals with that].

Jan

>>> Wolfgang Roemer <[EMAIL PROTECTED]> 06.10.05 14:44:58 >>>
Hello,

I encountered a subtle SEGV in a program and was able to track the
problem 
down to symbol names concerning global/extern variables. I discussed
that 
with some guys from the GCC project (see recipient list) and we came to
the 
conclusion it would make more sense to share our thoughts with you.
Here the 
problem:


If you have a global variable inside a cpp file and create a library
out of 
that, the symbol name for that global variable does in no way take the
type of 
the variable into account. A user of that variable can "make" it any
type 
with an "extern" declaration and thus produce subtle errors. An
example:


 lib.cpp 
int maximum;
int minimum;

static bool init ( )
{
  maximum = 2;
  minimum = -7;
}

static bool initialized = init ( );
---

Create a library out of that lib.cpp file. Then compile the following
main.cpp 
and link it against the library:


- main.cpp --
extern double maximum;
extern intminimum;

void main (int, char**)
{
  // Assume you are on a machine where the sizeof (int) is 4 bytes
  // and the sizeof (double) is 8 bytes.
  assert (minimum == -7);
  {
maximum = 2342343242343.3;
  }
  assert (minimum == -7);

  return 0;
}
-

The main.o will perfectly link with the library although main.o needs a
double 
variable named maximum and the lib only offers an int variable named
maximum. 
Because the symbol name does in no way reflect the variable type,
everything 
links fine but in fact the variable named "minimum" gets scrambled in
this 
example because "maximum" is accessed as if it is a double variable
thus 
overwriting 4 additional bytes (in this case the 4 bytes of the
variable 
minimum). The assertion will show that.

I tested that on Windows with Visual C++ as well and there main.obj
doesn't 
link because the variable type is part of the symbol name and everthing
is 
fine.

I think it would be very very important for the binary interface to
have that 
feature as well.

Regards,

Wolfgang Roemer


Re: Bug or feature: symbol names of global/extern variables

2005-10-06 Thread Jan Beulich
>>> Wolfgang Roemer <[EMAIL PROTECTED]> 06.10.05 16:02:37 >>>
>On Thu Oct 06, 2005 15:54, Michael Veksler wrote:
>[..]
>>>  2. I think that it will break C. As I remember, it is sometimes
>>>  legal in C (or in some dialects of C) to have conflicting
types.
>>>  You may define in one translation unit:
>>> char var[5];
>>>  and then go on and define in a different translation unit:
>>> char var[10];
>>>  The linker will merge both declarations and allocate at least
>>> 10 bytes for 'var' (ld's --warn-common will detect this).
>
>that is interesting: If the linker would behave that way, I wouldn't
get the 
>error because the needed 8 bytes for a double would be allocated.

I think your example had the variables initialized; this is a
difference. The linker does behave as Michael described for
uninitialized (aka common) variables; gcc's -fno-common suppresses this
behavior.

Jan


improved ia64 atomic ops

2005-11-21 Thread Jan Beulich
Richard,

in the context of internal discussions regarding target/24757 I have
been made aware of a change to the sync operations on ia64, and I have
problems understanding

>This differs from the generic code in that we know about the
zero-extending
>properties of cmpxchg, and the zero-extending requirements of
ar.ccv.  We
>also know that ld.acq+cmpxchg.rel equals a full barrier.

Namely, while ld.acq is guaranteed to execute/become visible before the
cmpxchg.rel, it is not guaranteed to execute after any preceding ld, st,
or st.rel; similarly the cmpxchg.rel, while definitely getting executed
after the ld.acq, there's no guarantee that there'll be no subsequent
operations (ld, ld.acq, st) executed prior to it. Thus the previous
behavior (where a fence was the first thing, and because of that plus
the use of cmpxchg.acq there was full separation backward and forward)
isn't being retained.

I think that only the inverse (initial .rel and final .acq) can be
viewed as a replacement for a full fence, with the caveat that these two
then aren't ordered wrt one another. But since there's no ld.rel, an
initial fence is unavoidable.

What am I missing?

Jan


g++.dg/ext/packed3.C

2005-12-06 Thread Jan Beulich
This test contains three invocations of Ref(), but only two of them are
considered ill. What I'd like to get an explanation for is why the third
(middle) instance is considered correct. After all, the u member of
Packed is packed, and hence all the members of Unpacked in that context
are, too. Namely, even if the object referenced by p is properly
aligned, p.u isn't and hence p.u.i isn't either.

I'm asking this because for *-*-netware*, which uses packed structures
by default, this test fails with an error message like the ones expected
on the other two calls to Ref().

Thanks, Jan


testsuite issue

2005-12-06 Thread Jan Beulich
>2005-12-01  Hans-Peter Nilsson  <[EMAIL PROTECTED]>
>
>   * gcc.dg/20041106-1.c, gcc.dg/20030321-1.c, gcc.dg/pr17112-1.c,
>   gcc.dg/pr17112-1.c, g++.dg/other/packed1.C,
>   g++.dg/other/crash-4.C, g++.dg/ext/packed8.C: Match "attribute
>   ignored" warnings when "packing" is the same as the ABI layout.

While most of these changes appear to be correct, I see a regression on
*-*-netware* (which by default uses packed structures) in
gcc.dg/20030321-1.c, and I believe the warning tested for cannot be
expected (since the code generating the warning tests

TYPE_ALIGN (TREE_TYPE (*node)) <= BITS_PER_UNIT

which cannot reasonably be expected to be true for 'long long'.

Can this part of the patch therefore be reverted?

Thanks, Jan


Re: testsuite issue

2005-12-07 Thread Jan Beulich
>> While most of these changes appear to be correct, I see a regression
on
>> *-*-netware* (which by default uses packed structures) in
>> gcc.dg/20030321-1.c, and I believe the warning tested for cannot be
>> expected (since the code generating the warning tests
>> 
>> TYPE_ALIGN (TREE_TYPE (*node)) <= BITS_PER_UNIT
>> 
>> which cannot reasonably be expected to be true for 'long long'.
>
>Why not?  If you don't get a warning for the attribute being
>ignored then, your target doesn't really pack structures; maybe
>we need to split default_packed into variants.

What has the alignment of type 'long long' to do with structure
packing? Structure packing is exactly to override the alignment of the
type. And in the given situation, alignment of 'long long' is at least
32 (bits), whereas BITS_PER_UNIT is 8, and hence the attribute is really
not ignored (because the alignment gets overridden in order to pack the
field), hence the expectation of a warning here seems wrong. If not, can
you explain what it is that makes you expect a warning here?

>> Can this part of the patch therefore be reverted?
>
>Uh, no.  It works as intended and removing it would cause a
>regression.

Not for *-*-netware*, where the change by itself represents a
regression as the test previously didn't fail.

Jan


Re: testsuite issue

2005-12-07 Thread Jan Beulich
>> While most of these changes appear to be correct, I see a regression
on
>> *-*-netware* (which by default uses packed structures) in
>> gcc.dg/20030321-1.c, and I believe the warning tested for cannot be
>> expected (since the code generating the warning tests
>> 
>> TYPE_ALIGN (TREE_TYPE (*node)) <= BITS_PER_UNIT
>> 
>> which cannot reasonably be expected to be true for 'long long'.
>
>Why not?  If you don't get a warning for the attribute being
>ignored then, your target doesn't really pack structures; maybe
>we need to split default_packed into variants.

Actually, thinking more about it, I agree, the warning should be there,
but is missing, because the condition for emitting it is wrong.

>On the other hand, FWIW, I think the warning is generally bogus,
>but I've already mentioned that.

It should not only check TYPE_ALIGN (), but also consider current
structure packing in effect (and consequently, it should also be emitted
e.g. when #pragma pack(1) is in effect on a target that doesn't default
to packed structures).

Jan


Re: g++.dg/ext/packed3.C

2005-12-07 Thread Jan Beulich
>>> Nathan Sidwell <[EMAIL PROTECTED]> 07.12.05 16:58:11 >>>
>Jan Beulich wrote:
>> This test contains three invocations of Ref(), but only two of them
are
>> considered ill. What I'd like to get an explanation for is why the
third
>> (middle) instance is considered correct. After all, the u member of
>> Packed is packed, and hence all the members of Unpacked in that
context
>> are, too. Namely, even if the object referenced by p is properly
>> aligned, p.u isn't and hence p.u.i isn't either.
>> 
>> I'm asking this because for *-*-netware*, which uses packed
structures
>> by default, this test fails with an error message like the ones
expected
>> on the other two calls to Ref().
>
>Although Unpacked is a pod type, if it contained non-static member
functions, 
>those member functions would expect a this pointer that is correctly
aligned. 

And that is precisely the reason why I think binding a reference to the
whole object or any of its members, when the object itself is a member
of a packed object, is illegal, hence requiring a diagnostic (unless,
like for both other cases, the default is to pack structures).

>We have two options
>1) don't pack fields of structure type
>2) don't pack fields of non-pod or non-static member function
containing structs
>
>#2 means the alignment of your field can change, depending on whether
the 
>field's type contains a non-static member or not.  C++ has no term for
such a class.
>
>#1) breaks GNU C compatibility, I think.
>
>I'm not sure what the best answer is here.
>
>If your system packs structs by default, you should not be getting the
warning 
>on any of the uses.

But I do, and if I use a native Linux compiler with -fpack-struct, I
also get it (along with a second one on one of the two other
instances).

Jan


Re: g++.dg/ext/packed3.C

2005-12-07 Thread Jan Beulich
>>> Nathan Sidwell <[EMAIL PROTECTED]> 07.12.05 17:22:10 >>>
>Jan Beulich wrote:
>
>> And that is precisely the reason why I think binding a reference to
the
>> whole object or any of its members, when the object itself is a
member
>> of a packed object, is illegal, hence requiring a diagnostic
(unless,
>> like for both other cases, the default is to pack structures).
>
>Doing that will break
>   struct Foo { void operator=(Foo const &);};
>   struct Baz __attribute__((packed))
>   {
> char c;
> Foo m;
>   }
>
>   void Bar (Baz *ptr)
>   {
> ptr->m = something;
>   }
>This is something we need to make work.

Why? It's broken. You just cannot embed something that requires
alignment into something that doesn't guarantee alignment, except that
for built-in types, the compiler can synthesize the necessary splitting,
but Foo's assignment operator, in your example, may be totally unaware
that it might get called with an unaligned object.

>>>If your system packs structs by default, you should not be getting
the
>> 
>> warning 
>> 
>>>on any of the uses.
>> 
>> 
>> But I do, and if I use a native Linux compiler with -fpack-struct,
I
>> also get it (along with a second one on one of the two other
>> instances).
>
>Then I think you have a bug.

Of course I can enter this in bugzilla, but things like that (in my
experience) will likely never get addressed, so it seems a little
pointless...

Jan


Re: testsuite issue

2005-12-07 Thread Jan Beulich
>Just for the record (in case someone else has the same thoughts)
>and because I'd already written most of the reply, I also
>replied to your first email.  See last for your follow-up.
>
>> What has the alignment of type 'long long' to do with structure
>> packing?
>
>Only the obvious(?): if long long is usually padded in a
>structure, then marking it with attribute-packed has effect.
>IIUC for netware, you pack everything in structures *except*
>long long, which you pad to 32-bit alignment.  Right?  If not,
>then the warning emission machinery is wrong.

For NetWare, everything gets packed, including 'long long'. That's why
I stumbled across the problem, as the test case (actually there's a
second, similar one) failed because of broken warning emission.

Thanks, Jan


build breakage due to r108059

2005-12-09 Thread Jan Beulich
Paolo,

>toplevel:
>2005-12-05  Paolo Bonzini  <[EMAIL PROTECTED]>
>
>   * configure.in (CONFIGURED_BISON, CONFIGURED_YACC,
CONFIGURED_M4,
>   CONFIGURED_FLEX, CONFIGURED_LEX, CONFIGURED_MAKEINFO): Remove
>   "CONFIGURED_" from the AC_CHECK_PROGS invocation.  Move below.
>   Find in-tree tools if available.
>   (EXPECT, RUNTEST, LIPO, STRIP): Find them and substitute them.
>   (CONFIGURED_*_FOR_TARGET): Don't set nor substitute.
>   (*_FOR_TARGET): Set them with GCC_TARGET_TOOL.
>   (COMPILER_*_FOR_TARGET): New.
>...
>config:
>2005-12-05  Paolo Bonzini  <[EMAIL PROTECTED]>
>
>* acx.m4 (GCC_TARGET_TOOL): New.

the way GCC_TARGET_TOOL gets evaluated breaks builds in a tree without
binutils sources, but with (some) binaries put into a binutils
subdirectory of the output tree. Many years ago, I had been advised that
putting ar, ranlib, and nm binaries in a binutils subdirectory would
allow gcc's configure to find and use them, and in fact gcc is still
looking there. However, since AR_FOR_TARGET gets passed on the make
command line, the attempt to set AR_FOR_TARGET in gcc/Makefile is simply
ignored.

In the given case, building cross tools for i686-novell-netware on
i686-pc-linux-gnu, I don't have i686-novell-netware-ar (and similar
other tools; I entirely dislike this naming scheme), but I do have
(which doesn't seem to matter at all, and never did)
/usr/local/i686-novell-netware/bin/ar etc, which (prior to running
configure) I create links to in $(objdir)/binutils/.

I would suppose that GCC_TARGET_TOOL should not only check whether the
directory of $4 is among $(configdirs), but also if $4 itself
pre-exists.

Thanks, Jan


Re: g++.dg/ext/packed3.C

2005-12-12 Thread Jan Beulich
>It can be made to work by not packing Baz::m, and that is what g++ does
(with a 
>warning%).  Issuing an error in this case I don't think is acceptable
-- I know 
>of users who would complain.  If the user explicitly packed Baz::m
field, rather 
>than the containing structure, I would be happy with a diagnostic.

I don't think this is the case. The questionable code (from the test
case) really is

struct Unpacked { int i; };
struct  __attribute__ ((packed)) Packed
{
  char c;
  int i;
  Unpacked u;
};

and the test expects that you cannot bind Packed::u to Unpacked& (error
expected), but that you can bind Packed::u::i to int& (not even a
warning expected). No warning is expected on the definition of Packed's
u member.

>> (In my idea world, ptr->m has type "packed Foo" in this case, and
it's
>> not permissible to binding a "packed Foo" to a "Foo const&", so
this
>> would be invalid, but I could live with undefined.)
>
>We need to distinguish the meanings of placing the packed attribute on
the 
>structure and on the field itself.  I agree with you that when the
attribute is 
>on the field itself, the type should be 'packed Foo' and unbindable. 
When the 
>attribute is on the whole struct, I'm not so sure.

Yes, except that 'packed' on the containing structure doesn't really
imply 'packed' on the contained structure, it rather implies 'unaligned'
(unless [silently] overridden by ignoring the containing structure's
'packed' attribute, as you say is happening, but as I'm inclined to say
is not, given the expectations of the test case). Unfortunately gcc
(still) doesn't support an 'unaligned' attribute.

>>% ah, I think that warning should only be given on non-default-packed
arches. 
>>Is this your problem Jan?

Not exactly. My problems are

(a) the above mentioned inconsistency in allowing binding to the whole
structure but not to its members, and

(b) on a default-packed target (or with -fpack-struct) getting reverse
behavior in allowing binding to Packed::u but not allowing binding to
Packed::u::i.

Jan


Re: selection or target tools

2005-12-23 Thread Jan Beulich
Yes, this seems to meet the needs I expressed. Thanks, Jan

>>> Paolo Bonzini <[EMAIL PROTECTED]> 23.12.05 10:10:01 >>>

> One appropriate default for --with-build-tools could be the same as
> the defaults for --program-transform-name.  A default native build
> would use 'as', a default cross build would use '$target-as'.  Most
> people using --program-prefix would probably also pass the same
value
> to --with-build-tools.

So --with-build-tools would be a *prefix* and not a path in which to 
find it?

I have a prototype patch that follows this logic:

1) if not a Canadian cross and the appropriate directory is being built

as a host module, use it.  So combined trees are not affected, of 
course, by the patch.

2) look into the --with-build-tools path, for both a Canadian cross and

a native build.  This defaults to $exec_prefix/$target/bin, so the 
default build tools (used in autoconf tests and by the being-built GCC)

would be, if found, something like 
/usr/local/i686-pc-linux-gnu/bin/{ar,as,ld,...}.  These would be
"naked" 
names, not $target-prefixed, even when building a cross (because they 
are in a directory named after the target).

3) in a native build we try to use the host tool

4) if no host tool is found, we look for a pre-installed tool.  The 
macro NCN_STRICT_CHECK_TARGET_TOOLS imposes $target-prefixed names for

cross builds, while on a native build a "naked" name would be okay as
well.

Step 2 is new.  The previous logic covers step 1, 3, 4.

I believe that this is covering Jan's use case.  If it covers Dan's as

well, I'm going to post the patch in a few hours or else in the new
year 
(in the meanwhile, there are other toplevel patches to review, i.e. 
parts 2/4 and 3/4)...

We also have the problem of backwards compatibility.  I can try to work

out a patch to implement --with-build-tools in 4.0 and 4.1 (with the
new 
logic kicking in only if the option is specified, of course).

Paolo


Re: Broken check rejecting -fcf-protection and -mindirect-branch=thunk-extern

2020-04-28 Thread Jan Beulich
On 28.04.2020 17:00, H.J. Lu wrote:
> On Tue, Apr 28, 2020 at 6:41 AM Andrew Cooper  
> wrote:
>>
>> On 28/04/2020 14:00, H.J. Lu wrote:
>>> On Tue, Apr 28, 2020 at 5:43 AM Andrew Cooper  
>>> wrote:
 Hello,

 I raised https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93654 but it has
 had nothing but tumbleweeds in months, and it is continuing to cause
 problems for Xen.

 During the Spectre embargo period, it was specifically identified that
 kernels would need to be able to compile one single binary, which was
 retpoline safe on older hardware, and able to use CET on newer hardware.

 thunk-extern was deliberately constructed (along with
 -mindirect-branch-register) such that the thunk could be turned into
 something which wasn't a ROP gadget when hardware was less broken.  Both
 Linux and Xen use this, with the ability to substitute the exact thunk
 in use to be suitable for the CPU booted on.  (In particular, AMD
 recommend `lfence; jmp *%reg` over the traditional retpoline thunk.)


 A consequence of GCC rejecting this combination is that Linux has
 unilaterally disabled -fcf-protection

 # ensure -fcf-protection is disabled when using retpoline as it is
 # incompatible with -mindirect-branch=thunk-extern
 ifdef CONFIG_RETPOLINE
 KBUILD_CFLAGS += $(call cc-option,-fcf-protection=none)
 endif

 and a change similar to this is being proposed for Xen.  However, doing
 this will leave distros with the choice between disabling retpoline or
 not using CET, which is not in the best security interest of the user.

 Please can the original change be partially reverted?  thunk-extern
 means "I'm providing the thunks, and I'll take care of ensuring that
 they are appropriate", and that includes not being a ROP gadget when CET
 is active.

>>> Please DO disable -fcf-protection in the kernel build.  We are enabling
>>> CET for the user space first.   The kernel CET will be the next.
>>>
>>> I am enclosing a proposal to make -fcf-protection compatible with retpoline.
>>> It targets user space.  It can be made compatible with kernel.
>>
>> Its fine to focus on userspace first, but the kernel is far more simple.
>>
>> Looking at that presentation, the only thing missing for kernel is the
>> notrack thunks, in the unlikely case that such code would be tolerated
>> (Frankly, I don't expect Xen or Linux to run with notrack enabled, as
>> there is no legacy code to be concerned with).
>>
>> What is going to happen about unbreaking this combination of options?
>> How will we know when kernel mode is supported (not that I can see
>> anything further required from the toolchain)?  I really hope you're not
> 
> My proposal requires assembler, linker and compiler changes.
> 
>> suggesting that we'll need to use something separate such as
>> -fcf-protection=magic-kernel-mode when plain -fcf-protection would do.
> 
> -mcmodel=kernel should be sufficient.  If
> 
> -mcmodel=kernel -fcf-protection -mindirect-branch=thunk-extern
> 
> works, your toolchain has implemented my proposal.

But please note that Xen doesn't get built with -mcmodel=kernel, so
the two remaining options ought to work together also without this
one.

Jan


Re: New x86-64 micro-architecture levels

2020-07-13 Thread Jan Beulich
On 13.07.2020 09:40, Florian Weimer wrote:
> * Richard Biener:
>>> 2. I have a library with AVX2 and FMA, which directory should it go?
>>
>> Eventually GCC/gas can annotate objects with the lowest architecture
>> level that is applicable?
> 
> H.J. has patches for ELF program properties.  I think
> GNU_PROPERTY_X86_ISA_1_NEEDED would convey this information.  This
> proposal and the glibc patches are independent of that.

>From (partly just halfway) recent discussions with H.J. I gained
the understanding that the piece we're aiming at getting to work
properly is the recording of GNU_PROPERTY_X86_FEATURE_2_*, not
so much GNU_PROPERTY_X86_ISA_1_*. If the ISA one is to be used as
a basis here, a lot of new flags will need adding (and properly
setting) first, I think.

Jan


Re: New x86-64 micro-architecture levels

2020-07-22 Thread Jan Beulich
On 21.07.2020 20:04, Florian Weimer wrote:
> * Premachandra Mallappa:
> 
>> [AMD Public Use]
>>
>> Hi Floarian,
>>
>>> I'm including a proposal for the levels below.  I use single letters for 
>>> them, but I expect that the concrete implementation of this proposal will 
>>> use 
>>> names like “x86-100”, “x86-101”, like in the glibc patch referenced above.  
>>> (But we can discuss other approaches.)
>>
>> Personally I am not a big fan of this, for 2 reasons 
>> 1. uses just x86 in name on x86_64 as well
> 
> That's deliberate, so that we can use the same x86-* names for 32-bit
> library selection (once we define matching micro-architecture levels
> there).

While indeed I did understand it to be deliberate, in the light of
64-bit only ISA extensions (like AMX, and I suspect we're going to
see more) I nevertheless think Premachandra has a point here.

Jan


Re: New x86-64 micro-architecture levels

2020-07-22 Thread Jan Beulich
On 22.07.2020 12:34, Florian Weimer wrote:
> The remaining issue is the - vs _ issue.  I think GCC currently uses
> “x86-64” in places that are not part of identifiers or target triplets.
> Richard mentioned “x86_64-” as a potential choice.  Would it be too
> awkward to have ”-march=x86_64-…”?

Personally I'm advocating for avoiding underscores whenever dashes
can also be used, and whenever they're not needed to distinguish
themselves from dashes (like in target triplets). But this doesn't
make their use "awkward" here of course - it's just my personal
view on it. And maybe, despite the main was sent _to_ just me, it
was really me you meant to ask ...

Jan


configure.{in -> ac} rename (commit 35eafcc71b) broke in-tree binutils building of gcc

2015-07-14 Thread Jan Beulich
Alan, gcc maintainers,

I was quite surprised for my gcc 4.9.3 build (using binutils 2.25 instead
of 2.24 as I had in use with 4.9.2) to fail in rather obscure ways. Quite
a bit of digging resulted in me finding that gcc/configure.ac looks for
configure.in in a number of binutils subtrees. Globally replacing
configure.in by configure.[ai][cn] appears to address this, but I'm not
sure whether that would be an acceptable change (there doesn't seem
to be a fix for this in gcc trunk either, which I originally expected I could
simply backport).

Thoughts?

Thanks, Jan



Re: configure.{in -> ac} rename (commit 35eafcc71b) broke in-tree binutils building of gcc

2015-07-14 Thread Jan Beulich
>>> On 15.07.15 at 03:20,  wrote:
> On Tue, Jul 14, 2015 at 10:13:06AM +0100, Jan Beulich wrote:
>> (there doesn't seem
>> to be a fix for this in gcc trunk either, which I originally expected I could
>> simply backport).
> 
> The configure.in->configure.ac rename happened over a year ago so I
> guess this shows that not too many people use combined binutils+gcc
> builds nowadays.  I've always found combined binutils+gcc builds not
> worth the bother compared to simply building and installing binutils
> first, as Jim suggests.

That doesn't work well when you want to specifically avoid
installing, instead running directly from the build tree.

Jan



Re: PATCH: Also check configure.ac in binutils source tree

2015-07-15 Thread Jan Beulich
>>> On 15.07.15 at 17:18,  wrote:
> Here is a patch.  Tested on Linux/x86-64 with native GCC as well as
> cross-toolchain.
> Any comments, feedbacks, objections?

Comparing to the patch I sent earlier today I miss an adjustment
to gcc/acinclude.m4, which also has one bad use.

Jan



common subexpression elimination no longer working for asm()?

2014-10-22 Thread Jan Beulich
I noticed the issue with 4.9.1 (in that x86 Linux'es
this_cpu_read_stable() no longer does what the comment preceding
its definition promises), and the example below demonstrates this in
a simplified (but contrived) way. I just now verified that trunk has
the same issue; 4.8.3 still folds redundant ones as expected. Is this
known, or possibly even intended (in which case I'd be curious as to
what the reasons are, and how the functionality Linux wants can be
gained back)?

Thanks, Jan

void dummy(int, int);
extern int m, p;

static inline int read_m(void) {
int i;

asm("nop %1" : "=r" (i) : "m" (m));
return i;
}

static inline int read_p(void) {
int i;

asm("nop %P1" : "=r" (i) : "p" (&p));
return i;
}

void test(void) {
dummy(read_m(), read_m());
dummy(read_p(), read_p());
dummy(read_m(), read_m());
dummy(read_p(), read_p());
}




Re: common subexpression elimination no longer working for asm()?

2014-10-23 Thread Jan Beulich
>>> On 23.10.14 at 13:32,  wrote:
> On Wed, Oct 22, 2014 at 5:28 PM, Jan Beulich  wrote:
>> I noticed the issue with 4.9.1 (in that x86 Linux'es
>> this_cpu_read_stable() no longer does what the comment preceding
>> its definition promises), and the example below demonstrates this in
>> a simplified (but contrived) way. I just now verified that trunk has
>> the same issue; 4.8.3 still folds redundant ones as expected. Is this
>> known, or possibly even intended (in which case I'd be curious as to
>> what the reasons are, and how the functionality Linux wants can be
>> gained back)?
> 
> For 4.8 the CSE happened at the RTL level.  On the GIMPLE level
> we inline too early to CSE based on the fact the functions are pure.

Inlining doesn't really matter here according to my tests: Two
side by side identical asm()s without other side effects don't get
folded either.

In any event - is there anything that can be done about this?

> Note that dummy() may change m and p, so what 4.8 did was bogus:
> 
> #APP
> # 7 "t.c" 1
> nop m(%rip)
> # 0 "" 2
> #NO_APP
> movl%edi, %esi
> #APP
> # 14 "t.c" 1
> nop p
> # 0 "" 2
> #NO_APP
> calldummy
> movl%ebx, %esi
> movl%ebx, %edi
> calldummy

It wasn't bogus at all - by telling the compiler that the input to the
respective asm() is just the address of an object, the programmer
takes the responsibility that the pointed to object is not going to
change (or it changing doesn't matter). It's the whole purpose of
the use of the similar construct in Linux to allow the compiler to
eliminate multiple instances _across_ function calls.

Jan



Re: common subexpression elimination no longer working for asm()?

2014-10-24 Thread Jan Beulich
>>> On 23.10.14 at 15:42,  wrote:
> On Wed, Oct 22, 2014 at 04:28:52PM +0100, Jan Beulich wrote:
>> I noticed the issue with 4.9.1 (in that x86 Linux'es
>> this_cpu_read_stable() no longer does what the comment preceding
>> its definition promises), and the example below demonstrates this in
>> a simplified (but contrived) way. I just now verified that trunk has
>> the same issue; 4.8.3 still folds redundant ones as expected. Is this
>> known, or possibly even intended (in which case I'd be curious as to
>> what the reasons are, and how the functionality Linux wants can be
>> gained back)?
> 
> This changed because of my http://gcc.gnu.org/PR60663 fix.
> In your testcase the inline asm doesn't have more than one output
> (which IMNSHO is very much desirable not to CSE), and doesn't have explicit
> clobbers either, but happens to have implicit clobbers (fprs and cc),
> so CSE still could generate invalid code out of that without the fix
> (if it decided to materialize the inline asm somewhere, instead of reusing
> existing inline asm).
> So, if we e.g. weakened the PR60663 fix so that it only bails out
> if the inline asm contains more than one output. we'd need to fix up CSE, so
> that it analyzes all the clobbers and doesn't consider asms as equivalent
> just based on the ASM_OPERANDS, it needs to have the same clobbers too,
> and either doesn't try to materialize it out without preexisting insn
> if it has any clobbers.

So why would clobbers in general matter? I can see memory clobbers
to need special care, but any others? If two asm()-s only differ in the
registers they clobber, surely this is (1) a programmer mistake and
(2) irrelevant which of the two forms are to be picked. I first thought
hard register variables could matter here, but looking at the (x86)
code generated (at -O2) for

int test1(int x) {
register int y asm("edx");
int z = y;

asm("" ::: "edx");
return z + y + x;
}

register int y asm("ebx");

int test2(int x) {
int z = y;

asm("" ::: "ebx");
return z + y + x;
}

shows that the clobbers don't have the theoretically possible effect
of forcing y to be re-evaluated after the asm()-s (i.e. both cases
get translated as "return z * 2 + x").

Jan



Re: common subexpression elimination no longer working for asm()?

2014-10-24 Thread Jan Beulich
>>> On 24.10.14 at 11:11,  wrote:
> On Fri, Oct 24, 2014 at 10:01:52AM +0100, Jan Beulich wrote:
>> > This changed because of my http://gcc.gnu.org/PR60663 fix.
>> > In your testcase the inline asm doesn't have more than one output
>> > (which IMNSHO is very much desirable not to CSE), and doesn't have explicit
>> > clobbers either, but happens to have implicit clobbers (fprs and cc),
>> > so CSE still could generate invalid code out of that without the fix
>> > (if it decided to materialize the inline asm somewhere, instead of reusing
>> > existing inline asm).
>> > So, if we e.g. weakened the PR60663 fix so that it only bails out
>> > if the inline asm contains more than one output. we'd need to fix up CSE, 
>> > so
>> > that it analyzes all the clobbers and doesn't consider asms as equivalent
>> > just based on the ASM_OPERANDS, it needs to have the same clobbers too,
>> > and either doesn't try to materialize it out without preexisting insn
>> > if it has any clobbers.
>> 
>> So why would clobbers in general matter? I can see memory clobbers
>> to need special care, but any others? If two asm()-s only differ in the
> 
> Please start by looking at the PR the change fixed.

This is what I have been doing. But your previous reply got me into
some trouble parsing what you wrote (likely because I'm only
occasionally looking into compiler details)...

> There CSE decided (ok, with the help of not very smart costs, but as the
> testcase shows, it clearly can happen) to rematerialize the asm in a place
> where the asm wasn't originally at all.  At that point it just inserted the
> single ASM_OPERANDS, without anything else, leaving the other ASM_OPERANDS
> (the testcase had asm with two outputs) and in theory anything else (like
> clobbers) out.  Leaving the clobbers out completely is definitely not
> desirable.
> 
> IMHO we should never CSE together asm with different clobbers, GCC
> intentionally does not try to think what exactly the asm pattern does,
> it is a black box, and if the programmer decides to use one set of clobbers
> in one case and a different in another case, he might have a reason for
> that.

Aren't these two completely different things? One being to never fold
asm()-s with different operands (with it being open whether clobbers
would count here), and the other to make sure individual pieces of a
parallel would end up in the table? I.e. by relaxing your original fix and
adding code to compare clobbers too we'd deal with the first case, but
I can't see what would prevent such a parallel to be broken up when
there's just one output, but an arbitrary number of clobbers. The only
alternative to this not being the case that I can see would be if the
parallel as a whole got entered into the table, but if that was the case,
why wouldn't the example I provided be properly CSE'd without any
change?

Apart from that I think it's high time for x86 to have a way to allow
the programmer to suppress adding the two default clobbers. I
drafted a respective (seemingly pretty non-intrusive) change, which
seems to work fine. Would that be acceptable as a second means to
at least partially gain back what we had before (in this case of course
requiring the programmer to adjust the asm()-s or, if they're all safe,
pass a new command line option)?

Jan



aarch64 asm operand checking

2015-01-28 Thread Jan Beulich
Hello,

in the Xen project we had (meanwhile fixed) code like this (meant to
be uniform between 32- and 64-bit):

static inline int fls(unsigned int x) {
int ret;
asm("clz\t%0, %1" : "=r" (ret) : "r" (x));
return BITS_PER_LONG - ret;
}

Being mainly an x86 person, when I first saw this I didn't understand
how this could be correct, as for aarch64 BITS_PER_LONG is 64, and
both operands being 32-bit I expected "clz w, w" to result.
Yet I had to learn that no matter what size the C operands, x
registers are always being picked. Which still doesn't mean the above
is correct - a suitable call chain can leave a previous operation's
64-bit result unconverted, making the above produce a supposedly
impossible result greater than 32.

Therefore I wonder whether aarch64_print_operand() shouldn't,
when neither the 'x' not the 'w' modifier is given, either - like
ix86_print_operand() (via print_reg()) - honor
GET_MODE_SIZE (GET_MODE (x)), or at the very least warn
when that one is more narrow than 64 bits. And yes, I realize that
this isn't going to be optimal (and could even be considered
inconsistent) as there's no way to express the low half word or
byte of a general register, i.e. operands more narrow than 32 bits
couldn't be fully checked without also knowing/evaluating the
instruction suffix, e.g. by introducing a 'z' operand modifier like
x86 has, or extending the existing 'e' one.

Jan



Re: aarch64 asm operand checking

2015-01-29 Thread Jan Beulich
>>> On 29.01.15 at 09:20,  wrote:
> On Wed, Jan 28, 2015 at 11:54 PM, Jan Beulich  wrote:
>> Hello,
>>
>> in the Xen project we had (meanwhile fixed) code like this (meant to
>> be uniform between 32- and 64-bit):
>>
>> static inline int fls(unsigned int x) {
>> int ret;
>> asm("clz\t%0, %1" : "=r" (ret) : "r" (x));
>> return BITS_PER_LONG - ret;
>> }
> 
> You want:
> asm("clz\t%w0, %w1" : "=r" (ret) : "r" (x));

I understand that - as said, we fixed the issue already.

>> Being mainly an x86 person, when I first saw this I didn't understand
>> how this could be correct, as for aarch64 BITS_PER_LONG is 64, and
>> both operands being 32-bit I expected "clz w, w" to result.
>> Yet I had to learn that no matter what size the C operands, x
>> registers are always being picked. Which still doesn't mean the above
>> is correct - a suitable call chain can leave a previous operation's
>> 64-bit result unconverted, making the above produce a supposedly
>> impossible result greater than 32.
> 
> That is because the full register is xN but you want only the 32bit part of 
> it.
> It is the same issue as on x86_64 where you want the lower 32bit part
> of it that is eax vs rax.

No, it's not: An "unsigned int" asm() operand will result in %eax to
be used (in the absence of any modifiers), while an "unsigned long"
one will produce %rax.

>> Therefore I wonder whether aarch64_print_operand() shouldn't,
>> when neither the 'x' not the 'w' modifier is given, either - like
>> ix86_print_operand() (via print_reg()) - honor
>> GET_MODE_SIZE (GET_MODE (x)), or at the very least warn
>> when that one is more narrow than 64 bits. And yes, I realize that
>> this isn't going to be optimal (and could even be considered
>> inconsistent) as there's no way to express the low half word or
>> byte of a general register, i.e. operands more narrow than 32 bits
>> couldn't be fully checked without also knowing/evaluating the
>> instruction suffix, e.g. by introducing a 'z' operand modifier like
>> x86 has, or extending the existing 'e' one.
> 
> No because sometimes you want to use the full register size as not all
> places where you use a register allows for wN (memory locations for
> one).

Above I was specifically talking about register operands only. If
instead you meant the register used inside a memory reference,
that's true for the base address register (but not the second one
usable as offset, where again the C operand type could control
the one chosen), yet the respective operand should never be of
a 32-bit type (but instead ought to be pointer size).

Jan



Re: [x86-64-psABI] RFC: Add R_X86_64_RELAX_PC32 and R_X86_64_RELAX_PLT32

2015-05-12 Thread Jan Beulich
>>> On 12.05.15 at 20:42,  wrote:
> Here is the updated proposal.  I changed nop prefix from 0x48
> to 0x67 and clarified how foo@GOTPCREL(%rip) should be
> resolved.

Mind clarifying how 67 is better than 48?

> I am proposing to add 2 new relocations, R_X86_64_RELAX_PC32 and
> R_X86_64_RELAX_PLT32:
> 
> 1. They can only be used on 32-bit direct call/jmp instructions.
> 2. call/jmp instructions must have a 0x67 prefix, which is the address
> size prefix and is ignored by 32-bit direct call/jmp instructions.

The same could have been said several years ago about segment
overrides used with conditional branches, yet they obtained a
meaning (even if only affecting performance, not correctness). Is
it anywhere publicly stated that the address size override will
continue to be ignored?

Jan



AVX512 woes

2018-09-20 Thread Jan Beulich
Kirill, others,

in the course of putting together test harness extensions for AVX512
additions to the Xen hypervisor's built-in instruction emulator I've
come across a number of issues. Since it may easily be that I'm
simply not knowing the full background, rather than adding bugzilla
entries for all of them I thought I'd inquire first:

1) An initial idea of mine was to use -ffixed-* to force the use of the
   high 16 {x,y,z}mm registers (effectively by disallowing the use of
   the lower ones), such that I'd easily get EVEX encoded insns for
   whatever is possible to be EVEX-encoded with the given -mavx512*
   option(s). This doesn't even come close to working - all sorts of
   internal compiler errors result for other than the most trivial
   examples, most notably with AVX512VL support enabled. I can't
   observe similar bad effects from using -ffixed-* for other register
   sub-groups. I realize the interactions between the various insns
   the *.md files provide may be difficult to sort out, and perhaps
   the root cause is the same as that of bug 87354, but is this really
   something that's not supposed to work?

2) There looks to be quite wide a mixup of Yk and k constraints on
   insns. Most instructions having mask register outputs can very
   well use %k0, yet they're commonly using "=Yk". Exceptions are
   scatter/gather insns only, afaict. And of course insns using
   destination field masking have to use "Yk" inputs. Is there
   anything I'm overlooking here that prevents "=k" to be used as
   outlined?

2b) Both k and Yk are marked @internal in constraints.md, suggesting
(to me) that I'm not supposed to use these constraints in inline
asm() constructs. If that implication of mine is correct, how would
I express respective constraints?

3) Certain AVX512_VBMI2, AVX512_BITALG, and GFNI+AVX512F inline
   functions are unavailable without AVX512BW also enabled (other than
   implied by SDM, XED, and binutils/gas, and other than for AVX512_VBMI).
   I can see why, without the SDM suggesting so, VBMI implies BW, but
   if this is done, other ISA extensions imo should also enable BW if need
   be, rather than hiding part of their inline/builtin helpers. Or the
   opposite position should be taken and no such implications should be
   made at all - aiui they're there solely for mask register size
   considerations, yet the respective insns could be used without
   masking, in which case no direct dependency on BW exists.

4) Even in very obvious situations there does not appear to be any
   use of embedded broadcasting. Is this something that's planned,
   or something I can only possibly make use of using inline assembly?

5) The VPTERNLOG* instructions look to be heavily underutilized. Not
   only do I observe strange VPTERNLOG*/VMODQA* (and alike) pairs,
   where the latter uses zeroing-masking just to produce a mix of
   all-zeroes and all-ones vector elements, when this same effect
   could have been achieved by using zeroing-masking on the
   VPTERNLOG* right away. Afaict the instructions can even be used
   for any up to 3-way logical (bit-wise boolean) operation for which
   no specific insn exists (with a suitably calculated immediate), yet
   even a simple ~ gets carried out by VPXOR-ing with a vector of all
   ones.

Thanks, Jan




Re: [PATCH] x86: fix CVT{,T}PD2PI insns

2019-06-27 Thread Jan Beulich
>>> On 27.06.19 at 11:03,  wrote:
> With just an "m" constraint misaligned memory operands won't be forced
> into a register, and hence cause #GP. So far this was guaranteed only
> in the case that CVT{,T}PD2DQ were chosen (which looks to be the case on
> x86-64 only).
> 
> Instead of switching the second alternative to Bm, use just m on the
> first and replace nonimmediate_operand by vector_operand.

While doing this and the others where I'm also replacing Bm by uses of
vector_operand, I've started wondering whether Bm couldn't (and then
shouldn't) be dropped altogether, replacing it everywhere by "m"
combined with vector_operand (or vector_memory_operand when
register operands aren't allowed anyway).

Furthermore there's an issue with Bm and vector_memory_operand:
Whether alignment gets enforced depends on TARGET_AVX. However,
just like the two insns in question here, the SHA ones too don't have
VEX-encoded equivalents, and hence require alignment enforced even
with -mavx. Together with the above I wonder whether Bm shouldn't
be re-purposed to express this special requirement.

Jan




Re: [PATCH] x86: fix CVT{,T}PD2PI insns

2019-06-27 Thread Jan Beulich
>>> On 27.06.19 at 12:22,  wrote:
> On Thu, Jun 27, 2019 at 11:10 AM Jan Beulich  wrote:
>>
>> >>> On 27.06.19 at 11:03,  wrote:
>> > With just an "m" constraint misaligned memory operands won't be forced
>> > into a register, and hence cause #GP. So far this was guaranteed only
>> > in the case that CVT{,T}PD2DQ were chosen (which looks to be the case on
>> > x86-64 only).
>> >
>> > Instead of switching the second alternative to Bm, use just m on the
>> > first and replace nonimmediate_operand by vector_operand.
>>
>> While doing this and the others where I'm also replacing Bm by uses of
>> vector_operand, I've started wondering whether Bm couldn't (and then
>> shouldn't) be dropped altogether, replacing it everywhere by "m"
>> combined with vector_operand (or vector_memory_operand when
>> register operands aren't allowed anyway).
> 
> No. Register allocator will propagate unaligned memory in non-AVX
> case, which is not allowed with vector_operand.

I'm afraid I don't understand: Unaligned SIMD memory accesses will
generally fault in non-AVX mode, so such propagation would seem
wrong to me and hence would seem to be correctly not allowed.
Furthermore both vector_operand and Bm resolve to the same
vector_memory_operand. The TARGET_AVX check actually is inside
vector_memory_operand, i.e. affects both the same way.

Jan




Re: [PATCH] x86: fix CVT{,T}PD2PI insns

2019-06-27 Thread Jan Beulich
>>> On 27.06.19 at 13:02,  wrote:
> On Thu, Jun 27, 2019 at 12:47 PM Jan Beulich  wrote:
>>
>> >>> On 27.06.19 at 12:22,  wrote:
>> > On Thu, Jun 27, 2019 at 11:10 AM Jan Beulich  wrote:
>> >>
>> >> >>> On 27.06.19 at 11:03,  wrote:
>> >> > With just an "m" constraint misaligned memory operands won't be forced
>> >> > into a register, and hence cause #GP. So far this was guaranteed only
>> >> > in the case that CVT{,T}PD2DQ were chosen (which looks to be the case on
>> >> > x86-64 only).
>> >> >
>> >> > Instead of switching the second alternative to Bm, use just m on the
>> >> > first and replace nonimmediate_operand by vector_operand.
>> >>
>> >> While doing this and the others where I'm also replacing Bm by uses of
>> >> vector_operand, I've started wondering whether Bm couldn't (and then
>> >> shouldn't) be dropped altogether, replacing it everywhere by "m"
>> >> combined with vector_operand (or vector_memory_operand when
>> >> register operands aren't allowed anyway).
>> >
>> > No. Register allocator will propagate unaligned memory in non-AVX
>> > case, which is not allowed with vector_operand.
>>
>> I'm afraid I don't understand: Unaligned SIMD memory accesses will
>> generally fault in non-AVX mode, so such propagation would seem
>> wrong to me and hence would seem to be correctly not allowed.
>> Furthermore both vector_operand and Bm resolve to the same
>> vector_memory_operand. The TARGET_AVX check actually is inside
>> vector_memory_operand, i.e. affects both the same way.
> 
> "Bm" *prevents* propagation of unaligned access for non-AVX targets.
> As said, register allocator does not care for operand predicates (it
> only looks at operand constraints), so it will propagate unaligned
> access with "m" operand. To avoid propagation, "Bm" should and does
> use vector_memory_operand constraint internally.

Okay, I think I got it now (also because of your reply on the other
thread). It means in the patch here I need to retain Bm rather than
dropping it, too, and additionally use it on the other alternative.

Jan




Re: [PATCH] x86: fix CVT{,T}PD2PI insns

2019-06-27 Thread Jan Beulich
>>> On 27.06.19 at 14:07,  wrote:
> On Thu, Jun 27, 2019 at 1:31 PM Jan Beulich  wrote:
>>
>> >>> On 27.06.19 at 13:02,  wrote:
>> > On Thu, Jun 27, 2019 at 12:47 PM Jan Beulich  wrote:
>> >>
>> >> >>> On 27.06.19 at 12:22,  wrote:
>> >> > On Thu, Jun 27, 2019 at 11:10 AM Jan Beulich  wrote:
>> >> >>
>> >> >> >>> On 27.06.19 at 11:03,  wrote:
>> >> >> > With just an "m" constraint misaligned memory operands won't be 
>> >> >> > forced
>> >> >> > into a register, and hence cause #GP. So far this was guaranteed only
>> >> >> > in the case that CVT{,T}PD2DQ were chosen (which looks to be the 
>> >> >> > case on
>> >> >> > x86-64 only).
>> >> >> >
>> >> >> > Instead of switching the second alternative to Bm, use just m on the
>> >> >> > first and replace nonimmediate_operand by vector_operand.
>> >> >>
>> >> >> While doing this and the others where I'm also replacing Bm by uses of
>> >> >> vector_operand, I've started wondering whether Bm couldn't (and then
>> >> >> shouldn't) be dropped altogether, replacing it everywhere by "m"
>> >> >> combined with vector_operand (or vector_memory_operand when
>> >> >> register operands aren't allowed anyway).
>> >> >
>> >> > No. Register allocator will propagate unaligned memory in non-AVX
>> >> > case, which is not allowed with vector_operand.
>> >>
>> >> I'm afraid I don't understand: Unaligned SIMD memory accesses will
>> >> generally fault in non-AVX mode, so such propagation would seem
>> >> wrong to me and hence would seem to be correctly not allowed.
>> >> Furthermore both vector_operand and Bm resolve to the same
>> >> vector_memory_operand. The TARGET_AVX check actually is inside
>> >> vector_memory_operand, i.e. affects both the same way.
>> >
>> > "Bm" *prevents* propagation of unaligned access for non-AVX targets.
>> > As said, register allocator does not care for operand predicates (it
>> > only looks at operand constraints), so it will propagate unaligned
>> > access with "m" operand. To avoid propagation, "Bm" should and does
>> > use vector_memory_operand constraint internally.
>>
>> Okay, I think I got it now (also because of your reply on the other
>> thread). It means in the patch here I need to retain Bm rather than
>> dropping it, too, and additionally use it on the other alternative.
> 
> The correct solution is a bit more complicated. I don't know if these
> instructions tolerate unaligned operand in non-AVX case.

They don't.

> If they
> don't, then vector_operand should be used and the first alternative
> should be split to avx and non-avx part, where non-avx part uses Bm
> constraint.

Why? Bm takes care to distinguish the AVX and non-AVX cases. That's
how things work elsewhere too, afaict. The bug here really is that the
(non-AVX-only) second alternative didn't also use Bm.

Jan




loading of zeros into {x,y,z}mm registers

2017-11-29 Thread Jan Beulich
Kirill,

in an unrelated context I've stumbled across a change of yours
from Aug 2014 (revision 213847) where you "extend" the ways
of loading zeros into registers. I don't understand why this was
done, and the patch submission mail also doesn't give any reason.
My point is that simple VEX-encoded vxorps/vxorpd/vpxor with
128-bit register operands ought to be sufficient to zero any width
registers, due to the zeroing of the high parts the instructions do.
Hence by using EVEX encoded insns it looks like all you do is grow
the instruction length by one or two bytes (besides making the
source somewhat more complicated to follow). At the very least
the shorter variants should be used for -Os imo.

Thanks for any insight,
Jan



Re: loading of zeros into {x,y,z}mm registers

2017-12-01 Thread Jan Beulich
>>> On 01.12.17 at 06:45,  wrote:
> On 29 Nov 08:59, Jan Beulich wrote:
>> in an unrelated context I've stumbled across a change of yours
>> from Aug 2014 (revision 213847) where you "extend" the ways
>> of loading zeros into registers. I don't understand why this was
>> done, and the patch submission mail also doesn't give any reason.
>> My point is that simple VEX-encoded vxorps/vxorpd/vpxor with
>> 128-bit register operands ought to be sufficient to zero any width
>> registers, due to the zeroing of the high parts the instructions do.
>> Hence by using EVEX encoded insns it looks like all you do is grow
>> the instruction length by one or two bytes (besides making the
>> source somewhat more complicated to follow). At the very least
>> the shorter variants should be used for -Os imo.
> As far as I can recall, this was done since we cannot load zeroes
> into upper 16 MM registers, which are available in EVEX exclusively.

Ah, I did overlook this aspect indeed. I still think the smaller VEX
encoding should then be used for the low 16 registers.

Furthermore this

typedef double __attribute__((vector_size(16))) v2df_t;
typedef double __attribute__((vector_size(32))) v4df_t;

void test1(void) {
register v2df_t x asm("xmm31") = {};
asm volatile("" :: "v" (x));
}

void test2(void) {
register v4df_t x asm("ymm31") = {};
asm volatile("" :: "v" (x));
}

translates to "vxorpd %xmm31, %xmm31, %xmm31" for both
functions with -mavx512vl, yet afaict the instructions would #UD
without AVX-512DQ, which suggests to me that the original
intention wasn't fully met.

Jan



Re: x32 psABI draft version 0.2

2011-02-17 Thread Jan Beulich
>>> On 16.02.11 at 21:04, "H. Peter Anvin"  wrote:
> On 02/16/2011 11:22 AM, H.J. Lu wrote:
>> Hi,
>> 
>> I updated  x32 psABI draft to version 0.2 to change x32 library path
>> from lib32 to libx32 since lib32 is used for ia32 libraries on Debian,
>> Ubuntu and other derivative distributions. The new x32 psABI is
>> available from:
>> 
>> https://sites.google.com/site/x32abi/home 
>> 
> 
> I'm wondering if we should define a section header flag (sh_flags)
> and/or an ELF header flag (e_flags) for x32 for the people unhappy about
> keying it to the ELF class...

Thanks for supporting this!

Besides that I also wonder why all the 64-bit relocations get
marked as LP64-only. It is clear that some of them can be useful
in ILP32 as well, and there's no reason to preclude future uses
even if currently no-one can imagine any.

Furthermore, it seems questionable to continue to require rela
relocations when for all normal ones (leaving aside the 8- and 16-
bit ones) the addend can fit in the relocated field.

Finally, shouldn't R_X86_64_GLOB_DAT and R_X86_64_JUMP_SLOT
also have a field specifier of wordclass rather than word64 (though
'wordclass' by itself would probably be wrong if the tying of the ABI
to the ELF class was eliminated)? And how about R_X86_64_*TP*64
and R_X86_64_TLSDESC?

Jan



Re: x32 psABI draft version 0.2

2011-02-17 Thread Jan Beulich
>>> On 17.02.11 at 16:49, "H.J. Lu"  wrote:
> On Thu, Feb 17, 2011 at 7:44 AM, Jan Hubicka  wrote:
>>> > According to Mozilla folks however REL+RELA scheme used by EABI leads
>>> > to significandly smaller libxul.so size
>>> >
>>> > According to http://glandium.org/blog/?p=1177 the difference is about 
>>> > 4-5MB
>>> > (out of approximately 20-30MB shared lib)
>>>
>>> This is orthogonal to x32 psABI.
>>
>> Understood.  I am just pointing out that x86-64 Mozilla suffers from startup
>> problems (extra 5MB of disk read needed) compared to both x86 and ARM EABI
>> because x86-64 ABI is RELA only. If x86-64 ABI was REL+RELA like EABI is, we
>> would not have this problem here.
>>
> 
> If people want to see REL+RELA in x32, they have to contribute codes.

That's exactly the wrong way round: First the specification has to allow
for (but not require) it, and only then does it make sense to write code.

Jan



Re: x32 psABI draft version 0.2

2011-02-18 Thread Jan Beulich
>>> On 17.02.11 at 18:59, "H.J. Lu"  wrote:
> On Thu, Feb 17, 2011 at 8:11 AM, Jan Beulich  wrote:
>>>>> On 17.02.11 at 16:49, "H.J. Lu"  wrote:
>>> On Thu, Feb 17, 2011 at 7:44 AM, Jan Hubicka  wrote:
>>>>> > According to Mozilla folks however REL+RELA scheme used by EABI leads
>>>>> > to significandly smaller libxul.so size
>>>>> >
>>>>> > According to http://glandium.org/blog/?p=1177 the difference is about 
>>>>> > 4-5MB
>>>>> > (out of approximately 20-30MB shared lib)
>>>>>
>>>>> This is orthogonal to x32 psABI.
>>>>
>>>> Understood.  I am just pointing out that x86-64 Mozilla suffers from 
>>>> startup
>>>> problems (extra 5MB of disk read needed) compared to both x86 and ARM EABI
>>>> because x86-64 ABI is RELA only. If x86-64 ABI was REL+RELA like EABI is, 
>>>> we
>>>> would not have this problem here.
>>>>
>>>
>>> If people want to see REL+RELA in x32, they have to contribute codes.
>>
>> That's exactly the wrong way round: First the specification has to allow
>> for (but not require) it, and only then does it make sense to write code.
>>
> 
> No, it has to be supported at least by static linker and dynamic
> linker. Otherwise, no one can use it.

I'm afraid I have to disagree: ELF (and the psABI) is not specific to
a particular OS, and hence it allowing something doesn't mean the
OS ABI may not restrict it. Hence the psABI first has to at least not
forbid something (as it currently does for REL on x86-64), in order
for an implementation of that something to make sense.

Jan



Re: x32 psABI draft version 0.2

2011-02-18 Thread Jan Beulich
>>> On 18.02.11 at 00:07, Jakub Jelinek  wrote:
> So one way to cut down the size of .rela.dyn section would be a relocation
> like
> R_X86_64_RELATIVE_BLOCK where applying such a relocation with r_offset O and
> r_addend N would be:
> uint64_t *ptr = O;
> for (i = 0; i < N; i++)
>   ptr[i] += bias;
> Then e.g.
> 003ec6d86008  0008 R_X86_64_RELATIVE 
>   003ec5aef3f3
> 003ec6d86010  0008 R_X86_64_RELATIVE 
>   003ec5af92f6
> 003ec6d86018  0008 R_X86_64_RELATIVE 
>   003ec5b06d17
> 003ec6d86020  0008 R_X86_64_RELATIVE 
>   003ec5b1dc5f
> 003ec6d86028  0008 R_X86_64_RELATIVE 
>   003ec5b1edaf
> 003ec6d86030  0008 R_X86_64_RELATIVE 
>   003ec5b27358
> 003ec6d86038  0008 R_X86_64_RELATIVE 
>   003ec5b30f9f
> 003ec6d86040  0008 R_X86_64_RELATIVE 
>   003ec5b3317d
> 003ec6d86048  0008 R_X86_64_RELATIVE 
>   003ec5b34479
> could be represented as:
> 003ec6d86008  00MN R_X86_64_RELATIVE_BLOCK   
>   0009
> I see many hundreds of consecutive R_X86_64_RELATIVE relocs in libxul.so, 
> though
> of course it would need much better analysis over larger body of code.
> 
> In most programs if the library is prelinked all relative relocs are skipped
> and .rela.dyn for them doesn't need to be even paged in, but Mozilla is 
> quite
> special in that it one of the most common security relevant packages and 
> thus
> wants randomization, but is linked against huge libraries, so the question 
> is
> if Mozilla is the right candidate to drive our decisions on.
> 
> Another alternative to compress relative relocations would be an indirect
> relative relocation, which would give you in r_offset address of a block of 
> addresses
> and r_addend the size of that block, and the block would just contain 
> offsets
> on which words need to be += bias.  Then, instead of changing RELA to REL to
> save 8 bytes from 24 you'd save 16 bytes from those 24 (well, for x32 half 
> of that).

For relocations where the relocated field is large enough, considering
chained relocations (as seen in NetWare NLMs) would also be a
possibility, i.e. r_offset specifies just the first relocation that all need
the same addend (and eventual other properties), and the relocated
field holds the r_offset of the next field to be relocated.

Jan



Re: x32 psABI draft version 0.2

2011-02-21 Thread Jan Beulich
>>> On 18.02.11 at 18:53, "H.J. Lu"  wrote:
> How about only allowing REL relocations in executables and DSOes?

That'd be at least part of it, but I'd still prefer not forbidding them
altogether, but also not requiring an implementation to support
them (just to repeat it - in a long abandoned new OS of ours we
had ignored the forbidding, and allowed REL in relocatable objects
[which were the only objects used there, the loadable ones
distinguished from "normal" ones by the presence of some OS-
specific data structures], with the static linker picking the type
depending on the module's needs).

Jan



Re: [x32] Allow R_X86_64_64

2011-08-12 Thread Jan Beulich
>>> On 12.08.11 at 06:37, "H.J. Lu"  wrote:
> On Mon, Aug 1, 2011 at 3:15 PM, H.J. Lu  wrote:
>> Hi,
>>
>> It turns out that x32 needs R_X86_64_64.  One major reason is
>> the displacement range of x32 is -2G to +2G.  It isn't a problem
>> for compiler since only small model is required for x32.
>>
>> However, to address 0 to 4G directly in assembly code, we have
>> to use R_X86_64_64 with movabs.  I am checking the follow patch
>> into x32 psABI to allow R_X86_64_64.
>>
>>
> 
> X32  Linker should treats R_X86_64_64 as R_X86_64_32
> zero-extended to 64bit for output.  I will update x32 psABI with

I'm sorry to say that, but the situation about x32 seems to be
getting worse with each change you do, every time again
revolving around mixing up ABI specification and a particular
implementation thereof.

Here, if you need something zero-extended (though I can't see
why you would), then you should use a new relocation type. As
pointed out before, there are valid possible uses of R_X86_64_64
that would require the semantics of x86-64.

Jan

> ---
> diff --git a/object-files.tex b/object-files.tex
> index 7f0fd14..d1543b5 100644
> --- a/object-files.tex
> +++ b/object-files.tex
> @@ -451,7 +451,7 @@ or \texttt{Elf32_Rel} relocation.
>\multicolumn{1}{c}{Calculation} \\
>\hline
>\texttt{R_X86_64_NONE}  & 0 & none & none \\
> -  \texttt{R_X86_64_64} & 1 & \textit{word64} & \texttt{S + A} \\
> +  \texttt{R_X86_64_64} $^{\dagger\dagger}$ & 1 & \textit{word64}
> & \texttt{S + A} \\
>\texttt{R_X86_64_PC32}  & 2 & \textit{word32} & \texttt{S + A - P} \\
>\texttt{R_X86_64_GOT32} & 3 & \textit{word32} & \texttt{G + A} \\
>\texttt{R_X86_64_PLT32} & 4 & \textit{word32} & \texttt{L + A - P} \\
> @@ -487,6 +487,8 @@ or \texttt{Elf32_Rel} relocation.
>  %  \texttt{R_X86_64_PLT64} & 17 & \textit{word64} & \texttt{L + A - P} \\
>   \cline{1-4}
>  \multicolumn{3}{l}{\small $^\dagger$ This relocation is used only
> for LP64.}\\
> +\multicolumn{3}{l}{\small $^{\dagger\dagger}$ This relocation only
> +appears in relocatable files for X32.}\\
>  \end{tabular}
>\end{center}
>  \Hrule
> 
> 
> I opened:
> 
> http://sourceware.org/bugzilla/show_bug.cgi?id=13082 
> 
> and will fix it.
> 






Re: [x32] Allow R_X86_64_64

2011-08-12 Thread Jan Beulich
>>> On 12.08.11 at 14:09, "H.J. Lu"  wrote:
> On Fri, Aug 12, 2011 at 12:30 AM, Jan Beulich  wrote:
>>>>> On 12.08.11 at 06:37, "H.J. Lu"  wrote:
>>> On Mon, Aug 1, 2011 at 3:15 PM, H.J. Lu  wrote:
>>>> Hi,
>>>>
>>>> It turns out that x32 needs R_X86_64_64.  One major reason is
>>>> the displacement range of x32 is -2G to +2G.  It isn't a problem
>>>> for compiler since only small model is required for x32.
>>>>
>>>> However, to address 0 to 4G directly in assembly code, we have
>>>> to use R_X86_64_64 with movabs.  I am checking the follow patch
>>>> into x32 psABI to allow R_X86_64_64.
>>>>
>>>>
>>>
>>> X32  Linker should treats R_X86_64_64 as R_X86_64_32
>>> zero-extended to 64bit for output.  I will update x32 psABI with
>>
>> I'm sorry to say that, but the situation about x32 seems to be
>> getting worse with each change you do, every time again
>> revolving around mixing up ABI specification and a particular
>> implementation thereof.
>>
>> Here, if you need something zero-extended (though I can't see
>> why you would), then you should use a new relocation type. As
>> pointed out before, there are valid possible uses of R_X86_64_64
>> that would require the semantics of x86-64.
>>
> 
> When does x32 need the semantics of x86-64 for R_X86_64_64?

When referencing an assembler or linker defined constant that
exceeds 32-bit in width. Given that this is a 64-bit architecture
with 32-bit addresses, at least I would expect such to work.

> No, you can't mix ELF32 with ELF64.

That wasn't the question, but clearly your tying of x32 to ELF32 is
another limitation that makes such assembler defined constants not
work (not sure about linker script defined ones).

Jan



Re: [x32] Allow R_X86_64_64

2011-08-12 Thread Jan Beulich
>>> On 12.08.11 at 15:22, "H.J. Lu"  wrote:
> On Fri, Aug 12, 2011 at 6:17 AM, Jan Beulich  wrote:
>>>>> On 12.08.11 at 14:09, "H.J. Lu"  wrote:
>>> On Fri, Aug 12, 2011 at 12:30 AM, Jan Beulich  wrote:
>>>>>>> On 12.08.11 at 06:37, "H.J. Lu"  wrote:
>>>>> On Mon, Aug 1, 2011 at 3:15 PM, H.J. Lu  wrote:
>>>>>> Hi,
>>>>>>
>>>>>> It turns out that x32 needs R_X86_64_64.  One major reason is
>>>>>> the displacement range of x32 is -2G to +2G.  It isn't a problem
>>>>>> for compiler since only small model is required for x32.
>>>>>>
>>>>>> However, to address 0 to 4G directly in assembly code, we have
>>>>>> to use R_X86_64_64 with movabs.  I am checking the follow patch
>>>>>> into x32 psABI to allow R_X86_64_64.
>>>>>>
>>>>>>
>>>>>
>>>>> X32  Linker should treats R_X86_64_64 as R_X86_64_32
>>>>> zero-extended to 64bit for output.  I will update x32 psABI with
>>>>
>>>> I'm sorry to say that, but the situation about x32 seems to be
>>>> getting worse with each change you do, every time again
>>>> revolving around mixing up ABI specification and a particular
>>>> implementation thereof.
>>>>
>>>> Here, if you need something zero-extended (though I can't see
>>>> why you would), then you should use a new relocation type. As
>>>> pointed out before, there are valid possible uses of R_X86_64_64
>>>> that would require the semantics of x86-64.
>>>>
>>>
>>> When does x32 need the semantics of x86-64 for R_X86_64_64?
>>
>> When referencing an assembler or linker defined constant that
>> exceeds 32-bit in width. Given that this is a 64-bit architecture
>> with 32-bit addresses, at least I would expect such to work.
>>
> 
> Yes, it should work just fine for x32 by zero-extending 32bit
> address to 64bit.

For a constant that has more than 32 significant bits???

Jan



Re: [x32] Allow R_X86_64_64

2011-08-14 Thread Jan Beulich
>>> On 12.08.11 at 19:53, "H.J. Lu"  wrote:
> For R_X86_64_64 relocation, if addend is 0, linker will
> turn it to R_X86_64_32 and zero-extends it to 64bit. Otherwise,

Rather odd a treatment, especially since at the source level one can't
necessarily control the addend emitted.

> linker will generate R_X86_64_RELATIVE64 if it is a relocation
> against a local symbol or keep R_X86_64_64 for relocations
> against external symbols. We need R_X86_64_RELATIVE64
> since R_X86_64_RELATIVE only updates 32bit destination.
> 

I can't see how this deals with the example I gave, but I guess I
should first try it out before complaining...

Jan



Re: RFC: Add 32bit x86-64 support to binutils

2011-01-03 Thread Jan Beulich
>>> On 30.12.10 at 21:02, "H.J. Lu"  wrote:
> 
> Here is the ILP32 psABI:
> 
> http://www.kernel.org/pub/linux/devel/binutils/ilp32/ 
> 

I think it is a gross misconception to tie the ABI to the ELF class of
an object. Specifying the ABI should imo be done via e_flags or
one of the unused bytes of e_ident, and in all reality the ELF class
should *only* affect the file layout (and 64-bit should never have
forbidden to use 32-bit ELF containers; similarly 64-bit ELF objects
may have uses for 32-bit architectures/ABIs, e.g. when debug
information exceeds the 4G boundary).

Jan



Re: RFC: Add 32bit x86-64 support to binutils

2011-01-04 Thread Jan Beulich
>>> On 04.01.11 at 21:02, Jakub Jelinek  wrote:
> On Tue, Jan 04, 2011 at 10:35:42AM -0800, H. Peter Anvin wrote:
>> On 01/04/2011 09:56 AM, H.J. Lu wrote:
>> >>
>> >> I think it is a gross misconception to tie the ABI to the ELF class of
>> >> an object. Specifying the ABI should imo be done via e_flags or
>> >> one of the unused bytes of e_ident, and in all reality the ELF class
>> >> should *only* affect the file layout (and 64-bit should never have
>> >> forbidden to use 32-bit ELF containers; similarly 64-bit ELF objects
>> >> may have uses for 32-bit architectures/ABIs, e.g. when debug
>> >> information exceeds the 4G boundary).
>> > 
>> > I agree with you in principle. But I think it should be done via
>> > a new attribute section, similar to ARM.
>> > 
>> 
>> Oh god, please, no.
>> 
>> I have to say I'm highly questioning to Jan's statement in the first
>> place.  Crossing 32- and 64-bit ELF like that sounds like a kernel
>> security hole waiting to happen.

A particular OS/kernel has the freedom to not implement support for
other than the default format. But having the ABI disallow it
altogether certainly isn't the right choice. And yes, we had been
allowing cross-bitness ELF in an experimental (long canceled) OS
of ours.

> Yeah, and there are other targets where the elf class determines ABI
> too (e.g. EM_S390 is used for both 31-bit and 64-bit binaries and
> the ELF class determines which).

So the usual thing is going to happen - someone made a mistake (I'm
convinced the ELF class was never meant to affect anything but the
file format), and this gets taken as an excuse to let the mistake
spread.

Jan



Re: RFC: Add 32bit x86-64 support to binutils

2011-01-05 Thread Jan Beulich
>>> On 05.01.11 at 09:01, "H. Peter Anvin"  wrote:
> On 01/04/2011 11:46 PM, Jan Beulich wrote:
>>>>
>>>> Oh god, please, no.
>>>>
>>>> I have to say I'm highly questioning to Jan's statement in the first
>>>> place.  Crossing 32- and 64-bit ELF like that sounds like a kernel
>>>> security hole waiting to happen.
>> 
>> A particular OS/kernel has the freedom to not implement support for
>> other than the default format. But having the ABI disallow it
>> altogether certainly isn't the right choice. And yes, we had been
>> allowing cross-bitness ELF in an experimental (long canceled) OS
>> of ours.
>> 
>>> Yeah, and there are other targets where the elf class determines ABI
>>> too (e.g. EM_S390 is used for both 31-bit and 64-bit binaries and
>>> the ELF class determines which).
>> 
>> So the usual thing is going to happen - someone made a mistake (I'm
>> convinced the ELF class was never meant to affect anything but the
>> file format), and this gets taken as an excuse to let the mistake
>> spread.
>> 
> 
> I don't think it's all that unreasonable to say the ELF class affects
> the ABI.  After all, there are lots of things about the ABI that is
> related to the ELF class -- the format of the GOT and PLT, for one thing.

That's in executables and dynamic objects only. I'm not aware of
anything in relocatable objects, and I'd question it for core files.
The ABI, however, has to cover all of them.

Jan



preprocessing question

2006-09-25 Thread Jan Beulich
Can anyone set me strait on why, in the following code fragment

int x(unsigned);

struct alt_x {
unsigned val;
};

#define xalt_x
#define alt_x(p) x(p+1)

int test(struct x *p) {
return x(p->val);
}

the function invoked in test() is alt_x (rather than x)? I would have
expected that the preprocessor
- finds that x is an object like macro, and replaces it with alt_x
- finds that alt_x is a function-like macro and replaces it with x(...)
- finds that again x is an object like macro, but recognizes that it
already participated in expansion, so doesn't replace x by alt_x a
second time.

Our compiler team also considers this misbehavior, but since I
tested three other compilers, and they all behave the same, I
continue to wonder if I'm mis-reading something in the standard.

Thanks a lot, Jan


Re: preprocessing question

2006-09-26 Thread Jan Beulich
>>> Daniel Jacobowitz <[EMAIL PROTECTED]> 25.09.06 18:43 >>>
>On Mon, Sep 25, 2006 at 05:23:34PM +0200, Jan Beulich wrote:
>> Can anyone set me strait on why, in the following code fragment
>> 
>> int x(unsigned);
>> 
>> struct alt_x {
>>  unsigned val;
>> };
>> 
>> #define xalt_x
>> #define alt_x(p) x(p+1)
>> 
>> int test(struct x *p) {
>>  return x(p->val);
>> }
>> 
>> the function invoked in test() is alt_x (rather than x)? I would have
>> expected that the preprocessor
>> - finds that x is an object like macro, and replaces it with alt_x
>> - finds that alt_x is a function-like macro and replaces it with x(...)
>> - finds that again x is an object like macro, but recognizes that it
>> already participated in expansion, so doesn't replace x by alt_x a
>> second time.
>
>Why do you think that x has already participated in expansion?  It
>hasn't paricipated in the expansion of the function-like macro
>alt_x, which is what is being considered, if I'm reading c99 right,
>because no nested replacement of x occurred within the processing
>of alt_x().  It's a different scan.

While, as Andreas also pointed out, the standard is a little vague in
some of what it tries to explain here, it is in my opinion clearly said
that the re-scanning restrictions are bound to the macro name, not
the fact that a function-like macro's expansion result is being
re-scanned. Hence, the re-scanning process of x has to be
considered still in progress while expanding alt_x, and consequently
x should not be subject to expansion anymore.

Jan


RE: preprocessing question

2006-09-26 Thread Jan Beulich
 #define xalt_x
>
>the preprocessor token "x" is an object-like macro standing for "alt_x", so
>when we get to
>
 #define alt_x(p) x(p+1)
>
>  what the preprocessor sees after the first round of expansion is
>
>#define alt_x(p) alt_x(p+1)

As pointed out before - there is *no* expansion for preprocessing
directives, except where the standard explicitly says otherwise.

Jan


Fwd: Re: Visibility=hidden for x86_64 Xen builds -- problems?

2006-09-28 Thread Jan Beulich
In building Xen we observed a build problem when using binutils 2.15
that wasn't visible for those of us using newer binutils versions.
However, I believe that we should have seen this in all cases.

Xen gets compiled with -fPIC, and we recently added a global visibility
pragma to avoid the cost of going through the GOT for all access to
global data objects (PIC isn't really needed here, all we need is
sufficient compiler support to get the final image located outside the
+/-2Gb ranges, but large model support is neither there in older
compilers nor do we really need all of it either).

In a kallsyms-like approach, symbol information gets embedded in the
final executable, with the first linking stage not having available the
respective table symbols. For that reason, they are being attributed
weak.
After adding the global visibility hidden pragma, even these weak
symbols get accessed (or their address calculated) via RIP-relative
addressing. While accessing them this way is probably acceptable
from the compiler's perspective (given the hidden attribute it may
safely assume the symbol is in the same executable image as the
accessing code), calculating its address certainly isn't, as the symbol
may not be present at all (and after all, comparing the address of
the weak object against NULL is the only method I know to check
presence of the symbol at runtime).

So the questions are:

1) Why does gcc not use a GOT reference when calculating the 
address of a weak symbol here?

2) Why does the linker silently resolve the (32-bit PC-relative)
relocation targeting an undefined weak symbol, yielding at
run-time a non-zero address? While I can see the point of
assisting the compiler here under the assumption that it has
checked the address elsewhere and hence the actual access
is supposed to never happen at runtime, detecting the
(incorrect) use of the same relocation (access method) in
either assembly code or address calculations should be
mandatory; to distinguish the two, two distinct relocation
types would then be needed (one that keeps the linker
silent, and another one that doesn't).

Thanks, Jan

>>> Keir Fraser <[EMAIL PROTECTED]> 28.09.06 10:56 >>>
>On 28/9/06 09:23, "Keir Fraser" <[EMAIL PROTECTED]> wrote:
>> On 28/9/06 07:46, "Jan Beulich" <[EMAIL PROTECTED]> wrote:
>>>>>> Keir Fraser <[EMAIL PROTECTED]> 27.09.06 20:14 >>>
>>>> So it seems that older versions of gcc (before 4.1.1) don't do anything 
>>>> more
>>>> with the pragma than -fvisibility=hidden. So currently the pragma at best
>>>> does nothing (extern references still go through GOT) and at worst breaks
>>>> the build. :-)
>>> 
>>> That'd be contrary to my observations; I'll check into this.
>> 
>> Thanks. I am using a personal build of vanilla gcc-4.1.1 by the way.
>
>...and that's the problem. I'm using it with a too-old version of binutils
>(version 2.15). Pretty much any newer version seems to relocate the weak
>reference to 8300 (i.e., I guess rounds down to a 2GB boundary).




frame unwind issue with discontiguous code

2006-09-28 Thread Jan Beulich
While I'm not certain whether gcc is able to split one function's code
between different sections (if for nothing else, this might help reduce
TLB pressure by moving code unlikely to be executed not just out of
the main function body), by way of inline assembly the Linux kernel
certainly does in many places. Obviously, pure assembly make use
of such even more heavily.

However, when frame unwind information is generated, one quickly
becomes aware of a problem with this - the unwind information at a
continuation point in other than the base section would need to
replicate all unwind directives (note that DW_CFA_remember_state
and DW_CFA_restore_state are not suitable here, as there need
to be separate FDEs attached to the secondary code fragments).
While this is generally possible (albeit tedious) in pure assembly code,
doing so in inline assembly doesn't seem to be possible in any way
(the compiler may not even use .cfi_* directives to emit frame
unwind info).

To cover all cases, it would basically appear to be necessary to
add a referral op to the set of DW_CFA_* ops, which would
indicate that the frame state at the given point is to be derived
by assuming the location counter would in fact be at the origin
of the control transfer).

As I don't know how to approach requesting an addition like this
to the Dwarf standard, I'm trying my luck here.

Any pointers or suggestions are greatly appreciated.

Thanks, Jan


Re: Fwd: Re: Visibility=hidden for x86_64 Xen builds -- problems?

2006-09-28 Thread Jan Beulich
>>> "H. J. Lu" <[EMAIL PROTECTED]> 28.09.06 15:24 >>>
>On Thu, Sep 28, 2006 at 10:45:38AM +0100, Jan Beulich wrote:
>> 
>> 2) Why does the linker silently resolve the (32-bit PC-relative)
>> relocation targeting an undefined weak symbol, yielding at
>> run-time a non-zero address? While I can see the point of
>
>Do you have a testcase? I can't reproduce it. If it is true, I consider
>it a linker bug.

Attached. The linker script likely is not minimal, but I think the important
point is that it sets the origin to a non-zero value.

Compiling this with gcc 4.1.1 (-c -fPIC) and linking with ld 2.17 (no other
options than those necessary to specify input and output) succeeds,
while linking with ld 2.15 fails (due to relocation overflow).

But again, if this is plainly a linker bug, then the compiler also must not
access weak objects through RIP-relative addressing (i.e. then we also
have a compiler bug here), while I continue to think that the fact that
there is a 'hidden' attribute should allow the compiler to do better than
going through GOT (at the expense of a new relocation type).

Jan


got.lds
Description: Binary data


got.c
Description: Binary data


unusable libatomic.so built in certain environments

2013-06-17 Thread Jan Beulich
In an environment with relatively old core components (dynamic
loader and glibc) but with up-to-date binutils (perhaps built along
with gcc) libatomic.so gets built in a way such that it is unusable
on the build system. A similar issue was reported in a mail leading
to http://gcc.gnu.org/ml/gcc-patches/2013-02/msg00315.html,
but I don't view switching back to old binutils as an acceptable
option.

Looking at the libatomic configury, I also do not see a way to
suppress the use of GNU IFUNC symbols. Am I overlooking
something, or is this an outright bug in a configuration no-one
really ever thought about? At least in a non-cross build I'd
expect runtime properties to be taken into account here. And
for cross builds I'd expect a way to control whether the final
binary would be using GNU IFUNC symbols rather than just
making this dependent upon tool chain capabilities.

Thanks, Jan



Re: [x86-64 psABI] RFC: Extend x86-64 PLT entry to support MPX

2013-08-08 Thread Jan Beulich
>>> On 08.08.13 at 02:33, "H.J. Lu"  wrote:
> We use the .gnu_attribute directive to record an object attribute:
> 
> enum
> {
>   Tag_GNU_X86_EXTERN_BRANCH = 4,
> };
> 
> for the types of external branch instructions in relocatable files.
> 
> enum
> {
>   /* All external branch instructions are legacy.  */
>   Val_GNU_X86_EXTERN_BRANCH_LEGACY = 0,
>   /* There is at lease one external branch instruction with BND prefix.  */
>   Val_GNU_X86_EXTERN_BRANCH_BND = 1,
> };
> 
> An x86 feature note section, .note.x86-feature, is used to indicate
> features in executables and shared library. The contents of this note
> section are:
> 
> .section.note.x86-feature
> .align  4
> .long   .L1 - .L0
> .long   .L3 - .L2
> .long   1
> .L0:
> .asciz "x86 feature"
> .L1:
> .align  4
> .L2:
> .longFeatureFlag (Feature flag)
> .L3:
> 
> The current valid bits in FeatureFlag are
> 
> #define NT_X86_FEATURE_PLT_BND(0x1 << 0)
> 
> It should be set if PLT entry has BND prefix to preserve bound registers.
> 
> The remaining bits in FeatureFlag are reserved.
> 
> When merging Tag_GNU_X86_EXTERN_BRANCH, if any input relocatable
> file has Tag_GNU_X86_EXTERN_BRANCH set to Val_GNU_X86_EXTERN_BRANCH_BND,
> the resulting Tag_GNU_X86_EXTERN_BRANCH value should be
> Val_GNU_X86_EXTERN_BRANCH_BND.
> 
> When generating executable or shared library, if PLT is needed and
> Tag_GNU_X86_EXTERN_BRANCH value is Val_GNU_X86_EXTERN_BRANCH_BND,
> the 32-byte PLT entry should be used and the feature note section should
> be generated with the NT_X86_FEATURE_PLT_BND bit set to 1 and the feature
> note section should be included in PT_NOTE segment. The benefit of the
> note section is it is backward compatible with existing run-time and tools.

While I can see the purpose of the attribute section, I don't see
what the note section is for: You don't mention at all what it's
consumed for, and I also can't see how it validly would be for
anything. That's because iirc note section contents, if not
understood by the consumer, is required to not have any effect
on the correctness of the program. Hence if loaded on a system
that MPX capable, has an MPX aware kernel, but no MPX aware
user space (apart from this one executable or shared library, or
a set thereof), it ought to still work correctly. Which - afaict - it
won't if the dynamic loader itself isn't MPX aware.

Jan



Re: [x86-64 psABI] RFC: Extend x86-64 PLT entry to support MPX

2013-08-09 Thread Jan Beulich
>>> On 08.08.13 at 18:01, "H.J. Lu"  wrote:
> On Thu, Aug 8, 2013 at 12:19 AM, Jan Beulich  wrote:
>>>>> On 08.08.13 at 02:33, "H.J. Lu"  wrote:
>>> We use the .gnu_attribute directive to record an object attribute:
>>>
>>> enum
>>> {
>>>   Tag_GNU_X86_EXTERN_BRANCH = 4,
>>> };
>>>
>>> for the types of external branch instructions in relocatable files.
>>>
>>> enum
>>> {
>>>   /* All external branch instructions are legacy.  */
>>>   Val_GNU_X86_EXTERN_BRANCH_LEGACY = 0,
>>>   /* There is at lease one external branch instruction with BND prefix.  */
>>>   Val_GNU_X86_EXTERN_BRANCH_BND = 1,
>>> };
>>>
>>> An x86 feature note section, .note.x86-feature, is used to indicate
>>> features in executables and shared library. The contents of this note
>>> section are:
>>>
>>> .section.note.x86-feature
>>> .align  4
>>> .long   .L1 - .L0
>>> .long   .L3 - .L2
>>> .long   1
>>> .L0:
>>> .asciz "x86 feature"
>>> .L1:
>>> .align  4
>>> .L2:
>>> .longFeatureFlag (Feature flag)
>>> .L3:
>>>
>>> The current valid bits in FeatureFlag are
>>>
>>> #define NT_X86_FEATURE_PLT_BND(0x1 << 0)
>>>
>>> It should be set if PLT entry has BND prefix to preserve bound registers.
>>>
>>> The remaining bits in FeatureFlag are reserved.
>>>
>>> When merging Tag_GNU_X86_EXTERN_BRANCH, if any input relocatable
>>> file has Tag_GNU_X86_EXTERN_BRANCH set to Val_GNU_X86_EXTERN_BRANCH_BND,
>>> the resulting Tag_GNU_X86_EXTERN_BRANCH value should be
>>> Val_GNU_X86_EXTERN_BRANCH_BND.
>>>
>>> When generating executable or shared library, if PLT is needed and
>>> Tag_GNU_X86_EXTERN_BRANCH value is Val_GNU_X86_EXTERN_BRANCH_BND,
>>> the 32-byte PLT entry should be used and the feature note section should
>>> be generated with the NT_X86_FEATURE_PLT_BND bit set to 1 and the feature
>>> note section should be included in PT_NOTE segment. The benefit of the
>>> note section is it is backward compatible with existing run-time and tools.
>>
>> While I can see the purpose of the attribute section, I don't see
>> what the note section is for: You don't mention at all what it's
>> consumed for, and I also can't see how it validly would be for
>> anything. That's because iirc note section contents, if not
>> understood by the consumer, is required to not have any effect
>> on the correctness of the program. Hence if loaded on a system
>> that MPX capable, has an MPX aware kernel, but no MPX aware
>> user space (apart from this one executable or shared library, or
>> a set thereof), it ought to still work correctly. Which - afaict - it
>> won't if the dynamic loader itself isn't MPX aware.
>>
> 
> The note section isn't required for correctness.  But it can be used
> by ld.so to select an alternate MPX aware shared library in a different
> directory, instead of a legacy one.

Okay, that clarifies your intentions with the note section. However,
then you need something else to make sure an MPX aware app can't
load on an MPX enabled kernel without MPX-enabled ld.so.

> There is another way to encode this information in the first entry
> of PLT:
> 
>0:ff 35 00 00 00 00pushq  GOT+8(%rip)
>6:f2 ff 25 00 00 00 00 bnd jmpq *GOT+16(%rip)
>d:0f 1f 44 00 00   nopl   0x0(%rax,%rax,1)
>   12:0f 1f 80 00 00 00 00 nopl   0x0(%rax)
>   19:0f 1f 80 00 00 00 01 nopl   0x100(%rax)
> 
> We can encode PLT property in 10 (4 + 4 + 2) bytes of
> displacements of 3 nops.  In this example, the first bit
> of the last byte of PLT0 is 1.

While a nice idea, I think that's worse, because much harder to
determine from simply dumping information for a given binary.

Jan



Re: [x86-64 psABI] RFC: Extend x86-64 PLT entry to support MPX

2013-08-12 Thread Jan Beulich
>>> On 09.08.13 at 19:03, "H.J. Lu"  wrote:
> On Fri, Aug 9, 2013 at 12:08 AM, Jan Beulich  wrote:
>>>>> On 08.08.13 at 18:01, "H.J. Lu"  wrote:
>>> On Thu, Aug 8, 2013 at 12:19 AM, Jan Beulich  wrote:
>>>>>>> On 08.08.13 at 02:33, "H.J. Lu"  wrote:
>>>>> We use the .gnu_attribute directive to record an object attribute:
>>>>>
>>>>> enum
>>>>> {
>>>>>   Tag_GNU_X86_EXTERN_BRANCH = 4,
>>>>> };
>>>>>
>>>>> for the types of external branch instructions in relocatable files.
>>>>>
>>>>> enum
>>>>> {
>>>>>   /* All external branch instructions are legacy.  */
>>>>>   Val_GNU_X86_EXTERN_BRANCH_LEGACY = 0,
>>>>>   /* There is at lease one external branch instruction with BND prefix.  
>>>>> */
>>>>>   Val_GNU_X86_EXTERN_BRANCH_BND = 1,
>>>>> };
>>>>>
>>>>> An x86 feature note section, .note.x86-feature, is used to indicate
>>>>> features in executables and shared library. The contents of this note
>>>>> section are:
>>>>>
>>>>> .section.note.x86-feature
>>>>> .align  4
>>>>> .long   .L1 - .L0
>>>>> .long   .L3 - .L2
>>>>> .long   1
>>>>> .L0:
>>>>> .asciz "x86 feature"
>>>>> .L1:
>>>>> .align  4
>>>>> .L2:
>>>>> .longFeatureFlag (Feature flag)
>>>>> .L3:
>>>>>
>>>>> The current valid bits in FeatureFlag are
>>>>>
>>>>> #define NT_X86_FEATURE_PLT_BND(0x1 << 0)
>>>>>
>>>>> It should be set if PLT entry has BND prefix to preserve bound registers.
>>>>>
>>>>> The remaining bits in FeatureFlag are reserved.
>>>>>
>>>>> When merging Tag_GNU_X86_EXTERN_BRANCH, if any input relocatable
>>>>> file has Tag_GNU_X86_EXTERN_BRANCH set to Val_GNU_X86_EXTERN_BRANCH_BND,
>>>>> the resulting Tag_GNU_X86_EXTERN_BRANCH value should be
>>>>> Val_GNU_X86_EXTERN_BRANCH_BND.
>>>>>
>>>>> When generating executable or shared library, if PLT is needed and
>>>>> Tag_GNU_X86_EXTERN_BRANCH value is Val_GNU_X86_EXTERN_BRANCH_BND,
>>>>> the 32-byte PLT entry should be used and the feature note section should
>>>>> be generated with the NT_X86_FEATURE_PLT_BND bit set to 1 and the feature
>>>>> note section should be included in PT_NOTE segment. The benefit of the
>>>>> note section is it is backward compatible with existing run-time and 
>>>>> tools.
>>>>
>>>> While I can see the purpose of the attribute section, I don't see
>>>> what the note section is for: You don't mention at all what it's
>>>> consumed for, and I also can't see how it validly would be for
>>>> anything. That's because iirc note section contents, if not
>>>> understood by the consumer, is required to not have any effect
>>>> on the correctness of the program. Hence if loaded on a system
>>>> that MPX capable, has an MPX aware kernel, but no MPX aware
>>>> user space (apart from this one executable or shared library, or
>>>> a set thereof), it ought to still work correctly. Which - afaict - it
>>>> won't if the dynamic loader itself isn't MPX aware.
>>>>
>>>
>>> The note section isn't required for correctness.  But it can be used
>>> by ld.so to select an alternate MPX aware shared library in a different
>>> directory, instead of a legacy one.
>>
>> Okay, that clarifies your intentions with the note section. However,
>> then you need something else to make sure an MPX aware app can't
>> load on an MPX enabled kernel without MPX-enabled ld.so.
> 
> The MPX enabled app will still run correctly.  ld.so will clear the bound
> registers (that makes unlimited bound) for the first call with lazy binding.

Only if those registers are used for their primary purpose. The
documentation specifically says that this isn't a requirement.
But anyway, I see we're once again not going to get anywhere
with this...

Jan



Re: [Xen-devel] coverage license information

2013-02-04 Thread Jan Beulich
>>> On 04.02.13 at 17:46, Ian Lance Taylor  wrote:
> On Mon, Feb 4, 2013 at 6:54 AM, Frediano Ziglio
>  wrote:
>>
>> I imported some headers from Linux kernel which mainly came from
>> gcov-io.h and the structures used internally by GCC.
>>
>> Our problem is currently about the license. In gcov-io.h is stated that
>> license is mainly GPL2 which the exception that linking the "library"
>> with other files does not cause these files to be GPL2. Now however I'm
>> not linking to any library but just using the structure declaration
>> inside the header to produce a blob that is currently converted into GCC
>> files by an external utility (Xen has not file system so we extract
>> coverage information).
>>
>> It's not a problem to use these headers/structure from Xen (which is
>> GPL2) but we'd like to have these defines in our public include headers.
>> The license however of these headers is quite open and allow to be used
>> for instance in commercial programs. How the license would affect these
>> programs?
>>
>> Another question we have is the stability of these structures. Can we
>> just check the version field of gcov_info to make sure that the internal
>> structure is not changed or is it expected that even this field would
>> change (for instance position or size inside the structure) ?
> 
> You neglected to say which version of GCC you are using.  In current
> GCC the header file gcov-io.h is under GPLv3 with the GCC Runtime
> Library Exception 3.1
> (http://www.gnu.org/licenses/gcc-exception-3.1.html).
> 
> I don't fully grasp the situation in which a user of xen would want to
> #include this header file.  But if a program does #include the header
> file, then in the strictest possible reading that program would be
> covered by GPLv3 plus the GCC Runtime Library Exception.  That would
> impose certain requirements on the program, basically that if it is
> compiled by a version of GCC with a proprietary extension, the program
> may not be distributed in binary form.

You probably meant "binary only form" here?

Jan

>  Those requirements already
> apply to essentially any program compiled by a current version of GCC.
>  Inciuding the header file gcov-io.h should not add any additional
> requirements.
> 
> Hope this helps.  This is of course not legal advice, but you are
> unlikely to get good legal advice in this area.
> 
> Ian
> 
> ___
> Xen-devel mailing list
> xen-de...@lists.xen.org 
> http://lists.xen.org/xen-devel 





Re: PATCH: PR gas/5534: "XXX PTR" isn't checked properly in Intel syntax

2008-01-04 Thread Jan Beulich
While I agree on the subject, I slightly disagree on the approach you took: The 
added flags shouldn't go on the instructions, but on their operands (otherwise 
you'll likely end up creating more special case code namely for movzx/movsx, 
but perhaps also elsewhere): Just like for registers, memory operands should 
properly specify what sizes are acceptable (basically, operand type and operand 
size should probably be decoupled). Jan

>>> "H.J. Lu" <[EMAIL PROTECTED]> 01/02/08 9:54 PM >>>
If an instruction is marked with IgnoreSize, we don't check for
memory size in Intel mode. I am checking in this patch to create
the infrastructure to handle it properly. I will fix movq first
and work on others later. Eventually, the x86 assembler will check
memory size for all instructions in Intel mode.


H.J.

gas/

2008-01-02  H.J. Lu  <[EMAIL PROTECTED]>

PR gas/5534
* config/tc-i386.c (match_template): Handle XMMWORD_MNEM_SUFFIX.
Check memory size in Intel mode.
(process_suffix): Handle XMMWORD_MNEM_SUFFIX.
(intel_e09): Likewise.

* config/tc-i386.h (XMMWORD_MNEM_SUFFIX): New.

gas/testsuite/

2008-01-02  H.J. Lu  <[EMAIL PROTECTED]>

PR gas/5534
* gas/i386/intel.s: Use QWORD on movq instead of DWORD.

* gas/i386/inval.s: Add tests for movq.
* gas/i386/x86-64-inval.s: Likewise.

* gas/i386/inval.l: Updated.
* gas/i386/x86-64-inval.l: Likewise.

opcodes/

2008-01-02  H.J. Lu  <[EMAIL PROTECTED]>

PR gas/5534
* i386-gen.c (opcode_modifiers): Add No_xSuf, CheckSize,
Byte, Word, Dword, QWord and Xmmword.

* i386-opc.h (No_xSuf): New.
(CheckSize): Likewise.
(Byte): Likewise.
(Word): Likewise.
(Dword): Likewise.
(QWord): Likewise.
(Xmmword): Likewise.
(FWait): Updated.
(i386_opcode_modifier): Add No_xSuf, CheckSize, Byte, Word,
Dword, QWord and Xmmword.

* i386-opc.tbl: Add CheckSize|QWord to movq if IgnoreSize is
used.
* i386-tbl.h: Regenerated.

--- binutils/gas/config/tc-i386.c.ptr   2007-12-31 10:53:14.0 -0800
+++ binutils/gas/config/tc-i386.c   2008-01-02 12:24:58.0 -0800
@@ -3047,6 +3047,8 @@ match_template (void)
 suffix_check.no_qsuf = 1;
   else if (i.suffix == LONG_DOUBLE_MNEM_SUFFIX)
 suffix_check.no_ldsuf = 1;
+  else if (i.suffix == XMMWORD_MNEM_SUFFIX)
+suffix_check.no_xsuf = 1;
 
   for (t = current_templates->start; t < current_templates->end; t++)
 {
@@ -3077,6 +3079,18 @@ match_template (void)
  || (t->opcode_modifier.no_ldsuf && suffix_check.no_ldsuf)))
continue;
 
+  /* Check memory size in Intel mode if needed when it is provided
+and isn't ignored.  */
+  if (intel_syntax
+ && (i.suffix || !t->opcode_modifier.ignoresize)
+ && t->opcode_modifier.checksize
+ && !((t->opcode_modifier.byte && suffix_check.no_bsuf)
+  || (t->opcode_modifier.word && suffix_check.no_wsuf)
+  || (t->opcode_modifier.dword && suffix_check.no_lsuf)
+  || (t->opcode_modifier.qword && suffix_check.no_qsuf)
+  || (t->opcode_modifier.xmmword && suffix_check.no_xsuf)))
+   continue;
+
   for (j = 0; j < MAX_OPERANDS; j++)
operand_types [j] = t->operand_types [j];
 
@@ -3453,6 +3467,11 @@ process_suffix (void)
  if (!check_word_reg ())
return 0;
}
+  else if (i.suffix == XMMWORD_MNEM_SUFFIX)
+   {
+ /* Skip if the instruction has x suffix.  match_template
+should check if it is a valid suffix.  */
+   }
   else if (intel_syntax && i.tm.opcode_modifier.ignoresize)
/* Do nothing if the instruction is going to ignore the prefix.  */
;
@@ -3535,7 +3554,9 @@ process_suffix (void)
   /* Change the opcode based on the operand size given by i.suffix;
  We don't need to change things for byte insns.  */
 
-  if (i.suffix && i.suffix != BYTE_MNEM_SUFFIX)
+  if (i.suffix
+  && i.suffix != BYTE_MNEM_SUFFIX
+  && i.suffix != XMMWORD_MNEM_SUFFIX)
 {
   /* It's not a byte, select word/dword operation.  */
   if (i.tm.opcode_modifier.w)
@@ -8166,8 +8187,7 @@ intel_e09 (void)
 
  else if (prev_token.code == T_XMMWORD)
{
- /* XXX ignored for now, but accepted since gcc uses it */
- suffix = 0;
+ suffix = XMMWORD_MNEM_SUFFIX;
}
 
  else
--- binutils/gas/config/tc-i386.h.ptr   2007-11-01 11:48:52.0 -0700
+++ binutils/gas/config/tc-i386.h   2008-01-02 10:40:23.0 -0800
@@ -116,12 +116,14 @@ extern const char *i386_comment_chars;
 #define IMMEDIATE_PREFIX '$'
 #define ABSOLUTE_PREFIX '*'
 
-/* these are the instruction mnemonic suffixes.  */
+/* these are the instruction mnemonic suffixes in AT&T syntax or
+   memory operand size in Intel syntax.  */
 #define WORD_MNEM_SUFFIX  '

-fpic support detection in testsuite

2008-02-19 Thread Jan Beulich
gcc/testsuite/lib/target-supports.exp checks whether the compiler spits
out any messages when using -fpic/-fPIC; this doesn't cover the case
where the compiler happily processes everything, but the linker cannot
deal with the result (in the given case, because the specific gas (x86) in
use accepts @ as a normal symbol character, and hence the usual
@ syntax doesn't yield the expected result; note
that the target doesn't really need PIC code, not does it support TLS,
thus all the constructs are really meaningless).

Should the testsuite not instead do a test whether all involved tools
are able to handle -fPIC and its results)? Or should the target simply
disallow -fPIC (and if so, how is that supposed to be done)?

Thanks, Jan



Re: [Bug c/33076] Warning when passing a pointer to a const array to a function that expects a point

2007-09-19 Thread Jan Beulich
Andreas,

besides doing this act of bookkeeping, could you also point out a solution to
the problem at hand?

Thanks, Jan

>>> "schwab at suse dot de" <[EMAIL PROTECTED]> 19.09.07 11:47 >>>


--- Comment #4 from schwab at suse dot de  2007-09-19 09:47 ---


*** This bug has been marked as a duplicate of 16602 ***


-- 

schwab at suse dot de changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution||DUPLICATE


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33076 

--- You are receiving this mail because: ---
You are on the CC list for the bug, or are watching someone who is.



Re: [RFC] [PATCH] 32-bit pointers in x86-64

2007-11-26 Thread Jan Beulich
>You can't use conventional 32-bit x86 code, so there seems little or no 
>benefit in allowing 32 and 64-bit code to be mixed.

Why not? Switching between 32- and 64-bit modes doesn't involve anything
(apart from knowing the proper selector register values) that cannot be done
purely in user mode. Specifically, I once successfully tried executing 64-bit
code in a 32-bit process.

Jan



Re: [RFC] [PATCH] 32-bit pointers in x86-64

2007-11-26 Thread Jan Beulich
>>You can't use conventional 32-bit x86 code, so there seems little or no 
>>benefit in allowing 32 and 64-bit code to be mixed.
>
>Why not? Switching between 32- and 64-bit modes doesn't involve anything
>(apart from knowing the proper selector register values) that cannot be done
>purely in user mode. Specifically, I once successfully tried executing 64-bit
>code in a 32-bit process.

Oh, I didn't see the original post first (the spam filter ate it for some 
reason).
Of course, if the intention of running this in 64-bit mode makes my comment
void (except for the rumor-like thing I heard a few times that there's an
undocumented EFER bit allowing x86-32 mode in the sense intended here,
but that again would require a kernel running in the same mode underneath).

Jan



Re: [RFC] [PATCH] 32-bit pointers in x86-64

2007-12-05 Thread Jan Beulich
>>> "Andrew Pinski" <[EMAIL PROTECTED]> 25.11.07 19:45 >>>
>On 11/25/07, Luca <[EMAIL PROTECTED]> wrote:
>> 7.1. Add __attribute__((pointer_size(XXX))) and #pragma pointer_size
>> to allow 64-bit pointers in 32-bit mode and viceversa
>
>This is already there, try using __attribute__((mode(DI) )).

Hmm, unless this is a new feature in 4.3, I can't seem to get this to work on
either i386 (using mode DI) or x86-64 (using mode SI). Could you clarify? If
this worked consistently on at least all 64-bit architectures, I would have a
use for it in the kernel (cutting down the kernel size by perhaps several
pages). Btw., I continue to think that the error message 'initializer element
is not computable at load time' on 64-bit code like this

extern char array[];
unsigned int p = (unsigned long)array;

or 32-bit code like this

extern char array[];
unsigned long long p = (unsigned long)array;

is incorrect - the compiler generally has no knowledge what 'array' is (it may
know whether the architecture is generally capable of expressing the
necessary relocation, but if 'array' is really a placeholder for an assembly
level constant, possibly even defined through __asm__() in the same
translation unit, this diagnostic should at best be a warning). I'm pretty
sure I have an open bug for this, but the sad thing is that bugs like this
never appear to really get looked at.

Thanks, Jan



libiberty D tuple demangling

2022-07-24 Thread Jan Beulich via Gcc
Hello,

while commit 3f30a274913b ("libiberty: Update D symbol demangling
for latest ABI spec") mentions in its description that tuple encoding
has changed, there's no real adjustment to dlang_parse_tuple() there,
nor are there any new (or replaced) test cases for that. Was this
simply overlooked?

Furthermore the current ABI specifies "B Parameters Z". As I don't
know what the old ABI said, I can only wonder whether the present
code decoding (in a loop) merely a Type (and not a Parameter) was
actually correct.

Thanks for any insight, Jan


Re: libiberty D tuple demangling

2022-07-25 Thread Jan Beulich via Gcc
On 25.07.2022 14:05, ibuc...@gdcproject.org wrote:
>> On 25/07/2022 08:45 CEST Jan Beulich  wrote:
>> while commit 3f30a274913b ("libiberty: Update D symbol demangling
>> for latest ABI spec") mentions in its description that tuple encoding
>> has changed, there's no real adjustment to dlang_parse_tuple() there,
>> nor are there any new (or replaced) test cases for that. Was this
>> simply overlooked?
> 
> Is there any specific example that fails to demangle, or are you just 
> skimming?

I'm merely looking at the code alongside the ABI spec.

> From what I recall, there is a couple places in the dlang_demangle parser 
> that handle ambiguities in a mangled symbol.  The ABI change only added a 
> terminating 'Z', which makes said code that handles ambiguity redundant - but 
> of course kept around so we handle both old and new symbols.

It's not just the addition of Z at the end but also the dropping of the
number of elements at the beginning, aiui. It's actually that aspect
which caught my attention, since the ABI doesn't talk about any number
there, but the code fetches one.

>> Furthermore the current ABI specifies "B Parameters Z". As I don't
>> know what the old ABI said, I can only wonder whether the present
>> code decoding (in a loop) merely a Type (and not a Parameter) was
>> actually correct.
>>
> 
> Do you think we should instead be calling dlang_function_args instead?
> 
> (Having a quick look at both, that does seem to be the case).

Well - with a number of elements specified, it might have needed to
be a function processing a single argument only. For the new ABI -
yes, that's the function I would have expected to be called.

Jan


Re: libiberty D tuple demangling

2022-07-25 Thread Jan Beulich via Gcc
On 25.07.2022 17:45, ibuc...@gdcproject.org wrote:
>> On 25/07/2022 14:13 CEST Jan Beulich  wrote:
>>
>>  
>> On 25.07.2022 14:05, ibuc...@gdcproject.org wrote:
>>>> On 25/07/2022 08:45 CEST Jan Beulich  wrote:
>>>> while commit 3f30a274913b ("libiberty: Update D symbol demangling
>>>> for latest ABI spec") mentions in its description that tuple encoding
>>>> has changed, there's no real adjustment to dlang_parse_tuple() there,
>>>> nor are there any new (or replaced) test cases for that. Was this
>>>> simply overlooked?
>>>
>>> Is there any specific example that fails to demangle, or are you just 
>>> skimming?
>>
>> I'm merely looking at the code alongside the ABI spec.
>>
>>> From what I recall, there is a couple places in the dlang_demangle parser 
>>> that handle ambiguities in a mangled symbol.  The ABI change only added a 
>>> terminating 'Z', which makes said code that handles ambiguity redundant - 
>>> but of course kept around so we handle both old and new symbols.
>>
>> It's not just the addition of Z at the end but also the dropping of the
>> number of elements at the beginning, aiui. It's actually that aspect
>> which caught my attention, since the ABI doesn't talk about any number
>> there, but the code fetches one.
>>
> 
> Went to have a look at docarchives, but it appears to be down (that's on me, 
> I have been meaning to migrate the service to new servers).
> 
> Yes, your right, the number was indeed dropped too from the ABI.
> 
> https://web.archive.org/web/20170812061158/https://dlang.org/spec/abi.html#TypeTuple
> 
> TypeTuple:
> B Number Parameters
> 
> https://dlang.org/spec/abi.html#TypeTuple
> 
> TypeTuple:
> B Parameters Z
> 
> However, it gets worse the more I stare at it. Looks like it was not 
> understood what 'Number' meant in the old ABI. I assumed it was the encoded 
> number of tuple elements - same as static arrays - however what I see in the 
> front-end is instead an encoded buffer length.
> 
> https://github.com/gcc-mirror/gcc/blob/releases/gcc-10/gcc/d/dmd/dmangle.c#L312-L313
> 
> So the loop should instead be more like:
> ---
>   unsigned long len;
> 
>   mangled = dlang_number (mangled, &len);
>   if (mangled == NULL)
> return NULL;
> 
>   string_append (decl, "Tuple!(");
> 
>   const char *endp = mangled + len;
>   int elements = 0;
>   while (mangled != endp)
> {
>   if (elements++)
> string_append (decl, ", ");
> 
>   mangled = dlang_type (decl, mangled, info);
>   if (mangled == NULL || mangled > endp)
>   return NULL;
> }
> 
>   string_append (decl, ")");
>   return mangled;
> ---

Oh. Then two of the testcases are actually wrong as well:

_D8demangle4testFB2OaaZv
_D8demangle4testFB3aDFZaaZv

I would have assumed they had been taken from observable output of a
compiler, ...

> On top of that, TypeTuple is a compile-time-only type - it never leaks to the 
> code generator - so the grammar entry in the ABI is frivolous (although 
> internally, that it gets a mangle at all would save some memory as duplicated 
> types are merged).

... but one way of reading this would make me infer that can't have
been the case.

Jan


Re: Problems when building NT kernel drivers with GCC / LD

2022-10-31 Thread Jan Beulich via Gcc
On 30.10.2022 02:06, Pali Rohár via Binutils wrote:
> * GCC or LD (not sure who) sets memory alignment characteristics
>   (IMAGE_SCN_ALIGN_MASK) into the sections of PE executable binary.
>   These characteristics should be only in COFF object files, not
>   executable binaries. Specially they should not be in NT kernel
>   drivers.

Like Martin pointed out in reply for another item, I'm pretty sure
this one was taken care of in bfd already (and iirc is in 2.39). You
fail to mention at all what versions of the various components you
use. I guess before reporting such a long list of issue you would
have wanted to test at least with the most recent releases of each
of the involved components. I wouldn't exclude some further items
could then be scratched off your list.

Jan


Re: Problems when building NT kernel drivers with GCC / LD

2022-11-20 Thread Jan Beulich via Gcc
On 20.11.2022 14:10, Pali Rohár wrote:
> On Saturday 05 November 2022 02:26:52 Pali Rohár wrote:
>> On Saturday 05 November 2022 01:57:49 Pali Rohár wrote:
>>> On Monday 31 October 2022 10:55:59 Jan Beulich wrote:
>>>> On 30.10.2022 02:06, Pali Rohár via Binutils wrote:
>>>>> * GCC or LD (not sure who) sets memory alignment characteristics
>>>>>   (IMAGE_SCN_ALIGN_MASK) into the sections of PE executable binary.
>>>>>   These characteristics should be only in COFF object files, not
>>>>>   executable binaries. Specially they should not be in NT kernel
>>>>>   drivers.
>>>>
>>>> Like Martin pointed out in reply for another item, I'm pretty sure
>>>> this one was taken care of in bfd already (and iirc is in 2.39). You
>>>> fail to mention at all what versions of the various components you
>>>> use.
>>>
>>> Ou, sorry for that. I take care to write issues in all details and
>>> totally forgot to write such important information like tool versions.
>>>
>>> Now I retested all issues on Debian 11 which has LD 2.35.2 and GCC
>>> 10.2.1 and all issues are there still valid except data characteristic
>>> IMAGE_SCN_CNT_INITIALIZED_DATA for code sections IMAGE_SCN_CNT_CODE.
>>>
>>> I can easily retest it with LD 2.39 and GCC 10.3.0 which is in Debian
>>> testing.
>>
>> Retested with LD 2.39 and GCC 10.3.0 which is in Debian testing and
>> following problems are additionally fixed: --exclude-all-symbols,
>> --dynamicbase and IMAGE_SCN_ALIGN_MASK (which you mentioned above). All
>> other still reminds.
>>
>> Do you need some other information?
> 
> Hello! I would like to ask if you need some other details or something
> else for these issues.

Well, generally speaking it might help if you could provide smallish
testcases for every item individually. But then, with you replying to
me specifically, perhaps you're wrongly assuming that I would be
planning to look into addressing any or all of these? My earlier reply
was merely to point out that _some_ work has already been done ...

Jan


Re: Problems when building NT kernel drivers with GCC / LD

2022-11-28 Thread Jan Beulich via Gcc
On 26.11.2022 20:04, Pali Rohár wrote:
> On Monday 21 November 2022 08:24:36 Jan Beulich wrote:
>> But then, with you replying to
>> me specifically, perhaps you're wrongly assuming that I would be
>> planning to look into addressing any or all of these? My earlier reply
>> was merely to point out that _some_ work has already been done ...
> 
> I added into CC also gcc, ld and mingw mailing list. If this is not
> enough, could you tell me who to contact about those issues?

That's probably enough, sure. I merely tried to set expectations right,
since you did reply To: me (and lists were only on Cc: - it being the
other way around would have demonstrated that you're not asking me
specifically).

Jan


Re: Problems when building NT kernel drivers with GCC / LD

2022-11-28 Thread Jan Beulich via Gcc
On 28.11.2022 09:40, Jonathan Wakely wrote:
> On Mon, 28 Nov 2022, 08:08 Jan Beulich via Gcc,  wrote:
> 
>> On 26.11.2022 20:04, Pali Rohár wrote:
>>> On Monday 21 November 2022 08:24:36 Jan Beulich wrote:
>>>> But then, with you replying to
>>>> me specifically, perhaps you're wrongly assuming that I would be
>>>> planning to look into addressing any or all of these? My earlier reply
>>>> was merely to point out that _some_ work has already been done ...
>>>
>>> I added into CC also gcc, ld and mingw mailing list. If this is not
>>> enough, could you tell me who to contact about those issues?
>>
>> That's probably enough, sure. I merely tried to set expectations right,
>> since you did reply To: me (and lists were only on Cc: - it being the
>> other way around would have demonstrated that you're not asking me
>> specifically).
>>
> 
> That's just how most mailers do "Reply All", I don't think it out implies
> anything.

I know mailers behave that way. But when replying you can adjust To:
vs Cc:. That's what I'm doing all the time (or at least I'm trying to
remember to do so), because it makes a difference to me whether mail
is sent To: me vs I'm only being Cc:-ed. Otherwise - why do we have
To: and Cc: as different categories?

> Removing the Cc list and *only* replying to you would be different.

Sure - that would have meant sending private mail, which is yet worse.

Jan


x86: making better use of vpternlog{d,q}

2023-05-24 Thread Jan Beulich via Gcc
Hello,

for a couple of years I was meaning to extend the use of these AVX512F
insns beyond the pretty minimalistic ones there are so far. Now that I've
got around to at least draft something, I ran into a couple of issues I
cannot explain. I'd like to start with understanding the unexpected
effects of a change to an existing insn I have made (reproduced at the
bottom). I certainly was prepared to observe testsuite failures, but it
ends up failing tests I didn't expect it would fail, and - upon looking
at sibling ones - also ends up leaving intact tests which I would expect
would then need adjustment (because of using the new alternative).

In particular (all mentioned tests are in gcc.target/i386/)
- avx512f-andn-si-zmm-1.c (and its AVX512VL counterparts) fails because
  for whatever reason generated code reverts back to using vpbroadcastd,
- avx512f-andn-di-zmm-1.c, otoh, is unaffected (i.e. continues to use
  vpandnq with embedded broadcast),
- avx512f-andn-si-zmm-2.c doesn't use the new 4th insn alternative when
  at the same time a made-up DI variant of the test (akin to what might
  be an avx512f-andn-di-zmm-2.c testcase) does.
IOW: How is SI mode element size different here from DI mode one? Is
there anything wrong with the 4th alternative I'm adding, or is this
hinting at some anomaly elsewhere?

Just to mention it, avx512f-andn-si-zmm-5.c similarly fails
unexpectedly, but I guess for the same reason (and there aren't AVX512VL
or DI mode element counterparts thereof).

Jan

--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -17019,11 +17019,11 @@
   "TARGET_AVX512F")
 
 (define_insn "*andnot3"
-  [(set (match_operand:VI 0 "register_operand" "=x,x,v")
+  [(set (match_operand:VI 0 "register_operand" "=x,x,v,v")
(and:VI
- (not:VI (match_operand:VI 1 "vector_operand" "0,x,v"))
- (match_operand:VI 2 "bcst_vector_operand" "xBm,xm,vmBr")))]
-  "TARGET_SSE"
+ (not:VI (match_operand:VI 1 "bcst_vector_operand" "0,x,v,mBr"))
+ (match_operand:VI 2 "bcst_vector_operand" "xBm,xm,vmBr,v")))]
+  "TARGET_SSE && (REG_P (operands[1]) || REG_P (operands[2]))"
 {
   char buf[64];
   const char *ops;
@@ -17090,6 +17090,11 @@
 case 2:
   ops = "v%s%s\t{%%2, %%1, %%0|%%0, %%1, %%2}";
   break;
+case 3:
+  tmp = "pternlog";
+  ssesuffix = "";
+  ops = "v%s%s\t{$0x44, %%1, %%2, %%0|%%0, %%2, %%1, $0x44}";
+  break;
 default:
   gcc_unreachable ();
 }
@@ -17098,7 +17103,7 @@
   output_asm_insn (buf, operands);
   return "";
 }
-  [(set_attr "isa" "noavx,avx,avx")
+  [(set_attr "isa" "noavx,avx,avx,avx512f")
(set_attr "type" "sselog")
(set (attr "prefix_data16")
  (if_then_else
@@ -17106,7 +17111,7 @@
(eq_attr "mode" "TI"))
(const_string "1")
(const_string "*")))
-   (set_attr "prefix" "orig,vex,evex")
+   (set_attr "prefix" "orig,vex,evex,evex")
(set (attr "mode")
(cond [(match_test "TARGET_AVX2")
 (const_string "")
@@ -17119,7 +17124,11 @@
(match_test "optimize_function_for_size_p (cfun)"))
 (const_string "V4SF")
  ]
- (const_string "")))])
+ (const_string "")))
+   (set (attr "enabled")
+   (if_then_else (eq_attr "alternative" "3")
+ (symbol_ref " == 64 ? TARGET_AVX512F : 
TARGET_AVX512VL")
+ (const_string "*")))])
 
 ;; PR target/100711: Split notl; vpbroadcastd; vpand as vpbroadcastd; vpandn
 (define_split


Re: x86: making better use of vpternlog{d,q}

2023-05-25 Thread Jan Beulich via Gcc
On 24.05.2023 11:01, Hongtao Liu wrote:
> On Wed, May 24, 2023 at 3:58 PM Jan Beulich via Gcc  wrote:
>>
>> Hello,
>>
>> for a couple of years I was meaning to extend the use of these AVX512F
>> insns beyond the pretty minimalistic ones there are so far. Now that I've
>> got around to at least draft something, I ran into a couple of issues I
>> cannot explain. I'd like to start with understanding the unexpected
>> effects of a change to an existing insn I have made (reproduced at the
>> bottom). I certainly was prepared to observe testsuite failures, but it
>> ends up failing tests I didn't expect it would fail, and - upon looking
>> at sibling ones - also ends up leaving intact tests which I would expect
>> would then need adjustment (because of using the new alternative).
>>
>> In particular (all mentioned tests are in gcc.target/i386/)
>> - avx512f-andn-si-zmm-1.c (and its AVX512VL counterparts) fails because
>>   for whatever reason generated code reverts back to using vpbroadcastd,
>> - avx512f-andn-di-zmm-1.c, otoh, is unaffected (i.e. continues to use
>>   vpandnq with embedded broadcast),
>> - avx512f-andn-si-zmm-2.c doesn't use the new 4th insn alternative when
>>   at the same time a made-up DI variant of the test (akin to what might
>>   be an avx512f-andn-di-zmm-2.c testcase) does.
>> IOW: How is SI mode element size different here from DI mode one? Is
>> there anything wrong with the 4th alternative I'm adding, or is this
>> hinting at some anomaly elsewhere?
> __m512i is defined as __v8di, when it's used for _mm512_andnot_epi32,
> it's explicitlt converted to (__v16si) and creates an extra subreg
> which is not needed for DImode cases.
> And pass_combine try to match the below pattern but failed due to the
> condition REG_P (operands[1]) || REG_P (operands[2]). Here I think you
> want register_operand instead of REG_P.

Thanks, this has indeed made things match my expectations wrt testsuite
results. Sadly similar adjustments for other (new) insns didn't make
any difference with the further issues I'm facing. I may therefore need
to ask more questions; I hope they're not going to be too dumb.

Jan


Re: RFC: Formalization of the Intel assembly syntax (PR53929)

2024-01-18 Thread Jan Beulich via Gcc
On 18.01.2024 06:34, LIU Hao wrote:
> My complete proposal can be found at 
> . 
> Some ideas actually 
> reflect the AT&T syntax. I hope it helps.

I'm sorry, but most of your proposal may even be considered for being
acceptable only if you would gain buy-off from the MASM guys. Anything
MASM treats as valid ought to be permitted by gas as well (within the
scope of certain divergence that cannot be changed in gas without
risking to break people's code). It could probably be considered to
introduce a "strict" mode of Intel syntax, following some / most of
what you propose; making this the default cannot be an option.

Commenting on individual aspects of your proposal is a little difficult,
as you didn't provide the proposal inline (and hence it cannot be easily
used as context in a reply). But to mention the imo worst aspect:
Declaring

mov eax, [rcx]

as invalid is a no-go. I also don't see how this would be related to the
issue at hand. What's in the square brackets may as well be a symbol
name, so requiring the "mode specifier" doesn't disambiguate things at
all.

Otoh the "offset" part of point 3 may be possible to accept even by
default, provided (didn't check) that current gas consistently rejects
that (as an invalid use of a register name).

One remark regarding the underlying pattern leading to the issue:
Personally I view it as questionable practice to have extern or static
variables in C code with names as short as register names are. Avoiding
them does not only avoid the issue here, but also is quite likely going
to improve the code (by having more descriptive variable names). And
automatic variables aren't affected aiui, so can remain short (after
all, commonly automatic variable names are as short as a single char).

That said, I can certainly also see how the introduction of new
registers can lead to new conflicts, which isn't nice. Iirc old 32-bit
MASM escaped this problem by requiring architecture extensions to be
explicitly enabled (may have changed in newer MASM). Gas, otoh, enables
everything by default (and I don't see how we could change that).

Jan


Re: RFC: Formalization of the Intel assembly syntax (PR53929)

2024-01-18 Thread Jan Beulich via Gcc
On 19.01.2024 02:42, LIU Hao wrote:
> In addition, `as -msyntax=intel -mnaked-reg` doesn't seem to be equivalent to 
> `.intel_syntax noprefix`:
> 
> $ as -msyntax=intel -mnaked-reg <<< 'mov eax, DWORD PTR gs:0x48' -o a.o
> {standard input}: Assembler messages:
> {standard input}:1: Error: invalid use of register
> 
> $ as <<< '.intel_syntax noprefix;  mov eax, DWORD PTR gs:0x48' -o a.o && 
> objdump -Mintel -d a.o
> ...
>  <.text>:
>0: 65 8b 04 25 48 00 00moveax,DWORD PTR gs:0x48

This (the error above) looks like a bug to me; I'll look into where this
odd difference in behavior is coming from.

Jan


Re: RFC: Formalization of the Intel assembly syntax (PR53929)

2024-01-19 Thread Jan Beulich via Gcc
On 18.01.2024 17:40, LIU Hao wrote:
> 在 2024-01-18 20:54, Jan Beulich 写道:
>> I'm sorry, but most of your proposal may even be considered for being
>> acceptable only if you would gain buy-off from the MASM guys. Anything
>> MASM treats as valid ought to be permitted by gas as well (within the
>> scope of certain divergence that cannot be changed in gas without
>> risking to break people's code). It could probably be considered to
>> introduce a "strict" mode of Intel syntax, following some / most of
>> what you propose; making this the default cannot be an option.
> 
> Thanks for your reply.
> 
> I have attached the Markdown source for that page, modified a few hours ago. 
> I am planning to make 
> some updates according to your advice tomorrow.

Just to mention it: Attaching is in no way better than providing a link,
commenting-wise.

> And yes, I am proposing a 'strict' mode, however not for humans, only for 
> compilers.
> 
> My first message references a GCC bug report, where the problematic symbol 
> `bx` comes from C source. 
> I have been aware of the `/APP` and `/NO_APP` markers in generated assembly, 
> so I suspect that GAS 
> should be able to tell which parts are generated from a compiler and which 
> parts are composed by 
> hand. The proposed strict mode may apply only to the output from GCC, which 
> are much more likely to 
> contain bad symbols, but are also more controllable on the GCC side.
> 
> I believe that skillful people who write x86 assembly have known that 
> `offset`, `shr`, `si` etc. are 
> 'bad' names for symbols. Therefore, it's like an issue there.
> 
> 
>> Commenting on individual aspects of your proposal is a little difficult,
>> as you didn't provide the proposal inline (and hence it cannot be easily
>> used as context in a reply). But to mention the imo worst aspect:
>> Declaring
>>
>>  mov eax, [rcx]
>>
>> as invalid is a no-go.
> 
> I agree. I am considering to declare the lack of a symbol as a special case.

Well, I took this as the simplest example. But clearly there should never
be a need for an assembly programmer to needlessly write "dword ptr" or
alike, when operand size is unambiguous. Limiting "strict mode" to compiler
output would take away concerns in this regard (as machine generated
assembly has no issue with uniformly adding such redundant specifiers, much
like in AT&T mode suffixes would typically be emitted even when not needed).
But I see a severe issue with your aim at confining strict mode to
compiler generated code only: In inline assembly (see your mentioning of
APP / NO_APP above) you still potentially reference C symbols. So the
ambiguities don't disappear in APP / NO_APP regions.

>> I also don't see how this would be related to the
>> issue at hand. What's in the square brackets may as well be a symbol
>> name, so requiring the "mode specifier" doesn't disambiguate things at
>> all.
> 
> If someone declares a variable called `rcx` in C, it has be translated to
> 
> mov eax, DWORD PTR rcx  # `movl rcx, %eax`
> 
> instead of
> 
> mov eax, DWORD PTR [rcx]# `movl (%rcx), %eax`

And an array happening to be indexed by rcx would then result in

mov eax, DWORD PTR rcx[rcx]# `movl rcx(%rcx), %eax`

? That's going to be confusing at best. I think this whole issue needs
taking care of differently, and iirc I did already suggest an alternative
in one of the bugzilla entries involved: Potentially ambiguous names
(which to a compiler may mean: all symbol names) ought to simply be
quoted, and it ought to be specified that quoted symbols are never
registers. Iirc this will require gas changes, yes, but it'll address all
ambiguities afaict.

Jan


Re: RFC: Formalization of the Intel assembly syntax (PR53929)

2024-01-22 Thread Jan Beulich via Gcc
On 20.01.2024 13:40, LIU Hao wrote:
> 在 2024-01-19 17:13, Jan Beulich 写道:
>> But I see a severe issue with your aim at confining strict mode to
>> compiler generated code only: In inline assembly (see your mentioning of
>> APP / NO_APP above) you still potentially reference C symbols. So the
>> ambiguities don't disappear in APP / NO_APP regions.
> 
> My suggestion is that people who write inline assembly should have been aware 
> of the existence of 
> bad names, and should have been careful to avoid them.
> 
> 
>> And an array happening to be indexed by rcx would then result in
>>
>>  mov eax, DWORD PTR rcx[rcx]# `movl rcx(%rcx), %eax`
>>
>> ? That's going to be confusing at best. 
> 
> This is always confusing, no matter how it is written.
> 
>> I think this whole issue needs
>> taking care of differently, and iirc I did already suggest an alternative
>> in one of the bugzilla entries involved: Potentially ambiguous names
>> (which to a compiler may mean: all symbol names) ought to simply be
>> quoted, and it ought to be specified that quoted symbols are never
>> registers. Iirc this will require gas changes, yes, but it'll address all
>> ambiguities afaict.
> 
> The OP of GCC PR53929 said that 'the problem does _not_ go away even if I 
> quote the symbol name by 
> hand in the assembly output' which was 12 years ago. I tried my local 
> installation and quoting the 
> symbol turned out to avoid the issue:
> 
> > as --version
> GNU assembler (GNU Binutils) 2.41.0.20240108
> 
> > cat test.s
> .intel_syntax noprefix
> lea rax, "bx"[rip]
> 
> > as test.s -o test.o
> 
> > objdump -d test.o
> test.o: file format pe-x86-64
> (...)
>0:   48 8d 05 00 00 00 00learax,[rip+0x0]# 7 
> <.text+0x7>
>7:   90  nop

Right, I did some work in that direction a while ago. But iirc there are
still cases left to be addressed.

Jan


Re: RFC: Formalization of the Intel assembly syntax (PR53929)

2024-01-23 Thread Jan Beulich via Gcc
On 23.01.2024 02:27, LIU Hao wrote:
> 在 2024-01-22 16:39, Jan Beulich 写道:
>> Right, I did some work in that direction a while ago. But iirc there are
>> still cases left to be addressed.
> 
> Attached is a draft patch for GCC, bootstrapped on {i686,x86_64}-w64-mingw32 
> with GCC 13.2 and 
> binutils 2.41.0.

Right, but this is very "draft". You can't blindly assume the gas you use
actually can deal with quotation.

> This addresses the issue when a bad name exists in the same translation unit. 
> In the case of an 
> external symbol there's still an error:
> 
> ```
> extern int bx;
> int get(const char* p) { return p[bx]; }
> ```
> 
> ```
> lh_mouse@lhmouse-pc ~/Desktop $ x86_64-w64-mingw32-gcc -S -o - -masm=intel 
> test.c | fgrep bx
>  mov rax, QWORD PTR .refptr.bx[rip]
>  .section.rdata$.refptr.bx, "dr"
>  .globl  .refptr.bx
> .refptr.bx:
>  .quad   bx

Sure, this one needs quoting then, too.

Jan

> lh_mouse@lhmouse-pc ~/Desktop $ x86_64-w64-mingw32-gcc  -masm=intel test.c | 
> fgrep bx
> C:\Users\lh_mouse\AppData\Local\Temp\ccuyuu6c.s: Assembler messages:
> C:\Users\lh_mouse\AppData\Local\Temp\ccuyuu6c.s:29: Error: invalid use of 
> register
> C:\Users\lh_mouse\AppData\Local\Temp\ccuyuu6c.s:29: Warning: register value 
> used as expression
> lh_mouse@lhmouse-pc ~/Desktop $
> ```
> 
> 
> 
> 



Re: RFC: Formalization of the Intel assembly syntax (PR53929)

2024-01-23 Thread Jan Beulich via Gcc
On 23.01.2024 10:00, LIU Hao wrote:
> 在 2024-01-23 16:38, Jan Beulich 写道:
>> Right, but this is very "draft". You can't blindly assume the gas you use
>> actually can deal with quotation.
> 
> Let's assume that for the time being, but there's something else; see below.
> 
> 
>>> .refptr.bx:
>>>   .quad   bx
>>
>> Sure, this one needs quoting then, too.
> 
> The attached patch contains `&& name[0] != '*'` with a reason: In the 
> function `assemble_name_raw` 
> in 'gcc/varasm.cc', if `name` starts with a `*`, then its remaining part is 
> output without 
> decoration. I have no idea what `*` means; this `.quad bx` thing apparently 
> results from something like
> 
> assemble_name_raw (file, "*bx");
> 
> Quoting this would break the i686 DWARF2 code, which may contain an 
> arithmetic expression like
> 
> .long LXXYY-1# "LXXYY" minus one
> 
> If it was quoted like `.long "LXXYY-1"`, it would mean something very 
> different and cause linker errors.

Hmm, that would suggest to me that the Dwarf code abuses the interface.
A "name" certainly shouldn't be an expression. And hence the result of
the example ought to be

 .long "LXXYY"-1# "LXXYY" minus one

Jan


Re: RFC: Formalization of the Intel assembly syntax (PR53929)

2024-01-23 Thread Jan Beulich via Gcc
On 23.01.2024 10:21, LIU Hao wrote:
> 在 2024-01-23 17:03, Jan Beulich 写道:
>> Hmm, that would suggest to me that the Dwarf code abuses the interface.
>> A "name" certainly shouldn't be an expression. And hence the result of
>> the example ought to be
>>
>>   .long "LXXYY"-1# "LXXYY" minus one
> 
> So I shouldn't have checked for `*` right?

I don't know.

> The calls to `output_addr_const()` are from `dw2_assemble_integer (int size, 
> rtx x)` in 
> 'gcc/dwarf2asm.cc'. Now I need some directives on how to fix this; parsing 
> the symbol seems awkward.

Indeed.

Jan


Re: CREL relocation format for ELF

2024-03-28 Thread Jan Beulich via Gcc
On 28.03.2024 08:43, Fangrui Song wrote:
> On Fri, Mar 22, 2024 at 6:51 PM Fangrui Song  wrote:
>>
>> On Thu, Mar 14, 2024 at 5:16 PM Fangrui Song  wrote:
>>>
>>> The relocation formats REL and RELA for ELF are inefficient. In a
>>> release build of Clang for x86-64, .rela.* sections consume a
>>> significant portion (approximately 20.9%) of the file size.
>>>
>>> I propose RELLEB, a new format offering significant file size
>>> reductions: 17.2% (x86-64), 16.5% (aarch64), and even 32.4% (riscv64)!
>>>
>>> Your thoughts on RELLEB are welcome!
>>>
>>> Detailed analysis:
>>> https://maskray.me/blog/2024-03-09-a-compact-relocation-format-for-elf
>>> generic ABI (ELF specification):
>>> https://groups.google.com/g/generic-abi/c/yb0rjw56ORw
>>> binutils feature request: 
>>> https://sourceware.org/bugzilla/show_bug.cgi?id=31475
>>> LLVM: 
>>> https://discourse.llvm.org/t/rfc-relleb-a-compact-relocation-format-for-elf/77600
>>>
>>> Implementation primarily involves binutils changes. Any volunteers?
>>> For GCC, a driver option like -mrelleb in my Clang prototype would be
>>> needed. The option instructs the assembler to use RELLEB.
>>
>> The format was tentatively named RELLEB. As I refine the original pure
>> LEB-based format, “RELLEB” might not be the most fitting name.
>>
>> I have switched to SHT_CREL/DT_CREL/.crel and updated
>> https://maskray.me/blog/2024-03-09-a-compact-relocation-format-for-elf
>> and
>> https://groups.google.com/g/generic-abi/c/yb0rjw56ORw/m/eiBcYxSfAQAJ
>>
>> The new format is simpler and better than RELLEB even in the absence
>> of the shifted offset technique.
>>
>> Dynamic relocations using CREL are even smaller than Android's packed
>> relocations.
>>
>> // encodeULEB128(uint64_t, raw_ostream &os);
>> // encodeSLEB128(int64_t, raw_ostream &os);
>>
>> Elf_Addr offsetMask = 8, offset = 0, addend = 0;
>> uint32_t symidx = 0, type = 0;
>> for (const Reloc &rel : relocs)
>>   offsetMask |= crels[i].r_offset;
>> int shift = std::countr_zero(offsetMask)
>> encodeULEB128(relocs.size() * 4 + shift, os);
>> for (const Reloc &rel : relocs) {
>>   Elf_Addr deltaOffset = (rel.r_offset - offset) >> shift;
>>   uint8_t b = deltaOffset * 8 + (symidx != rel.r_symidx) +
>>   (type != rel.r_type ? 2 : 0) + (addend != rel.r_addend ? 4 : 
>> 0);
>>   if (deltaOffset < 0x10) {
>> os << char(b);
>>   } else {
>> os << char(b | 0x80);
>> encodeULEB128(deltaOffset >> 4, os);
>>   }
>>   if (b & 1) {
>> encodeSLEB128(static_cast(rel.r_symidx - symidx), os);
>> symidx = rel.r_symidx;
>>   }
>>   if (b & 2) {
>> encodeSLEB128(static_cast(rel.r_type - type), os);
>> type = rel.r_type;
>>   }
>>   if (b & 4) {
>> encodeSLEB128(std::make_signed_t(rel.r_addend - addend), os);
>> addend = rel.r_addend;
>>   }
>> }
>>
>> ---
>>
>> While alternatives like PrefixVarInt (or a suffix-based variant) might
>> excel when encoding larger integers, LEB128 offers advantages when
>> most integers fit within one or two bytes, as it avoids the need for
>> shift operations in the common one-byte representation.
>>
>> While we could utilize zigzag encoding (i>>31) ^ (i<<1) to convert
>> SLEB128-encoded type/addend to use ULEB128 instead, the generate code
>> is inferior to or on par with SLEB128 for one-byte encodings.
> 
> 
> We can introduce a gas option --crel, then users can specify `gcc
> -Wa,--crel a.c` (-flto also gets -Wa, options).
> 
> I propose that we add another gas option --implicit-addends-for-data
> (does the name look good?) to allow non-code sections to use implicit
> addends to save space
> (https://sourceware.org/PR31567).
> Using implicit addends primarily benefits debug sections such as
> .debug_str_offsets, .debug_names, .debug_addr, .debug_line, but also
> data sections such as .eh_frame, .data., .data.rel.ro, .init_array.
> 
> -Wa,--implicit-addends-for-data can be used on its own (6.4% .o
> reduction in a clang -g -g0 -gpubnames build)

And this option will the switch from RELA to REL relocation sections,
effectively in violation of most ABIs I'm aware of?

Furthermore, why just data? x86 at least could benefit almost as much
for code. Hence maybe better --implicit-addends=data, with an
option for architectures to also permit --implicit-addends=text.

Jan

>   or together with
> CREL to achieve more incredible size reduction, one single byte for
> most .debug_* relocations!
> With CREL, concerns of debug section relocations will become a thing
> of the past.



Re: Patches submission policy change

2024-04-03 Thread Jan Beulich via Gcc
On 03.04.2024 10:22, Christophe Lyon wrote:
> Dear release managers and developers,
> 
> TL;DR: For the sake of improving precommit CI coverage and simplifying
> workflows, I’d like to request a patch submission policy change, so
> that we now include regenerated files. This was discussed during the
> last GNU toolchain office hours meeting [1] (2024-03-28).
> 
> Benefits or this change include:
> - Increased compatibility with precommit CI
> - No need to manually edit patches before submitting, thus the “git
> send-email” workflow is simplified
> - Patch reviewers can be confident that the committed patch will be
> exactly what they approved
> - Precommit CI can test exactly what has been submitted
> 
> Any concerns/objections?

Yes: Patch size. And no, not sending patches inline is bad practice.
Even assuming sending patches bi-modal (inline and as attachment) works
(please indicate whether that's the case), it would mean extra work on
the sending side.

Jan


Re: Patches submission policy change

2024-04-03 Thread Jan Beulich via Gcc
On 03.04.2024 10:45, Jakub Jelinek wrote:
> On Wed, Apr 03, 2024 at 10:22:24AM +0200, Christophe Lyon wrote:
>> Any concerns/objections?
> 
> I'm all for it, in fact I've been sending it like that myself for years
> even when the policy said not to.  In most cases, the diff for the
> regenerated files is very small and it helps even in patch review to
> actually check if the configure.ac/m4 etc. changes result just in the
> expected changes and not some unrelated ones (e.g. caused by user using
> wrong version of autoconf/automake etc.).
> There can be exceptions, e.g. when in GCC we update from a new version
> of Unicode, the regenerated ucnid.h diff can be large and
> uname2c.h can be huge, such that it can trigger the mailing list size
> limits even when the patch is compressed, see e.g.
> https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636427.html
> https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636426.html
> But I think most configure or Makefile changes should be pretty small,
> usual changes shouldn't rewrite everything in those files.

Which may then call for a policy saying "include generate script diff-s,
but don't include generated data file ones"? At least on the binutils
side, dealing (for CI) with what e.g. opcodes/*-gen produce ought to be
possible by having something along the lines of "maintainer mode light".

Jan


Re: Patches submission policy change

2024-04-03 Thread Jan Beulich via Gcc
On 03.04.2024 10:57, Richard Biener wrote:
> On Wed, 3 Apr 2024, Jan Beulich wrote:
>> On 03.04.2024 10:45, Jakub Jelinek wrote:
>>> On Wed, Apr 03, 2024 at 10:22:24AM +0200, Christophe Lyon wrote:
>>>> Any concerns/objections?
>>>
>>> I'm all for it, in fact I've been sending it like that myself for years
>>> even when the policy said not to.  In most cases, the diff for the
>>> regenerated files is very small and it helps even in patch review to
>>> actually check if the configure.ac/m4 etc. changes result just in the
>>> expected changes and not some unrelated ones (e.g. caused by user using
>>> wrong version of autoconf/automake etc.).
>>> There can be exceptions, e.g. when in GCC we update from a new version
>>> of Unicode, the regenerated ucnid.h diff can be large and
>>> uname2c.h can be huge, such that it can trigger the mailing list size
>>> limits even when the patch is compressed, see e.g.
>>> https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636427.html
>>> https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636426.html
>>> But I think most configure or Makefile changes should be pretty small,
>>> usual changes shouldn't rewrite everything in those files.
>>
>> Which may then call for a policy saying "include generate script diff-s,
>> but don't include generated data file ones"? At least on the binutils
>> side, dealing (for CI) with what e.g. opcodes/*-gen produce ought to be
>> possible by having something along the lines of "maintainer mode light".
> 
> I'd say we should send generated files when it fits the mailing list
> limits (and possibly simply lift those limits?).

Well, that would allow patches making it through, but it would still
severely increase overall size. I'm afraid more people than not also
fail to cut down reply context, so we'd further see (needlessly) huge
replies to patches as well.

Additionally - how does one up front determine "fits the mailing list
limits"? My mail UI (Thunderbird) doesn't show me the size of a message
until I've actually sent it.

>  As a last resort
> do a series splitting the re-generation out (but I guess this would
> confuse the CI as well and of course for the push you want to squash
> again).

Yeah, unless the CI would only ever test full series, this wouldn't help.
It's also imo even more cumbersome than simply stripping the generated
file parts from emails.

Jan


Re: Patches submission policy change

2024-04-04 Thread Jan Beulich via Gcc
On 03.04.2024 15:11, Christophe Lyon wrote:
> On Wed, 3 Apr 2024 at 10:30, Jan Beulich  wrote:
>>
>> On 03.04.2024 10:22, Christophe Lyon wrote:
>>> Dear release managers and developers,
>>>
>>> TL;DR: For the sake of improving precommit CI coverage and simplifying
>>> workflows, I’d like to request a patch submission policy change, so
>>> that we now include regenerated files. This was discussed during the
>>> last GNU toolchain office hours meeting [1] (2024-03-28).
>>>
>>> Benefits or this change include:
>>> - Increased compatibility with precommit CI
>>> - No need to manually edit patches before submitting, thus the “git
>>> send-email” workflow is simplified
>>> - Patch reviewers can be confident that the committed patch will be
>>> exactly what they approved
>>> - Precommit CI can test exactly what has been submitted
>>>
>>> Any concerns/objections?
>>
>> Yes: Patch size. And no, not sending patches inline is bad practice.
> Not sure what you mean? Do you mean sending patches as attachments is
> bad practice?

Yes. It makes it difficult to reply to them (with proper reply context).

>> Even assuming sending patches bi-modal (inline and as attachment) works
>> (please indicate whether that's the case), it would mean extra work on
>> the sending side.
>>
> For the CI perspective, we use what patchwork is able to detect as patches.
> Looking at recent patches submissions, it seems patchwork is able to
> cope with the output of git format-patch/git send-email, as well as
> attachments.
> There are cases where patchwork is not able to detect the patch, but I
> don't know patchwork's exact specifications.

Question was though: If a patch was sent inline plus attachment, what
would CI use as the patch to apply? IOW would it be an option to
attach the un-stripped patch, while inlining the stripped one?

Jan