Re: Discussion about arm/aarch64 testcase failures seen with patch for PR111673

2023-12-15 Thread Surya Kumari Jangala via Gcc
Hi Richard,
Here are more details about the testcase failure and my analysis/fix:

Testcase:

void f(int *i)
{
  if (!i)
    return;
  else
    {
      __builtin_printf("Hi");
      *i = 0;
    }
}

--

Assembly w/o patch:
cbz x0, .L7
stp x29, x30, [sp, -32]!
mov x29, sp
str x19, [sp, 16]
mov x19, x0
adrp x0, .LC0
add x0, x0, :lo12:.LC0
bl  printf
str wzr, [x19]
ldr x19, [sp, 16]
ldp x29, x30, [sp], 32
ret
.p2align 2,,3
.L7:
ret

---

Assembly w/ patch:
stp x29, x30, [sp, -32]!
mov x29, sp
str x0, [sp, 24]
cbz x0, .L1
adrp x0, .LC0
add x0, x0, :lo12:.LC0
bl  printf
ldr x1, [sp, 24]
str wzr, [x1]
.L1:
ldp x29, x30, [sp], 32
ret


As we can see above, w/o patch the test case gets shrink wrapped.

Input RTL to the LRA pass (the RTL is same both w/ and w/o patch):

BB2:
  set r95, x0
  set r92, r95
  if (r92 eq 0) jump BB4
BB3:
  set x0, symbol-ref("Hi")
  x0 = call printf
  set mem(r92), 0
BB4:
  ret


Register assignment by IRA:
w/o patch:
  r92-->x19
  r95-->x0
  r94-->x0

w/ patch:
  r92-->x1
  r95-->x0
  r94-->x0


RTL after LRA:

w/o patch:
BB2:
  set x19, x0
  if (x19 eq 0) jump BB4
BB3:
  set x0, symbol-ref("Hi")
  x0 = call printf
  set mem(x19), 0
BB4:
  ret


w/ patch:
BB2:
  set x1, x0
  set mem(sp+24), x1
  if (x1 eq 0) jump BB4
BB3:
  set x0, symbol-ref("Hi")
  x0 = call printf
  set x1, mem(sp+24)
  set mem(x1), 0
BB4:
  ret


The difference between w/o patch and w/ patch is that w/o patch, a callee-save
register (x19) is chosen to hold the value of x0 (input parameter register),
while w/ patch, a caller-save register (x1) is chosen.

W/o patch, during the shrink wrap pass, first copy propagation is done and
the 'if' insn in BB2 is changed as follows:
  set x19, x0
  if (x19 eq 0) jump BB4

changed to:
  set x19, x0
  if (x0 eq 0) jump BB4   

Next, the insn "set x19, x0" is moved down the cfg to BB3. Since x19 is a
callee-save register, prolog gets generated in BB3 thereby resulting in
successful shrink wrapping.

W/ patch, during the shrink wrap pass, copy propagation changes BB2 as follows:
  set x1, x0
  set mem(sp+24), x1
  if (x1 eq 0) jump BB4

changed to:
  set x1, x0
  set mem(sp+24), x0
  if (x0 eq 0) jump BB4

However, the store insn (set mem(sp+24), x0) cannot be moved down to BB3.
Hence the prolog gets generated in BB2 itself due to the use of 'sp', and
shrink wrapping fails.

The store insn (which basically saves x1 to stack) is generated by the
LRA pass. This insn is needed because x1 is a caller-save register and we
have a call insn that will clobber this register. However, the store insn is
generated in the entry BB (BB2) instead of in BB3 which has the call insn. If
the store is generated in BB3, then the testcase will be shrink wrapped
successfully. In fact, it is more efficient if the store occurs only in the
path containing the printf call instead of occurring in the entry bb.

The reason why LRA generates the store insn in the entry bb is as follows:
LRA emits insns to save caller-save registers in the inheritance/splitting pass.
In this pass, LRA builds EBBs (Extended Basic Blocks) and traverses the insns in
the EBBs in reverse order from the last insn to the first insn. When LRA sees a
write to a pseudo (that has been assigned a caller-save register), and there is
a read following the write, with an intervening call insn between the write and
the read, then LRA generates a spill immediately after the write and a restore
immediately before the read. The spill is needed because the call insn will
clobber the caller-save register.

In the above testcase, LRA forms two EBBs: the first EBB contains BB2 & BB3
while the second EBB contains BB4.

In BB2, there is a write to x1 in the insn:
  set r92, r95      // r92 is assigned x1 and r95 is assigned x0

In BB3, there is a read of x1 after the call insn:
  set mem(r92), 0   // r92 is assigned x1

So LRA generates a spill in BB2 after the write to x1.

I have a patch (bootstrapped and regtested on powerpc) that makes changes in
LRA to save caller-save registers before a call instead of after the write to
the caller-save register. With this patch, the above test gets successfully
shrink wrapped. After committing the patch for PR111673, I plan to get the
LRA fix reviewed.
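
The two spill placements discussed above can be compared with a small toy model in C. This is only a sketch under stated assumptions: `struct insn`, `struct placement`, `example_ebb` and `place_spill` are hypothetical names invented for illustration and are not LRA's data structures or code. The model looks at a single pseudo, walks one EBB from the last insn to the first, and handles only one write/call/read pattern.

```c
/* Toy insn kinds as seen from one pseudo; hypothetical, not GCC's RTL.  */
enum kind { WRITE, READ, CALL, OTHER };

struct insn { enum kind kind; int bb; };

/* Basic blocks that receive the spill and the restore.  */
struct placement { int spill_bb; int restore_bb; };

/* The first EBB of the testcase (BB2 + BB3), seen from pseudo r92.  */
static const struct insn example_ebb[] = {
  { WRITE, 2 },   /* set r92, r95 */
  { READ,  2 },   /* if (r92 eq 0) jump BB4 */
  { OTHER, 3 },   /* set x0, symbol-ref("Hi") */
  { CALL,  3 },   /* x0 = call printf */
  { READ,  3 },   /* set mem(r92), 0 */
};

/* Walk the EBB backwards looking for "write ... call ... read".
   Returns 1 and fills *P if the value lives across a call, else 0.
   With SPILL_BEFORE_CALL == 0 the spill goes right after the write;
   otherwise it goes right before the first intervening call.  */
static int
place_spill (const struct insn *ebb, int n, int spill_before_call,
             struct placement *p)
{
  int read_idx = -1;     /* nearest read seen so far */
  int restore_idx = -1;  /* read that has a call before it */
  int call_idx = -1;     /* earliest call preceding that read */

  for (int i = n - 1; i >= 0; i--)
    {
      if (ebb[i].kind == READ)
        read_idx = i;
      else if (ebb[i].kind == CALL && read_idx >= 0)
        {
          restore_idx = read_idx;
          call_idx = i;
        }
      else if (ebb[i].kind == WRITE)
        {
          if (restore_idx < 0)
            return 0;   /* value does not live across a call */
          p->spill_bb = ebb[spill_before_call ? call_idx : i].bb;
          p->restore_bb = ebb[restore_idx].bb;
          return 1;
        }
    }
  return 0;
}
```

Calling `place_spill (example_ebb, 5, 0, &p)` puts the spill in BB2, the entry block, which is what forces the prolog there. With `spill_before_call` set to 1 the spill lands in BB3 next to the call, so BB2 no longer touches 'sp'.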

Please let me know if you need more information.

Regards,
Surya


On 14/12/23 9:41 pm, Richard Earnshaw (lists) wrote:
> On 14/12/2023 07:17, Surya Kumari Jangala via Gcc wrote:
>> Hi Richard,
>> Thanks a lot for your response!
>>
>> Another failure reported by the Linaro CI is as follows:
>>
>> Running gcc:gcc.dg/dg.exp ...
>> FAIL: gcc.dg/ira-shrinkwrap-prep-1.c scan-rtl-dump pro_and_epilogue "Performing shrink-wrapping"

Re: Question about creating global varaiable during IPA PASS.

2023-12-15 Thread Thomas Schwinge
Hi Hanke!

On 2023-12-13T17:04:57+0800, Hanke Zhang via Gcc  wrote:
> Hi, I'm trying to create a global variable in my own PASS which
> located at the LATE_IPA_PASSES. (I'm using GCC 10.3.0.)

I can't comment on IPA aspects, or whether something was different on
oldish GCC 10 (why using that one, by the way?), and I've not actually
verified what you're doing here:

> And after creating it, I added the attributes like the following.
>
> // 1. create the var
> tree new_name = get_identifier (xx);
> tree new_type = build_pointer_type (xx);
> tree new_var = build_decl (UNKNOWN_LOCATION, VAR_DECL, new_name, new_type);
> add_attributes (new_var);
>
> static void
> add_attributes (tree var)
> {
> DECL_ARTIFICIAL (var) = 1;
> DECL_EXTERNAL (var) = 0;
> TREE_STATIC (var) = 1;
> TREE_PUBLIC (var) = 1;
> TREE_USED (var) = 1;
> DECL_CONTEXT (var) = NULL_TREE;
> TREE_THIS_VOLATILE (var) = 0;
> TREE_ADDRESSABLE (var) = 0;
> TREE_READONLY (var) = 0;
> if (is_global_var (var))
>   set_decl_tls_model (var, TLS_MODEL_NONE);
> }
>
> But when I try to compile some example files with -flto, error occurs.
>
> /usr/bin/ld: xxx.ltrans0.ltrans.o: in function `xxx':
> xxx.c: undefined reference to `glob_var'
> xxx.c: undefined reference to `glob_var'
> xxx.c: undefined reference to `glob_var'
>
> Here `glob_var' is the global variable created in my PASS.
>
> I would like to ask, am I using some attributes incorrectly?

..., but are you maybe simply missing a call to
'varpool_node::add (var);' or 'varpool_node::finalize_decl (var);' or
something like that?  See other uses of those, and the description in
'gcc/cgraph.h', 'struct [...] varpool_node':

  /* Add the variable DECL to the varpool.
     Unlike finalize_decl function is intended to be used
     by middle end and allows insertion of new variable at arbitrary point
     of compilation.  */
  static void add (tree decl);

  /* Mark DECL as finalized.  By finalizing the declaration, frontend instruct
     the middle end to output the variable to asm file, if needed or externally
     visible.  */
  static void finalize_decl (tree decl);

If that's not it, we'll have to look in more detail.


Grüße
 Thomas


Request for Direction.

2023-12-15 Thread David H. Lynch Jr. via Gcc
I am part of a project developing content addressable memory.  I am the
2nd author for a paper on this presented at MEMSYS 2023, and with
additions likely to be accepted by ACM shortly. 
https://www.memsys.io/wp-content/uploads/2023/09/10.pdf

My role is to develop software to demonstrate the benefits of the
hardware/memory. A part of that is implementing language extensions to
provide native support for content addressable memory,
and then modifying some applications to utilize those extensions and
demonstrate the value.

We have already developed a C/C++ preprocessor that is mostly
functional, but are looking to move to altering some actual compilers.

At this time this work is purely proof of the value proposition of
content addressable memory. Presuming that our work proves valuable,
that will provide an impetus for further work.

Right now I am just focused on some means to deliver support. 

So I am looking for direction regarding how to easily extend gcc to
provide support for content addressable memory.

Basically I need to be able to tag variables as Content addressable,
rather than normally addressed, and then change code generation for CA
variables such that they reference memory by key rather than address. 
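
To make the request concrete, here is a minimal software sketch in C of what "reference memory by key rather than address" could look like once an access to a CA variable has been lowered. Everything in it is an assumption made for illustration: `ca_store`, `ca_load`, `struct ca_slot` and the fixed-size table are hypothetical names, and the linear search merely stands in for the matching that the modified memory would perform. It is not the project's hardware interface or its preprocessor output.

```c
#include <stddef.h>
#include <stdint.h>

#define CA_SLOTS 8   /* arbitrary capacity for the sketch */

/* One keyed cell; hypothetical software stand-in for CA memory.  */
struct ca_slot { int used; uint32_t key; int value; };

static struct ca_slot ca_mem[CA_SLOTS];

/* Return the slot holding KEY, or NULL if no slot matches.  */
static struct ca_slot *
ca_find (uint32_t key)
{
  for (size_t i = 0; i < CA_SLOTS; i++)
    if (ca_mem[i].used && ca_mem[i].key == key)
      return &ca_mem[i];
  return NULL;
}

/* Store VALUE under KEY, reusing a matching slot or taking a free one.
   Returns 0 on success, -1 if the table is full.  */
static int
ca_store (uint32_t key, int value)
{
  struct ca_slot *slot = ca_find (key);

  for (size_t i = 0; slot == NULL && i < CA_SLOTS; i++)
    if (!ca_mem[i].used)
      slot = &ca_mem[i];
  if (slot == NULL)
    return -1;

  slot->used = 1;
  slot->key = key;
  slot->value = value;
  return 0;
}

/* Load the value stored under KEY into *OUT.
   Returns 0 on success, -1 if KEY is not present.  */
static int
ca_load (uint32_t key, int *out)
{
  const struct ca_slot *slot = ca_find (key);

  if (slot == NULL)
    return -1;
  *out = slot->value;
  return 0;
}
```

In this model a read of a tagged variable would become a `ca_load` with the variable's key and a write would become a `ca_store`, where ordinary variables get a plain load or store at an address.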

Is there a guide anywhere to developing language extensions for GCC
and/or making changes to code generation ?

I am a competent embedded software developer, with some ancient
experience with compilers, but starting from scratch with GCC.
Pointers would be appreciated.  Help would be appreciated. While I am
leading this part of the project, there is some funding available for
assistance.

Some recent languages have some form of content based addressing, but
this is implemented by the CPU.  We have altered the address logic of
memory to alter the way an "address" is handled such that it can
function as a key rather than a traditional linear address.

We have demonstrated Sort in Memory with relatively simple changes to
memory addressing logic, and we have extended the addressing
capabilities to things like sparse array notation which has
applications to AI. 

We are not looking to feed anything into the GCC distribution. 
But the software  will be open source. 

gcc-12-20231215 is now available

2023-12-15 Thread GCC Administrator via Gcc
Snapshot gcc-12-20231215 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/12-20231215/
and on various mirrors, see https://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 12 git branch
with the following options: git://gcc.gnu.org/git/gcc.git
branch releases/gcc-12 revision 5c3ab44771d0524140cf2ce5de594fcf7fefcd6f

You'll find:

 gcc-12-20231215.tar.xz   Complete GCC

  SHA256=d4781bdacb5dc60f013067fab33100f8b1dc142e15e4a913d26260cd6d790f4b
  SHA1=3eadf821d9a547482620710861e707c29787eddd

Diffs from 12-20231208 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-12
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Re: Request for Direction.

2023-12-15 Thread James K. Lowden
On Fri, 15 Dec 2023 14:43:22 -0500
"David H. Lynch Jr. via Gcc"  wrote:

> Right now I am just focused on some means to deliver support. 

Hi David, 

My colleague Bob Dubner and I have been extending GCC every day for
the last two years.  I wonder if we might be of some use to you.  

I only faintly hope our project can benefit from your work. We're
adding a Cobol front end to GCC.  Cobol has built-in sort functions,
both on disk and in memory, and a rich data-description language.
There is more potential there than might seem at first blush, and I
would welcome the opportunity to explain in detail if you're
interested.  

If your objective is simply to extend C to support content addressable
memory, then we might still be of some help.  I don't know anything,
really, about the C front-end, but Bob has experience getting
Generic to generate code.  He might be able to answer some of your
questions, if nothing else.

Let me know what you think.  

Kind regards, 

--jkl