On 03/30/2018 07:45 AM, Tom de Vries wrote:
> On 03/30/2018 03:07 AM, Tom de Vries wrote:
>> On 03/02/2018 05:55 PM, Cesar Philippidis wrote:
>>> As a follow-up patch will show, the nvptx BE falls back to using
>>> vector_length = 32 when a vector loop is nested inside a worker loop.
>>
>> I disabled the fallback, and analyzed the vred2d-128.c illegal memory
>> access execution failure.
>>
>> I minimized that down to this ptx:
>> ...
>> .shared .align 8 .u8 __oacc_bcast[176];
>>
>> {
>>   {
>>     .reg .u32 %x;
>>     mov.u32 %x,%tid.x;
>>     setp.ne.u32 %r86,%x,0;
>>   }
>>
>>   {
>>     .reg .u32 %tidy;
>>     .reg .u64 %t_bcast;
>>     .reg .u64 %y64;
>>     mov.u32 %tidy,%tid.y;
>>     cvt.u64.u32 %y64,%tidy;
>>     add.u64 %y64,%y64,1;
>>     cvta.shared.u64 %t_bcast,__oacc_bcast;
>>     mad.lo.u64 %r66,%y64,88,%t_bcast;
>>   }
>>
>>   @ %r86 bra $L28;
>>   st.u32 [%r66+80],0;
>> $L28:
>>   ret;
>> }
>> ...
>>
>> The ptx is launched with 2 workers and vector_length 128.
>>
>> So, 2 workers mean %tid.y has values 0 and 1.
>> Then %y64 has values 1 and 2.
>> Then %r66 has values __oacc_bcast + (1 * 88) and __oacc_bcast + (2 * 88).
>> Then the st.u32 accesses __oacc_bcast + (1 * 88) + 80 and __oacc_bcast
>> + (2 * 88) + 80.
>>
>> So we're accessing memory at offset 256, while __oacc_bcast is only
>> 176 bytes big.
>>
>> I formulated the following assert, which AFAIU detects this situation
>> in the compiler:
>> ...
>> @@ -1125,6 +1125,8 @@ nvptx_init_axis_predicate (FILE *file, int regno, const char *name)
>>    fprintf (file, "\t}\n");
>>  }
>>
>> +static int nvptx_mach_max_workers ();
>> +
>>  /* Emit code to initialize OpenACC worker broadcast and synchronization
>>     registers.  */
>>
>> @@ -1148,6 +1150,7 @@ nvptx_init_oacc_workers (FILE *file)
>>             "// vector broadcast offset\n",
>>             REGNO (cfun->machine->bcast_partition),
>>             oacc_bcast_partition);
>> +      gcc_assert (oacc_bcast_partition * (nvptx_mach_max_workers () + 1)
>> +                  <= oacc_bcast_size);
>>      }
>>    if (cfun->machine->sync_bar)
>>      fprintf (file, "\t\tadd.u32\t\t%%r%d, %%tidy, 1; "
>>
>> The assert is not triggered when the fallback is used.
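
Just to sanity check the arithmetic above outside the compiler, here's a
small standalone sketch (the variable names are mine, taken from the
minimized ptx, not from the nvptx BE):
...
#include <assert.h>
#include <stdio.h>

int
main (void)
{
  const int bcast_size = 176;  /* .shared .u8 __oacc_bcast[176]  */
  const int partition = 88;    /* stride used by the mad.lo.u64  */
  const int workers = 2;       /* launch dimension for %tid.y  */

  for (int tidy = 0; tidy < workers; tidy++)
    {
      /* %r66 = __oacc_bcast + (%tid.y + 1) * 88, then st.u32 [%r66+80].  */
      int offset = (tidy + 1) * partition + 80;
      printf ("tid.y = %d stores 4 bytes at offset %d of %d\n",
              tidy, offset, bcast_size);
      assert (offset + 4 <= bcast_size);  /* Fails for tid.y == 1.  */
    }

  return 0;
}
...
The second iteration trips the assert at offset 256, matching the illegal
access described above.
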
>
> I've tracked the problem down to:
> ...
>> -      if (oacc_bcast_size < data.offset)
>> -        oacc_bcast_size = data.offset;
>> +      if (oacc_bcast_partition < data.offset)
>> +        {
>> +          int psize = data.offset;
>> +          psize = (psize + oacc_bcast_align - 1) & ~(oacc_bcast_align - 1);
>> +          oacc_bcast_partition = psize;
>> +          oacc_bcast_size = psize * (nvptx_mach_max_workers () + 1);
>> +        }
>
> ...
>
> We hit this if clause for the first compiled function, which has
> num_workers(1).
>
> This sets oacc_bcast_partition and oacc_bcast_size as required for that
> function.
>
> Then we hit this if clause for a second compiled function, which has
> num_workers(2).
>
> We need oacc_bcast_size to be updated, but the condition
> 'oacc_bcast_partition < data.offset' is false, so the update doesn't
> happen.
>
> I managed to fix this by making the code unconditional, and using MAX to
> update oacc_bcast_partition and oacc_bcast_size.
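
IIUC that would look something along these lines (just my sketch of the
unconditional/MAX variant you describe, not necessarily the exact change):
...
      int psize = data.offset;
      psize = (psize + oacc_bcast_align - 1) & ~(oacc_bcast_align - 1);
      oacc_bcast_partition = MAX (oacc_bcast_partition, psize);
      oacc_bcast_size = MAX (oacc_bcast_size,
                             oacc_bcast_partition
                             * (nvptx_mach_max_workers () + 1));
...
That way the second function, with num_workers(2), still grows
oacc_bcast_size even though its data.offset doesn't exceed the partition
size established by the first function.
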
It looks like that's fallout from this patch
<https://gcc.gnu.org/ml/gcc-patches/2018-03/msg01212.html>. I should
have checked that patch with the vector length fallback disabled.
Cesar