[VTA merge] Some dwarf problems

2009-09-21 Thread Hariharan

Hi Alexandre,
I was having some trouble with DWARF sections in the picochip port. I am not 
a DWARF expert, but when I looked at the changes in r151312, file 
dwarf2out.c, function dwarf2out_var_location, on line 17965, we have


 sprintf (loclabel, "%s-1", last_label);
 ...

What is last_label-1 supposed to point to?

Thanks for your help.

Hari




Re: [VTA merge] Some dwarf problems

2009-09-21 Thread Hariharan

Thanks for the pointer, Jakub.

Cheers
Hari

Jakub Jelinek wrote:

On Mon, Sep 21, 2009 at 05:04:27PM +0100, Hariharan wrote:
  

Hi Alexandre,
I was having some trouble with dwarf sections in picochip port. I am not  
a dwarf expert, but when i looked at the changes in r151312, file  
dwarf2out.c, function dwarf2out_var_location on line 17965, we have


 sprintf (loclabel, "%s-1", last_label);
 ...

What is last_label-1 supposed to point to?



See http://gcc.gnu.org/ml/gcc-patches/2009-06/msg01317.html
for details.  1 byte before last_label label (which is usually right after a
call insn).  The intent is to have something in the middle of a call insn.

Jakub
  


Re: fbranch-probabilities bug

2009-01-08 Thread Hariharan

Hi Seongbae,
Does that mean that someone can't use the profile just to annotate 
branches (and get better code from that), without having to take on the 
additional baggage of "unroll-loops", "peel-loops" etc?


In my case, I am interested in getting whatever performance profiling 
has to offer without bloating the code size. Is that possible?


Note: My profile generate phase was also just -fprofile-arcs, since I am 
not interested in other kinds of profile.


Cheers
Hari

Seongbae Park wrote:

This is the intended behavior, though now I see that the documentation
isn't very clear.
You need to use -fprofile-use - the typical usage scenario is to
compile with -fprofile-generate
to build an executable to do profile collection, and then compile with
-fprofile-use
to build optimized code using the profile data.

Seongbae

On Thu, Jan 8, 2009 at 6:30 AM, Hariharan Sandanagobalane
 wrote:

Hi Seongbae,
I was doing some work on profiling for picochip, when i noticed what looks
to me like a bug. It looks to me that using fbranch-probabilities on the
commandline (after a round of profile-generate or profile-arcs) would just
not work on any target. Reason..

Coverage.c:1011

 if (flag_profile_use)
   read_counts_file ();

Should this not be

 if (flag_profile_use || flag_branch_probabilities)  // Maybe more flags
   read_counts_file ();

??

Of course, i hit the problem later on since the counts were not read, it
just assumed that the .gcda file were not available, when it actually was.

Thanks
Hari



Re: fbranch-probabilities bug

2009-01-09 Thread Hariharan



Seongbae Park wrote:

On Thu, Jan 8, 2009 at 10:11 AM, Hariharan  wrote:

Hi Seongbae,
Does that mean that someone cant use the profile just to annotate branches
(and get better code by that), without having to get the additional baggage
of "unroll-loops", "peel-loops" etc?


You can do that by selectively turning optimizations off (e.g.
-fprofile-use -fno-unroll-loops -fno-peel-loops ).


In my case, i am interested in not bloating the code size, but get any
performance that is to be had from profiling. Is that possible?

Note: My profile generate phase was also just -fprofile-arcs since i am not
interested in other kinds of profile.


Have you measured the impact on the performance and the code size from
using full -fprofile-generate/-fprofile-use ?


Well, no... I cannot. I have just about managed to get -fprofile-arcs 
and -fbranch-probabilities to work with picochip. The code runs under a 
simulator, and I have had to hack both GCC code and libgcov code to get 
the simulator to output the profile in the format that is acceptable 
to GCC in the second run. Doing the same with the additional profiles is 
going to be a hard task.


It's a target that has MEM processor variants, which have 6KB (yes, KB, 
not MB or GB) of instruction memory, at best. So you can understand why 
code size is very important to us.


Anyway, it looks to me that we might get very little performance benefit 
without bloating the code with PBO, which makes it very unattractive 
for me to do anything along this line.


By the way, your changes to smooth the profile information in GCC 4.4 
helped a lot. Where I had 13 profiling tests (drawn from the GCC DejaGnu 
testsuite) failing in GCC 4.3.2 with "corrupted profile info" messages, 
I am down to just one failure in GCC 4.4. Thanks for that.


Cheers
Hari


If yes, and you have seen any performance degradation or unnecessary
code bloat from other optimization,
please file a bug.
If not, then I'd say you probably want to try measuring it - in
particular, value profiling has been
becoming more and more useful. And in my experience, majority of the
code size increase as well as the performance benefit
with -fprofile-use comes from extra inlining (which -fprofile-arcs
then -fbranch-probabilities also enable).

Seongbae


GCC Profile base optimizations using simulator profile

2009-01-21 Thread Hariharan

Hi,
I just wanted to see if there are others out there who get profile 
information from a simulator and feed that information back for GCC's 
PBO, in the .gcda format.


I had tried this on picoChip, by changing the instrumentation code in 
GCC for -fprofile-arcs, and got edge profiling working quite well (but 
GCC 4.4 would not accept just an edge profile). I have never attempted 
the others (indirect call, value profile etc.), but would like to know 
the results of anyone who might have tried.


Thanks.

Hari


Inline limits

2009-01-26 Thread Hariharan

Hi,
I ran into some code-size/stack-size bloat using -Os for a piece of 
code. This seemed to happen only when certain single call-site functions 
were defined "static" and not otherwise. On investigating further, 
I see that inline_functions_called_once seems to rely only on 
"cgraph_check_inline_limits", whereas other inlining code goes through 
a more rigorous cost-benefit analysis to decide on inlining (especially 
with INLINE_SIZE).


I have been looking at re-setting some of the parameters used in 
"cgraph_check_inline_limits" for inlining on picochip. I could not 
understand the way PARAM_LARGE_FUNCTION_GROWTH and 
PARAM_STACK_FRAME_GROWTH are used in this function. Both of these 
parameters are used as a fraction of the bigger (or the "to") function.


I want to be able to say: if the inlining would increase the code size 
or stack frame size, don't inline; otherwise, go ahead and inline. Of 
course, I am compiling this code at -Os, so this condition is probably 
obvious. Can you advise me on how to use these parameters to do that?


A side question... Are "static" single call-site functions always 
inlined? I would hope not (under -Os), but just checking.


Thanks
Hari

PS: If this were to be considered a "bug", I will file a report with a 
testcase.




scheduler dependency bug in the presence of var_location and unspec_volatile

2010-04-28 Thread Hariharan

Hello,
I saw a bug in sched1 where it reorders two unspec_volatile 
instructions. These instructions do port communications (on the same 
port), and doing them in the wrong order is unacceptable. I dug a bit 
deeper to see what is happening. Going into sched1, the relevant bit of 
the basic block is


(debug_insn 184 183 185 12 autogenerated_UlSymbolRateCtrlDummy.c:58 
(var_location:SI converter$rawValue (unspec_volatile:SI [

   (const_int 3 [0x3])
   ] 8)) -1 (nil))

(insn 185 184 186 12 
/home/gccuser/systems/products/lib/umtsfdd/rel8_200903/uplink/UlSymbolRate/src/UlSymbolRateCtrlDummy.c:58 
(set (subreg:SI (reg/v:DI 299 [ trchHeader ]) 
0)   
(unspec_volatile:SI [

   (const_int 3 [0x3])
   ] 8)) 80 {commsGet} (nil))

(note 186 185 188 12 NOTE_INSN_DELETED)

(note 188 186 189 12 NOTE_INSN_DELETED)

(insn 189 188 190 12 
/home/gccuser/systems/products/lib/umtsfdd/rel8_200903/uplink/UlSymbolRate/src/UlSymbolRateCtrlDummy.c:58 
(set (reg:HI 280 [ trchHeader$D1530$channelCodingEnum ])

   (lshiftrt:HI (subreg:HI (reg/v:DI 299 [ trchHeader ]) 0)
   (const_int 14 [0xe]))) 64 {lshrhi3} (nil))

(debug_insn 190 189 191 12 (var_location:QI 
trchHeader$D1530$channelCodingEnum (subreg:QI (reg:HI 280 [ 
trchHeader$D1530$channelCodingEnum ]) 0)) -1 (nil))


(debug_insn 191 190 192 12 (var_location:QI 
trchHeader$D1530$channelCodingEnum (subreg:QI (reg:HI 280 [ 
trchHeader$D1530$channelCodingEnum ]) 0)) -1 (nil))


(note 192 191 193 12 NOTE_INSN_DELETED)

(debug_insn 193 192 194 12 autogenerated_UlSymbolRateCtrlDummy.c:58 
(var_location:SI converter$rawValue (unspec_volatile:SI [

   (const_int 3 [0x3])
   ] 8)) -1 (nil))

(insn 194 193 195 12 
/home/gccuser/systems/products/lib/umtsfdd/rel8_200903/uplink/UlSymbolRate/src/UlSymbolRateCtrlDummy.c:59 
(set (subreg:SI (reg/v:DI 299 [ trchHeader ]) 4)

   (unspec_volatile:SI [
   (const_int 3 [0x3])
   ] 8)) 80 {commsGet} (nil))


Note that 185 and 194 are the actual port communication instructions 
here. If i look at the scheduler forward dependency for this basic block 
(at sched1), it looks like this


;;   ==
;;   -- basic block 12 from 185 to 212 -- before reload
;;   ==

;;   --- forward dependences: 

;;   --- Region Dependences --- b 12 bb 0
;;      insn  code  bb   dep  prio  cost   reservation
;;      ----  ----  --   ---  ----  ----   -----------
;;       185    80  12     0     2     1   slot1       : 212 193 191 190 189
;;       189    64  12     1     1     1   slot0|slot1 : 212 193 191 190
;;       190    -1  12     2     0     0   nothing     : 193 191
;;       191    -1  12     3     0     0   nothing     : 193
;;       193    -1  12     4     0     0   nothing     : 199 194
;;       194    80  12     0     5     1   slot1       : 212 206 205 204 203 202 201 200 199 198
;;       198    64  12     1     4     1   slot0|slot1 : 212 206 202 200 199
;;       199    -1  12     3     0     0   nothing     : 206 200
;;       200    -1  12     3     0     0   nothing     : 206 201
;;       201    -1  12     2     0     0   nothing     : 206 202
;;       202    -1  12     3     0     0   nothing     : 206 203
;;       203    -1  12     2     0     0   nothing     : 206 204
;;       204    -1  12     2     0     0   nothing     : 206 205
;;       205    -1  12     2     0     0   nothing     : 207 206
;;       206    82  12     2     3     1   slot1       : 212 210 209 208 207
;;       207    -1  12     2     0     0   nothing     : 210 208
;;       208    -1  12     2     0     0   nothing     : 210 209
;;       209    -1  12     2     0     0   nothing     : 211 210
;;       210    82  12     1     2     1   slot1       : 212 211
;;       211    -1  12     2     0     0   nothing     :
;;       212     7  12     6     1     1   (slot0+slot1+slot2) :

;;  dependencies resolved: insn 185
;;  tick updated: insn 185 into ready
;;  dependencies resolved: insn 194
;;  tick updated: insn 194 into ready
;;  Advanced a state.
;;  Ready list after queue_to_ready:194:87  185:82
;;  Ready list after ready_sort:185:82  194:87
;;  Clock 0
;;  Ready list (t =   0):185:82  194:87
;;  Chosen insn : 194
;;0-->   194 r299#4=unspec/v[0x3] 8:slot1
;;  resetting: debug insn 193

Note that there is a dependency chain 185->193->194. Insn 193 is a 
debug_insn for a var_location. When we actually get to scheduling, we 
seem to ignore this dependency, put both 185 and 194 into the ready 
state, and 194 gets picked, causing my test to go wrong.


I do not have much experience working

Machine description question

2010-05-12 Thread Hariharan

Hello all,
Picochip has communication instructions that allow one array element to 
pass data to another. There are 3 such instructions: PUT/GET/TSTPORT. 
Currently, all three of these use UNSPEC_VOLATILE side-effect 
expressions to make sure they don't get reordered. But I wonder if it 
is overkill to use UNSPEC_VOLATILE for this purpose and whether I 
should use UNSPEC instead. The only thing we care about here is that 
they don't get reordered with respect to each other. It is okay for 
other instructions to move around the communication instructions (as 
long as normal scheduler dependencies are taken care of). There are two 
things I could possibly do.


1. Introduce an implicit dependency between all communication 
instructions by adding a use/clobber of an imaginary register.
2. Introduce an explicit dependency between them by using some target 
hook to add dependency links. I have not found an appropriate target 
hook to do this.


Can you tell me which one I should try? Has anyone tried doing anything 
similar? Any pointers/suggestions on this will be greatly appreciated.


Thanks
Hari


delay branch bug?

2010-05-24 Thread Hariharan

Hello all,
I found something a little odd with delay slot scheduling. Suppose I had 
the following bit of code (note that "get" builtin functions in picochip 
stand for port communication):


int mytest ()
{
 int a[5];
 int i;
 for (i = 0; i < 5; i++)
 {
   a[i] = (int) getctrlIn();
 }
 switch (a[3])
 {
   case 0:
   return 4;
   default:
   return 13;
 }
}

The relevant bit of assembly for this compiled at -Os is

_L2:
   GET 0,R[5:4]// R[5:4] := PORT(0)
_picoMark_LBE5=
_picoMark_LBE4=
   .loc 1 13 0
   STW R4,(R3)0// Mem((R3)0{byte}) := R4
   ADD.0 R3,2,R3   // R3 := R3 + 2 (HI)
   .loc 1 11 0
   SUB.0 R3,R2,r15 // CC := (R3!=R2)
   BNE _L2
   =-> LDW (FP)3,R5// R5 = Mem((FP)6{byte})
   .loc 1 22 0

=-> is the delay slot marker. Note that the LDW instruction has been 
moved into the delay slot. This corresponds to the load in the "switch 
(a[3])" statement above. The first 3 times around this loop, LDW would be 
loading uninitialised memory. The loaded value is ignored until we come 
out of the loop, and hence the code is functionally correct, but I am not 
sure the introduction of an uninitialised memory access by the compiler, 
when there was none in the source, is a good thing.


I browsed around the delay branch code in reorg.c, but couldn't find 
anything that checks for this. Is this the intended behaviour? Can 
anyone familiar with delay branch code help?


Thanks
Hari



Re: [Bug rtl-optimization/44013] VTA produces wrong code

2010-06-01 Thread Hariharan

Hi Jakub,
I have not had any response from Alexandre on this yet, and I haven't 
had much luck on the mailing list either 
(http://gcc.gnu.org/ml/gcc/2010-04/msg00917.html). Is there anyone else 
who is familiar with VTA who could help?


Thanks
Hari

jakub at gcc dot gnu dot org wrote:


Re: New picoChip port and maintainers

2008-03-12 Thread Hariharan

Thanks to the GCC SC for accepting the picochip port.

Regards
Hari


David Edelsohn wrote:

I am pleased to announce that the GCC Steering Committee has
accepted the picoChip port for inclusion in GCC and appointed
Hariharan Sandanagobalane and Daniel Towner as port maintainers.
The initial patch needs approval from a GCC GWP maintainer before it may
be committed.

Please join me in congratulating Hari and Daniel on their new role.
Please update your listing in the MAINTAINERS file.

Happy hacking!
David


Re: New picoChip port and maintainers

2008-06-09 Thread Hariharan

Hi David/SC,
Thanks again for accepting the picochip port in GCC.

Although the picochip port has been accepted by the Steering Committee, 
we have had trouble getting a GWP maintainer to review the port. All the 
GWP maintainers seem to be extremely busy. I have emailed all of them, 
but haven't been successful in getting a review.


In light of this, would it be possible for the SC to allow the port to 
be reviewed by other port maintainers?


Regards
Hari


David Edelsohn wrote:

I am pleased to announce that the GCC Steering Committee has
accepted the picoChip port for inclusion in GCC and appointed
Hariharan Sandanagobalane and Daniel Towner as port maintainers.
The initial patch needs approval from a GCC GWP maintainer before it may
be committed.

Please join me in congratulating Hari and Daniel on their new role.
Please update your listing in the MAINTAINERS file.

Happy hacking!
David


Re: Optimising for size

2008-07-15 Thread Hariharan

Hi Joel,
I ran into a similar problem moving from 4.2.2 to 4.3.0. I looked a bit 
into it and found that the 4.3 compiler inlines more aggressively than 
the 4.2.x compiler. The reason was that the following two lines were 
removed from opts.c:

  set_param_value ("max-inline-insns-single", 5);
  set_param_value ("max-inline-insns-auto", 5);

Of course, other changes were made to make sure code size didn't 
increase with this change. But those other changes depend on 
PARAM_INLINE_CALL_COST. The default of 16 was too high for our target 
(picochip). You might want to try reducing this value and see if your 
code-size woes go away.



Regards
Hari

Joe Buck wrote:

On Mon, Jul 14, 2008 at 10:04:08AM +1000, [EMAIL PROTECTED] wrote:

I have a piece of C code. The code, compiled to an ARM THUMB target 
using gcc 4.0.2, with -Os results in 230 instructions. The exact same 
code, using the exact same switches, compiles to 437 instructions with 
gcc 4.3.1.

Considering that the compiler optimises to size and the much newer 
compiler emits almost twice as much code as the old one, I think it is 
an issue.


Agreed.  I think it's a regression.  Using -Os and getting
much larger code would qualify.


So the question is, how should I report it?


Open a PR with the complete test case, and the command line options you
used with 4.0.2 and 4.3.1.


Please cc me on the PR.  I would like to track this one and
if you provide a preprocessed  test case can quickly
check the size on 3.2.3, 4.1.1, 4.2.4, 4.3.1 and the trunk.


Use joel AT gcc DOT gnu.org

Thanks.

--
Joel Sherrill, Ph.D. Director of Research & Development
[EMAIL PROTECTED]On-Line Applications Research
Ask me about RTEMS: a free RTOS  Huntsville AL 35805
  Support Available (256) 722-9985


size of array "" is too large

2008-07-17 Thread Hariharan

Hello,
I see that in x86 GCC, you can define a structure with

struct trial
{
  long a[10000];
};


Whereas on a 16-bit target (picochip), you cannot define

struct trial
{
  long a[10000];
};

In the case above, I get a
"size of array ‘a’ is too large" error.

The thing that took me by surprise was: if I split the structure into

struct trial
{
  long a[5000];
  long b[5000];
};

This works fine.

I looked around the mailing list a bit. This issue seems to have been 
raised a few times before, but I couldn't find any definitive answer.


Is this a bug in GCC? Should I file a report?

Cheers
Hari


unsigned comparison warning

2008-07-29 Thread Hariharan

Hello,
I found something rather strange with the unsigned comparison warnings 
in GCC.


If I had

unsigned char a;

int foo ()
{
  if (a >= 0)
    return 0;
  else
    return 1;
}

and i did gcc -O2 -c trial.c, then i get a warning

trial.c:6: warning: comparison is always true due to limited range of data type

It works the same way if I use an unsigned short. But if I use unsigned int/long, I don't get this warning. This is on x86. Is there an explanation for this?


Cheers
Hari

Re: unsigned comparison warning

2008-07-30 Thread Hariharan

Thanks Ian. I will raise this in gcc-help mailing list.

Cheers
Hari

Ian Lance Taylor wrote:

Hariharan <[EMAIL PROTECTED]> writes:


I found something rather strange with the unsigned comparison warnings
in GCC.


This is the wrong mailing list.  The mailing list gcc@gcc.gnu.org is
for gcc developers.  The mailing list [EMAIL PROTECTED] is for
questions about using gcc.  Please take any followups to
[EMAIL PROTECTED]  Thanks.



and i did gcc -O2 -c trial.c, then i get a warning

trial.c:6: warning: comparison is always true due to limited range of data type

It works the same way if i used an unsigned short. But, if i use
unsigned int/long, i dont get this warning. This is on x86. Is there
an explanation for this? 


You neglected to mention the version of gcc.  In current gcc, I don't
see any warning when using "gcc -O2 -c trial.c".  I see a warning for
both "unsigned char" and "unsigned int" when I add the -Wextra option.

Ian


fbranch-probabilities bug

2009-01-08 Thread Hariharan Sandanagobalane

Hi Seongbae,
I was doing some work on profiling for picochip when I noticed what 
looks to me like a bug. It looks to me that using -fbranch-probabilities 
on the command line (after a round of profile-generate or profile-arcs) 
would just not work on any target. Reason:


Coverage.c:1011

  if (flag_profile_use)
read_counts_file ();

Should this not be

  if (flag_profile_use || flag_branch_probabilities)  // Maybe more flags
read_counts_file ();

??

Of course, I hit the problem later on: since the counts were not read, 
it just assumed that the .gcda file was not available, when it actually 
was.


Thanks
Hari


pr39339 - invalid testcase or SRA bug?

2009-03-10 Thread Hariharan Sandanagobalane

Hi,
Since r144598, pr39339.c has been failing on picochip. On investigation, 
it looks to me that the testcase is invalid.


Relevant source code:
struct C
{
  unsigned int c;
  struct D
  {
unsigned int columns : 4;
unsigned int fore : 9;
unsigned int back : 9;
unsigned int fragment : 1;
unsigned int standout : 1;
unsigned int underline : 1;
unsigned int strikethrough : 1;
unsigned int reverse : 1;
unsigned int blink : 1;
unsigned int half : 1;
unsigned int bold : 1;
unsigned int invisible : 1;
unsigned int pad : 1;
  } attr;
};

struct A
{
  struct C *data;
  unsigned int len;
};

struct B
{
  struct A *cells;
  unsigned char soft_wrapped : 1;
};

struct E
{
  long row, col;
  struct C defaults;
};

__attribute__ ((noinline))
void foo (struct E *screen, unsigned int c, int columns, struct B *row)
{
  struct D attr;
  long col;
  int i;
  col = screen->col;
  attr = screen->defaults.attr;
  attr.columns = columns;
  row->cells->data[col].c = c;
  row->cells->data[col].attr = attr;
  col++;
  attr.fragment = 1;
  for (i = 1; i < columns; i++)
{
  row->cells->data[col].c = c;
  row->cells->data[col].attr = attr;
  col++;
}
}

int
main (void)
{
  struct E e = {.row = 5,.col = 0,.defaults =
  {6, {-1, -1, -1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0}} };
  struct C c[4];
  struct A a = { c, 4 };
  struct B b = { &a, 1 };
  struct D d;
  __builtin_memset (&c, 0, sizeof c);
  foo (&e, 65, 2, &b);
  d = e.defaults.attr;
  d.columns = 2;
  if (__builtin_memcmp (&d, &c[0].attr, sizeof d))
__builtin_abort ();
  d.fragment = 1;
  if (__builtin_memcmp (&d, &c[1].attr, sizeof d))
__builtin_abort ();
  return 0;
}


On picochip, PCC_BITFIELD_TYPE_MATTERS is set and int is 16 bits, so 
structure D becomes 6 bytes, with 3 bits of padding between fore and back.


At SRA the code becomes

;; Function foo (foo)

foo (struct E * screen, unsigned int c, int columns, struct B * row)
{
  unsigned int attr$B32F16;
   attr$B26F6;
   attr$back;
   attr$fore;
   attr$fragment;
  int i;
  long int col;
  struct C * D.1267;
  unsigned int D.1266;
  unsigned int D.1265;
  struct C * D.1264;
  struct A * D.1263;
   D.1262;
  unsigned char D.1261;

<bb 2>:
  col_4 = screen_3(D)->col;
  attr$B32F16_36 = BIT_FIELD_REF <screen_3(D)->defaults.attr, 16, 32>;
  attr$B26F6_37 = BIT_FIELD_REF <screen_3(D)->defaults.attr, 6, 26>;
  attr$back_38 = screen_3(D)->defaults.attr.back;
  attr$fore_39 = screen_3(D)->defaults.attr.fore;
  attr$fragment_40 = screen_3(D)->defaults.attr.fragment;
  D.1261_6 = (unsigned char) columns_5(D);
  D.1262_7 = () D.1261_6;
  D.1263_9 = row_8(D)->cells;
  D.1264_10 = D.1263_9->data;
  D.1265_11 = (unsigned int) col_4;
  D.1266_12 = D.1265_11 * 8;
  D.1267_13 = D.1264_10 + D.1266_12;
  D.1267_13->c = c_14(D);
  BIT_FIELD_REF <attr, 16, 32> = attr$B32F16_36;
  BIT_FIELD_REF <attr, 6, 26> = attr$B26F6_37;
  D.1267_13->attr.back = attr$back_38;
  D.1267_13->attr.fore = attr$fore_39;
  D.1267_13->attr.fragment = attr$fragment_40;
  D.1267_13->attr.columns = D.1262_7;
  col_20 = col_4 + 1;
  if (columns_5(D) > 1)
    goto <bb 3>;
  else
    goto <bb 4>;

<bb 3>:
  # col_29 = PHI 
  # i_30 = PHI 
  D.1265_24 = (unsigned int) col_29;
  D.1266_25 = D.1265_24 * 8;
  D.1267_26 = D.1264_10 + D.1266_25;
  D.1267_26->c = c_14(D);
  BIT_FIELD_REF <attr, 16, 32> = attr$B32F16_36;
  BIT_FIELD_REF <attr, 6, 26> = attr$B26F6_37;
  D.1267_26->attr.back = attr$back_38;
  D.1267_26->attr.fore = attr$fore_39;
  D.1267_26->attr.fragment = 1;
  D.1267_26->attr.columns = D.1262_7;
  col_32 = col_29 + 1;
  i_33 = i_30 + 1;
  if (columns_5(D) > i_33)
    goto <bb 3>;
  else
    goto <bb 4>;

<bb 4>:
  return;

}



;; Function main (main)

main ()
{
  struct D d;
  struct B b;
  struct A a;
  struct C c[4];
  struct E e;
  int D.1279;
  int D.1276;

<bb 2>:
  e.row = 5;
  e.col = 0;
  e.defaults.c = 6;
  e.defaults.attr.columns = 15;
  e.defaults.attr.fore = 511;
  e.defaults.attr.back = 511;
  e.defaults.attr.fragment = 1;
  e.defaults.attr.standout = 0;
  e.defaults.attr.underline = 1;
  e.defaults.attr.strikethrough = 0;
  e.defaults.attr.reverse = 1;
  e.defaults.attr.blink = 0;
  e.defaults.attr.half = 1;
  e.defaults.attr.bold = 0;
  e.defaults.attr.invisible = 1;
  e.defaults.attr.pad = 0;
  a.data = &c;
  a.len = 4;
  b.cells = &a;
  b.soft_wrapped = 1;
  __builtin_memset (&c, 0, 32);
  foo (&e, 65, 2, &b);
  d = e.defaults.attr;
  d.columns = 2;
  D.1276_1 = __builtin_memcmp (&d, &c[0].attr, 6);
  if (D.1276_1 != 0)
    goto <bb 3>;
  else
    goto <bb 4>;

<bb 3>:
  __builtin_abort ();

<bb 4>:
  d.fragment = 1;
  D.1279_2 = __builtin_memcmp (&d, &c[1].attr, 6);
  if (D.1279_2 != 0)
    goto <bb 5>;
  else
    goto <bb 6>;

<bb 5>:
  __builtin_abort ();

<bb 6>:
  return 0;

}


Note that the padding bits (13..16) are not copied over in bb 2 of 
function foo. main then does a memcmp, which fails because the padding 
bits are different.


From the C99 standard (p. 328, footnote 265): "The contents of 'holes' 
used as padding for purposes of alignment within structure objects are 
indeterminate."

Re: pr39339 - invalid testcase or SRA bug?

2009-03-10 Thread Hariharan Sandanagobalane

Yes, if I change the structure to bring the 3 1-bit members forward, to 
avoid padding, the testcase does pass.


Thanks to both of you for your help.

Cheers
Hari

Jakub Jelinek wrote:

On Tue, Mar 10, 2009 at 01:44:11PM +, Hariharan Sandanagobalane wrote:
Since r144598, pr39339.c has been failing on picochip. On investigation,  
it looks to me that the testcase is illegal.


Relevant source code:
struct C
{
  unsigned int c;
  struct D
  {
unsigned int columns : 4;



unsigned int fore : 9;
unsigned int back : 9;


As the testcase fails with buggy (pre r144598) gcc and succeeds after even
with:


unsigned int fore : 12;
unsigned int back : 6;


instead of :9, :9, I think we could change it (does it succeed on picochip
then)?  Or move to gcc.dg/torture/ and run only on int32plus targets.
Or add if (sizeof (int) != 4 || sizeof (struct D) != 4) return 0
to the beginning of main.

Jakub


Re: fbranch-probabilities bug

2009-03-11 Thread Hariharan Sandanagobalane


Seongbae Park wrote:

This is the intended behavior, though now I see that the documentation
isn't very clear.


Can you fix the documentation? As it stands now, it is easy for a user
to be misled into thinking the -fprofile-arcs and -fbranch-probabilities
combination would work.

Just out of curiosity, what is the downside to letting people use
-fbranch-probabilities without -fprofile-use?

Cheers
Hari


You need to use -fprofile-use - the typical usage scenario is to
compile with -fprofile-generate
to build an executable to do profile collection, and then compile with
-fprofile-use
to build optimized code using the profile data.

Seongbae

On Thu, Jan 8, 2009 at 6:30 AM, Hariharan Sandanagobalane
 wrote:

Hi Seongbae,
I was doing some work on profiling for picochip, when i noticed what looks
to me like a bug. It looks to me that using fbranch-probabilities on the
commandline (after a round of profile-generate or profile-arcs) would just
not work on any target. Reason..

Coverage.c:1011

 if (flag_profile_use)
   read_counts_file ();

Should this not be

 if (flag_profile_use || flag_branch_probabilities)  // Maybe more flags
   read_counts_file ();

??

Of course, i hit the problem later on since the counts were not read, it
just assumed that the .gcda file were not available, when it actually was.

Thanks
Hari





Re: Machine description question

2010-05-12 Thread Hariharan Sandanagobalane

Thanks for your help BingFeng.

I gave this a go and ended up with worse code (and worse memory usage) 
than before. I started this experiment because of the compiler's 
"all virtual registers are assumed to be used and clobbered by 
unspec_volatile" rule. The get/put instructions read/write registers, 
and the virtual registers assigned for them interfere with all the 
virtual registers in the function. So they were highly likely to be 
spilled to the stack. I wanted to avoid this through the introduction 
of unspecs and the use of imaginary registers.


But the virtual registers that are involved in unspec patterns with 
these imaginary registers still seem to be marked as interfering with 
all the virtual registers. Is that to be expected? Am I missing 
something obvious here?


Regards
Hari

Bingfeng Mei wrote:

Our architecture has the similar resource, and we use the first approach
by creating an imaginary register and dependency between these instructions,
i.e., every such instruction reads and write to the special register to
create artificial dependency. You may need to add a (unspec:..) as an 
independent expression in your pattern to prevent some wrong optimizations. 



Cheers,
Bingfeng

  

-Original Message-
From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On 
Behalf Of Hariharan

Sent: 12 May 2010 11:18
To: gcc@gcc.gnu.org
Subject: Machine description question

Hello all,
Picochip has communication instructions that allow one array element to 
pass data to another. There are 3 such instructions PUT/GET/TSTPORT. 
Currently, all three of these use UNSPEC_VOLATILE side-effect 
expressions to make sure they don't get reordered. But, i wonder if it 
is an overkill to use UNSPEC_VOLATILE for this purpose and whether i 
should use UNSPEC instead. The only thing we care here is that they 
don't reordered with respect to each other. It is okay for other 
instructions to move around the communication instructions (as long as 
normal scheduler dependencies are taken care of). There are possibly one 
of two things i can do.

1. Introduce an implicit dependency between all communication 
instructions by adding a use/clobber of an imaginary register.
2. Introduce explicit dependency between them by using some target hook 
to add dependency links. I have not found any appropriate target hook to 
do this.

Can you tell me which one i should try? Has anyone tried doing anything 
similar? Any pointers/suggestions on this will be greatly appreciated.

Thanks
Hari





Re: Machine description question

2010-05-13 Thread Hariharan Sandanagobalane

The patterns for PUT/GET were

; Scalar Put instruction.
(define_insn "commsPut"
 [(unspec_volatile [(match_operand:HI 0 "const_int_operand" "")
(match_operand:SI 1 "register_operand" "r")]
   UNSPEC_PUT)]
 ""
 "PUT %R1,%0\t// PORT(%0) := %R1"
 [(set_attr "type" "comms")
  (set_attr "length" "2")])


(define_insn "commsGet"
 [(set (match_operand:SI 0 "register_operand" "=r")
   (unspec_volatile:SI
[(match_operand:HI 1 "immediate_operand" "n")]
UNSPEC_GET))]
 ""
 "GET %1,%R0\t// %R0 := PORT(%1)"
 [(set_attr "type" "comms")
  (set_attr "length" "2")])


I changed them to

; Scalar Put instruction.
(define_insn "commsPut"
 [(unspec [(match_operand:HI 0 "const_int_operand" "")
(match_operand:SI 1 "register_operand" "r")]
   UNSPEC_PUT)
   (use (reg:HI DUMMY_COMMN_REGNUM))
   (clobber (reg:HI DUMMY_COMMN_REGNUM))]
 ""
 "PUT %R1,%0\t// PORT(%0) := %R1"
 [(set_attr "type" "comms")
  (set_attr "length" "2")])

; Simple scalar get.
(define_insn "commsGet"
 [(set (match_operand:SI 0 "register_operand" "=r")
   (unspec:SI
[(match_operand:HI 1 "immediate_operand" "n")]
UNSPEC_GET))
   (use (reg:HI DUMMY_COMMN_REGNUM))
   (clobber (reg:HI DUMMY_COMMN_REGNUM))]
 ""
 "GET %1,%R0\t// %R0 := PORT(%1)"
 [(set_attr "type" "comms")
  (set_attr "length" "2")])


As for DUMMY_COMMN_REGNUM, I just defined it as a fixed register 
and bumped up FIRST_PSEUDO_REGISTER.


Actually, there is one more problem I faced (other than performance). 
The code generated using unspecs was just plain wrong. The unspec 
pattern that I was using for GET, which was inside a loop, was being 
hoisted out of the loop by the loop optimizer. I guess I should have 
seen this coming, since unspec is just a "machine-specific" operation, 
and the optimizer probably rightly assumes that multiple executions of 
it with the same parameters would produce the same value. This obviously 
is not the case for these communication instructions.


Do you have your code to do this using unspec in GCC mainline? Can you 
point me to that, please?


Thanks
Hari

Bingfeng Mei wrote:

How do you define your imaginary register in target.h? Can you post
one example of your instruction pattern? 


Bingfeng

  

-Original Message-
From: Hariharan Sandanagobalane [mailto:harihar...@picochip.com] 
Sent: 12 May 2010 16:40

To: Bingfeng Mei
Cc: gcc@gcc.gnu.org
Subject: Re: Machine description question

Thanks for your help, Bingfeng.

I gave this a go and ended up with worse code (and worse memory usage) 
than before. I started this experiment because of the compiler's 
"all virtual registers are assumed to be used and clobbered by 
unspec_volatile" rule. The get/put instructions read/write registers, 
and the virtual register assigned for them interferes with all the 
virtual registers in the function, so they were highly likely to be 
spilled to the stack. I wanted to avoid this by introducing unspecs 
and using imaginary registers.

But the virtual registers that are involved in unspec patterns with 
these imaginary registers still seem to be marked as interfering with all 
the virtual registers. Is that to be expected? Am I missing something 
obvious here?


Regards
Hari

Bingfeng Mei wrote:

Our architecture has a similar resource, and we use the first approach 
by creating an imaginary register and a dependency between these 
instructions, i.e., every such instruction reads and writes the special 
register to create an artificial dependency. You may need to add an 
(unspec:..) as an independent expression in your pattern to prevent 
some wrong optimizations.

Cheers,
Bingfeng

  
  

-Original Message-
From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On 
Behalf Of Hariharan

Sent: 12 May 2010 11:18
To: gcc@gcc.gnu.org
Subject: Machine description question

Hello all,
Picochip has communication instructions that allow one array element to 
pass data to another. There are 3 such instructions: PUT/GET/TSTPORT. 
Currently, all three of these use UNSPEC_VOLATILE side-effect 
expressions to make sure they don't get reordered. But I wonder if it 
is overkill to use UNSPEC_VOLATILE for this purpose and whether I 
should use UNSPEC instead. The only thing we care about here is 
that they don't

Re: Machine description question

2010-05-14 Thread Hariharan Sandanagobalane

Hi Bingfeng,
Changing my instruction patterns to be similar to the ones you sent does 
get past the correctness issue. Setting the imaginary register 
explicitly this way and adding those extra unspec patterns does seem to 
work. But performance-wise, it still doesn't give me anything. Did you 
decide to use these patterns (instead of the simpler unspec_volatile 
ones) for performance reasons? Does using these patterns gain you anything?


Cheers
Hari

Bingfeng Mei wrote:

Hari,

Here are some patterns similar to yours. 


(define_insn "putbx"
  [(set (reg:BXBC R_BX) (unspec:BXBC [(match_operand:QI 0 "firepath_register" "vr")] UNSPEC_BXM))
   (unspec:BXBC [(reg:BXBC R_BX)] UNSPEC_BX)]   <--- important to avoid some wrong optimization (maybe DCE, I couldn't remember clearly)

(define_insn "getbx"
  [(set (reg:BXBC R_BX) (unspec:BXBC [(reg:BXBC R_BX)] UNSPEC_BX))  <--- artificial dependency
   (set (match_operand:SI 0 "register_operand" "=r")
        (unspec:SI [(reg:BXBC R_BX)] UNSPEC_BXM))
   (unspec:BXBC [(reg:BXBC R_BX)] UNSPEC_BX)]  <--- important to avoid some optimization.

Our port is still private and not in mainline.


Cheers,
Bingfeng

  

-Original Message-
From: Hariharan Sandanagobalane [mailto:harihar...@picochip.com] 
Sent: 13 May 2010 10:17

To: Bingfeng Mei
Cc: gcc@gcc.gnu.org
Subject: Re: Machine description question

The patterns for PUT/GET were

; Scalar Put instruction.
(define_insn "commsPut"
  [(unspec_volatile [(match_operand:HI 0 "const_int_operand" "")
 (match_operand:SI 1 "register_operand" "r")]
UNSPEC_PUT)]
  ""
  "PUT %R1,%0\t// PORT(%0) := %R1"
  [(set_attr "type" "comms")
   (set_attr "length" "2")])


(define_insn "commsGet"
  [(set (match_operand:SI 0 "register_operand" "=r")
(unspec_volatile:SI
 [(match_operand:HI 1 "immediate_operand" "n")]
 UNSPEC_GET))]
  ""
  "GET %1,%R0\t// %R0 := PORT(%1)"
  [(set_attr "type" "comms")
   (set_attr "length" "2")])


I changed them to

; Scalar Put instruction.
(define_insn "commsPut"
  [(unspec [(match_operand:HI 0 "const_int_operand" "")
 (match_operand:SI 1 "register_operand" "r")]
UNSPEC_PUT)
(use (reg:HI DUMMY_COMMN_REGNUM))
(clobber (reg:HI DUMMY_COMMN_REGNUM))]
  ""
  "PUT %R1,%0\t// PORT(%0) := %R1"
  [(set_attr "type" "comms")
   (set_attr "length" "2")])

; Simple scalar get.
(define_insn "commsGet"
  [(set (match_operand:SI 0 "register_operand" "=r")
(unspec:SI
 [(match_operand:HI 1 "immediate_operand" "n")]
 UNSPEC_GET))
(use (reg:HI DUMMY_COMMN_REGNUM))
(clobber (reg:HI DUMMY_COMMN_REGNUM))]
  ""
  "GET %1,%R0\t// %R0 := PORT(%1)"
  [(set_attr "type" "comms")
   (set_attr "length" "2")])


As for the DUMMY_COMMN_REGNUM, I just defined this as a 
FIXED_REGISTER 
and bumped up FIRST_PSEUDO_REGISTER.


Actually, there is one more problem i faced (other than performance). 
The code generated using unspec's was just plain wrong. The unspec 
pattern that i was using for GET, which was inside a loop, was being 
hoisted out of the loop by the loop optimizer. I guess i should have 
seen this coming, since unspec is just "machine-specific" 
operation and 
the optimizer probably rightly assumes that multiple 
execution of this 
with same parameters would result in same value being produced. This 
obviously is not the case for these communication instructions.


Do you have your code to do this using unspec in gcc 
mainline? Can you 
point me to that, please?


Thanks
Hari

Bingfeng Mei wrote:


How do you define your imaginary register in target.h? Can you post
one example of your instruction pattern? 


Bingfeng

  
  

-Original Message-
From: Hariharan Sandanagobalane [mailto:harihar...@picochip.com] 
Sent: 12 May 2010 16:40

To: Bingfeng Mei
Cc: gcc@gcc.gnu.org
Subject: Re: Machine description question

Thanks for your help BingFeng.

I gave this a go and ended up with worse code (and worse 
memory usage) 
than before. I started with this experiment because of the 

compilers 

"All virtual registers are assumed to be used and clobbered by 
unspec_volatile" rule. The get/put instructions read/write to 
registers 
and the virtual register assigned for them interferes with all the 
virtual registers in the function. So, they were highly 

likely to be 

spilled and use stack instead.

Re: Machine description question

2010-05-14 Thread Hariharan Sandanagobalane
Ours is a VLIW processor too, but my focus was on register allocation. 
Unfortunately, the register used by the unspec instruction is still marked 
as interfering with all virtual registers and hence gets spilled. I was 
hoping the unspec version might do better there, but there is no change. 
So I end up with performance similar to the unspec_volatile version.


Thanks for your help

Cheers
Hari

Bingfeng Mei wrote:

Yes, we use this instead of unspec_volatile out of performance concerns.
Our target is a VLIW processor, so there are more opportunities to move
instructions around. Did you observe any instruction that should have been
moved but was not?


Cheers,
Bingfeng

  

-Original Message-
From: Hariharan Sandanagobalane [mailto:harihar...@picochip.com] 
Sent: 14 May 2010 12:26

To: Bingfeng Mei
Cc: gcc@gcc.gnu.org
Subject: Re: Machine description question

Hi Bingfeng,
Changing my instruction patterns to be similar to the ones you 
sent does 
get past the correctness issue. Setting the imaginary register 
explicitly this way and adding those extra unspec patterns 
does seem to 
work. But performance-wise, it still doesn't give me 
anything. Did you 
decide to use these patterns (instead of the simpler unspec_volatile 
ones) for performance reasons? Does using these patterns gain 
you anything?


Cheers
Hari

Bingfeng Mei wrote:


Hari,

Here are some patterns similar to yours. 


(define_insn "putbx"
  [(set (reg:BXBC R_BX) (unspec:BXBC [(match_operand:QI 0 "firepath_register" "vr")] UNSPEC_BXM))
   (unspec:BXBC [(reg:BXBC R_BX)] UNSPEC_BX)]   <--- important to avoid some wrong optimization (maybe DCE, I couldn't remember clearly)

(define_insn "getbx"
  [(set (reg:BXBC R_BX) (unspec:BXBC [(reg:BXBC R_BX)] UNSPEC_BX))  <--- artificial dependency
   (set (match_operand:SI 0 "register_operand" "=r")
        (unspec:SI [(reg:BXBC R_BX)] UNSPEC_BXM))
   (unspec:BXBC [(reg:BXBC R_BX)] UNSPEC_BX)]  <--- important to avoid some optimization.

Our port is still private and not in mainline.


Cheers,
Bingfeng

  
  

-Original Message-
From: Hariharan Sandanagobalane [mailto:harihar...@picochip.com] 
Sent: 13 May 2010 10:17

To: Bingfeng Mei
Cc: gcc@gcc.gnu.org
Subject: Re: Machine description question

The patterns for PUT/GET were

; Scalar Put instruction.
(define_insn "commsPut"
  [(unspec_volatile [(match_operand:HI 0 "const_int_operand" "")
 (match_operand:SI 1 "register_operand" "r")]
UNSPEC_PUT)]
  ""
  "PUT %R1,%0\t// PORT(%0) := %R1"
  [(set_attr "type" "comms")
   (set_attr "length" "2")])


(define_insn "commsGet"
  [(set (match_operand:SI 0 "register_operand" "=r")
(unspec_volatile:SI
 [(match_operand:HI 1 "immediate_operand" "n")]
 UNSPEC_GET))]
  ""
  "GET %1,%R0\t// %R0 := PORT(%1)"
  [(set_attr "type" "comms")
   (set_attr "length" "2")])


I changed them to

; Scalar Put instruction.
(define_insn "commsPut"
  [(unspec [(match_operand:HI 0 "const_int_operand" "")
 (match_operand:SI 1 "register_operand" "r")]
UNSPEC_PUT)
(use (reg:HI DUMMY_COMMN_REGNUM))
(clobber (reg:HI DUMMY_COMMN_REGNUM))]
  ""
  "PUT %R1,%0\t// PORT(%0) := %R1"
  [(set_attr "type" "comms")
   (set_attr "length" "2")])

; Simple scalar get.
(define_insn "commsGet"
  [(set (match_operand:SI 0 "register_operand" "=r")
(unspec:SI
 [(match_operand:HI 1 "immediate_operand" "n")]
 UNSPEC_GET))
(use (reg:HI DUMMY_COMMN_REGNUM))
(clobber (reg:HI DUMMY_COMMN_REGNUM))]
  ""
  "GET %1,%R0\t// %R0 := PORT(%1)"
  [(set_attr "type" "comms")
   (set_attr "length" "2")])


As for the DUMMY_COMMN_REGNUM, I just defined this as a 
FIXED_REGISTER 
and bumped up FIRST_PSEUDO_REGISTER.


Actually, there is one more problem i faced (other than 

performance). 

The code generated using unspec's was just plain wrong. The unspec 
pattern that i was using for GET, which was inside a loop, 

was being 

hoisted out of the loop by the loop optimizer. I guess i 

should have 

seen this coming, since unspec is just "machine-specific" 
operation and 
the optimizer probably rightly assumes that multiple 
execution of this 
with same parameters would result in same value being 

produced. This 


obviously is not the case for these communication instructions.

Do you 

Re: delay branch bug?

2010-05-24 Thread Hariharan Sandanagobalane



Jeff Law wrote:

On 05/24/10 05:46, Hariharan wrote:

Hello all,
I found something a little odd with delay slot scheduling. If I had 
the following bit of code (note that "get" builtin functions in 
picochip stand for port communication):


int mytest ()
{
 int a[5];
 int i;
 for (i = 0; i < 5; i++)
 {
   a[i] = (int) getctrlIn();
 }
 switch (a[3])
 {
   case 0:
   return 4;
   default:
   return 13;
 }
}

The relevant bit of assembly for this compiled at -Os is

_L2:
   GET 0,R[5:4]// R[5:4] := PORT(0)
_picoMark_LBE5=
_picoMark_LBE4=
   .loc 1 13 0
   STW R4,(R3)0// Mem((R3)0{byte}) := R4
   ADD.0 R3,2,R3   // R3 := R3 + 2 (HI)
   .loc 1 11 0
   SUB.0 R3,R2,r15 // CC := (R3!=R2)
   BNE _L2
   =-> LDW (FP)3,R5// R5 = Mem((FP)6{byte})
   .loc 1 22 0

=-> is the delay slot marker. Note that the LDW instruction has been 
moved into the delay slot. This corresponds to the load in the "switch 
(a[3])" statement above. The first 3 times around this loop, LDW would 
be loading uninitialised memory. The loaded value is ignored until we 
come out of the loop, so the code is functionally correct, but 
I am not sure the introduction of an uninitialised memory access by the 
compiler, when there was none in the source, is good.


I browsed around the delay branch code in reorg.c, but couldn't find 
anything that checks for this. Is this the intended behaviour? Can 
anyone familiar with delay branch code help?
It's not ideal, but there's no way for reorg to know that a particular 
memory location is uninitialized. As a result, trying to "fix" this 
problem would ultimately result in reorg not being allowed to fill 
delay slots with memory references except under very, very restrictive 
circumstances.


From a correctness standpoint, the uninitialized value will never be 
used, so it should cause no ill effects on your code.  The biggest 
effect would be that tools like valgrind & purify (if supported on your 
architecture) would report the uninitialized memory read.  [Which begs 
the question: how does purify handle this on sparc-solaris?]
The code compiled for picochip runs under a simulator. The simulator 
tracks uninitialised memory accesses and emits warnings, hence my 
question.  I would agree with you that turning off delay-slot filling of 
memory references for this sake doesn't make sense.


Thanks for your help.

Cheers
Hari



Jeff


Thanks
Hari



GCC vector extensions

2010-11-04 Thread Hariharan Sandanagobalane

Hello all,
Is it possible to use RTL vector patterns like vec_extractm, vec_setm 
from C code? It looks like C subscripting for vector variables was 
allowed at some point and then removed. So, can these RTL patterns only 
be used from languages other than C?


Of course, I can use these in target builtins, but I am trying to see if 
they can be used by language constructs themselves.


Cheers
Hari

PS: I raised a related question in 
http://gcc.gnu.org/ml/gcc-help/2010-11/msg00021.html.




Re: GCC vector extensions

2010-11-05 Thread Hariharan Sandanagobalane

Hi Ian,
Thanks for your help.

I switched to mainline and the vector extract works a treat. When I 
tried vector set, it was still generating suboptimal code. Is this bit 
of code still a work in progress?


Cheers
Hari

On 04/11/10 19:23, Ian Lance Taylor wrote:

Hariharan Sandanagobalane  writes:


Is it possible to use RTL vector patterns like vec_extractm, vec_setm
from C code? It looks like C subscripting for vector variables was
allowed at some point and then removed. So, can these RTL patterns
only be used from languages other than C?


They were just recently added and have not been removed.

Also answered on gcc-help.

Ian



Steering Committee

2012-08-17 Thread Hariharan Sandanagobalane
Dear SC members,
I used to maintain the picochip port of GCC, but I have not been
active on the picochip port over the last 8 months. This is unlikely
to change in the future, so I would like my name to be removed from
the maintainers list as picochip maintainer. I am still actively
working on GCC, so I would like to be added to the "Write after
approval" list.

Thanks
Hari


Stack parameter - pass by value - frame usage

2007-09-13 Thread Hariharan Sandanagobalane

Hello,
I looked at an inefficient code sequence for a simple program using 
GCC's picochip port (not yet submitted to mainline). Basically, a 
program like


long carray[10];
void fn (long c, int i)
{
 carray[i] = c;
}

produces good assembly code. But if I were to do

struct complex16
{
int re,im;
};

struct complex16 carray[10];

void fn (struct complex16 c, int i)
{
 carray[i] = c;
}

GCC generates poor code. It has an extra save and restore of the 
frame pointer, even though we don't use the frame.


I dug a bit further and found that the get_frame_size() call returns 
4 in this case, and hence the port's prologue-generation code emits 
the frame-pointer update.


It seems to me that each element of the struct is copied to the stack 
from the parameter registers, and that copy is then used in the 
function. I have the following RTL code as we get into RTL:


(insn 6 2 7 2 (set (reg:HI 26)
   (reg:HI 0 R0 [ c ])) -1 (nil)
   (nil))

(insn 7 6 10 2 (set (reg:HI 27)
   (reg:HI 1 R1 [ c+2 ])) -1 (nil)
   (nil))

(insn 10 7 8 2 (set (reg/v:HI 28 [ i ])
   (reg:HI 2 R2 [ i ])) -1 (nil)
   (nil))

(insn 8 10 9 2 (set (mem/s/c:HI (reg/f:HI 21 virtual-stack-vars) [3 c+0 
S2 A16])

   (reg:HI 26)) -1 (nil)
   (nil))

(insn 9 8 11 2 (set (mem/s/c:HI (plus:HI (reg/f:HI 21 virtual-stack-vars)
   (const_int 2 [0x2])) [3 c+2 S2 A16])
   (reg:HI 27)) -1 (nil)
   (nil))

Note that the parameter is being written to the frame in the last 2 
instructions above. This, I am guessing, is the reason for 
get_frame_size() returning 4 later on, though the actual save of the 
struct parameter value on the stack is eliminated at later 
optimization phases (CSE and DCE, I believe).


Why does the compiler do this? I vaguely remember x86 storing all 
parameter values on the stack. Is that the reason for this behaviour? Is 
there anything I can do in the port to get around this problem?


Note: In our port, "int" is 16 bits and "long" is 32 bits.

Thanks in advance,

Regards
Hari


Re: Stack parameter - pass by value - frame usage

2007-09-21 Thread Hariharan Sandanagobalane



Ian Lance Taylor wrote:

Hariharan Sandanagobalane <[EMAIL PROTECTED]> writes:


I looked at an inefficient code sequence for a simple program using
GCC's picochip port (not yet submitted to mainline).


Are you working with mainline sources?


I was not. I tried the same with the GCC 4.3 branch, and it does fix most 
of the problems. There are still corner cases where it produces suboptimal 
code. I will try to figure out what's wrong in those and get back to you.


Regards
Hari




Note that the parameter is being written to the frame in the last 2
instructions above. This, i am guessing is the reason for the
get_frame_size() returning 4 later on, though the actual save of the
struct parameter value on the stack is being eliminated at later
optimization phases (CSE and DCE, i believe).

Why does the compiler do this? I vaguely remember x86 storing all
parameter values on stack. Is that the reason for this behaviour? Is
there anything i can do in the port to get around this problem?


At a guess, it's because the frontend decided that the struct was
addressable and needed to be pushed on the stack.  I thought this got
cleaned up recently, though.

Ian




Profile information - CFG

2007-09-27 Thread Hariharan Sandanagobalane

Hello,
I am implementing support for PBO on the picochip port of GCC (not yet 
submitted to mainline).


I see that GCC generates 2 files, xx.gcno and xx.gcda, containing the 
profile information, the former containing the flow-graph 
information (compile time) and the latter containing the edge-profile 
information (run time). The CFG information seems to be emitted 
quite early in the compilation process (pass_tree_profile). Is the 
instrumentation also done at this time? If it is, as later phases change 
the CFG, how is the sanity of the instrumentation code maintained? If it 
isn't, how would you correlate the CFG in the gcno file to the actual CFG 
at execution (which produces the gcda file)?


As for our port's case, we are already able to generate profile 
information using our simulator/hardware, and it is not too difficult 
for me to format that information into .gcno and .gcda files. But I 
guess the CFG that I would have at runtime would be quite different from 
the CFG at the initial phases of compilation (even at the same 
optimization level). Any suggestions on this? Would I be better off 
keeping the gcno file that GCC generates, trying to match the runtime 
CFG to the one in the gcno file, and then writing the gcda file 
accordingly?


Has anyone tried inserting profile information from outside of the GCC 
instrumentation back into the compiler? Could you please let me know how 
you handled this?


In general, does anyone have any numbers on the performance improvements 
that PBO brings in for GCC?


Thanks in advance.

Regards
Hari


Re: Profile information - CFG

2007-10-05 Thread Hariharan Sandanagobalane



Seongbae Park wrote:

On 9/27/07, Hariharan Sandanagobalane <[EMAIL PROTECTED]> wrote:

Hello,
I am implementing support for PBO on picochip port of GCC (not yet
submitted to mainline).

I see that GCC generates 2 files, xx.gcno and xx.gcda, containing the
profile information, the former containing the flow graph
information(compile-time) and later containing the edge profile
information(run-time). The CFG information seems to be getting emitted
quite early in the compilation process(pass_tree_profile). Is the
instrumentation also done at this time? If it is, as later phases change


Yes.


CFG, how is the instrumentation code sanity maintained? If it isnt, How


Instrumentation code sanity is naturally maintained, since those are 
global loads/stores. The compiler transformations preserve the original 
semantics of the input, and since the profile counters are global 
variables, updates to them are preserved to match what unoptimized code 
would do.


would you correlate the CFG in gcno file to the actual CFG at
execution(that produces the gcda file)?



As for our port's case, we are already able to generate profile
information using our simulator/hardware, and it is not-too-difficult
for me to format that information into .gcno and .gcda files. But, i
guess the CFG that i would have at runtime would be quite different from
the CFG at initial phases of compilation (even at same optimization
level). Any suggestions on this? Would i be better off keeping the gcno
file that GCC generates, try to match the runtime-CFG to the one on the
gcno file and then write gcda file accordingly?


Not only better off, you *need* to provide information that matches
what's in the gcno, otherwise GCC can't read that gcda, let alone use it.
How you match gcno is a different problem
- there's no guarantee that you'll be able to recover
enough information from the output assembly of gcc,
because without instrumentation, gcc can optimize away the control flow.

pass_tree_profile is when both the instrumentation (with -fprofile-generate)
and reading of the profile data (with -fprofile-use) are done.
The CFG has to remain the same between generate and use
 - otherwise the compiler isn't able to use the profile data.


Thanks for your help, seongbae.

I have managed to get the profile information formatted the way a .gcda 
file would look. But does GCC expect the profile to be accurate? Would it 
accept profile data that came from sampling?


-Hari



Seongbae


VLIW scheduling and delayed branch

2007-12-08 Thread Hariharan Sandanagobalane

Hi,
I am trying to enable delayed-branch scheduling on our port of GCC for 
picochip (a 16-bit VLIW DSP). I understand that delayed-branch scheduling 
is run as a separate pass after the DFA scheduling is done. We basically 
depend on the TImode flag set on the cycle-start instructions to decide 
which instructions form a valid VLIW. With delayed-branch enabled, it 
seems like the delay-branch pass takes any instruction and puts it in the 
delay slot. It sometimes seems to pick the TImode-flagged instructions, 
but does not seem to set TImode on the next instruction.


Has anyone faced a similar problem before? Are there targets for which 
both VLIW and DBR are enabled? Perhaps ia64?


Thanks for your help.

Regards
Hari


Re: VLIW scheduling and delayed branch

2007-12-10 Thread Hariharan Sandanagobalane

Hi Thomas,
Thanks for your reply. A couple of questions below.

Thomas Sailer wrote:
Has anyone faced a similar problem before? Are there targets for which 
both VLIW and DBR are enabled? Perhaps ia64?


I did something similar a few months ago.


What was your target? Is the target code available in GCC mainline? If 
not, could you pass your code to me?




The problem is that haifa and the delayed branch scheduling passes don't
really fit together. Delayed branch scheduling happily undoes all the
haifa decisions.

The question is how much you gain by delayed branch scheduling. I don't
have numbers, but it wasn't much in my case. And since your company name
is picochip, you certainly value size more than speed?!


Yeah, we do. But in our architecture, a branch has to have a delay-slot 
instruction anyway; in the absence of one, we put a "nop" in there. If 
GCC manages to move a single-instruction VLIW into the delay slot, we 
would benefit in both size and speed; otherwise, there will just be no 
impact on either.




I pursued two approaches. The first one was to insert "stop bit" pseudo
insns into the RTL stream in machdep reorg, so I didn't have to rely on
TImode insn flags during output. But then delayed branch scheduling just
took one insn out of an insn group and put it into the delay slot,
meaning there was usually no cycle gain at all, just larger code size
(due to insn duplication).


This seems fairly straightforward to implement.



The second approach was having lots of parallel insns (using match
parallel and a custom predicate). machdep reorg then converts insn
bundles into a single parallel insn. Delayed branch scheduling then does
the right thing. This approach works fairly well for me, but there are a
few complications. My output code is pretty hackish, as I didn't want to
duplicate the code for outputting a single insn vs. outputting the same
insn as a component of a parallel insn group.


When do you un-parallel those instructions? And, how?

Regards
Hari



Tom



vliw scheduling - TImode bug?

2007-12-19 Thread Hariharan Sandanagobalane

Hello,
I see quite a few instances where I get the following RTL: a conditional 
branch, followed by a BASIC_BLOCK note, followed by a non-TImode 
instruction. Theoretically, I should be allowed to package the non-TImode 
instruction along with the conditional branch, but doing so seems to 
produce incorrect results. Am I supposed to treat the 
NOTE_INSN_BASIC_BLOCK as a cycle-breaker? Or is it a genuine bug in the 
way TImode flags are set on instructions?


(jump_insn:TI 144 225 17 2 
/home/hariharans5/gcc-4.2.2/gcc/testsuite/gcc.c-torture/execute/931004-8.c:15 
(parallel [

   (set (pc)
   (if_then_else (le:HI (reg:CC 17 pseudoCC)
   (const_int 0 [0x0]))
   (label_ref 109)
   (pc)))
   (use (const_int 77 [0x4d]))
   ]) 10 {*branch} (nil)
   (expr_list:REG_DEAD (reg:CC 17 pseudoCC)
   (expr_list:REG_BR_PROB (const_int 500 [0x1f4])
   (nil

(note 17 144 124 3 [bb 3] NOTE_INSN_BASIC_BLOCK)

(note 124 17 21 3 
("/home/hariharans5/gcc-4.2.2/gcc/testsuite/gcc.c-torture/execute/931004-8.c") 
17)


(insn 21 124 196 3 
/home/hariharans5/gcc-4.2.2/gcc/testsuite/gcc.c-torture/execute/931004-8.c:15 
(set (reg:HI 3 R3)

   (plus:HI (reg/f:HI 13 FP)
   (const_int 12 [0xc]))) 31 {*lea_move} (nil)
   (nil))

Thanks and regards
Hari