Bootstrap Failure in trunk (fortran)

2007-12-11 Thread Rainer Emrich

/SCRATCH/gcc-build/Linux/i686-pc-linux-gnu/gcc-4.3.0/gcc-4.3.0/./prev-gcc/xgcc
-B/SCRATCH/gcc-build/Linux/i686-pc-linux-gnu/gcc-4.3.0/gcc-4.3.0/./prev-gcc/
-B/opt/gcc/Linux/i686-pc-linux-gnu/gcc-4.3.0/i686-pc-linux-gnu/bin/ -c -g -O2
-fomit-frame-pointer -DIN_GCC -W -Wall -Wwrite-strings -Wstrict-prototypes
-Wmissing-prototypes -Wold-style-definition -Wmissing-format-attribute -pedantic
-Wno-long-long -Wno-variadic-macros
-Wno-overlength-strings -Werror -DHAVE_CONFIG_H -I. -Ifortran
-I/home/em/devel/projects/develtools/src/gcc-4.3.0/gcc
-I/home/em/devel/projects/develtools/src/gcc-4.3.0/gcc/fortran
-I/home/em/devel/projects/develtools/src/gcc-4.3.0/gcc/../include
-I/home/em/devel/projects/develtools/src/gcc-4.3.0/gcc/../libcpp/include
-I/SCRATCH/gcc-build/Linux/i686-pc-linux-gnu/install/include
-I/SCRATCH/gcc-build/Linux/i686-pc-linux-gnu/install/include
-I/home/em/devel/projects/develtools/src/gcc-4.3.0/gcc/../libdecnumber
-I/home/em/devel/projects/develtools/src/gcc-4.3.0/gcc/../libdecnumber/bid
-I../libdecnumber
/home/em/devel/projects/develtools/src/gcc-4.3.0/gcc/fortran/decl.c -o
fortran/decl.o
cc1: warnings being treated as errors
/home/em/devel/projects/develtools/src/gcc-4.3.0/gcc/fortran/decl.c: In function
‘add_global_entry’:
/home/em/devel/projects/develtools/src/gcc-4.3.0/gcc/fortran/decl.c:4344: error:
comparison between signed and unsigned
gmake[3]: *** [fortran/decl.o] Error 1
gmake[3]: Leaving directory
`/SCRATCH/gcc-build/Linux/i686-pc-linux-gnu/gcc-4.3.0/gcc-4.3.0/gcc'

caused by:

2007-12-11  Bernhard Fischer  <[EMAIL PROTECTED]>

* decl.c (match_prefix): Make seen_type a boolean.
(add_global_entry): Cache type distinction.
* trans-decl.c: Whitespace cleanup.


Rainer


Re: Something is broken in repack

2007-12-11 Thread Nicolas Pitre
On Tue, 11 Dec 2007, Jon Smirl wrote:

> I added the gcc people to the CC, it's their repository. Maybe they
> can help up sort this out.

Unless there is a Git expert amongst the gcc crowd, I somehow doubt it. 
And gcc people with an interest in Git internals are probably already on 
the Git mailing list.


Nicolas


Re: Something is broken in repack

2007-12-11 Thread Nicolas Pitre
On Tue, 11 Dec 2007, Jon Smirl wrote:

> Switching to the Google perftools malloc
> http://goog-perftools.sourceforge.net/
> 
> 10%   30  828M
> 20%   15  831M
> 30%   10  834M
> 40%   50  1014M
> 50%   80  1086M
> 60%   80  1500M
> 70% 200  1.53G
> 80% 200  1.85G
> 90% 260  1.87G
> 95% 520  1.97G
> 100% 1335 2.24G
> 
> Google allocator knocked 600MB off from memory use.
> Memory consumption did not fall during the write out phase like it did with 
> gcc.
> 
> Since all of this is with the same code except for changing the
> threading split, those runs where memory consumption went to 4.5GB
> with the gcc allocator must have triggered an extreme problem with
> fragmentation.

Did you mean the glibc allocator?

> Total CPU time 196 CPU minutes vs 190 for gcc. Google's claims of
> being faster are not true.
> 
> So why does our threaded code take 20 CPU minutes longer (12%) to run
> than the same code with a single thread? Clock time is obviously
> faster. Are the threads working too close to each other in memory and
> bouncing cache lines between the cores? Q6600 is just two E6600s in
> the same package, the caches are not shared.

Of course there'll always be a certain amount of wasted cycles when 
threaded.  The locking overhead, the extra contention for IO, etc.  So 
12% overhead (3% per thread) when using 4 threads is not that bad I 
would say.

> Why does the threaded code need 2.24GB (google allocator, 2.85GB gcc)
> with 4 threads? But only need 950MB with one thread? Where's the extra
> gigabyte going?

I really don't know.

Did you try with pack.deltacachesize set to 1 ?

And yet, this is still missing the actual issue.  The issue being that 
the 2.1GB pack as a _source_ doesn't cause as much memory to be 
allocated even if the _result_ pack ends up being the same.

I was able to repack the 2.1GB pack on my machine which has 1GB of ram. 
Now that it has been repacked, I can't repack it anymore, even when 
single threaded, as it starts crawling into swap fairly quickly.  It is 
really non-intuitive and actually senseless that Git would require twice 
as much RAM to deal with a pack that is 7 times smaller.


Nicolas (still puzzled)


Re: Something is broken in repack

2007-12-11 Thread Nicolas Pitre
On Tue, 11 Dec 2007, Nicolas Pitre wrote:

> And yet, this is still missing the actual issue.  The issue being that 
> the 2.1GB pack as a _source_ doesn't cause as much memory to be 
> allocated even if the _result_ pack ends up being the same.
> 
> I was able to repack the 2.1GB pack on my machine which has 1GB of ram. 
> Now that it has been repacked, I can't repack it anymore, even when 
> single threaded, as it starts crawling into swap fairly quickly.  It is 
> really non-intuitive and actually senseless that Git would require twice 
> as much RAM to deal with a pack that is 7 times smaller.

OK, here's something else for you to try:

core.deltabasecachelimit=0
pack.threads=2
pack.deltacachesize=1

With that I'm able to repack the small gcc pack on my machine with 1GB 
of ram using:

git repack -a -f -d --window=250 --depth=250

and top reports a ~700m virt and ~500m res without hitting swap at all.
It is only at 25% so far, but I was unable to get that far before.

Would be curious to know what you get with 4 threads on your machine.
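For reference, the knobs above are ordinary per-repository git config keys; applying them looks like this (a sketch; the repository path is hypothetical):

```shell
# Apply the settings suggested in the thread: no delta-base cache,
# 2 packing threads, minimal delta cache.
cd /path/to/gcc.git            # hypothetical repository path
git config core.deltabasecachelimit 0
git config pack.threads 2
git config pack.deltacachesize 1

# Then redo the full repack with the large window/depth from the thread:
git repack -a -f -d --window=250 --depth=250
```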


Nicolas


Re: Something is broken in repack

2007-12-11 Thread Jon Smirl
On 12/11/07, Nicolas Pitre <[EMAIL PROTECTED]> wrote:
> On Tue, 11 Dec 2007, Nicolas Pitre wrote:
>
> > And yet, this is still missing the actual issue.  The issue being that
> > the 2.1GB pack as a _source_ doesn't cause as much memory to be
> > allocated even if the _result_ pack ends up being the same.
> >
> > I was able to repack the 2.1GB pack on my machine which has 1GB of ram.
> > Now that it has been repacked, I can't repack it anymore, even when
> single threaded, as it starts crawling into swap fairly quickly.  It is
> really non-intuitive and actually senseless that Git would require twice
> > as much RAM to deal with a pack that is 7 times smaller.
>
> OK, here's something else for you to try:
>
> core.deltabasecachelimit=0
> pack.threads=2
> pack.deltacachesize=1
>
> With that I'm able to repack the small gcc pack on my machine with 1GB
> of ram using:
>
> git repack -a -f -d --window=250 --depth=250
>
> and top reports a ~700m virt and ~500m res without hitting swap at all.
> It is only at 25% so far, but I was unable to get that far before.
>
> Would be curious to know what you get with 4 threads on your machine.

Changing those parameters really slowed down counting the objects. I
used to be able to count in 45 seconds; now it took 130 seconds. I
still have the Google allocator linked in.

4 threads, cumulative clock time
25% 200 seconds, 820/627M
55% 510 seconds, 1240/1000M - little late recording
75% 15 minutes, 1658/1500M
90%  22 minutes, 1974/1800M
It's still running but there is no significant change.

Are two types of allocations being mixed?
1) long term, global objects kept until the end of everything
2) volatile, private objects allocated only while the object is being
compressed and then freed

Separating these would make a big difference to the fragmentation
problem. Single threading probably wouldn't see a fragmentation
problem from mixing the allocation types.

When a thread is created it could allocate a private 20MB (or
whatever) pool. The volatile, private objects would come from that
pool. Long-term objects would stay in the global pool. Since they are
long term they will just get laid down sequentially in memory.
Separating these allocation types makes things way easier for malloc.

CPU time would be helped by removing some of the locking if possible.

-- 
Jon Smirl
[EMAIL PROTECTED]


Re: Something is broken in repack

2007-12-11 Thread Nicolas Pitre
On Tue, 11 Dec 2007, Nicolas Pitre wrote:

> OK, here's something else for you to try:
> 
>   core.deltabasecachelimit=0
>   pack.threads=2
>   pack.deltacachesize=1
> 
> With that I'm able to repack the small gcc pack on my machine with 1GB 
> of ram using:
> 
>   git repack -a -f -d --window=250 --depth=250
> 
> and top reports a ~700m virt and ~500m res without hitting swap at all.
> It is only at 25% so far, but I was unable to get that far before.

Well, around 55%, memory usage skyrocketed to 1.6GB and the system went 
deep into swap.  So I restarted it with no threads.

Nicolas (even more puzzled)


Re: Something is broken in repack

2007-12-11 Thread Jon Smirl
On 12/11/07, Nicolas Pitre <[EMAIL PROTECTED]> wrote:
> On Tue, 11 Dec 2007, Nicolas Pitre wrote:
>
> > OK, here's something else for you to try:
> >
> >   core.deltabasecachelimit=0
> >   pack.threads=2
> >   pack.deltacachesize=1
> >
> > With that I'm able to repack the small gcc pack on my machine with 1GB
> > of ram using:
> >
> >   git repack -a -f -d --window=250 --depth=250
> >
> > and top reports a ~700m virt and ~500m res without hitting swap at all.
> > It is only at 25% so far, but I was unable to get that far before.
>
> Well, around 55%, memory usage skyrocketed to 1.6GB and the system went
> deep into swap.  So I restarted it with no threads.
>
> Nicolas (even more puzzled)

On the plus side you are seeing what I see, so it proves I am not imagining it.


-- 
Jon Smirl
[EMAIL PROTECTED]


Re: Something is broken in repack

2007-12-11 Thread Linus Torvalds


On Tue, 11 Dec 2007, Jon Smirl wrote:
> 
> So why does our threaded code take 20 CPU minutes longer (12%) to run
> than the same code with a single thread?

Threaded code *always* takes more CPU time. The only thing you can hope 
for is a wall-clock reduction. You're seeing probably a combination of 
 (a) more cache misses
 (b) bigger dataset active at a time
and a probably fairly minuscule
 (c) threading itself tends to have some overheads.

> Q6600 is just two E6600s in the same package, the caches are not shared.

Sure they are shared. They're just not *entirely* shared. But they are 
shared between each two cores, so each thread essentially has only half 
the cache they had with the non-threaded version.

Threading is *not* a magic solution to all problems. It gives you 
potentially twice the CPU power, but there are real downsides that you 
should keep in mind.

> Why does the threaded code need 2.24GB (google allocator, 2.85GB gcc)
> with 4 threads? But only need 950MB with one thread? Where's the extra
> gigabyte going?

I suspect that it's really simple: you have a few rather big files in the 
gcc history, with deep delta chains. And what happens when you have four 
threads running at the same time is that they all need to keep all those 
objects that they are working on - and their hash state - in memory at the 
same time!

So if you want to use more threads, that _forces_ you to have a bigger 
memory footprint, simply because you have more "live" objects that you 
work on. Normally, that isn't much of a problem, since most source files 
are small, but if you have a few deep delta chains on big files, both the 
delta chain itself is going to use memory (you may have limited the size 
of the cache, but it's still needed for the actual delta generation, so 
it's not like the memory usage went away).

That said, I suspect there are a few things fighting you:

 - threading is hard. I haven't looked a lot at the changes Nico did to do 
   a threaded object packer, but what I've seen does not convince me it is 
   correct. The "trg_entry" accesses are *mostly* protected with 
   "cache_lock", but nothing else really seems to be, so quite frankly, I 
   wouldn't trust the threaded version very much. It's off by default, and 
   for a good reason, I think.

   For example: the packing code does this:

if (!src->data) {
read_lock();
src->data = read_sha1_file(src_entry->idx.sha1, &type, &sz);
read_unlock();
...

   and that's racy. If two threads come in at roughly the same time and 
   see a NULL src->data, they'll both get the lock, and they'll both 
   (serially) try to fill it in. It will all *work*, but one of them will 
   have done unnecessary work, and one of them will have their result 
   thrown away and leaked.

   Are you hitting issues like this? I dunno. The object sorting means 
   that different threads normally shouldn't look at the same objects (not 
   even the sources), so probably not, but basically, I wouldn't trust the 
   threading 100%. It needs work, and it needs to stay off by default.

 - you're working on a problem that isn't really even worth optimizing 
   that much. The *normal* case is to re-use old deltas, which makes all 
   of the issues you are fighting basically go away (because you only have 
   a few _incremental_ objects that need deltaing). 

   In other words: the _real_ optimizations have already been done, and 
   are done elsewhere, and are much smarter (the best way to optimize X is 
   not to make X run fast, but to avoid doing X in the first place!). The 
   thing you are trying to work with is the one-time-only case where you 
   explicitly disable that big and important optimization, and then you 
   complain about the end result being slow!

   It's like saying that you're compiling with extreme debugging and no
   optimizations, and then complaining that the end result doesn't run as 
   fast as if you used -O2. Except this is a hundred times worse, because 
   you literally asked git to do the really expensive thing that it really 
   really doesn't want to do ;)

> Is there another allocator to try? One that combines Google's
> efficiency with gcc's speed?

See above: I'd look around at threading-related bugs and check the way we 
lock (or don't) accesses.

Linus


error: no data type for mode ".."

2007-12-11 Thread Bingfeng Mei
Hello,

I tried to define a new machine mode for a data type only allocated to
certain registers, e.g., MAC registers.  I first used an unused PDI mode
(same as the Blackfin port).

In the target-modes.def file:
PARTIAL_INT_MODE (DI);

Then in my test program, I tried to define a new data type using PDI
mode.

typedef int __attribute__ ((mode (PDI)))  MREG;

But GCC reports an error on this typedef statement:

tst.c:13: error: no data type for mode 'PDI'


How can I use the newly defined mode to specify a new data type? I
cannot find any example for Blackfin, where PDI mode is used to
represent a 40-bit MAC register similarly.


I also tried to define a new INT_MODE by:
INT_MODE(PDI, 8).

The error message is the same.

Any hint?  Thanks in advance. 


Cheers,
Bingfeng Mei

Broadcom UK



Re: Something is broken in repack

2007-12-11 Thread Nicolas Pitre
On Tue, 11 Dec 2007, Linus Torvalds wrote:

> That said, I suspect there are a few things fighting you:
> 
>  - threading is hard. I haven't looked a lot at the changes Nico did to do 
>a threaded object packer, but what I've seen does not convince me it is 
>correct. The "trg_entry" accesses are *mostly* protected with 
>"cache_lock", but nothing else really seems to be, so quite frankly, I 
>wouldn't trust the threaded version very much. It's off by default, and 
>for a good reason, I think.

I beg to differ (of course, since I always know precisely what I do, and 
like you, my code never has bugs).

Seriously though, the trg_entry does not have to be protected at all.  Why? 
Simply because each thread has its own exclusive set of objects which no 
other thread ever messes with.  They never overlap.

>For example: the packing code does this:
> 
>   if (!src->data) {
>   read_lock();
>   src->data = read_sha1_file(src_entry->idx.sha1, &type, &sz);
>   read_unlock();
>   ...
> 
>and that's racy. If two threads come in at roughly the same time and 
>    see a NULL src->data, they'll both get the lock, and they'll both 
>(serially) try to fill it in. It will all *work*, but one of them will 
>have done unnecessary work, and one of them will have their result 
>thrown away and leaked.

No.  Once again, it is impossible for two threads to ever see the same 
src->data at all.  The lock is there simply because read_sha1_file() is 
not reentrant.

>Are you hitting issues like this? I dunno. The object sorting means 
>that different threads normally shouldn't look at the same objects (not 
>even the sources), so probably not, but basically, I wouldn't trust the 
>threading 100%. It needs work, and it needs to stay off by default.

For now it is, but I wouldn't say it really needs significant work at 
this point.  The latest thread patches were more about tuning than 
correctness.

What the threading could be doing, though, is uncovering some other 
bugs, like in the pack mmap windowing code for example.  Although that 
code is serialized by the read lock above, the fact that multiple 
threads are hammering on it in turns means that the mmap window is 
possibly seeking back and forth much more often than otherwise, possibly 
leaking something in the process.

>  - you're working on a problem that isn't really even worth optimizing 
>that much. The *normal* case is to re-use old deltas, which makes all 
>of the issues you are fighting basically go away (because you only have 
>a few _incremental_ objects that need deltaing). 
> 
>In other words: the _real_ optimizations have already been done, and 
>are done elsewhere, and are much smarter (the best way to optimize X is 
>not to make X run fast, but to avoid doing X in the first place!). The 
>thing you are trying to work with is the one-time-only case where you 
>explicitly disable that big and important optimization, and then you 
>complain about the end result being slow!
> 
>It's like saying that you're compiling with extreme debugging and no
>optimizations, and then complaining that the end result doesn't run as 
>fast as if you used -O2. Except this is a hundred times worse, because 
>you literally asked git to do the really expensive thing that it really 
>really doesn't want to do ;)

Linus, please pay attention to the _actual_ important issue here.

Sure I've been tuning the threading code in parallel to the attempt to 
debug this memory usage issue.

BUT.  The point is that repacking the gcc repo using "git repack -a -f 
--window=250" has a radically different memory usage profile whether you 
do the repack on the earlier 2.1GB pack or the later 300MB pack.  
_That_ is the issue.  Ironically, it is the 300MB pack that causes the 
repack to blow memory usage out of proportion.

And in both cases, the threading code has to do the same work whether
the original pack was densely packed or not, since -f throws away
every existing delta anyway.

So something is fishy elsewhere than in the packing code.


Nicolas


Re: Something is broken in repack

2007-12-11 Thread David Miller
From: Nicolas Pitre <[EMAIL PROTECTED]>
Date: Tue, 11 Dec 2007 12:21:11 -0500 (EST)

> BUT.  The point is that repacking the gcc repo using "git repack -a -f 
> --window=250" has a radically different memory usage profile whether you 
> do the repack on the earlier 2.1GB pack or the later 300MB pack.  

If you repack on the smaller pack file, git has to expand more stuff
internally in order to search the deltas, whereas with the larger pack
file I bet git has to undelta'ify less often to get base object blobs
for delta search.

In fact that behavior makes perfect sense to me and I don't understand
GIT internals very well :-)


Re: Something is broken in repack

2007-12-11 Thread Daniel Berlin
On 12/11/07, Jon Smirl <[EMAIL PROTECTED]> wrote:
>
> Total CPU time 196 CPU minutes vs 190 for gcc. Google's claims of
> being faster are not true.

Depends on your allocation patterns. For our apps, it certainly is :)
Of course, i don't know if we've updated the external allocator in a
while, i'll bug the people in charge of it.


Re: Something is broken in repack

2007-12-11 Thread Nicolas Pitre
On Tue, 11 Dec 2007, David Miller wrote:

> From: Nicolas Pitre <[EMAIL PROTECTED]>
> Date: Tue, 11 Dec 2007 12:21:11 -0500 (EST)
> 
> > BUT.  The point is that repacking the gcc repo using "git repack -a -f 
> > --window=250" has a radically different memory usage profile whether you 
> > do the repack on the earlier 2.1GB pack or the later 300MB pack.  
> 
> If you repack on the smaller pack file, git has to expand more stuff
> internally in order to search the deltas, whereas with the larger pack
> file I bet git has to undelta'ify less often to get base object blobs
> for delta search.

Of course.  I came to that conclusion two days ago.  And despite being 
pretty familiar with the involved code (I wrote part of it myself) I 
just can't spot anything wrong with it so far.

But somehow the threading code keeps distracting people from that
issue, since it has to do the same work whether the source pack is
densely packed or not.

Nicolas 
(who wishes he had access to a much faster machine to investigate this issue)


Re: Something is broken in repack

2007-12-11 Thread Jon Smirl
On 12/11/07, Linus Torvalds <[EMAIL PROTECTED]> wrote:
>
>
> On Tue, 11 Dec 2007, Jon Smirl wrote:
> >
> > So why does our threaded code take 20 CPU minutes longer (12%) to run
> > than the same code with a single thread?
>
> Threaded code *always* takes more CPU time. The only thing you can hope
> for is a wall-clock reduction. You're seeing probably a combination of
>  (a) more cache misses
>  (b) bigger dataset active at a time
> and a probably fairly minuscule
>  (c) threading itself tends to have some overheads.
>
> > Q6600 is just two E6600s in the same package, the caches are not shared.
>
> Sure they are shared. They're just not *entirely* shared. But they are
> shared between each two cores, so each thread essentially has only half
> the cache they had with the non-threaded version.
>
> Threading is *not* a magic solution to all problems. It gives you
> potentially twice the CPU power, but there are real downsides that you
> should keep in mind.
>
> > Why does the threaded code need 2.24GB (google allocator, 2.85GB gcc)
> > with 4 threads? But only need 950MB with one thread? Where's the extra
> > gigabyte going?
>
> I suspect that it's really simple: you have a few rather big files in the
> gcc history, with deep delta chains. And what happens when you have four
> threads running at the same time is that they all need to keep all those
> objects that they are working on - and their hash state - in memory at the
> same time!
>
> So if you want to use more threads, that _forces_ you to have a bigger
> memory footprint, simply because you have more "live" objects that you
> work on. Normally, that isn't much of a problem, since most source files
> are small, but if you have a few deep delta chains on big files, both the
> delta chain itself is going to use memory (you may have limited the size
> of the cache, but it's still needed for the actual delta generation, so
> it's not like the memory usage went away).

This makes sense. Those runs that blew up to 4.5GB were a combination
of this effect and fragmentation in the gcc allocator. Google
allocator appears to be much better at controlling fragmentation.

Is there a reasonable scheme to force the chains to only be loaded
once and then shared between worker threads? The memory blow up
appears to be directly correlated with chain length.

>
> That said, I suspect there are a few things fighting you:
>
>  - threading is hard. I haven't looked a lot at the changes Nico did to do
>a threaded object packer, but what I've seen does not convince me it is
>correct. The "trg_entry" accesses are *mostly* protected with
>"cache_lock", but nothing else really seems to be, so quite frankly, I
>wouldn't trust the threaded version very much. It's off by default, and
>for a good reason, I think.
>
>For example: the packing code does this:
>
> if (!src->data) {
> read_lock();
> src->data = read_sha1_file(src_entry->idx.sha1, &type, &sz);
> read_unlock();
> ...
>
>and that's racy. If two threads come in at roughly the same time and
>see a NULL src->data, they'll both get the lock, and they'll both
>(serially) try to fill it in. It will all *work*, but one of them will
>have done unnecessary work, and one of them will have their result
>thrown away and leaked.

That may account for the threaded version needing an extra 20 minutes
CPU time.  An extra 12% of CPU seems like too much overhead for
threading. Just letting a couple of those long chain compressions be
done twice would explain it.

>
>Are you hitting issues like this? I dunno. The object sorting means
>that different threads normally shouldn't look at the same objects (not
>even the sources), so probably not, but basically, I wouldn't trust the
>threading 100%. It needs work, and it needs to stay off by default.
>
>  - you're working on a problem that isn't really even worth optimizing
>that much. The *normal* case is to re-use old deltas, which makes all
>of the issues you are fighting basically go away (because you only have
>a few _incremental_ objects that need deltaing).

I agree, this problem only occurs when people import giant
repositories. But every time someone hits these problems they declare
git to be screwed up and proceed to trash it in their blogs.

>In other words: the _real_ optimizations have already been done, and
>are done elsewhere, and are much smarter (the best way to optimize X is
>not to make X run fast, but to avoid doing X in the first place!). The
>thing you are trying to work with is the one-time-only case where you
>explicitly disable that big and important optimization, and then you
>complain about the end result being slow!
>
>It's like saying that you're compiling with extreme debugging and no
>optimizations, and then complaining that the end result doesn't run as
>fast as if you used -O2. Except this is a hundred times worse, because
>you literally asked git to do the really expensive thing that it really
>really doesn't want to do ;)

Re: Something is broken in repack

2007-12-11 Thread Nicolas Pitre
On Tue, 11 Dec 2007, Jon Smirl wrote:

> This makes sense. Those runs that blew up to 4.5GB were a combination
> of this effect and fragmentation in the gcc allocator.

I disagree.  This is insane.

> Google allocator appears to be much better at controlling fragmentation.

Indeed.  And if fragmentation is indeed wasting half of Git's memory 
usage then we'll have to come up with a custom memory allocator.

> Is there a reasonable scheme to force the chains to only be loaded
> once and then shared between worker threads? The memory blow up
> appears to be directly correlated with chain length.

No.  That would be the equivalent of holding each revision of all files 
uncompressed all at once in memory.

> > That said, I suspect there are a few things fighting you:
> >
> >  - threading is hard. I haven't looked a lot at the changes Nico did to do
> >a threaded object packer, but what I've seen does not convince me it is
> >correct. The "trg_entry" accesses are *mostly* protected with
> >"cache_lock", but nothing else really seems to be, so quite frankly, I
> >wouldn't trust the threaded version very much. It's off by default, and
> >for a good reason, I think.
> >
> >For example: the packing code does this:
> >
> > if (!src->data) {
> > read_lock();
> > src->data = read_sha1_file(src_entry->idx.sha1, &type, &sz);
> > read_unlock();
> > ...
> >
> >and that's racy. If two threads come in at roughly the same time and
> >see a NULL src->data, they'll both get the lock, and they'll both
> >(serially) try to fill it in. It will all *work*, but one of them will
> >have done unnecessary work, and one of them will have their result
> >thrown away and leaked.
> 
> That may account for the threaded version needing an extra 20 minutes
> CPU time.  An extra 12% of CPU seems like too much overhead for
> threading. Just letting a couple of those long chain compressions be
> done twice would explain it.

No it may not.  This theory is wrong as explained before.

> >
> >Are you hitting issues like this? I dunno. The object sorting means
> >that different threads normally shouldn't look at the same objects (not
> >even the sources), so probably not, but basically, I wouldn't trust the
> >threading 100%. It needs work, and it needs to stay off by default.
> >
> >  - you're working on a problem that isn't really even worth optimizing
> >that much. The *normal* case is to re-use old deltas, which makes all
> >of the issues you are fighting basically go away (because you only have
> >a few _incremental_ objects that need deltaing).
> 
> I agree, this problem only occurs when people import giant
> repositories. But every time someone hits these problems they declare
> git to be screwed up and proceed to trash it in their blogs.

It's not only for repack.  Someone just reported git-blame being 
unusable too due to insane memory usage, which I suspect is due to the 
same issue.


Nicolas


Re: Something is broken in repack

2007-12-11 Thread Linus Torvalds


On Tue, 11 Dec 2007, Jon Smirl wrote:
> >
> > So if you want to use more threads, that _forces_ you to have a bigger
> > memory footprint, simply because you have more "live" objects that you
> > work on. Normally, that isn't much of a problem, since most source files
> > are small, but if you have a few deep delta chains on big files, both the
> > delta chain itself is going to use memory (you may have limited the size
> > of the cache, but it's still needed for the actual delta generation, so
> > it's not like the memory usage went away).
> 
> This makes sense. Those runs that blew up to 4.5GB were a combination
> of this effect and fragmentation in the gcc allocator. Google
> allocator appears to be much better at controlling fragmentation.

Yes. I think we do have some case where we simply keep a lot of objects 
around, and if we are talking reasonably large deltas, we'll have the 
whole delta-chain in memory just to unpack one single object.

The delta cache size limits kick in only when we explicitly cache old 
delta results (in case they will be re-used, which is rather common); they 
don't affect the normal "I'm using this data right now" case at all.

And then fragmentation makes it much much worse. Since the allocation 
patterns aren't nice (they are pretty random and depend on just the sizes 
of the objects), and the lifetimes aren't always nicely nested _either_ 
(they become more so when you disable the cache entirely, but that's just 
death for performance), I'm not surprised that there can be memory 
allocators that end up having some issues.

> Is there a reasonable scheme to force the chains to only be loaded
> once and then shared between worker threads? The memory blow up
> appears to be directly correlated with chain length.

The worker threads explicitly avoid touching the same objects, and no, you 
definitely don't want to explode the chains globally once, because the 
whole point is that we do fit 15 years worth of history into 300MB of 
pack-file thanks to having a very dense representation. The "loaded once" 
part is the mmap'ing of the pack-file into memory, but if you were to 
actually then try to expand the chains, you'd be talking about many *many* 
more gigabytes of memory than you already see used ;)

So what you actually want to do is to just re-use already packed delta 
chains directly, which is what we normally do. But you are explicitly 
looking at the "--no-reuse-delta" (aka "git repack -f") case, which is why 
it then blows up.
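The distinction Linus draws here, reusing existing deltas versus recomputing everything, is exactly the difference between these two invocations (a sketch; run inside any git repository):

```shell
# Normal case: existing deltas in the pack are reused verbatim, so only
# new, incremental objects need delta search. Cheap.
git repack -a -d

# The case in this thread: -f (--no-reuse-delta) throws away every
# existing delta and recomputes all of them within the given window.
# Deliberately expensive; meant as a one-time "overnight" repack.
git repack -a -f -d --window=250 --depth=250
```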

I'm sure we can find places to improve. But I would like to re-iterate the 
statement that you're kind of doing a "don't do that then" case which is 
really - by design - meant to be done once and never again, and is using 
resources - again, pretty much by design - wildly inappropriately just to 
get an initial packing done.

> That may account for the threaded version needing an extra 20 minutes
> CPU time.  An extra 12% of CPU seems like too much overhead for
> threading. Just letting a couple of those long chain compressions be
> done twice

Well, Nico pointed out that those things should all be thread-private 
data, so no, the race isn't there (unless there's some other bug there).

> I agree, this problem only occurs when people import giant
> repositories. But every time someone hits these problems they declare
> git to be screwed up and proceed to thrash it in their blogs.

Sure. I'd love to do global packing without paying the cost, but it really 
was a design decision. Thanks to doing off-line packing ("let it run 
overnight on some beefy machine") we can get better results. It's 
expensive, yes. But it was pretty much meant to be expensive. It's a very 
efficient compression algorithm, after all, and you're turning it up to 
eleven ;)

I also suspect that the gcc archive makes things more interesting thanks 
to having some rather large files. The ChangeLog is probably the worst 
case (large file with *lots* of edits), but I suspect the *.po files 
aren't wonderful either.

Linus


Re: Something is broken in repack

2007-12-11 Thread Junio C Hamano
Linus Torvalds <[EMAIL PROTECTED]> writes:

> On Tue, 11 Dec 2007, Jon Smirl wrote:
>> >
>> > So if you want to use more threads, that _forces_ you to have a bigger
>> > memory footprint, simply because you have more "live" objects that you
>> > work on. Normally, that isn't much of a problem, since most source files
>> > are small, but if you have a few deep delta chains on big files, both the
>> > delta chain itself is going to use memory (you may have limited the size
>> > of the cache, but it's still needed for the actual delta generation, so
>> > it's not like the memory usage went away).
>> 
>> This makes sense. Those runs that blew up to 4.5GB were a combination
>> of this effect and fragmentation in the gcc allocator. Google
>> allocator appears to be much better at controlling fragmentation.
>
> Yes. I think we do have some case where we simply keep a lot of objects 
> around, and if we are talking reasonably large deltas, we'll have the 
> whole delta-chain in memory just to unpack one single object.

Eh, excuse me.  unpack_delta_entry()

 - first unpacks the base object (this goes recursive);
 - uncompresses the delta;
 - applies the delta to the base to obtain the target object;
 - frees delta;
 - frees (but allows it to be cached) the base object;
 - returns the result

So no matter how deep a chain is, you keep only one delta at a time in
core, not the whole delta-chain.

> So what you actually want to do is to just re-use already packed delta 
> chains directly, which is what we normally do. But you are explicitly 
> looking at the "--no-reuse-delta" (aka "git repack -f") case, which is why 
> it then blows up.

While that does not explain, as Nico pointed out, the huge difference
between the two repack runs that have different starting pack, I would
say it is a fair thing to say.  If you have a suboptimal pack (i.e. not
enough reusable deltas, as in the 2.1GB pack case), do run "repack -f",
but if you have a good pack (i.e. 300MB pack), don't.


Re: Something is broken in repack

2007-12-11 Thread Andreas Ericsson

Nicolas Pitre wrote:

On Tue, 11 Dec 2007, David Miller wrote:


From: Nicolas Pitre <[EMAIL PROTECTED]>
Date: Tue, 11 Dec 2007 12:21:11 -0500 (EST)

BUT.  The point is that repacking the gcc repo using "git repack -a -f 
--window=250" has a radically different memory usage profile depending 
on whether you do the repack on the earlier 2.1GB pack or the later 
300MB pack.

If you repack on the smaller pack file, git has to expand more stuff
internally in order to search the deltas, whereas with the larger pack
file I bet git has to undelta'ify less often to get base object blobs
for delta search.


Of course.  I came to that conclusion two days ago.  And despite being 
pretty familiar with the involved code (I wrote part of it myself) I 
just can't spot anything wrong with it so far.


But somehow the threading code keeps distracting people from that issue, 
since it gets to do the same work whether or not the source pack is 
densely packed.


Nicolas 
(who wishes he had access to a much faster machine to investigate this issue)


If it's still an issue next week, we'll have a 16-core (8 dual-core CPUs)
machine with some 32GB of RAM that'll be free for about two days.
You'll have to remind me about it though, as I've got a lot on my mind
these days.

--
Andreas Ericsson   [EMAIL PROTECTED]
OP5 AB www.op5.se
Tel: +46 8-230225  Fax: +46 8-230231


Re: Something is broken in repack

2007-12-11 Thread Andreas Ericsson

Junio C Hamano wrote:

Linus Torvalds <[EMAIL PROTECTED]> writes:

So what you actually want to do is to just re-use already packed delta 
chains directly, which is what we normally do. But you are explicitly 
looking at the "--no-reuse-delta" (aka "git repack -f") case, which is why 
it then blows up.


While that does not explain, as Nico pointed out, the huge difference
between the two repack runs that have different starting pack, I would
say it is a fair thing to say.  If you have a suboptimal pack (i.e. not
enough reusable deltas, as in the 2.1GB pack case), do run "repack -f",
but if you have a good pack (i.e. 300MB pack), don't.



I think this is too much of a mystery for a lot of people to let it go.
Even I started looking into it, and I've got so little spare time just
now that I wouldn't stand much of a chance of making a contribution
even if I had written the code originally.

That being said, the fact that some git repositories really *can't*
be repacked on some machines (because repacking eats ALL virtual memory)
is really something that lowers git's reputation among huge projects.

--
Andreas Ericsson   [EMAIL PROTECTED]
OP5 AB www.op5.se
Tel: +46 8-230225  Fax: +46 8-230231


RE: Help with another constraint

2007-12-11 Thread Balaji V. Iyer
Hello Everyone,
I got past that negdi2 error and some others. Now I am trying to compile
some Linux module, and it fails because an insn does not satisfy its
constraints:

init/main.c: In function 'start_kernel':
init/main.c:441: error: insn does not satisfy its constraints:
(insn 112 110 478 12 (set (mem:QI (reg/v/f:SI 16 r16 [orig:72 line.183 ]
[72]) [0 S1 A8])
(const_int 0 [0x0])) 16 {movqi} (nil)
(nil))
init/main.c:441: internal compiler error: in
reload_cse_simplify_operands, at postreload.c:391
Please submit a full bug report,

Here is what I have for movqi:

(define_insn "movqi"
  [(set (match_operand:QI 0 "nonimmediate_operand" "=p,q,m,m,p,q,p,q")
(match_operand:QI 1 "general_operand"   "m,m,p,q,p,q,I,I"))]
  ""
  "*
  switch(which_alternative)
   {
 case 0:
 case 1:
   return \"l.lbz   \\t%0,%1\";
 case 2:
 case 3:
   return \"l.sb\\t%0,%1\";
 case 4:
 case 5:
   return \"l.ori   \\t%0,%1,0\\t # move reg to reg\";
 case 6:
 case 7:
   return \"l.addi  \\t%0,r0,%1\\t # move immediate\";
 default:
   return \"invalid alternative\";
   }
  "

To give a quick explanation: 
p = register numbers between 0-31 (inclusive)
q = register numbers between 32-63 (inclusive)

I = constant int value: ((VALUE) >=-32768 && (VALUE) <=32767)

So, what am I missing?

Any help is highly appreciated!
 

Thanking You,

Yours Sincerely,

Balaji V. Iyer.


-- 
 
Balaji V. Iyer
PhD Student, 
Center for Efficient, Scalable and Reliable Computing,
Department of Electrical and Computer Engineering,
North Carolina State University.


-Original Message-
From: 'Rask Ingemann Lambertsen' [mailto:[EMAIL PROTECTED] 
Sent: Monday, December 10, 2007 12:16 PM
To: Balaji V. Iyer
Cc: gcc@gcc.gnu.org; [EMAIL PROTECTED]
Subject: Re: Help with another constraint

On Sun, Dec 09, 2007 at 11:35:32AM -0500, Balaji V. Iyer wrote:
> Hello Rask,
>   I am not understanding your response, can you clarify it for me?
> 
> As per the question  about the error message above?
> 
> ../../gcc-4.0.2/gcc/libgcc2.c -o libgcc/./_negdi2.o
> ../../gcc-4.0.2/gcc/libgcc2.c: In function '__negdi2':
> ../../gcc-4.0.2/gcc/libgcc2.c:72: error: insn does not satisfy its
> constraints:

   I think this is misleading you. It seems likely that the problem is
with the predicate and not the constraint.

> (insn 15 13 16 (set (mem:SI (plus:SI (reg/f:SI 2 r2)
   ^^^

   This has to be a register, doesn't it? If so, use -fdump-rtl-all and
look at the dump files to see where it goes wrong.

> (const_int -28 [0xffffffe4])) [0 D.1256+0 S4 A32])
> (neg:SI (reg:SI 3 r3 [orig:80 D.1255 ] [80]))) 38 {negsi2}
(nil)
> (nil))

   Please also post your negsi2 pattern.

--
Rask Ingemann Lambertsen
Danish law requires addresses in e-mail to be logged and stored for a
year



Re: Something is broken in repack

2007-12-11 Thread Nicolas Pitre
On Tue, 11 Dec 2007, Jon Smirl wrote:

> On 12/11/07, Nicolas Pitre <[EMAIL PROTECTED]> wrote:
> > On Tue, 11 Dec 2007, Nicolas Pitre wrote:
> >
> > > OK, here's something else for you to try:
> > >
> > >   core.deltabasecachelimit=0
> > >   pack.threads=2
> > >   pack.deltacachesize=1
> > >
> > > With that I'm able to repack the small gcc pack on my machine with 1GB
> > > of ram using:
> > >
> > >   git repack -a -f -d --window=250 --depth=250
> > >
> > > and top reports a ~700m virt and ~500m res without hitting swap at all.
> > > It is only at 25% so far, but I was unable to get that far before.
> >
> > Well, around 55% memory usage skyrocketed to 1.6GB and the system went
> > deep into swap.  So I restarted it with no threads.
> >
> > Nicolas (even more puzzled)
> 
> On the plus side you are seeing what I see, so it proves I am not imagining 
> it.

Well... This is weird.

It seems that memory fragmentation is really really killing us here.  
The fact that the Google allocator did manage to waste quite less memory 
is a good indicator already.

I did modify the progress display to show accounted memory that was 
allocated vs memory that was freed but still not released to the system.  
At least that gives you an idea of memory allocation and fragmentation 
with glibc in real time:

diff --git a/progress.c b/progress.c
index d19f80c..46ac9ef 100644
--- a/progress.c
+++ b/progress.c
@@ -8,6 +8,7 @@
  * published by the Free Software Foundation.
  */
 
+#include <malloc.h>
 #include "git-compat-util.h"
 #include "progress.h"
 
@@ -94,10 +95,12 @@ static int display(struct progress *progress, unsigned n, 
const char *done)
if (progress->total) {
unsigned percent = n * 100 / progress->total;
if (percent != progress->last_percent || progress_update) {
+   struct mallinfo m = mallinfo();
progress->last_percent = percent;
-   fprintf(stderr, "%s: %3u%% (%u/%u)%s%s",
-   progress->title, percent, n,
-   progress->total, tp, eol);
+   fprintf(stderr, "%s: %3u%% (%u/%u) %u/%uMB%s%s",
+   progress->title, percent, n, progress->total,
+   m.uordblks >> 18, m.fordblks >> 18,
+   tp, eol);
fflush(stderr);
progress_update = 0;
return 1;

This shows that at some point the repack goes into a big memory surge.  
I don't have enough RAM to see how fragmented memory gets though, since 
it starts swapping around 50% done with 2 threads.

With only 1 thread, memory usage grows significantly at around 11% with 
a pretty noticeable slowdown in the progress rate.

So I think the theory goes like this:

There is a block of big objects together in the list somewhere.  
Initially, all those big objects are assigned to thread #1 out of 4.  
Because those objects are big, they get really slow to delta compress, 
and storing them all in a window with 250 slots takes significant 
memory.

Threads 2, 3, and 4 have "easy" work loads, so they complete fairly 
quickly compared to thread #1.  But since the progress display is global, 
you won't notice that one thread is actually crawling slowly.

To keep all threads busy until the end, those threads that are done with 
their work load will steal some work from another thread, choosing the 
one with the largest remaining work.  That is most likely thread #1.  So 
as threads 2, 3, and 4 complete, they will steal from thread 1 and 
populate their own window with those big objects too, and get slow too.

And because all threads get to work on those big objects towards the 
end, the progress display will then show a significant slowdown, and 
memory usage will almost quadruple.

Add memory fragmentation to that and you have a clogged system.

Solution: 

pack.deltacachesize=1
pack.windowmemory=16M

Limiting the window memory to 16MB will automatically shrink the window 
size when big objects are encountered, therefore keeping much fewer of 
those objects at the same time in memory, which in turn means they will 
be processed much more quickly.  And somehow that must help with memory 
fragmentation as well.
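For reference, those knobs can be set per-repository with git config before re-running the repack; a sketch using the option names from this message (verify them against `git config --help` for your git version):

```shell
# cap the delta search window by memory so runs of big objects
# automatically shrink the effective window size
git config pack.windowmemory 16m

# effectively disable caching of delta results (1-byte cap)
git config pack.deltacachesize 1

# repack from scratch, recomputing all deltas
git repack -a -f -d --window=250 --depth=250
```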

Setting pack.deltacachesize to 1 is simply to disable the caching of 
delta results entirely which will only slow down the writing phase, but 
I wanted to keep it out of the picture for now.

With the above settings, I'm currently repacking the gcc repo with 2 
threads, and memory allocation never exceeded 700m virt and 400m res, 
while the mallinfo shows about 350MB, and progress has reached 90% which 
has never occurred on this machine with the 300MB source pack so far.


Nicolas


porting gcc to tic54x

2007-12-11 Thread a2220333
hi,
I have been porting gcc to the tic54x, using gcc-4.2.2. I wrote a 
minimal c54x.h and c54x.c and an empty .md file, and compiled them to 
generate the tic54x-gcc compiler.

But when I execute the compiler I get a segmentation fault. Is there 
anything that must be defined in c54x.c or c54x.h whose absence would 
make even this simplest compiler fail with no correct output and no 
errors? I ask because I want to add functionality on top of this 
basic port.

thanks.

here is my files

/*** c54x.h ***/
/* number of registers */
#define FIRST_PSEUDO_REGISTER 25
/* number of register classes */
#define N_REG_CLASSES 26
struct cumul_args {
 int has_varargs;
 int numarg;
};
#define CUMULATIVE_ARGS struct cumul_args
/* Node: Register Classes */
/* TODO: get rid of single-register classes? */
enum reg_class
{
 NO_REGS,
 IMR_REG,
 IFR_REG,
 A_REG,
 B_REG,
 T_REG,
 TRN_REG,
 SP_REG,
 BK_REG,
 BRC_REG,
 RSA_REG,
 REA_REG,
 PMST_REG,
 XPC_REG,
 DP_REG,
 ST_REGS,
 INT_REGS,
 STAT_REGS,
 ACC_REGS,
 BR_REGS,
 DBL_OP_REGS,
 AUX_REGS,
 ARSP_REGS,
 MMR_REGS,
 GENERAL_REGS,
 ALL_REGS,
 LIM_REG_CLASSES
};
#define STRICT_ALIGNMENT 1  /* Nothing is smaller than alignment.. */
#define BYTES_BIG_ENDIAN 0
#define FUNCTION_BOUNDARY BITS_PER_WORD
#define UNITS_PER_WORD  1
#define BIGGEST_ALIGNMENT BITS_PER_WORD*2
/* Node: 13.11 Trampolines for Nested Functions */
#define TRAMPOLINE_SIZE 2 /* Just a guess for now */
#define STACK_BOUNDARY  BITS_PER_WORD
#define Pmode QImode
/* Stack pointer */
#define SP_REGNO   16
#define STACK_POINTER_REGNUM SP_REGNO
#define AR7_REGNO  15
#define FRAME_POINTER_REGNUM AR7_REGNO
/* Fake argument pointer reg */
#define ARG_REGNO  24
#define ARG_POINTER_REGNUM   ARG_REGNO
#define WORDS_BIG_ENDIAN 0
#define PARM_BOUNDARY BITS_PER_WORD
#define FUNCTION_MODE QImode
#define BASE_REG_CLASS ARSP_REGS
#define MOVE_MAX 1
#define BITS_BIG_ENDIAN 1
/* Node: 10.10.5 Elimination */
#define FRAME_POINTER_REQUIRED 0
/* Node: 13.15 Describing Relative Costs of Operations */
#define SLOW_BYTE_ACCESS 1
#define CASE_VECTOR_MODE QImode
/* Node: 13.13 Addressing Modes */
#define MAX_REGS_PER_ADDRESS 2
#define ASM_APP_ON  "#APP"
#define ASM_APP_OFF "#NO_APP"
#define STARTING_FRAME_OFFSET -1 /* Local frame starts just below the frame pointer */
/*sam added start*/
//optabs.c used this...
#define CODE_FOR_indirect_jump 8
/*sam added end*/
#define DEFAULT_SIGNED_CHAR 0  /* FIXME (ripped from c4x) */
/* FIXME: double check this */
#define INDEX_REG_CLASS NO_REGS
#define GO_IF_LEGITIMATE_ADDRESS(MODE, X, ADDR)   \
 do { \
 } while (0)
#define GO_IF_MODE_DEPENDENT_ADDRESS(ADDR, LABEL) \
 do { \
 } while(0);
/* registers that have a fixed purpose
 * and can't be used for general tasks. */
#define FIXED_REGISTERS \
{ \
 /* IMR IFR ST0 ST1 A   B   T   TRN AR0 AR1 AR2 */ \
 1,  1,  1,  1,  0,  0,  0,  0,  0,  0,  0, \
 /* AR3 AR4 AR5 AR6 AR7 SP  BK  BRC RSA REA PMST XPC DP ARG*/ \
 0,  0,  0,  0,  0,  1,  0,  0,  0,  0,  1,   1,  1, 1  \
}
#define CALL_USED_REGISTERS \
{ \
 /* IMR IFR ST0 ST1 A   B   T   TRN AR0 AR1 AR2 */ \
 1,  1,  1,  1,  1,  1,  1,  1,  1,  0,  1, \
 /* AR3 AR4 AR5 AR6 AR7 SP  BK  BRC RSA REA PMST XPC DP ARG */ \
 1,  1,  1,  0,  0,  1,  1,  1,  1,  1,  1,   1,  1, 1 \
}
/* Defines which registers are in which classes */
#define REG_CLASS_CONTENTS \
{  \
 {0x00000000}, /* NO_REGS */ \
 {0x00000001}, /* IMR_REG */ \
 {0x00000002}, /* IFR_REG */ \
 {0x00000010}, /* A_REG */ \
 {0x00000020}, /* B_REG */ \
 {0x00000040}, /* T_REG */ \
 {0x00000080}, /* TRN_REG */ \
 {0x00010000}, /* SP_REG */ \
 {0x00020000}, /* BK_REG */ \
 {0x00040000}, /* BRC_REG */ \
 {0x00080000}, /* RSA_REG */ \
 {0x00100000}, /* REA_REG */ \
 {0x00200000}, /* PMST_REG */ \
 {0x00400000}, /* XPC_REG */ \
 {0x00800000}, /* DP_REG */ \
 {0x0000000c}, /* ST_REGS */ \
 {0x00000003}, /* INT_REGS */ \
 {0x0000002c}, /* STAT_REGS */ \
 {0x00000030}, /* ACC_REGS */ \
 {0x001c0000}, /* BR_REGS */ \
 {0x00003c00}, /* DBL_OP_REGS */ \
 {0x0000ff00}, /* AUX_REGS */ \
 {0x0001ff00}, /* ARSP_REGS */ \
 {0x007fffcf}, /* MMR_REGS */ \
 {0x011efff0}, /* GENERAL_REGS */ \
 {0xffffffff}  /* ALL_REGS */ \
}
#define REGISTER_NAMES { \
 "imr", "ifr", "st0", "st1", \
 "a", "b",  "t", "trn", "ar0", "ar1", "ar2", \
 "ar3", "ar4", "ar5", "ar6", "ar7", \
 "sp", "bk", "brc", "rsa", "rea", \
 "pmst", "xpc", "dp", "arg" }
#define REG_CLASS_NAMES \
{   \
 "NO_REGS",  \
 "IMR_REG",  \
 "IFR_REG",  \
 "A_REG",\
 "B_REG",\
 "T_REG",\
 "TRN_REG",  \
 "SP_REG",   \
 "BK_REG",   \
 "BRC_REG",  \
 "RSA_REG",  \
 "REA_REG",  \
 "PMST_REG", \
 "XPC_REG",  \
 "DP_REG",   \
 "ST_REGS"