Re: Escape the unnecessary re-optimization in automatic parallelization.
Oh, hello Yuri, nice to meet you here :)

Alexey

On Tue, Oct 13, 2009 at 7:51 PM, Yuri Kashnikoff wrote:
>> Therefore, the most effective way to address the issue of running redundant
>> optimization passes in the context is probably to put it in the wider
>> context
>> of the work to allow external plugins to influence the pass sequence that is
>> being applied, and to control this with machine learning.
>>
> Joern is right about external plugins. AFAIK, this work was done as
> part of a GSoC 2009 project led by Grigori Fursin. Please check these
> links for further information:
> 1)
> http://ctuning.org/wiki/index.php/CTools:ICI:Projects:GSOC09:Function_cloning_and_program_instrumentation#Work_with_adapt_plugin
> 2) http://ctuning.org/wiki/index.php/CTools:ICI:Projects:GSOC09:Scripts
>
> In my opinion, you can easily edit the XML files generated by the adapt
> plugin to exclude particular passes, then run the scripts to make the adapt
> plugin substitute passes according to your edited XML files.
>
> Thanks!
>
Re: Why auto variables NOT overlap on stack?
There's another funny thing about gcc3 behavior which I've just discovered:

$ gcc -v 2>&1 | grep version
gcc version 3.4.2
$ gcc -o mem mem.c ; ./mem
-1024
$ gcc -o mem1 mem1.c ; ./mem1
0
$ cat mem.c
#include <stdio.h>

int main()
{
    char *p1, *p2;
    {
        char a[1024];
        p1 = a;
    }
    {
        char a[1024];
        p2 = a;
    }
    printf("%d\n", p2 - p1);
    return 0;
}
$ cat mem1.c
#include <stdio.h>

static const int N = 1024;

int main()
{
    char *p1, *p2;
    {
        char a[N];
        p1 = a;
    }
    {
        char a[N];
        p2 = a;
    }
    printf("%d\n", p2 - p1);
    return 0;
}

Alexey
Re: clearing many bytes variables (could use one machine instruction)?
On Tue, Mar 9, 2010 at 3:58 PM, Basile Starynkevitch wrote:
> Hello All,
>
> With a recently compiled gcc-trunk on x86-64/linux, I am compiling the
> following example:
>
> #
>
> /* file testmanychar.c */
> extern void g (int, char *, char *, char *);
>
> void
> f (void)
> {
>   char x0, x1, x2, x3, x4, x5, x6, x7;
>   /* assuming x0 is word aligned on a x86_64, and variables are bytes in
>      memory, we could clear all the variables in one machine instruction */
>   x0 = x1 = x2 = x3 = x4 = x5 = x6 = x7 = (char) 0;
>   g (10, &x0, &x1, &x2);
>   g (20, &x2, &x3, &x4);
>   g (30, &x4, &x5, &x6);
>   g (40, &x6, &x7, &x0);
> }
>
> #
>
> My intuition was that GCC could store x0 on a 64-bit aligned byte, and x1
> immediately after, and so on, and clear all eight bytes at once using a
> single machine instruction [clearing a 64-bit word].
>
> But this is not the case, since
>   gcc-trunk -S -O3 -fverbose-asm testmanychar.c
> gives the following code
>
> #
>     .type f, @function
> f:
> .LFB0:
>     .cfi_startproc
>     movq %rbx, -24(%rsp) #,
>     movq %rbp, -16(%rsp) #,
>     movl $10, %edi #,
>     movq %r12, -8(%rsp) #,
>     subq $40, %rsp #,
>     .cfi_def_cfa_offset 48
>     leaq 13(%rsp), %rbx #, tmp58
>     .cfi_offset 12, -16
>     .cfi_offset 6, -24
>     .cfi_offset 3, -32
>     leaq 15(%rsp), %rbp #, tmp60
>     leaq 14(%rsp), %rdx #, tmp59
>     leaq 11(%rsp), %r12 #, tmp61
>     movb $0, 8(%rsp) #, x7
>     movb $0, 9(%rsp) #, x6
>     movq %rbx, %rcx # tmp58,
>     movq %rbp, %rsi # tmp60,
>     movb $0, 10(%rsp) #, x5
>     movb $0, 11(%rsp) #, x4
>     movb $0, 12(%rsp) #, x3
>     movb $0, 13(%rsp) #, x2
>     movb $0, 14(%rsp) #, x1
>     movb $0, 15(%rsp) #, x0
>     call g #
>     leaq 12(%rsp), %rdx #, tmp62
>     movq %r12, %rcx # tmp61,
>     movq %rbx, %rsi # tmp58,
>     movl $20, %edi #,
>     leaq 9(%rsp), %rbx #, tmp64
>     call g #
>     leaq 10(%rsp), %rdx #, tmp65
>     movq %rbx, %rcx # tmp64,
>     movq %r12, %rsi # tmp61,
>     movl $30, %edi #,
>     call g #
>     leaq 8(%rsp), %rdx #, tmp68
>     movq %rbp, %rcx # tmp60,
>     movq %rbx, %rsi # tmp64,
>     movl $40, %edi #,
>     call g #
>     movq 16(%rsp), %rbx #,
>     movq 24(%rsp), %rbp #,
>     movq 32(%rsp), %r12 #,
>     addq $40, %rsp #,
>     .cfi_def_cfa_offset 8
>     ret
>     .cfi_endproc
> .LFE0:
>     .size f, .-f
>     .ident "GCC: (GNU) 4.5.0 20100309 (experimental) [trunk revision
> 157303]"
>
> #
>
>
> With
>   gcc-trunk -S -O3 -fverbose-asm -march=core2 -mtune=core2 testmanychar.c
> I am still getting
>
> ##
>
> # options passed: testmanychar.c -march=core2 -mtune=core2 -O3
>
>     .globl f
>     .type f, @function
> f:
> .LFB0:
>     .cfi_startproc
>     movq %rbx, -24(%rsp) #,
>     movq %rbp, -16(%rsp) #,
>     movq %r12, -8(%rsp) #,
>     movl $10, %edi #,
>     subq $40, %rsp #,
>     .cfi_def_cfa_offset 48
>     leaq 13(%rsp), %rbx #, tmp58
>     .cfi_offset 12, -16
>     .cfi_offset 6, -24
>     .cfi_offset 3, -32
>     leaq 15(%rsp), %rbp #, tmp60
>     leaq 11(%rsp), %r12 #, tmp61
>     leaq 14(%rsp), %rdx #, tmp59
>     movq %rbx, %rcx # tmp58,
>     movq %rbp, %rsi # tmp60,
>     movb $0, 8(%rsp) #, x7
>     movb $0, 9(%rsp) #, x6
>     movb $0, 10(%rsp) #, x5
>     movb $0, 11(%rsp) #, x4
>     movb $0, 12(%rsp) #, x3
>     movb $0, 13(%rsp) #, x2
>     movb $0, 14(%rsp) #, x1
>     movb $0, 15(%rsp) #, x0
>     call g #
>     leaq 12(%rsp), %rdx #, tmp62
>     movq %r12, %rcx # tmp61,
>     movq %rbx, %rsi # tmp58,
>     movl $20, %edi #,
>     leaq 9(%rsp), %rbx #, tmp64
>     call g #
>     leaq 10(%rsp), %rdx #, tmp65
>     movq %rbx, %rcx # tmp64,
>     movq %r12, %rsi # tmp61,
>     movl $30, %edi #,
>     call g #
>     leaq 8(%rsp), %rdx #, tmp68
>     movq %rbp, %rcx # tmp60,
>     movq %rbx, %rsi # tmp64,
>     movl $40, %edi #,
>     call g #
>     movq 16(%rsp), %rbx #,
>     movq 24(%rsp), %rbp #,
>     movq 32(%rsp), %r12 #,
>     addq $40, %rsp #,
>     .cfi_def_cfa_offset 8
>     ret
>     .cfi_endproc
> .LFE0:
>     .size f, .-f
>     .ident "GCC: (GNU) 4.5.0 201003
Re: (un)aligned accesses on x86 platform.
On Tue, Mar 16, 2010 at 9:05 PM, Tristan Gingold wrote:
>
> On Mar 16, 2010, at 3:50 PM, H.J. Lu wrote:
>
>> 2010/3/8 Paweł Sikora :
>>> hi,
>>>
>>> while developing a cross-platform application on an x86 workstation
>>> i've enabled alignment checking [1] to catch possibly erroneous
>>> code before it appears on a client's sparc/arm cpu with sigbus ;)
>>>
>>> it works pretty well and catches alignment violations, but Jakub Jelinek
>>> told me (on glibc bugzilla) that gcc on x86 can still dereference
>>> an unaligned pointer (except for vector insns).
>>> i suppose it means that gcc can emit e.g. movl to access a short int
>>> (or maybe other scenarios) in some cases and violate cpu alignment rules.
>>>
>>> so, is it possible to instruct gcc-x86 to always use suitable loads/stores
>>> like on sparc/arm?
>>>
>>> [1] "AC" bit - http://en.wikipedia.org/wiki/FLAGS_register_(computing)
>>>
>>
>> I am interested in an -mstrict-alignment option for x86.
>
> Not sure it will be useful. The libc still does unaligned accesses IIRC.
>
>

Wow. What for?

Alexey
Re: (un)aligned accesses on x86 platform.
On Tue, Mar 16, 2010 at 9:48 PM, Tristan Gingold wrote:
>
> On Mar 16, 2010, at 4:37 PM, Alexey Salmin wrote:
>>>> I am interested in an -mstrict-alignment option for x86.
>>>
>>> Not sure it will be useful. The libc still does unaligned accesses IIRC.
>>>
>>
>> Wow. What for?
>
> Well, simply because it is not compiled with strict alignment. There might
> also be some optimization in
> memory operations that does unaligned accesses.

I always thought that an unaligned access is much slower than an aligned one. Do you mean code-size optimizations?

Alexey
Re: Question about perl while bootstrapping gcc
On 4/17/10, Alan Lehotsky wrote:
> Take a look at the at(1) or batch(1) commands if you really want to execute
> a command and logout while it's still running.
>

Or screen(1)
Re: change to gcc from lcc
2008/11/20 Michael Matz <[EMAIL PROTECTED]>:
> Hi,
>
> On Wed, 19 Nov 2008, H.J. Lu wrote:
>
>> On Wed, Nov 19, 2008 at 7:18 PM, Nicholas Nethercote
>> <[EMAIL PROTECTED]> wrote:
>> > On Tue, 18 Nov 2008, H.J. Lu wrote:
>> >
>> >>> I used malloc to create my arrays instead of creating them on the stack.
>> >>> My program is working now but it is very slow.
>> >>>
>> >>> I use two-dimensional arrays. The way I access element (i,j) is:
>> >>> array_name[i*row_length+j]
>> >>>
>> >>> The server that I use has 16GB ram. The ulimit -a command gives the
>> >>> following output:
>> >>> time(seconds)   unlimited
>> >>> file(blocks)    unlimited
>> >>> data(kbytes)    unlimited
>> >>> stack(kbytes)   8192
>> >>
>> >>
>> >>
>> >> That limits stack to 8MB. Please change it to 1GB.
>> >
>> > Why?
>> >
>>
>> int buffer1[250][100];
>>
>> takes close to 1GB on stack.
>
> Read the lines you quoted carefully again.
>
>
> Ciao,
> Michael.
>

Can you please talk in a more understandable way? I also think that 4 * 250 * 100 is close to 1073741824, which is 1 GB. And automatic variables are allocated on the stack (which is 8 MB here), not in the data segment.

Alexey
Fwd: A bug?
2008/12/16 Dennis Clarke :
>
>> Hi,
>>
>> The following program segfaults when compiled with gcc
>> but runs fine when compiled with g++ or icc (the intel C compiler)
>>
>> #include <stdio.h>
>> struct Hello {
>>     char world[20];
>> };
>> struct Hello s(){
>>     struct Hello r;
>>     r.world[0]='H';
>>     r.world[1]='\0';
>>     return r;
>> }
>>
>> int main(){
>>     printf("%s\n",s().world);
>> }
>>
>> Assigning s() to a variable and then using the variable avoids the
>> segfault.
>
> compiles and works fine with GCC 4.3.2 on Solaris 8/9/10 sun4m/sun4u/i386
>
> $ /opt/csw/gcc4/bin/gcc -v -o foo.o -c foo.c
> Using built-in specs.
> Target: sparc-sun-solaris2.8
> Configured with: ../gcc-4.3.2/configure --prefix=/opt/csw/gcc4
> --with-local-prefix=/opt/csw --with-as=/usr/ccs/bin/as --without-gnu-ld
> --with-ld=/usr/ccs/bin/ld --with-cpu=v7 --enable-threads=posix
> --enable-nls --enable-shared --enable-languages=c,c++,fortran,objc
> --with-gmp=/opt/csw --with-mpfr=/opt/csw --enable-multilib
> --with-included-gettext --with-libiconv-prefix=/opt/csw --with-x
> --enable-java-awt=xlib --with-system-zlib --enable-bootstrap
> Thread model: posix
> gcc version 4.3.2 (GCC)
> COLLECT_GCC_OPTIONS='-v' '-o' 'foo.o' '-c' '-mcpu=v7'
> /opt/csw/gcc4/libexec/gcc/sparc-sun-solaris2.8/4.3.2/cc1 -quiet -v foo.c
> -quiet -dumpbase foo.c -mcpu=v7 -auxbase-strip foo.o -version -o
> /var/tmp//ccAHrz2q.s
> ignoring nonexistent directory
> "/opt/csw/gcc4/lib/gcc/sparc-sun-solaris2.8/4.3.2/../../../../sparc-sun-solaris2.8/include"
> #include "..." search starts here:
> #include <...> search starts here:
> /opt/csw/include
> /opt/csw/gcc4/include
> /opt/csw/gcc4/lib/gcc/sparc-sun-solaris2.8/4.3.2/include
> /opt/csw/gcc4/lib/gcc/sparc-sun-solaris2.8/4.3.2/include-fixed
> /usr/include
> End of search list.
> GNU C (GCC) version 4.3.2 (sparc-sun-solaris2.8)
>    compiled by GNU C version 4.3.2, GMP version 4.2.2, MPFR version
> 2.3.1.
> warning: GMP header version 4.2.2 differs from library version 4.2.4.
> GGC heuristics: --param ggc-min-expand=47 --param ggc-min-heapsize=32768
> Compiler executable checksum: 1ac791ab3c2b7cc8775dc74d45095fef
> COLLECT_GCC_OPTIONS='-v' '-o' 'foo.o' '-c' '-mcpu=v7'
> /usr/ccs/bin/as -V -Qy -s -xarch=v8 -o foo.o /var/tmp//ccAHrz2q.s
> /usr/ccs/bin/as: Sun WorkShop 6 2003/12/18 Compiler Common 6.0 Patch
> 114802-02
> COMPILER_PATH=/opt/csw/gcc4/libexec/gcc/sparc-sun-solaris2.8/4.3.2/:/opt/csw/gcc4/libexec/gcc/sparc-sun-solaris2.8/4.3.2/:/opt/csw/gcc4/libexec/gcc/sparc-sun-solaris2.8/:/opt/csw/gcc4/lib/gcc/sparc-sun-solaris2.8/4.3.2/:/opt/csw/gcc4/lib/gcc/sparc-sun-solaris2.8/:/usr/ccs/bin/
> LIBRARY_PATH=/opt/csw/gcc4/lib/gcc/sparc-sun-solaris2.8/4.3.2/:/usr/ccs/lib/:/opt/csw/gcc4/lib/gcc/sparc-sun-solaris2.8/4.3.2/../../../:/lib/:/usr/lib/
> COLLECT_GCC_OPTIONS='-v' '-o' 'foo.o' '-c' '-mcpu=v7'
>
> $ /opt/csw/gcc4/bin/gcc -v -o foo.s -S -c foo.c
> Using built-in specs.
> Target: sparc-sun-solaris2.8
> Configured with: ../gcc-4.3.2/configure --prefix=/opt/csw/gcc4
> --with-local-prefix=/opt/csw --with-as=/usr/ccs/bin/as --without-gnu-ld
> --with-ld=/usr/ccs/bin/ld --with-cpu=v7 --enable-threads=posix
> --enable-nls --enable-shared --enable-languages=c,c++,fortran,objc
> --with-gmp=/opt/csw --with-mpfr=/opt/csw --enable-multilib
> --with-included-gettext --with-libiconv-prefix=/opt/csw --with-x
> --enable-java-awt=xlib --with-system-zlib --enable-bootstrap
> Thread model: posix
> gcc version 4.3.2 (GCC)
> COLLECT_GCC_OPTIONS='-v' '-o' 'foo.s' '-S' '-c' '-mcpu=v7'
> /opt/csw/gcc4/libexec/gcc/sparc-sun-solaris2.8/4.3.2/cc1 -quiet -v foo.c
> -quiet -dumpbase foo.c -mcpu=v7 -auxbase-strip foo.s -version -o foo.s
> ignoring nonexistent directory
> "/opt/csw/gcc4/lib/gcc/sparc-sun-solaris2.8/4.3.2/../../../../sparc-sun-solaris2.8/include"
> #include "..." search starts here:
> #include <...> search starts here:
> /opt/csw/include
> /opt/csw/gcc4/include
> /opt/csw/gcc4/lib/gcc/sparc-sun-solaris2.8/4.3.2/include
> /opt/csw/gcc4/lib/gcc/sparc-sun-solaris2.8/4.3.2/include-fixed
> /usr/include
> End of search list.
> GNU C (GCC) version 4.3.2 (sparc-sun-solaris2.8)
>    compiled by GNU C version 4.3.2, GMP version 4.2.2, MPFR version
> 2.3.1.
> warning: GMP header version 4.2.2 differs from library version 4.2.4.
> GGC heuristics: --param ggc-min-expand=47 --param ggc-min-heapsize=32768
> Compiler executable checksum: 1ac791ab3c2b7cc8775dc74d45095fef
> COMPILER_PATH=/opt/csw/gcc4/libexec/gcc/sparc-sun-solaris2.8/4.3.2/:/opt/csw/gcc4/libexec/gcc/sparc-sun-solaris2.8/4.3.2/:/opt/csw/gcc4/libexec/gcc/sparc-sun-solaris2.8/:/opt/csw/gcc4/lib/gcc/sparc-sun-solaris2.8/4.3.2/:/opt/csw/gcc4/lib/gcc/sparc-sun-solaris2.8/:/usr/ccs/bin/
> LIBRARY_PATH=/opt/csw/gcc4/lib/gcc/sparc-sun-solaris2.8/4.3.2/:/usr/ccs/lib/:/opt/csw/gcc4/lib/gcc/sparc-sun-solaris2.8/4.3.2/../../../:/lib/:/usr/lib/
> COLLECT_GCC_OPTIONS='-v' '-o' 'foo.s' '-S' '-c' '-mcpu=v7'
>
> $ cat foo.s
>    .file "foo.c"
Re: -pthread
> On GNU/Linux, you need to use -pthread or -D_REENTRANT at compilation
> time, and you need to use -pthread or -lpthread at link time.

The gcc mailing list is certainly worth reading for users. I always thought that -lpthread at link time was enough.

Alexey
Lexer/cpplib improvements
Hello!

I want to join the gcc development process and I decided that the Lexer/cpplib would be a good place to start. It's quite interesting for me, I have some experience in this area from a few projects, and fortunately there is a special manual (http://gcc.gnu.org/onlinedocs/cppinternals/) and some work to do (http://gcc.gnu.org/wiki/Speedup_areas).

Also I'd like to participate in the Google Summer of Code with this project. Is it complicated enough? Does it make sense to write an application? Who can be my mentor? Even if I don't join the GSOC I still need a person who can help me sometimes with this task.

Alexey Salmin
XDELETEVEC
Hello!

I'm learning my way around the gcc lexer/cpplib code and I have a question about the way it works with memory buffers. It seems that arrays are allocated with the XNEWVEC macro, generally a good idea of course. So I expected to see memory freed with the corresponding macro XDELETEVEC and was surprised to find that in most cases it is freed with the plain "free" function.

[EMAIL PROTECTED]:~/gcc/src/libcpp$ grep XDELETEVEC * | wc -l
5
[EMAIL PROTECTED]:~/gcc/src/libcpp$ grep 'free (' * | wc -l
64

I've checked the Partial transitions list for a corresponding item but have not found anything. So I want to ask: what's wrong with XDELETEVEC (and XDELETE as well)?

Alexey Salmin
GSOC Student application
Hello, here's my application. Please leave your comments, as I still have two days to fix it if something is wrong :)

Project

I want to make some improvements in the Lexer/cpplib area:
1) Change the way files are handled
   -- Mmap the file into memory if possible instead of allocating a buffer (if no character conversion is needed)
   -- Find the boundaries of the lines for which conversion is needed instead of converting the whole buffer.
2) Replace all malloc/free calls with the XNEW/XDELETE (XNEWVEC, XDELETEVEC) macros.
3) Some small miscellaneous changes
   -- Improve the developer's documentation and comments
   -- Add a ru.po file for libcpp

Why is it useful for GCC? (corresponding to the project items)
1) The compile time and, probably, memory usage will be reduced.
2) Hard to say anything here. I have no idea why malloc/free are still used in the code since XNEW/XDELETE are supposed to be there. (Or maybe I'm wrong? I asked about this here once and it seems that I'm right.)
3) Good documentation is important for understanding the source code. The long sequence of mails on this list called "How to understand the gcc source code" demonstrates that.

Why should I do this?
1. My knowledge of the C programming language is very good.
2. I have some experience in tokenization.
3. I want to join the GCC development process independently of GSOC; I will continue my work and supply my code after the end of the Summer of Code.
4. I have some experience writing C++. Maybe it's not enough to develop big projects with it but it's definitely enough to lex it :)
5. Finally, I have the book "Compilers: Principles, Techniques, and Tools" by Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. I'm joking of course :)

Biography

I'm an 18-year-old student at Novosibirsk State University, Russia. I've been working with Linux for 4 years, I enjoy writing C code and I have always wanted to join a really great project like GCC.

PS Where am I supposed to send this mail? I've seen no special address for GSOC applications so I sent it here. But I've seen no other applications on this list so I'm confused :)
Re: GSOC Student application
> There are issues of Garbage Collection from libgcc or Boehm's GC
> such that you possibly can't use allocators other than these defaults,
> unless you have control of the manager of the whole memory,
> and it's too complex due to the giant size of the project.

[EMAIL PROTECTED]:~/gcc/src/include$ grep XNEW libiberty.h
#define XNEW(T) ((T *) xmalloc (sizeof (T)))
#define XNEWVEC(T, N) ((T *) xmalloc (sizeof (T) * (N)))
#define XNEWVAR(T, S) ((T *) xmalloc ((S)))
[EMAIL PROTECTED]:~/gcc/src/libiberty$ grep -A 11 '^xmalloc (' xmalloc.c
xmalloc (size_t size)
{
  PTR newmem;

  if (size == 0)
    size = 1;
  newmem = malloc (size);
  if (!newmem)
    xmalloc_failed (size);

  return (newmem);
}

So, you can see that the XNEW* macros are currently just thin wrappers around malloc (via xmalloc, which only adds a failure check), and they were added only to allow a possible future change of the memory allocator. Any malloc call should be replaced with these macros AFAIK. And the worst thing I can see in the code is memory allocated with the XNEW macros being freed with plain free. It works fine now but it's wrong as I understand it.

[EMAIL PROTECTED]:~/gcc/src/libcpp$ grep XNEW * | wc -l
66
[EMAIL PROTECTED]:~/gcc/src/libcpp$ grep XDELETE * | wc -l
6
[EMAIL PROTECTED]:~/gcc/src/libcpp$ grep free * | wc -l
153
[EMAIL PROTECTED]:~/gcc/src/libcpp$ grep malloc * | wc -l
13

> You must know that before optimizing anything, you must profile the
> whole code (-pg, gprof, ...) and study the beautiful formula of
> "Amdahl's Law" for sequential machines in some books.
>
> Studied this law, you can optimize better than your previous knowledge.

I know what profiling is. And I know how parallel programs work, thanks. I'm just talking here about concrete improvements I can make, not about some abstract optimizing.

>
> Luck U.S.S.R. boy ;)
>

Yes, I lived in the USSR for the first 2 years of my life :)