How could I get alias set information from data_reference_p
Hi,

I'm now working on the Graphite branch and need to know the alias set information for each data_reference_p, which would be an integer (or alias_set_type) indicating which alias set it is in.

I tried to get the alias set information with get_alias_set (tree) (I have no idea how this function works; it was just an experiment). For my testcase it returns 2 for all the data_reference_p->ref, which I think is unreasonable, so I suspect I went wrong somewhere.

The question is: how can I get the relevant alias set information from a data_reference_p?

P.S.: In Graphite, the data reference is built for each gimple stmt with:

  get_references_in_stmt (stmt, &references);

then for each ref in references, the data reference is created with:

  dr = create_data_ref (nest, *ref->pos, stmt, ref->is_read);

Thanks,
Li
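For reference, a minimal sketch of the lookup being asked about, assuming the DR_REF accessor from tree-data-ref.h; the dump helper itself is hypothetical. Note that get_alias_set implements type-based aliasing, so all plain int accesses landing in the same set (here 2) is in fact the expected result:

  /* Sketch: print the TBAA alias set of a data reference.
     DR_REF extracts the memory reference tree from the data_reference,
     and get_alias_set maps it to an alias_set_type number
     (0 means "conflicts with everything").  */
  static void
  dump_dr_alias_set (data_reference_p dr)
  {
    alias_set_type set = get_alias_set (DR_REF (dr));
    fprintf (stderr, "alias set: %d\n", (int) set);
  }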
Re: How could I get alias set information from data_reference_p
Hi Richard,

On Tue, Jul 14, 2009 at 4:54 PM, Richard Guenther wrote:
> On Tue, Jul 14, 2009 at 8:01 AM, Li Feng wrote:
>> Hi,
>>
>> I'm now working on the Graphite branch and need to know
>> the alias set information for each data_reference_p, which
>> would be an integer (or alias_set_type) indicating which
>> alias set it is in.
>>
>> I tried to get the alias set information with get_alias_set (tree)
>> (I have no idea how this function works; it was just an
>> experiment). For my testcase it returns 2 for all the
>> data_reference_p->ref, which I think is unreasonable,
>> so I suspect I went wrong somewhere.
>>
>> The question is: how can I get the relevant
>> alias set information from a data_reference_p?
>>
>> P.S.: In Graphite, the data reference is built for each
>> gimple stmt with:
>>   get_references_in_stmt (stmt, &references);
>> then for each ref in references, the data reference is created with:
>>   dr = create_data_ref (nest, *ref->pos, stmt, ref->is_read);
>
> get_alias_set (dr->ref) is the correct thing. Why do you think it
> is unreasonable to return 2 for all of them? Why do you need
> alias-set information anyway?

The testcase looks like this, where I assume that p and a are in the same alias set, while q and Q are in another, and so on. Between those groups, the alias set numbers should differ; otherwise maybe I have misunderstood something about alias sets.

  int Q[10];

  int foo()
  {
    int i;
    int a[10], b[20];
    int *p = a;
    int *q = Q;

    for (i = 0; i < 10; i++)
      {
        p[i] = i;
        a[i] = i - 5;
        b[i] = i*i;
        q[i] = 5;
      }
    return Q[3]*a[2]*b[10];
  }

To answer your question: we use this information before dependence checking under the polyhedral model. If two data references have different dimensions and are not in the same alias set, we can conclude that they carry no dependence (a sketch of this check follows this post).

Li
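A sketch of the early-exit test Li describes, using DR_NUM_DIMENSIONS from tree-data-ref.h and alias_sets_conflict_p from GCC's alias analysis; the helper name dr_independent_p is hypothetical. Note that the rest of the thread goes on to question whether TBAA alias-set numbers are the right basis for this:

  /* Hypothetical early exit before the polyhedral dependence test:
     two references with different dimensionality whose alias sets
     cannot conflict carry no dependence.  */
  static bool
  dr_independent_p (data_reference_p a, data_reference_p b)
  {
    return (DR_NUM_DIMENSIONS (a) != DR_NUM_DIMENSIONS (b)
            && !alias_sets_conflict_p (get_alias_set (DR_REF (a)),
                                       get_alias_set (DR_REF (b))));
  }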
Re: How could I get alias set information from data_reference_p
Hi Richard,

Maybe you could look into this thread and give us some suggestions/confirmation. We now plan to use dr_may_alias_p (DR1, DR2) to partition the alias sets.

http://groups.google.de/group/gcc-graphite/browse_thread/thread/7bffbe9037b5adf4?hl=en

Thanks,
Li

On Wed, Jul 15, 2009 at 5:34 AM, Richard Guenther wrote:
> On Tue, Jul 14, 2009 at 6:08 PM, Sebastian Pop wrote:
>> On Tue, Jul 14, 2009 at 11:03, Sebastian Pop wrote:
>>>> Why do you need alias-set numbers?
>>>
>>> We want to represent the alias set information as an extra subscript
>>> on memory accesses. For example, if we have A[10], and supposing that
>>> A is in alias set 6, this would be represented as
>>> "memory_access[6][10]".
>>>
>>> For B[3][4] with base array B in alias sets 7 and 8, we would represent
>>> this as "memory_access[7][3][4] or memory_access[8][3][4]".
>>>
>>> The data dependence test that we currently have in the graphite branch
>>> would work with alias sets represented by an extra dimension added to
>>> the array dimensions.
>>
>> And by the way, right now we consider that all the data refs alias each
>> other; that means we set the alias dimension of the memory
>> accesses to 1 for every data reference. This makes the test very
>> conservative: in the example above, with the current implementation,
>> we would have:
>>
>> A: memory_access[1][10]
>> B: memory_access[1][3][4]
>
> Well, so you really want to have sort-of the base objects partitioned
> into must(!)-conflict sets? Thus,
>
> (*p)[1][2]
> (*q)[1][3]
>
> is only correct if the resulting accesses may not alias? (that is,
> p == q?)
>
> No, we don't have such information readily available. You have
> to compute it, likely by building a complete conflict map of
> the bases of all memory references and partitioning it.
>
> Which doesn't sound like a great idea - it's quadratic. Thus, can't
> you represent this in a more sparse way?
>
> Richard.
>
>> Sebastian
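The quadratic partitioning Richard describes could look roughly like the union-find sketch below. It assumes dr_may_alias_p is available with the two-argument signature mentioned above (in tree-data-ref.c it is a static function), so this is an illustration of the shape of the computation, not branch code:

  /* Group data references into may-alias components with a naive
     union-find; quadratic in the number of references, as noted.  */
  static int
  find_rep (int *parent, int x)
  {
    while (parent[x] != x)
      {
        parent[x] = parent[parent[x]];  /* path halving */
        x = parent[x];
      }
    return x;
  }

  static void
  partition_drs (data_reference_p *drs, int n, int *parent)
  {
    int i, j;

    for (i = 0; i < n; i++)
      parent[i] = i;
    for (i = 0; i < n; i++)
      for (j = i + 1; j < n; j++)
        if (dr_may_alias_p (drs[i], drs[j]))
          parent[find_rep (parent, i)] = find_rep (parent, j);
  }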
Re: How could I get alias set information from data_reference_p
Hi Richard,

On 7/16/09, Richard Guenther wrote:
> On Thu, Jul 16, 2009 at 1:15 AM, Tobias Grosser wrote:
>> On Wed, 2009-07-15 at 22:48 +0200, Richard Guenther wrote:
>>> On Wed, Jul 15, 2009 at 10:46 PM, Richard Guenther wrote:
>>>> On Wed, Jul 15, 2009 at 9:15 PM, Tobias Grosser wrote:
>>>>>> A note on Li's final graph algorithm. I don't understand why you
>>>>>> want to allow data references to be part of multiple alias sets.
>>>>>> (Of course I don't know how you are going to use the alias sets ...)
>>>>>
>>>>> Just to pass more information to Graphite. The easiest example
>>>>> might be something like
>>>>>
>>>>> A -- B -- C
>>>>>
>>>>> if we have
>>>>>
>>>>> AS1 = {A,B}
>>>>> AS2 = {B,C}
>>>>>
>>>>> we know that A and C do not alias and therefore do not have any
>>>>
>>>> No, from the above you _don't_ know that. How would you arrive
>>>> at that conclusion?
>>>
>>> What I want to say is that, if A -- B -- C is supposed to be the
>>> alias graph resulting from querying the alias oracle for the pairs
>>> (A, B), (A, C), (B, C), then this is a result that will never occur.
>>> Because if (A, B) is true and (B, C) is true, then (A, C) will be
>>> true as well.
>>
>> What about this case, for example:
>>
>> void foo (int *b) {
>>   int *a;
>>   int *c;
>>
>>   if (bar())
>>     a = b;
>>   else
>>     c = b;
>> }
>>
>> I thought this might give us the example above, but it seems I am
>> wrong. If the alias oracle is transitive, that would simplify the
>> algorithm a lot. Can we rely on the transitivity?
>
> Actually I was too fast (or rather it was too late); an example with
> A -- B -- C would be
>
> int a, c;
> void foo(int *p)
>
> with B == (*p). B may alias a and c, but a may not alias c.
>
> So, back to my first question then, which is still valid.
>
> Just to pass more information to Graphite. The easiest example might be
> something like
>
> A -- B -- C
>
> if we have
>
> AS1 = {A,B}
> AS2 = {B,C}
>
> we know that A and C do not alias and therefore do not have any
> dependencies.
>
> How do you arrive at 'A and C do not alias' from looking at
> the alias set numbers for AS1 and AS2? How do you still
> figure out that B aliases A and C just from looking at
> the alias set numbers? Or rather, what single alias set number
> does B get?

AS1 = {A,B}
AS2 = {B,C}

B does not necessarily have only a single alias set number. In this situation, B will have both alias numbers 1 and 2 (it is in both alias sets), A will have alias number 1, and C will have alias number 2. Since A and C got different alias set numbers, we can conclude that they do not alias. For A and B, or B and C, since B has both alias numbers 1 and 2, they may alias.

Li
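A small sketch of the membership test Li describes, using GCC's bitmap API, under the assumption that each data reference carries a bitmap of the alias-set numbers it belongs to (that per-reference field is hypothetical):

  /* Two references may alias iff their sets of alias-set numbers
     intersect.  With AS1 = {A,B} and AS2 = {B,C}: A carries {1},
     B carries {1,2}, C carries {2}; A and C do not intersect and
     so do not alias, while A/B and B/C do.  */
  static bool
  drs_may_alias_p (bitmap a_sets, bitmap b_sets)
  {
    return bitmap_intersect_p (a_sets, b_sets);
  }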
Re: How could I get alias set information from data_reference_p
Hi Daniel,

On Thu, Jul 16, 2009 at 11:45 PM, Daniel Berlin wrote:
> On Thu, Jul 16, 2009 at 5:00 AM, Li Feng wrote:
>> [... the AS1 = {A,B}, AS2 = {B,C} discussion quoted in full in the
>> previous message: B gets both alias numbers 1 and 2, A gets 1 and C
>> gets 2, so A and C do not alias while A/B and B/C may alias ...]
>
> So if I understand you right, it seems all you've done is invert the
> existing alias/points-to sets.
> I.e. instead of saying A has B, C, D in its alias set, you are saying
> B is in the alias set of A, C is in the alias set of A, D is in the
> alias set of A.
>
> Effectively,
>
> A -> {B, C, D}
> B -> {C, D, E}
>
> becomes
>
> B -> A
> C -> A, B
> D -> A, B
> E -> B

I'm not sure what you mean by B, C, D being in A's alias set. If that means B and C and D may alias A, without saying whether B, C, D alias each other or not, then I think we have misunderstood each other somewhere. If I understand you correctly, in your example, you mean
[GSoC] Status of Automatic parallelization in Graphite.
Hi all,

Working with the great Graphite developers, I have finished this summer's GSoC project, so I think it's time to summarize this summer's work and plan what comes next.

For details about autopar in Graphite, you can refer to this page: http://gcc.gnu.org/wiki/Graphite/Parallelization and this page: http://gcc.gnu.org/wiki/AutoparRelated

I blog about this project here: http://summergraphite.blogspot.com/

Here is a short summary.

* Current status:

The first step of autopar in Graphite has been done. You can trigger it with two flags: -floop-parallelize-all -ftree-parallelize-loops=4. Both of them are needed: the first flag triggers the Graphite pass to mark loops that can be parallel, and the second triggers the code generation part (see the usage sketch after this post).

Autopar in Graphite can now handle these loops:
1. Common loops without dependences (checking dependences under the polyhedral model).
2. Loops with if statements.
3. Some of the triangular loops.

* Limitations (to be enhanced)

1. Code generation only works for the innermost loop.
2. We cannot handle all the triangular loops.
3. Missing reduction support.

* Open projects

1. Heuristics for automatic parallelization.
2. Parallelizing non-innermost loops.
3. Handling reductions in Graphite.
4. Advanced automatic parallelization.

(For details about these open projects, you can refer to this page: http://gcc.gnu.org/wiki/Graphite/Parallelization )

* Plan next

This project is only the first step of my contribution to the GCC community, so I will continue working with the Graphite developers afterwards. It's really fun to work and discuss together, so if you are also interested in Graphite development, just leave us a post on our mailing list.

To the Graphite developers: it's really great to work with you guys, and thanks for your great help. Let's keep the good work going. :)

Cheers,
Li
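A minimal usage sketch for the two flags named above; the loop is deliberately dependence-free so Graphite can mark it parallel, and the file and function names are arbitrary:

  /* Compile with, e.g.:
       gcc -O2 -floop-parallelize-all -ftree-parallelize-loops=4 par.c
     -floop-parallelize-all lets Graphite mark the loop as parallel;
     -ftree-parallelize-loops=4 makes the parloops pass generate
     code for four threads.  */
  #define N 100000

  int a[N], b[N];

  void
  add_arrays (void)
  {
    int i;
    for (i = 0; i < N; i++)   /* no loop-carried dependence */
      a[i] = a[i] + b[i];
  }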
Avoiding unnecessary re-optimization in automatic parallelization.
Hi,

I'm considering how to avoid unnecessary re-optimization in automatic parallelization.

E.g., in some source code, one loop in the function fun() is considered can_be_parallel by the Graphite pass. The pass tree-parloop.c then calls the code generation part for this loop and generates a new function fun.loop_f0(). Note that fun.loop_f0 is a function with some OMP directives added around fun()'s loop, which has already been optimized by the passes before tree-parloop.c. To my observation, the newly created function fun.loop_f0 is then passed through the first passes again (is this statement correct?).

So my questions are:

1. Would it be necessary/correct to skip the re-optimization of the first few passes before tree-parloop.c for fun.loop_f0 and continue only with the optimization passes after it? In my opinion there must be compile-time savings if we do this.

2. How could I do this? I'm not familiar with the mechanism of how passes work; I'd appreciate it if someone could tell me more.

Thanks,
Li
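For readers unfamiliar with the outlining step, here is a rough C-level picture of what the generated fun.loop_f0 amounts to; the struct layout and names are illustrative only, not the actual GIMPLE that tree-parloop.c emits:

  /* Original loop in fun():
       for (i = 0; i < n; i++)
         a[i] = a[i] + 1;
     parloops conceptually outlines the body into a worker function,
     each thread receiving its own chunk of the iteration space.  */

  struct loop_data
  {
    int *a;
    int start, end;           /* this thread's chunk of [0, n) */
  };

  static void
  fun_loop_f0 (void *p)       /* illustrative stand-in for fun.loop_f0 */
  {
    struct loop_data *d = (struct loop_data *) p;
    int i;
    for (i = d->start; i < d->end; i++)
      d->a[i] = d->a[i] + 1;
  }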
Re: Avoiding unnecessary re-optimization in automatic parallelization.
Hi Joern,

On Sat, Oct 10, 2009 at 5:27 PM, Joern Rennecke wrote:
> Quoting Li Feng :
>>
>> So my question is,
>>
>> 1. Would it be necessary/correct to skip the re-optimization of the
>> first few passes before tree-parloop.c for fun.loop_f0 and continue
>> only with the optimization passes after it? In my opinion there must
>> be compile-time savings if we do this.
>
> Note that the process of parallelization adds new code, and can make
> pre-existing code sub-optimal - e.g. it transforms the loop into a
> normal form where there is only one induction variable.
>
> So, even if you have a homogeneous set of cores that you are targeting,
> it makes sense in principle to re-start optimizations from the
> beginning. Whether re-running each individual optimization pass makes
> sense will depend on what that pass exactly does and how that relates
> to the parallelized loop and the target.
>
> However, if parallelization is done for different target architectures,
> then re-running the optimizations becomes more important, since
> different parameters and heuristics can come into play.

I agree that re-running will bring different parameters and heuristics into play. But I'm not clear how re-running the first few passes would optimize for different target architectures for parallelization. Shouldn't such optimizations be done in the back end? I mean, the first few passes before autopar are in the middle end.

> Moreover, the set of optimization passes that run before parallelization
> is subject to change as GCC evolves.

Yes, I agree.

> Therefore, the most effective way to address the issue of running
> redundant optimization passes in this context is probably to put it in
> the wider context of the work to allow external plugins to influence
> the pass sequence that is being applied, and to control this with
> machine learning.

Thanks,
Li
Reducing a Fortran testcase with delta.
Hi,

I have noticed the wiki page A_guide_to_testcase_reduction, which explains how to reduce a C/C++ testcase with delta. That's a really good tool for this reduction job. For C/C++, the topformflat tool in the delta distribution puts syntactically related tokens on one line (because delta works at line granularity). Is there an equivalent for preprocessing Fortran code?

Thanks,
Li
Re: Reducing a Fortran testcase with delta.
Hi Joost,

On Fri, Oct 30, 2009 at 4:41 PM, VandeVondele Joost wrote:
> Hi Li,
>
> I've attached a 'Fortran-aware' delta. It tries to cut a Fortran file
> at more reasonable places (e.g. between subroutine boundaries, after
> ENDDOs). It works reasonably well, but it is a hack.

I think another way might be to do what topformflat does for C/C++ code: put the lines between do/enddo, or within a subroutine, onto one line, so delta cannot break them apart. But anyway, your idea is good too, and it is already implemented!

> Especially with Fortran 90 and modules, iterated delta runs can help a
> lot (i.e. the first run removes 'public/use' module statements, the
> next round cleans more efficiently). It also features 'randomized'
> bisection. That helps to reduce towards a minimized testcase when
> iterating delta runs.

This really helps; I'll try your hacked delta on Fortran code.

> I usually call it with the following script:
>
>> cat do_many
>
> for i in `seq 1 30`
> do
>   ~/delta-2006.08.03/delta -suffix=.f90 -test=delta.script -cp_minimal=small.f90 bug.f90
>   cp small.f90 small.f90.$i
>   cp small.f90 bug.f90
> done

Thanks,
Li
[GSoC] application: Automatic parallelization in Graphite
Hi all,

Below is the proposal for this GSoC project. I'd really like you to review and comment on it, so that I can plan the project better.

Thanks,
Li Feng

#title Automatic parallelization in Graphite

* Synopsis

With the integration of Graphite into GCC 4.4, a strong loop nest analysis and transformation engine was introduced. But currently it does only non-parallel loop generation. My work is to detect synchronization-free parallel loops and generate parallel code for them, which will mainly enable programs to run faster.

* The Project

In GCC there already exists an auto-parallelization pass, but it cannot handle some non-trivial loops. E.g. the following code cannot be handled by autopar, because it yields a scev_not_known dependence:

  int Z[100][100];

  int main(void)
  {
    int i, j;
    for (i = 0; i <= 10; i++)
      for (j = 0; j <= 10; j++)
        Z[i][j] = Z[j+10][i+11];
    return 0;
  }

I did some experimental work on what kinds of loops autopar can and cannot handle; you can find it here: http://gcc.gnu.org/wiki/AutoparRelated.

Graphite can analyze every loop and condition that polyhedral analysis can handle, even complicated loop nests. Under Graphite, we could detect synchronization-free parallel loops and generate parallel code for them.

During Google Summer of Code, I want to solve two issues. First of all, I will write test cases for the different kinds of loops that can be parallelized under Graphite. This work will be accompanied by discussions with the Graphite developers about data dependence analysis under the polyhedral model.

The first issue is parallel loop detection. This means that Graphite will recognize simple parallel loops using SCoP detection and data dependence analysis. SCoP detection is well supported by Graphite, and Konrad Trifunic is now working on data dependence analysis based on the polyhedral model, which should be done in about 3-4 weeks.

The place for parallel loop detection will be after CLooG generates the new loops: either on the CLAST (the CLooG AST, after calling CLooG on the polyhedra) or after clast_to_gimple. At this point the polyhedral information (poly_bb_p) is still available during code generation, so we can try to update the dependence information using the restrictions CLooG added, and use the polyhedral dependence analysis to check whether there is any dependence in the generated loops.

So the basic steps of parallel loop detection will be (a sketch of this walk follows the road-map below):

1. Get the poly_bb_p after CLooG.
2. Check whether there are dependences.
3. If no dependence exists, mark the loop as parallel.

We may add annotations with more detailed information here, e.g. shared/private objects.

The second issue is parallel code generation. For this part, I will try to integrate autopar's code generation into Graphite. This work will be done right after cloog_to_gimple. I will make sure that the loops Graphite creates can be handled by autopar's code generation. Currently the autopar pass in GCC lowers OMP_FOR and OMP_PARALLEL to libgomp functions. Additionally, maybe we could call autopar's reduction analysis: if the scalar analysis determines that the loop is still parallelizable, the code generation will be called.

* Success Criteria

1. Graphite can recognize and mark loops that can be parallelized.
2. Graphite will generate parallelized code for them.
3. Graphite passes the test cases for auto-parallelization.

* Road-map

1. Since the data dependence analysis is not yet available, I will first integrate autopar's code generation into Graphite.
   This work will be done step by step. (Mid June)
   - Introduce a flag -fgraphite-parallelize that forces auto-parallelization for all loops.
   - Make sure the loops Graphite creates can be handled by autopar's code generation.
   - Call autopar for every loop. (The loops that cannot be parallelized will just fail/crash.)
2. Write test cases for the loops that can be parallelized. This will take a few discussions with the Graphite developers to see which kinds of loops we should detect and can auto-parallelize. (End June)
3. Write code for parallel loop detection. This part of the job will be based on SCoP detection and data dependences, and by that time the data dependence analysis should be done. This part of the work will take most of the time. (First week of August)
4. Code cleanup and documentation. (Second week of August)

* Profit for GCC

- Once auto-parallelization has been done in Graphite, developers can devote most of their effort to loop transformations. Graphite will be in charge of the optimizations (generating parallelism), and the autopar code in Graphite will just detect parallel loops and generate code for them.
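A minimal sketch of the detection step (item 3 of the road-map), assuming the loop iteration macros from GCC's cfgloop.h; the dependence predicate has_any_dependence_p is a hypothetical stand-in for the polyhedral test being developed:

  /* Hypothetical detection walk: for each loop generated by CLooG,
     run the polyhedral dependence test and mark the loop parallel
     when no dependence is carried.  can_be_parallel is the flag
     autopar's code generation keys off.  */
  static void
  mark_parallel_loops (void)
  {
    loop_iterator li;
    struct loop *loop;

    FOR_EACH_LOOP (li, loop, 0)
      if (!has_any_dependence_p (loop))   /* hypothetical predicate */
        loop->can_be_parallel = true;
  }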
Re: [GSoC] application: Automatic parallelization in Graphite
Hi Tobi,

On Thu, Mar 26, 2009 at 3:17 PM, Tobias Grosser wrote:
> On Thu, 2009-03-26 at 10:36 +0800, Li Feng wrote:
>> Hi all,
>>
>> Below is the proposal for this GSoC project. I'd really like you to
>> review and comment on it, so that I can plan the project better.
>
> Hi Li,
>
> this looks nice. Thanks for your work on this.
>
> Tobias
>
>> #title Automatic parallelization in Graphite
>>
>> * Synopsis
>>
>> With the integration of Graphite into GCC 4.4, a strong loop nest
>> analysis and transformation engine was introduced. But currently it
>> does only non-parallel loop generation. My work is to detect
>> synchronization-free parallel loops and generate parallel code for
>> them, which will mainly enable programs to run faster.
>
> [..]
>
>> * Road-map
>>
>> 1. Since the data dependence analysis is not yet available, I will
>>    first integrate autopar's code generation into Graphite. This work
>>    will be done step by step. (Mid June)
>>    - Introduce a flag -fgraphite-parallelize that forces
>>      auto-parallelization for all loops.
>>    - Make sure the loops Graphite creates can be handled by autopar's
>>      code generation.
>>    - Call autopar for every loop. (The loops that cannot be
>>      parallelized will just fail/crash.)
>
> I think on this point we should discuss with Razya and the others where
> your work ends. Just adapting autopar to handle all Graphite loops is a
> project of its own, so I do not think it can be done by you in two
> weeks.
>
>> 2. Write test cases for the loops that can be parallelized. This will
>>    take a few discussions with the Graphite developers to see which
>>    kinds of loops we should detect and can auto-parallelize. (End June)
>> 3. Write code for parallel loop detection. This part of the job will
>>    be based on SCoP detection and data dependences, and by that time
>>    the data dependence analysis should be done. This part of the work
>>    will take most of the time. (First week of August)
>
> How much time this point takes depends on how exactly you want to solve
> it. I think a correct but conservative implementation might be written
> in a week. If you want to detect all loops it will take you more time.
>
>> 4. Code cleanup and documentation. (Second week of August)
>
> I think it is useful to define the limits of your work a little bit
> more strictly. For me there are two options:
>
> 1. You stay more on the autopar/gimple side (step 1) and adapt autopar
> for Graphite. This is very necessary for Graphite, but it might need
> some time to get up to speed on autopar.
>
> 2. You stay more on the graphite/polyhedral side. You work on all these
> steps, but we limit step 1 to the Graphite-internal parts. This means
> you connect autopar to Graphite and try to generate parallel loop
> nests. If autopar cannot handle a loop nest, you analyse it and write a
> bug report. Someone else will extend autopar.
>
> As Razya already has some knowledge about autopar and she is also
> interested in working on parallelization, maybe she is able to support
> you with the autopar stuff? So you might be able to actually focus more
> on the polyhedral part.

It's good that Razya does the autopar-related job; then I can focus mostly on the polyhedral part. So I think my work will be:
1. Detect loops that can be parallel (based on SCoP detection and the data dependence test).
2. Connect Graphite with the autopar part (by Razya).

And thanks for your comments :)

>> * Profit for GCC
>>
>> - Once auto-parallelization has been done in Graphite, developers can
>>   devote most of their effort to loop transformations. Graphite will
>>   be in charge of the optimizations (generating parallelism), and the
>>   autopar code in Graphite will just detect parallel loops and
>>   generate code for them.
>
> Tobias

Li Feng
[GSoC] [graphite] General plan for automatic parallelization in Graphite
Hi,

It's nice that the proposal 'Automatic parallelization in Graphite' was accepted, which means I will be working with the great Graphite developers this summer, trying to implement the project.

I have set up a blog for this project, which will mainly cover:
1. plans
2. what I have done
3. related Graphite internals

You can subscribe to it if you like: http://summergraphite.blogspot.com/

Here is a general plan for this project, to keep you in the loop; feel free to comment :)

1. Mark the innermost loop parallel. [done]
2. Try to schedule the autopar pass after Graphite, and enable code generation if flag_graphite_force_parallel is set (see the sketch after this post).
   - There should be some discussion with Razya about her plan for the autopar part.
   - But before that, I'll try to schedule autopar first.
3. Write testcases for the loops that should be parallel, from simple to hard, and check autopar's code generation part, to make sure it works correctly as we expect.
   - The testcases are important. There should be some detailed discussion, maybe with Sebastian and Konrad, to see which kinds of loops we can/decide to handle.
   - Check autopar's code generation on these testcases with flag_graphite_force_parallel set, and report bugs if it goes wrong.
4. Try to write code for deciding whether a loop can be parallel with the data dependence test under the polyhedral model.
   - Try to understand the interface of the data dependence test.
   - Write code: if the data dependence test succeeds, mark the loop parallel.

Cheers,
Li
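A minimal sketch of the gating in step 2, assuming the flag_graphite_force_parallel variable named above and the existing flag_tree_parallelize_loops; the gate name is hypothetical, illustrating the idea rather than code from the branch:

  /* Hypothetical gate: generate parallel code only when the user
     asked for Graphite-driven parallelization and for parloops
     code generation (more than one thread).  */
  static bool
  gate_graphite_parloops (void)
  {
    return flag_graphite_force_parallel != 0
           && flag_tree_parallelize_loops > 1;
  }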