Re: Changing distribution of an existing array

Zalewski, Marcin J Mon, 02 Apr 2018 16:52:52 -0700

Hi Brad.

Sorry about Outlook's indentation quoting below. Maybe one day Microsoft will 
decide to implement better quoting.


    Hi Marcin --
    
    > I am wondering if there is a way to change a mapping of a distributed 
    > array.
    
    Good question.  Feel free to direct questions like this that might be of 
    general interest to Chapel programmers to Stack Overflow if you use it 
    (tends to be a more searchable, permanent place to look for answers to 
    such questions).

Yes, that is a very good idea. I did wonder where to best ask this question, 
and you are right that Stack Overflow looks like the best option
    
    High-level answer: This isn't quite what you were asking, but there isn't 
    a way to change the domain of an array nor the domain map of a domain 
    (currently at least, nor a plan to support it in the future).  These 
    decisions are effectively part of the static type of the variable and so 
    would be challenging to change.  The rationale for this decision is that 
    they have a big impact on the code generated for an array.

That makes sense, and it would be actually the least favorable solution to my 
issue.
    
    Getting closer to what you're asking: a given domain map can potentially 
    be modified to affect the implementation of its doamins and their arrays, 
    including their distributions -- if the domain map author has provided a 
    means to do so.  Most of the rest of this mail will describe what is 
    possible within our current domain maps, but note that one could always 
    modify the domain maps further, or write a new domain map, to support 
    things that our domain maps don't happen to support currently.
    
Yes, definitely. That's what I thought from studying the domain map interface, 
but my Chapel-fu was not there yet to figure out which of the existing domains 
do what I want.

    Diving in a bit further, in the example you provide, the issue is that the 
    Block distribution doesn't (currently) have a way of redefining its 
    bounding box / distribution as you'd ideally like to:
    
    > var domC = {0..0} dmapped Block({0..0});
    >
    > ...
    >
    >  domC = {0..i}; // dmapped Block({0..i}); // needed, or halt on assignment
    
    So once the bounding box defining its distribution is set to {0..0}, 
    there's no way to update it.  The reason for this is partially due to 
    laziness on the part of the domain map author (in this case, probably me), 
    and partially fear of leading users to do something expensive without 
    being aware of it.  Specifically, if you think about the communication 
    required to naively change the bounding box from 0..0 to 0..1 to 0..2 to 
    0..n one element at a time across a large number of locales, there'd be 
    lots of little ripples of data across the locale set with each change.
    
    (That said, the Block distribution probably really _ought_ to define a 
    method that redefines the bounding box, regardless of the cost, where the 
    method's documentation could warn against using it too frequently for 
    performance reasons).
    
Yes, I thought of the cost myself, and I thought I would add in bulk and then 
rebalance, but your answer below makes this whole point moot.    

    > I understand why is this, but I am looking for a way to add values to a 
    > distributed array and then “rebalance” the distribution at some point. 
    > 
    > So there are 2 questions I have:
    >
    >
    >  1.  Is there a way to do this with arrays at all without somehow 
    > creating a whole new array?
    >
    >  2.  Is there any better approach for this?
    
    Answering both questions at once, my mind goes to using a different 
    distribution that's more amenable to load balancing as new indices are 
    added to it.  For example, rewriting your example to use the Cyclic 
    distribution:
    
       use CyclicDist;
    
       var domC = {0..0} dmapped Cyclic(startIdx=0);
    
    the output shows up like this:
    
       Initially, C is: 0
       C(0) assigned, C is: 0. domC is {0..0}
       C(1) assigned, C is: 0 1. domC is {0..1}
       C(2) assigned, C is: 0 1 0. domC is {0..2}
       C(3) assigned, C is: 0 1 0 1. domC is {0..3}
       C(4) assigned, C is: 0 1 0 1 0. domC is {0..4}
       C(5) assigned, C is: 0 1 0 1 0 1. domC is {0..5}
    
    So, by nature, Cyclic is better suited to distributing new indices across 
    the locales without any need to change the definition or type of the 
    domain map.
    
    If your data structure / algorithm is one that benefits from having 
    consecutive blocks of _logical_ indices mapped to the same locale, then 
    the BlockCyclic distribution might be a nice choice in that it would 
    combine some of the index locality of the Block distribution with the 
    ability to grow arbitrarily without redefining the domain map, as in the 
    Cyclic distribution.  But trying it within your program, I'm finding that 
    Block-Cylic isn't mature enough to support your program yet, for lame 
    reasons (of these three domain maps, it's the one that's seen the least 
    amount of development + optimization effort and apparently re-assigning a 
    domain dmapped with BlockCyclic() has never been implemented).  Feel free 
    to file a feature request against this if it it's of interest to you. 
    But let me also emphasize that _index locality_, not spatial locality, is 
    the main difference between Cyclic and Block-Cyclic.  Both will store the 
    local elements of a 1D array in consecutive memory locations, so they 
    should both have similar spatial locality if all you were doing was 
    iterating over all of the elements of an array (say).

That's exactly what I needed! Cyclic and BlockCyclic fit my need very well. I 
just went for Block, but now that I think about it, I should definitely started 
with Cyclic. I think I got thrown off by so many examples in the test directory 
using Block, and I thought that if Block implemented something so would Cyclic, 
if maybe at different cost. Your answer clarifies that this is not the case.
    
    If index locality doesn't matter to you at all, another option to explore 
    would be to use a distributed associative domain.  This results in even 
    less spatial locality, and the load balancing would only be as good as its 
    hash function.  Distributed associative domains are a feature that have 
    never made it onto master, but have been "almost done" forever, so this 
    would be another place that'd be fair to file a feature request if it'd be 
    useful/preferable to you than the 1D rectangular domains/arrays you're 
    currently using.

When you say it did not make it to master, do you mean that currently this 
feature is not publically available anywhere? I did have associative domains in 
mind, but I was going to go with something simpler to begin with. So 
definitely, associative domains are of interest, and I have distributed data 
structures in C++ that implement a graph data structure using associative 
domains.
    
    Circling back around: even if Block() were to support a way to dynamically 
    change its bounding box, I'm not convinced that it's what you'd want to 
    use for your case given that you're potentially adding vertices 
    incrementally, which isn't a cheap operation for a Block distribution. 
    Note that even if we implemented a way to change the bounding box for 
    Block(), we'd likely create a whole new array under the covers, assign 
    between the arrays and throw the original away -- so you'd get the 
    convenience of being able to change the bounding box, but wouldn't get 
    anything like in-place reallocation and minimal shifting of data between 
    locales without a lot of work (i.e., one _could_ implement this with 
    effort, I'm just not convinced the benefit would be worth the effort, so 
    wouldn't sign up for it myself).
    
Yes, exactly. You are correct; Block is not necessary. Cyclic is actually much 
better for some algorithms anyway, at least for naïve implementations. Also, 
with Cyclic I do not need to worry as much about having my graph balanced in 
the first place. For example it would be OK if lower indices had larger 
neighbor lists, while with Blocked I would need to make sure I got the 
distribution of degrees right, or I would be in trouble immediately.
    
    > With arrays with Block distribution, everything outside of the bounding 
    > box seems to go to the last locale.
    
    This isn't quite right.  It's actually that the n-dimensional bounding box 
    specifies a partitioning on the n-dimensional space and that any indices 
    that fall outside the bounding box are mapped to the locale which owns the 
    closest indices. In your case, since all of the indices you're adding are 
    above the bounding box, they're ending up being owned by the last locale.

Right.
    
    This is illustrated in slide 72 of the following presentation better than 
    I can say in words here:
    
    https://chapel-lang.org/presentations/ChapelForNWCPP2018-presented.pdf
    
Thank you for the reference.
    
    > I am not necessarily looking for the most efficient solution at this 
    > point but rather for something simple using the basic language 
    > mechanisms in Chapel.
    
    Hope this description helps!  Let us know if it raises additional 
    questions.

Yes, this was a great answer that gave me all the tools I need to go forward. 
Thank you again. (

-Marcin
    

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Chapel-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-users

Re: Changing distribution of an existing array

Reply via email to