Re: [Interest] [External]Re: How to get QtConcurrent to do what I want?

w...@warmi.net Mon, 31 Jan 2022 09:09:22 -0800

It does not make sense to me to have allocation/population of 60k instances of 
Tile + QPoint into a QVector taking 5 seconds. It took 5 ms on my computer…


Having said that , if allocation and assignment are your bottleneck , just use 
placement new operator with a large block of memory or just pure array of 60k 
structures + QConcurent::map with interators instead of a vector.
You will still need to initialize the array so if that’s the true bottleneck 
then I guess handle your own threading where you reinitialize pool of threads 
in a loop passing computed range without the initial storing step.

Sent from my iPhone

> On Jan 31, 2022, at 2:38 AM, Andrei Golubev <andrei.golu...@qt.io> wrote:
> 
> 
> Hello,
> 
> What Scott suggests should help a lot.
> 
> Using a slightly different approach, if you know how many tiles you need, say 
> 60k, isn't it enough to figure out the tile position already?
> Given an index 0 <= i < 60k, you should be able to map it to a 2-dimensional 
> coordinates if you know how many tiles you have horizontally and vertically 
> (actually, just horizontally is enough) - and you do, because the amount of 
> horizontal tiles = image.width() / tile.width().
> So, if I think about it right,  i / (n horizontal tiles) gives you row 
> number, i % (n horizontal tiles) gives you column number.
> 
> Now you can generate a 60k size vector of indices from 0 to 59999 (basically 
> std::iota()) and use that as an input for the tile calculation function 
> through QtConcurrent. Internally, you figure out the tile position in 2d 
> space and from that you get your QPoint + QSize for the square patch 
> (guessing, that i index would basically be a hand-written alternative to 
> blockIdx in CUDA and friends). This now allows you to calculate your tiles 
> vector in parallel as well (without actually using a vector).
> 
> On a side note, doesn't QtConcurrent provide a way to execute a function on a 
> range? Maybe you don't even need to supply a vector of indices, just a 
> half-open interval from 0 to 60k.
> Given that QtConcurrent should handle any sort of callable, you can also 
> supply it a lambda with capture list containing your hyper parameters like 
> tile.width(), tile.height(), and any other external context information.
> 
> --
> Best Regards,
> Andrei
> From: Interest <interest-boun...@qt-project.org> on behalf of Scott Bloom 
> <sc...@towel42.com>
> Sent: Monday, January 31, 2022 4:09 AM
> To: Murphy, Sean <sean.mur...@centauricorp.com>; Tony Rietwyk 
> <t...@rightsoft.com.au>; interest@qt-project.org <interest@qt-project.org>
> Subject: Re: [Interest] [External]Re: How to get QtConcurrent to do what I 
> want?
>  
> Couple things I would try.
>  
> First, preallocate the size of the vector, or use a list if you don’t need 
> random access into it. 
>  
> Second, just send, pos, size into the tile.  Only save the values in the 
> constrctor. When the worker thread kicks off on a tile, then initialize it 
> and do any computation.  This includes the allocation of the submatrix.
>  
> Also, does it need to be a shared pointer?  Its clear who owns the pointers, 
> they aren’t being “shared” as much as use.  For me, I only use shared 
> pointers, when multiple objects can “own” the pointer.  In this case, the 
> tile manager owns the tiles.
>  
> Mainly, do as much as possible to reduce the time in the nested loop.
> 
> Scott
>  
> From: Interest <interest-boun...@qt-project.org> On Behalf Of Murphy, Sean
> Sent: Sunday, January 30, 2022 4:59 PM
> To: Tony Rietwyk <t...@rightsoft.com.au>; interest@qt-project.org
> Subject: Re: [Interest] [External]Re: How to get QtConcurrent to do what I 
> want?
>  
> Thanks for the response, but I'm not following your suggestion - or at least 
> I'm not seeing how it's different than what I'm doing? Maybe a little 
> pseudocode will help. Here's what I'm currently doing:
>  
> Tile class:
> private:
>   QPoint mPos;
>   int mSize;
>  
> tile::tile(QPoint pos, int size) :
>     mPos(pos),
>     mSize(size)
> {
>   // assigns this tile an mSize x mSize square
>   // from the original image starting at mPos
>   // pixel location in the original image
> }
>  
> void tile::process()
> {
>   // does the work on the assigned subset
> }
>   
> TileManager:
> private:
>   QVector<QSharedPointer<tile>> mTiles;
>  
> processTile(QSharedPointer<tile>& t)
> {
>   t->process();
> }
>  
> tileManager::setup(QSize tileGrid, int tileSize)
> {
>   // generate each tile with its assignment
>   for(int i=0; i < tileGrid.height(); ++i)
>   {
>     for(int j=0; j < tileGrid.width(); ++j)
>     {
>       // create the new tile while assign its 
>       // region of the original image
>       QSharedPointer<tile> t(new tile(
>                    QPoint(j * tileSize, i * tileSize),
>                    tileSize));
>       mTiles.append(t);
>     }
>   }
>   QtConcurrent::map(mTiles, processTile);
> }
>  
> So I think I'm already doing what you're saying? Where I'm paying the penalty 
> is that the allocation of each tile is happening in one thread and I'd like 
> to see if I can thread out the object creation. But I don't see how to 
> simultaneously thread out the tile objection creation AND correctly assign 
> the tile its location since as far as I can tell, when QtConcurrent executes 
> tileManager's processTile function in parallel there's nothing I can poll 
> inside tileManager::processTile() that allows me to know WHICH step I'm at.
>  
> Or am I misunderstanding what you're saying? 
>  
> The best thing I can come up with is that maybe I could change the type of my 
> mTiles vector to be a QVector<QPoint>> but then I'd still need to loop 
> through nested for-loop to populate all the QPoint items in the vector I want 
> to pass to QtConcurrent::map(). I have tried that yet to see if generating 
> thousands of QPoint objects is faster than generating the same number of 
> tiles, but I can test that out.
>  
> Sean
> From: Interest <interest-boun...@qt-project.org> on behalf of Tony Rietwyk 
> <t...@rightsoft.com.au>
> Sent: Sunday, January 30, 2022 7:19 PM
> To: interest@qt-project.org <interest@qt-project.org>
> Subject: [External]Re: [Interest] How to get QtConcurrent to do what I want?
>  
> CAUTION: This email originated from outside of the organization. Do not click 
> links or open attachments unless you recognize the sender and know the 
> content is safe.
>  
> Hi Sean,
>  
> Can you use the position of the tile as a unique key?  Then the manager only 
> needs to calculate each tile's position in the original image.  Each tile 
> extracts the bits, processes and notifies the result with its position.
>  
> Regards, Tony
>  
>  
> On 31/01/2022 10:06 am, Murphy, Sean wrote:
> I'm hitting a design issue with the way I'm using the QtConcurrent module to 
> do some image processing, and I'm wondering if someone can give some pointers?
>  
> At a high level, the software needs to do some processing on every pixel of 
> an image. The processing can mostly be done in parallel, so I've created the 
> following:
> Tile class - responsible for doing the processing on a small subset of the 
> original image
> Has a constructor that takes a Position and Size. From those parameters, the 
> Tile knows what subset of the original image it is going to process
> Has a process() function which will do the work on those assigned pixels
> TileManager class - responsible for managing the Tile objects
> Contains a for-loop that creates each Tile object, assigns it a unique 
> Position, and adds it to the QVector<Tile> vector 
> Has a processTile(Tile& t) function which calls t.process() to tell a given 
> Tile to begin its work
> Calls QtConcurrent::map(tiles, processTile) to process each tile
> So far this works well, but as I was timing different parts of the codebase, 
> I discovered that a large portion of the time is spent allocating the 
> QVector<Tile> vector (step 2a above) before I get to the concurrent 
> processing call. The reason why is obvious to me - I need to ensure that each 
> tile is created with a unique assignment and as far as I can see, that need 
> to happen in a single thread? If I could instead pass off the Tile creation 
> to the parallel processing step, I might be able to improve the overall 
> performance, but I don't see a way around it within the QtConcurrent 
> framework.
>  
> How can I go about creating Tile objects in parallel AND ensure that each of 
> them gets a unique Position assignment? I could easily move the Tile 
> allocation into processTile(), but if I do that, I don't see a way make the 
> unique position assignment since I don't see how a given call to 
> processTile() would know where it is in the overall parallelization sequence 
> to determine what Position to assign to the Tile it creates. If I were using 
> something like CUDA, I could use things like blockIdx and threadIdx to do 
> that, but as far as I can see, those concepts don't exist (or at least aren't 
> exposed) in QtConcurrent.
>  
> Any thoughts?
> 
> 
> _______________________________________________
> Interest mailing list
> Interest@qt-project.org
> https://lists.qt-project.org/listinfo/interest
> _______________________________________________
> Interest mailing list
> Interest@qt-project.org
> https://lists.qt-project.org/listinfo/interest

_______________________________________________
Interest mailing list
Interest@qt-project.org
https://lists.qt-project.org/listinfo/interest

Re: [Interest] [External]Re: How to get QtConcurrent to do what I want?

Reply via email to