It does not make sense to me to have allocation/population of 60k instances of Tile + QPoint into a QVector taking 5 seconds. It took 5 ms on my computer…
Having said that , if allocation and assignment are your bottleneck , just use placement new operator with a large block of memory or just pure array of 60k structures + QConcurent::map with interators instead of a vector. You will still need to initialize the array so if that’s the true bottleneck then I guess handle your own threading where you reinitialize pool of threads in a loop passing computed range without the initial storing step. Sent from my iPhone > On Jan 31, 2022, at 2:38 AM, Andrei Golubev <andrei.golu...@qt.io> wrote: > > > Hello, > > What Scott suggests should help a lot. > > Using a slightly different approach, if you know how many tiles you need, say > 60k, isn't it enough to figure out the tile position already? > Given an index 0 <= i < 60k, you should be able to map it to a 2-dimensional > coordinates if you know how many tiles you have horizontally and vertically > (actually, just horizontally is enough) - and you do, because the amount of > horizontal tiles = image.width() / tile.width(). > So, if I think about it right, i / (n horizontal tiles) gives you row > number, i % (n horizontal tiles) gives you column number. > > Now you can generate a 60k size vector of indices from 0 to 59999 (basically > std::iota()) and use that as an input for the tile calculation function > through QtConcurrent. Internally, you figure out the tile position in 2d > space and from that you get your QPoint + QSize for the square patch > (guessing, that i index would basically be a hand-written alternative to > blockIdx in CUDA and friends). This now allows you to calculate your tiles > vector in parallel as well (without actually using a vector). > > On a side note, doesn't QtConcurrent provide a way to execute a function on a > range? Maybe you don't even need to supply a vector of indices, just a > half-open interval from 0 to 60k. > Given that QtConcurrent should handle any sort of callable, you can also > supply it a lambda with capture list containing your hyper parameters like > tile.width(), tile.height(), and any other external context information. > > -- > Best Regards, > Andrei > From: Interest <interest-boun...@qt-project.org> on behalf of Scott Bloom > <sc...@towel42.com> > Sent: Monday, January 31, 2022 4:09 AM > To: Murphy, Sean <sean.mur...@centauricorp.com>; Tony Rietwyk > <t...@rightsoft.com.au>; interest@qt-project.org <interest@qt-project.org> > Subject: Re: [Interest] [External]Re: How to get QtConcurrent to do what I > want? > > Couple things I would try. > > First, preallocate the size of the vector, or use a list if you don’t need > random access into it. > > Second, just send, pos, size into the tile. Only save the values in the > constrctor. When the worker thread kicks off on a tile, then initialize it > and do any computation. This includes the allocation of the submatrix. > > Also, does it need to be a shared pointer? Its clear who owns the pointers, > they aren’t being “shared” as much as use. For me, I only use shared > pointers, when multiple objects can “own” the pointer. In this case, the > tile manager owns the tiles. > > Mainly, do as much as possible to reduce the time in the nested loop. > > Scott > > From: Interest <interest-boun...@qt-project.org> On Behalf Of Murphy, Sean > Sent: Sunday, January 30, 2022 4:59 PM > To: Tony Rietwyk <t...@rightsoft.com.au>; interest@qt-project.org > Subject: Re: [Interest] [External]Re: How to get QtConcurrent to do what I > want? > > Thanks for the response, but I'm not following your suggestion - or at least > I'm not seeing how it's different than what I'm doing? Maybe a little > pseudocode will help. Here's what I'm currently doing: > > Tile class: > private: > QPoint mPos; > int mSize; > > tile::tile(QPoint pos, int size) : > mPos(pos), > mSize(size) > { > // assigns this tile an mSize x mSize square > // from the original image starting at mPos > // pixel location in the original image > } > > void tile::process() > { > // does the work on the assigned subset > } > > TileManager: > private: > QVector<QSharedPointer<tile>> mTiles; > > processTile(QSharedPointer<tile>& t) > { > t->process(); > } > > tileManager::setup(QSize tileGrid, int tileSize) > { > // generate each tile with its assignment > for(int i=0; i < tileGrid.height(); ++i) > { > for(int j=0; j < tileGrid.width(); ++j) > { > // create the new tile while assign its > // region of the original image > QSharedPointer<tile> t(new tile( > QPoint(j * tileSize, i * tileSize), > tileSize)); > mTiles.append(t); > } > } > QtConcurrent::map(mTiles, processTile); > } > > So I think I'm already doing what you're saying? Where I'm paying the penalty > is that the allocation of each tile is happening in one thread and I'd like > to see if I can thread out the object creation. But I don't see how to > simultaneously thread out the tile objection creation AND correctly assign > the tile its location since as far as I can tell, when QtConcurrent executes > tileManager's processTile function in parallel there's nothing I can poll > inside tileManager::processTile() that allows me to know WHICH step I'm at. > > Or am I misunderstanding what you're saying? > > The best thing I can come up with is that maybe I could change the type of my > mTiles vector to be a QVector<QPoint>> but then I'd still need to loop > through nested for-loop to populate all the QPoint items in the vector I want > to pass to QtConcurrent::map(). I have tried that yet to see if generating > thousands of QPoint objects is faster than generating the same number of > tiles, but I can test that out. > > Sean > From: Interest <interest-boun...@qt-project.org> on behalf of Tony Rietwyk > <t...@rightsoft.com.au> > Sent: Sunday, January 30, 2022 7:19 PM > To: interest@qt-project.org <interest@qt-project.org> > Subject: [External]Re: [Interest] How to get QtConcurrent to do what I want? > > CAUTION: This email originated from outside of the organization. Do not click > links or open attachments unless you recognize the sender and know the > content is safe. > > Hi Sean, > > Can you use the position of the tile as a unique key? Then the manager only > needs to calculate each tile's position in the original image. Each tile > extracts the bits, processes and notifies the result with its position. > > Regards, Tony > > > On 31/01/2022 10:06 am, Murphy, Sean wrote: > I'm hitting a design issue with the way I'm using the QtConcurrent module to > do some image processing, and I'm wondering if someone can give some pointers? > > At a high level, the software needs to do some processing on every pixel of > an image. The processing can mostly be done in parallel, so I've created the > following: > Tile class - responsible for doing the processing on a small subset of the > original image > Has a constructor that takes a Position and Size. From those parameters, the > Tile knows what subset of the original image it is going to process > Has a process() function which will do the work on those assigned pixels > TileManager class - responsible for managing the Tile objects > Contains a for-loop that creates each Tile object, assigns it a unique > Position, and adds it to the QVector<Tile> vector > Has a processTile(Tile& t) function which calls t.process() to tell a given > Tile to begin its work > Calls QtConcurrent::map(tiles, processTile) to process each tile > So far this works well, but as I was timing different parts of the codebase, > I discovered that a large portion of the time is spent allocating the > QVector<Tile> vector (step 2a above) before I get to the concurrent > processing call. The reason why is obvious to me - I need to ensure that each > tile is created with a unique assignment and as far as I can see, that need > to happen in a single thread? If I could instead pass off the Tile creation > to the parallel processing step, I might be able to improve the overall > performance, but I don't see a way around it within the QtConcurrent > framework. > > How can I go about creating Tile objects in parallel AND ensure that each of > them gets a unique Position assignment? I could easily move the Tile > allocation into processTile(), but if I do that, I don't see a way make the > unique position assignment since I don't see how a given call to > processTile() would know where it is in the overall parallelization sequence > to determine what Position to assign to the Tile it creates. If I were using > something like CUDA, I could use things like blockIdx and threadIdx to do > that, but as far as I can see, those concepts don't exist (or at least aren't > exposed) in QtConcurrent. > > Any thoughts? > > > _______________________________________________ > Interest mailing list > Interest@qt-project.org > https://lists.qt-project.org/listinfo/interest > _______________________________________________ > Interest mailing list > Interest@qt-project.org > https://lists.qt-project.org/listinfo/interest
_______________________________________________ Interest mailing list Interest@qt-project.org https://lists.qt-project.org/listinfo/interest