Igor, if there are any papers published on what you are doing with these
images, I would be very interested.
I went to the new London HPC and AI Meetup on Thursday; one of the talks,
by Odin Vision, was excellent.
And a plug: I recommend the new Meetup to anyone in the area. The next
meeting is on 21st August.
Converting the files to TFRecords or similar would be one obvious approach
if you are concerned about metadata. But then I'd understand why some
people would not want that (size, augmentation process). I assume you are
doing the training in a distributed fashion using MPI via Horovod or
similar?
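If it helps, here's a rough sketch of the kind of conversion I mean
(untested; the directory layout, shard name, and paths are placeholders,
and it assumes TensorFlow is installed). The point is to pack the millions
of tiny JPEGs into a few large TFRecord shards so training does big
sequential reads instead of an open/stat per image:

    import os
    import tensorflow as tf

    def to_example(path, label):
        # One image file becomes one tf.train.Example record
        with open(path, "rb") as f:
            data = f.read()
        return tf.train.Example(features=tf.train.Features(feature={
            "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[data])),
            "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
        }))

    # Placeholder layout: imagenet/train/<class>/<image>.JPEG
    root = "imagenet/train"
    with tf.io.TFRecordWriter("imagenet-00000.tfrecord") as writer:
        for label, cls in enumerate(sorted(os.listdir(root))):
            for name in os.listdir(os.path.join(root, cls)):
                writer.write(to_example(os.path.join(root, cls, name),
                                        label).SerializeToString())

In practice you'd shard the output (say, a few thousand images per file)
so the readers can still pull shards in parallel.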
Oh, and knowing what type of filesystem you're on would help with recommendations.
On Fri, Jun 28, 2019 at 1:51 PM Michael Di Domenico wrote:
I'm not familiar with the ImageNet set, but I'm surprised you'd see a
bottleneck. My understanding of the ML image sets is that they're
mostly read. Do you have things like noatime set on the filesystem?
Do you know specifically which ops are pounding the metadata?
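A quick (admittedly crude) userspace check of the atime behaviour, if you
can't see the mount options directly; the path is a placeholder, and note
that relatime also suppresses most atime updates, so this only tells you
whether reads touch the inode at all:

    import os, time

    path = "/path/to/imagenet/sample.JPEG"  # placeholder

    before = os.stat(path).st_atime
    time.sleep(1)
    with open(path, "rb") as f:
        f.read(4096)  # a small read is enough to trigger an atime update
    after = os.stat(path).st_atime

    print("atime updated on read" if after > before else
          "atime not updated (noatime/relatime likely in effect)")

Otherwise 'mount | grep <fs>' on a compute node will show the flags directly.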
On 6/28/19 1:47 PM, Mark Hahn wrote:
Hi all,
I wonder if anyone has comments on ways to avoid metadata bottlenecks
for certain kinds of small-I/O-intensive jobs. For instance, ML on ImageNet,
which seems to be a massive collection of trivial-sized files.
A good answer is "beef up your MD server, since it helps everyone".
That's a bi