I've not had any issues training with ImageNet in the past. We're using a ZFS box with a large L2ARC over 10GbE. If you are having problems, you might consider packing ImageNet into an HDF5 file? There may even be a pre-built one on Academic Torrents or somewhere similar. I suspect this would help quite a bit.
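In case it's useful, here's roughly what I mean. This is only a sketch using h5py: the paths, dataset names, and vlen-bytes layout are placeholders, not a recipe. Storing the raw JPEG bytes (and decoding at train time) keeps the file compact:

    import glob
    import os

    import h5py
    import numpy as np

    # Hypothetical layout: one subdirectory per class, JPEGs inside.
    src = "/data/imagenet/train"
    files = sorted(glob.glob(os.path.join(src, "*", "*.JPEG")))
    classes = sorted(os.listdir(src))
    cls_idx = {c: i for i, c in enumerate(classes)}

    with h5py.File("imagenet_train.h5", "w") as f:
        # Variable-length byte records: each entry is one JPEG, undecoded.
        dt = h5py.vlen_dtype(np.dtype("uint8"))
        images = f.create_dataset("images", (len(files),), dtype=dt)
        labels = f.create_dataset("labels", (len(files),), dtype="i8")
        for i, path in enumerate(files):
            with open(path, "rb") as img:
                images[i] = np.frombuffer(img.read(), dtype="uint8")
            labels[i] = cls_idx[os.path.basename(os.path.dirname(path))]

Reading it back during training is then a single open() for the whole dataset and indexed reads after that, e.g.

    import io

    import h5py
    from PIL import Image

    with h5py.File("imagenet_train.h5", "r") as f:
        raw = f["images"][0]  # one indexed read, no per-file metadata op
        img = Image.open(io.BytesIO(raw.tobytes()))

which should take the metadata server out of the hot path almost entirely.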
Interested to hear if you try this!

Thanks,
Aaron

Mark Hahn writes:
> Hi all,
> I wonder if anyone has comments on ways to avoid metadata bottlenecks
> for certain kinds of small-IO-intensive jobs. For instance, ML on ImageNet,
> which seems to be a massive collection of trivial-sized files.
>
> A good answer is "beef up your MD server, since it helps everyone".
> That's a bit naive, though (no money trees here).
>
> How about things like putting the dataset into squashfs or some other
> image that can be loop-mounted on demand? SQLite? Perhaps even a format
> that can simply be mmapped as a whole?
>
> Personally, I tend to dislike the approach of having a job stage tons of
> stuff onto node storage (when it exists), simply because that guarantees a
> waste of CPU/GPU/memory resources for however long the stage-in takes...
>
> thanks, mark hahn.

--
Aaron Jackson - M6PIU
Researcher at University of Nottingham
http://aaronsplace.co.uk/