Thanks for writing this up, Lily. I think standardizing how we handle external datasets is highly valuable.
To comment on some of the points Tianqi raised: I quite like that this approach is fundamentally free of any framework dependencies, since it allows users to wrap any dataset or dataloader they want. I agree with Tianqi that we should consider returning TVM `NDArray`s instead of NumPy arrays, as they are more tightly integrated with TVM. The point about zero-copy through DLPack is quite interesting and could be good follow-up work if we go with the `NDArray` standardization. I also like using `@property` for `batch_size` and `num_batches`, since it'll look a little cleaner.

In terms of naming, I think the proposed `DataLoader` describes the functionality better than `Dataset`, and I lean towards `tvm.utils.data` as the best namespace for this work.

That said, these are all pretty minor points; the work overall is great. Thanks Lily!

---

[Visit Topic](https://discuss.tvm.apache.org/t/dataloader-an-api-to-wrap-datasets-from-other-machine-learning-frameworks/9498/3) to respond.
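P.S. For what it's worth, here's a rough sketch of the `@property` style and the framework-agnostic wrapping I had in mind. All names here (`DataLoaderWrapper`, the `convert` hook) are just illustrative, not the RFC's actual API; in TVM the `convert` hook could be `tvm.nd.array` to produce `NDArray`s, or eventually a DLPack-based zero-copy conversion.

```python
class DataLoaderWrapper:
    """Illustrative sketch: wrap any iterable of batches behind a
    framework-agnostic interface (not the RFC's actual API)."""

    def __init__(self, batches, batch_size, convert=None):
        self._batches = list(batches)
        self._batch_size = batch_size
        # Pluggable conversion hook; in TVM this could be tvm.nd.array
        # so iteration yields NDArrays instead of NumPy arrays.
        self._convert = convert if convert is not None else (lambda x: x)

    @property
    def batch_size(self):
        return self._batch_size

    @property
    def num_batches(self):
        return len(self._batches)

    def __iter__(self):
        # Yield each batch, converted to the target array type.
        for batch in self._batches:
            yield self._convert(batch)


# Example usage with plain Python lists standing in for real batches:
loader = DataLoaderWrapper([[1, 2], [3, 4]], batch_size=2)
print(loader.batch_size, loader.num_batches)  # -> 2 2
print(list(loader))                           # -> [[1, 2], [3, 4]]
```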