Commenting to say I like the approach, and I strongly believe this will be
useful (e.g. for reducing the boilerplate involved in setting up datasets for
TVM training, since common datasets already exist in PyTorch or TF). I also
agree with Tianqi about the NDArray/DLPack interfacing, as we want to eliminate
any unnecessary data copying, especially in the training workflow.
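
For illustration, here's a minimal sketch of the zero-copy handoff I have in
mind, assuming PyTorch's `torch.utils.dlpack` helpers and `tvm.nd.from_dlpack`
are available (the exact entry points may vary across versions):

```python
import torch
import tvm
from torch.utils import dlpack

# A batch produced by an existing PyTorch DataLoader.
torch_batch = torch.randn(32, 3, 224, 224)

# Zero-copy handoff: the TVM NDArray shares memory with the PyTorch
# tensor through the DLPack protocol, so no data is copied when
# moving batches into a TVM training loop.
tvm_batch = tvm.nd.from_dlpack(dlpack.to_dlpack(torch_batch))
```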

Perhaps this is more specific to the PR, but I'm a bit wary of assuming a
specific input and target/label shape (e.g. NCHW and integer) for some of the
loaders, since this seems overfit to vision (how would we support a BERT
dataset, for example?). Is knowledge of the layout really required? I'm also
not sure about `__next__` returning a list of ndarrays, since when batching
inputs we want them in a single contiguous array of shape `(batch_size, ...)`
(see the sketch after this paragraph). Hope this makes sense, and I'd be happy
to formally review the PR once you've had time to incorporate the other
feedback!
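
To make the batching point concrete, here's a rough sketch of what I mean; the
`ContiguousLoader` name and its constructor are hypothetical, not part of the
PR:

```python
import itertools
import numpy as np
import tvm


class ContiguousLoader:
    """Hypothetical wrapper: iterate (data, label) numpy pairs from an
    existing dataset and yield one contiguous TVM batch at a time."""

    def __init__(self, samples, batch_size):
        self.samples = samples
        self.batch_size = batch_size

    def __iter__(self):
        self._it = iter(self.samples)
        return self

    def __next__(self):
        pairs = list(itertools.islice(self._it, self.batch_size))
        if not pairs:
            raise StopIteration
        # Stack samples into single contiguous (batch_size, ...) arrays
        # rather than handing back a list of per-sample ndarrays.
        data = np.ascontiguousarray(np.stack([d for d, _ in pairs]))
        labels = np.ascontiguousarray(np.stack([l for _, l in pairs]))
        return tvm.nd.array(data), tvm.nd.array(labels)
```

Something like this also stays layout-agnostic: everything beyond the batch
dimension is whatever shape the underlying dataset produces.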
