For NLP transformer models, we sometimes share parameters between layers to 
reduce the model size and runtime memory.
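
For illustration, a minimal ALBERT-style sketch in PyTorch (a made-up toy module, not our actual model), where a single `nn.TransformerEncoderLayer` is applied repeatedly so every depth reuses the same weights:

```python
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    """Toy ALBERT-style encoder: one layer's weights reused at every depth."""
    def __init__(self, d_model=256, nhead=4, num_layers=6):
        super().__init__()
        # A single TransformerEncoderLayer; its parameters are shared by
        # every "virtual" layer in the forward pass.
        self.shared_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.num_layers = num_layers

    def forward(self, x):
        for _ in range(self.num_layers):
            x = self.shared_layer(x)  # same parameters applied each time
        return x
```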

But after building the model with TVM, it somehow expands all the layers and 
creates duplicate variable nodes, which enlarges the model size a lot. 
(Depending on the number of layers that share parameters, it can be 6-12 times 
larger than the original torch model.)
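
One rough way to quantify the blow-up (assuming the toy `SharedEncoder` above and that the model converts cleanly through `relay.frontend.from_pytorch`):

```python
import torch
import tvm
from tvm import relay

model = SharedEncoder().eval()
example = torch.randn(1, 16, 256)
scripted = torch.jit.trace(model, example)

mod, params = relay.frontend.from_pytorch(scripted, [("input", list(example.shape))])

# Total bytes held by the torch model vs. the imported Relay params.
# torch deduplicates shared parameters; the Relay params dict does not.
torch_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
tvm_bytes = sum(v.asnumpy().nbytes for v in params.values())
print(f"torch: {torch_bytes / 1e6:.1f} MB   relay params: {tvm_bytes / 1e6:.1f} MB")
```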

Worse, at runtime it also allocates separate memory for the identical 
parameters. I tried changing the params format and the corresponding runtime 
to merge the duplicate parameters, like below:

![Screen Shot 2021-12-21 at 
20.46.09|472x500](upload://h7kB2WJ8C9ytCySXUtr63ZHsZ8h.png) 
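
To make the idea concrete independent of the screenshot, here is a hypothetical sketch of the merging step at the params-dict level (`merge_duplicate_params` is my own illustration, not the actual patch, which also has to teach the runtime to resolve the aliases):

```python
import hashlib

def merge_duplicate_params(params):
    """Hypothetical helper: split the params dict into unique tensors plus
    an alias table mapping duplicate names to their canonical entry."""
    unique, aliases, seen = {}, {}, {}
    for name, arr in params.items():
        np_arr = arr.asnumpy()
        key = (np_arr.shape, str(np_arr.dtype),
               hashlib.sha1(np_arr.tobytes()).hexdigest())
        if key in seen:
            aliases[name] = seen[key]      # byte-identical: record an alias
        else:
            seen[key] = name
            unique[name] = arr             # first occurrence: keep the tensor
    return unique, aliases
```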

My change reduced the model size a lot, but brought no gains on the memory 
side. I am considering diving into the frontend/backend to investigate where 
the duplication happens, but before that I want to hear from the community: is 
this possible, and are there any suggested ways to do it?
