For NLP transformer models, we sometimes share parameters between layers to reduce the model size and runtime memory.
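For concreteness, here is a minimal sketch of the kind of cross-layer sharing I mean (illustrative only, ALBERT-style reuse of a single encoder layer; the class name and sizes are just an example):

```python
import torch
import torch.nn as nn


class SharedLayerEncoder(nn.Module):
    """Applies one TransformerEncoderLayer N times, so all "layers" share weights."""

    def __init__(self, d_model=512, nhead=8, num_layers=12):
        super().__init__()
        # A single layer instance reused num_layers times -> parameters are shared.
        self.layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.num_layers = num_layers

    def forward(self, x):
        for _ in range(self.num_layers):
            x = self.layer(x)
        return x


# Tracing unrolls the loop into num_layers copies of the layer in the graph,
# but the state_dict still contains only one layer's worth of weights.
model = SharedLayerEncoder()
traced = torch.jit.trace(model, torch.randn(1, 16, 512))
```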
But after building the model with TVM, all the shared layers are expanded and duplicate variable nodes are created, which enlarges the model size a lot (depending on how many layers share parameters, it can be 6-12 times larger than the original torch model). The worst part is that at runtime separate memory is also allocated for these identical parameters.

I tried changing the params format and the corresponding runtime to merge the duplicate parameters, like below:

The above change reduced the model size a lot, but there is still no gain on the memory side. I am considering diving into the frontend/backend to investigate where the duplication happens, but before that I want to hear from the community: is this possible, and are there any suggested ways to do it?
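Roughly, the params-side merging I have in mind looks like the sketch below (a simplified illustration, not my actual diff; `dedup_params` and the byte-hash key are made-up names, and the params dict is assumed to come from `relay.frontend.from_pytorch`):

```python
import hashlib

import tvm


def dedup_params(params):
    """Collapse byte-identical parameters into one stored copy plus an alias map.

    params: dict of name -> tvm.nd.NDArray.
    Returns (unique_params, alias_map), where alias_map[name] is the canonical
    parameter name whose data `name` duplicates.
    """
    unique_params = {}
    alias_map = {}
    seen = {}  # content fingerprint -> canonical name
    for name, arr in params.items():
        data = arr.numpy()
        key = (str(data.dtype), data.shape,
               hashlib.sha1(data.tobytes()).hexdigest())
        if key in seen:
            # Duplicate content: record only the alias, store no extra copy.
            alias_map[name] = seen[key]
        else:
            seen[key] = name
            alias_map[name] = name
            unique_params[name] = tvm.nd.array(data)
    return unique_params, alias_map
```

This covers the on-disk size, but the runtime would also have to consult the alias map and bind the same NDArray for every alias instead of allocating a fresh buffer per parameter name, which is the part where I still see no memory savings.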