Bimlesh759-AI opened a new issue #52:
URL: https://github.com/apache/incubator-bluemarlin/issues/52
Training scenario:
The dataset statistics below include users with at least one click,
with step = 10.
test_dataset_count = 110755727,
train_dataset_count = 517801469,
user_count = 94315979,
item_count = 19
EPOCH = 250
train_batch_size = 20480
test_batch_size = 2048
The current model takes around 12 hours to train one epoch on the
full dataset. Even if we randomly sample about 50% of the data, one
epoch still takes around 7-8 hours. At this rate, training the model
for the full 250 epochs on the complete dataset would take around
125 days.
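The 125-day figure follows directly from the numbers reported above; a quick sanity check in pure Python, using only the values from this issue:

```python
# Figures reported in this issue.
hours_per_epoch = 12               # full dataset, one epoch
epochs = 250
train_dataset_count = 517_801_469
train_batch_size = 20_480

total_hours = hours_per_epoch * epochs
total_days = total_hours / 24       # 3000 hours -> 125 days

# Ceiling division: number of optimizer steps per epoch.
steps_per_epoch = -(-train_dataset_count // train_batch_size)

print(total_days)        # 125.0
print(steps_per_epoch)   # 25284
```

So each epoch is roughly 25k steps, which is why per-step cost dominates and any speedup has to come from throughput (multi-GPU, input pipeline) or from fewer epochs/steps.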
We are currently using TensorFlow 1.15. Two GPUs are available for
training, but only one is being used.
Targets for the model:
1. The model should not take more than 24 hours to train.
2. The model should be able to use all available GPUs.
3. Is it possible to further reduce the dataset size without losing
insights?
4. Is it possible to get the DIN-Lookalike model and trainer code in
a TensorFlow 2.0 version?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]