Bimlesh759-AI opened a new issue #52:
URL: https://github.com/apache/incubator-bluemarlin/issues/52
Training scenario:
The dataset statistics below include users with at least one click,
with step = 10.
test_dataset_count = 110755727,
train_dataset_count = 517801469,
user_count = 94315979,
item_count = 19
EPOCH = 250
train_batch_size = 20480
test_batch_size = 2048
The current model takes around 12 hours to train one epoch on the
full dataset. Even if we randomly sample about 50% of the data, one
epoch still takes around 7-8 hours. At this rate, training the model
for the full 250 epochs on the complete dataset would take around
125 days.
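The 125-day figure follows directly from the numbers reported above; a quick sanity check in pure Python, using only the values from this issue:

```python
# Figures reported in this issue.
hours_per_epoch = 12               # full dataset, one epoch
epochs = 250
train_dataset_count = 517_801_469
train_batch_size = 20_480

total_hours = hours_per_epoch * epochs
total_days = total_hours / 24       # 3000 hours -> 125 days

# Ceiling division: number of optimizer steps per epoch.
steps_per_epoch = -(-train_dataset_count // train_batch_size)

print(total_days)        # 125.0
print(steps_per_epoch)   # 25284
```

So each epoch is roughly 25k steps, which is why per-step cost dominates and any speedup has to come from throughput (multi-GPU, input pipeline) or from fewer epochs/steps.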
We are currently using TensorFlow 1.15. Two GPUs are available for
training, but only one is being used.
Targets for the model:
1. The model should not take more than 24 hours to train.
2. The model should be able to use all available GPUs.
3. Is it possible to further reduce the dataset size without losing
insights?
4. Is it possible to get the DIN-Lookalike model and trainer code in
a TensorFlow 2.0 version?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]