Thanks for the responses; I think they are valuable. I have embedded my opinions inline with yours and will leave the dispatch context discussion to @kevinthesun.
Also cc @tqchen and @icemelon9 for their inputs.

> > If we design this resume logic in a general way, we can also extend it to tophub.
>
> Does it make sense to generalise here? As far as I can tell, TopHub doesn't store tuning history, just optimal configs, so there's no way to 'resume' a TopHub tuning session. In some way we have to determine whether the existing 'tuning effort' to produce a particular config is sufficient, and the number of trials is the only obvious way I can think of characterising this. I'd be happy to look at any alternative implementation idea though.

I agree with you that TopHub serves a different purpose if we bring the trial number into the resume logic, but the two can still share the same implementation and history format in the way I suggested. My concern with using the trial number is that it restricts this RFC to the single use case of resuming interrupted tuning, and rules out others such as handing the tuning process over to someone else, or reusing the configs from a 2,000-trial random search to launch a new grid search.

Alternatively, we could decouple the history from any specific tuning process. Specifically, we would not add any process-specific information to the config library; instead, the tuner decides, whenever it is about to measure a config, whether it can reuse a result already in the library. For example, if a tuning process was interrupted at the 50th trial, we have 50 configs in the library. When resuming, the tuner still starts from scratch, but it can skip the measurement of those 50 configs as long as it follows the same tuning trajectory (see the config-reuse sketch at the end of this reply). One advantage is that this also works across different tuners, or even different models that share the same task. One drawback of my alternative compared to yours is that if the tuning process is non-deterministic (e.g., random search), we might spend time tuning different configs. I think this can be worked around either by exposing an optional random seed argument on the tuner (similar to `random_state` in `sklearn`), or by letting the user reduce the trial number when resuming.

> > > We can try to retrieve the target device info using system call and add it to every record when dumping to file/database.
>
> This would be a good start, but I think this needs to also be something a user can fully specify. For instance, we might be interested in driver versions, memory clock speeds or even physical parameters such as board cooling. Which system calls were you considering using to determine the platform? Perhaps have a default method that relies on these calls with the ability to pass additional arbitrary info to `config_library.start_job()`?

I actually have the same question. This part is still relatively vague and probably needs input from others (the metadata sketch at the end of this reply shows one possible shape for it).

> > In my personal opinion, we also need to invalidate the history/records when TVM has been updated
>
> I agree with this, but maybe it can be included as part of the previous point on board configuration? In a general sense we need an idea of whether a particular config is 'compatible' with our current platform and I think it's reasonable to include TVM version as a part of this.

Your response reminded me that the current config history format already includes a version field, although it is always 0.1. I am not sure whether we can make use of it and save some effort.
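To make the config-reuse alternative a bit more concrete, here is a minimal Python sketch. All names in it (`ConfigLibrary`, `lookup`, `save`, `measure_with_reuse`, and the simplified `tuner.update` call) are made up for illustration and are not existing AutoTVM APIs; the only point is that reuse happens at measurement time, so the library stays independent of any particular tuning process.

```python
# Hypothetical sketch: none of these names are part of the current AutoTVM API.

class ConfigLibrary:
    """Maps a (task, config) key to a previously measured result."""

    def __init__(self):
        self._records = {}  # {(task_name, config_index): measure_result}

    def lookup(self, task, config):
        return self._records.get((task.name, config.index))

    def save(self, task, config, result):
        self._records[(task.name, config.index)] = result


def measure_with_reuse(tuner, task, configs, library, measure_fn):
    """Measure a batch of configs, skipping any the library already holds.

    The tuner itself still starts from scratch; only the expensive hardware
    measurements are reused, so the same library also helps a different
    tuner (or a different model) that visits the same configs of this task.
    """
    results = []
    for config in configs:
        cached = library.lookup(task, config)
        if cached is not None:
            results.append(cached)           # reuse: skip hardware measurement
        else:
            result = measure_fn(config)      # normal measurement path
            library.save(task, config, result)
            results.append(result)
    # Feed results back so the tuner's search strategy is unchanged;
    # the real AutoTVM update() takes MeasureInput/MeasureResult pairs.
    tuner.update(configs, results)
    return results
```

For a random-search tuner the reuse rate depends on hitting the same configs again, which is where the optional seed (or a reduced trial count on resume) mentioned above would come in.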
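On the device-info and TVM-version questions, the following is only a rough metadata sketch of one possible shape, assuming a hypothetical `collect_platform_info` helper whose output would be stamped onto every dumped record; the `extra_info` argument stands in for the arbitrary user-supplied data that could flow through the proposed `config_library.start_job()`. It uses only standard-library calls plus `tvm.__version__`.

```python
import os
import platform


def collect_platform_info(extra_info=None):
    """Gather basic host facts to attach to every record when dumping."""
    info = {
        "os": platform.system(),        # e.g. "Linux"
        "os_release": platform.release(),
        "arch": platform.machine(),     # e.g. "x86_64", "aarch64"
        "cpu_count": os.cpu_count(),
    }
    try:
        import tvm
        info["tvm_version"] = tvm.__version__
    except ImportError:
        info["tvm_version"] = None
    # Things we cannot detect reliably (driver version, memory clock,
    # board cooling, ...) must come from the user and simply extend/override.
    if extra_info:
        info.update(extra_info)
    return info


def is_record_valid(record_info, current_info, keys=("arch", "tvm_version")):
    """Decide whether an old record is still 'compatible' with this setup.

    Which keys matter, and how strictly to compare them, is exactly the open
    question in this thread; equality on a few keys is only a starting point.
    """
    return all(record_info.get(k) == current_info.get(k) for k in keys)
```

Whether the existing version field in the config history (currently always 0.1) can double as the TVM version here, or should stay a separate format version, is the open point above.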