Hello Sahith,

> Not exactly. I was referring to the wrapper that we will be building around
> OpenAI gym. So I was wondering whether we will integrate that into the main
> mlpack repository or whether it'll be a completely separate project.
I think we should keep the wrapper outside the main repository; perhaps someone
finds it useful for another project. But it's not a completely separate
project: every RL method in mlpack already follows the OpenAI gym interface
(action, step), so the integration is straightforward.

Let me know if I should clarify anything.

Thanks,
Marcus

> On 13. Mar 2018, at 20:09, Sahith D <[email protected]> wrote:
>
> Hello Marcus,
>
> I see we could definitely introduce a metric that is related, e.g. one that
> counts the number of evaluations/iterations.
>
> Yes, that seems like a good metric, though like I said before it might be a
> bit redundant for some environments.
>
> I like both ideas, we should just make sure it is manageable; as you already
> pointed out, the advanced policy gradient method might take more time.
>
> I'll start making a more concrete approach to both of these methods.
>
> Are you talking about the additional RL method?
>
> Not exactly. I was referring to the wrapper that we will be building around
> OpenAI gym. So I was wondering whether we will integrate that into the main
> mlpack repository or whether it'll be a completely separate project.
>
> I will start adding stuff to my proposal and send it to you for your
> thoughts soon.
>
> Thanks,
> Sahith
>
>> On 11. Mar 2018, at 04:54, Sahith D <[email protected]> wrote:
>>
>> Hello Marcus,
>>
>> Apologies for the long delay in my reply. I had my midsem examinations
>> going on and was unable to respond.
>>
>> The time metric I had in mind was more related to how long the actual
>> in-game time is, which I think is independent of the system and is part of
>> the environment itself. However, I realized that most games already have a
>> score that factors in time, so this might seem redundant.
>>
>> In one of your previous mails you mentioned we should initially focus on
>> existing mlpack methods for the training.
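As a rough illustration of the (action, step) contract mentioned above, here is a minimal interaction loop in plain Python; the environment and all names are made up for illustration and are not the actual gym or mlpack classes:

```python
class CorridorEnv:
    """Toy environment exposing the gym-style API (reset/step).

    Purely illustrative: not an actual gym or mlpack class.
    """

    def __init__(self, length=5):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos  # initial observation

    def step(self, action):
        # action: 0 = move left, 1 = move right
        move = 1 if action == 1 else -1
        self.pos = max(0, min(self.length, self.pos + move))
        done = (self.pos == self.length)
        reward = 1.0 if done else 0.0
        return self.pos, reward, done, {}  # gym's (observation, reward, done, info)


def run_episode(env, policy, max_steps=100):
    """Generic agent/environment loop: any policy mapping
    observation -> action plugs in unchanged."""
    obs = env.reset()
    total = 0.0
    for _ in range(max_steps):
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
        if done:
            break
    return total


# An agent that always moves right reaches the goal.
total = run_episode(CorridorEnv(), lambda obs: 1)
```

Any method that can map an observation to an action plugs into such a loop unchanged, which is what makes the wrapper integration straightforward.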
>> The only mlpack RL method currently present is a Q-Learning model from
>> last year's GSoC, which includes policies and also experience replay.
>> While this is good for the basic environments in OpenAI gym, we should
>> implement at least one more method to supplement it.
>>
>> 1. Double DQN could be a good fit, as it just builds on top of the
>> current method and hence would be the best to pursue.
>> 2. An advanced policy gradient method would take more time, but could
>> also extend the number of environments that can be solved in the future.
>>
>> Also, regarding building an API, I would like to know whether you wanted
>> to focus on building on top of the methods already present in mlpack and
>> extend them as much as we can, or build something from scratch that uses
>> the mlpack methods whenever we need them.
>>
>> Thanks
>>
>> On Sat, Mar 3, 2018 at 5:39 PM Marcus Edel <[email protected]> wrote:
>> Hello Sahith,
>>
>> I'm not sure about the time metric; it might be meaningless if not run on
>> the same or a similar system. If we only compare our own methods, that
>> should be fine, though. The rest sounds reasonable to me.
>>
>> Best,
>> Marcus
>>
>>> On 2. Mar 2018, at 22:34, Sahith D <[email protected]> wrote:
>>>
>>> Hi Marcus,
>>>
>>> Making pre-trained models sounds good; however, we'll have to pick the
>>> most popular or easiest environments for this, at least in the start.
>>> For meaningful metrics other than iterations, we could use the score of
>>> the game, which is the best possible metric, and also the time it takes
>>> to reach that score. Depending on the environment, a low time or a large
>>> time could be better. The user-controlled parameters could also include
>>>
>>> 1. Exploration rate / exploration rate decay
>>> 2. Learning rate
>>> 3. Reward size
>>>
>>> Perhaps a few more, but these are essential.
>>>
>>> I like the idea of creating an API to upload results.
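To make the "builds on top" point concrete: Double DQN changes only the bootstrap target of Q-learning. Below is a tabular sketch in plain Python (illustrative only, not mlpack code), together with the exploration-rate decay listed among the tunable parameters:

```python
def q_target(q, r, s_next, gamma=0.99):
    # Standard Q-learning target: the same table both selects and
    # evaluates the greedy next action, which tends to overestimate values.
    return r + gamma * max(q[s_next])


def double_q_target(q_online, q_eval, r, s_next, gamma=0.99):
    # Double Q-learning: the online table picks the greedy action and a
    # second (target) table evaluates it; this decoupling is the only
    # change Double DQN adds on top of DQN.
    actions = range(len(q_online[s_next]))
    best = max(actions, key=lambda a: q_online[s_next][a])
    return r + gamma * q_eval[s_next][best]


def exploration_rate(step, start=1.0, end=0.05, decay=0.99):
    # Multiplicative epsilon decay, one of the user-tunable knobs above.
    return max(end, start * decay ** step)
```

With two tables that disagree about the best action's value, the double target is visibly smaller than the standard one, which is the overestimation fix in a nutshell.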
>>> We could include the metrics that we've talked about, and perhaps a bit
>>> more, like the recording that you mentioned, possibly one where they can
>>> watch the agent learn through each iteration and see it become better.
>>>
>>> Thanks,
>>> Sahith
>>>
>>> On Fri, Mar 2, 2018 at 6:11 PM Marcus Edel <[email protected]> wrote:
>>> Hello Sahith,
>>>
>>>> This looks very feasible along with being cool and intuitive. We could
>>>> implement a system where a user who is a beginner can just choose an
>>>> environment, pick a particular pre-built method, and compare different
>>>> methods through visualizations and the actual emulation of the game
>>>> environment. Other users can have more control and call only the
>>>> specific functions of the API which they need and can modify
>>>> everything; these people would be the ones who would most benefit from
>>>> having a leaderboard for comparison with other users on OpenAI gym.
>>>
>>> I think merging ideas from both sides is a neat idea; the first step
>>> should focus on the existing mlpack methods, provide pre-trained models
>>> for specific parameter sets, and output some metrics. Providing a
>>> recording of the environment is also a neat feature. Note the optimizer
>>> visualization allows a user to finely control the optimizer parameters,
>>> but only because the time to find a solution is low; in the case of RL
>>> methods we are talking about minutes or hours, so providing pre-trained
>>> models is essential. If you like the idea, we should think about some
>>> meaningful metrics besides the number of iterations.
>>>
>>> For other frameworks, one idea is to provide an API to upload the
>>> results; based on that information, we could generate the metrics.
>>>
>>> Let me know what you think.
>>>
>>> Thanks,
>>> Marcus
>>>
>>>> On 2. Mar 2018, at 13:08, Sahith D <[email protected]> wrote:
>>>>
>>>> Hi Marcus,
>>>>
>>>> This looks very feasible along with being cool and intuitive. We could
>>>> implement a system where a user who is a beginner can just choose an
>>>> environment, pick a particular pre-built method, and compare different
>>>> methods through visualizations and the actual emulation of the game
>>>> environment. Other users can have more control and call only the
>>>> specific functions of the API which they need and can modify
>>>> everything; these people would be the ones who would most benefit from
>>>> having a leaderboard for comparison with other users on OpenAI gym.
>>>>
>>>> Though I would like to know how in-depth you would want this to be.
>>>> The optimizer tutorial seems to have pretty much all the major
>>>> optimizers currently being used. Do you think we should try something
>>>> that's as extensive, or just set up a framework for future contributors?
>>>>
>>>> Thanks,
>>>> Sahith
>>>>
>>>> On Thu, Mar 1, 2018 at 3:35 PM Marcus Edel <[email protected]> wrote:
>>>> Hello Sahith,
>>>>
>>>> I like the idea; also, since OpenAI abandoned the leaderboard, this
>>>> could be a great opportunity. I'm a fan of giving a user the
>>>> opportunity to test the methods without much hassle, so one idea is to
>>>> provide an interface for the web that exposes a minimal set of
>>>> settings, something like:
>>>>
>>>> www.mlpack.org/docs/mlpack-git/doxygen/optimizertutorial.html
>>>>
>>>> Let me know what you think; there are a bunch of interesting features
>>>> that we could look into, but we should make sure each is tangible and
>>>> useful.
>>>>
>>>> Thanks,
>>>> Marcus
>>>>
>>>>> On 28. Feb 2018, at 23:03, Sahith D <[email protected]> wrote:
>>>>>
>>>>> A playground-type project sounds like a great idea.
>>>>> We could start by using the current Q-Learning method already present
>>>>> in the mlpack repository and then apply it to environments in gym as a
>>>>> sort of tutorial. We could then move on to more complex methods like
>>>>> Double Q-Learning and Monte Carlo Tree Search (just suggestions), just
>>>>> to get started, so that more people are encouraged to try their hand
>>>>> at solving the environments in more creative ways using C++, as the
>>>>> Python community is already pretty strong. If we could build something
>>>>> of a leaderboard similar to what OpenAI gym already has, then it could
>>>>> foster a creative community of people who want to try more RL. Does
>>>>> this sound good, or can it be improved upon?
>>>>>
>>>>> Thanks,
>>>>> Sahith.
>>>>>
>>>>> On Wed, Feb 28, 2018 at 3:50 PM Marcus Edel <[email protected]> wrote:
>>>>> Hello Sahith,
>>>>>
>>>>>> 1. We could implement all the fundamental RL algorithms like those
>>>>>> over here: https://github.com/dennybritz/reinforcement-learning .
>>>>>> This repository contains nearly all the algorithms that are useful
>>>>>> for RL according to David Silver's RL course. They're all currently
>>>>>> in Python, so it could just be a matter of porting them over to use
>>>>>> mlpack.
>>>>>
>>>>> I don't think implementing all the methods is something we should
>>>>> pursue over the summer; writing the method itself and coming up with
>>>>> some meaningful tests takes time. Also, in my opinion, instead of
>>>>> implementing all methods, we should pick methods that make sense in a
>>>>> specific context and make them as fast and easy to use as possible.
>>>>>
>>>>>> 2. We could implement fewer algorithms but work more on solving the
>>>>>> OpenAI gym environments using them.
>>>>>> This would require tighter integration of the gym wrapper that you
>>>>>> have already written. If enough environments can be solved, then this
>>>>>> could become a viable C++ library for comparing RL algorithms in the
>>>>>> future.
>>>>>
>>>>> I like the idea; this could be a great way to present the RL
>>>>> infrastructure to a wider audience, in the form of a playground.
>>>>>
>>>>> Let me know what you think.
>>>>>
>>>>> Thanks,
>>>>> Marcus
>>>>>
>>>>>> On 27. Feb 2018, at 23:01, Sahith D <[email protected]> wrote:
>>>>>>
>>>>>> Hi Marcus,
>>>>>>
>>>>>> Sorry for not updating you earlier, as I had some exams that I needed
>>>>>> to finish first. I've been working on the policy gradient in this
>>>>>> repository: https://github.com/SND96/mlpack-rl
>>>>>>
>>>>>> I also had some ideas on what this project could be about.
>>>>>>
>>>>>> 1. We could implement all the fundamental RL algorithms like those
>>>>>> over here: https://github.com/dennybritz/reinforcement-learning .
>>>>>> This repository contains nearly all the algorithms that are useful
>>>>>> for RL according to David Silver's RL course. They're all currently
>>>>>> in Python, so it could just be a matter of porting them over to use
>>>>>> mlpack.
>>>>>> 2. We could implement fewer algorithms but work more on solving the
>>>>>> OpenAI gym environments using them. This would require tighter
>>>>>> integration of the gym wrapper that you have already written. If
>>>>>> enough environments can be solved, then this could become a viable
>>>>>> C++ library for comparing RL algorithms in the future.
>>>>>>
>>>>>> Right now I'm working on solving one of the environments in gym using
>>>>>> a Deep Q-Learning approach similar to what is already in the mlpack
>>>>>> library from last year's GSoC.
>>>>>> It's taking a bit longer than I hoped, as I'm still familiarizing
>>>>>> myself with some of the server calls being made and how to properly
>>>>>> get information about the environments. I would appreciate your
>>>>>> thoughts on the ideas that I have and anything else that you had in
>>>>>> mind.
>>>>>>
>>>>>> Thanks!
>>>>>> Sahith
>>>>>>
>>>>>> On Fri, Feb 23, 2018 at 1:50 PM Sahith D <[email protected]> wrote:
>>>>>> Hi Marcus,
>>>>>>
>>>>>> I've been having difficulties compiling mlpack, which has stalled my
>>>>>> progress. I've opened an issue about it and would appreciate any
>>>>>> help.
>>>>>>
>>>>>> On Thu, Feb 22, 2018 at 10:09 AM Sahith D <[email protected]> wrote:
>>>>>> Hey Marcus,
>>>>>>
>>>>>> No problem with the slow response, as I was familiarizing myself
>>>>>> better with the codebase and the methods present in the meantime.
>>>>>> I'll start working on what you mentioned and will notify you when I
>>>>>> finish.
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> On Thu, Feb 22, 2018 at 4:56 AM Marcus Edel <[email protected]> wrote:
>>>>>> Hello Sahith,
>>>>>>
>>>>>> Thanks for getting in touch, and sorry for the slow response.
>>>>>>
>>>>>> > My name is Sahith. I've been working on Reinforcement Learning for
>>>>>> > the past year and am interested in coding with mlpack on the RL
>>>>>> > project for this summer. I've been going through the codebase and
>>>>>> > have managed to get the OpenAI gym API up and running on my
>>>>>> > computer. Is there any other specific task I can do while I get to
>>>>>> > know more of the codebase?
>>>>>>
>>>>>> Great that you got it all working. Another good entry point is to
>>>>>> write a simple RL method; one simple method that comes to mind is the
>>>>>> Policy Gradients method.
>>>>>> Another idea is to write an example for solving a gym environment
>>>>>> with the existing codebase, something in the vein of the Kaggle Digit
>>>>>> Recognizer Eugene wrote
>>>>>> (https://github.com/mlpack/models/tree/master/Kaggle/DigitRecognizer).
>>>>>>
>>>>>> Let me know if I should clarify anything.
>>>>>>
>>>>>> Thanks,
>>>>>> Marcus
>>>>>>
>>>>>> > On 19. Feb 2018, at 20:41, Sahith D <[email protected]> wrote:
>>>>>> >
>>>>>> > Hello Marcus,
>>>>>> >
>>>>>> > My name is Sahith. I've been working on Reinforcement Learning for
>>>>>> > the past year and am interested in coding with mlpack on the RL
>>>>>> > project for this summer. I've been going through the codebase and
>>>>>> > have managed to get the OpenAI gym API up and running on my
>>>>>> > computer. Is there any other specific task I can do while I get to
>>>>>> > know more of the codebase?
>>>>>> > Thanks!
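As a rough sketch of the Policy Gradients method suggested above as a starter exercise, here is REINFORCE (without a baseline) on a toy bandit in plain Python; all names are illustrative and this is not mlpack's API:

```python
import math
import random


def softmax(prefs):
    # Numerically stable softmax over action preferences.
    shift = max(prefs)
    exps = [math.exp(p - shift) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(arm_rewards, episodes=2000, lr=0.1, seed=0):
    """REINFORCE (no baseline) on a deterministic multi-armed bandit:
    nudge action preferences along reward * grad log pi(action)."""
    rng = random.Random(seed)
    prefs = [0.0] * len(arm_rewards)
    for _ in range(episodes):
        probs = softmax(prefs)
        action = rng.choices(range(len(prefs)), weights=probs)[0]
        reward = arm_rewards[action]
        for i in range(len(prefs)):
            # d/dpref_i of log pi(action) = 1[i == action] - probs[i]
            grad = (1.0 if i == action else 0.0) - probs[i]
            prefs[i] += lr * reward * grad
    return softmax(prefs)


# The policy learns to prefer the rewarding arm.
final_probs = reinforce_bandit([0.0, 1.0])
```

After training, the policy concentrates nearly all its probability on the rewarding arm; the same log-probability update carries over to sequential environments with a return in place of the immediate reward.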
_______________________________________________
mlpack mailing list
[email protected]
http://knife.lugatgt.org/cgi-bin/mailman/listinfo/mlpack
