Hello Sahith,

> Not exactly. I was referring to the wrapper that we will be building around
> OpenAI gym. So I was wondering whether we will integrate that into the main
> mlpack repository or whether it'll be a completely separate project.
I think we should keep the wrapper outside the main repository; perhaps someone
finds it useful for another project. But it's not a completely separate
project: every RL method in mlpack already follows the OpenAI gym interface
(action, step), so the integration is straightforward.

Let me know if I should clarify anything.

Thanks,
Marcus

> On 13. Mar 2018, at 20:09, Sahith D <[email protected]> wrote:
>
> Hello Marcus,
>
> I see we could definitely introduce a metric that is related, e.g. one that
> counts the number of evaluations/iterations.
>
> Yes, that seems like a good metric, though like I said before it might be a
> bit redundant for some environments.
>
> I like both ideas, we should just make sure it is manageable; as you already
> pointed out, the advanced policy gradient method might take more time.
>
> I'll start making a more concrete approach to both of these methods.
>
> Are you talking about the additional RL method?
>
> Not exactly. I was referring to the wrapper that we will be building around
> OpenAI gym. So I was wondering whether we will integrate that into the main
> mlpack repository or whether it'll be a completely separate project.
>
> I will start adding stuff to my proposal and send it to you for your
> thoughts soon.
>
> Thanks,
> Sahith
>
>> On 11. Mar 2018, at 04:54, Sahith D <[email protected]> wrote:
>>
>> Hello Marcus,
>>
>> Apologies for the long delay in my reply. I had my midsem examinations
>> going on and was unable to respond.
>>
>> The time metric I had in mind was more related to how long the actual
>> in-game time is, which I think is independent of the system and is part of
>> the environment itself. However, I realized that most games already have a
>> score that factors in time, so this might seem redundant.
>>
>> In one of your previous mails you mentioned we should initially focus on
>> existing mlpack methods for the training.
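As a rough illustration of the (action, step) contract mentioned above, here is a minimal interaction loop in plain Python; the environment and all names are made up for illustration and are not the actual gym or mlpack classes:

```python
class CorridorEnv:
    """Toy environment exposing the gym-style API (reset/step).

    Purely illustrative: not an actual gym or mlpack class.
    """

    def __init__(self, length=5):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos  # initial observation

    def step(self, action):
        # action: 0 = move left, 1 = move right
        move = 1 if action == 1 else -1
        self.pos = max(0, min(self.length, self.pos + move))
        done = (self.pos == self.length)
        reward = 1.0 if done else 0.0
        return self.pos, reward, done, {}  # gym's (observation, reward, done, info)


def run_episode(env, policy, max_steps=100):
    """Generic agent/environment loop: any policy mapping
    observation -> action plugs in unchanged."""
    obs = env.reset()
    total = 0.0
    for _ in range(max_steps):
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
        if done:
            break
    return total


# An agent that always moves right reaches the goal.
total = run_episode(CorridorEnv(), lambda obs: 1)
```

Any method that can map an observation to an action plugs into such a loop unchanged, which is what makes the wrapper integration straightforward.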
>> The only mlpack RL method currently present is a Q-Learning model from
>> last year's GSoC, which includes policies and also experience replay.
>> While this is good for the basic environments in OpenAI gym, we should
>> implement at least one more method to supplement it.
>>
>> 1. Double DQN could be a good fit, as it just builds on top of the
>> current method and hence would be the best to pursue.
>> 2. An advanced policy gradient method would take more time, but could
>> also extend the number of environments that can be solved in the future.
>>
>> Also, regarding building an API, I would like to know whether you wanted
>> to focus on building on top of the methods already present in mlpack and
>> extend them as much as we can, or build something from scratch that uses
>> the mlpack methods whenever we need them.
>>
>> Thanks
>>
>> On Sat, Mar 3, 2018 at 5:39 PM Marcus Edel <[email protected]> wrote:
>> Hello Sahith,
>>
>> I'm not sure about the time metric; it might be meaningless if not run on
>> the same or a similar system. If we only compare our own methods, that
>> should be fine, though. The rest sounds reasonable to me.
>>
>> Best,
>> Marcus
>>
>>> On 2. Mar 2018, at 22:34, Sahith D <[email protected]> wrote:
>>>
>>> Hi Marcus,
>>>
>>> Making pre-trained models sounds good; however, we'll have to pick the
>>> most popular or easiest environments for this, at least in the start.
>>> For meaningful metrics other than iterations, we could use the score of
>>> the game, which is the best possible metric, and also the time it takes
>>> to reach that score. Depending on the environment, a low time or a large
>>> time could be better. The user-controlled parameters could also include
>>>
>>> 1. Exploration rate / exploration rate decay
>>> 2. Learning rate
>>> 3. Reward size
>>>
>>> Perhaps a few more, but these are essential.
>>>
>>> I like the idea of creating an API to upload results.
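To make the "builds on top" point concrete: Double DQN changes only the bootstrap target of Q-learning. Below is a tabular sketch in plain Python (illustrative only, not mlpack code), together with the exploration-rate decay listed among the tunable parameters:

```python
def q_target(q, r, s_next, gamma=0.99):
    # Standard Q-learning target: the same table both selects and
    # evaluates the greedy next action, which tends to overestimate values.
    return r + gamma * max(q[s_next])


def double_q_target(q_online, q_eval, r, s_next, gamma=0.99):
    # Double Q-learning: the online table picks the greedy action and a
    # second (target) table evaluates it; this decoupling is the only
    # change Double DQN adds on top of DQN.
    actions = range(len(q_online[s_next]))
    best = max(actions, key=lambda a: q_online[s_next][a])
    return r + gamma * q_eval[s_next][best]


def exploration_rate(step, start=1.0, end=0.05, decay=0.99):
    # Multiplicative epsilon decay, one of the user-tunable knobs above.
    return max(end, start * decay ** step)
```

With two tables that disagree about the best action's value, the double target is visibly smaller than the standard one, which is the overestimation fix in a nutshell.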
>>> We could include the metrics that we've talked about, and perhaps a bit
>>> more, like the recording that you mentioned, possibly one where they can
>>> watch the agent learn through each iteration and see it become better.
>>>
>>> Thanks,
>>> Sahith
>>>
>>> On Fri, Mar 2, 2018 at 6:11 PM Marcus Edel <[email protected]> wrote:
>>> Hello Sahith,
>>>
>>>> This looks very feasible along with being cool and intuitive. We could
>>>> implement a system where a user who is a beginner can just choose an
>>>> environment, pick a particular pre-built method, and compare different
>>>> methods through visualizations and the actual emulation of the game
>>>> environment. Other users can have more control and call only the
>>>> specific functions of the API which they need and can modify
>>>> everything; these people would be the ones who would most benefit from
>>>> having a leaderboard for comparison with other users on OpenAI gym.
>>>
>>> I think merging ideas from both sides is a neat idea; the first step
>>> should focus on the existing mlpack methods, provide pre-trained models
>>> for specific parameter sets, and output some metrics. Providing a
>>> recording of the environment is also a neat feature. Note the optimizer
>>> visualization allows a user to finely control the optimizer parameters,
>>> but only because the time to find a solution is low; in the case of RL
>>> methods we are talking about minutes or hours, so providing pre-trained
>>> models is essential. If you like the idea, we should think about some
>>> meaningful metrics besides the number of iterations.
>>>
>>> For other frameworks, one idea is to provide an API to upload the
>>> results; based on that information, we could generate the metrics.
>>>
>>> Let me know what you think.
>>>
>>> Thanks,
>>> Marcus
>>>
>>>> On 2. Mar 2018, at 13:08, Sahith D <[email protected]> wrote:
>>>>
>>>> Hi Marcus,
>>>>
>>>> This looks very feasible along with being cool and intuitive. We could
>>>> implement a system where a user who is a beginner can just choose an
>>>> environment, pick a particular pre-built method, and compare different
>>>> methods through visualizations and the actual emulation of the game
>>>> environment. Other users can have more control and call only the
>>>> specific functions of the API which they need and can modify
>>>> everything; these people would be the ones who would most benefit from
>>>> having a leaderboard for comparison with other users on OpenAI gym.
>>>>
>>>> Though I would like to know how in-depth you would want this to be.
>>>> The optimizer tutorial seems to have pretty much all the major
>>>> optimizers currently being used. Do you think we should try something
>>>> that's as extensive, or just set up a framework for future contributors?
>>>>
>>>> Thanks,
>>>> Sahith
>>>>
>>>> On Thu, Mar 1, 2018 at 3:35 PM Marcus Edel <[email protected]> wrote:
>>>> Hello Sahith,
>>>>
>>>> I like the idea; also, since OpenAI abandoned the leaderboard, this
>>>> could be a great opportunity. I'm a fan of giving a user the
>>>> opportunity to test the methods without much hassle, so one idea is to
>>>> provide an interface for the web that exposes a minimal set of
>>>> settings, something like:
>>>>
>>>> www.mlpack.org/docs/mlpack-git/doxygen/optimizertutorial.html
>>>>
>>>> Let me know what you think; there are a bunch of interesting features
>>>> that we could look into, but we should make sure each is tangible and
>>>> useful.
>>>>
>>>> Thanks,
>>>> Marcus
>>>>
>>>>> On 28. Feb 2018, at 23:03, Sahith D <[email protected]> wrote:
>>>>>
>>>>> A playground-type project sounds like a great idea.
>>>>> We could start by using the current Q-Learning method already present
>>>>> in the mlpack repository and then apply it to environments in gym as a
>>>>> sort of tutorial. We could then move on to more complex methods like
>>>>> Double Q-Learning and Monte Carlo Tree Search (just suggestions), just
>>>>> to get started, so that more people are encouraged to try their hand
>>>>> at solving the environments in more creative ways using C++, as the
>>>>> Python community is already pretty strong. If we could build something
>>>>> of a leaderboard similar to what OpenAI gym already has, then it could
>>>>> foster a creative community of people who want to try more RL. Does
>>>>> this sound good, or can it be improved upon?
>>>>>
>>>>> Thanks,
>>>>> Sahith.
>>>>>
>>>>> On Wed, Feb 28, 2018 at 3:50 PM Marcus Edel <[email protected]> wrote:
>>>>> Hello Sahith,
>>>>>
>>>>>> 1. We could implement all the fundamental RL algorithms like those
>>>>>> over here: https://github.com/dennybritz/reinforcement-learning .
>>>>>> This repository contains nearly all the algorithms that are useful
>>>>>> for RL according to David Silver's RL course. They're all currently
>>>>>> in Python, so it could just be a matter of porting them over to use
>>>>>> mlpack.
>>>>>
>>>>> I don't think implementing all the methods is something we should
>>>>> pursue over the summer; writing the method itself and coming up with
>>>>> some meaningful tests takes time. Also, in my opinion, instead of
>>>>> implementing all methods, we should pick methods that make sense in a
>>>>> specific context and make them as fast and easy to use as possible.
>>>>>
>>>>>> 2. We could implement fewer algorithms but work more on solving the
>>>>>> OpenAI gym environments using them.
>>>>>> This would require tighter integration of the gym wrapper that you
>>>>>> have already written. If enough environments can be solved, then this
>>>>>> could become a viable C++ library for comparing RL algorithms in the
>>>>>> future.
>>>>>
>>>>> I like the idea; this could be a great way to present the RL
>>>>> infrastructure to a wider audience, in the form of a playground.
>>>>>
>>>>> Let me know what you think.
>>>>>
>>>>> Thanks,
>>>>> Marcus
>>>>>
>>>>>> On 27. Feb 2018, at 23:01, Sahith D <[email protected]> wrote:
>>>>>>
>>>>>> Hi Marcus,
>>>>>>
>>>>>> Sorry for not updating you earlier, as I had some exams that I needed
>>>>>> to finish first. I've been working on the policy gradient in this
>>>>>> repository: https://github.com/SND96/mlpack-rl
>>>>>>
>>>>>> I also had some ideas on what this project could be about.
>>>>>>
>>>>>> 1. We could implement all the fundamental RL algorithms like those
>>>>>> over here: https://github.com/dennybritz/reinforcement-learning .
>>>>>> This repository contains nearly all the algorithms that are useful
>>>>>> for RL according to David Silver's RL course. They're all currently
>>>>>> in Python, so it could just be a matter of porting them over to use
>>>>>> mlpack.
>>>>>> 2. We could implement fewer algorithms but work more on solving the
>>>>>> OpenAI gym environments using them. This would require tighter
>>>>>> integration of the gym wrapper that you have already written. If
>>>>>> enough environments can be solved, then this could become a viable
>>>>>> C++ library for comparing RL algorithms in the future.
>>>>>>
>>>>>> Right now I'm working on solving one of the environments in gym using
>>>>>> a Deep Q-Learning approach similar to what is already in the mlpack
>>>>>> library from last year's GSoC.
>>>>>> It's taking a bit longer than I hoped, as I'm still familiarizing
>>>>>> myself with some of the server calls being made and how to properly
>>>>>> get information about the environments. I would appreciate your
>>>>>> thoughts on the ideas that I have and anything else that you had in
>>>>>> mind.
>>>>>>
>>>>>> Thanks!
>>>>>> Sahith
>>>>>>
>>>>>> On Fri, Feb 23, 2018 at 1:50 PM Sahith D <[email protected]> wrote:
>>>>>> Hi Marcus,
>>>>>>
>>>>>> I've been having difficulties compiling mlpack, which has stalled my
>>>>>> progress. I've opened an issue about it and would appreciate any
>>>>>> help.
>>>>>>
>>>>>> On Thu, Feb 22, 2018 at 10:09 AM Sahith D <[email protected]> wrote:
>>>>>> Hey Marcus,
>>>>>>
>>>>>> No problem with the slow response, as I was familiarizing myself
>>>>>> better with the codebase and the methods present in the meantime.
>>>>>> I'll start working on what you mentioned and will notify you when I
>>>>>> finish.
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> On Thu, Feb 22, 2018 at 4:56 AM Marcus Edel <[email protected]> wrote:
>>>>>> Hello Sahith,
>>>>>>
>>>>>> Thanks for getting in touch, and sorry for the slow response.
>>>>>>
>>>>>> > My name is Sahith. I've been working on Reinforcement Learning for
>>>>>> > the past year and am interested in coding with mlpack on the RL
>>>>>> > project for this summer. I've been going through the codebase and
>>>>>> > have managed to get the OpenAI gym API up and running on my
>>>>>> > computer. Is there any other specific task I can do while I get to
>>>>>> > know more of the codebase?
>>>>>>
>>>>>> Great that you got it all working. Another good entry point is to
>>>>>> write a simple RL method; one simple method that comes to mind is the
>>>>>> Policy Gradients method.
>>>>>> Another idea is to write an example for solving a gym environment
>>>>>> with the existing codebase, something in the vein of the Kaggle Digit
>>>>>> Recognizer Eugene wrote
>>>>>> (https://github.com/mlpack/models/tree/master/Kaggle/DigitRecognizer).
>>>>>>
>>>>>> Let me know if I should clarify anything.
>>>>>>
>>>>>> Thanks,
>>>>>> Marcus
>>>>>>
>>>>>> > On 19. Feb 2018, at 20:41, Sahith D <[email protected]> wrote:
>>>>>> >
>>>>>> > Hello Marcus,
>>>>>> >
>>>>>> > My name is Sahith. I've been working on Reinforcement Learning for
>>>>>> > the past year and am interested in coding with mlpack on the RL
>>>>>> > project for this summer. I've been going through the codebase and
>>>>>> > have managed to get the OpenAI gym API up and running on my
>>>>>> > computer. Is there any other specific task I can do while I get to
>>>>>> > know more of the codebase?
>>>>>> > Thanks!
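As a rough sketch of the Policy Gradients method suggested above as a starter exercise, here is REINFORCE (without a baseline) on a toy bandit in plain Python; all names are illustrative and this is not mlpack's API:

```python
import math
import random


def softmax(prefs):
    # Numerically stable softmax over action preferences.
    shift = max(prefs)
    exps = [math.exp(p - shift) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(arm_rewards, episodes=2000, lr=0.1, seed=0):
    """REINFORCE (no baseline) on a deterministic multi-armed bandit:
    nudge action preferences along reward * grad log pi(action)."""
    rng = random.Random(seed)
    prefs = [0.0] * len(arm_rewards)
    for _ in range(episodes):
        probs = softmax(prefs)
        action = rng.choices(range(len(prefs)), weights=probs)[0]
        reward = arm_rewards[action]
        for i in range(len(prefs)):
            # d/dpref_i of log pi(action) = 1[i == action] - probs[i]
            grad = (1.0 if i == action else 0.0) - probs[i]
            prefs[i] += lr * reward * grad
    return softmax(prefs)


# The policy learns to prefer the rewarding arm.
final_probs = reinforce_bandit([0.0, 1.0])
```

After training, the policy concentrates nearly all its probability on the rewarding arm; the same log-probability update carries over to sequential environments with a return in place of the immediate reward.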
_______________________________________________
mlpack mailing list
[email protected]
http://knife.lugatgt.org/cgi-bin/mailman/listinfo/mlpack
