Proximal Policy Optimization

date

Mar 16th 2018

category

machine-learning

short description

Analyzed and improved the Proximal Policy Optimization (PPO) algorithm.

Goals

  • Compare the algorithm's performance with that of other baseline techniques in an OpenAI Gym game environment
  • Explore performance based on input-data preprocessing, different neural-network architectures, and CPU vs. GPU training
  • Modify different hyperparameters to analyze their impact on the overall performance of the algorithm
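The hyperparameter most characteristic of PPO is the clip range of its surrogate objective. As a reference point for the goals above, here is a minimal NumPy sketch of the standard clipped loss (an illustrative function, not code from the repository):

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """PPO clipped surrogate loss (to be minimized).

    ratio:     pi_new(a|s) / pi_old(a|s) for each sampled action
    advantage: estimated advantage of each action
    eps:       clip range, one of the hyperparameters varied above
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Take the pessimistic (element-wise minimum) bound, negate for a loss
    return -np.mean(np.minimum(unclipped, clipped))
```

Shrinking `eps` constrains how far each update can move the policy, which is exactly the trade-off the hyperparameter experiments probe.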

link to repository

Implementation

  • The model is developed using TensorFlow, and input data is collected from OpenAI Gym's MsPacman environment.

  • Performance of different neural network architectures is explored (figure: "CNN vs LSTM - Reward function").

  • GPU-based training was done using Google Colaboratory.

  • Reference: OpenAI Gym Baselines
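The input-preprocessing step mentioned above typically converts raw Atari RGB frames into small grayscale images before they reach the network. A dependency-free sketch of such a step (a hypothetical helper, not the repository's exact pipeline):

```python
import numpy as np

def preprocess_frame(frame, out_h=84, out_w=84):
    """Illustrative Atari preprocessing: grayscale + naive downsample.

    `frame` is an (H, W, 3) uint8 RGB observation, such as the
    210x160x3 frames MsPacman emits; output is (out_h, out_w)
    float32 scaled to [0, 1].
    """
    # Luminance-weighted grayscale conversion
    gray = frame.astype(np.float32) @ np.array([0.299, 0.587, 0.114],
                                               dtype=np.float32)
    h, w = gray.shape
    # Nearest-neighbour resize via index sampling (no external deps)
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return gray[np.ix_(rows, cols)] / 255.0
```

In practice a library resizer (e.g. OpenCV) would replace the nearest-neighbour sampling, but the shape and scaling conventions are the same.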

Output

Different models based on the modified hyperparameters, trained on CPU and GPU. Performance-comparison plots (rewards & loss function).