Determining Player Skill in the Game of Go with Deep Neural Networks

Josef Moudřík (1), Roman Neruda (2)
(1) Charles University in Prague, Faculty of Mathematics and Physics
(2) Institute of Computer Science, Academy of Sciences of the Czech Republic
TPNC 2016
DOI: /

Presentation Outline
Introduction: Go, Computer Go, Deep Learning; Motivation; Dataset; Augmentation & Downsampling; Model Architecture; Experiments; Conclusions

Introduction: Game of Go
One of the oldest games. Two players, perfect information, deterministic rules. Board of 19 × 19 intersections. Goal: control the board by enclosing territory and capturing enemy stones.

Introduction: Computer Go
Go AI is hard: high branching factor and no clear evaluation function. Recently mastered by Google's AlphaGo, which combines Monte Carlo Tree Search with deep learning [Silver et al., 2016].

Introduction: Deep Learning
Differentiable neural network models with a large number of parameters; "deep" because the error is back-propagated through many layers. Convolutional neural networks: a hierarchical model based on learning convolutional kernels, well suited to data with spatial structure, e.g. images, sound spectrograms, or Go boards. They learn increasingly abstract hierarchical representations.

Introduction: Motivation
The strength of Go players is measured by a rating: a numerical quantity assigned to each player and updated after each game using win/loss information. Ratings are used, e.g., to pair opponents of similar strength. A rating converges slowly for new players, causing problems such as badly matched opponents and rating deflation.
Can we use more information from each game than the single win/loss bit? Maybe the game record itself?
Our Work: use deep learning to predict a player's strength from a board position, aiming to improve the convergence of rating systems.

Dataset
188,700 games from the Online Go Server (OGS), which makes for 3,426,489 pairs (X, y), where:
y ∈ {strong, intermediate, beginner} is one of 3 classes based on strength;
X encodes the position and the last 4 moves as a volume of size 13 × 19 × 19: 4 planes of liberties of the current player, 4 planes of liberties of the opponent, 1 plane for empty intersections, and 4 planes marking the last 4 moves.

Augmentation & Downsampling
Techniques to reduce over-fitting and improve generalization:
Sub-sampling: on average, every 5th position is taken from each game (chosen uniformly at random).
Augmentation: during training, each sample is randomly transformed into one of its 8 symmetries.
Equalization: the y classes are equally represented in the training set (superfluous examples are thrown away).

Model Architecture
Input layer; 1 convolutional layer with 512 filters of size 5 × 5; 3 convolutional layers with 128 filters of size 3 × 3; 2 fully connected layers of 128 neurons; output layer: 3-way softmax. All layers except the final one use ReLU activation. Trained with mini-batch SGD with Nesterov momentum. (Image adapted from [Silver et al., 2016].)
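The Augmentation & Downsampling steps can be sketched in plain Python. This is a minimal, hypothetical illustration, not the authors' pipeline. It makes three assumptions:
- A plane is a square list of rows, and a sample is a list of such planes (4 + 4 + 1 + 4 = 13 in the encoding described in the Dataset slide).
- "Every 5th position on average" means keeping each position independently with probability 1/5.
- All function names are made up for this example.

```python
# Hypothetical helpers illustrating the data preparation; not the authors' code.
import random
from collections import defaultdict


def rotate90(plane):
    """Rotate a square plane (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*plane[::-1])]


def apply_symmetry(plane, k):
    """Apply symmetry number k (0..7) of the square: k // 2 clockwise
    quarter-turns, followed by a left-right mirror when k is odd."""
    for _ in range(k // 2):
        plane = rotate90(plane)
    if k % 2:
        plane = [row[::-1] for row in plane]
    return plane


def augment(planes, rng=random):
    """Pick one of the 8 symmetries at random and apply it to every plane
    of the sample, so that all input planes stay aligned."""
    k = rng.randrange(8)
    return [apply_symmetry(p, k) for p in planes]


def subsample(positions, rate=5, rng=random):
    """Keep each position independently with probability 1/rate, i.e. on
    average every rate-th position of a game (assumed reading of the slide)."""
    return [p for p in positions if rng.random() < 1.0 / rate]


def equalize(samples, rng=random):
    """Downsample (X, y) pairs so that every class y is equally frequent,
    discarding the surplus examples of the larger classes."""
    by_class = defaultdict(list)
    for x, y in samples:
        by_class[y].append((x, y))
    n = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, n))
    rng.shuffle(balanced)
    return balanced
```

For k = 0..7, apply_symmetry maps the 2 × 2 plane [[1, 2], [3, 4]] to eight distinct arrangements. Because augment draws a single k per sample, every plane of that sample is rotated and mirrored together. Without this, the liberty planes and move planes would no longer describe the same position.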
Experiments and Results: Single Position
Baseline case: accuracy 71.5%.
[Figures: training loss evolution (training-data loss vs. training steps); confusion matrix.]

Experiments and Results: Single Position Accuracy
[Figure: Dependency of accuracy and sample size on move number (x-axis: move number; y-axes: accuracy and sample size at the given move, in %).]

Experiments and Results: Aggregation Summary
Table: Summary of results. A = Augmentation (ensemble of the 8 symmetries), C = Cropped (first 30 moves skipped), W = Weighted (proportionally to the average accuracy for the given move).

Model                              Acc.    Acc. (Top-2)
Single Position                    71.5%   94.6%
Single Position (A)                72.5%   94.9%
Aggregated per Game, mode (A)      76.8%   N/A
Aggregated per Game, sum (A)       77.1%   96.4%
Aggregated per Game, sum (A, C)    77.7%   96.7%
Aggregated per Game, sum (A, W)    77.9%   96.8%

Conclusions
We have used deep learning to predict a player's strength from a single game position (i.e., from little information). The method extends to whole games by aggregating the individual predictions. It works well for 3 target classes; more data would be needed to move towards accurate regression. It will (hopefully) soon be deployed experimentally on the Online Go Server.

References
Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., et al. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):
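The per-game aggregation variants in the results table (mode vs. sum of the single-position predictions, plus the Cropped and Weighted options) can be sketched as follows. This is a hedged illustration, not the authors' implementation. It assumes:
- The network has already produced a 3-way softmax vector for each sampled position.
- The class order in CLASSES and all function names are hypothetical.
- The caller supplies the weights for the Weighted variant, e.g. the average single-position accuracy per move number.

The augmentation ensemble (A) would additionally average each position's prediction over its 8 symmetries before aggregation; that step is not shown.

```python
# Hypothetical sketch of per-game aggregation; assumes per-position
# softmax outputs are already available.
from collections import Counter

# Hypothetical index order of the 3 softmax outputs.
CLASSES = ("strong", "intermediate", "beginner")


def argmax(vector):
    """Index of the largest entry (first one on ties)."""
    return max(range(len(vector)), key=vector.__getitem__)


def aggregate_mode(probs):
    """'mode': majority vote over the per-position predicted classes."""
    votes = Counter(argmax(p) for p in probs)
    return votes.most_common(1)[0][0]


def aggregate_sum(probs, move_numbers=None, skip_first=0, weights=None):
    """'sum': add up per-position class probabilities and return the class
    indices ranked from most to least likely (ranking[:2] gives top-2).

    move_numbers: move number of each position (defaults to 1, 2, 3, ...)
    skip_first:   'Cropped' variant, ignore positions with move <= skip_first
    weights:      'Weighted' variant, dict move number -> weight; moves
                  missing from the dict get weight 1.0
    """
    if move_numbers is None:
        move_numbers = range(1, len(probs) + 1)
    totals = [0.0] * len(probs[0])
    for p, move in zip(probs, move_numbers):
        if move <= skip_first:
            continue  # Cropped: skip the opening moves
        w = 1.0 if weights is None else weights.get(move, 1.0)
        for c, pc in enumerate(p):
            totals[c] += w * pc
    return sorted(range(len(totals)), key=lambda c: -totals[c])


# Toy game: two lukewarm "strong" predictions, one confident "intermediate".
game = [
    [0.50, 0.45, 0.05],
    [0.50, 0.45, 0.05],
    [0.00, 1.00, 0.00],
]
print(CLASSES[aggregate_mode(game)])    # strong
print(CLASSES[aggregate_sum(game)[0]])  # intermediate
```

The toy game shows how the two rules can disagree. Voting counts only each position's top class, so the two uncertain positions win. Summing keeps each position's confidence, so the single confident position dominates.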