This blog post is about my implementation of the classic game Rock-Paper-Scissors, with a twist.
Link to Code.
Abstract
Rock–paper–scissors (also known as scissors-paper-rock or other variants) is a hand game usually played between two people, in which each player simultaneously forms one of three shapes with an outstretched hand. These shapes are "rock" (a closed fist), "paper" (a flat hand), and "scissors" (a fist with the index finger and middle finger extended, forming a V). "Scissors" is identical to the two-fingered V sign (aka "victory" or "peace sign") except that it is pointed horizontally instead of being held upright in the air. A simultaneous, zero-sum game, it has only two possible outcomes: a draw, or a win for one player and a loss for the other.
Rules
The rules are pretty simple; you could even memorize them 😋 (a small game-logic sketch follows the list)
Scissors cuts Paper
Paper covers Rock
Rock crushes Lizard
Lizard poisons Spock
Spock smashes Scissors
Scissors decapitates Lizard
Lizard eats Paper
Paper disproves Spock
Spock vaporizes Rock
(and as it always has) Rock crushes Scissors
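To make the win conditions concrete, here is a minimal sketch of this logic in Python; the BEATS table and winner helper are illustrative names of my own, not code from the repository.

```python
# Minimal sketch of the Rock-Paper-Scissors-Lizard-Spock win conditions.
# Each gesture maps to the set of gestures it defeats.
BEATS = {
    "rock":     {"lizard", "scissors"},
    "paper":    {"rock", "spock"},
    "scissors": {"paper", "lizard"},
    "lizard":   {"spock", "paper"},
    "spock":    {"scissors", "rock"},
}

def winner(player, cpu):
    """Return 'player', 'cpu', or 'draw' for one round."""
    if player == cpu:
        return "draw"
    return "player" if cpu in BEATS[player] else "cpu"

# Example: scissors decapitates lizard
print(winner("scissors", "lizard"))  # -> player
```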
Creating the dataset
Although the dataset is available in my GitHub repository, I would encourage you to create it yourself. Let's look at the dataset in detail and then I will explain the code for capturing it.
Here's what you could do:
Download the dataset from my GitHub repository. The page would look like this.
Or create your own dataset
Notes from this code gist
Total pictures per gesture - 1200 (line 15)
Images are stored in '/gestures' folder (line 19)
Size of images would be 50 x 50 - image_x, image_y = 50, 50 (line 5)
Filtering out only skin color from the entire image - cv2.inRange(hsv, np.array([2, 50, 60]), np.array([25, 150, 255])) (line 29)
From lines 29 - 40, we develop our filters based on skin color and extract the contours (a minimal sketch of this capture loop follows these notes).
Selecting region of interest - x, y, w, h = 300, 50, 350, 350 (line 17)
You will see 2 windows on the screen: 'frame' for the normal image and 'thresh', which will contain your filtered frame (lines 60, 61).
Press 'c' on the keyboard to start capturing the images (line 63).
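For reference, here is a minimal sketch of what such a capture loop looks like. The full CreateGest.py in the repository is the authoritative version; the blur step, file naming, and window handling here are simplified assumptions of mine.

```python
# Minimal sketch of the gesture-capture loop; values follow the notes above.
import os
import cv2
import numpy as np

image_x, image_y = 50, 50            # saved image size (line 5)
x, y, w, h = 300, 50, 350, 350       # region of interest (line 17)
total_pics = 1200                    # pictures per gesture (line 15)

gesture = input("Enter gesture name: ")
folder = os.path.join("gestures", gesture)
os.makedirs(folder, exist_ok=True)

cap = cv2.VideoCapture(0)
capturing, count = False, 0

while count < total_pics:
    ret, frame = cap.read()
    if not ret:
        break
    frame = cv2.flip(frame, 1)
    roi = frame[y:y + h, x:x + w]

    # keep only skin-colored pixels (line 29)
    hsv = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, np.array([2, 50, 60]), np.array([25, 150, 255]))
    thresh = cv2.medianBlur(mask, 15)

    if capturing:
        img = cv2.resize(thresh, (image_x, image_y))
        cv2.imwrite(os.path.join(folder, f"{count}.jpg"), img)
        count += 1

    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("frame", frame)       # normal image (line 60)
    cv2.imshow("thresh", thresh)     # filtered frame (line 61)

    key = cv2.waitKey(1) & 0xFF
    if key == ord('c'):              # press 'c' to start capturing (line 63)
        capturing = True
    elif key == ord('q'):            # quit early
        break

cap.release()
cv2.destroyAllWindows()
```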
Procedure
Run the CreateGest.py file. Enter the gesture name and you will get 2 frames displayed. Look at the contour frame and adjust your hand to make sure you capture its features. Press 'c' to start capturing the images; it will take 1200 images of one gesture. Try moving your hand a little within the frame so that your model doesn't overfit at training time.
Setting the environment
Now that we have images in the directory, we can do multiple things with them: load them into a pickle file, create a CSV, or use TensorFlow's image generator to read the images directly from the folder. I got this Python script for converting the images from the directory into CSV files - CreateCSV.py
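Since CreateCSV.py is not reproduced here, this is a minimal sketch of what such a conversion script does: walk the gestures folder, flatten each 50 x 50 grayscale image into 2500 pixel values, and append a numeric label per row. The output file name gestures.csv is just an illustrative choice.

```python
# Minimal sketch: one CSV row per image = 2500 pixel values + a label.
import os
import csv
import cv2

with open("gestures.csv", "w", newline="") as f:
    writer = csv.writer(f)
    # each subfolder of 'gestures' holds one gesture; folder index = label
    for label, gesture in enumerate(sorted(os.listdir("gestures"))):
        folder = os.path.join("gestures", gesture)
        for name in os.listdir(folder):
            img = cv2.imread(os.path.join(folder, name), cv2.IMREAD_GRAYSCALE)
            if img is None:          # skip non-image files
                continue
            writer.writerow(list(img.flatten()) + [label])
```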
Loading the data
After running the script, we have our CSV file ready as well. Note (a small loading sketch follows these points):
5 gestures
1200 images per gesture
Size of image 50 x 50 (2500 pixels)
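Here is a minimal loading sketch, assuming the CSV layout described above (2500 pixel columns followed by one label column); the 80/20 split and the scikit-learn usage are my own choices for illustration, not necessarily what the original code does.

```python
# Minimal sketch: load the CSV, reshape to 50x50x1, one-hot the labels.
import pandas as pd
from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import to_categorical

data = pd.read_csv("gestures.csv", header=None).values
X = data[:, :-1].reshape(-1, 50, 50, 1).astype("float32") / 255.0
y = to_categorical(data[:, -1])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
print(X_train.shape, y_train.shape)   # sanity check on shapes
```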
Training
Defining the model
Model: The above gist defines the model that we'll use (a rough sketch of the same architecture follows this list).
Layers: There are two convolution-max pool layers in the model, followed by fully connected layers and dropout.
Dropouts: They are really important since you don't want to overfit on your training set. Dropout forces your model to learn from all the neurons and not just some of them, which helps it generalize to new data.
Output: The output layer has softmax activation, which maps the output to one of the 6 labels that we defined previously (Rock - Paper - Scissors - Lizard - Spock and 1 arbitrary label).
Optimizer: I am using the Adam optimizer.
Epochs: 2
Batch size: 64
Callbacks: I am using TensorBoard for visualization of my model.
Save & load: I am saving the model ('RPS.h5') after each batch so that I can make predictions on the trained model later.
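Since the gist itself is not reproduced here, below is a minimal Keras sketch of the architecture described above: two convolution + max-pool blocks, fully connected layers with dropout, and a 6-way softmax, trained with Adam for 2 epochs at batch size 64, with a TensorBoard callback and the model saved as 'RPS.h5'. The exact filter counts, kernel sizes, and dense units are my assumptions, not the values from the original gist; it reuses X_train, y_train, X_test, y_test from the loading sketch above.

```python
# Minimal sketch of the described CNN; hyperparameters are assumptions.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from tensorflow.keras.callbacks import TensorBoard

model = Sequential([
    Conv2D(32, (3, 3), activation="relu", input_shape=(50, 50, 1)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation="relu"),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(128, activation="relu"),
    Dropout(0.5),                       # guards against overfitting
    Dense(6, activation="softmax"),     # 5 gestures + 1 arbitrary label
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

model.fit(X_train, y_train,
          epochs=2, batch_size=64,
          validation_data=(X_test, y_test),
          callbacks=[TensorBoard(log_dir="logs")])

model.save("RPS.h5")
```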
Visualization
Metrics
At the end of training, the accuracy is around 98% on the training set and the loss is around 0.04. I think this could be achieved only because we are using filters and removing all the background noise. The test set has similar accuracy (around 95%). I think our model has performed extremely well, given that there are only 5 gesture labels.
Building the application
Now we have everything ready. Let's build the application that uses the webcam to let you 'play' the game.
This is the application that I wrote.
Webcam: I am using the webcam to stream a live feed. Notice that I am filtering out only skin color from the entire image - cv2.inRange(hsv, np.array([2, 50, 60]), np.array([25, 150, 255])).
Filters: I filter out only the skin color from the image using these 2 NumPy arrays (HSV lower and upper bounds) and several filters and masks. At the end of it, I use the findContours function to detect my hand.
Gesture detection: Based on the contours, I generate an image which is sent for prediction at run-time. I assign a random emoji to the CPU and, based on my predicted gesture and the CPU's random gesture, I declare the winner for that round.
Prediction: The gesture images are then sent to the model for prediction (a minimal sketch of this play loop follows).
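To tie it together, here is a minimal sketch of such a play loop, not the full application from the gist: it reuses the skin-color filter from the capture script and the BEATS/winner helpers from the Rules sketch, and the label order and drawing details are illustrative assumptions.

```python
# Minimal sketch of the play loop; LABELS order is an assumption.
import random
import cv2
import numpy as np
from tensorflow.keras.models import load_model

LABELS = ["rock", "paper", "scissors", "lizard", "spock", "none"]  # assumed order
model = load_model("RPS.h5")
x, y, w, h = 300, 50, 350, 350

cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()
    if not ret:
        break
    frame = cv2.flip(frame, 1)
    roi = frame[y:y + h, x:x + w]

    # same skin-color filter used when creating the dataset
    hsv = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, np.array([2, 50, 60]), np.array([25, 150, 255]))
    thresh = cv2.medianBlur(mask, 15)

    # prepare the filtered frame for the model and predict the gesture
    img = cv2.resize(thresh, (50, 50)).reshape(1, 50, 50, 1).astype("float32") / 255.0
    player = LABELS[int(np.argmax(model.predict(img, verbose=0)))]

    cpu = random.choice(LABELS[:5])            # CPU picks a random gesture
    result = winner(player, cpu) if player in BEATS else "no gesture"

    cv2.putText(frame, f"You: {player}  CPU: {cpu}  Result: {result}",
                (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 0), 2)
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("frame", frame)
    cv2.imshow("thresh", thresh)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```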
Output
Congratulations, you have successfully built the prototype for Rock - Paper - Scissors - Lizard - Spock.