ISLAR : Indian Sign Language Recognition (Phase 0)

Sign languages use the visual-manual modality to convey meaning. They are full-fledged natural languages with their own grammar and lexicon. They are not mutually intelligible with each other, although there are striking similarities among them.

As discussed, sign languages around the world differ from one another, which makes sense. Take American Sign Language (ASL) as an example: it is used universally throughout the US and is standardized as well.

In India, however, the case is very different. The crux of the problem is that Indian Sign Language varies from region to region within India, with multiple signs having the same meaning. Through ISLAR, I have tried to come up with my own implementation of an Indian Sign Language Recognition application.

Background

One of the first projects that I did was building and integrating a smart band with a women's safety application, Native Application for Rescue, India (नारी – Woman), along with my colleagues Sarthak Tyagi and Abhinandan Mishra. While making this project, I realized that helping people and working on a social cause makes the entire effort worthwhile. Later, while I was working on machine learning projects, I came across a story that gave my life a new perspective. This is Tania's story and how her voice helped others find theirs.

Problem Statement

In a country like India, we have a blend of different cultures coming together and living in harmony. In such a diverse country, different regions have entirely different languages and scripts. Moreover, Sign Language in India has never been unified or standardized, simply because the task is very cumbersome. In Maharashtra, a state in western India, two cities, Mumbai and Pune, which are 80 km apart, have different sign languages. The proposed solution must be generic as well as adapted to Indian conditions, specifically running accurately and efficiently on low resources. These points make the task extremely difficult and almost unsolvable.

However, the advent of machine learning and efficient computing has enabled researchers all over the world to provide utilitarian solutions for previously unsolvable problems. While going through similar implementations by my fellow researchers, I found that in most of them the subject used external hardware for conveying signs. This ranged from an external glove to depth-sensor cameras. However, I wanted to keep my implementation unique and independent of such external hardware. Through this project, I am presenting my findings on a recognition system for Indian Sign Language along with a proposal for standardizing Sign Language across India.

Prelude

To be very frank, I was under the notion that this project would be done pretty quickly (since I had previously worked on American Sign Language through Kaggle). I was also under the impression that the resources would be freely available on the internet. However, that was not true at all. The resources were inconsistent, with no proper guidelines to follow. At this point, I approached an NGO called the Center for Research and Development for Deaf and Mute, Pune. At the center, I met the head faculty, Mrs. Ashwini, who oversaw all the kids. During my conversation with her, I learnt a couple of key points.

Indian Sign Language is not standardized.
In Maharashtra itself, there are different sign languages. This happens because the teachers are trained in different institutes which have different signs leading to different languages.
Sign Language is not given importance as a foreign language.
Learning ISL is difficult.

Preparation

After the discussion, I realized this task was not going to be as easy as it seemed. This is the approach that I followed:

Friends to the rescue - I convinced my friend, Raghav Patnecha, to help me with this task and share his insights.
Learning - I learnt Indian Sign Language for about 3 months before even starting the project. We went to the NGO and interacted with the kids to get key insights for the project.
Resources - Thanks to the internet, we were able to get a couple of videos on Indian Sign Language on YouTube. Also, there are a couple of smartphone applications with pre-recorded gestures that came in very handy for the project.
Phase distribution - I decided to split the development into iterative phases, much like a prototyping model. Different phases used different types of learning models, with the complexity increasing slightly each time to incorporate more changes.
Hustle - I was volunteering at the NGO on weekends since I work on weekdays. But trust me, now that I am writing the blog post I feel that my work has paid off.

[Figure: preparation for the project.]

Approach

For Indian Sign Language interpretation, you need to infer these things at a high level:

Hand movement
Facial Expressions
Pose Estimation

To get into the details, it is important to track these features along with their relationships to each other. Static detection works well for capturing gestures; however, to capture the entire context, it was important to have a sequential model that carries context from frame to frame.

As already mentioned in the preparation part, I followed a prototype approach and distributed the work across different phases. The thought behind it was to develop and validate small prototypes iteratively and keep making incremental changes.

Phase 0 - In this phase, I only worked on hand segmentation and gesture tracking along with filtering. Since I was not using any special gloves or camera, this task became a bit cumbersome. Also, I had to make sure that the tracking was done in real-time on low resources.
Phase 1 - In this phase, I used the previous module of hand gesture tracking and integrated it with facial key-points detection and tracking. Since we had two modules now, working on low resources and maintaining acceptable FPS became extremely difficult.

In this part of the blog, we will talk about Phase 0 of the project and mold it to our use case.

Phase 0

As discussed previously, in this phase I was targeting only the hands and tracking them. We could also have gone for an object detection approach using YOLO; however, that would have been difficult to run on edge devices. To maintain the frame rate, I went with a filtering and segmentation approach. If I could get the correct thresholds for the filter, the lag would be significantly reduced and the model would stay efficient.
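To make the filtering idea concrete, here is a minimal sketch of a skin-tone threshold filter written with NumPy. It is not the ISLAR filter: the function names `skin_mask` and `segment_hand` and the Cr/Cb bounds are assumptions chosen for illustration, and the bounds would need tuning for each camera and lighting setup.

```python
import numpy as np

# Assumed skin-tone bounds in the Cr/Cb chroma plane. These are illustrative
# values only, not the thresholds used in ISLAR.
CR_RANGE = (133, 173)
CB_RANGE = (77, 127)


def skin_mask(rgb):
    """Return a boolean mask that is True where a pixel looks like skin.

    rgb: uint8 array of shape (height, width, 3) in RGB channel order.
    """
    rgb = rgb.astype(np.float32)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # Full-range BT.601 (JPEG) conversion from RGB to the two chroma channels.
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    return ((cr >= CR_RANGE[0]) & (cr <= CR_RANGE[1]) &
            (cb >= CB_RANGE[0]) & (cb <= CB_RANGE[1]))


def segment_hand(rgb):
    """Black out every pixel that falls outside the skin mask."""
    mask = skin_mask(rgb)
    return np.where(mask[..., None], rgb, 0).astype(np.uint8)
```

A fixed threshold like this is cheap enough to run per frame, which is the appeal over a full detector, but it will also pick up faces and skin-coloured backgrounds, so a real pipeline would likely need extra filtering on top.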

Data Collection - The most important thing for me is getting valid and authentic data. After trying for a couple of weeks and going through online repositories for data sets, it was certain that none of them were useful for my use-case. So I designed my own filters and created my own data set.
Data Validation - For validation, I involved my friends and collected data from different hand sizes and shapes to avoid overfitting. We also noted that image augmentation would be applied while training the model to further improve efficiency.
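As a rough illustration of what image augmentation can look like, the sketch below applies a random shift and a random brightness change to one image using NumPy. The function name `augment` and the parameters `max_shift` and `brightness` are hypothetical, and the post does not say which augmentations were actually used in ISLAR.

```python
import numpy as np


def augment(image, rng, max_shift=4, brightness=0.2):
    """Return a randomly shifted and brightness-jittered copy of image.

    image: uint8 array of shape (height, width) or (height, width, channels).
    rng: a numpy.random.Generator, so that runs can be made reproducible.
    """
    h, w = image.shape[:2]
    # Pad with black on every side, then crop a window at a random offset.
    # An offset equal to max_shift reproduces the original framing.
    pad = [(max_shift, max_shift), (max_shift, max_shift)]
    pad += [(0, 0)] * (image.ndim - 2)
    padded = np.pad(image, pad, mode="constant")
    dy, dx = rng.integers(0, 2 * max_shift + 1, size=2)
    shifted = padded[dy:dy + h, dx:dx + w]
    # Scale pixel intensities by a random factor around 1.0.
    scale = 1.0 + rng.uniform(-brightness, brightness)
    jittered = np.clip(shifted.astype(np.float32) * scale, 0, 255)
    return jittered.astype(np.uint8)
```

Calling `augment(image, np.random.default_rng(0))` several times on the same image yields slightly different training samples, which is one common way to stretch a small hand-collected dataset.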

Let's look at what our dataset looks like.

Digits

0 1 2 3

4 5 8 9

Alphabets

A B C D K

L P W X Z

Since we are using filters and segmentation, we can filter out just our hands and gestures from the entire image. This makes training quick and inference efficient on edge devices.
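One way to see why this helps: once a mask of the hand is available, the frame can be cropped to the hand and shrunk to a small fixed size before it reaches the model. The sketch below assumes a boolean mask has already been computed by some filter; `crop_to_mask` and `resize_nearest` are hypothetical helper names, not functions from the ISLAR code.

```python
import numpy as np


def crop_to_mask(image, mask):
    """Crop image to the tightest box containing every True pixel in mask."""
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:
        return image  # nothing detected, so keep the frame unchanged
    return image[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]


def resize_nearest(image, size):
    """Nearest-neighbour resize to (size, size) using index arithmetic."""
    h, w = image.shape[:2]
    ys = np.arange(size) * h // size  # source row for each output row
    xs = np.arange(size) * w // size  # source column for each output column
    return image[ys[:, None], xs[None, :]]
```

A model that only ever sees, say, a 64x64 crop of the hand has far fewer pixels to process than one fed the full camera frame, which is where the savings on low-resource devices would come from.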

Environment - I consider myself lucky to have so many collaborations and access to open source tools. For training, I used these environments:

- Google Colab

- Google Cloud Platform

- Intel Dev Cloud

However, for this part of the blog, I will mention only Google Colab. In part 2 of the series, I will talk about using Google Cloud Platform and Intel Dev Cloud.

Development

For the development of this phase, we will use a local machine and Google Colab.

Local Machine - We need a local machine to capture the training data. I have written filters that keep just the hand part of the image. That makes this application faster and runnable in low-resource environments. I am going to be releasing the code for the filters soon.

Google Colab - For training, we are going to use Colab because it is free and very well maintained. It is a Google research project created to help disseminate machine learning education and research. It's a Jupyter notebook environment that requires no setup to use and runs entirely in the cloud.

Click on New Python3 Notebook. You will be asked to choose a runtime, where you can specify whether you want to use a CPU, GPU or TPU.

Once you have your new Python notebook, you can try out a few things.

Let's start with the modeling part. It is important to note that I am using TensorFlow 2.0.
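The sketch below shows one way a small convolutional classifier for segmented hand images could be defined with Keras in TensorFlow 2.x. The architecture, the 64x64 grayscale input size, and the 36 classes (10 digits plus 26 letters) are all assumptions made for illustration and are not taken from the ISLAR model.

```python
import tensorflow as tf


def build_model(num_classes, input_shape=(64, 64, 1)):
    """Small CNN for classifying segmented hand images (illustrative only)."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=input_shape),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dropout(0.5),  # regularization for a small dataset
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])


# Assumed class count: 10 digits plus 26 letters.
model = build_model(num_classes=36)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

With integer class labels, training would then be a call to `model.fit` on the collected images, and `model.summary()` prints the layer sizes so the parameter count can be checked against the target device.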

[Figures: accuracy and loss plots, followed by the output.]

Once the model is trained, download it and test it locally. I am going to release the source code and dataset soon after the next phase.

Help Me

You can help me by

Spreading awareness
Signing this petition (here)
Collecting dataset and resources.
Open to suggestions.

Future

In the next phase, I will be sharing my work on recognizing gestures using facial key-points and hand gestures. I would like to extend my gratitude towards the Center for Research and Development for Deaf and Mute, Pune. Also, special mention to my friends Raghav Patnecha and Vijay Patidar for their contribution to this project. I would also like to thank Symantec for encouraging me, and Google and Intel for their continuous support and belief. Finally, this is also for all the people whom I reached out to for help and guidance.