Recognition of facial expressions using OpenCV and Haar classifiers
Arteaga M. B., Buenaventura J., Chimbo J., Vaca P.
DECEM- Department of Energy and Mechanical Sciences, University of the Armed Forces ESPE
Sangolquí, Ecuador
mbarteaga1@espe.edu.ec
jrchimbo1@espe.edu.ec
gpvaca@espe.edu.ec
jmbuenaventura@espe.edu.ec
Abstract. - Recognizing human facial expressions and emotions by computer is an interesting and challenging problem. Identification of facial feature points plays an important role in many facial image applications, including human-computer interaction, video surveillance, face detection, face recognition, facial expression classification, face modeling and face animation. In this paper we present a system for recognizing emotions through the facial expressions displayed. The aim of this project is the detection, analysis and recognition of facial features. The method presented is based on three main components: face detection, facial expression feature extraction and facial expression categorization. A Haar classifier is used for the analysis throughout the process. The system localizes characteristic points of the analyzed face and, based on their displacements, certain emotions can be automatically recognized.
Keywords—Emotion Recognition, Facial Expression Recognition (FER), Computer Vision, Haar Features, OpenCV.
- INTRODUCTION
Effective computer image analysis has always been a great challenge for many researchers. Tasks that are usually quite simple for humans, such as object or emotion recognition, prove to be very complicated in computer analysis. Among the main problems are susceptibility to varying lighting conditions, color changes and differences in transformation. Effective detection of human faces is one of the greatest problems in image analysis. It is therefore even more challenging to efficiently and effectively localize the features of a face in the analyzed image and to relate them to the expression of emotions. The present work is an attempt to create a computer system able to automatically detect, localize and recognize facial features. The sought features are, in this case, characteristic points placed at selected locations on a human face model; the locations of these points and the distances between them change during facial expressions. There are many more or less effective solutions capable of detecting or recognizing faces; however, only a few comprehensive and effective solutions exist that connect all those features together and, at the same time, are able to cooperate with an emotion recognition system. [1]
This work describes a real-time automatic facial expression recognition system using video or webcam input. Our work focuses on first detecting the human face in the video stream, then classifying the human emotion from facial features, and finally visualizing the recognition results.
- RELATED WORK
- PROPOSED METHODOLOGY
The system consists of two main stages: face detection and facial expression classification.
[pic 1]
Fig. 1. Stages of the system [7]
- Face Detection
Face detection is the first stage to be automated. In much of the research, the face is already cropped and the system starts with tracking and feature extraction. In other work, vision-based automated face detectors or pupil tracking with infrared (IR) cameras are used to localize the face. Alternatively, a face detector can be used to detect the faces in a scene automatically. [6]
Some free face detection software packages are available to researchers for use and improvement. The most popular of these is the face detector of the Open Source Computer Vision Library (OpenCV). This face detector relies on the Haar-like wavelet-based object detection proposed by Viola and Jones and improved by Lienhart et al. [7, 8]. Their algorithm makes three main contributions (a minimal usage sketch follows the list below):
- The use of integral images.
- A selection of features through a boosting algorithm (AdaBoost).
- A method to combine simple classifiers in a cascade structure.
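As an illustration of this detector in practice, the following is a minimal sketch using OpenCV's Python bindings; the cascade file, camera index and detection parameters here are our assumptions for a simple demo, not values taken from the paper:

```python
import cv2

# Load the pre-trained frontal-face Haar cascade shipped with OpenCV
# (the exact path depends on the local installation).
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_cascade = cv2.CascadeClassifier(cascade_path)

cap = cv2.VideoCapture(0)  # default webcam (assumed device index)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # detectMultiScale slides the cascade over the image at several scales
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("faces", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```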
Each frame is first processed through Haar classifiers [9] trained for profile faces. To further improve the frame rate and compensate for pose variation, we propose to use interleaved Haar classifiers, alternating between frontal and profile classifiers (see the sketch after figure 2).
[pic 2]
Fig. 2 Interleaved Classifiers [9]
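A minimal sketch of one possible interleaving scheme follows, assuming the simplest alternation (frontal cascade on even frames, profile cascade on odd frames); the cascade files and parameters are our assumptions:

```python
import cv2

frontal = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
profile = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_profileface.xml")

def detect_interleaved(gray, frame_index):
    """Alternate between frontal and profile cascades on successive frames,
    so each frame pays the cost of only one detector."""
    cascade = frontal if frame_index % 2 == 0 else profile
    return cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
```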
- Integral Images
Analyzing images is not an easy task. Using just the pixel information can be useful in some fields (e.g. movement detection) but is in general not enough to recognize a known object. In 1998, Papageorgiou et al. [10] proposed a method to analyze image features using a subgroup of Haar-like features, derived from the Haar transforms. This subgroup was later extended by Lienhart et al. [11] to also detect small rotations of the sought-after object. The basic classifiers are decision-tree classifiers with at least 2 leaves. Haar-like features are the input to the basic classifiers and are calculated as described below. The algorithm we are describing uses the Haar-like features shown in figure 3.
[pic 3]
Fig. 3 Haar features [9]
The feature used in a particular classifier is specified by its shape (1a, 2b, etc.), its position within the region of interest and its scale (this scale is not the same as the scale used at the detection stage, though the two scales are multiplied). For example, in the case of the third line feature (2c) the response is calculated as the difference between the sum of the image pixels under the rectangle covering the whole feature (including the two white stripes and the black stripe in the middle) and the sum of the image pixels under the black stripe multiplied by 3, in order to compensate for the differences in the size of the areas. Calculating sums of pixels over rectangular regions can be very expensive in computational terms, but this problem can be solved by using an intermediate representation of the images, namely integral images. [9]
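Written compactly, letting \(S_{\mathrm{whole}}\) be the sum over the full three-stripe rectangle and \(S_{\mathrm{black}}\) the sum under the central black stripe, the response of feature 2c described above is

\[
f_{2c} = S_{\mathrm{whole}} - 3\,S_{\mathrm{black}}.
\]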
[pic 4]
Fig. 4 Calculation of rectangular regions [9]
These intermediate images are easily generated from the cumulative sums of the original image's pixels: every pixel of the integral image ii(x, y) corresponds to the sum of all pixels i(x', y') of the original image i with x' ≤ x and y' ≤ y:
\[
ii(x, y) = \sum_{x' \le x,\ y' \le y} i(x', y')
\]
Using recursive formulas, it is possible to generate the integral image from the original in a single pass over the image:
\[
s(x, y) = s(x, y - 1) + i(x, y)
\]
\[
ii(x, y) = ii(x - 1, y) + s(x, y)
\]
where s(x, y) is the cumulative sum of the row, with s(x, −1) = 0 and ii(−1, y) = 0.
Once an integral image is generated, it is rather easy to calculate the sum of pixels under an arbitrary rectangular region D using the values of points 1, 2, 3 and 4, as illustrated in figure 4. In fact, the value at point 1 is the cumulative sum of A, at point 2 it is A + B, at point 3 it is A + C and at point 4 it is A + B + C + D. Since we are looking for the sum over D, we subtract from the value at point 4 the values at points 3 and 2, and add back the value at point 1, since it was subtracted twice in the previous operation. [9]
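A compact NumPy sketch of both operations (the single-pass integral image via cumulative sums, and the four-corner rectangle sum of figure 4); the function and variable names are ours:

```python
import numpy as np

def integral_image(img):
    """Cumulative sum over rows then columns: ii[y, x] = sum of img[:y+1, :x+1]."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the rectangle with top-left (x, y), width w and height h,
    from the four corner values (points 1..4 in figure 4)."""
    # Corners that fall outside the image on the top/left contribute 0.
    p1 = ii[y - 1, x - 1] if x > 0 and y > 0 else 0  # A
    p2 = ii[y - 1, x + w - 1] if y > 0 else 0        # A + B
    p3 = ii[y + h - 1, x - 1] if x > 0 else 0        # A + C
    p4 = ii[y + h - 1, x + w - 1]                    # A + B + C + D
    return p4 - p3 - p2 + p1                         # = D

# Usage: response of the three-stripe line feature (2c) at (x, y),
# with stripes of width stripe_w and height h:
#   whole = rect_sum(ii, x, y, 3 * stripe_w, h)
#   black = rect_sum(ii, x + stripe_w, y, stripe_w, h)
#   response = whole - 3 * black
```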
- Feature selection using AdaBoost
Proposed by Schapire [12, 13], the AdaBoost algorithm is used to ‘boost’ the performance of a learning algorithm. In this case, the algorithm is used both to train the classifiers and to analyze the input image.
[pic 7]
Fig. 5 First two iterations of AdaBoost
In a 24×24 pixel image there are over 180,000 Haar-like features that can be evaluated, far more than the number of pixels in the image (576). For a bigger image, this number must be multiplied by the number of 24×24 sub-windows it contains. The computational cost of this operation is clearly prohibitive. Instead, AdaBoost is used to select which of the features are actually relevant for the sought-after object, drastically reducing the number of features to be analyzed. In every iteration, AdaBoost chooses the feature that best separates the entire training set from the roughly 180,000 features possible in every image.
The first two selected features are displayed in figure 5: it is clear that the most discriminative feature is the difference between the line of the eyes and its surroundings; for a face, the surroundings are lighter than the eyes themselves. The second feature selected is the difference in tonality between the eyes and the nose; the nose is also lighter compared to the area of the eyes. The algorithm continues to select good features that can be combined into a classifier. [9]
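To make the selection step concrete, here is a bare-bones sketch of AdaBoost rounds over precomputed feature responses; the setup (X as a samples × features matrix of Haar responses, labels in {−1, +1}) and the naive threshold search are our assumptions, not the paper's implementation:

```python
import numpy as np

def adaboost_select(X, y, n_rounds):
    """X: (n_samples, n_features) Haar responses; y: labels in {-1, +1}.
    Each round picks the decision stump (feature, threshold, polarity)
    with the lowest weighted error, then reweights the samples so that
    misclassified examples count more in the next round."""
    n = len(y)
    w = np.full(n, 1.0 / n)  # start with uniform sample weights
    chosen = []
    for _ in range(n_rounds):
        best = None
        for f in range(X.shape[1]):
            for thr in np.unique(X[:, f]):
                for polarity in (1, -1):
                    pred = np.where(polarity * (X[:, f] - thr) > 0, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, f, thr, polarity, pred)
        err, f, thr, polarity, pred = best
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))  # stump weight
        w *= np.exp(-alpha * y * pred)                      # upweight mistakes
        w /= w.sum()
        chosen.append((f, thr, polarity, alpha))
    return chosen
```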
- Cascade of classifiers
At every step a simple classifier (also called weak because of its low discriminative power) is built. The combination of all the weak classifiers forms a strong classifier that can recognize the kind of object it was trained on. The problem is that a fixed-size window must be searched over the full picture, applying the sequence of weak classifiers to every sub-window of the image. Viola and Jones [6] used a cascade of classifiers (see figure 6) to tackle this problem: the first classifier (the most discriminative) is applied to all the sub-windows of the image, at different scales. The second classifier is applied only to the sub-windows in which the first classifier succeeded. The cascade continues, applying all the weak classifiers and discarding the negative sub-windows, concentrating the computational power only on the promising areas. [6]
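The early-rejection logic of the cascade can be summarized in a few lines; `stages` here is a hypothetical list of (stage scoring function, stage threshold) pairs, sketched under our own naming:

```python
def cascade_accepts(window, stages):
    """Apply stage classifiers in order; reject the sub-window at the
    first stage whose score falls below its threshold."""
    for stage_score, threshold in stages:
        if stage_score(window) < threshold:
            return False  # early rejection: most sub-windows stop here
    return True           # survived every stage: a promising face candidate
```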
...