ABSTRACT
The area of Automatic Speech Emotion Recognition (ASER) has garnered considerable interest among researchers. The ASER framework typically comprises three steps, viz. speech feature extraction, dimensionality reduction, and feature classification. At the base of this framework lies the design and recording of databases of emotional speech, from which the most popular set of emotions (happiness, sadness, anger, fear, disgust, and boredom, typically called `archetypal emotions') along with neutral, among others, have been obtained. This paper surveys the extent of work done in this field, highlighting in particular the three steps of the ASER framework. Starting with the different languages that have been explored to date for creating such databases, this paper categorizes the features that are typically extracted, enlists the dimensionality reduction techniques that have been chosen, and discusses the pros and cons, if any, of the feature classifiers that have been modelled.