Gomorrah is an Italian crime drama TV series that has been appreciated worldwide and sold in 190 countries, despite its extensive use of Neapolitan dialect, which is hard to understand without subtitles even for most Italians. Scholars promptly took up the study of this serial phenomenon, analysing it from different points of view and framing it within the broader context of studies on new Italian television and its serial products. Our approach to Gomorrah takes these elements into account and adds a new perspective: character recognition, an emerging branch of research that associates dialogues with characters and identifies the verbal features of each character. We have chosen Gomorrah as a challenging dataset on which to perform character recognition. We rely on the transcripts of the series, after a pre-processing stage that standardizes the lexicon despite the vagaries of dialect and removes stopwords. A machine learning approach, based on a selection of tools, is then employed to identify characters from the lexicon they employ. The problem is treated as a multi-class classification task. We compare several text representations, including simple one-hot encoding and more advanced embedding techniques. The results are presented through a confusion matrix, which can also serve to identify similarities in the linguistic profiles of characters.
Character recognition; Gomorrah; machine learning; text analysis.
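
The pipeline summarised in the abstract (one-hot text representation, multi-class classification, confusion matrix) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy lines, the labels `char_a` and `char_b`, and the helper names are hypothetical, and the centroid-based scorer is only a stand-in, since the abstract does not name the classifier actually used.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (character label, pre-processed line).
# These are NOT taken from the series transcripts.
TRAIN = [
    ("char_a", "money deal boss"),
    ("char_a", "boss money respect"),
    ("char_b", "family home mother"),
    ("char_b", "mother family son"),
]
TEST = [
    ("char_a", "boss deal"),
    ("char_b", "family son home"),
]


def build_vocab(lines):
    """Map each distinct token in the training lines to a column index."""
    tokens = sorted({tok for _, text in lines for tok in text.split()})
    return {tok: i for i, tok in enumerate(tokens)}


def one_hot(text, vocab):
    """Binary vector: 1 if the token occurs in the line, 0 otherwise."""
    vec = [0] * len(vocab)
    for tok in text.split():
        if tok in vocab:  # tokens unseen in training are ignored
            vec[vocab[tok]] = 1
    return vec


def fit_centroids(lines, vocab):
    """Average the one-hot vectors of each character (stand-in classifier)."""
    sums = defaultdict(lambda: [0.0] * len(vocab))
    counts = Counter()
    for label, text in lines:
        counts[label] += 1
        for i, value in enumerate(one_hot(text, vocab)):
            sums[label][i] += value
    return {lab: [s / counts[lab] for s in vec] for lab, vec in sums.items()}


def predict(text, centroids, vocab):
    """Pick the character whose centroid has the largest dot product with the line."""
    vec = one_hot(text, vocab)
    return max(
        sorted(centroids),
        key=lambda lab: sum(a * b for a, b in zip(vec, centroids[lab])),
    )


def confusion_matrix(pairs, labels):
    """Rows are true characters, columns are predicted characters."""
    index = {lab: i for i, lab in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true, pred in pairs:
        matrix[index[true]][index[pred]] += 1
    return matrix


vocab = build_vocab(TRAIN)
centroids = fit_centroids(TRAIN, vocab)
labels = sorted(centroids)
predictions = [(true, predict(text, centroids, vocab)) for true, text in TEST]
matrix = confusion_matrix(predictions, labels)

for label, row in zip(labels, matrix):
    print(label, row)
```

Off-diagonal counts in such a matrix mark lines attributed to the wrong character, which is how a confusion matrix can point to characters with similar linguistic profiles.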