Faculty Award
Dr. Maite Taboada receives funding for two important projects
Congratulations to Dr. Maite Taboada for receiving funding through the Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Research Programs. An investment of more than $506 million in research was announced through NSERC in this year's competition.
Dr. Taboada's project, Natural language processing for detecting toxic, abusive, and hateful language online, will present a timely look into the detection of online toxicity. The goal of the project is to continue work that Dr. Taboada's Discourse Processing Lab has been pursuing to develop tools to automatically identify and classify non-constructive and toxic comments.
Dr. Taboada is also part of a team that recently received a grant from Quebec funding agency L'Observatoire international sur les impacts sociétaux de l'IA et du numérique (OBVIA) for projects on societal impacts. The project, Mind the Gap: représentation des femmes dans les médias québécois durant la pandémie de COVID-19 (Representation of women in Quebec media during the COVID-19 pandemic), is a collaborative project with researchers at Université Laval, Queen's University, the University of Ottawa, and SFU that extends the Discourse Processing Lab's Gender Gap Tracker to look specifically at French media.
NSERC Discovery Grant Project Summary - Natural language processing for detecting toxic, abusive, and hateful language online
Digital technologies offer incredible power, from artificial intelligence and virtual assistants to social media and recommendation systems. Deploying such technologies in a manner beneficial to both individuals and society is a pressing challenge. In mainstream and social media, content providers welcome feedback; such feedback, however, may be 'toxic': malicious, abusive, or offensive.
Toxic comments and posts online are those that intend to cause harm. They may take the form of personal attacks, abuse, harassment, and threats, and may include profane, obscene, or derogatory language, with hate speech being the most extreme. In the last few years, I have closely studied online news comments and developed natural language processing (NLP) methods to analyze them. My long-term program of research develops robust methods for text classification in tasks such as sentiment analysis, misinformation detection, and content moderation. In the next few years, my SFU laboratory, the Discourse Processing Lab, will continue to study toxic language online and to develop methods and algorithms to detect toxicity automatically. Our work identifying constructive comments, those that contribute positively to an online discussion, has provided excellent insight into how to automatically classify non-constructive and toxic comments.
Current approaches to detecting online toxicity are based either on general text characteristics (word length, text length, capitalization, and punctuation) or on lists of words likely to cause offense. Machine learning approaches (supervised, semi-supervised, or based on neural networks) rely on large annotated datasets, but many studies have shown that such approaches often fail because negativity in language may be wrapped in positive words, through metaphors and other figures of speech. Research, including our own, has found that accurately identifying and filtering toxic content requires a multidisciplinary perspective, drawing on a deep understanding of linguistics and on current methods in NLP and machine learning.
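As a rough illustration of the two families of baseline described above, the following Python sketch computes the general text characteristics mentioned (text length, word length, capitalization, punctuation) and checks a comment against a word list. It is a minimal sketch under stated assumptions, not the lab's method: the names `OFFENSIVE_WORDS`, `surface_features`, `lexicon_hits`, and `flag_comment` are hypothetical, and the three-word list is a placeholder for a curated lexicon.

```python
import string

# Hypothetical placeholder lexicon (assumption); real systems use curated word lists.
OFFENSIVE_WORDS = {"idiot", "stupid", "moron"}


def surface_features(text: str) -> dict:
    """Compute general text characteristics: length, word length, capitalization, punctuation."""
    words = text.split()
    letters = [c for c in text if c.isalpha()]
    return {
        "text_length": len(text),
        "avg_word_length": sum(len(w) for w in words) / len(words) if words else 0.0,
        "uppercase_ratio": sum(c.isupper() for c in letters) / len(letters) if letters else 0.0,
        "punctuation_count": sum(c in string.punctuation for c in text),
    }


def lexicon_hits(text: str) -> list:
    """Return the tokens of the comment that appear in the word list."""
    tokens = [w.strip(string.punctuation).lower() for w in text.split()]
    return [t for t in tokens if t in OFFENSIVE_WORDS]


def flag_comment(text: str) -> bool:
    """Word-list baseline: flag a comment if any listed word occurs in it."""
    return bool(lexicon_hits(text))


if __name__ == "__main__":
    blunt = "You are an IDIOT!!!"
    sarcastic = "What a brilliant plan, I am sure nothing could possibly go wrong."
    for comment in (blunt, sarcastic):
        print(flag_comment(comment), surface_features(comment))
```

In this sketch the blunt insult is flagged, while the sarcastic comment contains no listed word and passes unflagged, which is the kind of failure on figurative language that the paragraph above describes.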
To address existing gaps in the automatic detection of toxic comments, in the next five years I plan to: (Objective 1) study how metaphors and other figures of speech well known since antiquity (euphemisms, litotes, hyperbole, sarcasm) convey toxic language. I will then develop (Objective 2) a system to detect figures of speech automatically, which I will integrate into (Objective 3) a new content moderation platform.
The results of this work will mobilize research among scholars interested in evaluative language and the role of media in public discourse, including linguists, computational linguists, and communication and media researchers. At a time when media organizations, social media platforms, and the public are concerned about online abuse, misinformation, and the role of digital technology in politics and society, this project is timely and will make an important contribution to public discourse.
OBVIA Project Summary (English follows French) - Mind the Gap: représentation des femmes dans les médias québécois durant la pandémie de COVID-19
Ce projet vise à analyser les représentations médiatiques des femmes dans les médias québécois durant la pandémie de COVID-19 et ce, dans une perspective d'étude de genre incluant ses dimensions intersectionnelles. Cette recherche s'ancre dans une méthodologie empruntée aux méthodes mixtes, combinant les apports technologiques de l'Intelligence artificielle aux possibilités de compréhension et d'interprétation offertes par une analyse thématique de contenu. Dans un premier temps, notre recherche s'appuie sur les outils intelligents du Gender Gap Tracker (GGT), développés par la Pre. Maite Taboada (SFU) qui quantifie en temps réel les représentations des hommes et des femmes dans les médias canadiens principalement anglophones. Nous utiliserons ces outils dans un contexte francophone afin de quantifier le ratio de représentation hommes-femmes du 1er janvier 2020 au 1er janvier 2022 dans dix grands médias québécois. Dans un second temps, nous développerons une fonctionnalité supplémentaire au GGT afin de procéder à une classification thématique des articles de presse analysés dans notre corpus. L'objectif ici est d'identifier les thèmes récurrents en lien avec les logiques de représentations médiatiques des femmes et des hommes. Pour cela, nous voulons développer une méthode d'apprentissage automatique non-supervisée inspirée du « Topic Modeling ».
Dans un troisième et dernier temps, nous procéderons à une analyse thématique de contenu afin d'expliquer la signification de ces représentations médiatiques dans une perspective intersectionnelle.
Nos résultats constitueront une base empirique solide à partir de laquelle il nous sera possible d'engager une discussion avec les différentes entreprises de presse québécoise et de collaborer ensemble à la construction d'un espace médiatique plus inclusif et auto-critique sur ses propres pratiques. Ce faisant, nous souhaitons participer activement à la prise en compte des enjeux de genre de l'inclusion et de la diversité dans la production de l'information.
English
The project analyses media representations of women in the Quebec media during the COVID-19 pandemic from a gender studies perspective that includes its intersectional dimensions. This research is grounded in mixed methods approaches, combining the technological contributions of Artificial Intelligence with the possibilities of understanding and interpretation offered by a thematic content analysis. As a first step, our research draws on the intelligent tools of the Gender Gap Tracker (GGT), developed by Professor Maite Taboada (SFU), which quantifies in real time the representations of men and women in mainly English-language Canadian media. We will use these tools in a French-language context to quantify the ratio of male-female representation from January 1, 2020 to January 1, 2022 in ten major Quebec media sources. As a second step, we will develop an additional functionality for the GGT to carry out a thematic classification of the press articles analysed in our corpus. The objective here is to identify recurring themes related to the logic of media representations of women and men. For this, we want to develop an unsupervised machine learning method inspired by topic modeling.
In the third and final step, we will carry out a thematic content analysis to explain the meaning of these media representations from an intersectional perspective.
Our results will provide a solid empirical basis from which we can engage in a discussion with the various Quebec press companies and collaborate in the construction of a more inclusive media space that is self-critical about its own practices. In doing so, we want to actively participate in ensuring that issues of gender, inclusion, and diversity are taken into account in the production of information.