AI Seminar
Language Use in Written Dialog: Beliefs, Gender, Social Networks, and Power
Email remains a ubiquitous form of communication, but academic research on it, and on computational tools for it, is restricted by the paucity of corpora. The major available corpus is the Enron corpus, which dates to 2001. There have been many studies based on the Enron corpus, leading in some quarters to "Enron fatigue". However, surprisingly few studies have treated the Enron data as what it is: written dialog. I will argue that the corpus is still a rich and largely untapped source of insights for sociolinguists and computational linguists.
I will present a series of studies performed at Columbia University whose aim is to understand how linguistic choices in dialog are affected by various aspects of the communicative setting, such as beliefs, gender, power, and the underlying social network. Specifically, we have investigated how power relations affect linguistic choices, both lexical choices and choices in terms of dialog acts. We see clear differences in language use between people in power and people without power. These differences allow us to predict who has power in a dialog. We have also asked how this power-related behavior changes when we consider the gender of the discourse participants, and we have found profound differences in language use between men and women in power. I will also discuss how power interacts with the degree of commitment that discourse participants express in propositions: as expected, subordinates report more non-committed beliefs, and report more beliefs of others, than bosses do. A further study relating to power in written conversation considers how social networks relate to power relations. We find that when we take the content of emails into account, we can make better predictions about power relations than if we use only metadata (as has often been done in the literature). Finally, I will report on ongoing work to distinguish personal email from professional email. We find that the social network helps us find personal emails, and I will report results from training on the Enron corpus and testing on a recently released corpus of emails, the Avocado corpus.
Owen Rambow is a Senior Research Scientist in the Center for Computational Learning Systems, Columbia University. He
holds a PhD from the University of Pennsylvania and has previously worked at AT&T Research. His area of research is
Natural Language Processing, in particular the areas of formal and computational models of syntax and other levels of
linguistic representation.