How AI tools can help us understand human behavior

Natural language processing tools can help social scientists process larger and longer-term text datasets to keep up with fast-paced digital communication.

A common interest in language unites social scientists and natural language processing (NLP) researchers. While both fields leverage the strong connection between language and behavior, social scientists seek to understand human behavior while NLP researchers aim to predict it.

Leveraging NLP can help social scientists efficiently sift through the abundant digital text data that comes with the Information Age to understand underlying behaviors, according to a comprehensive review published in Nature Human Behaviour.

“My lab has been working for many years in close collaboration with psychologists. Exploring the interactions between the two has been the perfect opportunity to build on our decade-long collaborations,” said Rada Mihalcea, the Janice M. Jenkins Collegiate Professor of Computer Science and Engineering at the University of Michigan and corresponding author of the study.

Digital text, like social media posts, text messages or Zoom transcripts, provides a window into people’s minds, whether the focus is on an individual, relationship, group or society.

NLP peers into that window, extracting surface-level information like age, relationship or education and even deeper topics including thought patterns, linguistic signals, motives, goals and values. 

“Our methods can detect subtle shifts in people’s use of pronouns (such as I, we, and they), articles (a, an, the), and other forgettable words which reveal changes in people’s psychological states,” said James Pennebaker, a professor emeritus of psychology at the University of Texas at Austin and senior author of the study.
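To illustrate the kind of counting involved, here is a minimal Python sketch that measures how often pronouns and articles appear in a piece of text. The word lists, the function name `function_word_rates` and the sample sentences are assumptions made for illustration; they are not the dictionaries or methods used in the study.

```python
import re
from collections import Counter

# Illustrative word lists only (taken from the examples in the quote above);
# these are NOT the dictionaries used by the researchers.
PRONOUNS = {"i", "we", "they"}
ARTICLES = {"a", "an", "the"}


def function_word_rates(text):
    """Return the share of tokens that are pronouns and the share that are articles."""
    # Lowercase the text and keep runs of letters and apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    if total == 0:
        return {"pronouns": 0.0, "articles": 0.0}
    counts = Counter(tokens)
    pronouns = sum(counts[word] for word in PRONOUNS)
    articles = sum(counts[word] for word in ARTICLES)
    return {"pronouns": pronouns / total, "articles": articles / total}


# Hypothetical sample texts, invented for this example.
before = "We went to the park and we saw a movie."
after = "I stayed home. I think I need time."

print(function_word_rates(before))
print(function_word_rates(after))
```

Comparing rates like these across many posts and over time is one simple way a shift in word use could be spotted; real analyses would rely on validated dictionaries and far larger samples.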

With NLP, the scope of language sources has expanded far beyond what can be scored by hand. For scale, one study differentiated the language of loneliness from depression using 3.4 million Facebook posts, while another leveraged 300,000 X posts to understand public opinion surrounding nuclear energy.

Beyond an increase in volume, NLP methods provide the ability to track relationships on a previously inaccessible timescale. Tracking 6,800 Reddit users who posted about a breakup, Pennebaker’s team analyzed over 1 million posts from those participants in the year before and after the breakup to understand changing patterns in analytic thinking, cognitive processes, anxiety and self-focus. 

A graphic outlining what social scientists can extract from digital text using natural language processing. Left: A box lists digital text input: Assignments, blogs, digital conversations, live conversations, phone conversations, text messages, Reddit, X (Twitter). Right: Nested rectangles listing output information at each level. The smallest, behavior of the individual: personality and individual differences, values and behaviors, affect, mental health. Medium, interpersonal behavior: morality, ideological behavior, cross-cultural differences. Largest, behavior of the group and society: status and leadership, deception, persuasion, close relationships.
Natural language processing can help social scientists extract behavioral information from individuals, relationships, groups and society.

“We discovered that people’s writing styles changed in the months before the breakup in ways that even the Reddit users didn’t see. Even though many claimed that the breakup ‘came out of nowhere’, their language use suggested that something was going on,” said Pennebaker.

Along with research advancements, NLP also raises issues in data use ethics. Privacy concerns arise, particularly surrounding social media data, if personally identifying information is leaked.

The field of ethical AI is growing, identifying and addressing weak points in the technology. Ultimately, the researchers say, laws must clearly outline how personal data can and cannot be used to protect users. In the meantime, researchers can increase trust in their research by including clear ethics statements on data permissions and how AI was used.

Another concern is that over time, NLP models become more of a ‘black box,’ meaning it is difficult for a human to trace the path from model input to output. This lack of transparency can become a problem for social scientists who want to understand which language cues a large language model links to certain behaviors. As long as researchers stay mindful of these drawbacks, NLP has a lot to offer social scientists as they distill information from digital text.

While the review focused on how NLP can make advances in social sciences, those advances can then feed back into improving NLP, creating a “virtuous cycle” of discovery.

“I believe we are at a point where NLP, and more broadly AI, can benefit in the same way if not more from the findings in human behavior. It’s an exciting time for the intersection between these two fields,” said Mihalcea. 

Researchers from Middlebury College, University of Texas at Dallas, Max Planck Institute for Intelligent Systems and Oakland University also contributed to this research.

Full citation: “How developments in natural language processing help us in understanding human behaviour,” Rada Mihalcea, Laura Biester, Ryan L. Boyd, Zhijing Jin, Veronica Perez-Rosas, Steven Wilson, and James W. Pennebaker, Nature Human Behaviour (2024). DOI: 10.1038/s41562-024-01938-0
