Computational Text Analysis & Its Applications

We are awash with data. Eighty percent of that data is unstructured, and the volume of unstructured data is growing by 55 to 65 percent annually. Case in point: in 2022, 500 hours of video content were uploaded to YouTube every minute. Much of this unstructured data is text, or “natural language” data, which accounts for approximately three-quarters of all recorded digital data. “Text” includes, but is by no means limited to, websites, blogs, social media posts, research papers, news articles, and transcripts. In this age of AI, that text also includes what Generative AI tools (rather than human authors) compose.

By its very nature, unstructured data is difficult to search or analyze, and parsing it in its entirety is time-consuming. Computational Text Analysis (CTA), an umbrella term for various digital tools and quantitative techniques that harness the power of computers and software, offers an alternative approach known as “distant reading” or “text mining.” With CTA, Dartmouth researchers can gather vast amounts of unstructured text and examine it all at once, when previously they were limited to “close reading” a single text or a handful of texts at a time.
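As a rough illustration of distant reading, and not a description of any tool Dartmouth researchers actually use, the idea can be sketched in a few lines of Python: instead of reading each text closely, count how often every word appears across a whole collection. The three-sentence corpus and the `term_frequencies` helper below are hypothetical, invented only for this example.

```python
import re
from collections import Counter


def term_frequencies(texts):
    """Count how often each lowercase word appears across all texts."""
    counts = Counter()
    for text in texts:
        # Lowercase the text and keep runs of letters and apostrophes as words.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


# Hypothetical mini-corpus, made up for illustration only.
corpus = [
    "The theater opened in Paris.",
    "Paris audiences filled the theater every night.",
    "A new play opened to mixed reviews.",
]

freqs = term_frequencies(corpus)

# Show the most frequent words across the whole corpus at once.
print(freqs.most_common(5))
```

A real project would work with thousands of documents and would likely add steps such as removing common words, but the principle of summarizing a whole corpus in one pass is the same.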

Head of Research Facilitation at Dartmouth Libraries, Lora Leligdon, shares that text analysis enables researchers to go beyond individual texts, revealing meaningful trends across vast datasets while connecting scholars across disciplines. When we pair close reading and text mining, we can develop new insights, make different interpretations, and ask new kinds of questions.

Computational Text Analysis empowers scholars to go beyond the constraints of close reading, analyzing large bodies of texts to uncover hidden narratives and emerging patterns.

Lora Leligdon

PARTNERS IN ADVANCING RESEARCH WITH AI

At Dartmouth Libraries, we’re partners in advancing these technologies, including Dartmouth’s homegrown AI chat app, DartmouthChat, which gives students, faculty, and staff access to ten large language models. We also subscribe to research support tools, including ProQuest TDM and Transkribus, to accelerate Dartmouth research. ProQuest TDM enables text and data mining (TDM) of large datasets from ProQuest's vast collection of content, including newspapers, scholarly articles, dissertations, and other publications. Transkribus uses machine learning and artificial intelligence to recognize, transcribe, and search historical documents. It transforms handwritten and printed text into digital format, improving access to, and search and analysis of, historical texts.

We’re also spearheading initiatives incorporating CTA tools to parse unstructured texts at scale. Research Data Services team members recently launched a community of practice, “Text Analysis in a World of AI” (TAWAI). It aims to blend interdisciplinary communication and collaboration to solve problems while advancing and amplifying scholarship. TAWAI brings together scholars, students, and staff from across Dartmouth, including Scott Sanders, Associate Professor of French, and Salar Khaleghzadegan, MPP and PhD student at The Dartmouth Institute for Health Policy and Clinical Practice, who share an interest in CTA. Participants range from novices to expert practitioners and experienced teachers.

COOPERATIVE RESEARCH TOOLS & METHODS

Each month, TAWAI meets to discuss, examine, and collaboratively experiment with new tools, models, methods, and other developments in this field. Tools range from text databases to out-of-the-box software, text analysis programming packages, and both open-source and proprietary Generative AI tools. Both inter- and transdisciplinary, these meetings, by design, offer engaging materials and activities that researchers can adapt and use across the sciences, arts, humanities, and vocational fields like engineering and business. Together, participants focus on applying and adapting text analysis methods and tools (integrated with language models) to answer domain-specific research questions, rather than developing the tools or models themselves. This group also explores the ethical considerations and societal impacts of AI-driven text analysis.

Jeremy Mikecz, Research Data Scientist Specialist and lead TAWAI facilitator, shares the benefits of TAWAI.

By bringing together scholars with varied expertise, the group creates a dynamic environment for knowledge exchange and innovation, allowing participants to discover new methodologies and perspectives that can enrich their individual projects.

Jeremy Mikecz

In a recent session, for example, Scott demonstrated his use of large language models such as Claude and GPT-4 to help him process and analyze his corpus of eighteenth-century French theater texts (including playscripts and theater performance schedules). Most interestingly, session participants found that Scott’s methods “travel”; that is, others can just as easily apply them to sociology interviews, literary studies of novels, or analyses of social media posts.

That cross-disciplinary work is part of what excites Salar about being in this group. He writes, “TAWAI has allowed me to connect with experts from different departments at Dartmouth, such as French and Biology, who share similar methodological interests. These interactions have broadened my perspective and opened up opportunities for exciting collaborative projects.” He adds that “the group’s dynamic discussions and exposure to novel research methods keep me at the forefront of emerging trends, enriching my work and inspiring creative approaches to my research projects.”

These outcomes are just a peek into how TAWAI’s collaborative setting not only facilitates skill development through targeted training sessions but also fosters the exchange of ideas across disciplinary boundaries.

Emerging tools like AI are quickly becoming integral to all aspects of our lives. TAWAI plays a crucial role in breaking down academic silos at Dartmouth by fostering discussions on how to navigate these tools in both academic and professional contexts.

Salar Khaleghzadegan, MPP

Ultimately, the synergy generated by this diverse community strengthens research outcomes and promotes the responsible advancement of technology in academia. In partnership with our colleagues across Dartmouth, we're excited to be meeting the moment by providing avenues for our communities to incorporate the latest best practices in AI-enhanced research methodologies.
