Sources
Sentiment Analysis Graphs:
The text corpus I analyzed during this process was the Carletonian provided by the Carleton College Archives. I used the TextBlob Python library to conduct this analysis, and frequently referenced the TextBlob documentation while completing this part of my project.
Word Frequency Graphs:
The data for these graphs comes from the Carletonian text files provided by the Carleton College Archives. I used RStudio to process and prepare these files for analysis, and Flourish to create the interactive visualizations.
Methods
Sentiment Analysis Graphs:
The data used to create these plots was gathered using the TextBlob Python library, which includes a sentiment analysis tool. (Note: this tool does not use language models; instead, the library takes a lexical approach, assigning a sentiment score to each token in the text and averaging those scores.) First, I looped over every publication in the dataset and computed a polarity score and a subjectivity score for each one. I then stored these scores in a dataframe, which made it easy to aggregate and average them by year into a new dataset of yearly averages. Finally, I used the Python matplotlib library to graph the polarity and subjectivity scores over the entire time range, producing the plots that you can see above.
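The scoring and yearly averaging steps described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the `(year, text)` input layout and the function names `textblob_scores` and `yearly_averages` are assumptions made for the example, and it uses plain dictionaries from the standard library in place of a pandas dataframe. The only TextBlob calls used are `TextBlob(text).sentiment.polarity` and `.subjectivity`.

```python
from collections import defaultdict
from statistics import mean


def textblob_scores(text):
    """Return (polarity, subjectivity) from TextBlob's default lexicon-based analyzer."""
    # Imported lazily so the rest of the sketch runs without TextBlob installed.
    from textblob import TextBlob
    sentiment = TextBlob(text).sentiment
    return sentiment.polarity, sentiment.subjectivity


def yearly_averages(issues, score=textblob_scores):
    """Average polarity and subjectivity per year.

    `issues` is an iterable of (year, text) pairs, one per publication.
    This layout is hypothetical; the real files may be organized differently.
    """
    by_year = defaultdict(list)
    for year, text in issues:
        by_year[year].append(score(text))
    # Map each year to (mean polarity, mean subjectivity), in chronological order.
    return {
        year: (mean(p for p, _ in scores), mean(s for _, s in scores))
        for year, scores in sorted(by_year.items())
    }
```

The resulting dictionary can be split into a list of years and two lists of averages and passed to matplotlib for plotting. The `score` parameter is there only so the averaging logic can be tried with a stand-in scoring function.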
Word Frequency Graphs:
All data cleaning, processing, and analysis were carried out using RStudio. I began by loading the raw Carletonian text files and extracting their dates from the filenames. I then cleaned the text by converting all characters to lowercase, removing punctuation and extra whitespace, and tokenizing the text into individual words. I removed English stopwords and filtered out non-English words using the hunspell dictionary. From this cleaned dataset, I generated several analytic summaries, including yearly word frequencies, the top positive and negative sentiment words per year, TF-IDF scores to identify uniquely frequent words in each year, and lists of words appearing for the first time in the Carletonian. For the visualizations, I reshaped the data into the wide formats required by Flourish before exporting them as CSV files. After importing these datasets into Flourish, I made only minor adjustments related to labeling, layout, and styling.
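For readers who want to see the shape of this pipeline, here is a rough Python analogue of the cleaning, yearly counting, first-appearance, and TF-IDF steps. The actual work was done in R, so this is only an illustrative sketch under stated assumptions: the tiny `STOPWORDS` set, the `(year, text)` input layout, and all function names are hypothetical, the hunspell dictionary filter is omitted, and the TF-IDF formula shown (term frequency times the log of inverse year frequency) is one common variant that may differ from the one used in the project.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list; a real run would use a full English list.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to"}


def tokenize(text):
    """Lowercase, keep only runs of letters, and drop stopwords."""
    # Note: this splits contractions such as "don't" into "don" and "t".
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOPWORDS]


def yearly_counts(issues):
    """Build a word-frequency Counter per year from (year, text) pairs."""
    counts = {}
    for year, text in issues:
        counts.setdefault(year, Counter()).update(tokenize(text))
    return counts


def first_appearances(counts):
    """For each year, list the words not seen in any earlier year."""
    seen, firsts = set(), {}
    for year in sorted(counts):
        new_words = set(counts[year]) - seen
        firsts[year] = sorted(new_words)
        seen |= new_words
    return firsts


def tf_idf(counts):
    """Score each word per year, treating each year as one document."""
    n_years = len(counts)
    # Number of years in which each word appears at least once.
    year_freq = Counter(word for c in counts.values() for word in c)
    scores = {}
    for year, c in counts.items():
        total = sum(c.values())
        scores[year] = {
            word: (n / total) * math.log(n_years / year_freq[word])
            for word, n in c.items()
        }
    return scores
```

With this variant, a word that appears in every year scores zero, which is why ever-present terms like Carleton drop out of the TF-IDF view even though they dominate the raw counts.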
Data Visualization Analysis
Sentiment Analysis Graphs:
The first graph plots the average polarity of the Carletonian over time. This is really a “sentiment” score, where a positive polarity corresponds to positive sentiment and a negative polarity corresponds to negative sentiment. It is important to note that the sentiment scores are quite low overall. Because the Carletonian is a newspaper, its authors try to maintain a consistent style and a neutral tone, which accounts for the relatively low variation in sentiment. Even so, certain trends stand out. The sentiment score starts off fairly high at the paper’s inception, then trends downward to an all-time low around 1969. I have not examined the exact content of the Carletonian from this period, but several factors may account for the dip. At the time, Vietnam War protests were in full swing, and the LGBTQ rights movement had been ignited by a police raid of the Stonewall Inn. It makes sense that sentiment among college students may have been low. Since then, sentiment has been on the rise, with a notable spike in the early 2000s. This spike showcases one of the limitations of lexicon-based sentiment analysis: it may have been caused by the term “The Great Recession,” which could inflate sentiment scores because the word “great” is scored as positive regardless of context.
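The “great” problem can be illustrated with a toy lexicon scorer. This is a deliberately simplified sketch and not TextBlob's implementation: the `TOY_LEXICON` values are made up for the example, and `lexicon_polarity` is a hypothetical helper that averages the scores of whatever words it recognizes.

```python
# Toy lexicon with made-up polarity values; a real lexicon is far larger.
TOY_LEXICON = {"great": 0.8, "terrible": -1.0}


def lexicon_polarity(text, lexicon=TOY_LEXICON):
    """Average the polarity of words found in the lexicon; 0.0 if none match."""
    words = [w.strip(".,!?\"'").lower() for w in text.split()]
    hits = [lexicon[w] for w in words if w in lexicon]
    return sum(hits) / len(hits) if hits else 0.0


# "great" is scored as positive even though the phrase names an economic crisis.
print(lexicon_polarity("The Great Recession hit campus budgets."))
```

Because the scorer looks words up one at a time, it has no way to tell that “Great” is part of a proper noun here, so a phrase describing hardship comes out with a positive score.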
The second graph shows average yearly subjectivity as determined by TextBlob, again using a lexicon-based approach. Higher scores indicate more subjective text, while lower scores indicate more objective text. While the overall trends appear similar to the preceding sentiment graph, there are some notable differences. In the late 1800s, subjectivity was fairly high. Throughout the early 20th century, the ideal of objective journalism was widely promoted, and this is reflected in the Carletonian as well: subjectivity reached its lowest point around 1946. Since then, the subjectivity of the Carletonian has been on the rise, which again reflects wider trends in journalism in the age of the internet and social media, where it is easier than ever to publish information that is not completely objective.
Word Frequency Graphs:
These visualizations are primarily exploratory, allowing readers to interact with the data and notice patterns across more than a century of Carletonian issues. The top-20 words by year show the expected dominance of campus-specific terms—such as Carleton and college—which appear consistently throughout the dataset and reflect the newspaper’s focus on institutional life. The TF-IDF chart of uniquely frequent words each year and the visualization of words appearing for the first time in the Carletonian provide more unexpected and amusing insights. Some years include surprising or context-specific vocabulary—warthogs in 1964, for example, or gingers appearing for the first time in 2025. These visualizations highlight the ways language shifts in response to cultural moments, student interests, and broader historical events. While not all patterns have a clear explanation, the interactivity of the charts encourages exploration and invites readers to discover small linguistic oddities and trends across Carleton’s history.
Presentation Choice
Sentiment Analysis Graphs:
Because I was already familiar with creating plots using matplotlib in Python, and I already had the data collected in a pandas dataframe, matplotlib seemed like the obvious choice. I had to change some of the default values, such as adding a more informative title and more informative labels on the x and y axes, but this was not too difficult. Throughout the process of creating these visuals, I wanted my plots to be as readable as possible, so I kept the style choices simple and let the data and trends take center stage.
Word Frequency Graphs:
For the TF-IDF results and the top 20 words by year, I created bar chart races because these datasets show gradual shifts over time, and animating the changes makes it easier to follow how certain words rise or fall in prominence from one year to the next. In contrast, the other visualizations use static bar charts paired with sliders. These datasets change so dramatically from year to year that a bar chart race would not have been meaningful or readable. Instead, the interactive controls allow viewers to explore each year independently and uncover interesting patterns at their own pace. Overall, the presentation choices reflect the exploratory nature of the project, emphasizing clarity while still encouraging discovery.

