What is sentiment analysis?

Sentiment analysis is the computational process of extracting the subjective mood of a piece of text, typically placed on a spectrum from negative to positive. It draws on a wide range of fields, from machine learning, natural language processing, and text analysis to biometrics (the analysis of people’s behavior and physical characteristics). Sentiment analysis is common in the corporate world, where companies use it to analyze their advertisements and emails to make sure they convey the right sentiment. It is also often used to analyze online reviews of hotels and products.
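The Python sketch below makes the negative-to-positive spectrum concrete using the simplest possible approach: counting words from small positive and negative word lists. The word lists and the `sentiment_score` function are hypothetical illustrations. This is not the pre-trained MonkeyLearn model we used later in the project.

```python
import re

# Hypothetical mini word lists for illustration only; real sentiment
# lexicons contain thousands of words, often with graded scores.
POSITIVE_WORDS = {"excellent", "support", "gift", "generous", "honor"}
NEGATIVE_WORDS = {"struggling", "difficult", "poor", "decline", "problem"}


def sentiment_score(text: str) -> float:
    """Score text from -1.0 (entirely negative) to +1.0 (entirely positive)."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE_WORDS for word in words)
    negatives = sum(word in NEGATIVE_WORDS for word in words)
    if positives + negatives == 0:
        return 0.0  # no sentiment-bearing words: treat the text as neutral
    return (positives - negatives) / (positives + negatives)


print(sentiment_score("The College received significant support and a generous gift."))
```

For example, `sentiment_score("A gift to the struggling young college")` returns 0.0, because one positive word and one negative word cancel out. A plain word count like this ignores context, negation, and intensity, which is why more sophisticated tools rely on trained models instead.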

Why sentiment analysis?

Carleton’s library had an abundance of microfiche from a wide variety of sources. However, it was difficult to find a large dataset that was both interesting and relevant to analyze, so we selected the college advertising material. Austin Mason recommended sentiment analysis to us as a potential analytical strategy. Initially, we wanted to compare the sentiment of advertising material from different colleges and departments to get a better sense of how each college presents itself.

Process

Initially, we wanted to compare six sections of the college marketing material (including the introduction and the English, chemistry, political science, and art sections) across four types of college: public universities, HBCUs, liberal arts colleges, and community colleges. We were able to find and scan all of these sections for each type of college. However, sentiment analysis requires machine-readable text extracted from the scanned PDFs, and the text Adobe Acrobat extracted was indecipherable.

Here is an example of a scan; below is the text Adobe Acrobat produced from this PDF.
Hielory.

Ctrteton College _... lourtdH by tM MlnMIOta Corif~nce of Congrega-
tional Chu(Ches, undff the rwn• of Northa.ld Coll., on Novftftber 14, ·1866.

Preparatory tchool dMIN bepn i.n S.ptetnbttt, 1861, but it_... not til 1810
. . that tlw lln•ttnd James W. Stroftg took olltt • th• Int pt'Niclen , h•
• collea• d ... _...·formed, ind the Int on-ampus building_... begun. It_...
111"C1 1t th• outwt tha aftn on• ye.u formal chutth control thould end, but
throughout lti fo,.u.ttv• yean, th• College received algniftcant support ind
~Ndton from th• Congregational chutthea. AJthou,h .it t. now autonomOUI
and non-eectarian, th• CoU.g• tttp«ta thew hbtorkal tift and gives continuing
ncognition to ·•h•m through m•mbenhip ln the Cound1 for H'&hn Education
of t_h• United Chutth of Chrilt. • .
ly th• fall of 1871, the Nm• of the Collep been changed to honor 1n
8rly b.ntPflCtor, William C.rl~on 9 Charlettown, M•1chuwtts, who earlier
that ynr had bestow.cl I gift of S50.000 on the •ti:ugling young college. At th.
time, lt WII th• largest 1ingl• contribution ever m.,de to a weetem ~liege, and. it
wu ,nade unconditaoruilJy, wjth no design that the.name of the Colleg• thould
be changed The College nuffntly has an endowment of $182 million and IIM'ta
v.iued•t·li,28'million . • • •
Carl•ton hM alway·• been a co-edaational institution. The original graduating
ct.. in 181, wu compoeed of one man and one woman who follow.cl simi~r
academic program., C.rldon's.curttnt enJ'Qllment of 1,800 continun to include
Mar~y ~u•I numbt-n of men and women.

We therefore had to narrow the scope of what we analyzed, because the only way to continue was to transcribe the files by hand, which is extremely time-consuming and labor-intensive. We went with manual transcription and decided to look only at the introductions of four colleges, one of each type: Carleton College, Howard University, the University of California, Los Angeles (UCLA), and Windward College. Each manually transcribed introduction contains roughly the first 500 words found in the microfiche. These transcribed files were then ready for sentiment analysis.
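We transcribed by hand. If the introductions had been available as digital text, however, trimming each one to a comparable sample could have been automated. The Python sketch below is a hypothetical helper, not part of our actual workflow. It keeps roughly the first 500 words. As an assumption about what "or so" should mean, it also extends the cut to the end of the current sentence so the excerpt does not stop mid-sentence.

```python
def first_n_words(text: str, n: int = 500) -> str:
    """Return about the first n words of text, extended to the end of the
    sentence in progress so the excerpt does not stop mid-sentence."""
    words = text.split()  # splitting on whitespace also drops line breaks
    if len(words) <= n:
        return " ".join(words)
    end = n
    # Keep taking words until the last one closes a sentence or the text runs out.
    while end < len(words) and not words[end - 1].endswith((".", "!", "?")):
        end += 1
    return " ".join(words[:end])


sample = "One two three. Four five six. Seven."
print(first_n_words(sample, n=4))  # prints "One two three. Four five six."
```

Because `split()` and `join()` collapse line breaks and repeated spaces into single spaces, the excerpt loses the brochure's original layout. That loss is harmless for sentiment analysis.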

We used a pre-trained sentiment analysis model from MonkeyLearn. There are dozens of other sentiment analysis tools, but MonkeyLearn was the most accessible, because the others required training your own model or extensive coding, which none of us were familiar with.

Using MonkeyLearn for sentiment analysis is simple. We pasted each transcribed introduction into the text box, and MonkeyLearn returned the sentiment (positive or negative) along with a percentage. For further details on how to run sentiment analysis on your own text, check out this tutorial.

Picture of MonkeyLearn

An example of the sentiment analysis on the UCLA introduction.

Results

Here are the results from the sentiment analysis:

College     Sentiment result
Carleton    71.3% positive
Howard      97.3% positive
UCLA        53.7% negative
Windward    98.7% positive
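Because MonkeyLearn reports a label plus a percentage, the four results can be placed on a single negative-to-positive scale by giving negative results a minus sign. The Python sketch below does this with the numbers from our results. Treating the percentage as a position on the scale is our own simplifying assumption: a percentage attached to a label more likely reflects how confident the model is than how strongly positive or negative the text is.

```python
# MonkeyLearn results from our table: (label, reported percentage).
RESULTS = {
    "Carleton": ("positive", 71.3),
    "Howard": ("positive", 97.3),
    "UCLA": ("negative", 53.7),
    "Windward": ("positive", 98.7),
}


def signed_score(label: str, percentage: float) -> float:
    """Map a label and percentage onto a -1.0 (negative) to +1.0 (positive) scale."""
    sign = 1.0 if label == "positive" else -1.0
    return sign * percentage / 100


scores = {college: signed_score(label, pct) for college, (label, pct) in RESULTS.items()}

# Print colleges from most positive to most negative.
for college, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{college:<10}{score:+.3f}")
```

Sorting by score ranks Windward first, then Howard, Carleton, and UCLA. UCLA's score of -0.537 sits close to zero, which fits reading its introduction as neutral to negative rather than strongly negative.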

Carleton, Howard, and Windward all show positive sentiment, while UCLA’s sentiment is neutral to negative (53.7% negative). It makes sense that most colleges project positive sentiment in their advertising, since they want people to attend. UCLA’s negative result is therefore surprising. One possible explanation is that UCLA, as a popular public university, decided not to spend as much money and time on its advertising as its peers did. Because we analyzed only a small sample of each brochure (roughly 500 words), we wonder whether transcribing the entire brochure would have produced different results. We are also curious whether there is ever a case in which a writer would not want a positive sentiment.