Sources

Since our project centers on corpus analysis of the Carletonian, we are currently treating the college's digital Carletonian archive as our main source. We're unsure of the best citation practice for an entire collection, but the Chicago-style citation for an individual item in an archive looks roughly like this:

Author (if applicable) / Title of the item (Rule) / date of the item (Rule) / item number (if applicable) / series title (if applicable) / series number (if applicable) / name of the collection (if applicable) / collection number (if applicable) / name of the depository or archive / location of depository (if applicable) / URL or DOI or name of database (if applicable)

From the Gould Guide for citing archival material in Chicago format

We also used data from the Google Books Ngram Viewer to compare trends in the Carletonian with trends in English-language books. The Google Books team asks that academic publications citing the Ngram Viewer cite the team's original 2010 paper in Science:

Jean-Baptiste Michel*, Yuan Kui Shen, Aviva Presser Aiden, Adrian Veres, Matthew K. Gray, William Brockman, The Google Books Team, Joseph P. Pickett, Dale Hoiberg, Dan Clancy, Peter Norvig, Jon Orwant, Steven Pinker, Martin A. Nowak, and Erez Lieberman Aiden*. Quantitative Analysis of Culture Using Millions of Digitized Books. Science (Published online ahead of print: 12/16/2010)
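For the comparison to be fair, Carletonian counts need the same kind of normalization the Ngram Viewer uses. It plots a word's frequency as a share of all words (more precisely, all n-grams of the same length) in that year's books, rather than as a raw count. The sketch below shows one way we might compute that share for our corpus. It assumes we already have plain text for each issue paired with its year (for example, OCR text of the front pages). It also assumes a very simple tokenizer, and the function name `relative_frequencies` is our own, not part of any tool:

```python
import re
from collections import Counter, defaultdict


def relative_frequencies(issues, targets):
    """Return {word: {year: share of that year's tokens}} for each target word.

    `issues` is an iterable of (year, text) pairs. Tokenization is a deliberately
    simple assumption: lowercase runs of letters and apostrophes.
    """
    targets = {t.lower() for t in targets}
    totals = Counter()             # total tokens per year
    hits = defaultdict(Counter)    # hits[word][year] = occurrences of word that year
    for year, text in issues:
        tokens = re.findall(r"[a-z']+", text.lower())
        totals[year] += len(tokens)
        for tok in tokens:
            if tok in targets:
                hits[tok][year] += 1
    # Divide by each year's total so years with more text aren't over-weighted;
    # multiply by 100 to match the Ngram Viewer's percentage axis.
    return {
        word: {year: hits[word][year] / totals[year] for year in sorted(totals) if totals[year]}
        for word in sorted(targets)
    }


issues = [
    (1950, "The war and the peace"),
    (1950, "peace talks"),
    (1960, "Peace now"),
    (1970, "Nothing to report"),
]
print(relative_frequencies(issues, ["Peace"]))
```

Dividing by each year's token count matters because the number and length of issues vary over time, so raw counts would partly track how much was printed rather than how often a word was used.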

Format

So far we've been accessing Carletonian issues through the digital archives, where they're stored as searchable PDFs. We're currently downloading the front page of each issue, since that format will work at least for the timelapse we want to create. We don't yet know what process we'll need to convert the page images into XML files (Angie is in touch with Sarah Calhoun about it), but we expect either to obtain existing files from the archives or to convert the pages ourselves with a tool like Docparser.

Rights

The Carletonian archives are part of the Carleton College Archives and are owned by the college. As student researchers, we should be permitted to use the material housed there. According to the Archives website, "Students, faculty, and other researchers are welcome to use the resources housed in the Carleton College Archives." If we end up using the print archives, we'll need to complete a registration first.

Privacy/Ethics

Over the years, students, alumni, faculty, staff, community members, and even people outside the Carleton community have appeared in the Carletonian in one way or another. Students and faculty appear most often (quoted in articles, as the subjects of articles, or as contributors to the paper), though most of them are no longer at Carleton. Regardless, handling privacy concerns is primarily the newspaper's responsibility. Contributors know that what they put in the paper will become publicly accessible, and editors navigate the ethics of publishing features, opinion pieces, and so on (sometimes controversially). That said, we can still handle the data with sensitivity. We're performing a generic analysis of words, not of individuals, and individual people will probably not emerge in our discussion of results. Our aim is to look at trends in campus discourse.

We may be analyzing trends in words related to sensitive issues such as race, gender, and politics. We've already noticed that some of the earlier issues of the paper use slurs and offensive language. Because our project is directly concerned with language use over time, we don't want to ignore those words, but we also don't want to perpetuate their use under the guise of "research." If they enter into our project, we'll censor them, and we won't make them a central focus or spotlight them just to demonstrate their presence. Thoughtful discussion of whatever results we get will also be important in navigating this concern.
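The Python sketch below shows one way we might do the censoring. It assumes a hand-curated blocklist, and the words in the example are harmless placeholders, not real slurs. The idea is to apply it only to text we display or quote, such as chart labels and excerpts, while the raw corpus stays untouched for counting. The function name `censor` and the choice to keep the first letter are our own assumptions:

```python
import re


def censor(text, blocklist, keep=1):
    """Mask each whole-word, case-insensitive match from `blocklist`,
    keeping only its first `keep` characters visible."""
    if not blocklist:
        return text
    # Longest words first so a longer term isn't pre-empted by a shorter one;
    # \b anchors keep us from masking substrings inside unrelated words.
    alternatives = "|".join(re.escape(w) for w in sorted(blocklist, key=len, reverse=True))
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)

    def mask(match):
        word = match.group(0)
        return word[:keep] + "*" * (len(word) - keep)

    return pattern.sub(mask, text)


print(censor("A badword here, BADWORD there, badwords not.", {"badword"}))
```

Whole-word matching is a trade-off: it avoids masking innocent words that happen to contain a blocked string, but it also misses inflected forms (like "badwords" above) unless those forms are added to the list explicitly.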
