Big Data and Big Novels: Text Mining the Prose of the Russian Revolution

Website: http://slavic.ucla.edu/person/sean-griffin/
Principal Investigator(s): Dr. Sean Griffin, Slavic

This course on the literature of the Russian Revolution balances the customary close reading and discussion of texts with “humanities labs” that enable students to experiment directly with new technologies. The course aims to teach undergraduate students not only traditional literacy (critical thinking, argumentation, writing) but also technological literacy (programming, text mining, data visualization).

Dr. Sean Griffin and his students will read three major novels by twentieth-century Russian writers: Doctor Zhivago by Boris Pasternak, The Master and Margarita by Mikhail Bulgakov, and We by Evgenii Zamiatin. At the same time, they will learn the basics of topic modeling and data visualization software, digital technologies associated with the increasingly important computer science field of machine learning. Students will still read and enjoy the literature, listen to brief lectures on literary history and textual criticism, and profit from lively class discussions. But they will also learn to experiment with much larger data sets: not one novel, but hundreds of novels. In so doing, they will push the boundaries of what is possible for research in the humanities while acquiring valuable new technological skills that they can take with them beyond the class and out into the world.

CDH is supporting this project by providing topic modeling expertise and training to Dr. Griffin, cleaning the data set used for the topic modeling, leading the topic modeling and data visualization training sessions for his students, and providing technical support to the students as they work on their research projects.