Big, Smart, Human Data

Note: This post was written by a student in the course and reposted here. Source: Big, Smart, Human Data.

Culturally, intuitively, and even scientifically, there’s a gap between the humanities and data. We are so dedicated to the separation between science (data) and the arts (humanities) that we trace the dichotomy all the way to the two hemispheres of the human brain (Roopika Risam touches on this in Chapter 5 of her book New Digital Worlds). Digital humanities raises the question of whether we struggle to conceptualize humanities data because all of our other methods for handling data are so far removed from what we understand to be human. We think of data as the stuff of computer algorithms and data frames so large that they are practically inaccessible to us except through convoluted code (or maybe that’s just me as a student of economics).

A diary page or, with a quick mental shift, a piece of humanities data before it’s been digitized. Source: Wikimedia Commons

Before learning to look at the world through a digital humanities lens, I wouldn’t have considered Macbeth to be data. I never would have considered that my social media feeds function as archives. And I might even have scoffed at the idea that a diary page could be digitized and studied through computational analysis (or rather, I would have scoffed at the idea that it should be). Ventures like the Dear Data project show how full our lives are of meaningful events waiting to be recorded and organized (the number of times we look at a clock, or the number of times we complain in a given week, for example). The reality is that these exact forms of data have incredible potential to be organized and used for humanistic inquiry. One of the central objectives of digital humanities is answering the theoretical, methodological, social, and technical questions that arise when data represents so much more than numbers in a vector.

Two postcards cataloguing a week full of complaints from the Dear Data Project. Source: Dear Data Project

Christof Schöch met part of this question head-on in his article “Big? Smart? Clean? Messy? Data in the Humanities.” In it, he introduces the nuts and bolts of humanities data (structured and unstructured, or discrete and continuous) and presents two core types of data: big data and smart data. Big data follows a quantity-over-quality mentality: it collects a lot of data to enable far-reaching comparison. Smart data, by contrast, is more structured and intentionally organized.
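To make the structured/unstructured distinction a little more concrete, here is a small Python sketch. It is not from Schöch’s article, and the entries are invented for illustration in the spirit of the Dear Data complaint postcards: the same week of complaints is kept once as free-form diary text and once as structured records. The `Complaint` record and its `day`/`topic` fields are hypothetical choices, not any standard schema.

```python
from collections import Counter
from dataclasses import dataclass

# Unstructured: free-form, diary-style notes (hypothetical example entries).
diary_text = """Monday: complained about the bus being late.
Tuesday: nothing to complain about.
Wednesday: complained about the weather, then about my inbox."""


# Structured ("smart"): a person has decided what counts as one complaint
# and recorded each one with named fields (a hypothetical schema).
@dataclass
class Complaint:
    day: str
    topic: str


complaints = [
    Complaint("Monday", "bus"),
    Complaint("Wednesday", "weather"),
    Complaint("Wednesday", "inbox"),
]

# With structure, "how many complaints per day?" is a one-line count.
per_day = Counter(c.day for c in complaints)
print(per_day)

# With raw text, we can only approximate the same question, e.g. by keyword
# matching. This undercounts Wednesday, where one "complained" covers two topics.
naive_counts = {
    line.split(":")[0]: line.count("complained")
    for line in diary_text.splitlines()
}
print(naive_counts)
```

The structured version gets the count right only because someone chose, in advance, what a single complaint is; that choice is exactly the kind of human judgment that separates carefully organized data from a raw pile of text.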

Perhaps the most pertinent distinction between the two types of data, for my purposes, is the human element of smart data. Computers are smart, but only because we tell them how to be. Working with big data offers huge advantages in sample size and predictive capacity, but it shuts out many of the complexities and nuances that emerge when real human brains work through smaller amounts of data. We can synthesize and contextualize; algorithms can’t do that quite yet (at least not in the same way). In the aforementioned article, Schöch, who showed a preference for smart data in his own research, concluded that humanities data should fall somewhere in the middle.

Intuitively, this makes sense: humanistic data should not be used and analyzed in a way that completely strips away the human elements of collection, organization, analysis, and education. Humanistic data analysis can’t and shouldn’t be completely mechanized. Instead, working with humanistic data to explore humanistic questions offers even more insight into the many ways digitization shapes us and vice versa.

Digital humanities has two guiding principles: using computational tools to further explore humanistic questions, and asking humanistic questions about computational tools and the online universe. Discussions of data in digital humanities should also be two-pronged: How can data analysis shape the humanities? And how can the humanities help us better understand data analysis? The majority of this post is dedicated to the former; the rest is dedicated to the latter.

In statistics, it is understood that individual observations come from a distribution of many observations. One sample can’t communicate everything about its population; a single sample just isn’t complex enough to do the job. Statisticians have to jump through countless hoops, in the form of p-values and F-statistics, to make mathematically sound inferences and predictions about a population from what they learn from a sample. What if the same care (a kind of academic reverence, almost) were given to humanistic data? Humanistic data, and cultural heritage data in particular, deserve the same respect for the complexities inherent in both the individual observation (read: human being) and the population (read: culture). Data that express the beliefs and traditions of a culture are precious. Even data that track how many times a person walks through a door in a week are precious, because humanistic data capture something about humanity (which, after all, is one of our most precious collective possessions).

By connecting the idea of data with the concept of humanity, I think that we’re one step closer to using data as a tool for answering humanistic questions and not as a means by which to isolate and reject any human element from analysis. 


Dear Data. “THE PROJECT.” Accessed October 15, 2020.

Schöch, Christof. “Big? Smart? Clean? Messy? Data in the Humanities.” Journal of Digital Humanities. Accessed October 15, 2020.

Risam, Roopika. New Digital Worlds: Postcolonial Digital Humanities in Theory, Praxis, and Pedagogy. Evanston, Illinois: Northwestern University Press, 2019.