When my CEL Scholar colleague, Dr. Cora Wigger, and I describe our CEL project about data literacy and data justice to colleagues, we’re confronted with the same problem: every academic in every discipline defines “data” differently. As a historian and an economist, we work with what seem, at first glance, to be very different kinds of data. I work with what most would consider qualitative data: primary sources pulled from archival repositories or digitized copies. Cora works with traditional quantitative data: spreadsheets about housing transactions and school finances. However, and especially as my archival sources have increasingly become what I call “nineteenth-century spreadsheets,” we’ve come to understand our data in similar ways. The categorization of data as “qualitative” or “quantitative” is no longer so useful to us. Instead, as we explained in our first post, we’re more interested in the ways in which the human origins of data and tools shape research and decision-making based on that data.

For that reason, our project takes a wide definition of data, including both quantitative and qualitative data. It asserts that researchers and students should evaluate both forms of data similarly in order to understand the extent and limits of the data, a question essential to data literacy. We have investigated many definitions of data literacy to delimit our definition for the purpose of this project. Luckily for us, researchers in education, STEM, and library and information science have increasingly studied how students, teachers, and others define data and how those definitions influence their data literacy. As defined by Wolff et al. (2016), data literacy is “the ability to ask and answer real-world questions from large and small data sets through an inquiry process, with consideration of ethical use of data.” Fontichiaro and Oehrli (2016) instead situate data literacy as a subset of “information literacy.” Different frameworks focus on topics such as the use and interpretation of statistics, ethics, data visualization, selecting, collecting, and/or cleaning data, communicating findings, and many other sub-topics. In our research, we’re learning that we can’t adopt a single data literacy approach because these approaches are so context-specific. In this and future posts, we’ll outline the framework we are adopting and adapting in order to define data literacy. This post begins with data identification, which we view as the foundational step in building students’ data literacy.

Data Identification 

Our definition and framework start with the existence of data itself, or the ability to identify data as data. Students should be able to find data, evaluate its quality, and determine both what questions and measures the data can be used for and what the data can’t tell us. While we are calling this “data identification,” Ridsdale et al. (2015) call this “data collection.” Many definitions of research and data literacy take into account the multi-step processes involved in collecting data firsthand, such as survey collection (Ghodoosi et al. 2023). However, the modern data-driven world is filled with examples of secondary and administrative data, not originally collected with research in mind, and students are ill-equipped to question these sources. In our research, and in our experiences teaching students to use economic data and historical primary sources in our 1000-level US history surveys, 2000-level Statistics for Decision Making, and 3000-level Economics of Racial Inequality courses, we have found that students also need to articulate where data came from, who created it, and why in order to analyze and interpret subsequent research. Assignments like Cora’s data biographies are intended to build these skills, as are primary source analysis methods such as the source, contextualize, corroborate, observe (SOCC) analyses refined by Cate Denial and Bringing History Home, which I often use in my classes.

Our focus on data identification, that is, on evaluating the extent and limits of data, closely aligns with the field of information literacy, an approach that “addresses the broader access and use of data as a source of information” (Kim, Hong, and Evans 2024). It’s not at all surprising that a legal historian and an economist who uses administrative data would borrow from this field. Our research depends on data that was not created for researchers. It was created by a diarist, an antiquarian, or a busy and underpaid bureaucrat who’s just trying to meet federal reporting requirements. Information literacy is rooted in analyzing or critically engaging with data, but as Kim, Hong, and Evans note, it doesn’t prioritize quantitative analysis or skill-building. Instead, it emphasizes “the importance of making data useful and converting data into relevant information” for storing, sharing, and reusing data (Carlson et al. 2011; Storksdieck 2016). Similarly, we want our students to understand how to select and synthesize data before integrating it with other kinds of information.

Conclusion 

In researching different approaches to data literacy, we’re seeing how many of these approaches are multi-disciplinary and multi-step processes. Likely, there are other sub-fields or disciplines we haven’t yet named, and there are definitely many other related studies we haven’t cited here. We plan for this to be the first post in a series on building the framework of data literacy for this project. Data identification is just step one, laying the foundation for other topics like data transformation and critical data studies.  


References 

Carlson, Jacob, Michael Fosmire, C.C. Miller, and Megan Sapp Nelson. 2011. “Determining Data Information Literacy Needs: A Study of Students and Research Faculty.” Portal: Libraries and the Academy 11(2): 629–57. https://doi.org/10.1353/pla.2011.0022. 

Fontichiaro, Kirstin, and Jo Angela Oehrli. 2016. “Why Data Literacy Matters.” Knowledge Quest 44(5): 21–27.

Ghodoosi, Bahareh, Tracey West, Qinyi Li, Geraldine Torrisi-Steele, and Sharmistha Dey. 2023. “A Systematic Literature Review of Data Literacy Education.” Journal of Business & Finance Librarianship 28(2): 112–27. https://doi.org/10.1080/08963568.2023.2171552.

Kim, Jeonghyun, Lingzi Hong, and Sarah Evans. 2024. “Toward Measuring Data Literacy for Higher Education: Developing and Validating a Data Literacy Self-Efficacy Scale.” Journal of the Association for Information Science and Technology 75(8): 916–31. https://doi.org/10.1002/asi.24934.

Ridsdale, Chantel, James Rothwell, Michael Smit, Hossam Ali-Hassan, Michael Bliemel, Dean Irvine, Daniel Kelley, Stan Matwin, and Bradley Wuetherick. 2015. Strategies and Best Practices for Data Literacy Education: Knowledge Synthesis Report. http://hdl.handle.net/10222/64578.

Storksdieck, Martin. 2016. “Critical Information Literacy as Core Skill for Lifelong STEM Learning in the 21st Century: Reflections on the Desirability and Feasibility for Widespread Science Media Education.” Cultural Studies of Science Education 11(1): 167–82. https://doi.org/10.1007/s11422-015-9714-4.

Wolff, Annika, Daniel Gooch, Jose J. Cavero Montaner, Umar Rashid, and Gerd Kortuem. 2016. “Creating an Understanding of Data Literacy for a Data-Driven Society.” The Journal of Community Informatics 12(3): 9–26. https://doi.org/10.15353/joci.v12i3.3275.


About the Author  

Amanda Laury Kleintop is an assistant professor of history and a 2025–2027 CEL Scholar. She specializes in the US Civil War, Reconstruction, and emancipation. Her book, Counting the Costs of Freedom (2025), explores debates about compensating former enslavers in the US and profitmaking in slavery. It inspired her historical data and digital humanities project on African American soldiers in the Border States.  

How to Cite This Post 

Kleintop, Amanda. 2025. “Defining Data and Data Literacy, Step 1.” Center for Engaged Learning (blog), Elon University. September ##, 2025. https://www.centerforengagedlearning.org/defining-data-and-data-literacy-step-1.