Emerging Research Methodologies in the Age of Artificial Intelligence and Big Data
By Richard Amoako
As a doctoral student of Evaluation, Statistics, and Methodology (ESM), I am constantly immersed in a world of evolving research methods. Advanced technologies and artificial intelligence (AI) have brought significant shifts to our research space, especially influencing how data is collected, analyzed, and reported. Methodological adaptations prompted by digital advancements shape how researchers address complex questions across disciplines.
![](https://cehhs.utk.edu/elps/wp-content/uploads/sites/9/2024/12/H-Shot1_1-1-683x1024.jpg)
Hello! I’m Richard D. Amoako, a third-year doctoral student in the ESM program at the University of Tennessee, Knoxville. In this post, I delve into some research methods and methodologies that are emerging in education and the broader social sciences. By emphasizing methodologies central to my studies, I hope to show how technological advancements are reshaping research. I will start by contrasting traditional and emerging methods, proceed to an in-depth exploration of public internet data mining, and conclude with challenges, ethical considerations, and a look at the future of these developments.
Traditional vs. Emerging Methodologies
The research space has changed significantly in recent years (Selwyn, 2014). Traditional methods such as cross-sectional studies, survey research, longitudinal research, randomized controlled trials, and qualitative interviews have long been the backbone of social science research. These methods have provided valuable insights into human behavior, social phenomena, and educational outcomes. However, the advent of Big Data, AI, and internet-based research has introduced dynamic alternatives that adapt to the digital age’s unique demands and possibilities.
Emerging methodologies, such as data-driven and AI-enhanced methods (including Natural Language Processing, or NLP), adaptive research designs, computational ethnography, crowdsourced data collection, public internet data mining, and multimodal research, reflect a shift toward interdisciplinary work, diverse datasets, and real-time data analysis. NLP, for example, makes it feasible to analyze massive text datasets, transforming qualitative data analysis through machine learning. Adaptive research designs adjust based on data collected as the study unfolds, which enables iterative improvement and is particularly beneficial in health and education. Computational ethnography offers new ways to analyze digital behavior and cultures, making it possible to study online communities on platforms like Reddit or Twitter. Multimodal research combines data from diverse sources, such as text, images, audio, video, physiological signals, and gestures, enabling researchers to gain a richer, more complete understanding of a phenomenon.
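To make the NLP idea slightly more concrete, here is a minimal Python sketch of one of its most basic building blocks: turning short open-ended responses into word counts, a common first step before topic modeling or machine-learning classification. The responses and the small stopword list are hypothetical, and real projects would use an established NLP library rather than this hand-rolled tokenizer.

```python
import re
from collections import Counter

# Hypothetical open-ended survey responses (illustrative only)
responses = [
    "Online lessons were flexible but I missed my classmates.",
    "Flexible schedules helped, but the lessons felt isolating.",
    "I missed hands-on labs and my classmates.",
]

# A tiny, hypothetical stopword list; real projects use a much fuller list
STOPWORDS = {"i", "my", "the", "and", "but", "were", "felt", "helped"}

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens (hyphenated words kept)."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())

def term_frequencies(docs: list[str]) -> Counter:
    """Count non-stopword tokens across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(t for t in tokenize(doc) if t not in STOPWORDS)
    return counts

for term, n in term_frequencies(responses).most_common(4):
    print(term, n)
```

Counting words ignores context and meaning, which is exactly where machine-learning NLP models go further, at the cost of the transparency concerns discussed below.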
Furthermore, crowdsourced data collection and citizen science projects tap into citizen participation, gathering data from thousands of individuals quickly, enabling massive-scale studies that would be excessively costly or impractical using traditional methods. Collectively, these methodologies represent an evolving toolkit for researchers who seek to explore complex phenomena in real-world contexts beyond traditional controlled environments. They not only increase the volume of data available but also democratize the research process, allowing non-scientists to contribute to scientific endeavors.
These emerging methods have immense potential but also present challenges. AI models, such as those used in NLP, often lack transparency, making it hard to understand how they arrive at decisions or insights, which can undermine trust. Additionally, big data from sources like crowdsourcing may not be representative of the population of interest, introducing biases that can limit the accuracy and generalizability of the results.
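One common, though only partial, response to the representativeness problem is post-stratification weighting: each respondent is weighted by the ratio of their group's share in the target population to that group's share in the sample. The Python sketch below assumes hypothetical group labels and population shares (for example, from census figures); it corrects only for imbalances in the groups you weight on, not for other forms of self-selection.

```python
from collections import Counter

def poststratification_weights(sample_groups, population_shares):
    """Return a weight per group: population share / sample share.

    sample_groups: list of group labels, one per respondent.
    population_shares: dict mapping group label -> known population proportion.
    This sketch assumes every sampled group appears in population_shares.
    """
    n = len(sample_groups)
    counts = Counter(sample_groups)
    weights = {}
    for group, pop_share in population_shares.items():
        if counts[group] == 0:
            raise ValueError(f"No respondents in group {group!r}; cannot weight it")
        sample_share = counts[group] / n
        weights[group] = pop_share / sample_share
    return weights

# Hypothetical crowdsourced sample that over-represents urban respondents
sample = ["urban"] * 80 + ["rural"] * 20
population = {"urban": 0.6, "rural": 0.4}  # assumed population shares

weights = poststratification_weights(sample, population)
print(weights)
```

Here urban respondents are down-weighted (about 0.75 each) and rural respondents up-weighted (2.0 each), so the weighted sample matches the assumed population mix.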
To read more about these methods, I have included some helpful resources at the end of this post for your reference.
Deep Dive into Public Internet Data Mining Methods
In this age of digital data abundance, Public Internet Data Mining stands out as a potent research methodology with broad applications across fields like education, technology, and the social sciences. I came across this approach through one of the readings in my educational data science foundation class. A notable paper that uses this approach is by Kimmons and Veletsianos (2018), who examined how public internet data mining can be used to analyze trends and patterns in online interactions by collecting data from public websites, social media, and forums. Their study highlighted how researchers can work with large datasets by employing tools such as SQL queries, web scraping, or APIs (Application Programming Interfaces) to extract and analyze data from digital platforms.
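To illustrate the SQL part of this toolkit, here is a minimal Python sketch using the standard library's sqlite3 module. It assumes a hypothetical table of posts that have already been collected; it loads them into an in-memory database and uses a GROUP BY query to count posts and total likes per platform.

```python
import sqlite3

# Hypothetical posts already collected from public platforms
posts = [
    ("twitter", "2024-03-01", "New coding club at our school!", 12),
    ("youtube", "2024-03-02", "Great tutorial for teachers", 40),
    ("twitter", "2024-03-05", "Parents night recap", 3),
]

def posts_per_platform(rows):
    """Load rows into an in-memory SQLite table and summarize them by platform."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE posts (platform TEXT, posted_on TEXT, body TEXT, likes INTEGER)"
        )
        # Parameterized inserts avoid quoting problems with scraped text
        conn.executemany("INSERT INTO posts VALUES (?, ?, ?, ?)", rows)
        query = """
            SELECT platform, COUNT(*) AS n_posts, SUM(likes) AS total_likes
            FROM posts
            GROUP BY platform
            ORDER BY platform
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()

print(posts_per_platform(posts))
```

For larger projects the same query would typically run against a persistent database file or server rather than `:memory:`.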
Public internet data mining opens new avenues for research by enabling researchers to gather large quantities of data from diverse public platforms. For instance, using Python or R, a researcher might automate the extraction of public data, such as tweets or YouTube comments, to examine trends in educational attitudes or analyze discussions surrounding public policies. In one of their studies, Kimmons and Veletsianos (2016) demonstrated how they extracted data from K-12 websites and social media to analyze technology use patterns and engagement in online discussions.
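Automated extraction through an API usually means requesting results page by page until none remain. The Python sketch below assumes a hypothetical JSON API that returns `{"items": [...], "next_page": ...}`; the endpoint URL, query parameters, and field names are illustrative and do not correspond to any real platform, and real APIs (YouTube's, for instance) require keys and impose rate limits and terms of use. The fetch function is passed in as a parameter so the paging logic can be tried offline.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint; real APIs differ in URLs, parameters, and authentication
BASE_URL = "https://api.example.org/comments"

def http_get_json(url):
    """Fetch a URL and decode its JSON body (requires network access)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)

def fetch_all(query, fetch=http_get_json, max_pages=50):
    """Collect items page by page until no next page is reported (or max_pages)."""
    items, page, pages_read = [], 1, 0
    while page is not None and pages_read < max_pages:
        url = BASE_URL + "?" + urllib.parse.urlencode({"q": query, "page": page})
        data = fetch(url)
        items.extend(data.get("items", []))
        page = data.get("next_page")  # assumed to be None on the last page
        pages_read += 1
    return items

# Offline usage: a fake fetcher that mimics two pages of results
def fake_fetch(url):
    params = urllib.parse.parse_qs(urllib.parse.urlparse(url).query)
    if params["page"][0] == "1":
        return {"items": ["comment A", "comment B"], "next_page": 2}
    return {"items": ["comment C"], "next_page": None}

print(fetch_all("school policy", fetch=fake_fetch))
```

The `max_pages` cap is a design choice to keep an unexpectedly long result set from running indefinitely.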
In R, for example, researchers can use web scraping to extract data from publicly accessible web pages, or query the web-based APIs that many platforms provide, which return data in a structured form.
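The scraping examples referenced in this post were written in R. As a language-neutral illustration, here is a minimal Python sketch that uses only the standard library's `html.parser` to pull headline text and links out of a page. The HTML fragment and the `news-item` class are hypothetical; in practice the page would be downloaded first (after checking the site's terms of use and robots.txt), and a dedicated scraping library would usually be more robust.

```python
from html.parser import HTMLParser

# Hypothetical fragment from a public school website
PAGE = """
<ul>
  <li class="news-item"><a href="/news/robotics">Robotics team wins regional</a></li>
  <li class="news-item"><a href="/news/board">School board meeting notes</a></li>
  <li class="ad"><a href="/promo">Sponsored</a></li>
</ul>
"""

class NewsLinkParser(HTMLParser):
    """Collect (text, href) pairs for links inside <li class="news-item">."""

    def __init__(self):
        super().__init__()
        self.in_item = False
        self.current_href = None
        self.results = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li":
            # An element may carry several classes, so split before checking
            classes = (attrs.get("class") or "").split()
            self.in_item = "news-item" in classes
        elif tag == "a" and self.in_item:
            self.current_href = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "a":
            self.current_href = None
        elif tag == "li":
            self.in_item = False

    def handle_data(self, data):
        # Only keep non-blank text that sits inside a tracked link
        if self.current_href and data.strip():
            self.results.append((data.strip(), self.current_href))

parser = NewsLinkParser()
parser.feed(PAGE)
print(parser.results)
```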
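For detailed information or training on web scraping with R’s rvest package (e.g., Keyes, 2021) and on API queries in R (e.g., Dataquest, 2020), see the Resources section at the end of this post; introductory guides to running SQL queries from R are also widely available.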
In addition to its flexibility, public data mining allows researchers to conduct both quantitative and qualitative analyses, extending traditional methods through automated processing and the ability to uncover complex patterns across massive datasets. This method makes it possible to quantify social media engagement metrics, as demonstrated by Kimmons et al. (2017a, 2017b), who examined higher education institutions’ Twitter activity. In the social sciences, internet data mining enables real-time monitoring of public sentiment or policy impacts, adding insights that traditional methods may overlook. Large datasets also make it possible to explore subpopulations, for example by analyzing how students engage with educational content on different platforms to identify their engagement patterns and interests. Finally, unlike traditional methods, in which data collection itself might influence participant behavior, public internet data mining allows researchers to observe behaviors and interactions as they occur naturally in online spaces.
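Engagement metrics of this kind are often simple aggregates. As a hypothetical Python sketch (the field names and the "interactions per post" formula are assumptions for illustration, not the measures Kimmons et al. used), the code below groups posts by account and averages their likes, retweets, and replies. The same grouping step is what makes subpopulation comparisons possible.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: one dict per public post
posts = [
    {"account": "univ_a", "likes": 30, "retweets": 5, "replies": 2},
    {"account": "univ_a", "likes": 10, "retweets": 1, "replies": 0},
    {"account": "univ_b", "likes": 4, "retweets": 0, "replies": 1},
]

def engagement_per_post(records):
    """Average interactions (likes + retweets + replies) per post, by account."""
    by_account = defaultdict(list)
    for r in records:
        by_account[r["account"]].append(r["likes"] + r["retweets"] + r["replies"])
    return {acct: mean(vals) for acct, vals in by_account.items()}

print(engagement_per_post(posts))
```

Swapping `"account"` for another field (platform, institution type, and so on) would compare a different set of subgroups with the same code.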
Challenges and Ethical Considerations
Despite their benefits, these emerging methodologies, including internet data mining, raise significant challenges, primarily around the expertise they require and the ethical concerns associated with handling large datasets. These methods demand proficiency in various technical skills, such as coding, database management, and API handling, that may be unfamiliar to many researchers. Kimmons and Veletsianos (2018) argue that without interdisciplinary collaboration, researchers may struggle to perform the necessary technical tasks or interpret findings in the appropriate context. My own experience analyzing large-scale social media data, for instance, highlighted the steep learning curve associated with data cleaning and storage.
Ethical concerns present a more profound challenge, especially when working with sensitive data that may reveal personal information. Even when data are publicly available, researchers face dilemmas about privacy and potential harm to participants. Most internet users may not expect their public posts to be aggregated for research, and such aggregation can inadvertently expose them to risk. For example, a study analyzing sentiments toward educational policies could inadvertently reveal the identities of specific school districts or teachers if the data are used without careful anonymization. As Kimmons and Veletsianos (2018) note, although such data may not be classified as “human subjects research” (p. 498) by conventional ethical standards, they can nonetheless influence or harm individuals if used irresponsibly. Other challenges include potential bias in the data, concerns about data quality, legal issues, and the risk of over-reliance on algorithms and automated tools for data collection and analysis.
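One practical safeguard against the privacy risks discussed in this section is to pseudonymize identifiers and redact mentions during data cleaning, before analysis. The Python sketch below is a minimal illustration under stated assumptions: it replaces usernames with salted SHA-256 codes (treating usernames as case-insensitive) and masks @mentions and URLs in post text. The sample record is hypothetical. Pseudonymization reduces, but does not eliminate, re-identification risk, since quoted text can often be searched, so it complements rather than replaces ethical review.

```python
import hashlib
import re
import secrets

# Per-project secret salt; store it separately from the data (or discard it)
SALT = secrets.token_hex(16)

MENTION = re.compile(r"@\w+")
URL = re.compile(r"https?://\S+")

def pseudonymize(username: str, salt: str = SALT) -> str:
    """Map a username to a stable code: same input and salt give the same code."""
    digest = hashlib.sha256((salt + username.lower()).encode("utf-8")).hexdigest()
    return "user_" + digest[:10]

def redact(text: str) -> str:
    """Mask URLs and @mentions that could point back to individuals."""
    text = URL.sub("[URL]", text)
    return MENTION.sub("@[USER]", text)

# Hypothetical record from a public post
record = {"user": "MsRivera_Teach", "text": "Thanks @jdoe42 for the slides https://t.co/abc"}
clean = {"user": pseudonymize(record["user"]), "text": redact(record["text"])}
print(clean)
```

Keeping the salt secret matters: without it, anyone could hash a list of known usernames and match them against the codes.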
Conclusion
Emerging research methodologies in the digital age are remodeling the research space, allowing us to explore real-world phenomena with unprecedented depth. Public internet data mining exemplifies how technology enables the collection and analysis of vast datasets, supporting new ways to examine complex questions in education and beyond. As we integrate these methods into our work, it is crucial to consider the ethical implications and recognize the limitations inherent in using automated and large-scale methods.
As we look to the future, it’s clear that these methodologies will continue to evolve alongside technological advancements. Artificial intelligence and machine learning are likely to play an increasingly significant role in research, potentially automating more aspects of the research process and uncovering patterns that human researchers might miss. However, developing the research methodology of the future relies on our ability to use these innovations thoughtfully, responsibly, and inclusively. By embracing these tools, researchers in all fields can explore vast new territories of knowledge while contributing to ethical practices that respect individual privacy and integrity. I hope that other researchers will be inspired to explore these methodologies and engage critically with the ethical considerations they entail, ultimately contributing to a more inclusive and data-informed research ecosystem.
Resources
Abramson, C. M., Joslyn, J., Rendle, K. A., Garrett, S. B., & Dohan, D. (2018). The promises of computational ethnography: Improving transparency, replicability, and validity for realist approaches to ethnographic analysis. Ethnography, 19(2), 254-284. https://doi.org/10.1177/1466138117725340
Brooker, P. (2022). Computational ethnography: A view from sociology. Big Data & Society, 9(1). https://doi.org/10.1177/20539517211069892
Dataquest. (2020). R API tutorial: Getting started with APIs in R. Retrieved from https://www.dataquest.io/blog/r-api-tutorial/
Javaid, S. (2024). Crowdsourced data collection benefits & best practices. AI Multiple Research. Retrieved from https://research.aimultiple.com/crowdsourced-data/
Keyes, D. (2021). How to scrape data with R. Retrieved from https://rfortherestofus.com/2021/04/how-to-scrape-data-with-r/
Kimmons, R., & Veletsianos, G. (2018). Public internet data mining methods in instructional design, educational technology, and online learning research. TechTrends, 62(5), 492–500. https://doi.org/10.1007/s11528-018-0307-4
Ofosu-Ampong, K. (2024). Artificial intelligence research: A review on dominant themes, methods, frameworks, and future research directions. Telematics and Informatics Reports, 14, 100127. https://doi.org/10.1016/j.teler.2024.100127
Selwyn, N. (2014). Data entry: towards the critical study of digital data and education. Learning, Media and Technology, 40(1), 64–82. https://doi.org/10.1080/17439884.2014.921628
Stryker, C., & Holdsworth, J. (2024). What is NLP (natural language processing)? IBM. Retrieved from https://www.ibm.com/topics/natural-language-processing
Urban Institute. (n.d.). Education Data Portal. Retrieved from https://educationdata.urban.org/documentation/schools.html
YouTube Tutorials
Dean Chereden, How to GET data from an API using R in RStudio: https://www.youtube.com/watch?v=AhZ42vSmDmE
APIs for Beginners 2023 – How to use an API: https://www.youtube.com/watch?v=WXsD0ZgxjRw&t=39s