Pechenick, E. A., Danforth, C. M. & Dodds, P. S. Characterizing the Google Books Corpus: strong limits to inferences of socio-cultural and linguistic evolution. PLoS ONE 10, e0137041 (2015).
Dietrich, B. J., Hayes, M. & O’Brien, D. Z. Pitch perfect: vocal pitch and the emotional intensity of congressional speech. Am. Polit. Sci. Rev. 113, 941–962 (2019).
Dietrich, B. J. Using motion detection to measure social polarization in the U.S. House of Representatives. Polit. Anal. 29, 250–259 (2021).
Michel, J.-B. et al. Quantitative analysis of culture using millions of digitized books. Science 331, 176–182 (2011). In this study, 4% of all books that have been published were digitized and used to examine changes in grammar, word use and the adoption of new technologies over long periods of time.
Merton, R. K. in Social Theory and Social Structure 39–72 (Free Press, 1968).
Watts, D. J. Everything Is Obvious: Once You Know the Answer (Crown Business, 2011).
Simon, H. A. Bandwagon and underdog effects and the possibility of election predictions. Public Opin. Q. 18, 245–253 (1954).
Mutz, D. C. Impersonal Influence in American Politics (Cambridge Univ. Press, 1998).
Westwood, S. J., Messing, S. & Lelkes, Y. Projecting confidence: how the probabilistic horse race confuses and demobilizes the public. J. Polit. 82, 1530–1544 (2020).
O’Neil, C. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy (Crown, 2016).
Obermeyer, Z., Powers, B., Vogeli, C. & Mullainathan, S. Dissecting racial bias in an algorithm used to manage the health of populations. Science 366, 447–453 (2019).
Landsberger, H. A. Hawthorne Revisited (The New York State School of Industrial and Labor Relations, 1958).
Mayo, E. The Human Problems of an Industrial Civilization (Routledge, 2004).
Lazer, D., Kennedy, R., King, G. & Vespignani, A. The parable of Google Flu: traps in big data analysis. Science 343, 1203–1205 (2014). This paper shows that the increasing over-prediction of flu prevalence by Google Flu Trends was largely the result of changes to Google’s search algorithm, which altered the terms that people used to find flu-related information.
Brunton, F. & Nissenbaum, H. Obfuscation: A User’s Guide for Privacy and Protest (MIT Press, 2015).
Davis, D. W. The direction of race of interviewer effects among African-Americans: donning the Black mask. Am. J. Pol. Sci. 41, 309–322 (1997).
American National Election Studies. 1978 Time Series Study https://electionstudies.org/wp-content/uploads/2018/03/anes_timeseries_1978_qnaire_post.pdf (1978).
Salganik, M. J. Bit by Bit: Social Research in the Digital Age (Princeton Univ. Press, 2017).
Patty, J. W. & Penn, E. M. Analyzing big data: social choice and measurement. PS Polit. Sci. Polit. 48, 95–101 (2015).
Kraemer, M. U. G. et al. The effect of human mobility and control measures on the COVID-19 epidemic in China. Science 368, 493–497 (2020).
Jia, J. S. et al. Population flow drives spatio-temporal distribution of COVID-19 in China. Nature 582, 389–394 (2020).
Badr, H. S. et al. Association between mobility patterns and COVID-19 transmission in the USA: a mathematical modelling study. Lancet Infect. Dis. 20, 1247–1254 (2020).
Munger, K. The limited value of non-replicable field experiments in contexts with low temporal validity. Soc. Media Soc. 5, 1–4 (2019).
Deaton, A. & Cartwright, N. Understanding and misunderstanding randomized controlled trials. Soc. Sci. Med. 210, 2–21 (2018).
Vraga, E. K., Bode, L., Smithson, A.-B. & Troller-Renfree, S. Accidentally attentive: comparing visual, close-ended, and open-ended measures of attention on social media. Comput. Human Behav. 99, 235–244 (2019).
Guess, A., Munger, K., Nagler, J. & Tucker, J. How accurate are survey responses on social media and politics? Polit. Commun. 36, 241–258 (2019).
Aleta, A. et al. Modelling the impact of testing, contact tracing and household quarantine on second waves of COVID-19. Nat. Hum. Behav. 4, 964–971 (2020).
Echeverría, J. et al. LOBO: evaluation of generalization deficiencies in Twitter bot classifiers. In Proc. 34th Annual Computer Security Applications Conference 137–146 (ACM, 2018).
Ferrara, E., Varol, O., Davis, C., Menczer, F. & Flammini, A. The rise of social bots. Commun. ACM 59, 96–104 (2016).
Hughes, A. G. et al. Using administrative records and survey data to construct samples of Tweeters and Tweets. Public Opin. Q. https://doi.org/10.1093/poq/nfab020 (2021).
Napoli, P. M. Audience Evolution: New Technologies and the Transformation of Media Audiences (Columbia Univ. Press, 2011).
Yang, T., Majó-Vázquez, S., Nielsen, R. K. & González-Bailón, S. Exposure to news grows less fragmented with an increase in mobile access. Proc. Natl Acad. Sci. USA 117, 28678–28683 (2020). This study tracked the news consumption of users across mobile and desktop devices and found that most individuals do not self-sort their news consumption by partisanship but, instead, consume news from a diversity of sources including partisan and nonpartisan ones.
Haythornthwaite, C. Exploring multiplexity: social network structures in a computer-supported distance learning class. Inf. Soc. 17, 211–226 (2001).
Campbell, K. E. & Lee, B. A. Name generators in surveys of personal networks. Soc. Netw. 13, 203–221 (1991).
Wagner, C. Measuring algorithmically infused societies. Nature https://doi.org/10.1038/s41586-021-03666-1 (2021).
Healy, K. The performativity of networks. Eur. J. Sociol. 56, 175–205 (2015).
Rahwan, I. et al. Machine behaviour. Nature 568, 477–486 (2019).
Neuendorf, K. A. The Content Analysis Guidebook (Sage, 2017).
Davidov, D., Tsur, O. & Rappoport, A. Semi-supervised recognition of sarcasm in Twitter and Amazon. In Proc. 14th Conference on Computational Natural Language Learning 107–116 (Association for Computational Linguistics, 2010).
Groves, R. M. Nonresponse rates and nonresponse bias in household surveys. Public Opin. Q. 70, 646–675 (2006).
Hargittai, E. Potential biases in big data: omitted voices on social media. Soc. Sci. Comput. Rev. 38, 10–24 (2020). Using survey data, this study finds that younger, wealthier and more technically skilled people tend to use social media and that there are substantial gender and education differences in which platforms people use.
Lazer, D. & Radford, J. Data ex machina: introduction to big data. Annu. Rev. Sociol. 43, 19–39 (2017).
Correa, T. & Valenzuela, S. A trend study in the stratification of social media use among urban youth: Chile 2009–2019. J. Quant. Descr. Digit. Media 1, https://doi.org/10.51685/jqd.2021.009 (2021).
Mellon, J. & Prosser, C. Twitter and Facebook are not representative of the general population: political attitudes and demographics of British social media users. Res. Polit. 4, 1–9 (2017).
Beisch, N. & Schäfer, C. Internetnutzung mit großer Dynamik: Medien, Kommunikation, Social Media. AS&S https://www.ard-werbung.de/media-perspektiven/fachzeitschrift/2020/detailseite-2020/internetnutzung-mit-grosser-dynamik-medien-kommunikation-social-media/ (2020).
Hargittai, E. & Litt, E. The Tweet smell of celebrity success: explaining variation in Twitter adoption among a diverse group of young adults. New Media Soc. 13, 824–842 (2011).
Henrich, J., Heine, S. J. & Norenzayan, A. Most people are not WEIRD. Nature 466, 29 (2010).
Wang, W., Rothschild, D., Goel, S. & Gelman, A. Forecasting elections with non-representative polls. Int. J. Forecast. 31, 980–991 (2015).
Grinberg, N., Joseph, K., Friedland, L., Swire-Thompson, B. & Lazer, D. Fake news on Twitter during the 2016 U.S. presidential election. Science 363, 374–378 (2019).
Bakshy, E., Messing, S. & Adamic, L. A. Exposure to ideologically diverse news and opinion on Facebook. Science 348, 1130–1132 (2015).
Meng, X.-L. Statistical paradises and paradoxes in big data (I): law of large populations, big data paradox, and the 2016 US presidential election. Ann. Appl. Stat. 12, 685–726 (2018).
Hargittai, E., Füchslin, T. & Schäfer, M. S. How do young adults engage with science and research on social media? Some preliminary findings and an agenda for future research. Soc. Media Soc. 4, 1–10 (2018).
Blumenstock, J. Don’t forget people in the use of big data for development. Nature 561, 170–172 (2018).
Battle-Baptiste, W. & Rusert, B. (eds) W. E. B. Du Bois’s Data Portraits: Visualizing Black America (Princeton Architectural Press, 2018).
Siegel, A. A. et al. Trumping hate on Twitter? Online hate speech in the 2016 US election campaign and its aftermath. Quart. J. Polit. Sci. 16, 71–104 (2021).
Allen, J., Howland, B., Mobius, M., Rothschild, D. & Watts, D. J. Evaluating the fake news problem at the scale of the information ecosystem. Sci. Adv. 6, eaay3539 (2020).
Foucault Welles, B. On minorities and outliers: the case for making big data small. Big Data Soc. 1, 1–2 (2014).
Newman, M. E. J. Power laws, Pareto distributions and Zipf’s law. Contemp. Phys. 46, 323–351 (2005).
González-Bailón, S. Decoding the Social World: Data Science and the Unintended Consequences of Communication (MIT Press, 2017).
Stopczynski, A. et al. Measuring large-scale social networks with high resolution. PLoS ONE 9, e95978 (2014).
Lazer, D. Studying human attention on the Internet. Proc. Natl Acad. Sci. USA 117, 21–22 (2020).
Aral, S. & Eckles, D. Protecting elections from social media manipulation. Science 365, 858–861 (2019).
Puschmann, C. & Burgess, J. The politics of Twitter data. HIIG Discussion Paper Series No. 2013-01 http://www.ssrn.com/abstract=2206225 (2013).
Chen, W. & Quan-Haase, A. Big data ethics and politics: toward new understandings. Soc. Sci. Comput. Rev. 38, 3–9 (2020).
Breuer, J., Bishop, L. & Kinder-Kurlanda, K. The practical and ethical challenges in acquiring and sharing digital trace data: negotiating public–private partnerships. New Media Soc. 22, 2058–2080 (2020).
Zook, M. et al. Ten simple rules for responsible big data research. PLOS Comput. Biol. 13, e1005399 (2017).
Greenberg, A. An absurdly basic bug let anyone grab all of Parler’s data. Wired (12 January 2021).
Valentino-DeVries, J., Singer, N., Keller, M. H. & Krolik, A. Your apps know where you were last night, and they’re not keeping it secret. The New York Times https://www.nytimes.com/interactive/2018/12/10/business/location-data-privacy-apps.html (10 December 2018).
Sweeney, L. Simple demographics often identify people uniquely. Privacy Working Paper 3 https://dataprivacylab.org/projects/identifiability/paper1.pdf (Carnegie Mellon University, 2000). Using census data, this paper shows that 87% of the US population could be uniquely identified by date of birth, postal code and gender, demonstrating the ease with which study respondents can be re-identified from ostensibly anonymous data.
Wood, A. et al. Differential privacy: a primer for a non-technical audience. Vanderbilt J. Entertain. Technol. Law 21, 209–276 (2019).
Dwork, C. & Roth, A. The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci. 9, 211–407 (2013).
King, G. & Persily, N. A new model for industry–academic partnerships. PS Polit. Sci. Polit. 53, 703–709 (2020).
Bruckman, A., Luther, K. & Fiesler, C. in Digital Research Confidential: The Secrets of Studying Behavior Online (eds Hargittai, E. & Sandvig, C.) 243–258 (MIT Press, 2015).
Marwick, A. E. & boyd, d. Networked privacy: how teenagers negotiate context in social media. New Media Soc. 16, 1051–1067 (2014).
Bieber, F. R., Brenner, C. H. & Lazer, D. Finding criminals through DNA of their relatives. Science 312, 1315–1316 (2006).
Zheleva, E. & Getoor, L. To join or not to join: the illusion of privacy in social networks with mixed public and private user profiles. In Proc. 18th International Conference on World Wide Web 531–540 (2009).
Miller, G. As U.S. election nears, researchers are following the trail of fake news. Science (26 October 2020).
Merton, R. K. The self-fulfilling prophecy. Antioch Rev. 8, 193–210 (1948).