Community Science is investing in strategies that leverage the tools of big data science and predictive analytics to improve health and human service systems, programs, and policies. This work is fundamental to what we are all about. We use “state-of-the-art qualitative and quantitative methods . . . to strengthen the science and practice of community change.” Another core part of our mission is to use our research and evaluation expertise “to build healthy, just, and equitable communities.” So, here’s the rub: even as we use big data science and predictive analytics to promote positive community change, we are keenly aware of the warnings, admonitions, and evidence that predictive analytics can also perpetuate and reinforce institutionalized inequities.

Our work has already led to some very clear guiding principles as we endeavor to bring the big data science tools of predictive and prescriptive analytics together with the social science methods of research and evaluation. There is an excellent book that everyone should read — Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, by Cathy O’Neil. In it, O’Neil shares example after example of big data algorithms making predictions, decisions, and recommendations with prejudice and bias, while the companies building these algorithms will not share what is in the black box and are not being held accountable for perpetuating systemic and institutional biases. Community Science strongly agrees with O’Neil’s conclusions about the dangers of invisible, unaccountable algorithms, as well as her ethical proposition that algorithms not be opaque or proprietary, and that they be audited, tested, and evaluated in public to assess and address any bias they may promulgate.

As Community Science embarks on this journey to leverage the ever-growing and more robust datasets of organizations, agencies, institutions, and systems, as well as big data more broadly, we are beginning to lay down some guideposts for making this work both impactful and a powerful resource for social equity, empowerment, and justice. Those guideposts are as follows:

    1. Algorithms should avoid using a person’s identity as data for making predictions! “Identity data” include things like race, gender, sexual orientation, disability, address/census tract, the spelling of one’s name, country of origin, religious affiliations, and/or any other identifying information that either by birth or life circumstance should have NO bearing on whether someone can achieve a desired outcome.

      Unfortunately, identity data are frequently used by algorithms to predict the likelihood of many outcomes. For example, many juvenile and adult justice algorithms build predictive models for recidivism using race and gender. These algorithms often find that a person who is African American and male will be rated more likely to recidivate, even when they committed the exact same crime as their White peers. We all know it is untrue that being African American and male “causes” a person to commit a subsequent crime. The algorithm is simply identifying a historical pattern of correlation/association within the system. This pattern is NOT a reflection of causation, but rather an indicator of biased decision-making that is getting recorded by system actors. The choice to arrest someone or let them go gets recorded by a police officer, and if the police have a history of arresting more African American males and letting others go for the same crime, then, of course, the algorithm will find the pattern.

      It is critically important to understand that if identity information is a predictor within an algorithm, it is not a reflection of inherent problems within people, but, in fact, it is a canary in the coal mine indicating systemic failures to be fair and just. There is no statistical doubt that predictive algorithms are more accurate when they use identity data, but we all need to understand that in this case, accuracy is not a good thing, but instead is a reflection of system-wide prejudice. If we use identity data to predict the future for any individual, we will be perpetuating unjust biases, and with today’s technological advances, these injustices will get scaled, rapidly.

    2. Algorithms cannot be secret, proprietary black boxes where no one can see what’s going on! Algorithms can inherit the biases of the systems and the human actors from which they derive their data, whether those biases are outright prejudices or simply incorrect assumptions about a presenting situation and what might be best to do about it. It is important to be able to transparently scrutinize algorithms to ensure that the measures selected and the conclusions derived are not giving an unfair advantage to one group over another. Many algorithms can produce a highly accurate prediction of the likelihood of success and a set of odds-beating recommended actions; however, if we cannot see what led the algorithm to that conclusion, it could be the result of any number of biases inherent to the decisions and decision-makers that put the data into the machine. A minimal sketch of what this kind of audit might look like appears just after this list.
    3. Let’s train algorithms to think like scientists, not humans! The data science world has created powerful algorithms that mirror the ways in which our brains work. IBM’s Watson is one of many examples of deep learning, a type of learning algorithm within the data science family of neural networks that is getting embedded in every aspect of our digital lives. These powerful algorithms, like our brains, are incredibly effective at associative and immediate-feedback learning. They are behind everything from facial recognition software to Netflix’s amazingly accurate recommendation software.
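To make the first two guideposts concrete, here is a minimal sketch in Python of the kind of audit we have in mind. The data are simulated, and the column names (group, neighborhood, need_score) and the simple logistic regression are hypothetical stand-ins, not a description of any actual system or client dataset; the point is only to show how excluding identity data (guidepost 1) and auditing in the open (guidepost 2) fit together.

```python
# A minimal, illustrative audit: train a model WITHOUT identity columns, then check
# (a) whether any remaining feature acts as a proxy for identity and (b) whether
# predictions still differ systematically across groups. All data are simulated;
# the column names are hypothetical.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5_000

# 'group' stands in for an identity attribute; 'neighborhood' is a feature that,
# by construction, partly encodes it (a proxy); 'need_score' is a legitimate predictor.
group = rng.integers(0, 2, n)
neighborhood = 0.8 * group + rng.normal(0, 0.5, n)
need_score = rng.normal(0, 1, n)

# The historical label records past decisions, and those decisions were biased:
# group 1 was flagged more often than group 0 at the same level of need.
historical_label = (0.8 * group + need_score + rng.normal(0, 1, n) > 0.4).astype(int)

df = pd.DataFrame({"group": group, "neighborhood": neighborhood,
                   "need_score": need_score, "label": historical_label})

identity_cols = ["group"]  # guidepost 1: identity attributes are never model inputs
feature_cols = [c for c in df.columns if c not in identity_cols + ["label"]]

model = LogisticRegression().fit(df[feature_cols], df["label"])
df["predicted"] = model.predict(df[feature_cols])

# Guidepost 2: keep the audit open and repeatable.
# (a) Proxy check: how strongly does each remaining feature track the identity attribute?
print(df[feature_cols].corrwith(df["group"]).rename("corr_with_group"))

# (b) Disparity check: even with identity excluded, the model reproduces the
#     historical bias through the proxy feature.
print(df.groupby("group")["predicted"].mean().rename("predicted_positive_rate"))
```

Even in this toy example, the audit surfaces the lesson of the first guidepost: when historical labels encode biased decisions, a proxy feature can carry that bias back into the predictions, and only an open, repeatable check makes it visible.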

The deep learning algorithms described in the third guidepost are doing what our brains do best—learning from patterns of sensory and/or immediate-satisfaction experiences in order to make rapid decisions effectively. These tasks require lightning-fast processing of sensory and/or affective/emotional information to predict the likelihood of success from a set of immediate possible actions. Most of these processes occur subconsciously/reflexively, and the effect (or outcome) being predicted is instantaneous or immediate. Put another way, our brains and the algorithms that mirror them are great at predicting what will work when the feedback loop of cause and effect is instantaneous or immediate, when all of the cause (input) data derive from sensory information (i.e., eyes, ears, mouth, nose, skin), and when our brains and these algorithms have the “technological” capacity to gather, store, and process the gargantuan amount of sensory data needed to achieve close to 100% accuracy.

When it comes to making life-changing, longer-term decisions—where a deep understanding of all the social, circumstantial, situational, and environmental causes must come together to produce a lasting change—mirroring the human brain is not the way to go! For example, when complex decisions need to be made for a child who has been abused and neglected, in order to preserve the family or find a safe and loving permanent home, the sensory-based tools of rapid predictive learning that are so accurate at recognizing faces, helping us avoid accidents, etc., will not work. When causes and effects are sensory-based and instantaneously and/or immediately connected, associative and correlational patterns are effectively the same as cause and effect. However, when the causes of a problem are multifaceted, contextual, and relational, and the effect may not be measurable for a long time, neural networks—including our brains—can still accurately see correlations and associative patterns; but because of the complexity of the causes and the time span between them and the results, correlation is no longer the same as causation. When our brain sees a car stopped in the middle of the highway, we know that the correlation between immediately turning the wheel and avoiding the accident is true cause and effect. However, when our brains try to help a child who has been abused and neglected either be reunified safely with their family or be placed in another permanent and loving home, what we learn and believe about what caused the problem, as well as what will cause the best outcome, is filled with biases, because the problem is complicated by so many relational, contextual, environmental, and other external factors, including time, that cannot all possibly be measured. This is why we humans created another algorithm: the scientific method.
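The contrast can be illustrated with a few lines of simulated data. Everything below is invented for the example: a hidden contextual factor drives both what the system happens to record and what eventually happens much later, so a pattern-finder sees a strong, “accurate” correlation where there is no cause and effect at all.

```python
# A toy illustration: a hidden contextual factor drives both a measured variable
# and a delayed outcome, so the two are strongly correlated even though neither
# causes the other. All variables are invented for the example.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

context = rng.normal(0, 1, n)                  # unmeasured circumstance/environment
measured_x = context + rng.normal(0, 0.5, n)   # what the system happens to record
outcome = context + rng.normal(0, 0.5, n)      # what eventually happens, much later

# The correlation is strong and the "prediction" looks accurate...
print("corr(measured_x, outcome):", round(np.corrcoef(measured_x, outcome)[0, 1], 2))

# ...but by construction, changing measured_x would change nothing: the pattern
# reflects the shared context, not cause and effect.
```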

The scientific method was developed because scientists understood that the human brain, while super powerful when it comes to immediate associative learning, is terrible at figuring out complex cause and effect. It is grossly biased toward its own limited and subjective (sensory) experiences and, as a result, draws many incorrect conclusions and makes poor decisions about the objective world. If we, as a society, are going to advance our cause-and-effect knowledge about longer-term, complex situations, relationships, and environments in an unbiased manner, we need a method to control our brain’s natural predilection for drawing cause-and-effect conclusions from correlations. It is time to introduce big data science to the scientific method.
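Continuing the toy example above in that spirit: once the contextual factor is measured and controlled for, the apparent “effect” of the recorded variable disappears. The ordinary least-squares adjustment below is only a stand-in for controlled, scientific-method-style comparison on simulated data; it is not a description of any specific Community Science model.

```python
# Same simulated setup as before: the hidden context drives both the recorded
# variable and the outcome. Comparing a naive estimate with a context-adjusted
# estimate shows the "effect" of measured_x vanish once the confounder is controlled.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

context = rng.normal(0, 1, n)
measured_x = context + rng.normal(0, 0.5, n)
outcome = context + rng.normal(0, 0.5, n)

# Naive "brain-style" estimate: regress outcome on measured_x alone.
naive_slope = np.polyfit(measured_x, outcome, 1)[0]

# Scientific-method-style estimate: control for context as well.
X = np.column_stack([np.ones(n), measured_x, context])
adjusted_slope = np.linalg.lstsq(X, outcome, rcond=None)[0][1]

print("naive slope:   ", round(naive_slope, 2))     # large: looks like an effect
print("adjusted slope:", round(adjusted_slope, 2))  # near zero: no causal effect
```

The naive slope is large because the pattern is real; the adjusted slope is near zero because the pattern was never causal.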

The challenge of our big data science generation is that while deep learning and neural network algorithms will help us ride safely in driverless cars, when we apply them to more complex social problems they become super-powered bias generators. We need the countervailing force of the scientific method, and machines can be, and must be, trained to follow its procedures. Check out this recent Community Science webinar to learn more about how we are training machine-learning algorithms to apply the scientific method in order to rigorously figure out what will work for each child who has entered the child welfare system.

The above guideposts are only a starting place for Community Science, but we believe they set our innovative work on a straight and narrow path, one on which we remain ever conscious and vigilant of the potential for algorithmic bias. We believe in, and have seen a growing body of evidence for, the power of big data science and predictive and prescriptive analytics, which is why we are committed to incorporating these groundbreaking tools, techniques, and methods into the work we do. We will continue to transparently share our lessons, insights, successes, and failures with our community of colleagues as we integrate these data science tools into our community change work. Stay tuned . . .