September 12, 2010

Stereotypically Wrong

OkTrends, the blog and data crunching arm of the dating site OkCupid, came out with a hot post on race and stereotypes. Working with the self-defined race and profiles of 526,000 users, the analyst(s) parsed text, crunched the numbers and identified the most distinguishing features of each racial group.
“Using this kind of analysis, we were able [to] find the interests, hobbies, tastes, and self-descriptions that are specially important to each racial group, as determined by the words of the group itself. The information in this article is not our opinion. It’s data, aggregated from the essays of half a million real people.”
OkTrends’ yardstick of “statistical distinction” is relative frequency: how much more a term or phrase is used by one group than by others.* As they explain, “[f]or example, it turns out that all kinds of people list sushi as one of their favorite foods. But Asians are the only group who also list sashimi; it’s a racial outlier.” OkTrends then goes on to draw a number of racial stereotypes, such as the following:
White women show off their eyes (mascara is #5 on their list).
Black women show off their lips (lip gloss, #7).
Latinas show off both (mascara, #18 / lip gloss, #22).
Asian women, however, show off their practicality (lip balm, #48).
And thus we could also conclude that Asians like sashimi, right?


Although the numbers are in their own way intriguing, the final writeup suffers from the unfortunate analytical scourge that the economist Bill Easterly refers to as Reversing Conditional Probability. That is, the writers took one conditional probability, “If [your profile says] you like sashimi, then you are Asian,” and flipped it around: “if you are Asian, then you like sashimi.”

This logical fallacy is worth explaining with an extended analogy in another domain. Consider the following relative frequency:

Vietnamese are two to three times more likely than white Americans to be of the type B blood group.**
In other words, if you collected blood samples from equal numbers of Vietnamese and white Americans, you’d end up with two to three times as many type B samples from the Vietnamese donors as from the white American donors. Assuming a 3:1 ratio, three out of every four type B samples would come from Vietnamese donors.

We can take this example one step further. Suppose you pick up a random type B sample. Given what we know about the equal sample populations in our hypothetical example and the proportion of type B blood in Vietnamese versus white Americans, we can then make a reasonable guess that this anonymous type B blood sample most likely came from a Vietnamese donor.
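
The arithmetic behind that guess can be sketched in a few lines of Python. The two rates below are illustrative assumptions chosen only to produce the 3:1 ratio (30% is the upper end of the range quoted later in this post, and 10% is simply one third of that); the function name is likewise made up for this example.

```python
from fractions import Fraction

def share_from_group_a(rate_a, rate_b, n_a, n_b):
    """P(group A | trait) when n_a people from A and n_b from B are sampled."""
    with_trait_a = rate_a * n_a  # expected samples with the trait from A
    with_trait_b = rate_b * n_b  # expected samples with the trait from B
    return with_trait_a / (with_trait_a + with_trait_b)

# Assumed, illustrative type B rates in a 3:1 ratio (not measured values).
rate_vietnamese = Fraction(30, 100)
rate_white_american = Fraction(10, 100)

# Equal sample sizes, as in the hypothetical above.
p_vietnamese_given_b = share_from_group_a(
    rate_vietnamese, rate_white_american, n_a=1000, n_b=1000
)
print(float(p_vietnamese_given_b))  # 0.75
```

Note that the answer depends on the two sample sizes as well as the two rates, which is why the equal-samples condition matters.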

So if someone is Vietnamese, then they likely have type B blood, right?


This question makes the mistake of reversing the conditional probability. I took a simple relative frequency (the type B rate for Vietnamese is much higher than for white Americans) and inferred another probability (a random type B sample is likely from a Vietnamese donor), which itself depends on certain conditions, namely that the sample populations are equal. But this conditional probability can’t be logically reversed. The percentage of Vietnamese with type B blood could be anything from 90% to 3% of the whole Vietnamese population; all we know is that Vietnamese are more likely than white Americans to have it.
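
To see why the reversal fails, here is a minimal Python sketch using Bayes' rule for two groups. The three type B rates are hypothetical, spanning the 90%-to-3% range mentioned above, and the comparison group is assumed to sit at exactly one third of each rate with equal-sized samples. The helper name is invented for this illustration.

```python
from fractions import Fraction

def p_group_given_trait(rate_group, rate_other, share_group=Fraction(1, 2)):
    """Bayes' rule for two groups: P(group | trait)."""
    joint_group = share_group * rate_group
    joint_other = (1 - share_group) * rate_other
    return joint_group / (joint_group + joint_other)

# Hypothetical values for P(type B | Vietnamese); the other group's rate is
# always assumed to be one third of it, preserving the 3:1 ratio.
results = {}
for rate in (Fraction(90, 100), Fraction(30, 100), Fraction(3, 100)):
    results[rate] = p_group_given_trait(rate, rate / 3)

for rate, reverse in results.items():
    print(f"P(B | Vietnamese) = {float(rate):.2f}   "
          f"P(Vietnamese | B) = {float(reverse):.2f}")
```

Under these assumptions the second column is the same in every row, because it depends only on the 3:1 ratio and the sample shares. Knowing it therefore says nothing about which of the very different values in the first column is the true one.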

When we look at the overall blood group percentages by ethnic group, it turns out that for both Vietnamese and white American populations, any given individual is most likely to have type O blood. Only about 20%–30% of Vietnamese are of type B. If you happen to meet a Vietnamese person, they probably have type O blood, even while they are up to three times more likely than white Americans to have type B blood.


Reversing conditional probabilities is the nuts-and-bolts of “data-driven” stereotyping. It’s where we jump from “Vietnamese are more likely than white Americans to have type B blood” (fact) to “Vietnamese have type B blood” (fiction). Or from “terrorists in the news are more likely to be Muslim” to “Muslims are terrorists.”

What makes OkTrends’ post so potentially damaging is that they hold up their findings as empirically based reflections of the world as it is. Yes, their findings are both data-driven and not entirely useless, but their faulty conclusions-rolled-up-as-stereotypes have no logical basis. Their data simply don’t allow a logical progression from “the term lip balm occurs most frequently on profiles of Asian women” to “Asian women show off their practicality.”

Where this issue applies to this blog in particular, and to the Western Buddhist community more generally, is when we run across stereotypes rooted in the very same mistake of reversing conditional probabilities. Elsewhere in this blog and on Dharma Folk, one commenter happened to make this kind of claim. Not only did his comment imply that certain individual Vietnamese practice a superstitious Buddhism (stereotype), he also attempted to justify his statement with relative frequencies based on anecdotal observations accumulated through the substantial period of his life spent in Asia (reversing conditional probabilities). This stereotype further becomes racist when one’s supporting evidence/anecdata has no relation to Vietnamese Buddhists other than the tacit assumption that they must be like all the other Asians one has met. You just don’t know enough to assume that any given Asian Buddhist practices superstitious Buddhism.

That’s not to say that this particular commenter is either racist or irrational. Very smart people can make logical mistakes, and well-meaning individuals often say things that come out completely wrong. I get the sense that we base our understanding of the world on relative frequencies, and often operate with the mistaken base assumption that our experiences are reflective of the wider world. We all have at one time or another probably fallen prey to the seemingly innocent mistake of reversing conditional probabilities. But it’s still wrong.

And I will be all too happy to call it out when I see it.

* I’m actually not sure if OkTrends’ stats measure relative to the overall average frequency or to the frequency of all other groups.

** Blood types for white Americans can be viewed via the American Red Cross. Blood types for Vietnamese are estimated from an old Japanese study and


  1. Very interesting and helpful explanation, Arun. I know that I fall prey to my own faulty conclusions all the time, particularly when it comes to the conclusions I've "reached" based on my experiences dating Asian men. Unconsciously, I make erroneous conclusions based on these specious observations, and when I make myself aware of when I am doing it, I am so ashamed. It is indeed challenging. But as a Buddhist, if I am to free myself from the fetters of greed, hatred and delusion, I MUST address this and cut out its roots.

  2. I checked out those lists. Apparently I'm an Asian woman. How cool is that?

  3. I liked your little smiley-faces. Ironically, they're extremely fitting since most Vietnamese people I've met smiled at me just like that. It's the only good stereotype in your article :)

  4. I think the biggest red flag for me is that the data came from an on-line dating website. Intent on the part of the poster and how much truth they have filtered out in hopes of catching a mate don't seem to be taken into account anywhere here. They seem to just assume that these trends must be accurate since that's what people have put out there. Personally I wonder about all the data that ISN'T there....