Saturday, August 05, 2006

A Metric for the Popular Imagination

In a paper that I'm currently revising for publication, I introduced the idea of fetishism as a category of analysis in the social sciences by referring to the popular idea of a fetish. One of the volume editors asked me for evidence to back up my claim. How do we know which ideas are popularly held? Surveys, maybe, but it doesn't make sense to do one in this case and my sample size would be far too small anyway.

After some reflection, I realized that I could provide evidence about popular notions of fetishism not by surfing for porn but by doing some statistical linguistics. In "Automatic Meaning Discovery Using Google," Rudi Cilibrasi and Paul Vitanyi provide a metric they call the Normalized Google Distance (NGD). The basic idea is straightforward, although the underlying math is deep. If you have two search terms like 'cat' and 'mouse', then each of those terms will appear on some number of pages, and a smaller number of pages will contain both 'cat' and 'mouse'. Intuitively, we expect to find 'cat' with 'mouse' more frequently than 'cat' with 'louse'. The NGD formalizes this idea by turning those page counts into a measure of how far apart two terms are in conceptual space.
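For readers who want the formula itself, this is the definition given in Cilibrasi and Vitanyi's paper, written out here for reference. In it, f(x) is the number of pages containing the term x, f(x, y) is the number of pages containing both x and y, and N is the total number of pages indexed by the search engine, which acts as a normalizing constant.

```latex
\mathrm{NGD}(x, y) =
  \frac{\max\{\log f(x),\, \log f(y)\} - \log f(x, y)}
       {\log N - \min\{\log f(x),\, \log f(y)\}}
```

The base of the logarithm does not matter, because it cancels between the numerator and the denominator. Terms that always appear together get a distance of 0, and the distance grows as co-occurrence becomes rarer.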

Time for a hack. I wrote a simple Perl script that uses the Google Search API to calculate the NGD for a pair of terms.
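The Perl script itself isn't reproduced here, but the arithmetic at its core is small enough to sketch. What follows is a minimal Python illustration, not the original script: the function name `ngd`, the index size `N`, and every page count are hypothetical placeholders, and fetching the real counts from a search API is deliberately left out.

```python
import math


def ngd(fx, fy, fxy, n):
    """Normalized Google Distance computed from raw page counts.

    fx, fy -- number of pages containing each term on its own
    fxy    -- number of pages containing both terms
    n      -- total number of indexed pages (a normalizing constant)
    """
    if fx <= 0 or fy <= 0:
        raise ValueError("each term must appear on at least one page")
    if fxy <= 0:
        # The terms never co-occur, so the distance is unbounded.
        return math.inf
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))


# Hypothetical figures chosen only to illustrate the cat/mouse/louse
# intuition; they are NOT real search results.
N = 10_000_000_000
cat, mouse, louse = 100_000_000, 50_000_000, 1_000_000
cat_and_mouse, cat_and_louse = 10_000_000, 10_000

d_mouse = ngd(cat, mouse, cat_and_mouse, N)
d_louse = ngd(cat, louse, cat_and_louse, N)
print(f"cat/mouse: {d_mouse:.3f}")
print(f"cat/louse: {d_louse:.3f}")
```

With these made-up counts, 'cat' comes out closer to 'mouse' than to 'louse', which is the behavior the metric is meant to capture. In practice the counts would come from whatever hit-count estimates the search engine reports, so the results are only as stable as those estimates.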

Using this tool, we can provide a measure for the distance between the term 'fetish' and some popular and scholarly associations. (Lower numbers mean the terms are more closely associated.)

Term            NGD from 'fetish'
latex           0.356331874622984
heels           0.421554691152762
gag             0.497568478291903
choke           0.549934320808182
rubber          0.553427573822638
leather         0.57443530297729
doll            0.581254531681847
dungeon         0.604959474281258
handcuffs       0.621969750564945
smoking         0.629600508128091
balloon         0.648364151347237
cigar           0.715386689730539
fur             0.787872974829155
freud           0.792196156666702
psychoanalysis  0.797465589884342
marx            0.8195787086072
krafft-ebing    0.955885248436093
commodity       1.00639028092102


At this point Google really does constitute what John Battelle called "the database of intentions." More about Google in my next post...

Update (26 Aug 2006): Nicolás Quiroga translated this post into Spanish for his new digital history blog Tapera.
