Romance and Quantitative Literary Studies

Katherine Bode's review article "Why You Can’t Model Away Bias" (2020) discusses Ted Underwood's Distant Horizons: Digital Evidence and Literary Change (2019). Neither is about romance, but I thought some readers might be intrigued by the part of Bode's article that uses romance to rebut one of Underwood's claims:

In chapter 4 Underwood employs a data set derived from HathiTrust to identify a decline in the proportion of English-language fiction by women from around 50 percent of titles in the late nineteenth century to roughly 20 percent by 1970 before a reversion to a bit under half of all titles at the end of the twentieth century. Noting that elite university and public library collections may have simply collected more books by men than by women, Underwood seeks to test whether this bias influenced his results by comparing the proportions of male and female authors in HathiTrust to those in manual samples from four years of Publishers Weekly listings. Because the Publishers Weekly samples indicate an even more dramatic fall in women’s writing, Underwood claims that the comparison “addresses . . . doubts” about “how well . . . those collections represent the wider world of fiction” (135). While Publishers Weekly incorporates a great deal of popular fiction that does not figure in academic collections, it indexes almost no titles by even the most prominent and prolific popular romance fiction publisher of the twentieth century, Mills and Boon. Women authors predominate in this genre, and its heyday—the 1950s to the 1970s—corresponds with the most dramatic decline in the proportional representation of women authors and characters in Underwood’s results.

To explore how much the exclusion of romance fiction may have influenced his results, figure 1 amends Underwood’s figure 4.9 (134), using data on Australian women’s novels from 1945 to 2000. If American and British women wrote romance fiction at levels similar to that recorded in the Australian context, then rates of fiction by women would remain relatively flat through the 1940s, 1950s, and 1960s, at levels equivalent to that found at the turn of the twentieth century. There would be a decline in women’s fiction in the 1970s, but a less dramatic one than Underwood reports, and the general trend across the twentieth century would be fairly stable or growing. The Australian data thus undermine Underwood’s conjecture that the decline in female characterization was due to a decline in women authors of fiction, and expose the fragility of inferences based on literary data sets that have not been adequately historicized. I am not saying that my results show what actually happened; I am using them here as another sample. My point is that comparing two—or three or four or five or however many—samples cannot rule out similar biases in them, nor can it define the degree or limits of bias introduced by sampling methods.

I'm not qualified to give an opinion on the methodology used by either Bode or Underwood, but I was a little perplexed by Bode's statement that the "heyday" of romance fiction was "the 1950s to the 1970s." Certainly as far as the US market is concerned, The Flame and the Flower (published in 1972) is credited with starting a new era in popular romance.

Also, Bode refers to Mills & Boon, but I think that Publishers Weekly is an American publication, so I would imagine that if Mills & Boon novels were going to be included there, they'd have appeared as Harlequin titles. Was Bode unaware that Mills & Boon novels were published by Harlequin in the North American market? Or is Bode correct in identifying a lack of romance in the data, and PW simply didn't include Harlequin romances?

Anyway, it's always wise to be aware that there may be problems with data (e.g. as discussed with regard to bestseller lists here).


