Sometimes, when I'm looking for things to write about in this blog, I turn to the Physics and Society section of the arXiv eprint server. I'm usually not disappointed. Indeed, Physics and Society is such a rich source of interesting and intriguing papers that an enterprising publisher should perhaps create a journal dedicated to the field.
When I visited Physics and Society on Wednesday this week, I came across a paper bearing the title "Anatomy of scientific evolution." Its authors are Jinhyuk Yun and Hawoong Jeong of the Korea Advanced Institute of Science and Technology and Pan-Jun Kim of the Asia Pacific Center for Theoretical Physics.
Yun, Jeong, and Kim introduce their paper by remarking on humanity's use of keywords such as "stone," "steam," and "internet" to characterize ages of technological development. Those keywords were once coined by a few individuals and later accepted by society. Although the three authors don't phrase the goal of the paper in the following way, they could have: Can the words used by society be used to infer the evolution of technologies?
To address the question, Yun, Jeong, and Kim turned to the Google Books Corpus, a publicly accessible database that includes 8 million books—roughly 6% of all the books ever printed between 1506 and 2008. Searchable terms in the corpus are known as "Ngrams," which consist of N strings of characters separated by N − 1 spaces. "Positron" is a 1-gram; "positron emission tomography," a 3-gram.
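To make the definition concrete, here is a minimal Python sketch that splits a phrase into its Ngrams. The function name `ngrams` and the choice to split on whitespace are my own assumptions for illustration; Google's actual tokenization rules are more involved.

```python
def ngrams(text, n):
    """Return every run of n consecutive whitespace-separated tokens in text.

    Illustrative only: splitting on whitespace is an assumption, not a
    description of how the Google Books Corpus was tokenized.
    """
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


phrase = "positron emission tomography"
print(ngrams(phrase, 1))  # the three 1-grams in the phrase
print(ngrams(phrase, 3))  # the single 3-gram, i.e. the phrase itself
```

Each N-gram returned contains N tokens joined by N − 1 spaces, which matches the definition above.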
You can plot the evolving usage of Ngrams for yourself using the Google Books Ngram Viewer. Alternatively, like Yun, Jeong, and Kim, you can incorporate Google's application programming interfaces (APIs) in a computer program that conducts searches and analyzes the results however you like.
Trawling the corpus
The authors' program trawled the corpus to tabulate the annual frequency of 7588 scientific and technical 1-grams. To ensure adequate statistics, they began their search with 1800, the first year in the corpus that contains at least 70 million words.
It's possible, and indeed likely, that the annual frequency of "semiconductor" and other 1-grams grows merely because the number of publications grows. Yun, Jeong, and Kim's program therefore divided each annual frequency by the total number of 1-grams in that year.
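The normalization is simple enough to sketch in a few lines of Python. This is not the authors' program. The function name `relative_frequency` and all of the numbers are made up for illustration.

```python
def relative_frequency(counts, totals):
    """Divide each year's count of a 1-gram by that year's total 1-gram count.

    counts: {year: occurrences of the 1-gram in that year}
    totals: {year: total number of 1-grams in that year}
    Years with a missing or zero total are skipped.
    """
    return {
        year: counts[year] / totals[year]
        for year in counts
        if totals.get(year)
    }


# Made-up numbers: the raw count grows tenfold, but so does the corpus.
counts = {1950: 70, 2000: 700}
totals = {1950: 70_000_000, 2000: 700_000_000}

relative = relative_frequency(counts, totals)
for year in sorted(relative):
    print(year, relative[year])
```

In this invented example the word appears ten times as often in 2000 as in 1950, yet its share of all the 1-grams is the same in both years. The apparent growth comes entirely from the larger number of publications.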
To their surprise, they found that the annual frequencies of their 7588 1-grams evolved more or less in just three ways. What they call type-I 1-grams, such as "phototube," break out of specialist literature, reach a definite peak, and then decline. Type-IIs, such as "biofuel," break out and rise continuously. Type-IIIs, such as "Skyrmion," never attain the average frequency of words that appear in nontechnical dictionaries.
Yun, Jeong, and Kim assert that type-IIs have "excessively long and distinguished effects on a life and culture." Having identified hundreds of them, the authors go on to ask whether type-II technologies can be predicted before their emergence is apparent. The answer is yes. No matter how long it takes, once a 1-gram achieves a relative annual frequency of 1 in a million, its associated technology or scientific concept is destined to become type-II. On that basis, they predict that "bioethics," whose relative frequency surpassed 1.5 × 10⁻⁶ in 2007, "will continue to receive the spotlight," as they put it.
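The 1-in-a-million criterion amounts to a threshold test on a normalized time series. Here is a rough Python sketch of that test. The function name `first_year_above` and the sample series are hypothetical, and the code only finds the crossing year. Whether crossing the threshold really predicts type-II behavior is the authors' claim, not something the code checks.

```python
THRESHOLD = 1e-6  # a relative annual frequency of 1 in a million


def first_year_above(relative, threshold=THRESHOLD):
    """Return the earliest year whose relative frequency reaches threshold.

    relative: {year: relative annual frequency of a 1-gram}
    Returns None if the threshold is never reached.
    """
    for year in sorted(relative):
        if relative[year] >= threshold:
            return year
    return None


# Hypothetical series for an unnamed 1-gram; not data from the paper.
series = {2005: 0.8e-6, 2006: 1.1e-6, 2007: 1.5e-6}
print(first_year_above(series))
```

For this invented series the function reports 2006, the first year in which the relative frequency is at least 1 in a million.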
The original covers of the first three novels in Isaac Asimov's Foundation trilogy.
Fans of science fiction might have noticed the word "psychohistory" in my title. Introduced by Isaac Asimov in 1942, the concept lies at the heart of his Foundation series of novels. Psychohistory constitutes one of the most original technical inventions in the genre.
Borrowing from the kinetic theory of gases, psychohistory posits that once a population is large enough, its destiny can be predicted on the basis of universal statistical laws. In the novels, psychohistory's founder, Hari Seldon, applies those laws to anticipate, and then foreshorten, a period of anarchy and strife across the Galaxy from 30 millennia to one. At least that's his goal.
I think Asimov deserves credit for predicting the type of study that Yun, Jeong, and Kim performed. But it's not just the huge size of the Google Books Corpus and other databases that makes such studies possible. You also need the data to be conveniently searchable.