The Brown Corpus gathers together a million words of printed American English from the year 1961. It is called “Brown” because it was made at Brown University.
A corpus gathers together the writing (and even the speech) of a language so that researchers can study how the language is really used. With so many words, you need a computer to do the research.
A corpus can also be used to make dictionaries and usage guides. Pam Peters’s “Cambridge Guide to English Usage” (2004) is based on a corpus drawn from English as it is written and spoken all over the world.
The Brown Corpus was one of the first and one of the best. The first “American Heritage Dictionary” (1969) was based on it.
Today there are corpora a hundred times bigger, and there are corpora for all parts of the English-speaking world – America, Britain, India, Australia and New Zealand. And, of course, with much better computers, researchers can find out far more.
The Brown Corpus is made up of 500 writings, each of which was first printed in 1961 and each by an author whose first language was American English.
The writings are representative of everything that was printed in that year – not just books, but also magazines, newspapers, learned articles, industry reports and government reports.
Comparing the 5000 most common words in the Brown Corpus with those of Shakespeare made my heart sink. Instead of adventure, you have advertising. Death, nature and religion are almost unknown. Hydrogen is more talked about than humility.
It is Sinclair Lewis’s Babbitt to Shakespeare’s Juliet.
The view of life is so different.
In my last posting I defined Common English as the words that are most common across all centuries and countries. But the common ground between the Brown Corpus and Shakespeare seems a sad thing. It has no advertising, but neither does it have adventure. It has no computers, but neither does it have ape or ant.
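As a rough illustration of the “common ground” idea, here is a minimal Python sketch that takes the N most frequent words of two texts and keeps only the words found in both lists. The helper names `top_words` and `common_ground`, the simple regex tokenizer and the toy sample strings are all hypothetical assumptions for illustration, not the method actually used for this project; a real comparison would run on the full Brown Corpus and a Shakespeare word list.

```python
import re
from collections import Counter


def top_words(text: str, n: int) -> list[str]:
    """Return the n most frequent words in text, lowercased.

    Assumption: a "word" is any run of letters or apostrophes. Real corpus
    work would need proper tokenization (and probably lemmatization).
    """
    words = re.findall(r"[a-z']+", text.lower())
    return [word for word, _count in Counter(words).most_common(n)]


def common_ground(text_a: str, text_b: str, n: int = 5000) -> set[str]:
    """Hypothetical helper: the words in the top-n list of BOTH texts."""
    return set(top_words(text_a, n)) & set(top_words(text_b, n))


if __name__ == "__main__":
    # Toy stand-ins, not real corpus data.
    brown_sample = "the report on hydrogen and the advertising budget for the year"
    shakespeare_sample = "the death of nature and the adventure of the year"
    print(sorted(common_ground(brown_sample, shakespeare_sample, n=50)))
    # ['and', 'the', 'year']
```

Because each list is cut off at n, a word that is common in one text but falls just below the cutoff in the other drops out entirely, which is one reason the shared list can look thinner than either list on its own.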
I have gotten as far as the word “credit”. I was thinking of dropping the whole thing, but now I want to see how it turns out.
To create a usable Common English, it seems likely I will have to add words from Basic English, at least those that also appear in Shakespeare. The thousand words of Basic English were chosen so that anything that can be said in Full English can be said in Basic too.
I am afraid that what is common to Brown and Shakespeare will not give me enough power of expression. Hence the words from Basic English.
I realize this is a very old post. (I ran across it by following a current post’s “see also” links down the rabbit hole.)
Although my comment is far too late to make any difference in Abagond’s Common English project of 2006, I’d still like to point something out.
“Comparing the 5000 most common words in the Brown Corpus with those of Shakespeare made my heart sink. Instead of adventure, you have advertising. Death, nature and religion are almost unknown. Hydrogen is more talked about than humility.”
But Shakespeare’s works consist of plays and poetry, whereas the Brown Corpus included “not just books, but also magazines, newspapers, learned articles, industry reports and government reports.”
I’m willing to bet that if a university compiled a similar corpus for only one year of Shakespeare’s career (thereby limiting his contribution to the works published that year) and also included broadsheets, scientific texts and governmental records, the resulting 5000 most common words would be very different.