Stephen Bruce wrote an interesting article regarding learning new words (for non-native speakers of English) and how this process might be optimized.
EAPing
The numbers are what I’m interested in. It would seem that a reader needs to understand 96% of the words in a text so that the remaining 4% can be understood from context. This would mean that each text of $N$ words could increase one’s vocabulary by $0.04N$. Also, it would be most efficient if $N$ is also the size of the known vocabulary.
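To make the 96% figure concrete, here is a minimal Python sketch that measures what fraction of a text's running words fall inside a known vocabulary. The function name `coverage`, the simple tokenizer, and the sample sentence are illustrative assumptions, not anything taken from the original article.

```python
import re


def coverage(text, known_words):
    """Fraction of running words (tokens) in `text` that are in `known_words`.

    Assumption: a word is any run of lowercase letters or apostrophes,
    which is a crude tokenizer but enough for a rough estimate.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in known_words)
    return known / len(tokens)


# Hypothetical learner vocabulary and sentence.
base = {"this", "is", "a", "boat", "on", "the"}
sample = "This is a boat on the river"
print(f"coverage: {coverage(sample, base):.0%}")
```

A text would be a candidate for reading when this value is around 0.96. Note that it counts every occurrence of a word, so a text that repeats known words heavily reaches 96% with far fewer than $0.04N$ distinct new words.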
What is also interesting is that words follow a distribution according to Zipf’s law. Roughly, this says that the frequency with which a word appears is inversely proportional to its rank. Of course, the more “common” or highly ranked a word is, the more likely it is to appear in a text of, say, $N$ words.
What I find amazing is that if a huge text of made-up words is produced at random (by a monkey bashing a keyboard), the result also follows Zipf’s law.
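As a rough check of the monkey claim, the following Python sketch types random keys, splits the result into "words" at the spaces, and fits a straight line to log(frequency) against log(rank). The four-letter alphabet, the text length, the seed, and the choice to fit only the top 300 ranks are all arbitrary assumptions made for illustration.

```python
import math
import random
from collections import Counter


def monkey_text(n_chars, alphabet="abcd", seed=0):
    """Return n_chars random keystrokes; the space bar is one more equally likely key."""
    rng = random.Random(seed)
    keys = alphabet + " "
    return "".join(rng.choices(keys, k=n_chars))


def rank_frequency(text):
    """Word counts sorted from the most frequent word to the least frequent."""
    counts = Counter(text.split())
    return sorted(counts.values(), reverse=True)


def loglog_slope(freqs, top=300):
    """Least-squares slope of log(frequency) against log(rank) over the top ranks."""
    points = [(math.log(rank), math.log(freq))
              for rank, freq in enumerate(freqs[:top], start=1)]
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    num = sum((x - mean_x) * (y - mean_y) for x, y in points)
    den = sum((x - mean_x) ** 2 for x, _ in points)
    return num / den


freqs = rank_frequency(monkey_text(200_000))
slope = loglog_slope(freqs)
print(f"distinct words: {len(freqs)}, log-log slope: {slope:.2f}")
```

A slope in the neighbourhood of $-1$ is what Zipf’s law would predict. With equally likely keys the curve comes out as a staircase (all words of the same length are equally probable), so this is only a rough power law rather than a clean one.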
Therefore just selecting random texts is not an efficient way to acquire new words. If students are to acquire certain vocabulary then they would benefit from specially constructed texts. That is, assume a known base of words and in the first reading text add in 4% new words. Assume this new base of words and in the second reading text add in 4% more. If each text is $N$ words long then the cumulative growth in new words would be:
$0.04 N$, $0.08 N$, $0.12 N$, and so on. After $n$ texts the total increase in vocabulary would be $0.04nN$. A nice linear growth. Looks good. Ah but no :)
Let us not forget that we forget: according to the Ebbinghaus forgetting curve, only 33% is remembered after one day. Therefore, if we assume one text per day, most of the next day’s 4% is eaten up by re-teaching the forgotten 67% of the previous day’s words. But it is not known which words will be forgotten.
Assume we can produce a test that detects the forgotten words, and modify the text on day two accordingly. We can therefore introduce only 33% of 4% of $N$ new words. Now the growth looks more like this (each term is how many new words can be introduced on each day):
$0.04 N$, $\frac{1}{3} \times 0.04 \times N$, $\frac{1}{9} \times 0.04 \times N$, and so on, which now stinks. Summing this geometric series, after $n$ days (or texts) the total new vocabulary would be $\left(3 - 3^{1 - n}\right)\frac{N}{50}$.
This converges quickly, and the result is that by reading in this optimum manner the student can increase their vocabulary by at most 6%. But almost all of this is reached in only 5 readings.
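The arithmetic can be checked with a short Python sketch that sums the daily terms directly and compares them with the closed form $\left(3 - 3^{1 - n}\right)\frac{N}{50}$. The function names and the value $N = 1000$ are assumptions chosen for illustration; the 4% and one-third figures are the ones used in the text.

```python
def new_words_per_day(n_words, days, new_fraction=0.04, retention=1 / 3):
    """New words that can be introduced on each day under the model in the text.

    Day 1 allows new_fraction * n_words new words; each later day allows
    only `retention` times the previous day's amount.
    """
    return [new_fraction * n_words * retention ** day for day in range(days)]


def closed_form(n_words, days):
    """Total new vocabulary after `days` texts: (3 - 3^(1 - n)) * N / 50."""
    return (3 - 3 ** (1 - days)) * n_words / 50


N = 1000  # hypothetical text length, equal to the base vocabulary size
for n in range(1, 8):
    direct = sum(new_words_per_day(N, n))
    print(f"day {n}: direct sum = {direct:.2f}, closed form = {closed_form(N, n):.2f}")
```

For $N = 1000$ the total after five texts is about 59.75 words against a limit of 60, which is why five readings are enough.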
So, for me, the best way to acquire new vocabulary would be
(1) Assume a known base of words (or discover what this base is)
(2) Decide what area of vocabulary needs expanding, e.g., “Physics” vocabulary
(3) Produce a text with 4% of the target vocabulary and 96% of the base
(4) After one day test what is recalled from the 4%
(5) Produce a new text with the forgotten words and some new words (so that these combined still total 4%)
(6) Repeat for 5 days
(7) Give a weekend assignment that uses ALL the new vocabulary.
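The word-selection half of steps (4) and (5) can be sketched in a few lines of Python. This is only a sketch under assumptions: the function name `next_target_words`, its parameters, and the sample physics words are hypothetical, and it says nothing about how the text around those words gets written.

```python
def next_target_words(todays_targets, recalled, unseen_targets, budget):
    """Choose the target words for tomorrow's text.

    Forgotten words are recycled first; whatever is left of the budget
    (4% of the text length in the scheme above) is filled with target
    words the student has not met yet.
    """
    forgotten = [word for word in todays_targets if word not in recalled]
    carry_over = forgotten[:budget]
    fresh = unseen_targets[:budget - len(carry_over)]
    return carry_over + fresh


# Hypothetical example: a 100-word text, so a budget of 4 target words.
todays_targets = ["velocity", "mass", "force", "energy"]
recalled = {"mass"}  # result of the day-after recall test
unseen_targets = ["momentum", "torque", "inertia"]

print(next_target_words(todays_targets, recalled, unseen_targets, budget=4))
# ['velocity', 'force', 'energy', 'momentum']
```

Recycling forgotten words before adding fresh ones is a design choice that follows the scheme above; other orderings are possible.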
The tricky part is step 5: how can this be automated? It would seem that this would work well for students with a vocabulary of 500 to 1,000 words or so. As $N$ gets bigger, one could not realistically have students reading 5,000-word texts; but if they were willing and could do this daily, then with the right software they could see a 6% growth per week. BUT notice that this also assumes the text of $N$ words uses each word in the base vocabulary only once.
I do feel the 6% is what students would get at the start, starting from a 500-word base. This would fall, and would probably reach a new low of below 1% at the 5,000-word range, since it would be impossible to produce a 5,000-word text that uses each word only once. (Much easier to do when students are faced with sentences at the 500-word level such as “This is a boat”.)
I have heard of bots from tech start-ups that can write articles; maybe EAP could do with an injection of this kind of software? I do feel the acquisition of new words can be optimized, though granted, not to a 6% growth per week. But anything close to 1% would be amazing.
Extensive Reading - playing the numbers game
Author Stephen Easley-Walsh