Humans absorb about 1.5 megabytes of data between birth and adulthood to learn their native language, scientists say. Learning one's native language may seem effortless. However, research from the University of California (UC) Berkeley in the US shows that language acquisition between birth and age 18 is a remarkable feat of cognition, rather than something humans are just hardwired to do. Researchers calculated that, from infancy to young adulthood, learners absorb approximately 12.5 million bits of information about language, or about two bits per minute, to fully acquire linguistic knowledge.
If converted into binary code, the data would fill a 1.5 MB floppy disk, the study found. The findings, published in the journal Royal Society Open Science, challenge assumptions that human language acquisition happens effortlessly, and that robots would have an easy time mastering it.
"Ours is the first study to put a number on the amount you have to learn to acquire language," said Steven Piantadosi, an assistant professor at UC Berkeley.
"It highlights that children and teens are remarkable learners, absorbing upwards of 1,000 bits of information each day," said Piantadosi.
A bit, or binary digit, is a basic unit of data in computing, and computers store information and calculate using only zeroes and ones. The study uses the standard definition of eight bits to a byte.
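The figures quoted in this article can be cross-checked with simple arithmetic. The following Python sketch is illustrative only and is not taken from the study: the 365-day year and the 16 waking hours per day are assumptions made here to show one way the per-day and per-minute figures could line up, and all variable names are hypothetical.

```python
# Rough consistency check of the figures quoted in the article.
# Assumptions (not from the study): 365-day years, 16 waking hours per day.

TOTAL_BITS = 12_500_000        # bits of linguistic information (from the article)
BITS_PER_BYTE = 8              # standard definition, as the article states
YEARS = 18                     # birth to age 18 (from the article)
DAYS_PER_YEAR = 365            # assumption: leap days ignored
WAKING_HOURS_PER_DAY = 16      # assumption: learning counted only while awake

# Convert bits to bytes, then to decimal megabytes (1 MB = 1,000,000 bytes).
total_bytes = TOTAL_BITS / BITS_PER_BYTE
total_megabytes = total_bytes / 1_000_000

# Average learning rate per day and per waking minute.
bits_per_day = TOTAL_BITS / (YEARS * DAYS_PER_YEAR)
bits_per_waking_minute = bits_per_day / (WAKING_HOURS_PER_DAY * 60)

print(f"Total size: {total_megabytes:.4f} MB")
print(f"Bits per day: {bits_per_day:.0f}")
print(f"Bits per waking minute: {bits_per_waking_minute:.2f}")
```

Under these assumptions the total comes to a little over 1.5 megabytes, the daily rate is well above 1,000 bits, and the rate per waking minute is close to two bits, which is in line with the numbers reported above.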
"When you think about a child having to remember millions of zeroes and ones (in language), that says they must have really pretty impressive learning mechanisms," he said.
Researchers wanted to gauge the amounts and different kinds of information that English speakers need to learn their native language.
They arrived at their results by running various calculations about language semantics and syntax through computational models.
The study found that linguistic knowledge consists mostly of the meanings of words, as opposed to the grammar of the language.
"A lot of research on language learning focuses on syntax, like word order," Piantadosi said. "But our study shows that syntax represents just a tiny piece of language learning, and that the main difficulty has got to be in learning what so many words mean."
That focus on semantics versus syntax distinguishes humans from robots, including voice-controlled digital helpers such as Alexa, Siri and Google Assistant. "This really highlights a difference between machine learners and human learners. Machines know what words go together and where they go in sentences, but know very little about the meaning of words," Piantadosi said.
As for the question of whether bilingual people must store twice as many bits of information, Piantadosi said this is unlikely in the case of word meanings, many of which are shared across languages. "The meanings of many common nouns like 'mother' will be similar across languages, and so you won't need to learn all of the bits of information about their meanings twice," he said.