Words and word-like entities are combined (arranged) according to rules to form larger units like phrases and sentences. The rules constitute a grammar. Where do the rules come from? How do we know if an utterance is well-formed?
Rules or Guidelines?
In programming languages, rules are explicit and prescriptive. In spoken natural language, we tend to treat them as guidelines, but in writing, they are often more rigidly followed.
A grammar tells you where each component of an utterance should go, based on the component’s category (e.g., for English: noun, verb, adjective, noun phrase, verb phrase, subject, predicate, adverbial clause, prepositional phrase, etc.), but it won’t tell you what the utterance means. Classic examples are:
Technically, the grammar composes utterances into hierarchical structures, even though they tend to be written or spoken as temporal sequences. The tree structure is what allows compositionality to work at scale.
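To make the contrast between hierarchical structure and temporal sequence concrete, here is a minimal Python sketch. The nested-tuple encoding, the category labels, and the helper names `leaves` and `depth` are assumptions chosen for illustration, not a standard representation.

```python
# A parse tree as nested tuples: (label, child, child, ...).
# Plain strings are words (the leaves of the tree).
# The labels S, NP, VP, PN, TV are illustrative assumptions.
TREE = (
    "S",
    ("NP", ("PN", "grace")),
    ("VP", ("TV", "hit"), ("NP", ("PN", "alan"))),
)


def leaves(tree):
    """Read the words off the tree left to right (the spoken sequence)."""
    if isinstance(tree, str):
        return [tree]
    _label, *children = tree
    return [word for child in children for word in leaves(child)]


def depth(tree):
    """Count the levels of structure above the deepest word."""
    if isinstance(tree, str):
        return 0
    return 1 + max(depth(child) for child in tree[1:])


print(" ".join(leaves(TREE)))  # the flat sequence we say or write
print(depth(TREE))             # the hierarchy hidden behind it
```

The flat word sequence is recoverable from the tree, but not the other way around: the same three words carry no record of which words group together unless the grammar supplies it.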
Recursion is the property of language that allows rules to be applied repeatedly, embedding structures within structures. This is what enables sentences to be unboundedly long and complex, even with a finite set of rules and vocabulary.
Here’s a grammar that demonstrates recursion:
We can generate sentences as in this example:
S
NP VP
DET NOUN RP VP VP
the NOUN RP VP VP
the dog RP VP VP
the dog that VP VP
the dog that SV S VP
the dog that thought S VP
the dog that thought NP VP VP
the dog that thought PN VP VP
the dog that thought grace VP VP
the dog that thought grace TV NP VP
the dog that thought grace hit NP VP
the dog that thought grace hit PN VP
the dog that thought grace hit alan VP
the dog that thought grace hit alan DV NP PP
the dog that thought grace hit alan threw NP PP
. . .
the dog that thought grace hit alan threw the new blue toy to the fast rat
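The derivation above can be replayed mechanically. The following Python sketch uses a grammar inferred from the visible steps. The rules for the elided part (`NOM`, `ADJ`, `PP`, `PREP`) are assumptions, since those steps are not shown, and the names `GRAMMAR`, `CHOICES`, and `derive` are hypothetical.

```python
# Each nonterminal maps to a list of alternative right-hand sides.
GRAMMAR = {
    "S":    [["NP", "VP"]],
    "NP":   [["DET", "NOUN", "RP", "VP"], ["PN"], ["DET", "NOM"]],
    "NOM":  [["ADJ", "NOM"], ["NOUN"]],   # assumed: covers "new blue toy"
    "VP":   [["SV", "S"], ["TV", "NP"], ["DV", "NP", "PP"]],
    "PP":   [["PREP", "NP"]],             # assumed
    "DET":  [["the"]],
    "NOUN": [["dog"], ["toy"], ["rat"]],
    "ADJ":  [["new"], ["blue"], ["fast"]],
    "RP":   [["that"]],
    "SV":   [["thought"]],
    "TV":   [["hit"]],
    "DV":   [["threw"]],
    "PN":   [["grace"], ["alan"]],
    "PREP": [["to"]],                     # assumed
}

# Which alternative to pick at each step, always for the leftmost nonterminal.
CHOICES = [
    0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 2, 0,  # the steps shown above
    2, 0, 0, 0, 0, 1, 1, 1, 0, 0, 2, 0, 0, 2, 1, 2,  # the elided steps (assumed)
]


def derive(start, choices):
    """Return every sentential form of a leftmost derivation."""
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Find the leftmost symbol that still has rules, then rewrite it.
        i = next(k for k, sym in enumerate(form) if sym in GRAMMAR)
        form[i:i + 1] = GRAMMAR[form[i]][choice]
        steps.append(" ".join(form))
    return steps


steps = derive("S", CHOICES)
for line in steps:
    print(line)
```

The recursion lives in two rules: `VP` can contain an `S`, and `NP` can contain a `VP`, so a sentence can appear inside a sentence.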
Recursion can allow sentences to become arbitrarily long and complex:
S
NP VP
NP SV S
NP SV NP VP
NP SV NP SV S
NP SV NP SV NP VP
NP SV NP SV NP SV S
. . .
she dreamed she dreamed she dreamed . . . she dreamed the dog swam
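The same pattern can be written as a recursive function. This is a small sketch, assuming the embedding rule VP → SV S and a simple non-recursive sentence as the base case; the function name `embed` is hypothetical.

```python
def embed(depth):
    """Apply the recursive rule VP -> SV S `depth` times, then stop."""
    if depth == 0:
        # Base case: a sentence with no embedded clause.
        return ["the", "dog", "swam"]
    # Recursive case: "she dreamed" followed by a whole sentence.
    return ["she", "dreamed"] + embed(depth - 1)


for n in range(4):
    print(" ".join(embed(n)))
```

Each extra level of embedding adds two words, so no choice of `depth` yields the longest possible sentence, even though every individual sentence is finite.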

Chomsky argued that language acquisition is too fast, and the input too impoverished, for learning alone to explain it. This is the Poverty of the Stimulus argument: children know grammatical rules they’ve never seen exemplified. So he proposed a kind of innate Language Acquisition Device (LAD) for what became known as Universal Grammar.
Many opposing views exist. Tomasello and Christiansen argue that general learning mechanisms combined with social interaction are sufficient. Computer simulations by Kirby and colleagues support this view, showing that cultural transmission can lead to the emergence of structured language over generations.
Discuss the fact that LLMs acquire language from stimulus alone, with no innate LAD, and they do remarkably well. LLMs learn grammar implicitly from the data, without explicit instruction. They can generate grammatically correct sentences, but they don’t have an explicit representation of grammatical rules like humans do. How is this possible? Are transformers just big enough to brute-force what evolution gave us for free? Or do we have something else?
Perhaps you’ve heard of pidgins (rudimentary contact languages with no native speakers and minimal grammar) or creoles (pidgins that children acquire natively, with a spontaneously developed full grammar).
You should read about how deaf children in 1980s Nicaragua invented a full sign language in one generation. Themselves. With no adults to mimic.
There seems to be a lot of evidence that given the right social conditions, grammar emerges.
TODO
Here are some questions useful for your spaced repetition learning. Many of the answers are not found on this page. Some will have popped up in lecture. Others will require you to do your own research.
TODO
We’ve covered: