Introduction
Over the course of this semester you’ve looked at language from three perspectives: (1) as a formal mathematical system, (2) as the result of an ongoing process of biological and cultural evolution, and (3) as a statistical phenomenon. For your final project, you will synthesize all three perspectives and invent your own language.
Your constructed language, or conlang, does not need to be complete. No natural language was designed in a semester. But it does have to be principled: every choice you make should reflect something you learned in this course, and you should be able to explain and defend those choices. Be creative while also showing your proficiency with the linguistic theories you’ve encountered. Create something that could have evolved naturally.
You will produce two artifacts for your conlang:
- A technical document that begins with an overview and origin story and includes: a phonology and sound inventory; a morphology (how words are built and modified); a grammar with example sentences; a treatment of how your language encodes meaning (semantics and pragmatics); a small lexicon of core vocabulary; and notes on how the language varies, changes, or is endangered in its fictional world; and
- A presentation in which you teach the foundations of the language to a moderately knowledgeable audience. The presentation can be a TED-style talk (with visuals), a short film or video essay, a screenplay (only attempt this route if you have screenwriting experience), an interactive web essay (look to Nicky Case for examples of these), or a linguistic field report. For the field report, imagine you are a linguist who has uncovered fragmentary evidence of a previously unknown extinct language; your job is to document what the fragments reveal, reconstruct what they imply, and acknowledge what remains beyond recovery.
Learning Objectives
This course has had plenty of mini quizzes and exams for you to demonstrate proficiency with the fundamentals of language.
With this term project, you are creating and teaching, so you will demonstrate your learning at a greater depth of knowledge. You’ll design a phonology (so you’ll have to prove you know what phonemes are). You’ll design a grammar (so you’ll have to demonstrate your understanding of syntactic structures and where meaning comes from). You’ll have to write the “origin story” of your language (so you’ll have to prove you’ve read Mithen’s book).
Preparation
Before undertaking this project, you should be familiar with what a conlang is and have studied several examples. Prepare for the design of your own language by:
- Reading the entire Wikipedia article on Constructed language and following several of the many articles and resources linked from that page.
- Studying at least one of Ido, Interlingua, or Esperanto.
- Studying at least one conlang not based on Latin.
- Studying at least one conlang created for a fictional world from books or movies (e.g., Klingon, Quenya, Sindarin, Na'vi, or Dothraki).
- Browsing several resources of the Language Creation Society.
- Thinking about the goals and constraints of your language. What makes it unique? How does it reflect the principles you've learned in this course?
Requirements
Your technical report must be formatted as any serious academic work, with a title, abstract, introduction, body, conclusion, and a (comprehensive-ish) reference section. Footnotes, citations, and inline quotes should appear where appropriate.
The body of the report must address each of the following items. Don’t simply list “answers” to each of these points. Write a nice report and just make sure each of the items is covered somewhere.
From Part I — The Formal Perspective
- Place your language in the Chomsky hierarchy. What kind of grammar does your language have? Is it context-free? Mildly context-sensitive? Make an argument, with at least one example construction that supports your claim.
- Show a generative or analytic grammar fragment (e.g., EBNF, CFG, or PEG) that can generate a subset of your language's sentences. This does not need to be complete, but it must be precise enough that someone could write a parser for it.
- Address ambiguity. Does your language permit ambiguous sentences? If so, show an example and explain how speakers resolve the ambiguity (context? prosody? word order?). If not, explain what design choices prevent it and what expressiveness you gave up in order to achieve that.
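To make the grammar-fragment and ambiguity items concrete, here is a minimal sketch in Python. It is an illustration, not a required format. It assumes an invented head-final fragment: the words kala, miru, tano, su, and veki and the five rules below are made up for this example and belong to no real language. The fragment is written in Chomsky normal form, and a CYK chart counts how many distinct parse trees a sentence has. A count above 1 means the sentence is structurally ambiguous under the fragment.

```python
from collections import defaultdict

# Hypothetical fragment (invented for illustration only).
# Binary rules in Chomsky normal form: (parent, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),   # sentence = subject + verb phrase
    ("VP", "NP", "V"),   # object precedes verb (SOV order)
    ("VP", "PP", "VP"),  # postpositional phrase modifies the verb phrase
    ("NP", "PP", "NP"),  # postpositional phrase modifies a noun
    ("PP", "NP", "P"),   # postposition follows its noun
]

# Hypothetical lexicon: each word maps to a single category.
LEXICON = {
    "kala": "NP",
    "miru": "NP",
    "tano": "NP",
    "su": "P",
    "veki": "V",
}


def count_parses(words, start="S"):
    """Return the number of distinct parse trees for `words` under the fragment."""
    n = len(words)
    # chart[(i, j, A)] = number of trees deriving words[i:j] from category A
    chart = defaultdict(int)
    for i, word in enumerate(words):
        if word not in LEXICON:
            return 0  # unknown word: the fragment cannot generate this sentence
        chart[(i, i + 1, LEXICON[word])] += 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point between left and right child
                for parent, left, right in BINARY_RULES:
                    chart[(i, j, parent)] += (
                        chart[(i, k, left)] * chart[(k, j, right)]
                    )
    return chart[(0, n, start)]


if __name__ == "__main__":
    for sentence in ["kala tano veki", "kala miru su tano veki"]:
        print(sentence, "->", count_parses(sentence.split()), "parse(s)")
```

In this sketch the three-word sentence has one parse and the five-word sentence has two, because the postpositional phrase miru su can attach either to the noun tano or to the verb phrase tano veki. Your report would then explain how speakers choose between the two readings.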
From Part II — The Evolutionary and Structural Perspective
- Design a phonology. Your language needs sounds. Make sure you have at least 20 phonemes (remember that English has around 44) and explain your choices. Are any iconic (resembling what they refer to)? What existing human languages influenced your choices?
- Take a position on word order and morphology. Is your language analytic or synthetic? Head-initial or head-final? Does it use case endings, word order, or something else to track grammatical relationships? Justify your choices explicitly—you are making an argument about how meaning can be encoded in structure.
- Address the syntax-semantics question. In your language, can meaning come from structure, or from something prior to structure? Show at least one example where your language encodes a meaning that English cannot easily express, or expresses a common English meaning in a structurally surprising way. You may draw inspiration from Inuktitut, Owens Valley Paiute, Irish, Russian, or any other language we examined or that you happen to know well.
- Write an origin story. In one paragraph, describe the fictional community of speakers who developed your language—their environment, their social structure, their communicative needs. Invent anthropological pressures that shaped your language. What does your language have to mark grammatically, and what does that reveal about what its speakers habitually pay attention to?
- Address endangerment or change. Is your language thriving, endangered, or extinct in its fictional world? If it has contact with other languages, show one borrowing or one grammaticalization in progress. Give thoughts about where it might be headed in the future.
- Reflect on what your language cannot express. Every language encodes some things and leaves others to context, gesture, or silence. What can your language not say easily? What does that absence reveal about the relationship between language, thought, and the community that speaks it?
From Part III — The Statistical Perspective
- For a given (lengthy) sentence in your language, show the tokenization that is likely to be produced by a statistical model.
- Show a statistical pattern. Find at least one interesting statistical pattern in your language — a Zipfian distribution of word frequencies, a tendency for certain sounds to cluster together, or something else. Explain how this pattern might have arisen through cultural evolution.
- Reflect on learnability. Could an LLM trained on a corpus of your language learn its grammar? What features of your language would be easy for a statistical model to acquire, and what features might resist statistical learning? (Think about what transformers can and cannot do with recursion, long-distance dependencies, and counting.)
- (OPTIONAL) If you have experience with training open source models, see how much text you can generate and train a small model. Attempt a conversation or a text generation task, translation, or other experiment. You can put results and findings in an appendix.
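For the tokenization and statistical-pattern items, the following Python sketch may help you get started. It is a simplified, character-level version of byte-pair encoding (BPE), one common subword method; a production tokenizer may behave differently. It also includes a rank-frequency table you can use to check for a Zipf-like distribution. The function names and the toy corpus are invented for illustration.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(corpus_words, num_merges):
    """Learn up to `num_merges` merges by repeatedly joining the most frequent adjacent pair."""
    vocab = Counter(tuple(word) for word in corpus_words)  # word -> frequency
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Ties are broken alphabetically so the result is deterministic.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def tokenize(word, merges):
    """Apply the learned merges, in order, to a single word."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


def rank_frequency(tokens):
    """Return (rank, token, count) rows, most frequent first, for a Zipf check."""
    return [
        (rank, token, count)
        for rank, (token, count) in enumerate(Counter(tokens).most_common(), start=1)
    ]


if __name__ == "__main__":
    # Hypothetical toy corpus; replace it with text in your own conlang.
    corpus = "kalami tanomi kalami veki kalasu tanomi kalami veki mirusu".split()
    merges = learn_bpe(corpus, num_merges=8)
    print("merges:", merges)
    print("tokens:", [tokenize(word, merges) for word in "kalami mirusu".split()])
    for row in rank_frequency(corpus):
        print(row)
```

If your language has productive affixes, frequent ones tend to surface as their own subword tokens, which is worth discussing in the report. For the Zipf check, plot count against rank on log-log axes and see whether the points fall roughly on a line; a corpus this small will not show the pattern clearly.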
Submission Instructions
In the text of your BrightSpace submission, provide a link to your “presentation” (web essay, video, etc.). For the attachment of the BrightSpace submission, provide your technical report as a PDF file.
If your presentation is a TED-style talk, you can submit it either as a video (preferred) or, if you really enjoy presenting live, you can present it in person during the last class period. If you wish to present live, please arrange a time slot at least two weeks in advance.