Speech Technology
Course overview
Humans are usually pretty good at speaking. A person is able to communicate linguistic structure from a young age, despite the complexity of both producing and perceiving speech sounds. In spite of this—or perhaps because of it—"how" we produce and perceive speech is in many ways still a mystery. The big questions in cognitive speech science are dedicated to hypothesizing how speech production and perception work.
Machines have no innate capacity for speech, but a substantial amount of research and creative thinking in the last century has led to machines that (sometimes) appear to speak and understand speech. Modern research in machine speech is often about how to give machines more natural speech.
This course explores why it is so difficult for machines to produce and perceive speech. We start with a foundation of the speech production and perception systems and the characteristics of speech that make it so complex; we then study a few techniques that have been designed to give machines the power of speech. At the core of it all we will find dynamical systems—ubiquitous systems in which the present is driven by the past.
This course involves math and computer programming. You are not expected to have a background in either, and a substantial portion of the course is dedicated to teaching you these things from scratch. However, you will benefit from a willingness to make lots of mistakes (and ignore whatever Math Fear you may have picked up earlier in life).
Learning objectives
The goal is that, by the end of this course, you will have some of the foundational skills and knowledge required to work as a linguist or language engineer at a tech company. These include the ability to:
- Explain how speech is produced and perceived by humans (especially through the language of dynamical systems).
- Compare the advantages and disadvantages of different methods of speech synthesis.
- Explain and apply various algorithms used for speech synthesis and recognition.
- Explain why there are racial disparities in speech technology.
- Write a script that synthesizes a vowel and performs a Fourier transform on it.
During this course, you will work with spectrograms and waveforms in Praat, build sound from scratch in MATLAB, build your own Fourier transform, code basic programs in MATLAB, compare the efficiency and effectiveness of some path-finding algorithms, and explain the movements and dynamics integral to speech and hearing.
Schedule
Week | Topic |
---|---|
Week 1 | Orientation |
Week 2 | Phonetics in speech technology |
Week 3 | Dynamical systems |
Week 4 | Masses, springs, and speech synthesis |
Week 5 | Concatenative synthesis I |
Week 6 | Concatenative synthesis II |
Week 7 | Speech recognition |
Week 8 | Neural networks I |
Week 9 | Neural networks II |
Week 10 | Reflection |
Weekly plan
Due to COVID-19, this course will primarily be conducted asynchronously, which here means that there is no requirement for all of us to be in the same space (virtual or physical) at the same time. I've made this choice to accommodate our recent trying circumstances; for example, to ensure that students who now live in different time zones or have unreliable internet access can fully participate in the course.
The class is organized around weekly deadlines, as summarized in the table below. (The timeline will be a little different in the first week as we get our bearings.)
Day | Things to do on or before that day |
---|---|
Monday | |
Tuesday | Contribute to the week's discussion on the course site. |
Wednesday | |
Thursday | |
Friday | |
Grades
Category | % of grade |
---|---|
Weekly quizzes | 30% |
Discussion | 20% |
Weekly code assignments | 30% |
Final project | 20% |
Weekly quizzes
New readings, video lectures, or other materials will be available on the course site each week. A quiz will also be posted on the course site each week to check your understanding of that material.
The quizzes will usually be in the following format: 10-20 multiple-choice (or similar) questions, untimed. The quiz will be made available at the same time as the material it assesses, which will generally be 5-7 days before the quiz is due.
Discussion
Discussions in this course will take place through the course site. You are expected to participate in each week's discussion by posting your own thoughts early in the week and replying to your peers later in the week.
Each student will start their own thread in the week's discussion early in the week. Messages that start threads can be questions, criticisms, extensions, or epiphanies about the course content; said another way, your posts should show that you've put thought into the week's content.
Later in the week, you will reply to at least two of your peers' threads. Good replies could be affirm-and-add responses (the "yes, and…" model used in improvisation) or thought-provoking questions/challenges on the topic. Ideally, your reply should continue the conversation, not shut it down. Likewise, the best thread-starters will invite comments and lengthy discussion.
Outside of the weekly graded discussion, you are encouraged to participate in the open discussion on the course page. This will be monitored, but not graded; it is a space for you to talk about whatever you want (as long as it would be appropriate to talk about in the classroom too), post gifs, ask unrelated questions, etc. Informal/conversational messages are welcome as long as they are respectful and appropriate (which is true of the discussion posts as well).
Weekly code assignments
You are not expected to have any coding experience; programming is not a prerequisite for the course. Instead, this course will teach you the basics of coding in MATLAB.
There will be a code assignment each week. You will complete the assignment in MATLAB or MATLAB Online, then upload the finished assignment to the course site for grading. The weekly code assignments will build toward the course's final assessment, which includes a coding project.
Early coding assignments will give you practice with basic programming concepts like variables, lists (arrays/vectors), and for loops. We will then use those concepts to simulate dynamical systems and to implement commonly used algorithms like greedy search. (You might not know what those mean now, but hopefully you will by the time we're done with the course!)
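To give a rough feel for how variables, lists, and a for loop combine into a dynamical system, here is a minimal sketch of a damped mass on a spring. It is written in Python purely for illustration (the course assignments themselves use MATLAB). The function name, the parameter values, and the choice of a semi-implicit Euler update are all assumptions made for this example, not the method used in the assignments.

```python
def simulate_mass_spring(mass=1.0, stiffness=40.0, damping=0.5,
                         x0=1.0, v0=0.0, dt=0.001, n_steps=5000):
    """Simulate mass * x'' = -stiffness * x - damping * x'.

    All parameter values are illustrative assumptions.
    Returns a list of positions, one per time step (plus the start).
    """
    positions = [x0]          # a list that stores the whole trajectory
    x, v = x0, v0             # variables holding the current state
    for _ in range(n_steps):  # each pass advances time by dt seconds
        a = (-stiffness * x - damping * v) / mass  # acceleration from the forces
        v = v + a * dt        # update velocity first (semi-implicit Euler)
        x = x + v * dt        # then update position using the new velocity
        positions.append(x)   # the present state depends only on the past one
    return positions


positions = simulate_mass_spring()
print("number of stored positions:", len(positions))
print("largest displacement in the final second:",
      max(abs(p) for p in positions[-1000:]))
```

Each new position is computed only from the previous position and velocity, which is the sense in which the present is driven by the past. Because of the damping term, the oscillation shrinks over time.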
Final project
The summative assessment for this course (what in other courses might be a final exam) is a final project. The purpose of the project is to provide a practical assessment of how you apply the knowledge and skills you gained from this course.
The project for this course is to synthesize three vowels using mass-spring systems, then code a Fourier transform that recognizes those vowels. (By the end of the course you should also have learned everything you need to know to build your own neural network. You can add that to your final project too if you want—you won't get extra credit for it or anything, but I'll be impressed!)
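As a very rough sketch of the idea behind the project, the example below builds a vowel-like sound from two decaying sinusoids (the kind of motion a damped mass-spring system produces) and then uses a hand-written discrete Fourier transform to recover the two frequencies. It is written in Python purely for illustration, since the project itself is done in MATLAB. The sample rate, decay rate, function names, and the two frequencies (700 Hz and 1200 Hz) are hypothetical values chosen for the example, not the project specification.

```python
import cmath
import math

SAMPLE_RATE = 8000  # samples per second (assumed value)


def synthesize_vowel(components, n_samples=400, decay=60.0):
    """Sum one decaying sinusoid per (frequency_hz, amplitude) pair."""
    signal = []
    for n in range(n_samples):
        t = n / SAMPLE_RATE
        sample = sum(amp * math.exp(-decay * t) * math.sin(2 * math.pi * freq * t)
                     for freq, amp in components)
        signal.append(sample)
    return signal


def dft_magnitudes(signal):
    """Naive discrete Fourier transform; returns magnitudes up to half the sample rate."""
    n = len(signal)
    magnitudes = []
    for k in range(n // 2):
        total = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(signal))
        magnitudes.append(abs(total))
    return magnitudes


def strongest_peaks(magnitudes, n_samples, count=2):
    """Return the frequencies (Hz) of the `count` largest local maxima."""
    peaks = [k for k in range(1, len(magnitudes) - 1)
             if magnitudes[k - 1] < magnitudes[k] > magnitudes[k + 1]]
    peaks.sort(key=lambda k: magnitudes[k], reverse=True)
    return sorted(k * SAMPLE_RATE / n_samples for k in peaks[:count])


# Hypothetical vowel: two resonances with illustrative frequencies and amplitudes.
vowel = synthesize_vowel([(700.0, 1.0), (1200.0, 0.5)])
peak_frequencies = strongest_peaks(dft_magnitudes(vowel), len(vowel))
print("strongest peaks (Hz):", peak_frequencies)
```

Comparing the recovered peak frequencies against the known resonances of each synthesized vowel is one simple way a Fourier transform could be used to tell vowels apart.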