Speech Technology

Course overview

Humans are usually pretty good at speaking. A person is able to communicate linguistic structure from a young age, despite the complexity of both producing and perceiving speech sounds. In spite of this—or perhaps because of it—"how" we produce and perceive speech is in many ways still a mystery. The big questions in cognitive speech science are dedicated to hypothesizing how speech production and perception work.

Machines have no innate capacity for speech, but a substantial amount of research and creative thinking in the last century has led to machines that (sometimes) appear to speak and understand speech. Modern research in machine speech is often about how to give machines more natural speech.

This course explores why it is so difficult for machines to produce and perceive speech. We start with a foundation in the human speech production and perception systems and the characteristics of speech that make it so complex; we then study a few techniques that have been designed to give machines the power of speech. At the core of it all we will find dynamical systems: ubiquitous systems in which the present is driven by the past.

This course involves math and computer programming. You are not expected to have a background in either, and a substantial portion of the course is dedicated to teaching you these things from scratch. However, you will benefit from a willingness to make lots of mistakes (and ignore whatever Math Fear you may have picked up earlier in life).

Learning objectives

The goal is that, by the end of this course, you will have some of the foundational skills and knowledge required to work as a linguist or language engineer at a tech company. These include the ability to:

  • Explain how speech is produced and perceived by humans (especially through the language of dynamical systems).
  • Compare the advantages and disadvantages of different methods of speech synthesis.
  • Explain and apply various algorithms used for speech synthesis and recognition.
  • Explain why there are racial disparities in speech technology.
  • Write a script that synthesizes a vowel and performs a Fourier transform on it.

During this course, you will work with spectrograms and waveforms in Praat, write basic programs in MATLAB, build sound from scratch, build your own Fourier transform, compare the efficiency and effectiveness of some path-finding algorithms, and explain the movements and dynamics integral to speech and hearing.

Schedule

Week Topic
Week 1 Orientation
Week 2 Phonetics in speech technology
Week 3 Dynamical systems
Week 4 Masses, springs, and speech synthesis
Week 5 Concatenative synthesis I
Week 6 Concatenative synthesis II
Week 7 Speech recognition
Week 8 Neural networks I
Week 9 Neural networks II
Week 10 Reflection

Weekly plan

Due to COVID-19, this course will primarily be conducted asynchronously, which here means that there is no requirement for all of us to be in the same space (virtual or physical) at the same time. I've made this choice to accommodate our recent trying circumstances; for example, to ensure that students who now live in different time zones or have unreliable internet access can fully participate in the course.

The class is organized around weekly deadlines, as summarized in the table below. (The timeline will be a little different in the first week as we get our bearings.)

Day Things to do on or before that day
Monday
  • Read readings, watch videos, and consume other assigned media.
  • Take a quiz about that content.
Tuesday
  • Contribute to the week's discussion on the course site.
Wednesday
Thursday
Friday
  • Continue the week's discussion.
  • Submit a code assignment.

Grades

Category % of grade
Weekly quizzes 30%
Discussion 20%
Weekly code assignments 30%
Final project 20%

Weekly quizzes

New readings, video lectures, or other materials will be available on the course site each week. A quiz will be posted on the course site each week to check your understanding of that material.

The quizzes will usually be in the following format: 10-20 multiple-choice (or similar) questions, untimed. The quiz will be made available at the same time as the material it assesses, which will generally be 5-7 days before the quiz is due.

Discussion

Discussions in this course will take place through the course site. You are expected to participate in each week's discussion by posting your own thoughts early in the week and replying to your peers later in the week.

Each student will start their own thread in the week's discussion earlier in the week. Messages that start threads can be questions, criticisms, extensions, or epiphanies about the course content; said another way, your posts should show that you've put thought into the week's content.

Later in the week, you will reply to at least two of your peers' threads. Good replies could be affirm-and-add responses (the "yes, and…" model used in improvisation) or thought-provoking questions/challenges on the topic. Ideally, your reply should continue conversation, not shut it down. Likewise, the best thread-starters will invite comments and lengthy discussion.

Outside of the weekly graded discussion, you are encouraged to participate in the open discussion on the course page. This will be monitored, but not graded; it is a space for you to talk about whatever you want (as long as it would be appropriate to talk about in the classroom too), post gifs, ask unrelated questions, etc. Informal/conversational messages are welcome as long as they are respectful and appropriate (which is true of the discussion posts as well).

Weekly code assignments

You are not expected to have any coding experience; programming is not a prerequisite for the course. Instead, this course will teach you the basics of coding in MATLAB.

There will be a code assignment each week. You will complete the assignment in MATLAB or MATLAB Online, then upload the finished assignment to the course site for grading. The weekly code assignments will build toward the course's final assessment, which includes a coding project.

Early coding assignments will give you practice with basic programming concepts like variables, lists (arrays/vectors), and for loops. We will then use those concepts to simulate dynamical systems and to implement commonly used algorithms like greedy search. (You might not know what those mean now, but hopefully you will by the time we're done with the course!)
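To give a rough sense of how a for loop and a dynamical system fit together, here is a minimal sketch of a damped mass on a spring. It is written in Python purely for illustration (the course assignments themselves use MATLAB), and the function name, parameter values, and step size are assumptions chosen for the example, not part of any assignment. Each pass through the loop computes the next position and velocity from the current ones, which is what "the present is driven by the past" looks like in code.

```python
def simulate_spring(stiffness=100.0, damping=0.5, mass=1.0,
                    x0=1.0, v0=0.0, dt=0.001, n_steps=5000):
    """Simulate a damped mass-spring system and return its positions.

    All default values are illustrative assumptions, not course-specified.
    """
    x, v = x0, v0          # current state: position and velocity
    positions = [x]        # a list that records the position at every step
    for _ in range(n_steps):
        # Force from the spring (pulls back toward 0) plus damping (opposes motion).
        acceleration = (-stiffness * x - damping * v) / mass
        # Semi-implicit Euler step: update velocity first, then position.
        v = v + acceleration * dt
        x = x + v * dt
        positions.append(x)
    return positions


positions = simulate_spring()
print(len(positions))      # number of recorded positions (n_steps + 1)
```

The mass swings back and forth around zero while the damping term gradually shrinks each swing. Changing `stiffness` or `mass` changes how fast it oscillates.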

Final project

The summative assessment for this course (what in other courses might be a final exam) is a final project. The purpose of the project is to provide a practical assessment of how you apply the knowledge and skills you gained from this course.

The project for this course is to synthesize three vowels using mass-spring systems, then code a Fourier transform and use it to recognize those vowels. (By the end of the course you should also have learned everything you need to know to build your own neural network. You can add that to your final project too if you want. You won't get extra credit for it or anything, but I'll be impressed!)
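As a loose preview of the idea behind the project, the sketch below builds a vowel-like sound from two decaying oscillations (the kind of motion a damped mass-spring system produces) and then uses a hand-written discrete Fourier transform to find its strongest frequency. It is written in Python only for illustration, since the project itself is done in MATLAB. The sample rate, the decay rate, and the two resonance frequencies (700 Hz and 1200 Hz) are placeholder assumptions, not measured vowel formants, and every function name is hypothetical.

```python
import cmath
import math

SAMPLE_RATE = 8000  # samples per second (an assumed value)


def synthesize(resonances, duration=0.05, decay=50.0):
    """Sum one decaying sinusoid per (frequency_hz, amplitude) pair."""
    n_samples = int(round(duration * SAMPLE_RATE))
    signal = []
    for i in range(n_samples):
        t = i / SAMPLE_RATE
        sample = sum(amp * math.exp(-decay * t) * math.sin(2 * math.pi * freq * t)
                     for freq, amp in resonances)
        signal.append(sample)
    return signal


def dft_magnitudes(signal):
    """Naive discrete Fourier transform; returns magnitudes below half the sample rate."""
    n = len(signal)
    magnitudes = []
    for k in range(n // 2):
        total = sum(signal[i] * cmath.exp(-2j * math.pi * k * i / n)
                    for i in range(n))
        magnitudes.append(abs(total))
    return magnitudes


def strongest_frequency(signal):
    """Return the frequency (in Hz) of the largest-magnitude DFT bin."""
    magnitudes = dft_magnitudes(signal)
    k = max(range(len(magnitudes)), key=magnitudes.__getitem__)
    return k * SAMPLE_RATE / len(signal)


# Placeholder resonances: (frequency in Hz, relative amplitude).
vowel_like = synthesize([(700.0, 1.0), (1200.0, 0.5)])
print(strongest_frequency(vowel_like))
```

One plausible way to turn this into a recognizer is to compare the strongest frequencies of an unknown sound against those of each known vowel, though the project may take a different route.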