Shadowing Practice: Introduction to Dimensionality Reduction (a.k.a. Manifold Learning) - Learn English Speaking with YouTube

123 sentences
1
Hi everyone, my name is Bing Brunton and I'm a professor at the University of Washington in Seattle.
2
In this next series of lectures we're going to be talking about one of my favorite topics, nonlinear dimensionality reduction, sometimes also known as manifold learning.
3
So what is this field about?
4
What is actually a manifold?
5
We're going to be breaking it down and going through some of the most popular ways of performing nonlinear dimensionality reduction.
6
We're also going to be giving some examples
7
and talking about different ways in which you can
8
and also cannot use manifold learning as a tool in your work
9
So the basic tenet of manifold learning and nonlinear dimensionality reduction is
10
that even if you have really, really big
11
and complicated data, patterns do in fact exist in the data. We believe this to be true
12
because, you know, after all,
13
if the patterns don't exist, why are we bothering to collect this data in the first place?
14
So we believe, as a starting point,
15
that patterns exist in complicated data. We have to believe this.
16
Now, when we're talking about dimensionality reduction, part of the problem is that we have to be able to visualize the data.
17
Now, I and hopefully most of you are humans who are constrained to walk around in this three-dimensional world.
18
And a lot of my visual intuition is in two dimensions, so things that can be done on a piece of paper
19
or at least on a computer screen. And so,
20
if I have data that is higher than two- or three-dimensional,
21
and that's most data sets (the next video is going to be all about examples
22
and intuition about high-dimensional data sets), then we have a problem,
23
which is that the data is there
24
and the patterns do exist in the data, we believe that to be true, but I can't actually see them.
25
And for me, because I'm a really visual person, I like seeing the data.
26
So a lot of the challenge in manifold learning is figuring out what the patterns actually are
27
so that we can actually see them and gain some intuition for the data.
28
So I found these rare earth magnets in my office and it's one of my favorite office toys to play with.
29
So I brought them as a prop to show you what kind of high dimensional data might look like.
30
So let's say your data is like this little ball of little magnets here.
31
It's roughly smashed into a little ball.
32
And so in order to describe all of the data points on this little ball, you kind of need all three dimensions because we exist in a three-dimensional world.
33
Now on the other hand, let's say your data looked more like something that I smashed up into this little ribbon here.
34
Okay, now if your data looks more like this, okay, where it's a little ribbon, what you can see here is that even though this toy,
35
just like the other toy, exists in a three-dimensional world, it is, in fact, lying on a surface.
36
Okay?
37
Now, the surface could be a flat surface, like you can describe it by a plane.
38
All right, so in linear algebra we call this a subspace because it's planar and it's flat.
39
And that's great because you can use linear dimensionality reduction techniques to describe this plane.
40
So I can rotate it however I want in three-dimensional space.
41
It's still kind of on a plane, and I can describe that plane.
42
This is a simpler way of describing my data.
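
To make the flat-ribbon case concrete, here is a minimal sketch, assuming NumPy and a synthetic point cloud standing in for the magnet toy. The helper names `make_plane_points` and `pca_singular_values` are made up for this illustration and do not come from the lecture. It places points on a randomly tilted plane in 3-D and uses the singular value decomposition, one standard linear dimensionality reduction tool, to check that only two directions carry any variation.

```python
import numpy as np

def make_plane_points(n_points=200, seed=0):
    """Sample points on a randomly oriented 2-D plane through the origin in 3-D."""
    rng = np.random.default_rng(seed)
    coords_2d = rng.normal(size=(n_points, 2))        # true 2-D coordinates
    # Orthonormal basis for a random plane: QR of a random 3x2 matrix.
    basis, _ = np.linalg.qr(rng.normal(size=(3, 2)))  # shape (3, 2)
    return coords_2d @ basis.T                        # shape (n_points, 3)

def pca_singular_values(points):
    """Singular values of the mean-centered data, largest first."""
    centered = points - points.mean(axis=0)
    return np.linalg.svd(centered, compute_uv=False)

points = make_plane_points()
singular_values = pca_singular_values(points)
print(singular_values)
```

The third singular value comes out at the level of floating-point round-off, which is the numerical way of saying that the data sits on a plane no matter how that plane is rotated in 3-D.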
43
The problem arises if the plane becomes warped in some way.
44
So let's say I can make a little bracelet out of it, and now it's a little ring, okay?
45
Or maybe it's not connected, and it's just kind of curvy like this, okay?
46
This does not fundamentally change the fact
47
that all of the data points on my little toy are still on a flat-ish surface.
48
I can flatten it, it's locally flat, just like the surface of the Earth is locally flat, this Earth that we all walk on.
49
But if you zoom out and look at it, you can see that you actually do need three dimensions to describe the data set,
50
but locally, it's lying on this flat-ish curved surface.
51
That's kind of roughly speaking, if I'm waving my hands around, what a manifold is.
52
It's a description of something that is approximately flat, if you look closely enough, but globally, it might be curved.
53
And so if I can learn what this curved surface is, then I'm able to describe my data much more simply
54
by describing the curve and then figuring out where my data points are on this curve
55
without having to use all three dimensions.
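
One way to see "locally flat, globally curved" in numbers is sketched below, assuming NumPy and a synthetic ring of points standing in for the bracelet. The function name `variance_fractions` and the patch size of 20 points are arbitrary choices for this illustration, not something from the lecture. The sketch compares how the variance splits across directions for the whole ring versus a small patch of neighboring points.

```python
import numpy as np

def variance_fractions(points):
    """Fraction of total variance along each principal direction, largest first."""
    centered = points - points.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)
    return s**2 / np.sum(s**2)

# A ring of 400 evenly spaced points on the unit circle, embedded in 3-D.
angles = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
ring = np.column_stack([np.cos(angles), np.sin(angles), np.zeros_like(angles)])

whole_ring = variance_fractions(ring)        # global view
small_patch = variance_fractions(ring[:20])  # 20 neighboring points, a short arc

print("whole ring :", whole_ring)
print("small patch:", small_patch)
```

For the whole ring the variance is shared equally between two directions, so globally one straight line cannot describe it. For the short arc almost all of the variance lies along a single direction, which is the sense in which the ring looks flat if you look closely enough.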
56
Now, the idea here is that we need to be able to reduce and visualize the data.
57
So here's like a physical prop of visualizing my data set.
58
Most data sets are not something that you can play with as a desk toy.
59
And so the goal of Manifold Learning is to reduce the data.
60
We need to reduce and visualize the data.
61
We want to reduce it because we suspect that patterns do in fact exist, so we can describe it more simply.
62
And we want to visualize it because humans are really intuitive visual creatures.
63
And so when we can see something, we believe in it and we can actually see patterns in it that wouldn't have been obvious otherwise.
64
And the reason we want to do this is because we want to gain intuition.
65
And we want to communicate to ourselves
66
and to each other about what we've actually got. One
67
of the most compact ways of communicating your data is being able to make a really compelling visualization.
68
Now, the trick here with dimensionality reduction and manifold learning is how do we do this, right?
69
How do we actually pick out patterns that exist in the data set and reduce and visualize them?
70
So it turns out that, hopefully I can demonstrate again with this little toy here, it has to do with this notion of what's actually close, like what's similar to each other.
71
Like, am I similar to my cousin more than some random person on the street?
72
Probably.
73
But, like, how do you define that?
74
So let's say that we have this curved surface here, okay, my little toy.
75
And you can see that it's lying on this flattish surface, this curved surface.
76
And so two neighboring points on the surface are close to each other because they're actually touching each other.
77
They're close to each other, right?
78
So we kind of want to say that if we're going to reduce the dimensionality of my data set here, my little ring, my little bracelet I've made,
79
I want the points that are closer in the original data set to also end up closer in my reduced learned manifold,
80
in my reduced dimensionality space.
81
And points that are farther apart should also end up farther apart in my reduced space, so that I haven't lost information.
82
So things that used to be similar should end up closer together in my reduced space.
83
And things that are farther apart, less similar, should also end up less similar and farther apart in my reduced space.
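
The "close stays close, far stays far" requirement can be checked numerically. The sketch below assumes NumPy, synthetic data, and plain Euclidean distance, which is only one of several possible notions of distance. It also uses a linear projection onto principal directions as a stand-in for whatever reduction method one actually chooses. The function names and the distortion measure are illustrative choices, not the lecture's definitions.

```python
import numpy as np

def pairwise_distances(points):
    """Euclidean distance between every pair of rows."""
    diffs = points[:, None, :] - points[None, :, :]
    return np.sqrt(np.sum(diffs**2, axis=-1))

def project_onto_top_components(points, n_components):
    """Coordinates of the centered data along its leading principal directions."""
    centered = points - points.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def distance_distortion(original, reduced):
    """Relative mismatch between pairwise distances before and after reduction."""
    d_orig = pairwise_distances(original)
    d_red = pairwise_distances(reduced)
    return np.linalg.norm(d_orig - d_red) / np.linalg.norm(d_orig)

# Hypothetical test data: a flat sheet of points tilted inside 3-D space.
rng = np.random.default_rng(0)
basis, _ = np.linalg.qr(rng.normal(size=(3, 2)))
sheet = rng.normal(size=(100, 2)) @ basis.T

distortion_2d = distance_distortion(sheet, project_onto_top_components(sheet, 2))
distortion_1d = distance_distortion(sheet, project_onto_top_components(sheet, 1))
print(distortion_2d, distortion_1d)
```

Reducing the tilted sheet to two dimensions leaves every pairwise distance essentially unchanged, while squeezing it down to one dimension pulls far-apart points together, which is the kind of information loss the lecture warns about.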
84
The problem then becomes, how do I actually define that?
85
So how do you actually compute distances in high dimensional spaces?
86
And what is the most compact way of doing it?
87
What's the most convenient way of doing it?
88
These are decisions we have to make.
89
So part of what we're going to be learning in the
90
next couple of lectures are common ways of defining distances and similarity.
91
So the punchline here is that there's no one right way of doing it.
92
This is a decision that one makes, and I'm gonna tell you about some of the most common ways of doing it
93
that seem to work well for different types of datasets, and how you make these kinds of decisions.
94
And then also, what do we mean by more similar?
95
Like, are all similarities equally important?
96
So for example here, I can compute kind of like, grid size distances between any two of these points on my dataset here, okay?
97
But you can kind of see that the data points over here are close in physical 3D space, in this studio space, to the points over here, right?
98
Because they're actually really close to each other.
99
Is that the same?
100
Does that matter as much as the fact that you actually have to go?
101
They're not actually connected.
102
They're not actually touching each other as little magnets.
103
Does that matter?
104
Because if you had to count connected magnets, you'd have to go all the way up here to get to the other side.
105
And is that notion of distance more important than the fact that as the fly flies, you can get right over there?
106
These are all valid notions of similarity and distance, but are they all equally important in the context of manifold learning?
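
The two competing notions of distance in the bracelet example can be put side by side in a short sketch. It assumes NumPy and models the open bracelet as 200 points along 350 degrees of a unit circle, so that the two ends nearly meet. These numbers are invented for the illustration and are not taken from the lecture.

```python
import numpy as np

# An open "bracelet": points along 350 degrees of a unit circle in 3-D.
angles = np.linspace(0.0, np.deg2rad(350.0), 200)
chain = np.column_stack([np.cos(angles), np.sin(angles), np.zeros_like(angles)])

# Straight-line distance through space between the two ends.
straight_line = np.linalg.norm(chain[-1] - chain[0])

# Distance along the chain: add up the lengths of consecutive links.
link_lengths = np.linalg.norm(np.diff(chain, axis=0), axis=1)
along_chain = link_lengths.sum()

print("straight line:", straight_line)
print("along chain  :", along_chain)
```

The two ends are about 0.17 apart in a straight line but about 6.1 apart if you have to walk from magnet to magnet. Both numbers are valid distances, and which one a method respects is exactly the kind of decision discussed here.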
107
That's something that we're going to be talking about.
108
And then this idea of points that start out closer together should end up close together.
109
Well, what does ending up close together mean?
110
How do we interpret the fact that we end up with some kind of visualization of a beautiful manifold?
111
And can we actually interpret two points
112
that are closer together in the manifold space as being actually more similar in an interpretable, meaningful, engineering-relevant way?
113
This is something that we'll discuss as well
114
because it varies depending on the algorithm you use and also on your notion of distance.
115
So we're going to dig right into it in the next lecture.
116
But I'm going to leave you with the idea that manifolds are everywhere.
117
We're going to do some manifold explaining right now.
118
But there's no one right way of doing manifold learning.
119
We're going to start by looking at some linear methods first
120
and trying to draw connections between our notions of similarity
121
and distance with some linear algebra and some linear dimensionality reductions
122
that you've already heard about earlier in the series
123
and then we're going to generalize these concepts to talk about non-linear dimensionality reduction with some examples and to build some intuitions.

About This Lesson

In this lesson, you will practice your English speaking skills by exploring a complex yet fascinating topic: nonlinear dimensionality reduction, also known as manifold learning. As you immerse yourself in this subject, you will enhance your vocabulary and comprehension while developing your shadowing technique. Through this process, you will gain insights into high-dimensional data and the visualization methods used to interpret it. The lesson will focus on how to articulate and discuss these intricate concepts, providing you with a solid foundation for advanced English speaking, particularly in areas related to data science and analytics. This practice is particularly beneficial for those preparing for IELTS speaking practices, as it offers you the chance to familiarize yourself with specialized terminology and expressive speech.

Key Vocabulary & Phrases

  • Dimensionality reduction - A process of reducing the number of variables under consideration, by obtaining a set of principal variables.
  • Manifold learning - A type of nonlinear dimensionality reduction where data is assumed to reside on a manifold.
  • High-dimensional data - Data with a large number of features or variables that can complicate analysis and visualization.
  • Visual intuition - The ability to understand and interpret data visually, which aids in making sense of complex data patterns.
  • Locally flat surface - A surface that appears flat when viewed from a small scale, even though it may have curvature at a larger scale.
  • Data patterns - Recognizable arrangements within data that can provide insight or understanding.
  • Subspace - A lower-dimensional space within a higher-dimensional space that can simplify data representation.
  • Warped surface - A surface that has been distorted, complicating traditional data interpretation methods.

Practice Tips

To make the most of this lesson, consider using the shadowspeaks technique during your practice. Shadowing involves listening to the video and mimicking the professor's speech in real-time, which can greatly enhance your speaking fluency. Start by playing the video at a slower speed if necessary to ensure you can follow along. Pay attention to the tonal variations and pacing used by the speaker, as this will help you articulate complex vocabulary more naturally. Try to repeat phrases like "nonlinear dimensionality reduction" and "locally flat surface" until you feel comfortable saying them with confidence. To further improve your IELTS speaking practice, incorporate pauses to reflect and paraphrase concepts, allowing for deeper comprehension and stronger speaking skills. Engaging with these ideas will not only improve your English proficiency but will also equip you with the vocabulary needed for technical discussions.

What is the Shadowing Technique?

Shadowing is a science-backed language learning technique originally developed for professional interpreter training and popularized by polyglot Dr. Alexander Arguelles. The method is simple but powerful: you listen to native English audio and immediately repeat it out loud — like a shadow following the speaker with just a 1–2 second delay. Unlike passive listening or grammar drills, shadowing forces your brain and mouth muscles to simultaneously process and reproduce real speech patterns. Research shows it significantly improves pronunciation accuracy, intonation, rhythm, connected speech, listening comprehension, and speaking fluency — making it one of the most effective methods for IELTS Speaking preparation and real-world English communication.
