Shadowing Practice: printf is Actually a Secret Virtual Machine - And a Giant Security Hole! - Learn Spoken English with YouTube

C2
321 sentences
1
Hey, I'm Dave.
2
Welcome to my shop.
3
Today I want to talk about one of the strangest little machines hiding inside almost every C runtime that's ever shipped.
4
It's a function so familiar that most programmers stop even seeing it.
5
They use it the way you would use a light switch in your garage.
6
Flip it, it does its job, and you move on.
7
But printf is not really just a function that prints things.
8
It's a formatting engine.
9
It's a tiny interpreter.
10
It reads a miniature language,
11
walks through a stream of dynamic arguments that it cannot actually see in any type-safe way,
12
converts binary values into human-readable text,
13
aligns them, pads them, truncates them,
14
switches bases, rounds off floating point numbers,
15
and then pours the result out into your console,
16
a file, a string buffer,
17
or whatever other destination the runtime has wired up behind the curtain.
18
And the funny thing is,
19
just like that made up old cliche about the brain,
20
most programmers use about 10% of it.
21
So today we're going to start with the usual %d, %s,
22
and %x, because that's the part that everybody thinks they already know.
23
But we're going to get there by going backwards first,
24
down into a raw x86 assembly version of a fast sprintf replacement written by my first manager at Microsoft, Ben Slivka.
25
Ben was the development manager for MS-DOS when I was there,
26
and his formatter is wonderfully revealing because there's nowhere for the magic to hide.
27
There's no template engine, no managed runtime,
28
no allocator fairy, and no dependency graph the size of a European Parliament.
29
It's just registers, pointers, a state machine, and a format string being chewed one byte at a time.
30
And once you've seen printf at that level,
31
the modern version stops looking like a boring library function and starts looking like what it really is.
32
A tiny little text-rendering virtual machine with a surprisingly sharp toolkit hidden inside of it.
33
Ben's file describes itself as a fast replacement for C runtime sprintf,
34
written in 1987 and updated in 1988,
35
and it explicitly supports a compact subset of the C formatting language.
36
Flags, width, optional h or l size modifiers,
37
decimal, unsigned decimal, string, lowercase and uppercase hex,
38
plus %% for printing a literal percent sign.
39
That's not the whole modern printf language,
40
but it's enough to see the skeleton,
41
and the skeleton is the important part for now.
42
The formatter begins life with two pointers,
43
one pointing at the format string and one pointing at the destination buffer.
44
In assembly terms, that's essentially the DS:SI pair walking through the source and the ES:DI pair writing to the output.
45
Every iteration loads one byte from the format string.
46
If it's an ordinary character, it just stores it.
47
If it's a percent sign, the machine changes personality.
48
It stops being a copier at that point and becomes a parser.
49
Ben's code literally jumps through the states named start,
50
flag, width, size, and type,
51
which is exactly the grammar hiding inside of every printf call that you've ever written.
52
And that's the first mental shift.
53
A format string is not just a string, it's a program.
54
A very small, very weird program, but a program nonetheless.
55
Take something like this.
56
Now, to you and me,
57
that reads like a sentence with a few holes in it,
58
but to printf it's a stream of instructions.
59
Copy the letters E R R O R and then a space,
60
and then it sees percent.
61
Resets its little internal defaults,
62
sees zero, decides now that the zero fill character is to be used instead of spaces,
63
sees eight, so it builds a field width of 8,
64
sees X, fetches an integer argument,
65
converts it to uppercase hexadecimal,
66
pads it to 8 characters,
67
and then copies it out.
68
Then it goes back to plain copying until the next percent sign.
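The on-screen example is not part of this transcript, but the narration implies a format string along the lines of "ERROR %08X". Here is a minimal sketch under that assumption; `format_error_code` is a hypothetical helper name, not something from the video.

```c
#include <stdio.h>

/* Hypothetical helper. The format string is inferred from the narration:
   "ERROR " is copied literally, then %08X means zero fill, width 8,
   uppercase hexadecimal. */
int format_error_code(char *out, size_t out_size, unsigned int code)
{
    return snprintf(out, out_size, "ERROR %08X", code);
}
```

Calling `format_error_code(buf, sizeof buf, 0xBEEF)` should leave `ERROR 0000BEEF` in `buf`.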
69
Ben's assembly makes that very literal.
70
When it encounters a percent sign,
71
it clears the width, sets the default fill character to be blank,
72
assumes right justification, assumes it is not dealing with a long value,
73
and then transitions into the flag parsing state.
74
That is the runtime winding up the little clockwork mouse and sending it through the format specifier maze.
75
Width parsing is particularly fun because it shows how simple the machinery can be.
76
When Ben's formatter sees a digit in the width state,
77
it converts the ASCII digit to binary,
78
multiplies the current width by 10,
79
adds the digit, and stays in the width state.
80
That's it.
81
The string %123d is not special.
82
If the first width digit is 0,
83
the fill character becomes 0,
84
which is why %08 behaves the way it always has.
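To make the state machine concrete, here is a toy C formatter in the same spirit. It is not Ben Slivka's code and covers far less than his subset: only %%, a leading 0 flag, a decimal width, %u, and %s. The names `tiny_format` and `put` and the state names are assumptions made for illustration.

```c
#include <stdarg.h>
#include <stddef.h>

enum state { COPY, FLAG, WIDTH, TYPE };

/* Append one character if room remains (one byte is reserved for '\0'). */
static void put(char *out, size_t cap, size_t *len, char c)
{
    if (*len + 1 < cap)
        out[(*len)++] = c;
}

/* Toy formatter: returns the number of characters stored in out. */
size_t tiny_format(char *out, size_t cap, const char *fmt, ...)
{
    va_list ap;
    size_t len = 0;
    enum state st = COPY;
    unsigned width = 0;
    char fill = ' ';

    va_start(ap, fmt);
    while (*fmt) {
        char c = *fmt;
        switch (st) {
        case COPY:
            if (c == '%') {              /* reset defaults, become a parser */
                width = 0;
                fill = ' ';
                st = FLAG;
            } else {
                put(out, cap, &len, c);  /* ordinary character: just copy */
            }
            fmt++;
            break;
        case FLAG:
            if (c == '0') {              /* a leading zero selects zero fill */
                fill = '0';
                fmt++;
            }
            st = WIDTH;
            break;
        case WIDTH:
            if (c >= '0' && c <= '9') {  /* width = width * 10 + digit */
                width = width * 10 + (unsigned)(c - '0');
                fmt++;
            } else {
                st = TYPE;
            }
            break;
        case TYPE: {
            char tmp[16];
            const char *s = tmp;
            size_t n = 0;
            if (c == 'u') {              /* build decimal digits backwards */
                unsigned v = va_arg(ap, unsigned);
                char *p = tmp + sizeof tmp;
                do { *--p = (char)('0' + v % 10); v /= 10; n++; } while (v);
                s = p;
            } else if (c == 's') {       /* scan for the terminator */
                s = va_arg(ap, const char *);
                while (s[n]) n++;
            } else {                     /* %% and unknown types: emit as-is */
                tmp[0] = c;
                n = 1;
            }
            for (size_t i = n; i < width; i++) put(out, cap, &len, fill);
            for (size_t i = 0; i < n; i++) put(out, cap, &len, s[i]);
            fmt++;
            st = COPY;
            break;
        }
        }
    }
    va_end(ap);
    if (cap) out[len] = '\0';
    return len;
}
```

With this sketch, `tiny_format(buf, sizeof buf, "id=%05u %s 100%%", 42u, "ok")` produces `id=00042 ok 100%`.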
85
And that also explains the first,
86
most basic trick that programmers really learn without thinking about it.
87
%8d means print a decimal integer in a field at least eight characters wide,
88
padded on the left with spaces. %08d means the same thing,
89
but padded with zeros. %-8d means left-justified instead, padding on the right.
90
That one little grammar gives you primitive table layout.
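The three width variants just described can be put side by side in one call. This is a small sketch; `format_row` is a hypothetical helper, and the brackets are only there to make the padding visible.

```c
#include <stdio.h>

/* Hypothetical helper: the same value in three 8-character fields. */
int format_row(char *out, size_t out_size, int value)
{
    /* %8d: space padded, %08d: zero padded, %-8d: left-justified */
    return snprintf(out, out_size, "[%8d][%08d][%-8d]", value, value, value);
}
```

For the value 42 this yields `[      42][00000042][42      ]`.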
91
And yes, in the age of HTML,
92
JSON, CSS, and terminal UI libraries,
93
this can look pretty quaint,
94
but if you're writing logs,
95
diagnostics, status dumps, firmware tools,
96
kernel utilities, old-school command line apps,
97
or anything that needs to be read by a tired human at 2:17 in the morning,
98
aligned columns still matter.
99
A good log file is not just data,
100
it's an emergency instrument panel and printf is one of the oldest examples that we have.
101
The basic numeric conversions are the ones that everybody knows,
102
at least roughly. %d and %i print signed decimal, %u prints unsigned decimal, %x prints hexadecimal with lowercase letters and %X uses uppercase. %o prints octal,
103
which is mostly useful if you live in Unix permission land or PDP-11 land,
104
or enjoy explaining to junior developers why 010 is not always
105
10. %c prints a character, %s prints a string. %p prints a pointer in an implementation-defined form,
106
which is standards committee language,
107
for you'll get something pointer-looking,
108
but don't build a religion around expecting the exact spelling.
109
The size modifiers are where the harmless-looking stuff starts to matter.
110
In Ben's version, h means short and l means long,
111
and his code tracks that with a flag named format long.
112
When it reaches the type state, %ld and %lu are handled differently from ordinary %d and %u because on that old 16-bit target,
113
a long is not just naturally sitting in one register waiting to be printed.
114
The formatter has to fetch two words and treat them as one 32-bit value.
115
This is one of the reasons that printf has always been both powerful and dangerous.
116
The format string is the only thing telling the runtime what to fetch next.
117
If you say %d but pass a double,
118
or say %s and then pass an integer,
119
there's no object metadata coming to rescue you in classic C.
120
printf will simply reach into the argument list,
121
pull out the number of bytes it believes should be there,
122
and interpret them according to the little formatting program that you handed it.
123
It's a bit like giving a blindfolded machinist a box of random parts
124
and instructions and saying the third item is a carburetor.
125
If the third item is actually a mousetrap,
126
the machinist is still going to install it.
127
That's why compilers now try hard to warn you about mismatched format strings and why you should let them.
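One way to let the compiler help, sketched here on the assumption that you are building with GCC or Clang, is to mark your own printf-style wrappers with the `format` function attribute so their arguments get checked against the format string as well. `PRINTF_LIKE` and `format_checked` are hypothetical names.

```c
#include <stdarg.h>
#include <stdio.h>

/* The format attribute is a GCC/Clang extension, so it is compiled out
   elsewhere. Arguments: index of the format parameter, index of the
   first variadic argument (both 1-based). */
#if defined(__GNUC__) || defined(__clang__)
#define PRINTF_LIKE(fmt_index, first_vararg) \
    __attribute__((format(printf, fmt_index, first_vararg)))
#else
#define PRINTF_LIKE(fmt_index, first_vararg)
#endif

/* Hypothetical wrapper: with warnings enabled, a call such as
   format_checked(buf, n, "%s", 42) is flagged at compile time. */
PRINTF_LIKE(3, 4)
int format_checked(char *out, size_t out_size, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int written = vsnprintf(out, out_size, fmt, ap);
    va_end(ap);
    return written;
}
```

Building with `-Wall` (which includes `-Wformat` on these compilers) is what turns the mismatch into a visible warning.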
128
The machine is powerful, but it trusts you in that terrifying old C way,
129
like handing my teenager the keys to my Corvette and saying I'm sure you'll make good decisions.
130
But there's no stability control to save you when you make a mistake.
131
Now let's talk about strings because %s is simple until it isn't.
132
In the ordinary case of a format like "hello %s",
133
printf expects a pointer to a null-terminated character array.
134
It starts at that address and copies characters until it finds a zero byte.
135
Ben's version does this with a strlen macro that uses repne scasb,
136
one of those lovely old x86 string instructions
137
that feels like it came from a CPU designed by somebody who wanted assembly language programmers to have a fighting chance.
138
It scans for the null terminator,
139
computes the length, and then uses that length to decide how much fill is needed after or before the string.
140
That gives you the basic string alignment tools.
141
It also gives you right-aligned and left-aligned text fields.
142
If you've ever seen a neatly aligned command table,
143
maybe top or some kind of task manager,
144
there's a decent chance that some descendant of the idea was involved.
145
But modern printf goes further.
146
Precision on a string means maximum string length.
147
That surprises people because with floating point,
148
precision means digits after the decimal or significant digits depending on the conversion.
149
The real trick is the dynamic version where you can pass the length.
150
It's one of the best printf tricks in all of C.
151
It lets you print a substring or a buffer or anything else that is not already null-terminated for you.
152
Suppose you've just parsed a packet or a file format or a slice of memory and you have a pointer plus a length.
153
The beginner copies it out to a temporary buffer,
154
appends a null byte, prints it,
155
and then frees the buffer.
156
The experienced coder uses %.*s and moves on with his life.
157
That's the kind of idiom
158
that makes you feel like you've just found a hidden door in a house that you've lived in for 20 years.
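Here is a minimal sketch of the pointer-plus-length idiom. The `slice` struct and `format_slice` are hypothetical names. Note that the precision argument for `*` must be an `int`, and that %.*s still stops early if it meets a null byte before the length runs out.

```c
#include <stdio.h>

/* Hypothetical "pointer plus length" view into a larger buffer. */
struct slice {
    const char *ptr;
    int len;          /* the '*' precision argument must be an int */
};

/* Prints at most s.len characters starting at s.ptr. No temporary
   buffer and no terminator are needed. */
int format_slice(char *out, size_t out_size, struct slice s)
{
    return snprintf(out, out_size, "field=[%.*s]", s.len, s.ptr);
}
```

Given the request line `GET /index.html HTTP/1.1`, a slice starting at offset 4 with length 11 formats as `field=[/index.html]`.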
159
Now before we get to the more exotic stuff,
160
let's finish the basic numeric tour with floating point. %f prints fixed point decimal, %e prints scientific notation. %g chooses between fixed
161
and scientific depending on the value and the requested precision.
162
The uppercase forms of %E and %G use uppercase exponent markers.
163
If you care about the number of digits, precision is your friend.
164
The important thing to remember is that floating point formatting is doing real work.
165
It's not simply moving bytes around.
166
It's converting a binary floating point approximation into a decimal representation that satisfies rounding rules and precision requirements.
167
Modern implementations have extremely clever algorithms for this because the problem is a lot harder than it looks.
168
Converting a double to text correctly is one of those jobs that seems pretty boring until you try to do it yourself accurately,
169
at which point it becomes a swamp full of math crocodiles.
170
And this brings us to one of the coolest floating point formats that most people never use, %a.
171
This prints the floating point number in hexadecimal with a binary exponent.
172
Now it's going to look weird, something like this.
173
Certainly not how you would print an invoice,
174
but it's how you print the truth.
175
Decimal output is often trying to be friendly.
176
Hex floating point output is showing what the actual binary structure of the value is.
177
If you're debugging floating point weirdness, %a is your x-ray machine.
178
The number 0.1 looks pretty innocent in decimal,
179
like a little cherub sitting on a cloud,
180
but in binary floating point it is a repeating fraction wearing a fake mustache and %a catches it at the border.
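The on-screen output is not in the transcript, so here is a small sketch of %a instead. Both helper names are hypothetical. It also parses the text back with `strtod`, which accepts hexadecimal floating point input in C99 and later.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: writes the exact binary value as hex floating point. */
int format_hex_double(char *out, size_t out_size, double value)
{
    return snprintf(out, out_size, "%a", value);
}

/* Hypothetical helper: reads the hex form back into a double. */
double parse_hex_double(const char *text)
{
    return strtod(text, NULL);
}
```

For 0.1 many runtimes print something like `0x1.999999999999ap-4`, though the exact spelling of the leading digit and exponent can differ between implementations. Whatever the spelling, the hex form should parse back to exactly the same double, which is not guaranteed for a short decimal rendering.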
181
Now let's bring in the finale tricks,
182
because this is where printf stops being merely useful and starts being delightfully sneaky.
183
The first is dynamic width and precision with star.
184
Ben's formatter already supported star for width,
185
fetching the field width from the next argument instead of from the format string,
186
and modern printf does that too,
187
and it does it for precision as well.
188
It's cool because it allows your formatting to adapt at runtime.
189
Maybe you scan a table first to discover the widest name and then print all the rows using that width.
190
Maybe you let the user specify a precision.
191
Maybe your diagnostic tool aligns values based on the longest label in the current dataset.
192
Suddenly your format string is not a fixed stencil anymore,
193
it's like a layout engine with parameters.
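A sketch of the scan-then-print idea: one pass finds the widest label, and a second pass feeds that width to `*`. The helper names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Pass 1: find the widest label so every row can share one column width. */
int widest_label(const char *const *labels, size_t count)
{
    size_t widest = 0;
    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(labels[i]);
        if (len > widest)
            widest = len;
    }
    return (int)widest;   /* '*' consumes an int argument */
}

/* Pass 2: '*' pulls the field width from the argument list at run time. */
int format_label_row(char *out, size_t out_size, int width,
                     const char *label, int value)
{
    return snprintf(out, out_size, "%-*s|%5d", width, label, value);
}
```

With the labels `cpu`, `memory`, and `io`, the width comes out as 6, and the row for `io` with value 7 formats as `io    |    7`.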
194
The second trick is snprintf as a measuring tool.
195
This is a civilized way to build formatted strings when you don't know how big the result will be.
196
The first call tells you how many characters would have been written,
197
not including the null terminator.
198
Then you allocate that much space, plus one byte for the terminator, and do the real formatting into it.
199
This is infinitely better than guessing that 256 bytes ought to be enough,
200
because as we eventually all learn,
201
the universe loves to punish the phrase, ought to be enough.
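The measure-then-allocate pattern might look like the sketch below. It uses `vsnprintf`, the `va_list` sibling of snprintf, so the pattern can live in a reusable function, and it assumes C99 semantics where a zero-size call returns the length that would have been written. `format_alloc` is a hypothetical name.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns a malloc'd string the caller must free,
   or NULL on failure. */
char *format_alloc(const char *fmt, ...)
{
    va_list ap, ap2;
    va_start(ap, fmt);
    va_copy(ap2, ap);                     /* the list is consumed twice */

    /* First call: measure only. Returns the length without the '\0'. */
    int needed = vsnprintf(NULL, 0, fmt, ap);
    va_end(ap);

    char *buf = NULL;
    if (needed >= 0) {
        buf = malloc((size_t)needed + 1); /* +1 for the terminator */
        if (buf)
            vsnprintf(buf, (size_t)needed + 1, fmt, ap2);
    }
    va_end(ap2);
    return buf;
}
```

A call such as `format_alloc("%s has %d messages", "Dave", 5)` returns a buffer sized to fit the result exactly.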
202
And the third trick is the alternate format flag, or hash.
203
Most people only know it from hex,
204
printf with %#x, which gives you something like 0xbeef in this case,
205
but it also affects floating point output.
206
With %g trailing zeros are normally removed.
207
With %#g they can be preserved.
208
That can matter when you are generating text
209
that will be read by another program, or when 42 and 42.0 are not visually communicating the same idea.
210
The alternate form is printf saying,
211
fine, I'll leave the decimal furniture where you put it.
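A short sketch of the alternate-form flag on both a hex and a %g conversion. `format_alternate` is a hypothetical helper.

```c
#include <stdio.h>

/* Hypothetical helper: each value without and then with the '#' flag. */
int format_alternate(char *out, size_t out_size, unsigned int u, double d)
{
    return snprintf(out, out_size, "%x %#x | %g %#g", u, u, d, d);
}
```

For 0xbeef and 42.0 this gives `beef 0xbeef | 42 42.0000`. With the default precision of six significant digits, %#g keeps the decimal point and the trailing zeros.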
212
Then the fourth trick is positional arguments,
213
which are common in POSIX style printf implementations and are absolutely vital for localization.
214
Instead of consuming arguments strictly from the left to right,
215
the format string can say use argument number two here and argument number one here.
216
In English, you might say Dave has five messages,
217
but in another language, the natural order might be five messages has Dave,
218
except hopefully in better grammar than that.
219
Without positional arguments, translators are trapped by the English argument order,
220
but with them, the sentence can move around without changing the code.
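Here is a sketch of positional arguments using the POSIX `%n$` syntax. This is an extension to ISO C, so it is assumed to work on POSIX-style runtimes and may not be available elsewhere. `format_inbox` and the two sentence templates are made up for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: the argument order in the call never changes,
   only the template does. %1$s is the name and %2$d is the count. */
int format_inbox(char *out, size_t out_size, int reversed,
                 const char *name, int count)
{
    if (reversed)
        return snprintf(out, out_size,
                        "%2$d messages are waiting for %1$s", name, count);
    return snprintf(out, out_size, "%1$s has %2$d messages", name, count);
}
```

In a real localized program the template would usually come from a translation catalog rather than an `if`, but the calling code would look the same.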
221
The fifth trick is correct integer size formatting.
222
Not very flashy, but it's the difference between code
223
that works on your machine and code that survives contact with a different architecture.
224
For size_t, use %zu.
225
For ptrdiff_t, use %td.
226
For maximum-width integers, use %jd.
227
And for fixed-width integers like uint64_t,
228
use the macros from inttypes.h.
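All four of those in one call, as a minimal sketch. `format_sizes` is a hypothetical helper, and `PRIu64` is the inttypes.h macro for printing a `uint64_t`.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: each argument is paired with the length modifier
   that matches its type on every conforming platform. */
int format_sizes(char *out, size_t out_size, size_t sz, ptrdiff_t diff,
                 intmax_t big, uint64_t u64)
{
    /* PRIu64 expands to a string literal and is concatenated in place. */
    return snprintf(out, out_size, "%zu %td %jd %" PRIu64,
                    sz, diff, big, u64);
}
```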
229
It's not glamorous, but neither is torquing the lug nuts on your Camaro.
230
You still do it because eventually the wheel comes off if you don't.
231
Ask me how I know these things.
232
And not just from C.
233
The sixth trick is %n, which is both fascinating and a little radioactive.
234
%n does not print anything.
235
Instead, it stores the number of characters printed so far into an integer pointer that you provide.
236
After this, the count contains 5.
237
That sounds like a party trick until you realize it can be used to mark offsets while building complicated output.
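A sketch of %n used as an offset marker, with a fixed, literal format string. `mark_offset` is a hypothetical helper. Be aware that some hardened runtimes restrict or disable %n, so this is assumed to run on a libc that still supports it.

```c
#include <stdio.h>

/* Hypothetical helper: records how many characters preceded the marker.
   The format string is a literal, never user input. */
int mark_offset(char *out, size_t out_size, int *offset)
{
    return snprintf(out, out_size, "hello%n, world", offset);
}
```

After the call the buffer holds `hello, world` and `*offset` is 5, the number of characters produced before %n was reached.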
238
But %n is also infamous in security history
239
because uncontrolled format strings can use it to write to memory
240
and this is why printf of user input is not just bad style,
241
it's an engraved invitation to chaos.
242
So always write printf with a "%s" and then the user input.
243
Let me state this even more plainly.
244
If you pass user-provided text to the printf formatting engine,
245
it will process it just as surely as if you had hand-coded that yourself.
246
So never do that.
247
That one extra %s is the difference between printing a string
248
and letting the user hand your tiny formatting machine a bag of burglary tools.
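The safe pattern is small enough to show in full. `echo_user_text` is a hypothetical helper, and the unsafe form appears only in a comment.

```c
#include <stdio.h>

/* Hypothetical helper: user text is passed as DATA for %s, so any
   percent signs inside it are copied, not interpreted. */
int echo_user_text(char *out, size_t out_size, const char *user_text)
{
    /* Unsafe, do not write: snprintf(out, out_size, user_text); */
    return snprintf(out, out_size, "%s", user_text);
}
```

Feeding it hostile-looking input such as `100%n %x %s` simply copies those characters through unchanged.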
249
The seventh trick is binary output where supported.
250
C23 adds %b, finally, for binary integer formatting,
251
and some libraries had their own extensions before that.
252
So when it's supported, %b with 42 will give you 101010.
253
And depending on your implementation,
254
you may get the 0b prefix.
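Since %b is not available in every runtime yet, a portable fallback is easy to sketch. `to_binary` is a hypothetical helper that does not use printf at all, so it should behave the same on any C99 compiler.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical fallback for runtimes without %b: writes the binary
   digits of value into out and returns how many were stored. */
size_t to_binary(char *out, size_t cap, unsigned int value)
{
    char tmp[sizeof value * CHAR_BIT];
    size_t n = 0, len = 0;

    /* Collect digits from the least significant bit upward. */
    do {
        tmp[n++] = (char)('0' + (value & 1u));
        value >>= 1;
    } while (value);

    /* Emit them in reverse, leaving room for the terminator. */
    while (n > 0 && len + 1 < cap)
        out[len++] = tmp[--n];
    if (cap)
        out[len] = '\0';
    return len;
}
```

For 42 this stores `101010`, matching what %b is described as producing.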
255
That's such an obvious debugging feature that it's almost funny it took that long to become standard in C.
256
C had decimal, octal, and hex from the dawn of time,
257
but binary, the actual base the computer uses, was apparently too exotic and risque for use in polite company.
258
And then there are the platform-specific treats.
259
On GNU systems, %m prints the string associated with the current error number,
260
without you passing an argument.
261
So instead of writing, you know,
262
printf with "open failed: %s" and strerror(errno), you just write %m.
263
It's not portable C, but it's kind of a good example of how printf has evolved into a diagnostic language,
264
and not just a formatting function.
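A sketch comparing the two spellings. It assumes glibc when `__GLIBC__` is defined and falls back to the portable `strerror` form elsewhere. `format_open_error` and the message text are made up for illustration.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds "open failed: <reason>" for an errno value. */
int format_open_error(char *out, size_t out_size, int err)
{
#if defined(__GLIBC__)
    errno = err;   /* %m reads errno itself and consumes no argument */
    return snprintf(out, out_size, "open failed: %m");
#else
    /* Portable spelling: pass the message text explicitly. */
    return snprintf(out, out_size, "open failed: %s", strerror(err));
#endif
}
```

Either branch should produce the same text, since %m is described as printing the string for the current error number.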
265
Now here's the part that I think is worth taking pretty seriously.
266
printf is old, but it's not primitive.
267
It comes from an era when every byte mattered and every abstraction had to earn its keep.
268
Ben's assembly formatter has dispatch tables for states,
269
lookup tables for hex digits,
270
a unit table for decimal long conversion,
271
scratch buffers on the stack,
272
and careful handling of signs,
273
fill characters, width, and justification.
274
You can almost see the runtime thinking.
275
It's like looking at a pocket watch with the back removed where now every gear has a reason that it exists.
276
And that's why this function is still worth studying.
277
Not because we should go back to writing 16-bit assembly formatters by hand,
278
although I admit there's a certain therapeutic quality to doing that.
279
It's worth studying because printf is a compact lesson in language design,
280
parsing, calling conventions, data representation,
281
integer conversion, floating-point weirdness, security,
282
localization, and API ergonomics all crammed into one little function that
283
most of us learned in chapter 2 and then never looked at again.
284
And so, the humble format string is doing a lot more than it lets on.
285
It's a contract between your program and the runtime.
286
It says, here's how to walk through my memory,
287
here's how to interpret each value,
288
and here's how to transform it,
289
and here's how to lay it out for human consumption.
290
Get that contract right, and you get clean diagnostics,
291
readable logs, compact code, and zero-copy tricks like %.*s.
292
Get it wrong, and you get garbage outputs,
293
stack confusion, security vulnerabilities, and the kind of debugging session where you start bargaining with inanimate objects.
294
So the next time you type printf with %d,
295
remember that you're not just printing an integer.
296
You're invoking a tiny machine that has been refined across decades of C runtimes,
297
operating systems, compilers, and architectures.
298
It may be old, but it's not dull.
299
And if you learn the parts of it that most programmers never touch,
300
you can make your code cleaner,
301
safer, faster, and sometimes just a little more elegant.
302
Elegance is when the code knows exactly what it's doing,
303
uses exactly what it needs,
304
and leaves nothing unnecessary behind.
305
Which, come to think of it,
306
is not a bad description of a good printf implementation either.
307
If you found today's episode interesting or entertaining,
308
remember that I'm mostly in this for the subs and likes,
309
so I'd be honored if you considered leaving me one of each before you go today.
310
And if you're already subscribed, thank you!
311
And if you have a question or a suggestion,
312
please leave it in the comments here,
313
and we answer the best of them every Friday on ShopTalk on the Dave's Attic channel.
314
Check out an episode at the link I'll try to throw up here,
315
and subscribe there if you enjoy it as well.
316
In the meantime, and in between time,
317
hope to see you next time,
318
right here in Dave's Garage.
319
Do it, Glenn!
320
Do it!
321
Do it!


Context and Background

In this video, Dave draws on his development experience to discuss one of the most unusual functions in the C programming language: printf. He explains that the function is not just a tool for printing text but a sophisticated formatting engine that resembles an interpreter. Dave refers to the work of his former manager at Microsoft, who wrote a fast replacement for the standard sprintf function, and shows how printf can be viewed as a virtual machine for rendering text. Understanding how this function works under the hood can open new horizons for programmers and for students learning English through technical topics.

Top 5 Phrases for Everyday Conversation

  • "printf is actually a secret virtual machine"
  • "It does its job, and you move on"
  • "It's a formatting engine"
  • "Converts binary values into human readable text"
  • "A tiny little text-rendering virtual machine"

Step-by-Step Guide to Shadowing

To improve your spoken English and learn to shadow speech, you can follow this step-by-step guide:

  1. Listen carefully: Start by watching the video without trying to repeat anything. Pay attention to Dave's pronunciation and intonation.
  2. Write down useful phrases: Pick out 5-10 key phrases from the video that catch your attention the most. Use them in your everyday practice.
  3. Repeat with pauses: Play the video, stopping it after each of Dave's phrases. Repeat his words aloud, trying to imitate his accent and intonation.
  4. Discuss: Find a partner with whom you can discuss the content of the video and use the phrases you have learned in conversation. This is excellent spoken English practice.
  5. Review and go deeper: Review what you have covered regularly. Build new sentences using the words and phrases you have studied.

This method will effectively support your spoken English practice and help you master shadowing as you engage with content that interests you. Remember that learning English with YouTube is not only useful but also fun!

What Is the Shadowing Technique?

Shadowing is a research-backed language learning technique, originally developed for training professional interpreters and popularized by the polyglot Dr. Alexander Arguelles. The method is simple but effective: you listen to English audio from native speakers and immediately repeat it aloud, like a shadow following the speaker with a delay of 1-2 seconds. Unlike passive listening or grammar exercises, shadowing forces your brain and mouth muscles to process and reproduce real speech patterns at the same time. Research shows that this significantly improves pronunciation accuracy, intonation, rhythm, connected speech, listening comprehension, and fluency, which makes it one of the most effective methods for IELTS Speaking preparation and real-world communication in English.
