Visualizing the chain rule and product rule | Essence of calculus, chapter 4

In the last videos I talked about the derivatives
of simple functions, things like powers of x, sin(x), and exponentials, the goal being
to have a clear picture or intuition to hold in your mind that explains where these formulas
come from. Most functions you use to model the world
involve mixing, combining and tweaking these simple functions in some way; so our
goal now is to understand how to take derivatives of more complicated combinations; where again,
I want you to have a clear picture in mind for each rule. This really boils down to three basic ways
to combine functions together: Adding them, multiplying them, and putting one inside the
other; also known as composing them. Sure, you could say subtracting them, but
that’s really just multiplying the second by -1, then adding. Likewise, dividing functions is really just
the same as plugging one into the function 1/x, then multiplying. Most functions you come across just involve
layering on these three types of combinations, with no bound on how monstrous things can
become. But as long as you know how derivatives play
with those three types of combinations, you can always just take it step by step and peel
through the layers. So, the question is, if you know the derivatives
of two functions, what is the derivative of their sum, of their product, and of the function
compositions between them? The sum rule is the easiest, if somewhat tongue-twisting
to say out loud: The derivative of a sum of two functions is the sum of their derivatives. But it’s worth warming up with this example
by really thinking through what it means to take a derivative of a sum of two functions,
since the derivative patterns for products and function composition won’t be so straightforward,
and will require this kind of deeper thinking. The function f(x) = sin(x) + x^2 is a function
where, for every input, you add together the values of sin(x) and x^2 at that point. For example, at x=0.5, the height of the
sine graph is given by this bar, the height of the x^2 parabola is given by this bar, and
their sum is the length you get by stacking them together. For the derivative, you ask what happens as
you nudge the input slightly, maybe increasing it to 0.5+dx. The difference in the value of f between these
two values is what we call df. Well, pictured like this, I think you’ll
agree that the total change in height is whatever the change to the sine graph is, what we might
call d(sin(x)), plus whatever the change to x^2 is, d(x^2). We know the derivative of sine is cosine,
and what that means is that this little change d(sin(x)) would be about cos(x)dx. It’s proportional to the size of dx, with
a proportionality constant equal to cosine of whatever input we started at. Similarly, because the derivative of x^2 is
2x, the change in the height of the x^2 graph is about 2x*dx. So, df/dx, the ratio of the tiny change to
the sum function to the tiny change in x that caused it, is indeed cos(x)+2x, the sum of
the derivatives of its parts. But like I said, things are a bit different
for products. Let’s think through why, in terms of tiny
nudges. In this case, I don’t think graphs are our
best bet for visualizing things. Pretty commonly in math, all levels of math
really, if you’re dealing with a product of two things, it helps to try to understand
it as some form of area. In this case, you might try to configure some
mental setup of a box whose side-lengths are sin(x) and x^2. What would that mean? Well, since these are functions, you might
think of these sides as adjustable; dependent on the value of x, which you might think of
as a number that you can freely adjust. So, just getting the feel for this, focus
on that top side, which changes as the function sin(x). As you change the value of x up from 0, it
increases up to a length of 1 as sin(x) moves towards its peak. After that, it starts decreasing as sin(x)
comes down from 1. And likewise, that height changes as x^2. So f(x), defined as this product, is the area
of this box. For the derivative, think about how a tiny
change to x by dx influences this area; that resulting change in area is df. That nudge to x causes the width to change
by some small d(sin(x)), and the height to change by some d(x^2). This gives us three little snippets of new
area: A thin rectangle on the bottom, whose area is its width, sin(x), times its thin
height, d(x^2); there’s a thin rectangle on the right, whose area is its height, x^2,
times its thin width, d(sin(x)). And there’s also a little bit in the corner. But we can ignore it, since its area will
ultimately be proportional to (dx)^2, which becomes negligible compared with the two rectangles as dx goes to 0. This is very similar to what I showed last
video, with the x^2 diagram. Just like then, keep in mind that I’m using
somewhat beefy changes to draw things, so we can see them, but in principle think of
dx as very very small, meaning d(x^2) and d(sin(x)) are also very very small. Applying what we know about the derivative
of sine and x^2, that tiny change d(x^2) is 2x*dx, and that tiny change d(sin(x)) is cos(x)dx. Dividing out by that dx, the derivative df/dx
is sin(x) times the derivative of x^2, plus x^2 times the derivative of sine. This line of reasoning works for any two functions. A common mnemonic for the product rule is
to say in your head “left d right, right d left”. In this example, sin(x)*x^2, “left d right”
means you take the left function, in this case sin(x), times the derivative of the right,
x^2, which gives 2x. Then you add “right d left”: the right
function, x^2, times the derivative of the left, cos(x). Out of context, this feels like kind of a
strange rule, but when you think of this adjustable box you can actually see how those terms represent
slivers of area. “Left d right” is the area of this bottom
rectangle, and “right d left” is the area of this rectangle on the right. By the way, I should mention that if you multiply
by a constant, say 2*sin(x), things end up much simpler. The derivative is just that same constant
times the derivative of the function, in this case 2*cos(x). I’ll leave it to you to pause and ponder
to verify that this makes sense. Aside from addition and multiplication, the
other common way to combine functions that comes up all the time is function composition. For example, let’s say we take the function
x^2, and shove it on inside sin(x) to get a new function, sin(x^2). What’s the derivative of this new function? Here I’ll choose yet another way to visualize
things, just to emphasize that in creative math, we have lots of options. I’ll put up three number lines. The top one will hold the value of x, the
second one will represent the value of x^2, and that third line will hold the value of
sin(x^2). That is, the function x^2 gets you from line
1 to line 2, and the function sine gets you from line 2 to line 3. As I shift that value of x, maybe up to the
value 3, then the value on the second line shifts to whatever x^2 is, in this case 9. And that bottom value, being sin(x^2),
will go over to whatever the sin(9) is. So for the derivative, let’s again think
of nudging that x-value by some little dx, and I always think it’s helpful to think
of x starting as some actual number, maybe 1.5. The resulting nudge to this second value,
the change to x^2 caused by such a dx, is what we might call d(x^2). You can expand this as 2x*dx, which for our
specific input would be 2*(1.5)*dx, but it helps to keep it written as d(x^2) for
now. In fact let me go one step further and give
a new name to x^2, maybe h, so this nudge d(x^2) is just dh. Now think of that third value, which is pegged
at sin(h). Its change is d(sin(h)); the tiny change caused
by the nudge dh. By the way, the fact that it’s moving left
while the dh bump is to the right just means that this change d(sin(h)) is some negative
number. Because we know the derivative of sine, we
can expand d(sin(h)) as cos(h)*dh; that’s what it means for the derivative of sine to
be cosine. Unfolding things, replacing h with x^2 again,
that bottom nudge is cos(x^2)d(x^2). And we could unfold further, noting that d(x^2)
is 2x*dx. And it’s always good to remind yourself
of what this all actually means. In this case where we started at x=1.5 up
top, this means that the size of that nudge on the third line is about cos(1.5^2)*2(1.5)*(the
size of dx); proportional to the size of dx, where the derivative here gives us that proportionality
constant. Notice what we have here, we have the derivative
of the outside function, still taking in the unaltered inside function, and we multiply
it by the derivative of the inside function. Again, there’s nothing special about sin(x)
and x^2. If you have two functions g(x) and h(x), the
derivative of their composition function g(h(x)) is the derivative of g, evaluated at h(x),
times the derivative of h. This is what we call the “chain rule”. Notice, for the derivative of g, I’m writing
it as dg/dh instead of dg/dx. On the symbolic level, this serves as a reminder
that you still plug in the inner function to this derivative. But it’s also an important reflection of
what this derivative of the outer function actually represents. Remember, in our three-lines setup, when we
took the derivative of sine on the bottom, we expanded the size of the nudge d(sin) as
cos(h)*dh. This was because we didn’t immediately know
how the size of that bottom nudge depended on x, that’s kind of the whole thing we’re
trying to figure out, but we could take the derivative with respect to the intermediate
variable h. That is, figure out how to express the size
of that nudge as a multiple of dh. Then it unfolded by figuring out what dh was. So in this chain rule expression, we’re
saying look at the ratio between a tiny change in g, the final output, and a tiny change
in h that caused it, h being the value that we’re plugging into g. Then multiply that by the tiny change in h
divided by the tiny change in x that caused it. The dh’s cancel to give the ratio between
a tiny change in the final output, and the tiny change to the input that, through a certain
chain of events, brought it about. That cancellation of dh is more than just
a notational trick, it’s a genuine reflection of the tiny nudges that underpin calculus. So those are the three basic tools in your
belt to handle derivatives of functions that combine many smaller things: The sum rule,
the product rule and the chain rule. I should say, there’s a big difference between
knowing what the chain rule and product rules are, and being fluent with applying them in
even the most hairy of situations. I said this at the start of the series, but
it’s worth repeating: Watching videos, any videos, about these mechanics of calculus
will never substitute for practicing them yourself, and building the muscles to do these
computations yourself. I wish I could offer to do that for you, but
I’m afraid the ball is in your court, my friend, to seek out practice. What I can offer, and what I hope I have offered,
is to show you where these rules come from, to show that they’re not just something
to be memorized and hammered away; but instead are natural patterns that you too could have
discovered by just patiently thinking through what a derivative means. Thank you to everyone who supported this series,
and once more I’d like to say a special thanks to For those of you who want to go flex those
problem solving muscles, Brilliant offers a platform aimed at training you to think
like a mathematician. I don’t know about you, but I’ve always
found it all too easy to fall into the habit of just reading math or watching lectures
without taking the time to do some real problem-solving in between, even though that’s always the
part where I learn the most. Brilliant is a great place to get that practice,
and if you visit, or more simply follow the link on the screen and in
the description, it lets them know you came from this channel. Their calculus material is a nice complement
to this series, but some of my other favorites are their probability and complex algebra

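As a rough numerical check of the sum and product rules described in the transcript above, the sketch below nudges x = 0.5 by a small but finite dx and compares the ratio df/dx with cos(x) + 2x and with sin(x)*2x + x^2*cos(x). The helper names (nudge_ratio, corner_area) and the step sizes are illustrative assumptions, not anything taken from the video.

```python
import math


def f_sum(x):
    # The sum example from the transcript: sin(x) + x^2
    return math.sin(x) + x**2


def f_prod(x):
    # The product example: area of a box with sides sin(x) and x^2
    return math.sin(x) * x**2


def nudge_ratio(f, x, dx):
    # Hypothetical helper: ratio df/dx for a small but finite nudge dx
    return (f(x + dx) - f(x)) / dx


def corner_area(x, dx):
    # Area of the small corner piece of the box: d(sin(x)) * d(x^2)
    d_sin = math.sin(x + dx) - math.sin(x)
    d_sq = (x + dx) ** 2 - x**2
    return d_sin * d_sq


x = 0.5
dx = 1e-6  # assumed step size; small, but not infinitesimal

# What the two rules predict at x = 0.5
sum_rule = math.cos(x) + 2 * x
product_rule = math.sin(x) * 2 * x + x**2 * math.cos(x)

print("sum:     nudge ratio", nudge_ratio(f_sum, x, dx), "rule", sum_rule)
print("product: nudge ratio", nudge_ratio(f_prod, x, dx), "rule", product_rule)

# The corner piece scales like (dx)^2, so its share of df/dx shrinks with dx
for step in (1e-2, 1e-3, 1e-4):
    print("dx =", step, "corner area / dx =", corner_area(x, step) / step)
```

Shrinking dx by a factor of 10 shrinks the corner piece by roughly a factor of 100, which is why it drops out of df/dx while the two thin rectangles, each proportional to dx, do not.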

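The three-number-line picture for sin(x^2) from the transcript can also be sketched numerically. Starting at x = 1.5, the code below pushes a small nudge dx through x^2 and then through sine, and multiplies the two intermediate ratios. The variable names and the size of dx are assumptions made for illustration.

```python
import math

x = 1.5
dx = 1e-6  # assumed nudge size; small, but not infinitesimal

# Line 1 -> line 2: h = x^2, and the nudge dh caused by dx
h = x**2
dh = (x + dx) ** 2 - h

# Line 2 -> line 3: sin(h), and the nudge d(sin(h)) caused by dh
d_sin = math.sin(h + dh) - math.sin(h)

dg_dh = d_sin / dh  # should be close to cos(h)
dh_dx = dh / dx     # should be close to 2x

# Chain rule: the dh's cancel, leaving d(sin(h)) / dx
chain = dg_dh * dh_dx
exact = math.cos(x**2) * 2 * x

print("d(sin(h)) =", d_sin)
print("dg/dh =", dg_dh, "vs cos(h) =", math.cos(h))
print("dh/dx =", dh_dx, "vs 2x =", 2 * x)
print("product =", chain, "vs cos(x^2)*2x =", exact)
```

At this starting point d(sin(h)) comes out negative, matching the remark in the transcript that the bottom value moves left while the dh bump is to the right.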
  1. Next up will be derivatives of exponential functions. See the full playlist at

  2. Hey, 3B1B, I'm a first year engineering student discovering calculus. I love the visualisations you do. Yet, as you said, one must practice to become fluent in the calculations. Do you have any suggestions for good websites with questions and answers to practice?

  3. After all these legendary explanation no one dared to solve the differential of what occurred at 14:55 No this can't happen here is the mighty answer:-

    4(e^sinx. Cos(1/x^3+x^3))
    + 3(1/x^4-x^2)sin(1/x^3+x^3).e^sinx)

  4. I asked my teacher for the proof of product rule, and he was like, "No, No . It's NOT in the SYLLABUS"
    Pretty much he only taught me how to solve questions not how to understand mathematics.
    I love you 3Blue1Brown

  5. You might actually be the best math teacher that ever lived! (if you consider the numbers you can reach with the internet). Thank you for what you do, brought passion back into math for me after getting crushed in a 5 year physics degree >< now I can't stop drawing connections between the high level understandings you elucidate!

  6. Smh, I first read the quote as “peeing on an onion” and I thought, “I mean I guess it is mildly unpleasant and somewhat confusing, and a bit of a non sequitur, weird metaphor though”

  7. Around 7:00 about the product rule… why do you literally ignore that smallest part of the area increase by saying ”because dx is really small”, but not for the other parts

  8. If someone made a structural engineering series as good as this we would have a load of people interested in going into engineering. The great thing about this series is that all this mathematics from 3B1B is extremely relevant to engineering and other real-world fields. The problem for the most part is that people are interested in stupid things like quantum mechanics and relativity because it's so weird but they will never actually apply that knowledge to anything.

  9. So I'm trying to convince myself that the little red rectangle can be discarded, but I don't see why. Sure, it's infinitesimal – but so are the other things there…

    How come the result is not dh(x)/dx * g(x) + dg(x)/dx * h(x) + dh(x)/dx * dg(x)/dx ?

  10. Congratulations to the Channel! These videos are awesome! I've recommended to my chemistry students too.

  11. Fortunately I got to know about you through "DOS",and now I'm marvelled with your thoughts , explanation , visualisation , narration and everything else…….

  12. Thank you for these lovely videos. I have taken Calculus 1 but quickly became scared I had lost my knowledge of it due to lack of use of the rules; however, now I can see once again, as I did in the beginning, why it is more a conceptual jump than rote memorization. You've erased my anxiety about the concept by being so thoughtful in your videos. I will definitely support you on Patreon soon.

  13. I've been learning for the last weeks for my phd defense and watched a lot of videos, but you are really on top of all learning sources: best explanation, best animation, easiest understanding..! Please carry on and thanks a lot!

  14. Will be cool to have an interactive plot where you can interact with the 3 axis plot for visualizing the chain rule, great series! Thank you!

  15. I'm a little upset with I learned divergence/convergence and subsets pretty quick through them… But they deleted my posts on Boolean function and astrophysics.

  16. One thing I would like to share with you all is my experience during my JEE preparation days. (JEE Advanced is India's most difficult engineering entrance exam. Its syllabus contains topics from Algebra, Trigonometry, 3D Geometry, Calculus, Classical Physics, Electromagnetism, Atomic Physics, Organic Chemistry, Inorganic Chemistry and Physical Chemistry.)

    Since the entire syllabus of the advanced exam needed to be completed in 16 months or so, we were never really taught the basic crux of calculus, and I was not brilliant enough at that time to ask all sorts of questions. I never questioned why we needed to trust what was in the books. What inspiration might the inventor have gotten while working this subject out? I never asked about it. Never imagined it. But coming to this channel, I realised how asking questions is the ultimate roadmap to answers. I know all the rules and stuff. But I still get excited like a 10-year-old when the inner workings of these are shown to me. Keep up the good work, +3Blue1Brown!

  17. The examples he uses here are sin(x) and x^2.

    The Desmos logo has a parabola and a sine wave on it.


  18. At 6:55, why wouldn't it be sin(dx)x^2 + (dx)^2sin(x), instead of d(sin(x))x^2 + d(x^2)sin(x) ? Are they the same, or is there something I'm missing?

  19. Derivative of the function at 1:26

  20. The chain rule workthrough was elegant and eye-opening. Absolutely loved it. I'm about to work through a load of problems not by using the rules I memorised, but by breaking them down like you have here. Thanks, man.

  21. from the product rule: will it always be possible to substitute the derivative of a function with the actual function (d(x^2) <=> 2x)? it seems a little weird for me that you could easily do df = sin(x)dx^2+x^2*dsin(x) = sin(x)2x+x^2*cos(x): so i'm wondering if there is some underlying rule to follow.

  22. Was lucky enough to have a teacher in high school who explain the chain rule and other calculus topics the way they are here. Not as visually appealing, but with the same inquisitive approach to learning. It makes a whole lot of difference on having kids love the course and not dread it.

  23. Thank you so much for this series! This has helped me understand calculus on an intuitive level, not just memorizing formulas and patterns. This is awesome, you're awesome, and keep it up!!

  24. The product rule of derivative is so enlightening. Never knew, neither imagined! This is not just calculus. This is more like a realization through some philosophy! Pranamam GURU (Namaskara in a deepest, revered sense)

  25. Hi. I am a CS undergrad. These vids are so interesting.
    Other youtubers of nearly same level are OAlabs, Practical Engineers, Live Overflow, Kurzgesta in nutshell, Vsauce, Veritasium.

  26. I still cannot visualize the chain rule with this explanation, my way of visualizing it is:

    Imagine a curve drawn on a paper, let's call that curve f and each point of the curve f(t), with f(0) and f(1) being the endpoints, so that moving t from 0 to 1 traverses the curve.
    Imagine then that we put the paper over a rock surface with different engravings, bending the paper to fit the rock surface perfectly, let's call the surface S and S(p) the height of the surface at the point p.
    In this way, if we move t from 0 to 1 and check the value of S(f(t)), it would follow the curve on the paper that is being bent by the rock surface.
    The derivative of f is the direction of the curve when the paper is flat, and the derivative of S(f) would be the direction of the curve when it's on the bent paper.
    How could we know the derivative of S(f), specifically (S(f))'(t)?
    First we check the slope of the surface S around the point being asked about, that is, the derivative of S evaluated at f(t), or S'(f(t)).
    Then, we check the direction of the curve on the paper at that point, that is, f'(t).
    Then project the direction on the slope of S, that is the dot product between S'(f(t)) and f'(t). And then you have it: the chain rule.

    The downside: it requires vector calculus, so it's not suitable for someone learning calculus for the first time.

  27. American notation:

    f(x) = g(x) * h(x)
    df/dx = g(x)*dh/dx + h(x)*dg/dx

    European notation:

    f(x) = g(x) * h(x)
    f'(x) = g'(x)*h(x) + g(x)*h'(x)

  28. Did I understand ?
    Well yes but actually no
    The "dx" that we write in derivatives and integrals still feels like a notation to me… I really have to beat this thought and think of it as a tiny little change of x. It should be easy, but it's actually quite difficult when the mathematical notations are so confusing. You can't really express those whole words just with letters (especially something like "tiny little change over something") – I mean if you just see "dx" without context you might think it's the product of d by x or something like that.
    If there is ONE thing that I want to improve, it's the mathematical notations. I loved your triangle of power for example.
    But it's even more difficult, because notations have to be concise (lots of informations in few space), meaningful (symbols reflecting what they mean), unique (avoid different symbols meaning the same thing, like what we have for product or division), precise (avoid confusion), promote generalization (similar ideas = similar notations), and finally, its format has to reflect organization (meaning symbols, but also graphical placement). Those words are not from me, I don't remember where I read this though. Maybe I'll find it one day.

  29. I suppose if we perform calculations in a certain algorithm, a program where the differential does not really approach zero, but is equal to some specific small value that the hardware platform can allow us, then this differential can not be ignored in the calculation of the derivative?

  30. Please explain integrals through geometry as you do with derivatives,
    in the same amazing visual way, and show how the graph's equation is written down. It would be a milestone. I hope you understand my expectation.

  31. Other people discover the rules, and I learn the rules to take tests? What's the point of learning maths? The whole system is so huge and never ending

  32. Exceptionally exceptional clarity in concepts. I wish I had found you in 2000. All higher engineering and science topics used to go over my head due to a poor grasp of math topics. You are doing an exceptional service to academics as well as to the future advancement of science and technology.

  33. Your videos really help me with understanding the uses and applications of calculus in a real-world setting. Thx for doing what you do.

  34. Please make a video on how to plot graphs of mixed functions like how x² and sinx mixed up to give sinx²(X) graphically..🙏

  35. standard notation is hiding something that dramatically simplifies working with differentials algebraically: (a + b), (a * b), (a / b), (a ^ b), (log_a[b]) … we usually don't think of x^y as meaning that this is a binary operator where both x and y vary, just like with addition. x^y with y constant (ie: y=2) is a partial with respect to x. x^y with x constant is a partial with respect to y. It took me forever to figure out what log_a[b] should be, when both vary, including a(!!). d[x^y] = x^y (y/x) d[x] + x^y log_e[x] d[y] …. See Johnathan Bartlett's "refactoring calculus" to clean up other notational messes such as a way to write higher derivatives so that differentials always work algebraically.

    These sorts of notational tweaks are necessary to write code that implements the algebra without hand-jamming in special cases. It's a lot like dealing with a freaky-old code base that needs some refactoring to eliminate useless duplication.

  36. Hey just a minor animation mistake at 5:05 I guess. Since 3 is going from 0 to 1 the value of x^2 would be decreasing. Therefore, x^2 edge should be getting smaller until x reaches the value of 1
