Total Differential & the Fundamental Lemma
§2.6 - Kaplan, 5th Edition
Consider $f(x,y) = \dfrac{xy}{x^2 + y^2}$ for $(x,y) \neq (0,0)$, with $f(0,0) = 0$. Along the $x$-axis $f(x,0)=0$ for all $x$, so $f_x(0,0)=0$. Along the $y$-axis $f(0,y)=0$, so $f_y(0,0)=0$. Both partials exist. But along the line $y=x$ we have $f(x,x) = \tfrac{1}{2}$ for every $x \neq 0$, so $f$ does not approach $f(0,0)=0$ as $(x,y)\to(0,0)$. The function is discontinuous at the origin.
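A few lines of Python (a sketch, not from Kaplan) make the discontinuity concrete:

```python
# The counterexample: f(x, y) = xy / (x^2 + y^2), patched with f(0, 0) = 0.
def f(x, y):
    if x == 0 and y == 0:
        return 0.0
    return x * y / (x**2 + y**2)

# Along both axes f is identically 0, but along y = x it is stuck at 1/2,
# so f has no limit at the origin even though both partials exist there.
for t in (0.1, 0.01, 0.001):
    assert f(t, 0.0) == 0.0 and f(0.0, t) == 0.0
    assert abs(f(t, t) - 0.5) < 1e-12
```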
So “has partial derivatives” is a weaker condition than we want. It does not even imply continuity, let alone the chain rule. We need a stronger notion - one that captures “the function is well-approximated by a linear map.” That is the total differential.
The right notion: differentiability
Fix a point $(x,y)$ and consider a small step $(\Delta x, \Delta y)$. The change in $z = f(x,y)$ is
$$\Delta z = f(x+\Delta x,\; y+\Delta y) - f(x, y).$$
We say $f$ has a total differential at $(x,y)$, or is differentiable there, if we can write
$$\Delta z = a\,\Delta x + b\,\Delta y + \varepsilon_1\,\Delta x + \varepsilon_2\,\Delta y,$$
where $a, b$ are constants (independent of $\Delta x, \Delta y$), and $\varepsilon_1, \varepsilon_2$ are functions of $(\Delta x, \Delta y)$ with
$$\varepsilon_1 \to 0 \quad\text{and}\quad \varepsilon_2 \to 0 \qquad\text{as } (\Delta x, \Delta y) \to (0,0).$$
Read this carefully. The first piece $a\,\Delta x + b\,\Delta y$ is a linear function of the increments. The second piece $\varepsilon_1 \Delta x + \varepsilon_2 \Delta y$ is the leftover error. The condition is not just that the error is small - it is that the error is smaller than the step itself: each $\varepsilon_i$ vanishes as we shrink the step, so the error shrinks faster than $\Delta x$ or $\Delta y$ alone.
When this works, the linear part is called the total differential:
$$dz = a\,\Delta x + b\,\Delta y.$$
The picture to hold: $\Delta z$ is what really happens; $dz$ is the tangent-plane prediction; the lemma says the gap between them is sub-linear in the step.
Watching the error vanish
Take $f(x,y) = x^2 - y^2$ (a saddle). Its partials are $f_x = 2x$, $f_y = -2y$, so the linear estimate at $(a,b)$ is $\;dz = 2a\,\Delta x - 2b\,\Delta y.$ The heatmap below plots the relative error
$$\frac{|\Delta z - dz|}{\sqrt{\Delta x^2 + \Delta y^2}}$$
across a grid of step sizes $(\Delta x, \Delta y)$. The denominator is the length of the step. If $f$ is differentiable, this ratio must vanish as we approach the center - the error shrinks faster than the step.
Shrink the window: the entire heatmap fades toward zero. The center point - where the step is zero - is where the ratio approaches its limit, $0$. That is exactly the lemma’s content, made visible.
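The computation the heatmap performs can be sketched at a single shrinking step; the base point $(1,2)$ and the step direction below are arbitrary choices, not from the text:

```python
import math

def f(x, y):
    return x**2 - y**2  # the saddle

a, b = 1.0, 2.0                  # arbitrary base point
for h in (0.1, 0.01, 0.001):
    dx, dy = h, h / 2            # shrink the step along a fixed direction
    delta_z = f(a + dx, b + dy) - f(a, b)   # what really happens
    dz = 2*a*dx - 2*b*dy                    # tangent-plane prediction
    ratio = abs(delta_z - dz) / math.hypot(dx, dy)
    print(h, ratio)  # ratio is proportional to h: error shrinks faster than the step
```

Here the error is exactly $\Delta x^2 - \Delta y^2$, so the ratio decays linearly in $h$, which is precisely the sub-linear behavior the lemma demands.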
What differentiability buys you
Suppose $f$ is differentiable at $(x,y)$, so $\Delta z = a\,\Delta x + b\,\Delta y + \varepsilon_1 \Delta x + \varepsilon_2 \Delta y$ with $\varepsilon_i \to 0$.
(1) Set $\Delta y = 0$. Then $\Delta z = a\,\Delta x + \varepsilon_1 \Delta x$, so
$$\lim_{\Delta x \to 0} \frac{\Delta z}{\Delta x} = a, \qquad\text{i.e. } a = \frac{\partial f}{\partial x}.$$
By the symmetric argument $b = \partial f / \partial y$. So if $f$ is differentiable, the coefficients $a$ and $b$ are forced to be the partial derivatives - the differential is unique.
(2) As $(\Delta x, \Delta y) \to (0,0)$, every term in $\Delta z$ goes to zero, so $f(x+\Delta x, y+\Delta y) \to f(x,y)$. Differentiable implies continuous - the property the partials-only condition failed to give us.
So differentiability is the stronger condition we wanted. It implies continuity, it pins down the linear coefficients, and (next card) it is exactly what makes the chain rule work.
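Point (1) is easy to watch numerically. A sketch, using the hypothetical choice $f(x,y) = x^2 + 2xy$ at $(1,1)$, where $f_x = 2x + 2y = 4$:

```python
def f(x, y):
    return x**2 + 2*x*y

x0, y0 = 1.0, 1.0
for dx in (0.1, 0.01, 0.001):
    # With the step confined to the x-direction (Delta y = 0),
    # Delta z / Delta x must converge to the coefficient a = f_x(1, 1) = 4.
    quotient = (f(x0 + dx, y0) - f(x0, y0)) / dx
    print(dx, quotient)  # approaches f_x(1,1) = 4 as dx -> 0
```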
The Fundamental Lemma
Knowing that the partials must equal $a, b$ if $f$ is differentiable does not yet tell us when $f$ is differentiable. The Fundamental Lemma gives a clean sufficient condition: if $f_x$ and $f_y$ exist and are continuous in an open region $D$, then $f$ is differentiable at every point of $D$, with $dz = f_x\,dx + f_y\,dy$.
The proof is a clever two-step trick. We change $x$ first with $y$ held fixed, then change $y$ with the new $x$ held fixed. Each step is a one-variable change, so the one-variable Mean Value Theorem applies.
Step 1 (vary $x$ alone). With $y$ held fixed, the function $x \mapsto f(x, y)$ is a differentiable function of one variable, and the Mean Value Theorem gives some $x_1$ between $x$ and $x + \Delta x$ with
$$f(x+\Delta x,\, y) - f(x, y) = f_x(x_1, y)\,\Delta x.$$
Since $f_x$ is continuous, $f_x(x_1, y) \to f_x(x, y)$ as $\Delta x \to 0$. Define $\varepsilon_1 = f_x(x_1, y) - f_x(x, y)$; then $\varepsilon_1 \to 0$ and
$$f(x+\Delta x,\, y) - f(x, y) = f_x(x, y)\,\Delta x + \varepsilon_1\,\Delta x. \tag{$*$}$$
Step 2 (vary $y$ alone). Now hold the new $x$-coordinate $x + \Delta x$ fixed. The same MVT argument on $y \mapsto f(x + \Delta x, y)$ gives some $y_1$ between $y$ and $y + \Delta y$ with
$$f(x+\Delta x,\, y+\Delta y) - f(x+\Delta x,\, y) = f_y(x+\Delta x,\, y_1)\,\Delta y.$$
Continuity of $f_y$ now does double duty: $f_y(x + \Delta x, y_1) \to f_y(x, y)$ as both $\Delta x, \Delta y \to 0$. Define $\varepsilon_2 = f_y(x + \Delta x, y_1) - f_y(x, y)$; then $\varepsilon_2 \to 0$ and
$$f(x+\Delta x,\, y+\Delta y) - f(x+\Delta x,\, y) = f_y(x, y)\,\Delta y + \varepsilon_2\,\Delta y. \tag{$**$}$$
Add. The total change $\Delta z$ telescopes through the intermediate point: it is the sum of the changes in $(*)$ and $(**)$. Adding gives exactly
$$\Delta z = f_x(x, y)\,\Delta x + f_y(x, y)\,\Delta y + \varepsilon_1\,\Delta x + \varepsilon_2\,\Delta y,$$
which is the definition of differentiability with $a = f_x$, $b = f_y$. $\blacksquare$
The single line that does the work is the telescoping decomposition $\Delta z = [f(x+\Delta x, y) - f(x,y)] + [f(x+\Delta x, y+\Delta y) - f(x+\Delta x, y)]$. Each bracket is a one-variable difference, and the one-variable MVT handles each independently. This is a useful trick to remember - it is exactly the same idea that drives the proof of the chain rule in §2.9.
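The telescoping split is an exact algebraic identity for any $f$, which a quick check illustrates (the particular $f$, point, and step below are hypothetical):

```python
import math

def f(x, y):
    return math.sin(x) * math.exp(y)  # any f works: the split is algebra, not calculus

x, y, dx, dy = 0.3, 0.7, 0.05, -0.02
delta_z = f(x + dx, y + dy) - f(x, y)
step1 = f(x + dx, y) - f(x, y)              # vary x alone: a one-variable difference
step2 = f(x + dx, y + dy) - f(x + dx, y)    # then vary y alone
assert abs(delta_z - (step1 + step2)) < 1e-12  # the intermediate point cancels
```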
The differential, in notation
Once differentiability is established, we replace the increments $\Delta x, \Delta y$ with the symbols $dx, dy$ and write
$$dz = \frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial y}\,dy.$$
A small honesty: $dx$ and $dy$ are not “infinitesimals.” They are independent real variables - the increments we plug in. The merit of the notation is that it generalizes: for $w = f(x_1, \ldots, x_n)$,
$$dw = \frac{\partial f}{\partial x_1}\,dx_1 + \cdots + \frac{\partial f}{\partial x_n}\,dx_n.$$
Example (Kaplan, p. 89). If $w = \dfrac{x + y}{z}$, then
$$dw = \frac{dx + dy}{z} - \frac{x + y}{z^2}\,dz.$$
The differential is just the partials, lined up against $dx, dy, dz$. Mechanical once you have it - but the lemma is what justifies treating it as a real approximation.
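A numeric check of the p. 89 example, at an arbitrary base point and step (both chosen here for illustration):

```python
def w(x, y, z):
    return (x + y) / z

x, y, z = 1.0, 2.0, 3.0           # arbitrary base point (z != 0)
dx, dy, dz = 1e-4, -2e-4, 3e-4    # arbitrary small step
dw = (dx + dy)/z - (x + y)/z**2 * dz              # differential from the partials
delta_w = w(x + dx, y + dy, z + dz) - w(x, y, z)  # actual change
print(delta_w, dw)  # agree to second order in the step
```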
Pitfall: partials existing is not enough
Return to the hook. We had $f(x,y) = xy/(x^2 + y^2)$ with $f(0,0) = 0$, and $f_x(0,0) = f_y(0,0) = 0$.
Suppose, for contradiction, that $f$ were differentiable at $(0,0)$. Then by the theorem of two cards back, $f$ would be continuous at $(0,0)$. But it is not - $f(x,x) = \tfrac12$ along the diagonal. Contradiction. So $f$ is not differentiable at the origin, even though both partials exist there.
What hypothesis of the Fundamental Lemma fails? The partials are not continuous at $(0,0)$. Compute, for $(x,y) \neq (0,0)$,
$$f_x(x, y) = \frac{y\,(y^2 - x^2)}{(x^2 + y^2)^2}.$$
Approaching $(0,0)$ along $y = x$ gives $f_x(x,x) = 0$. Approaching along $y = 2x$ gives $f_x(x, 2x) = \dfrac{2x \cdot 3x^2}{25 x^4} = \dfrac{6}{25 x}$, which blows up. So $f_x$ is discontinuous at the origin - the lemma’s hypothesis is violated, and the conclusion (differentiability) does not follow. Everything fits.
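The blow-up is easy to tabulate. A sketch, using the quotient-rule formula $f_x = y\,(y^2 - x^2)/(x^2+y^2)^2$:

```python
# f_x for f = x*y/(x**2 + y**2), valid away from the origin (quotient rule).
def fx(x, y):
    return y * (y**2 - x**2) / (x**2 + y**2)**2

for t in (0.1, 0.01, 0.001):
    # Along y = x the partial is identically 0; along y = 2x it equals 6/(25 t),
    # which blows up as t -> 0, so f_x cannot be continuous at the origin.
    print(t, fx(t, t), fx(t, 2*t))
```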
This is Kaplan’s Problem 7 after §2.6. The level curves of $f$ are lines through the origin (with $f$ taking different constant values on different lines), which makes the discontinuity at the origin geometrically obvious.
Practice Problems - §2.6
From Kaplan, problems after §2.6.
For $z = xy$: $\dfrac{\partial z}{\partial x} = y$, $\quad \dfrac{\partial z}{\partial y} = x$.
$dz = y\,dx + x\,dy.$
Both partials are continuous everywhere, so the Fundamental Lemma applies on the entire plane.
For $z = \log\sqrt{x^2 + y^2}$: use $\log\sqrt{u} = \tfrac12 \log u$, so $z = \tfrac12 \log(x^2 + y^2).$
$\dfrac{\partial z}{\partial x} = \dfrac{1}{2} \cdot \dfrac{2x}{x^2 + y^2} = \dfrac{x}{x^2 + y^2}$, and similarly $\dfrac{\partial z}{\partial y} = \dfrac{y}{x^2 + y^2}.$
$dz = \dfrac{x\,dx + y\,dy}{x^2 + y^2}.$
Valid on the punctured plane $(x,y) \neq (0,0)$, where the partials exist and are continuous.
For $z = x^2 + 2xy$ at $(1,1)$, where $z = 3$:
$\Delta z = (1 + \Delta x)^2 + 2(1+\Delta x)(1+\Delta y) - (1 + 2)$
$= 1 + 2\Delta x + \Delta x^2 + 2(1 + \Delta x + \Delta y + \Delta x \Delta y) - 3$
$= 4\Delta x + 2\Delta y + \Delta x^2 + 2\Delta x \Delta y.$
$f_x = 2x + 2y$, so $f_x(1,1) = 4$. $f_y = 2x$, so $f_y(1,1) = 2$. Hence $dz = 4\Delta x + 2\Delta y.$
$\Delta z - dz = \Delta x^2 + 2\,\Delta x\,\Delta y = (\varepsilon_1)\Delta x + (\varepsilon_2)\Delta y$ with, say, $\varepsilon_1 = \Delta x$ and $\varepsilon_2 = 2\Delta x$. Both go to $0$ with the step, as the lemma demands.
Numerical check at $\Delta x = \Delta y = 0.01$: $\Delta z = 0.04 + 0.02 + 0.0001 + 0.0002 = 0.0603$ vs $dz = 0.06$. Error $0.0003$ - small relative to the $0.06$ change, and it shrinks quadratically as the step shrinks.
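The arithmetic above, reproduced as a sanity check (the function, recovered from its partials $f_x = 2x + 2y$ and $f_y = 2x$, is $f(x,y) = x^2 + 2xy$):

```python
def f(x, y):
    return x**2 + 2*x*y  # base point (1, 1), where f = 3

dx = dy = 0.01
delta_z = f(1 + dx, 1 + dy) - f(1, 1)  # the exact change, ~0.0603
dz = 4*dx + 2*dy                       # the differential, 0.06
print(delta_z - dz)  # ~0.0003 = dx**2 + 2*dx*dy, quadratic in the step
```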
Given $f(1,2) = 3$, $f_x(1,2) = 2$, $f_y(1,2) = 5$: near $(1,2)$, $f(x,y) \approx 3 + 2(x - 1) + 5(y - 2).$
$f(1.1, 1.8) \approx 3 + 2(0.1) + 5(-0.2) = 3 + 0.2 - 1.0 = 2.2.$
$f(1.2, 1.8) \approx 3 + 2(0.2) + 5(-0.2) = 3 + 0.4 - 1.0 = 2.4.$
$f(1.3, 1.8) \approx 3 + 2(0.3) + 5(-0.2) = 3 + 0.6 - 1.0 = 2.6.$
These are “reasonable” estimates because the lemma promises the linear approximation has sub-linear error - so for small steps like $(0.1, -0.2)$ they should be close to the true value, provided the partials are continuous near $(1,2)$.
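The estimates reduce to evaluating the tangent-plane formula; only the three values $f(1,2)=3$, $f_x(1,2)=2$, $f_y(1,2)=5$ are assumed, as in the worked lines:

```python
def approx(x, y):
    # Tangent-plane estimate built from f(1,2) = 3, f_x(1,2) = 2, f_y(1,2) = 5.
    return 3 + 2*(x - 1) + 5*(y - 2)

for x in (1.1, 1.2, 1.3):
    print(x, approx(x, 1.8))  # ~2.2, 2.4, 2.6 (up to float rounding)
```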