| author | martial.simon <martial.simon@epita.fr> | 2025-04-13 19:54:19 +0200 |
|---|---|---|
| committer | martial.simon <martial.simon@epita.fr> | 2025-04-13 19:54:19 +0200 |
| commit | 66c3bbfa94d8a41e58adf154be25e6d86fee8e30 (patch) | |
| tree | 9c5e998f324f2f60c1717759144da3f996c5ae1a /PVCM/cama/en/ma60 Méthode du gradient conjugué.ipynb | |
init: initial commit
Diffstat (limited to 'PVCM/cama/en/ma60 Méthode du gradient conjugué.ipynb')
| -rw-r--r-- | PVCM/cama/en/ma60 Méthode du gradient conjugué.ipynb | 405 |
1 files changed, 405 insertions, 0 deletions
diff --git a/PVCM/cama/en/ma60 Méthode du gradient conjugué.ipynb b/PVCM/cama/en/ma60 Méthode du gradient conjugué.ipynb new file mode 100644 index 0000000..7b0068b --- /dev/null +++ b/PVCM/cama/en/ma60 Méthode du gradient conjugué.ipynb @@ -0,0 +1,405 @@ +{ + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.12" + } + }, + "nbformat": 4, + "nbformat_minor": 4, + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "lang": "en" + }, + "source": [ + "This lesson has 3 aims:\n", + "\n", + "* manipulate bases of subspaces\n", + "* mix partial derivatives and vectors\n", + "* find an algorithm more efficient than the gradient method to solve $A \, {\bf x} = {\bf b}$" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "lang": "en" + }, + "source": [ + "# Conjugate gradient method\n", + "\n", + "We have seen that it is possible to compute the optimal µ of the gradient method, i.e. to\n", + "go directly to the minimum of the functional $J$ along the line\n", + "${\bf x}^k \; + µ \, \nabla J({\bf x}^k)$ with µ varying from $- \infty$\n", + "to $+ \infty$ (see ma33).\n", + "\n", + "We have also seen that if we use the optimal µ, then the next steepest-descent direction,\n", + "$\nabla J ({\bf x}^{k+1})$, will be orthogonal to the direction which defines the line on which we sought µ. So we have\n", + "\n", + "$$\nabla J ({\bf x}^{k+1})^T \, . \, \nabla J ({\bf x}^k) = 0$$\n", + "\n", + "This is good because it means that the next minimum, ${\bf x}^{k+2}$, will be the minimum over the space generated by $\nabla J ({\bf x}^{k+1})$ and $\nabla J ({\bf x}^k)$\n", + "(which is better than just being the minimum along a line).\n", + "\n", + "Unfortunately nothing guarantees that ${\bf x}^{k+3}$ will not be computed along our\n", + "first direction, $\nabla J({\bf x}^k)$. We have seen, in the case\n", + "of the gradient method without the search for the optimal µ, that we can oscillate between 2 values. That\n", + "is not possible with the optimal µ, but we can still go around in circles over a few values.\n", + "\n", + "Even if we do not go around in circles and we converge, it is likely that we will search\n", + "again for a minimum in a part of the space $ℝ^n$ that we have already explored.\n", + "\n", + "An optimal search for the minimum of a convex function in the space $ℝ^n$\n", + "should not take more than $n$ iterations, as long as we are able to calculate\n", + "the optimal µ in the chosen direction. To be convinced of this, it suffices to choose\n", + "as directions the vectors of a basis of our space $ℝ^n$. Once we\n", + "have found the minimum in all directions of the basis, we have the global minimum.\n", + "There is no longer any possible search direction which would not be a linear combination of the vectors\n", + "already used."
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "lang": "en" + }, + "source": [ + "#### 2D case\n", + "<table><tr>\n", + " <th>\n", + " \n", + "If we look for the point ${\bf x} = (x_1, x_2)$ which minimizes a convex $J({\bf x})$\n", + "and we know how to calculate the optimal µ of our gradient method,\n", + "then we find the global minimum in 2 iterations at most.\n", + "\n", + "But if we\n", + "take a µ that is too large and makes us overshoot the global\n", + "minimum, then at the next iteration the steepest slope will be exactly\n", + "the opposite of the one we just used and we will oscillate.\n", + "\n", + "On the drawing we can see that we find the global minimum in 2 iterations,\n", + "but we may suspect that this is a stroke of luck; besides, is it really\n", + "2 iterations, since the starting point is written ${\bf x}^k$? That is the limit\n", + "of a drawing that tries to explain what happens in $ℝ^n$ to humans\n", + "accustomed to $ℝ^3$ (see this video on <a href=\"https://www.youtube.com/watch?v=LQFkUjYzOn8\">the 4th dimension</a>, outside\n", + "the course).\n", + "\n", + "</th><th>\n", + "<img src=\"images/gradient.png\" width = \"400px\"/>\n", + "</th></tr></table>" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "lang": "en" + }, + "source": [ + "## Generating a basis of $ℝ^n$\n", + "\n", + "If we want to find our global minimum in at most $n$ iterations, our\n", + "directions must not be redundant, and therefore the first $n$ directions must\n", + "generate $ℝ^n$, i.e. form a basis of $ℝ^n$.\n", + "\n", + "The new direction ${\bf d}^k$ must therefore be orthogonal to all the preceding\n", + "directions, or at least allow us to obtain a family which generates a\n", + "space of dimension $k+1$ (we start with ${\bf d}^0$).\n", + "\n", + "Is it possible?\n", + "\n", + "### The $A {\bf x} = {\bf b}$ case\n", + "\n", + "In this case the functional to be minimized is\n", + "\n", + "$$\n", + "J({\bf x}) = \frac{1}{2} \, {\bf x}^T \, A \, {\bf x} - {\bf b}\, . \, {\bf x}\n", + "$$\n", + "\n", + "and its gradient is $\nabla J({\bf x}) = A \; {\bf x} - {\bf b}$ if A is\n", + "symmetric.\n", + "\n", + "\n", + "We know that if we calculate ${\bf x}^k$ as before we only have orthogonality\n", + "between two successive directions. It happens when\n", + "µ is chosen to minimize $J$ along the line directed by\n", + " $\nabla J({\bf x}^k)$.\n", + " \n", + "What happens if ${\bf x}^{k+1}$ **minimizes $J$ over the space\n", + "${\bf G_k}$ generated by\n", + "all the previous directions**?\n", + "\n", + "$$\n", + "J({\bf x}^{k+1}) = \min_{\bf v \in G_k} J({\bf x}^k + {\bf v})\n", + "$$\n", + "with ${\bf G_k} = span\{{\bf d}^0, {\bf d}^1,\dots, {\bf d}^k\} \n", + " = \left\{ {\bf v} = \sum_{i=0}^{k} \alpha_i \, {\bf d}^i \quad \forall \alpha_i \in ℝ \right\}$.\n", + "\n", + "Then all the partial derivatives with respect to the vectors of ${\bf G_k}$\n", + "are zero (just like $\partial J / \partial x$ and $\partial J / \partial y$\n", + "are zero at the minimum of $J$ in the drawing above). In vector form\n", + "this is written:\n", + "\n", + "$$\n", + "\nabla J({\bf x}^{k+1}) \, . \, {\bf w} = 0 \quad \forall {\bf w} \in {\bf G_k}\n", + "$$\n", + "\n", + "which is easily verified if ${\bf w}$ is one of the basis vectors:\n", + "\n", + "$$\n", + "\nabla J({\bf x}^{k+1}) \, . \, {\bf e}_i = \n", + "\begin{bmatrix}\n", + "\partial J / \partial x_{1} \\\n", + "\partial J / \partial x_{2} \\\n", + "\vdots \\\n", + "\partial J / \partial x_{i} \\\n", + "\vdots \\\n", + "\partial J / \partial x_{n} \\\n", + "\end{bmatrix}\n", + "\, . \,\n", + "\begin{bmatrix}\n", + "0 \\\n", + "0 \\\n", + "\vdots \\\n", + "1 \\\n", + "\vdots \\\n", + "0 \\\n", + "\end{bmatrix}\n", + "=\n", + "\frac{\partial J}{\partial x_i}({\bf x}^{k+1}) \n", + "$$\n", + "\n", + "Note that saying that the partial derivative of $J$ in a direction ${\bf w}$\n", + "of ${\bf G_k}$ is zero\n", + "is equivalent to saying that $\nabla J({\bf x}^{k+1})$ is orthogonal to ${\bf w}$\n", + "(or zero, if ${\bf G_k}$ has the same dimension as the whole space).\n", + "\n", + "#### Let's generate the directions ${\bf d}^i$\n", + "\n", + "So our iterative formula becomes:\n", + "\n", + "$$\n", + "{\bf x}^{k+1} = {\bf x}^k - µ^k\, {\bf d}^k\n", + "$$\n", + "\n", + "To calculate the ${\bf d}^k$ we use the fact that the\n", + "partial derivatives of $J$ with respect to any vector ${\bf w \in G_k}$ are\n", + "zero. And as the ${\bf d}^i$ directions generate the space ${\bf G_k}$,\n", + "it is enough that the partial derivatives of $J$ with respect to all the ${\bf d}^i$\n", + "be zero:\n", + "\n", + "$$\n", + "\nabla J({\bf x}^{k+1}) \, . \, {\bf d}^i = 0 \quad \forall i \le k \qquad (1)\n", + "$$\n", + "\n", + "We perform the calculations\n", + "\n", + "$$\n", + "\begin{align}\n", + "(A\, {\bf x}^{k+1} - b) \, . \, {\bf d}^i &= 0 \quad \forall i \le k \\\n", + "(A\, ({\bf x}^{k} - µ^k \, {\bf d}^k) - b) \, . \, {\bf d}^i &= 0 \quad \forall i \le k \\\n", + "(A\, {\bf x}^{k} - b) \, . \, {\bf d}^i - µ^k \, A \, {\bf d}^k \, . \, {\bf d}^i &= 0 \quad \forall i \le k \\\n", + "\nabla J({\bf x}^{k}) \, . \, {\bf d}^i - µ^k \, A \, {\bf d}^k \, . \, {\bf d}^i &= 0 \quad \forall i \le k \\\n", + "\end{align}\n", + "$$\n", + "\n", + "If $i < k$ then the first term is zero, which gives\n", + "\n", + "$$\n", + "A \, {\bf d}^k \, . \, {\bf d}^i = 0 \quad \forall i < k \qquad (2)\n", + "$$\n", + "\n", + "As A is symmetric, we also have $A \, {\bf d}^i \, . \, {\bf d}^k = 0 \quad \forall i < k\quad$ ($\sum_{m=1}^N \sum_{n=1}^N a_{m,n}\, d^i_n\, d^k_m = \sum_{n=1}^N \sum_{m=1}^N a_{n,m}\, d^k_m\, d^i_n$).\n", + "\n", + "So we have the conditions to build the new direction ${\bf d}^k$.\n", + "\n", + "We assume that the new direction is the gradient + a correction which is a linear combination of the previous directions:\n", + "$$\n", + "{\bf d}^k = \nabla J({\bf x}^k) + \sum_{j=0}^{k-1} β_j {\bf d}^j\n", + "$$\n", + "then equation (2) becomes\n", + "$$\n", + "A \, \left( \nabla J({\bf x}^k) + \sum_{j=0}^{k-1} β_j {\bf d}^j \right) . {\bf d}^i = 0 \quad \forall i < k\n", + "$$\n", + "The sum brings out the terms $A\, {\bf d}^j \, . \, {\bf d}^i$, which are all zero according to (2) and its symmetric version, except $A\, {\bf d}^i \, . \, {\bf d}^i$. So\n", + "\n", + "$$\n", + "β_i = - \frac{A\, \nabla J({\bf x}^k) \, . \, {\bf d}^i}{A\, {\bf d}^i \, . \, {\bf d}^i} \n", + "$$\n", + "\n", + "and so\n", + "$$\n", + "{\bf d}^k = \nabla J({\bf x}^k) \n", + " - \sum_{i=0}^{k-1} \frac{A\, \nabla J({\bf x}^k) \, . \, {\bf d}^i}\n", + " {A\, {\bf d}^i \, . \, {\bf d}^i} \; {\bf d}^i\n", + "$$" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "lang": "en" + }, + "source": [ + "### Calculation of $μ^k$\n", + "\n", + "We have seen that $\nabla J({\bf x}^{k+1}) \, . \, {\bf d}^i = 0$ gives\n", + "\n", + "$$\n", + "\nabla J({\bf x}^{k}) \, . \, {\bf d}^i - µ^k \, A \, {\bf d}^k \, . \, {\bf d}^i = 0 \quad \forall i \le k\n", + "$$\n", + "\n", + "so also for $i = k$, which allows us to calculate $μ^k$:\n", + "\n", + "$$\n", + "µ^k = \frac{\nabla J({\bf x}^{k}) \, . \, {\bf d}^k}\n", + " {A \, {\bf d}^k \, . \, {\bf d}^k}\n", + " = \frac{(A \, {\bf x}^{k} - b) \, . \, {\bf d}^k}\n", + " {A \, {\bf d}^k \, . \, {\bf d}^k} \\\n", + "\, \n", + "$$\n", + "\n", + "We therefore have the method to calculate ${\bf d}^k$ and $µ^k$, which makes it possible to find\n", + "${\bf x}^{k+1}$ from ${\bf x}^k$.\n", + "\n", + "### Property\n", + "\n", + "$$\n", + "\nabla J({\bf x}^{k+1}) \perp \nabla J({\bf x}^i) \quad \forall i \le k\n", + "$$\n", + "\n", + "**Exercise** Prove this property by induction starting from (1), knowing that the first direction is\n", + "${\bf d}^0 = \nabla J({\bf x}^0)$.\n", + "\n", + "This property shows that the gradients of $J$ at the points\n", + "${\bf x}^i$ form a basis of the space $\bf G_k$. So we are sure\n", + "that our algorithm converges in at most $n$ iterations." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "lang": "en" + }, + "source": [ + "## 2nd attempt\n", + "\n", + "The method we have found is good because it shows that we can optimize the gradient descent, but the calculation\n", + "of ${\bf d}^k$ is far too heavy. So we are going to start over to find another, more efficient way.\n", + "\n", + "### Let's work in the basis of the $\nabla J({\bf x}^i)$\n", + "\n", + "We start from equation (1), which allowed us to\n", + "show that the $\nabla J({\bf x}^i), i \le k$ form a basis of the subspace $\bf G_k$. Hence any vector can be expressed in this basis, and therefore\n", + "\n", + "$$\n", + "{\bf x}^{k+1} = {\bf x}^k + μ^k \, {\bf d}^k \quad \textrm{with} \quad {\bf d}^k = - \nabla J({\bf x}^k) + \sum_{i=0}^{k-1} γ^i \, \nabla J({\bf x}^i)\n", + "$$\n", + "\n", + "Note that these ${\bf d}^i, i \le k$ also form a basis of $\bf G_k$.\n", + "\n", + "Equation (2) tells us that $A \, {\bf d}^k \, . \, {\bf d}^i = 0 \quad \forall i < k$, and since ${\bf x}^{j+1} - {\bf x}^j$ is proportional to ${\bf d}^j$, we have\n", + "\n", + "$$\n", + "\begin{align}\n", + "< A \, {\bf d}^k, {\bf x}^{j+1} - {\bf x}^j > &= 0 \quad \forall j < k \\ \n", + "< {\bf d}^k, A \, ({\bf x}^{j+1} - {\bf x}^j) > &= 0 \quad \textrm{since A is symmetric} \\\n", + "< {\bf d}^k, A \, {\bf x}^{j+1} - {\bf b} - A {\bf x}^j + {\bf b} > &= 0 \\\n", + "< {\bf d}^k, \nabla J({\bf x}^{j+1}) - \nabla J({\bf x}^j) > &= 0 \\\n", + "< - \nabla J({\bf x}^k) + \sum_{i=0}^{k-1} γ^i \, \nabla J({\bf x}^i) , \nabla J({\bf x}^{j+1}) - \nabla J({\bf x}^j) > &= 0 \quad (3) \\\n", + "\end{align}\n", + "$$\n", + "\n", + "Using the orthogonality of the gradients, if $j = k-1$ then\n", + "$$\n", + "\begin{align}\n", + "< - \nabla J({\bf x}^k), \nabla J({\bf x}^k) > + \, γ^{k-1} < \nabla J({\bf x}^{k-1}), -\nabla J({\bf x}^{k-1}) > &= 0 \\\n", + "γ^{k-1} = -\frac{< \nabla J({\bf x}^{k}), \nabla J({\bf x}^{k}) >} \n", + " {< \nabla J({\bf x}^{k-1}), \nabla J({\bf x}^{k-1}) >}\n", + "\end{align}\n", + "$$\n", + "\n", + "If $j = k-2$ then\n", + "$$\n", + "\begin{align}\n", + "γ^{k-1} \, < \nabla J({\bf x}^{k-1}), \nabla J({\bf x}^{k-1}) > + \, γ^{k-2} < \nabla J({\bf x}^{k-2}), -\nabla J({\bf x}^{k-2}) > &= 0 \\\n", + "γ^{k-2} = γ^{k-1} \frac{< \nabla J({\bf x}^{k-1}), \nabla J({\bf x}^{k-1}) >} \n", + " {< \nabla J({\bf x}^{k-2}), \nabla J({\bf x}^{k-2}) >}\n", + " = -\frac{< \nabla J({\bf x}^{k}), \nabla J({\bf x}^{k}) >} \n", + " {< \nabla J({\bf x}^{k-2}), \nabla J({\bf x}^{k-2}) >}\n", + "\end{align}\n", + "$$\n", + "and continuing the calculations, we have $\forall j < k-1$\n", + "$$\n", + " γ^j = -\frac{< \nabla J({\bf x}^{k}), \nabla J({\bf x}^{k}) >} \n", + " {< \nabla J({\bf x}^{j}), \nabla J({\bf x}^{j}) >}\n", + "$$\n", + "\n", + "The general formula of ${\bf d}^k$ is therefore\n", + "$$\n", + "\begin{align}\n", + "{\bf d}^k &= -\nabla J({\bf x}^k) \n", + " - \frac{< \nabla J({\bf x}^{k}), \nabla J({\bf x}^{k}) >} \n", + " {< \nabla J({\bf x}^{k-1}), \nabla J({\bf x}^{k-1}) >} \, \nabla J({\bf x}^{k-1})\n", + " - \sum_{i=0}^{k-2} \frac{< \nabla J({\bf x}^{k}), \nabla J({\bf x}^{k}) >} \n", + " {< \nabla J({\bf x}^{i}), \nabla J({\bf x}^{i}) >} \, \nabla J({\bf x}^i) \\\n", + " &= -\nabla J({\bf x}^k) + β \left( -\nabla J({\bf x}^{k-1}) \n", + " - \sum_{i=0}^{k-2} \frac{< \nabla J({\bf x}^{k-1}), \nabla J({\bf x}^{k-1}) >} \n", + " {< \nabla J({\bf x}^{i}), \nabla J({\bf x}^{i}) >} \, \nabla J({\bf x}^i) \right) \\\n", + " &= -\nabla J({\bf x}^k) + β \, {\bf d}^{k-1} \quad \textrm{with} \quad β = \frac{< \nabla J({\bf x}^{k}), \nabla J({\bf x}^{k}) >} \n", + " {< \nabla J({\bf x}^{k-1}), \nabla J({\bf x}^{k-1}) >} \n", + "\end{align}\n", + "$$\n", + "\n", + "We have therefore succeeded in expressing the new direction using only the current gradient and the previous direction." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "lang": "en" + }, + "source": [ + "### New calculation of μ\n", + "\n", + "The previous calculation of μ remains valid since it does not depend on the way of expressing ${\bf d}^k$; only the sign changes, because we now write ${\bf x}^{k+1} = {\bf x}^k + µ^k \, {\bf d}^k$.\n", + "\n", + "$$\n", + "µ^k = -\frac{\nabla J({\bf x}^{k}) \, . \, {\bf d}^k}\n", + " {A \, {\bf d}^k \, . \, {\bf d}^k}\n", + "\, \n", + "$$\n", + "\n", + "With our choice of ${\bf d}^k$ this gives\n", + "\n", + "$$\n", + "µ^k = -\frac{<\nabla J({\bf x}^{k}), - \nabla J({\bf x}^{k}) + β \, {\bf d}^{k-1}>}\n", + " {<A \, {\bf d}^k, {\bf d}^k>}\n", + " = \frac{<\nabla J({\bf x}^{k}), \nabla J({\bf x}^{k})>}\n", + " {<A \, {\bf d}^k, {\bf d}^k>}\n", + "$$\n", + "since ${\bf d}^{k-1} \perp \nabla J({\bf x}^{k})$." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ] +}
\ No newline at end of file |
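The final recurrence derived in the notebook (direction built from the current gradient plus β times the previous direction, with the optimal µ) can be sketched in a few lines of Python. This is a minimal illustration, not part of the committed notebook; the function name `conjugate_gradient` and the random test system are assumptions made for the example:

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10):
    """Solve A x = b for a symmetric positive definite A with the recurrence:
        d^0     = -grad J(x^0)
        mu^k    = <grad J, grad J> / <A d^k, d^k>
        x^{k+1} = x^k + mu^k d^k
        beta    = <grad J^{k+1}, grad J^{k+1}> / <grad J^k, grad J^k>
        d^{k+1} = -grad J^{k+1} + beta d^k
    Returns the solution and the list of (gradient, direction) pairs."""
    n = b.size
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float)
    g = A @ x - b                 # gradient of J(x) = 1/2 x.A.x - b.x
    d = -g                        # first direction: steepest descent
    history = [(g.copy(), d.copy())]
    for _ in range(n):            # at most n iterations in exact arithmetic
        gg = g @ g
        if np.sqrt(gg) < tol:
            break
        Ad = A @ d
        mu = gg / (d @ Ad)        # optimal step along d
        x = x + mu * d
        g = g + mu * Ad           # grad J(x^{k+1}) updated without A x - b
        beta = (g @ g) / gg       # ratio of squared gradient norms
        d = -g + beta * d         # new direction, A-conjugate to the others
        history.append((g.copy(), d.copy()))
    return x, history

# Usage: a random symmetric positive definite system, solved in <= n steps.
rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6))
A = M @ M.T + 6 * np.eye(6)       # symmetric positive definite
b = rng.standard_normal(6)
x, hist = conjugate_gradient(A, b)
print(np.allclose(A @ x, b))      # True
```

The `history` list lets one check numerically the two properties proved above: successive gradients are orthogonal, and the directions are A-conjugate.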
