path: root/PVCM/cama/en/ma60 Méthode du gradient conjugué.ipynb
author     martial.simon <martial.simon@epita.fr>  2025-04-13 19:54:19 +0200
committer  martial.simon <martial.simon@epita.fr>  2025-04-13 19:54:19 +0200
commit     66c3bbfa94d8a41e58adf154be25e6d86fee8e30 (patch)
tree       9c5e998f324f2f60c1717759144da3f996c5ae1a /PVCM/cama/en/ma60 Méthode du gradient conjugué.ipynb
init: initial commit
Diffstat (limited to 'PVCM/cama/en/ma60 Méthode du gradient conjugué.ipynb')
-rw-r--r--  PVCM/cama/en/ma60 Méthode du gradient conjugué.ipynb  405
1 files changed, 405 insertions, 0 deletions
diff --git a/PVCM/cama/en/ma60 Méthode du gradient conjugué.ipynb b/PVCM/cama/en/ma60 Méthode du gradient conjugué.ipynb
new file mode 100644
index 0000000..7b0068b
--- /dev/null
+++ b/PVCM/cama/en/ma60 Méthode du gradient conjugué.ipynb
@@ -0,0 +1,405 @@
+{
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.12"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4,
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "lang": "en"
+ },
+ "source": [
+    "This lesson has 3 aims:\n",
+ "\n",
+ "* manipulate bases of subspaces\n",
+ "* mix partial derivatives and vectors\n",
+ "* find an algorithm more efficient than the gradient method to solve $A \\, {\\bf x} = {\\bf b}$"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "lang": "en"
+ },
+ "source": [
+ "# Conjugate gradient method\n",
+ "\n",
+    "We have seen that it is possible to compute the optimal µ of the gradient method to\n",
+    "go directly to the minimum of the functional $J$ along the line\n",
+    "${\\bf x}^k \\; + µ \\, \\nabla J({\\bf x}^k)$ with µ varying from $- \\infty$\n",
+    "to $+ \\infty$ (see ma33).\n",
+ "\n",
+    "We have also seen that with the optimal µ, the next steepest-descent direction,\n",
+    "$\\nabla J ({\\bf x}^{k+1})$, is orthogonal to the direction which defines the line on which we seek µ. So we have\n",
+ "\n",
+ "$$\\nabla J ({\\bf x}^{k+1})^T \\, . \\, \\nabla J ({\\bf x}^k) = 0$$\n",
+ "\n",
+ "This is good because it means that the next minimum, ${\\bf x}^{k+2}$, will be the minimum of the space generated by $\\nabla J ({\\bf x}^{k+1})$ and $\\nabla J ({\\bf x}^k)$\n",
+ "(which is better than just being the minimum along a line).\n",
+ "\n",
+    "Unfortunately nothing guarantees that ${\\bf x}^{k+3}$ will not be computed along our\n",
+    "first direction, $\\nabla J({\\bf x}^k)$. We have seen in the case\n",
+    "of the gradient method without searching for the optimal µ that we can oscillate between 2 values. That\n",
+    "is not possible with the optimal µ, but we can still go around in circles among a few values.\n",
+    "\n",
+    "Even if we do not go around in circles and we converge, it is likely that we will search\n",
+    "again for a minimum in a part of the $ℝ^n$ space that we have already explored.\n",
+    "\n",
+    "An optimal search for the minimum of a convex function in the space $ℝ^n$\n",
+    "should not take more than $n$ iterations if we are able to calculate\n",
+    "the optimal µ in the chosen direction. To be convinced of this, it suffices to choose\n",
+    "as directions the vectors of a basis of our space $ℝ^n$. Once we\n",
+    "have found the minimum in every direction of the basis, we have the global minimum:\n",
+    "there is no longer any possible search direction that is not a linear combination of the vectors\n",
+    "already used."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "lang": "en"
+ },
+ "source": [
+ "#### 2D case\n",
+ "<table><tr>\n",
+ " <th>\n",
+ " \n",
+    "If we look for the point ${\\bf x} = (x_1, x_2)$ which minimizes $J({\\bf x})$ (convex),\n",
+    "and we know how to calculate the optimal µ of our gradient method,\n",
+    "then we find the global minimum in at most 2 iterations.\n",
+ "\n",
+    "But if we\n",
+    "take a µ that is too large and makes us overshoot the global\n",
+    "minimum, then at the next iteration the steepest slope will be exactly\n",
+    "the opposite of the one we just used and we will oscillate.\n",
+ "\n",
+    "On the drawing we can see that we find the global minimum in 2 iterations,\n",
+    "but one may suspect this is a stroke of luck; besides, is it really\n",
+    "in 2 iterations, since the point is labelled ${\\bf x}^k$? That is the limit\n",
+    "of a drawing that tries to explain what happens in $ℝ^n$ to humans\n",
+    "accustomed to $ℝ^3$ (see this video on <a href=\"https://www.youtube.com/watch?v=LQFkUjYzOn8\">the 4th dimension</a> outside\n",
+    "of class).\n",
+ "\n",
+ "</th><th>\n",
+ "<img src=\"images/gradient.png\" width = \"400px\"/>\n",
+ "</th></tr></table>"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "lang": "en"
+ },
+ "source": [
+    "## Generating a basis of $ℝ^n$\n",
+    "\n",
+    "If we want to find the global minimum in at most $n$ iterations, our\n",
+    "directions must not be redundant: the first $n$ directions must\n",
+    "generate $ℝ^n$, i.e. form a basis of $ℝ^n$.\n",
+    "\n",
+    "The new direction ${\\bf d}^k$ must therefore be orthogonal to all the\n",
+    "preceding directions, or at least allow us to build a basis that generates a\n",
+    "space of dimension $k+1$ (we start with ${\\bf d}^0$).\n",
+    "\n",
+    "Is this possible?\n",
+ "\n",
+ "### The $A {\\bf x} = {\\bf b}$ case\n",
+ "\n",
+ "In this case the functional to be minimized is\n",
+ "\n",
+ "$$\n",
+ "J({\\bf x}) = \\frac{1}{2} \\, {\\bf x}^T \\, A \\, {\\bf x} - {\\bf b}\\, . {\\bf x}\n",
+ "$$\n",
+ "\n",
+    "and its gradient is $\\nabla J({\\bf x}) = A \\; {\\bf x} - {\\bf b}$ if $A$ is\n",
+    "symmetric.\n",
+ "\n",
+ "\n",
+    "We know that if we compute ${\\bf x}^k$ as before, we only have orthogonality\n",
+    "of two successive directions. This happens when\n",
+    "µ is chosen to minimize $J$ along the line of direction\n",
+    " $\\nabla J({\\bf x}^k)$.\n",
+ " \n",
+ "What happens if ${\\bf x}^{k+1}$ **minimizes $J$ in space\n",
+ "${\\bf G_k}$ generated by\n",
+ "all previous directions**?\n",
+ "\n",
+ "$$\n",
+ "J({\\bf x}^{k+1}) = \\min_{\\bf v \\in G_k} J({\\bf x}^k + {\\bf v})\n",
+ "$$\n",
+ "with ${\\bf G_k} = span\\{{\\bf d}^0, {\\bf d}^1,\\dots, {\\bf d}^k\\} \n",
+ " = \\left\\{ {\\bf v} = \\sum_{i=0}^{k} \\alpha_i \\, {\\bf d}^i \\quad \\forall \\alpha_i \\in ℝ \\right\\}$.\n",
+ "\n",
+    "Then all the partial derivatives with respect to the vectors of ${\\bf G_k}$\n",
+    "are zero (just like $\\partial J / \\partial x$ and $\\partial J / \\partial y$\n",
+    "are zero at the minimum of $J$ in the drawing above). In vector\n",
+    "form this is written:\n",
+ "\n",
+ "$$\n",
+ "\\nabla J({\\bf x}^{k+1}) \\, . \\, {\\bf w} = 0 \\quad \\forall {\\bf w} \\in {\\bf G_k}\n",
+ "$$\n",
+ "\n",
+    "which is easily verified if ${\\bf w}$ is one of the basis vectors:\n",
+ "\n",
+ "$$\n",
+ "\\nabla J({\\bf x}^{k+1}) \\, . \\, {\\bf e}_i = \n",
+ "\\begin{bmatrix}\n",
+ "\\partial J / \\partial x_{1} \\\\\n",
+ "\\partial J / \\partial x_{2} \\\\\n",
+ "\\vdots \\\\\n",
+ "\\partial J / \\partial x_{i} \\\\\n",
+ "\\vdots \\\\\n",
+ "\\partial J / \\partial x_{n} \\\\\n",
+ "\\end{bmatrix}\n",
+ "\\, . \\,\n",
+ "\\begin{bmatrix}\n",
+ "0 \\\\\n",
+ "0 \\\\\n",
+ "\\vdots \\\\\n",
+ "1 \\\\\n",
+ "\\vdots \\\\\n",
+ "0 \\\\\n",
+ "\\end{bmatrix}\n",
+ "=\n",
+ "\\frac{\\partial J}{\\partial x_i}({\\bf x}^{k+1}) \n",
+ "$$\n",
+ "\n",
+    "Note that saying that the partial derivative of $J$ in a direction ${\\bf w}$\n",
+    "of ${\\bf G_k}$ is zero\n",
+    "is equivalent to saying that $\\nabla J({\\bf x}^{k+1})$ is orthogonal to ${\\bf w}$\n",
+    "(or zero if ${\\bf G_k}$ has the same dimension as the whole space).\n",
+ "\n",
+ "#### Let's generate directions ${\\bf d}^i$\n",
+ "\n",
+ "So our iterative formula becomes:\n",
+ "\n",
+ "$$\n",
+ "{\\bf x}^{k+1} = {\\bf x}^k - µ^k\\, {\\bf d}^k\n",
+ "$$\n",
+ "\n",
+    "To calculate the ${\\bf d}^k$ we use the condition that the\n",
+    "partial derivatives of $J$ with respect to any vector ${\\bf w \\in G_k}$ are\n",
+    "zero. And as the ${\\bf d}^i$ directions generate the ${\\bf G_k}$ space,\n",
+    "it is enough that the partial derivatives of $J$ with respect to all the ${\\bf d}^i$\n",
+    "are zero:\n",
+ "\n",
+ "$$\n",
+ "\\nabla J({\\bf x}^{k+1}) \\, . \\, {\\bf d}^i = 0 \\quad \\forall i \\le k \\qquad (1)\n",
+ "$$\n",
+ "\n",
+    "Let us carry out the computation:\n",
+ "\n",
+ "$$\n",
+ "\\begin{align}\n",
+ "(A\\, {\\bf x}^{k+1} - b) \\, . \\, {\\bf d}^i &= 0 \\quad \\forall i \\le k \\\\\n",
+ "(A\\, ({\\bf x}^{k} - µ^k \\, {\\bf d}^k) - b) \\, . \\, {\\bf d}^i &= 0 \\quad \\forall i \\le k \\\\\n",
+ "(A\\, {\\bf x}^{k} - b) \\, . \\, {\\bf d}^i - µ^k \\, A \\, {\\bf d}^k \\, . \\, {\\bf d}^i &= 0 \\quad \\forall i \\le k \\\\\n",
+ "\\nabla J({\\bf x}^{k}) \\, . \\, {\\bf d}^i - µ^k \\, A \\, {\\bf d}^k \\, . \\, {\\bf d}^i &= 0 \\quad \\forall i \\le k \\\\\n",
+ "\\end{align}\n",
+ "$$\n",
+ "\n",
+    "If $i < k$ then the first term is zero by (1) applied at the previous step, which gives\n",
+ "\n",
+ "$$\n",
+ "A \\, {\\bf d}^k \\, . \\, {\\bf d}^i = 0 \\quad \\forall i < k \\qquad (2)\n",
+ "$$\n",
+ "\n",
+ "As A is symmetric, we also have $A \\, {\\bf d}^i \\, . \\, {\\bf d}^k = 0 \\quad \\forall i < k\\quad$ ($\\sum_{m=1}^N \\sum_{n=1}^N a_{m,n}\\, d^i_n\\, d^k_m = \\sum_{n=1}^N \\sum_{m=1}^N a_{n,m}\\, d^k_m\\, d^i_n$).\n",
+ "\n",
+ "So we have conditions to build the new direction ${\\bf d}^k$.\n",
+ "\n",
+    "We assume that the new direction is the gradient plus a correction which is a linear combination of the previous directions:\n",
+ "$$\n",
+ "{\\bf d}^k = \\nabla J({\\bf x}^k) + \\sum_{j=0}^{k-1} β_j {\\bf d}^j\n",
+ "$$\n",
+ "then equation (2) becomes\n",
+ "$$\n",
+ "A \\, \\left( \\nabla J({\\bf x}^k) + \\sum_{j=0}^{k-1} β_j {\\bf d}^j \\right) . {\\bf d}^i = 0 \\quad \\forall i < k\n",
+ "$$\n",
+    "The sum produces terms $A\\, {\\bf d}^j \\, . \\, {\\bf d}^i$ which are all zero according to (2) and its symmetric version, except $A\\, {\\bf d}^i \\, . \\, {\\bf d}^i$. So\n",
+ "\n",
+ "$$\n",
+ "β_i = - \\frac{A\\, \\nabla J({\\bf x}^k) \\, . \\, {\\bf d}^i}{A\\, {\\bf d}^i \\, . \\, {\\bf d}^i} \n",
+ "$$\n",
+ "\n",
+ "and so\n",
+ "$$\n",
+ "{\\bf d}^k = \\nabla J({\\bf x}^k) \n",
+ " - \\sum_{i=0}^{k-1} \\frac{A\\, \\nabla J({\\bf x}^k) \\, . \\, {\\bf d}^i}\n",
+ " {A\\, {\\bf d}^i \\, . \\, {\\bf d}^i} \\; {\\bf d}^i\n",
+ "$$"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "lang": "en"
+ },
+ "source": [
+ "### Calculation of $μ^k$\n",
+ "\n",
+ "We have seen that $\\nabla J({\\bf x}^{k+1}) \\, . \\, {\\bf d}^i = 0$ gives\n",
+ "\n",
+ "$$\n",
+ "\\nabla J({\\bf x}^{k}) \\, . \\, {\\bf d}^i - µ^k \\, A \\, {\\bf d}^k \\, . \\, {\\bf d}^i = 0 \\quad \\forall i \\le k\n",
+ "$$\n",
+ "\n",
+    "so in particular for $i = k$, which allows us to calculate $μ^k$:\n",
+ "\n",
+ "$$\n",
+ "µ^k = \\frac{\\nabla J({\\bf x}^{k}) \\, . \\, {\\bf d}^k}\n",
+ " {A \\, {\\bf d}^k \\, . \\, {\\bf d}^k}\n",
+ " = \\frac{(A \\, {\\bf x}^{k} - b) \\, . \\, {\\bf d}^k}\n",
+ " {A \\, {\\bf d}^k \\, . \\, {\\bf d}^k} \\\\\n",
+ "\\, \n",
+ "$$\n",
+ "\n",
+ "We therefore have the method to calculate ${\\bf d}^k$ and $µ^k$ which makes it possible to find\n",
+ "${\\bf x}^{k+1}$ from ${\\bf x}^k$.\n",
+ "\n",
+ "### Property\n",
+ "\n",
+ "$$\n",
+ "\\nabla J({\\bf x}^{k+1}) \\perp \\nabla J({\\bf x}^i) \\quad \\forall i \\le k\n",
+ "$$\n",
+ "\n",
+    "**Exercise** Prove this property by induction starting from (1), knowing that the first direction is\n",
+ "${\\bf d}^0 = \\nabla J({\\bf x}^0)$.\n",
+ "\n",
+    "This property shows that the gradients of $J$ at the points\n",
+    "${\\bf x}^i$, $i \\le k$, form a basis of the space $\\bf G_k$. So we are sure\n",
+    "that our algorithm converges in at most $n$ iterations."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "lang": "en"
+ },
+ "source": [
+ "## 2nd attempt\n",
+ "\n",
+    "The method found is good in that it shows we can improve on gradient descent, but the computation\n",
+    "of ${\\bf d}^k$ is far too heavy. So we are going to start over and find a more efficient way.\n",
+ "\n",
+    "### Let's work in the basis of the $\\nabla J({\\bf x}^i)$\n",
+ "\n",
+    "We start from equation (1), which allows us to\n",
+    "show that the $\\nabla J({\\bf x}^i), i \\le k$ form a basis of the subspace $\\bf G_k$. Any vector can therefore be expressed in this basis, so\n",
+ "\n",
+ "$$\n",
+    "{\\bf x}^{k+1} = {\\bf x}^k + μ^k \\, {\\bf d}^k \\quad \\textrm{with} \\quad {\\bf d}^k = - \\nabla J({\\bf x}^k) + \\sum_{i=0}^{k-1} γ^i \\, \\nabla J({\\bf x}^i)\n",
+ "$$\n",
+ "\n",
+ "Note that these ${\\bf d}^i, i \\le k$ also form a basis of $\\bf G_k$.\n",
+ "\n",
+ "Equation (2) tells us that $A \\, {\\bf d}^k \\, . \\, {\\bf d}^i = 0 \\quad \\forall i < k$, therefore\n",
+ "\n",
+ "$$\n",
+ "\\begin{align}\n",
+ "< A \\, {\\bf d}^k, {\\bf x}^{j+1} - {\\bf x}^j > &= 0 \\quad \\forall j < k \\\\ \n",
+    "< {\\bf d}^k, A \\, ({\\bf x}^{j+1} - {\\bf x}^j) > &= 0 \\quad \\textrm{since A is symmetric} \\\\\n",
+ "< {\\bf d}^k, A \\, {\\bf x}^{j+1} - {\\bf b} - A {\\bf x}^j + {\\bf b} > &= 0 \\\\\n",
+ "< {\\bf d}^k, \\nabla J({\\bf x}^{j+1}) - \\nabla J({\\bf x}^j) > &= 0 \\\\\n",
+ "< - \\nabla J({\\bf x}^k) + \\sum_{i=0}^{k-1} γ^i \\, \\nabla J({\\bf x}^i) , \\nabla J({\\bf x}^{j+1}) - \\nabla J({\\bf x}^j) > &= 0 \\quad (3) \\\\\n",
+ "\\end{align}\n",
+ "$$\n",
+ "\n",
+ "If j = k-1 then\n",
+ "$$\n",
+ "\\begin{align}\n",
+ "< - \\nabla J({\\bf x}^k), \\nabla J({\\bf x}^k) > + \\, γ^{k-1} < \\nabla J({\\bf x}^{k-1}), -\\nabla J({\\bf x}^{k-1}) > &= 0 \\\\\n",
+ "γ^{k-1} = -\\frac{< \\nabla J({\\bf x}^{k}), \\nabla J({\\bf x}^{k}) >} \n",
+ " {< \\nabla J({\\bf x}^{k-1}), \\nabla J({\\bf x}^{k-1}) >}\n",
+ "\\end{align}\n",
+ "$$\n",
+ "\n",
+ "If j = k-2 then\n",
+ "$$\n",
+ "\\begin{align}\n",
+ "γ^{k-1} \\, < \\nabla J({\\bf x}^{k-1}), \\nabla J({\\bf x}^{k-1}) > + \\, γ^{k-2} < \\nabla J({\\bf x}^{k-2}), -\\nabla J({\\bf x}^{k-2}) > &= 0 \\\\\n",
+ "γ^{k-2} = γ^{k-1} \\frac{< \\nabla J({\\bf x}^{k-1}), \\nabla J({\\bf x}^{k-1}) >} \n",
+ " {< \\nabla J({\\bf x}^{k-2}), \\nabla J({\\bf x}^{k-2}) >}\n",
+    "                       = -\\frac{< \\nabla J({\\bf x}^{k}), \\nabla J({\\bf x}^{k}) >} \n",
+    "                       {< \\nabla J({\\bf x}^{k-2}), \\nabla J({\\bf x}^{k-2}) >}\n",
+ "\\end{align}\n",
+ "$$\n",
+ "and continuing the calculations, we have $\\forall j < k-1$\n",
+ "$$\n",
+    " γ^j = -\\frac{< \\nabla J({\\bf x}^{k}), \\nabla J({\\bf x}^{k}) >} \n",
+ " {< \\nabla J({\\bf x}^{j}), \\nabla J({\\bf x}^{j}) >}\n",
+ "$$\n",
+ "\n",
+ "The general formula of ${\\bf d}^k$ is therefore\n",
+ "$$\n",
+ "\\begin{align}\n",
+    "{\\bf d}^k &= -\\nabla J({\\bf x}^k) \n",
+    "                 - \\frac{< \\nabla J({\\bf x}^{k}), \\nabla J({\\bf x}^{k}) >} \n",
+    "                        {< \\nabla J({\\bf x}^{k-1}), \\nabla J({\\bf x}^{k-1}) >} \\, \\nabla J({\\bf x}^{k-1})\n",
+    "                 - \\sum_{i=0}^{k-2} \\frac{< \\nabla J({\\bf x}^{k}), \\nabla J({\\bf x}^{k}) >} \n",
+    "                        {< \\nabla J({\\bf x}^{i}), \\nabla J({\\bf x}^{i}) >} \\, \\nabla J({\\bf x}^i) \\\\\n",
+    "    &= -\\nabla J({\\bf x}^k) + β \\left( -\\nabla J({\\bf x}^{k-1}) \n",
+    "                 - \\sum_{i=0}^{k-2} \\frac{< \\nabla J({\\bf x}^{k-1}), \\nabla J({\\bf x}^{k-1}) >} \n",
+    "                        {< \\nabla J({\\bf x}^{i}), \\nabla J({\\bf x}^{i}) >} \\, \\nabla J({\\bf x}^i) \\right) \\\\\n",
+ " &= -\\nabla J({\\bf x}^k) + β \\, {\\bf d}^{k-1} \\quad \\textrm{with} \\quad β = \\frac{< \\nabla J({\\bf x}^{k}), \\nabla J({\\bf x}^{k}) >} \n",
+ " {< \\nabla J({\\bf x}^{k-1}), \\nabla J({\\bf x}^{k-1}) >} \n",
+ "\\end{align}\n",
+ "$$\n",
+ "\n",
+ "We have therefore succeeded in expressing the new direction only as a function of the current gradient and the previous direction."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "lang": "en"
+ },
+ "source": [
+ "### New calculation of μ\n",
+ "\n",
+    "The previous calculation of μ remains valid, up to the sign imposed by the new update ${\\bf x}^{k+1} = {\\bf x}^k + µ^k \\, {\\bf d}^k$, since it does not depend on the way ${\\bf d}^k$ is expressed.\n",
+ "\n",
+ "$$\n",
+ "µ^k = -\\frac{\\nabla J({\\bf x}^{k}) \\, . \\, {\\bf d}^k}\n",
+ " {A \\, {\\bf d}^k \\, . \\, {\\bf d}^k}\n",
+ "\\, \n",
+ "$$\n",
+ "\n",
+ "With our choice of ${\\bf d}^k$ this gives\n",
+ "\n",
+ "$$\n",
+ "µ^k = -\\frac{<\\nabla J({\\bf x}^{k}), - \\nabla J({\\bf x}^{k}) + β \\, {\\bf d}^{k-1}>}\n",
+ " {<A \\, {\\bf d}^k, {\\bf d}^k>}\n",
+ " = \\frac{<\\nabla J({\\bf x}^{k}), \\nabla J({\\bf x}^{k})>}\n",
+ " {<A \\, {\\bf d}^k, {\\bf d}^k>}\n",
+ "$$\n",
+ "since ${\\bf d}^{k-1} \\perp \\nabla J({\\bf x}^{k})$."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ]
+} \ No newline at end of file