{
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10"
}
},
"nbformat": 4,
"nbformat_minor": 2,
"cells": [
{
"cell_type": "markdown",
"metadata": {
"lang": "en"
},
"source": [
"[Array manipulations](https://docs.scipy.org/doc/numpy-1.15.1/reference/routines.array-manipulation.html) include:\n",
"\n",
"* the reorganization of the table (reindexing)\n",
"* the aggregation of 2 or more arrays\n",
"* the division of a table in 2 or more\n",
"\n",
"Before looking at these points, let's look at how Numpy presents the\n",
"dimensions of a multidimensional array with the notion of axes."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(2, 4, 3)"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import numpy as np\n",
"\n",
"# An array of marks of 3 exams for 4 students in two subjects \n",
"# (therefore 6 marks per students or 12 per subjects)\n",
"\n",
"# stud.1 stud.2 stud.3 stud.4\n",
"marks = np.array([[[7,13,11], [7,7,13], [5,9,11], [7,17,15]], # subject 1\n",
" [[8,12,14], [8,12,12], [8,12,10], [12,16,12]]]) # subject 2\n",
"marks.shape"
]
},
{
"cell_type": "markdown",
"metadata": {
"lang": "en"
},
"source": [
"## The axes\n",
"\n",
"A table has axes which correspond to the axes of a coordinate system in space. The order of the axes\n",
"is that of the inclusion of square brackets. In 2D an array of array is an array of rows with\n",
"each row which is a 1D array of values. The order is therefore rows then columns (unlike the $(x,y)$ axis in\n",
"space). In 3D the order is line, column, depth if you want to have an image, otherwise it's 0, 1 and 2.\n",
"\n",
"Many operations on tables are done along one of the axes of the table so it is important to\n",
"understand what axes are.\n",
"\n",
"Let's look at the example notes above. The axes are\n",
"\n",
"0. materials\n",
"1. students\n",
"2. exams"
]
},
{
"cell_type": "markdown",
"metadata": {
"lang": "en"
},
"source": [
"Making the average of the values along axis 1 means to take data along axis 1 and performing the calculations on it, so here outputting an average."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 6.5, 11.5, 12.5],\n",
" [ 9. , 13. , 12. ]])"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"marks.mean(axis=1) # give means for each exam in each subject"
]
},
{
"cell_type": "markdown",
"metadata": {
"lang": "en"
},
"source": [
"Another way to look at axes is to think of them as __projection axes__. If I project a 3D object\n",
"along the $y$ axis, the result is a 2D object in $(x,z)$. There is thus a reduction in dimension.\n",
"\n",
"If I sum on the 0 axis an array of dimension (2,4,3) as is our marks array, this means that I lose the 0 dimension and therefore the dimension of the result is (4,3)."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(4, 3)"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"marks.mean(axis=0).shape # mean along axis 0 (subjects) therefore this axis disapears"
]
},
{
"cell_type": "markdown",
"metadata": {
"lang": "en"
},
"source": [
"#### Some functions that support axes\n",
"\n",
"All functions that apply a set of values to produce a result\n",
"should be able to use axis concept (I don't have them all\n",
"checked but do not hesitate to give me a counter-example). We have the following mathematical functions:\n",
"\n",
"* arithmetic: `sum`, `prod`, `cumsum`, `cumprod`\n",
"* statistics: `min`, `max`, `argmin`, `argmax`, `mean` (average), `average` (weighted average), `std` (standard deviation), `var`, `median `, `percentile`, `quantile`\n",
"* others: `gradient`, `diff`, `fft`\n",
"\n",
"Moreover, it is possible to sort the values of an array according to the axis of your choice with `sort`.\n",
"However, they can be mixed, with [`shuffle`](https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.random.shuffle.html#numpy.random.shuffle ), only along the 0 axis.\n",
"\n",
"#### Apply a function along an axis\n",
"\n",
"The function [`apply_along_axis`](https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.apply_along_axis.html)\n",
"allows to apply a 1D functionto a table along an axis. This is the axis that will disappear in the result:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"-> [ 7 13 11] 6\n",
"-> [ 7 7 13] 6\n",
"-> [ 5 9 11] 6\n",
"-> [ 7 17 15] 10\n",
"-> [ 8 12 14] 6\n",
"-> [ 8 12 12] 4\n",
"-> [ 8 12 10] 4\n",
"-> [12 16 12] 4\n"
]
},
{
"data": {
"text/plain": [
"array([[ 6, 6, 6, 10],\n",
" [ 6, 4, 4, 4]])"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def diff_min_max(a):\n",
" print('->', a, a.max() - a.min())\n",
" return a.max() - a.min()\n",
"\n",
"np.apply_along_axis(diff_min_max, axis=-1, arr=marks) # -1 is the last axis, marks in our case"
]
},
{
"cell_type": "markdown",
"metadata": {
"lang": "en"
},
"source": [
"Question: this is the difference between the marks, but which ones?"
]
},
{
"cell_type": "markdown",
"metadata": {
"lang": "en"
},
"source": [
"#### Apply a function along several axes\n",
"\n",
"Some operations may take a list of axes and not a single axis."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"a.max \n",
" [17 16] \n",
"\n",
"a.max keepdim \n",
" [[[17]]\n",
"\n",
" [[16]]] \n",
"\n"
]
}
],
"source": [
"print('a.max \\n', marks.max(axis=(1,2)), '\\n') \n",
"print('a.max keepdim \\n', marks.max(axis=(1,2), keepdims=True), '\\n') "
]
},
{
"cell_type": "markdown",
"metadata": {
"lang": "en"
},
"source": [
"Question: what do the 2 output values correspond to?"
]
},
{
"cell_type": "markdown",
"metadata": {
"lang": "en"
},
"source": [
"One can also use the function [`apply_over_axes`](https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.apply_over_axes.html#numpy.apply_over_axes) to apply a\n",
"function along the given axes.\n",
"\n",
"Beware, the function given in argument will receive the whole table and the axis on which it must\n",
"work, the axes being given one after the other and the table being modified at each stage."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Apply over axis 1\n",
"[[[ 7 13 11]\n",
" [ 7 7 13]\n",
" [ 5 9 11]\n",
" [ 7 17 15]]\n",
"\n",
" [[ 8 12 14]\n",
" [ 8 12 12]\n",
" [ 8 12 10]\n",
" [12 16 12]]] \n",
"\n",
"Apply over axis 2\n",
"[[[ 7 17 15]]\n",
"\n",
" [[12 16 14]]] \n",
"\n"
]
},
{
"data": {
"text/plain": [
"array([[[17]],\n",
"\n",
" [[16]]])"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def mymax(array, axis):\n",
" print('Apply over axis', axis)\n",
" print(array, '\\n')\n",
" return array.max(axis)\n",
"\n",
"np.apply_over_axes(mymax, marks, axes=(1,2))"
]
},
{
"cell_type": "markdown",
"metadata": {
"lang": "en"
},
"source": [
"## Arranging a table\n",
"\n",
"We have already seen `reshape` to change the shape of an array, `flatten` to flatten it in 1 dimension, let's have a look at\n",
"other array manipulation functions.\n",
"\n",
"### Reorder axes\n",
"\n",
"#### `moveaxis` moves an axis\n",
"\n",
"In our marks example, the 3 axes are subjects, students and exams.\n",
"The `moveaxis` function is used to move an axis. If so I want the exams to become the first axis\n",
"in order to make the marks stand out, I move axis 2 to position 0 and the other axes slide to make room, axis 0 becomes axis 1 and axis 1 becomes axis 2:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"marks.shape = (2, 4, 3) \n",
"\n",
"b.shape = (3, 2, 4)\n"
]
},
{
"data": {
"text/plain": [
"array([[[ 7, 7, 5, 7],\n",
" [ 8, 8, 8, 12]],\n",
"\n",
" [[13, 7, 9, 17],\n",
" [12, 12, 12, 16]],\n",
"\n",
" [[11, 13, 11, 15],\n",
" [14, 12, 10, 12]]])"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"print('marks.shape = ',marks.shape, '\\n')\n",
"b = np.moveaxis(marks, 2, 0) \n",
"print('b.shape = ', b.shape)\n",
"b"
]
},
{
"cell_type": "markdown",
"metadata": {
"lang": "en"
},
"source": [
"It is easier to see that the first examination was difficult."
]
},
{
"cell_type": "markdown",
"metadata": {
"lang": "en"
},
"source": [
"#### `swapaxes` 2 axis swap\n",
"\n",
"Rather than inserting one axis at a new position and dragging the others, you may want to swap two of them.\n",
"Here is how to get the marks for each subject and for each exam:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[[ 7, 7, 5, 7],\n",
" [13, 7, 9, 17],\n",
" [11, 13, 11, 15]],\n",
"\n",
" [[ 8, 8, 8, 12],\n",
" [12, 12, 12, 16],\n",
" [14, 12, 10, 12]]])"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"marks.swapaxes(1,2)"
]
},
{
"cell_type": "markdown",
"metadata": {
"lang": "en"
},
"source": [
"#### `transpose` to do everything\n",
"\n",
"Finally `transpose` allows you to reorder all the axes as you want, thus: `transpose((2,0,1))` puts\n",
"\n",
"* axis 2 in position 0,\n",
"* axis 0 in place 1\n",
"* axis 1 in place 2.\n",
"\n",
"#### A simpler and faster apply over axis\n",
"\n",
"Unfortunately the `apply_over_axis` function is not optimized, so in some cases it may\n",
"be preferable to make a loop on its table, which means to put the axes which will remain at the beginning and those on which we make our reduction at the end:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Means per students [10.833333333333334, 9.833333333333334, 9.166666666666666, 13.166666666666666]\n",
"24 µs ± 2.01 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n",
"29.6 µs ± 1.23 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n"
]
}
],
"source": [
"print(\"Means per students\", [m.mean() for m in marks.transpose((1,0,2))])\n",
"\n",
"%timeit [m.mean() for m in marks.transpose((1,0,2))]\n",
"%timeit np.apply_over_axes(np.mean, marks, axes=(0,2))"
]
},
{
"cell_type": "markdown",
"metadata": {
"lang": "en"
},
"source": [
"### Changing the order of array elements\n",
"\n",
"You can flip the values of an array along an axis with [`flip`](https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.flip.html#numpy.flip )\n",
"which can also be done by indicating it at the level of the indices. Thus `np.flip(a, n)` is equivalent to\n",
"`a[:,:,..,::-1,:,..,:]` with `::-1` in $n$-th position."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[[12, 13, 14, 15],\n",
" [16, 17, 18, 19],\n",
" [20, 21, 22, 23]],\n",
"\n",
" [[ 0, 1, 2, 3],\n",
" [ 4, 5, 6, 7],\n",
" [ 8, 9, 10, 11]]])"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a = np.arange(24).reshape([2,3,4])\n",
"np.flip(a,0)"
]
},
{
"cell_type": "markdown",
"metadata": {
"lang": "en"
},
"source": [
"You can roll values along an axis with `roll` by specifying how much to slide them:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[[ 4, 5, 6, 7],\n",
" [ 8, 9, 10, 11],\n",
" [ 0, 1, 2, 3]],\n",
"\n",
" [[16, 17, 18, 19],\n",
" [20, 21, 22, 23],\n",
" [12, 13, 14, 15]]])"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.roll(a, 2, axis=1) # roll elements by 2 along axis 1"
]
},
{
"cell_type": "markdown",
"metadata": {
"lang": "en"
},
"source": [
"The transpose also applies regardless of the dimension. By default it reverses the order of the axes but you can\n",
"specify the desired output order."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[[ 0, 12],\n",
" [ 4, 16],\n",
" [ 8, 20]],\n",
"\n",
" [[ 1, 13],\n",
" [ 5, 17],\n",
" [ 9, 21]],\n",
"\n",
" [[ 2, 14],\n",
" [ 6, 18],\n",
" [10, 22]],\n",
"\n",
" [[ 3, 15],\n",
" [ 7, 19],\n",
" [11, 23]]])"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a.T"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[[ 0, 4, 8],\n",
" [ 1, 5, 9],\n",
" [ 2, 6, 10],\n",
" [ 3, 7, 11]],\n",
"\n",
" [[12, 16, 20],\n",
" [13, 17, 21],\n",
" [14, 18, 22],\n",
" [15, 19, 23]]])"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.transpose(a, (0,2,1))"
]
},
{
"cell_type": "markdown",
"metadata": {
"lang": "en"
},
"source": [
"## Aggregation\n",
"\n",
"### Concatenation\n",
"\n",
"The basic function is [`concatenate`](https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.concatenate.html) indicating the axis chosen for concatenation. This is, in my opinion, the method\n",
"the safest and it works whatever the size.\n",
"\n",
"However, for 2D or 3D arrays, we can use:\n",
"\n",
"* `vstack` or `row_stack` for vertical concatenation\n",
"* `hstack` or `column_stack` for horizontal concatenation\n",
"* `dstack` for deep concatenation\n",
"\n",
"All of these functions take a list of arrays to concatenate as an argument. Of course the sizes of the tables must be compatible."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[0. 0. 0.]\n",
" [0. 0. 0.]\n",
" [1. 1. 1.]\n",
" [1. 1. 1.]] \n",
"\n",
"[[0. 0. 0. 1. 1. 1.]\n",
" [0. 0. 0. 1. 1. 1.]]\n"
]
}
],
"source": [
"a = np.zeros((2,3))\n",
"b = np.ones((2,3))\n",
"\n",
"print(np.concatenate((a,b), axis=0), '\\n') # same than vstack\n",
"print(np.hstack((a,b))) # same than concatenate with axis=1"
]
},
{
"cell_type": "markdown",
"metadata": {
"lang": "en"
},
"source": [
"### Stacking\n",
"\n",
"Unlike concatenation, stacking adds dimension.\n",
"Stack is useful for storing a bunch of 2D arrays, images for example, in a 3D array.\n",
"We use the function [`stack`](https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.stack.html)."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[[0., 0., 0.],\n",
" [0., 0., 0.]],\n",
"\n",
" [[1., 1., 1.],\n",
" [1., 1., 1.]]])"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"c = np.stack((a,b)) # c[0] is a\n",
"c"
]
},
{
"cell_type": "markdown",
"metadata": {
"lang": "en"
},
"source": [
"Note that `stack` has an `axis` option to indicate the direction in which one wishes to store the given arrays."
]
},
{
"cell_type": "markdown",
"metadata": {
"lang": "en"
},
"source": [
"## Splitting\n",
"\n",
"The inverse function of concatenation is splitting with [`split`](https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.split.html#numpy.split) which asks as arguments:\n",
"\n",
"* the array to split\n",
"* in how many pieces or at what indices\n",
"* the direction (the axis)\n",
"\n",
"To find our two tables that generated the result of the previous cell, we cut in 2 along the 0 axis. We can also cut along another axis."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"split part 1\n",
" [[[0. 0. 0.]]\n",
"\n",
" [[1. 1. 1.]]] \n",
"\n",
"split part 2\n",
" [[[0. 0. 0.]]\n",
"\n",
" [[1. 1. 1.]]]\n"
]
}
],
"source": [
"e,f = np.split(c, 2, 1) # splits in 2 along axis 1\n",
"print(\"split part 1\\n\", e, '\\n')\n",
"print(\"split part 2\\n\", f)"
]
},
{
"cell_type": "markdown",
"metadata": {
"lang": "en"
},
"source": [
"There are also `hsplit`, `vsplit` and `dsplit` to split along axes 0, 1 and 2.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"lang": "en"
},
"source": [
"## From Python to Numpy\n",
"\n",
"If you want to dig and look at many examples, you can read N. Rougier's book [From Python to Numpy](https://www.labri.fr/perso/nrougier/from-python-to-numpy/)."
]
},
{
"cell_type": "markdown",
"metadata": {
"lang": "en"
},
"source": [
"## Pandas too\n",
"\n",
"We will find these manipulations with Pandas which is the super spreadsheet of Python. It also works on\n",
"array-like structures but without the constraint that all values are of the same type."
]
},
{
"cell_type": "markdown",
"metadata": {
"variables": {
" PreviousNext(\"np02 Filtres.ipynb\", \"np04 Xarray.ipynb\")": "