.. _numpy:

NumPy
#####

Buffer protocol
===============

Python supports an extremely general and convenient approach for exchanging
data between plugin libraries. Types can expose a buffer view [#f2]_, which
provides fast direct access to the raw internal data representation. Suppose we
want to bind the following simplistic Matrix class:

.. code-block:: cpp

    class Matrix {
    public:
        Matrix(size_t rows, size_t cols) : m_rows(rows), m_cols(cols) {
            m_data = new float[rows*cols];
        }
        float *data() { return m_data; }
        size_t rows() const { return m_rows; }
        size_t cols() const { return m_cols; }
    private:
        size_t m_rows, m_cols;
        float *m_data;
    };

The following binding code exposes the ``Matrix`` contents as a buffer object,
making it possible to cast Matrices into NumPy arrays. It is even possible to
completely avoid copy operations with Python expressions like
``np.array(matrix_instance, copy = False)``.

.. code-block:: cpp

    py::class_<Matrix>(m, "Matrix", py::buffer_protocol())
       .def_buffer([](Matrix &m) -> py::buffer_info {
            return py::buffer_info(
                m.data(),                               /* Pointer to buffer */
                sizeof(float),                          /* Size of one scalar */
                py::format_descriptor<float>::format(), /* Python struct-style format descriptor */
                2,                                      /* Number of dimensions */
                { m.rows(), m.cols() },                 /* Buffer dimensions */
                { sizeof(float) * m.cols(),             /* Strides (in bytes) for each index */
                  sizeof(float) }
            );
        });

Supporting the buffer protocol in a new type involves specifying the special
``py::buffer_protocol()`` tag in the ``py::class_`` constructor and calling the
``def_buffer()`` method with a lambda function that creates, on demand, a
``py::buffer_info`` record describing a given matrix instance. The contents of
``py::buffer_info`` mirror the Python buffer protocol specification.

.. code-block:: cpp

    struct buffer_info {
        void *ptr;
        py::ssize_t itemsize;
        std::string format;
        py::ssize_t ndim;
        std::vector<py::ssize_t> shape;
        std::vector<py::ssize_t> strides;
    };

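The same fields can be inspected from Python through the built-in
``memoryview`` type. The following is a minimal sketch that does not involve
pybind11 at all: it assumes a standard-library ``array.array`` of six floats as
a stand-in for the ``Matrix`` storage and reinterprets it as a 2x3 buffer.

.. code-block:: python

    import array

    # Stand-in for the Matrix storage: six single-precision floats.
    storage = array.array("f", range(6))

    # Reinterpret the flat buffer as 2 rows x 3 columns. A cast that changes
    # the shape has to go through a byte format first.
    view = memoryview(storage).cast("B").cast("f", shape=[2, 3])

    print(view.format)    # struct-style format descriptor
    print(view.itemsize)  # size of one scalar in bytes
    print(view.ndim)      # number of dimensions
    print(view.shape)     # buffer dimensions
    print(view.strides)   # strides in bytes for each index

With a 4-byte ``float``, the row stride is ``3 * 4`` bytes and the column
stride is ``4`` bytes, matching the stride expressions in the binding code.
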
To create a C++ function that can take a Python buffer object as an argument,
simply use the type ``py::buffer`` as one of its arguments. Buffers can exist
in a great variety of configurations, hence some safety checks are usually
necessary in the function body. Below, you can see a basic example of how to
define a custom constructor for the Eigen double precision matrix
(``Eigen::MatrixXd``) type, which supports initialization from compatible
buffer objects (e.g. a NumPy matrix).

.. code-block:: cpp

    /* Bind MatrixXd (or some other Eigen type) to Python */
    typedef Eigen::MatrixXd Matrix;

    typedef Matrix::Scalar Scalar;
    constexpr bool rowMajor = Matrix::Flags & Eigen::RowMajorBit;

    py::class_<Matrix>(m, "Matrix", py::buffer_protocol())
        .def(py::init([](py::buffer b) {
            typedef Eigen::Stride<Eigen::Dynamic, Eigen::Dynamic> Strides;

            /* Request a buffer descriptor from Python */
            py::buffer_info info = b.request();

            /* Some sanity checks ... */
            if (info.format != py::format_descriptor<Scalar>::format())
                throw std::runtime_error("Incompatible format: expected a double array!");

            if (info.ndim != 2)
                throw std::runtime_error("Incompatible buffer dimension!");

            /* Convert the byte strides into element strides for Eigen */
            auto strides = Strides(
                info.strides[rowMajor ? 0 : 1] / (py::ssize_t)sizeof(Scalar),
                info.strides[rowMajor ? 1 : 0] / (py::ssize_t)sizeof(Scalar));

            auto map = Eigen::Map<Matrix, 0, Strides>(
                static_cast<Scalar *>(info.ptr), info.shape[0], info.shape[1], strides);

            return Matrix(map);
        }));

For reference, the ``def_buffer()`` call for this Eigen data type should look
as follows:

.. code-block:: cpp

    .def_buffer([](Matrix &m) -> py::buffer_info {
        return py::buffer_info(
            m.data(),                                /* Pointer to buffer */
            sizeof(Scalar),                          /* Size of one scalar */
            py::format_descriptor<Scalar>::format(), /* Python struct-style format descriptor */
            2,                                       /* Number of dimensions */
            { m.rows(), m.cols() },                  /* Buffer dimensions */
            { sizeof(Scalar) * (rowMajor ? m.cols() : 1),
              sizeof(Scalar) * (rowMajor ? 1 : m.rows()) }
                                                     /* Strides (in bytes) for each index */
        );
     })

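The stride expressions above are easier to check with concrete numbers. The
sketch below restates them in plain Python; ``byte_strides`` is a hypothetical
helper introduced only for this illustration and is not part of pybind11.

.. code-block:: python

    def byte_strides(rows, cols, itemsize, row_major):
        """Byte strides (index 0, index 1) of a dense rows x cols matrix."""
        if row_major:
            # Moving down one row skips a full row of `cols` scalars.
            return (itemsize * cols, itemsize)
        # Column-major: moving right one column skips `rows` scalars.
        return (itemsize, itemsize * rows)

    print(byte_strides(3, 4, 8, row_major=True))   # (32, 8)
    print(byte_strides(3, 4, 8, row_major=False))  # (8, 24)
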
For a much easier approach to binding Eigen types (although with some
limitations), refer to the section on :doc:`/advanced/cast/eigen`.

.. seealso::

    The file :file:`tests/test_buffers.cpp` contains a complete example
    that demonstrates using the buffer protocol with pybind11 in more detail.

.. [#f2] http://docs.python.org/3/c-api/buffer.html

Arrays
======

By exchanging ``py::buffer`` with ``py::array`` in the above snippet, we can
restrict the function so that it only accepts NumPy arrays (rather than any
type of Python object satisfying the buffer protocol).

In many situations, we want to define a function which only accepts a NumPy
array of a certain data type. This is possible via the ``py::array_t<T>``
template. For instance, the following function requires the argument to be a
NumPy array containing double precision values.

.. code-block:: cpp

    void f(py::array_t<double> array);

When it is invoked with a different type (e.g. an integer or a list of
integers), the binding code will attempt to cast the input into a NumPy array
of the requested type. This feature requires the :file:`pybind11/numpy.h`
header to be included. Note that :file:`pybind11/numpy.h` does not depend on
the NumPy headers, and thus can be used without declaring a build-time
dependency on NumPy; NumPy>=1.7.0 is a runtime dependency.

Data in NumPy arrays is not guaranteed to be packed in a dense manner;
furthermore, entries can be separated by arbitrary column and row strides.
Sometimes, it can be useful to require a function to only accept dense arrays
using either the C (row-major) or Fortran (column-major) ordering. This can be
accomplished via a second template argument with values ``py::array::c_style``
or ``py::array::f_style``.

.. code-block:: cpp

    void f(py::array_t<double, py::array::c_style | py::array::forcecast> array);

The ``py::array::forcecast`` argument is the default value of the second
template parameter, and it ensures that non-conforming arguments are converted
into an array satisfying the specified requirements instead of trying the next
function overload.

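The effect of these requirements can be pictured with plain NumPy calls. The
sketch below is only a Python-side analogy, not the code path pybind11 takes
internally: it shows a list being converted to ``float64``, and a strided view
being copied into dense row-major and column-major arrays.

.. code-block:: python

    import numpy as np

    # A list of integers is neither an array nor double precision.
    as_double = np.asarray([1, 2, 3], dtype=np.float64)

    # Taking every other column gives a view that is not densely packed.
    strided = np.arange(12, dtype=np.float64).reshape(3, 4)[:, ::2]

    dense_c = np.ascontiguousarray(strided)  # dense copy, C (row-major) order
    dense_f = np.asfortranarray(strided)     # dense copy, Fortran (column-major) order

    print(as_double.dtype)
    print(strided.flags["C_CONTIGUOUS"], strided.strides)
    print(dense_c.flags["C_CONTIGUOUS"], dense_c.strides)
    print(dense_f.flags["F_CONTIGUOUS"], dense_f.strides)
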
Structured types
================

In order for ``py::array_t`` to work with structured (record) types, we first
need to register the memory layout of the type. This can be done via the
``PYBIND11_NUMPY_DTYPE`` macro, called in the plugin definition code, which
expects the type followed by field names:

.. code-block:: cpp

    struct A {
        int x;
        double y;
    };

    struct B {
        int z;
        A a;
    };

    // ...
    PYBIND11_MODULE(test, m) {
        // ...

        PYBIND11_NUMPY_DTYPE(A, x, y);
        PYBIND11_NUMPY_DTYPE(B, z, a);
        /* now both A and B can be used as template arguments to py::array_t */
    }

The structure should consist of fundamental arithmetic types, ``std::complex``,
previously registered substructures, and arrays of any of the above. Both C++
arrays and ``std::array`` are supported. While there is a static assertion to
prevent many types of unsupported structures, it is still the user's
responsibility to use only "plain" structures that can be safely manipulated as
raw memory without violating invariants.

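On the Python side, such records correspond to NumPy structured dtypes. The
sketch below builds comparable dtypes by hand, assuming a 32-bit ``int`` and
C-style alignment (``align=True``); the exact dtype produced by the macro
depends on the platform's struct layout and is not reproduced here.

.. code-block:: python

    import numpy as np

    # Hand-written counterpart of ``struct A { int x; double y; }``.
    dtype_a = np.dtype([("x", np.int32), ("y", np.float64)], align=True)

    # Counterpart of ``struct B { int z; A a; }``: the nested record is a field.
    dtype_b = np.dtype([("z", np.int32), ("a", dtype_a)], align=True)

    records = np.zeros(3, dtype=dtype_a)
    records["x"] = [1, 2, 3]
    records["y"] = [0.5, 1.5, 2.5]

    print(dtype_a.names, dtype_a.itemsize)
    print(dtype_b.fields["a"][1])  # byte offset of the nested record within B
    print(records)
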
Vectorizing functions
=====================

Suppose we want to bind a function with the following signature to Python so
that it can process arbitrary NumPy array arguments (vectors, matrices, general
N-D arrays) in addition to its normal arguments:

.. code-block:: cpp

    double my_func(int x, float y, double z);

After including the ``pybind11/numpy.h`` header, this is extremely simple:

.. code-block:: cpp

    m.def("vectorized_func", py::vectorize(my_func));

Invoking the function as shown below causes 4 calls to be made to ``my_func``
with each of the array elements. The significant advantage of this compared to
solutions like ``numpy.vectorize()`` is that the loop over the elements runs
entirely on the C++ side and can be crunched down into a tight, optimized loop
by the compiler. The result is returned as a NumPy array of type
``numpy.dtype.float64``.

.. code-block:: pycon

    >>> x = np.array([[1, 3],[5, 7]])
    >>> y = np.array([[2, 4],[6, 8]])
    >>> z = 3
    >>> result = vectorized_func(x, y, z)

The scalar argument ``z`` is transparently replicated 4 times. The input
arrays ``x`` and ``y`` are automatically converted into the right types (they
are of type ``numpy.dtype.int64`` but need to be ``numpy.dtype.int32`` and
``numpy.dtype.float32``, respectively).

.. note::

    Only arithmetic, complex, and POD types passed by value or by ``const &``
    reference are vectorized; all other arguments are passed through as-is.
    Functions taking rvalue reference arguments cannot be vectorized.

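To make the element-wise calls concrete, the following pure-Python model
spells out which argument triples ``my_func`` receives for the inputs above.
It is only an illustration of the semantics: the body of ``my_func`` is
hypothetical (the text gives only its signature), and the real loop runs in
C++ rather than in Python.

.. code-block:: python

    calls = []

    def my_func(x, y, z):
        # Hypothetical body; records each call so the pattern is visible.
        calls.append((x, y, z))
        return float(x + y + z)

    x = [[1, 3], [5, 7]]
    y = [[2, 4], [6, 8]]
    z = 3  # scalar argument, reused for every element

    result = [[my_func(xi, yi, z) for xi, yi in zip(x_row, y_row)]
              for x_row, y_row in zip(x, y)]

    print(len(calls))  # 4
    print(calls)
    print(result)
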
In cases where the computation is too complicated to be reduced to
``vectorize``, it will be necessary to create and access the buffer contents
manually. The following snippet contains a complete example that shows how this
works (the code is somewhat contrived, since it could have been done more
simply using ``vectorize``).

.. code-block:: cpp

    #include <pybind11/pybind11.h>
    #include <pybind11/numpy.h>

    namespace py = pybind11;

    py::array_t<double> add_arrays(py::array_t<double> input1, py::array_t<double> input2) {
        py::buffer_info buf1 = input1.request(), buf2 = input2.request();

        if (buf1.ndim != 1 || buf2.ndim != 1)
            throw std::runtime_error("Number of dimensions must be one");

        if (buf1.size != buf2.size)
            throw std::runtime_error("Input shapes must match");

        /* No pointer is passed, so NumPy will allocate the buffer */
        auto result = py::array_t<double>(buf1.size);

        py::buffer_info buf3 = result.request();

        double *ptr1 = static_cast<double *>(buf1.ptr);
        double *ptr2 = static_cast<double *>(buf2.ptr);
        double *ptr3 = static_cast<double *>(buf3.ptr);

        /* shape entries are signed (py::ssize_t), so use a signed index */
        for (py::ssize_t idx = 0; idx < buf1.shape[0]; idx++)
            ptr3[idx] = ptr1[idx] + ptr2[idx];

        return result;
    }

    PYBIND11_MODULE(test, m) {
        m.def("add_arrays", &add_arrays, "Add two NumPy arrays");
    }

.. seealso::

    The file :file:`tests/test_numpy_vectorize.cpp` contains a complete
    example that demonstrates using :func:`vectorize` in more detail.

Direct access
=============

For performance reasons, particularly when dealing with very large arrays, it
is often desirable to directly access array elements without internal checking
of dimensions and bounds on every access when indices are known to be already
valid. To avoid such checks, the ``array`` class and ``array_t<T>`` template
class offer an unchecked proxy object, obtained through the ``unchecked<N>``
and ``mutable_unchecked<N>`` methods, where ``N`` gives the required
dimensionality of the array:

.. code-block:: cpp

    m.def("sum_3d", [](py::array_t<double> x) {
        auto r = x.unchecked<3>(); // x must have ndim = 3; can be non-writeable
        double sum = 0;
        for (py::ssize_t i = 0; i < r.shape(0); i++)
            for (py::ssize_t j = 0; j < r.shape(1); j++)
                for (py::ssize_t k = 0; k < r.shape(2); k++)
                    sum += r(i, j, k);
        return sum;
    });
    m.def("increment_3d", [](py::array_t<double> x) {
        auto r = x.mutable_unchecked<3>(); // Will throw if ndim != 3 or flags.writeable is false
        for (py::ssize_t i = 0; i < r.shape(0); i++)
            for (py::ssize_t j = 0; j < r.shape(1); j++)
                for (py::ssize_t k = 0; k < r.shape(2); k++)
                    r(i, j, k) += 1.0;
    }, py::arg().noconvert());

To obtain the proxy from an ``array`` object, you must specify both the data
type and number of dimensions as template arguments, such as ``auto r =
myarray.mutable_unchecked<float, 2>()``.

If the number of dimensions is not known at compile time, you can omit the
dimensions template parameter (i.e. calling ``arr_t.unchecked()`` or
``arr.unchecked<T>()``). This will give you a proxy object that works in the
same way, but results in less optimizable code and thus a small efficiency
loss in tight loops.

Note that the returned proxy object directly references the array's data, and
only reads its shape, strides, and writeable flag when constructed. You must
take care to ensure that the referenced array is not destroyed or reshaped for
the lifetime of the returned object, typically by limiting the scope of the
returned instance.

The returned proxy object supports some of the same methods as ``py::array`` so
that it can be used as a drop-in replacement for some existing, index-checked
uses of ``py::array``:

- ``r.ndim()`` returns the number of dimensions.

- ``r.data(1, 2, ...)`` and ``r.mutable_data(1, 2, ...)`` return a pointer to
  the ``const T`` or ``T`` data, respectively, at the given indices. The
  latter is only available to proxies obtained via ``a.mutable_unchecked()``.

- ``r.itemsize()`` returns the size of an item in bytes, i.e. ``sizeof(T)``.

- ``r.shape(n)`` returns the size of dimension ``n``.

- ``r.size()`` returns the total number of elements (i.e. the product of the
  shapes).

- ``r.nbytes()`` returns the number of bytes used by the referenced elements
  (i.e. ``itemsize()`` times ``size()``).

.. seealso::

    The file :file:`tests/test_numpy_array.cpp` contains additional examples
    demonstrating the use of this feature.

Ellipsis
========

Python 3 provides a convenient ``...`` ellipsis notation that is often used to
slice multidimensional arrays. For instance, the following snippet extracts the
middle dimensions of a tensor with the first and last index set to zero.
In Python 2, the syntactic sugar ``...`` is not available, but the singleton
``Ellipsis`` (of type ``ellipsis``) can still be used directly.

.. code-block:: python

    import numpy as np

    a = np.zeros((2, 3, 4))  # a NumPy array
    b = a[0, ..., 0]

The ``py::ellipsis()`` function can be used to perform the same operation on
the C++ side:

.. code-block:: cpp

    py::array a = /* A NumPy array */;
    py::array b = a[py::make_tuple(0, py::ellipsis(), 0)];

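The reason the C++ version builds a tuple is that Python itself passes a
subscript such as ``0, ..., 0`` to ``__getitem__`` as a single tuple. The small
sketch below makes this visible; ``ShowIndex`` is a hypothetical helper used
only for illustration.

.. code-block:: python

    class ShowIndex:
        """Return whatever key the object is indexed with."""

        def __getitem__(self, key):
            return key

    key = ShowIndex()[0, ..., 0]

    print(key)                 # (0, Ellipsis, 0)
    print(key[1] is Ellipsis)  # True
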
.. versionchanged:: 2.6
   ``py::ellipsis()`` is now also available in Python 2.

Memory view
===========

When we simply want to provide direct access to a C/C++ buffer without a
concrete class object, we can return a ``memoryview`` object. Suppose we wish
to expose a ``memoryview`` for a 2x4 ``uint8_t`` array; we can do the
following:

.. code-block:: cpp

    const uint8_t buffer[] = {
        0, 1, 2, 3,
        4, 5, 6, 7
    };
    m.def("get_memoryview2d", []() {
        return py::memoryview::from_buffer(
            buffer,                                    // buffer pointer
            { 2, 4 },                                  // shape (rows, cols)
            { sizeof(uint8_t) * 4, sizeof(uint8_t) }   // strides in bytes
        );
    });

This approach is meant for providing a ``memoryview`` for a C/C++ buffer not
managed by Python. The user is responsible for managing the lifetime of the
buffer. Using a ``memoryview`` created in this way after deleting the buffer on
the C++ side results in undefined behavior.

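From Python, a two-dimensional ``memoryview`` of this kind can be indexed and
converted like any other buffer. The sketch below does not call the binding
above; it assumes an equivalent read-only view built from a ``bytes`` object
with the same eight values, shape, and strides.

.. code-block:: python

    data = bytes(range(8))  # stand-in for the C++ array 0..7
    view = memoryview(data).cast("B", shape=[2, 4])

    print(view.shape, view.strides)  # (2, 4) (4, 1)
    print(view.readonly)             # True, the underlying bytes are immutable
    print(view[1, 2])                # 6
    print(view.tolist())             # [[0, 1, 2, 3], [4, 5, 6, 7]]
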
We can also use ``memoryview::from_memory`` for a simple 1D contiguous buffer:

.. code-block:: cpp

    m.def("get_memoryview1d", []() {
        return py::memoryview::from_memory(
            buffer,               // buffer pointer
            sizeof(uint8_t) * 8   // buffer size
        );
    });

.. note::

    ``memoryview::from_memory`` is not available in Python 2.

.. versionchanged:: 2.6
    ``memoryview::from_memory`` added.