  1. .TH "hwlocality_memattrs" 3 "Thu Sep 7 2023" "Version 2.9.3" "Hardware Locality (hwloc)" \" -*- nroff -*-
  2. .ad l
  3. .nh
  4. .SH NAME
  5. hwlocality_memattrs \- Comparing memory node attributes for finding where to allocate on
  6. .SH SYNOPSIS
  7. .br
  8. .PP
  9. .SS "Data Structures"
  10. .in +1c
  11. .ti -1c
  12. .RI "struct \fBhwloc_location\fP"
  13. .br
  14. .in -1c
  15. .SS "Typedefs"
  16. .in +1c
  17. .ti -1c
  18. .RI "typedef unsigned \fBhwloc_memattr_id_t\fP"
  19. .br
  20. .in -1c
  21. .SS "Enumerations"
  22. .in +1c
  23. .ti -1c
  24. .RI "enum \fBhwloc_memattr_id_e\fP { \fBHWLOC_MEMATTR_ID_CAPACITY\fP, \fBHWLOC_MEMATTR_ID_LOCALITY\fP, \fBHWLOC_MEMATTR_ID_BANDWIDTH\fP, \fBHWLOC_MEMATTR_ID_READ_BANDWIDTH\fP, \fBHWLOC_MEMATTR_ID_WRITE_BANDWIDTH\fP, \fBHWLOC_MEMATTR_ID_LATENCY\fP, \fBHWLOC_MEMATTR_ID_READ_LATENCY\fP, \fBHWLOC_MEMATTR_ID_WRITE_LATENCY\fP, \fBHWLOC_MEMATTR_ID_MAX\fP }"
  25. .br
  26. .ti -1c
  27. .RI "enum \fBhwloc_location_type_e\fP { \fBHWLOC_LOCATION_TYPE_CPUSET\fP, \fBHWLOC_LOCATION_TYPE_OBJECT\fP }"
  28. .br
  29. .ti -1c
  30. .RI "enum \fBhwloc_local_numanode_flag_e\fP { \fBHWLOC_LOCAL_NUMANODE_FLAG_LARGER_LOCALITY\fP, \fBHWLOC_LOCAL_NUMANODE_FLAG_SMALLER_LOCALITY\fP, \fBHWLOC_LOCAL_NUMANODE_FLAG_ALL\fP }"
  31. .br
  32. .in -1c
  33. .SS "Functions"
  34. .in +1c
  35. .ti -1c
  36. .RI "int \fBhwloc_memattr_get_by_name\fP (\fBhwloc_topology_t\fP topology, const char *name, \fBhwloc_memattr_id_t\fP *id)"
  37. .br
  38. .ti -1c
  39. .RI "int \fBhwloc_get_local_numanode_objs\fP (\fBhwloc_topology_t\fP topology, struct \fBhwloc_location\fP *location, unsigned *nr, \fBhwloc_obj_t\fP *nodes, unsigned long flags)"
  40. .br
  41. .ti -1c
  42. .RI "int \fBhwloc_memattr_get_value\fP (\fBhwloc_topology_t\fP topology, \fBhwloc_memattr_id_t\fP attribute, \fBhwloc_obj_t\fP target_node, struct \fBhwloc_location\fP *initiator, unsigned long flags, hwloc_uint64_t *value)"
  43. .br
  44. .ti -1c
  45. .RI "int \fBhwloc_memattr_get_best_target\fP (\fBhwloc_topology_t\fP topology, \fBhwloc_memattr_id_t\fP attribute, struct \fBhwloc_location\fP *initiator, unsigned long flags, \fBhwloc_obj_t\fP *best_target, hwloc_uint64_t *value)"
  46. .br
  47. .ti -1c
  48. .RI "int \fBhwloc_memattr_get_best_initiator\fP (\fBhwloc_topology_t\fP topology, \fBhwloc_memattr_id_t\fP attribute, \fBhwloc_obj_t\fP target, unsigned long flags, struct \fBhwloc_location\fP *best_initiator, hwloc_uint64_t *value)"
  49. .br
  50. .in -1c
  51. .SH "Detailed Description"
  52. .PP
  53. Platforms with heterogeneous memory require ways to decide whether a buffer should be allocated on 'fast' memory (such as HBM), 'normal' memory (DDR) or even 'slow' but large-capacity memory (non-volatile memory)\&. These memory nodes are called 'Targets' while the CPU accessing them is called the 'Initiator'\&. Access performance depends on their locality (NUMA platforms) as well as the intrinsic performance of the targets (heterogeneous platforms)\&.
  54. .PP
  55. The following attributes describe the performance of memory accesses from an Initiator to a memory Target, for instance their latency or bandwidth\&. Initiators performing these memory accesses are usually some PUs or Cores (described as a CPU set)\&. Hence a Core may choose where to allocate a memory buffer by comparing the attributes of different target memory nodes nearby\&.
  56. .PP
  57. There are also some attributes that are system-wide\&. Their value does not depend on a specific initiator performing an access\&. The memory node Capacity is an example of such an attribute without initiator\&.
  58. .PP
  59. One way to use this API is to start with a cpuset describing the Cores where a program is bound\&. The best target NUMA node for allocating memory in this program on these Cores may be obtained by passing this cpuset as an initiator to \fBhwloc_memattr_get_best_target()\fP with the relevant memory attribute\&. For instance, if the code is latency limited, use the Latency attribute\&.
  60. .PP
  61. A more flexible approach is to get the list of local NUMA nodes by passing this cpuset to \fBhwloc_get_local_numanode_objs()\fP\&. Attribute values for these nodes, if any, may then be obtained with \fBhwloc_memattr_get_value()\fP and manually compared with the desired criteria\&.
  62. .PP
  63. \fBSee also\fP
  64. .RS 4
  65. An example is available in doc/examples/memory-attributes\&.c in the source tree\&.
  66. .RE
  67. .PP
  68. \fBNote\fP
  69. .RS 4
  70. The API also supports specific objects as initiator, but it is currently not used internally by hwloc\&. Users may for instance use it to provide custom performance values for host memory accesses performed by GPUs\&.
  71. .PP
  72. The interface actually also accepts targets that are not NUMA nodes\&.
  73. .RE
  74. .PP
  75. .SH "Typedef Documentation"
  76. .PP
  77. .SS "typedef unsigned \fBhwloc_memattr_id_t\fP"
  78. .PP
  79. A memory attribute identifier\&. May be either one of \fBhwloc_memattr_id_e\fP or a new id returned by \fBhwloc_memattr_register()\fP\&.
  80. .SH "Enumeration Type Documentation"
  81. .PP
  82. .SS "enum \fBhwloc_local_numanode_flag_e\fP"
  83. .PP
  84. Flags for selecting target NUMA nodes\&.
  85. .PP
  86. \fBEnumerator\fP
  87. .in +1c
  88. .TP
  89. \fB\fIHWLOC_LOCAL_NUMANODE_FLAG_LARGER_LOCALITY \fP\fP
  90. Select NUMA nodes whose locality is larger than the given cpuset\&. For instance, if a single PU (or its cpuset) is given in \fClocation\fP, select all nodes close to the package that contains this PU\&.
  91. .TP
  92. \fB\fIHWLOC_LOCAL_NUMANODE_FLAG_SMALLER_LOCALITY \fP\fP
  93. Select NUMA nodes whose locality is smaller than the given cpuset\&. For instance, if a package (or its cpuset) is given in \fClocation\fP, also select nodes that are attached to only a half of that package\&.
  94. .TP
  95. \fB\fIHWLOC_LOCAL_NUMANODE_FLAG_ALL \fP\fP
  96. Select all NUMA nodes in the topology\&. The \fClocation\fP argument is ignored\&.
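.PP
One way to read these three flags is as inclusion tests between the node locality and the given location\&. The sketch below models that reading with 64-bit masks instead of hwloc bitmaps\&. Everything in it is hypothetical (the \fCSKETCH_\fP constants, their values and \fCsketch_node_selected\fP); it illustrates the descriptions above and is not hwloc's implementation\&.
.PP
.nf
```c
#include <stdint.h>

/* Illustrative stand-ins for the hwloc flags. The real constants come
 * from <hwloc.h> and are NOT assumed to have these values. */
enum {
  SKETCH_LARGER  = 1 << 0,
  SKETCH_SMALLER = 1 << 1,
  SKETCH_ALL     = 1 << 2
};

/* Return 1 if a node whose locality is `node_cpus` would be selected for
 * `location_cpus` under `flags`, 0 otherwise. One bit per PU. */
int sketch_node_selected(uint64_t node_cpus, uint64_t location_cpus,
                         unsigned long flags)
{
  if (flags & SKETCH_ALL)
    return 1; /* the location is ignored */
  if (node_cpus == location_cpus)
    return 1; /* default: locality is exactly the location */
  if ((flags & SKETCH_LARGER) && (location_cpus & ~node_cpus) == 0)
    return 1; /* node locality contains the location */
  if ((flags & SKETCH_SMALLER) && (node_cpus & ~location_cpus) == 0)
    return 1; /* node locality is contained in the location */
  return 0;
}
```
.fi
.PP
With a single PU as the location (mask 0x1) and a node local to a four-PU package (mask 0xF), the node is only selected once the larger-locality flag is set\&.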
  97. .SS "enum \fBhwloc_location_type_e\fP"
  98. .PP
  99. Type of location\&.
  100. .PP
  101. \fBEnumerator\fP
  102. .in +1c
  103. .TP
  104. \fB\fIHWLOC_LOCATION_TYPE_CPUSET \fP\fP
  105. Location is given as a cpuset, in the location cpuset union field\&.
  106. .TP
  107. \fB\fIHWLOC_LOCATION_TYPE_OBJECT \fP\fP
  108. Location is given as an object, in the location object union field\&.
  109. .SS "enum \fBhwloc_memattr_id_e\fP"
  110. .PP
  111. Memory node attributes\&.
  112. .PP
  113. \fBEnumerator\fP
  114. .in +1c
  115. .TP
  116. \fB\fIHWLOC_MEMATTR_ID_CAPACITY \fP\fP
  117. The "Capacity" is returned in bytes (local_memory attribute in objects)\&. Best capacity nodes are nodes with \fBhigher capacity\fP\&.
  118. .PP
  119. No initiator is involved when looking at this attribute\&. The corresponding attribute flags are \fBHWLOC_MEMATTR_FLAG_HIGHER_FIRST\fP\&.
  120. .TP
  121. \fB\fIHWLOC_MEMATTR_ID_LOCALITY \fP\fP
  122. The "Locality" is returned as the number of PUs in that locality (e\&.g\&. the weight of its cpuset)\&. Best locality nodes are nodes with \fBsmaller locality\fP (nodes that are local to very few PUs)\&. Poor locality nodes are nodes with larger locality (nodes that are local to the entire machine)\&.
  123. .PP
  124. No initiator is involved when looking at this attribute\&. Since smaller values are better, the corresponding attribute flags are \fBHWLOC_MEMATTR_FLAG_LOWER_FIRST\fP\&.
  125. .TP
  126. \fB\fIHWLOC_MEMATTR_ID_BANDWIDTH \fP\fP
  127. The "Bandwidth" is returned in MiB/s, as seen from the given initiator location\&. Best bandwidth nodes are nodes with \fBhigher bandwidth\fP\&.
  128. .PP
  129. The corresponding attribute flags are \fBHWLOC_MEMATTR_FLAG_HIGHER_FIRST\fP and \fBHWLOC_MEMATTR_FLAG_NEED_INITIATOR\fP\&.
  130. .PP
  131. This is the average bandwidth for read and write accesses\&. If the platform provides individual read and write bandwidths but no explicit average value, hwloc computes and returns the average\&.
  132. .TP
  133. \fB\fIHWLOC_MEMATTR_ID_READ_BANDWIDTH \fP\fP
  134. The "ReadBandwidth" is returned in MiB/s, as seen from the given initiator location\&. Best bandwidth nodes are nodes with \fBhigher bandwidth\fP\&.
  135. .PP
  136. The corresponding attribute flags are \fBHWLOC_MEMATTR_FLAG_HIGHER_FIRST\fP and \fBHWLOC_MEMATTR_FLAG_NEED_INITIATOR\fP\&.
  137. .TP
  138. \fB\fIHWLOC_MEMATTR_ID_WRITE_BANDWIDTH \fP\fP
  139. The "WriteBandwidth" is returned in MiB/s, as seen from the given initiator location\&. Best bandwidth nodes are nodes with \fBhigher bandwidth\fP\&.
  140. .PP
  141. The corresponding attribute flags are \fBHWLOC_MEMATTR_FLAG_HIGHER_FIRST\fP and \fBHWLOC_MEMATTR_FLAG_NEED_INITIATOR\fP\&.
  142. .TP
  143. \fB\fIHWLOC_MEMATTR_ID_LATENCY \fP\fP
  144. The "Latency" is returned in nanoseconds, as seen from the given initiator location\&. Best latency nodes are nodes with \fBsmaller latency\fP\&.
  145. .PP
  146. The corresponding attribute flags are \fBHWLOC_MEMATTR_FLAG_LOWER_FIRST\fP and \fBHWLOC_MEMATTR_FLAG_NEED_INITIATOR\fP\&.
  147. .PP
  148. This is the average latency for read and write accesses\&. If the platform provides individual read and write latencies but no explicit average value, hwloc computes and returns the average\&.
  149. .TP
  150. \fB\fIHWLOC_MEMATTR_ID_READ_LATENCY \fP\fP
  151. The "ReadLatency" is returned in nanoseconds, as seen from the given initiator location\&. Best latency nodes are nodes with \fBsmaller latency\fP\&.
  152. .PP
  153. The corresponding attribute flags are \fBHWLOC_MEMATTR_FLAG_LOWER_FIRST\fP and \fBHWLOC_MEMATTR_FLAG_NEED_INITIATOR\fP\&.
  154. .TP
  155. \fB\fIHWLOC_MEMATTR_ID_WRITE_LATENCY \fP\fP
  156. The "WriteLatency" is returned in nanoseconds, as seen from the given initiator location\&. Best latency nodes are nodes with \fBsmaller latency\fP\&.
  157. .PP
  158. The corresponding attribute flags are \fBHWLOC_MEMATTR_FLAG_LOWER_FIRST\fP and \fBHWLOC_MEMATTR_FLAG_NEED_INITIATOR\fP\&.
  159. .SH "Function Documentation"
  160. .PP
  161. .SS "int hwloc_get_local_numanode_objs (\fBhwloc_topology_t\fP topology, struct \fBhwloc_location\fP * location, unsigned * nr, \fBhwloc_obj_t\fP * nodes, unsigned long flags)"
  162. .PP
  163. Return an array of local NUMA nodes\&. By default only select the NUMA nodes whose locality is exactly the given \fClocation\fP\&. More nodes may be selected if additional flags are given as an OR'ed set of \fBhwloc_local_numanode_flag_e\fP\&.
  164. .PP
  165. If \fClocation\fP is given as an explicit object, its CPU set is used to find NUMA nodes with the corresponding locality\&. If the object does not have a CPU set (e\&.g\&. I/O object), the CPU parent (where the I/O object is attached) is used\&.
  166. .PP
  167. On input, \fCnr\fP points to the number of nodes that may be stored in the \fCnodes\fP array\&. On output, \fCnr\fP will be changed to the number of stored nodes, or the number of nodes that would have been stored if there were enough room\&.
  168. .PP
  169. \fBReturns\fP
  170. .RS 4
  171. 0 on success or -1 on error\&.
  172. .RE
  173. .PP
  174. \fBNote\fP
  175. .RS 4
  176. Some of these NUMA nodes may not have any memory attribute values and hence may not be reported as actual targets by other functions\&.
  177. .PP
  178. The number of NUMA nodes in the topology (obtained by \fBhwloc_bitmap_weight()\fP on the root object nodeset) may be used to allocate the \fCnodes\fP array\&.
  179. .PP
  180. When an object CPU set is given as locality, for instance a Package, and when flags contain both \fBHWLOC_LOCAL_NUMANODE_FLAG_LARGER_LOCALITY\fP and \fBHWLOC_LOCAL_NUMANODE_FLAG_SMALLER_LOCALITY\fP, the returned array corresponds to the nodeset of that object\&.
  181. .RE
  182. .PP
  183. .SS "int hwloc_memattr_get_best_initiator (\fBhwloc_topology_t\fP topology, \fBhwloc_memattr_id_t\fP attribute, \fBhwloc_obj_t\fP target, unsigned long flags, struct \fBhwloc_location\fP * best_initiator, hwloc_uint64_t * value)"
  184. .PP
  185. Return the best initiator for the given attribute and target NUMA node\&. If \fCvalue\fP is non-\fCNULL\fP, the corresponding value is returned there\&.
  186. .PP
  187. If multiple initiators have the same attribute values, only one is returned, and the interface does not specify which one\&. Applications that want to detect initiators with identical/similar values, or that want to look at values for multiple attributes, should instead get all values using \fBhwloc_memattr_get_value()\fP and manually select the initiator they consider the best\&.
  188. .PP
  189. The returned initiator should not be modified or freed; it belongs to the topology\&.
  190. .PP
  191. \fCflags\fP must be \fC0\fP for now\&.
  192. .PP
  193. \fBReturns\fP
  194. .RS 4
  195. 0 on success\&.
  196. .PP
  197. -1 with errno set to \fCENOENT\fP if there are no matching initiators\&.
  198. .PP
  199. -1 with errno set to \fCEINVAL\fP if the attribute does not relate to a specific initiator (it does not have the flag \fBHWLOC_MEMATTR_FLAG_NEED_INITIATOR\fP)\&.
  200. .RE
  201. .PP
  202. .SS "int hwloc_memattr_get_best_target (\fBhwloc_topology_t\fP topology, \fBhwloc_memattr_id_t\fP attribute, struct \fBhwloc_location\fP * initiator, unsigned long flags, \fBhwloc_obj_t\fP * best_target, hwloc_uint64_t * value)"
  203. .PP
  204. Return the best target NUMA node for the given attribute and initiator\&. If the attribute does not relate to a specific initiator (it does not have the flag \fBHWLOC_MEMATTR_FLAG_NEED_INITIATOR\fP), location \fCinitiator\fP is ignored and may be \fCNULL\fP\&.
  205. .PP
  206. If \fCvalue\fP is non-\fCNULL\fP, the corresponding value is returned there\&.
  207. .PP
  208. If multiple targets have the same attribute values, only one is returned, and the interface does not specify which one\&. Applications that want to detect targets with identical/similar values, or that want to look at values for multiple attributes, should instead get all values using \fBhwloc_memattr_get_value()\fP and manually select the target they consider the best\&.
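.PP
A possible way to perform that manual selection, once the values have been gathered into an array, is sketched below\&. The function \fCsketch_best_candidates\fP and its parameters are hypothetical; \fChigher_first\fP is assumed to be set by the caller according to the attribute (nonzero for Bandwidth-like attributes, zero for Latency-like ones)\&.
.PP
.nf
```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of hwloc: collect every candidate whose
 * value ties for best.
 * values[i] is the attribute value of candidate i, for instance filled
 * by hwloc_memattr_get_value(). Indices of the tied candidates are
 * written to out[], which must have room for n entries.
 * Returns the number of indices written. */
size_t sketch_best_candidates(const uint64_t *values, size_t n,
                              int higher_first, size_t *out)
{
  size_t i, count = 0;
  uint64_t best;

  if (n == 0)
    return 0;

  /* First pass: find the best value for the requested direction. */
  best = values[0];
  for (i = 1; i < n; i++)
    if (higher_first ? values[i] > best : values[i] < best)
      best = values[i];

  /* Second pass: keep every candidate that reaches it. */
  for (i = 0; i < n; i++)
    if (values[i] == best)
      out[count++] = i;
  return count;
}
```
.fi
.PP
An application could then break ties with a second attribute, or treat values within some tolerance as equal instead of testing strict equality\&.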
  209. .PP
  210. \fCflags\fP must be \fC0\fP for now\&.
  211. .PP
  212. \fBReturns\fP
  213. .RS 4
  214. 0 on success\&.
  215. .PP
  216. -1 with errno set to \fCENOENT\fP if there are no matching targets\&.
  217. .PP
  218. -1 with errno set to \fCEINVAL\fP if flags are invalid, or no such attribute exists\&.
  219. .RE
  220. .PP
  221. \fBNote\fP
  222. .RS 4
  223. The initiator \fCinitiator\fP should be of type \fBHWLOC_LOCATION_TYPE_CPUSET\fP when referring to accesses performed by CPU cores\&. \fBHWLOC_LOCATION_TYPE_OBJECT\fP is currently unused internally by hwloc, but users may for instance use it to provide custom information about host memory accesses performed by GPUs\&.
  224. .RE
  225. .PP
  226. .SS "int hwloc_memattr_get_by_name (\fBhwloc_topology_t\fP topology, const char * name, \fBhwloc_memattr_id_t\fP * id)"
  227. .PP
  228. Return the identifier of the memory attribute with the given name\&.
  229. .PP
  230. \fBReturns\fP
  231. .RS 4
  232. 0 on success\&.
  233. .PP
  234. -1 with errno set to \fCEINVAL\fP if no such attribute exists\&.
  235. .RE
  236. .PP
  237. .SS "int hwloc_memattr_get_value (\fBhwloc_topology_t\fP topology, \fBhwloc_memattr_id_t\fP attribute, \fBhwloc_obj_t\fP target_node, struct \fBhwloc_location\fP * initiator, unsigned long flags, hwloc_uint64_t * value)"
  238. .PP
  239. Return an attribute value for a specific target NUMA node\&. If the attribute does not relate to a specific initiator (it does not have the flag \fBHWLOC_MEMATTR_FLAG_NEED_INITIATOR\fP), location \fCinitiator\fP is ignored and may be \fCNULL\fP\&.
  240. .PP
  241. \fCflags\fP must be \fC0\fP for now\&.
  242. .PP
  243. \fBReturns\fP
  244. .RS 4
  245. 0 on success\&.
  246. .PP
  247. -1 on error, for instance with errno set to \fCEINVAL\fP if flags are invalid or no such attribute exists\&.
  248. .RE
  249. .PP
  250. \fBNote\fP
  251. .RS 4
  252. The initiator \fCinitiator\fP should be of type \fBHWLOC_LOCATION_TYPE_CPUSET\fP when referring to accesses performed by CPU cores\&. \fBHWLOC_LOCATION_TYPE_OBJECT\fP is currently unused internally by hwloc, but users may for instance use it to provide custom information about host memory accesses performed by GPUs\&.
  253. .RE
  254. .PP
  255. .SH "Author"
  256. .PP
  257. Generated automatically by Doxygen for Hardware Locality (hwloc) from the source code\&.