  1. .\" -*- nroff -*-
  2. .\" Copyright © 2010-2023 Inria. All rights reserved.
  3. .\" Copyright © 2009-2020 Cisco Systems, Inc. All rights reserved.
  4. .\" See COPYING in top-level directory.
  5. .TH HWLOC-CALC "1" "Sep 07, 2023" "2.9.3" "hwloc"
  6. .SH NAME
  7. hwloc-calc \- Operate on cpu mask strings and objects
  8. .
  9. .\" **************************
  10. .\" Synopsis Section
  11. .\" **************************
  12. .SH SYNOPSIS
  13. .
  14. .B hwloc-calc
  15. [\fItopology options\fR] [\fIoptions\fR] \fI<location1> [<location2> [...] ]
  16. .
  17. .PP
  18. Note that hwloc(7) provides a detailed explanation of the hwloc system
  19. and of valid <location> formats;
  20. it should be read before reading this man page.
  21. .
  22. .\" **************************
  23. .\" Options Section
  24. .\" **************************
  25. .SH TOPOLOGY OPTIONS
  26. .
  27. All topology options must be given before all other options.
  28. .
  29. .TP 10
  30. \fB\-\-no\-smt\fR, \fB\-\-no\-smt=<N>\fR
  31. Only keep the first PU per core in the input locations.
  32. If \fI<N>\fR is specified, keep the <N>-th instead, if any.
  33. PUs are ordered by physical index during this filtering.
  34. Note that this option is applied after searching locations.
  35. Hence \fB\-\-no\-smt pu:2-5\fR will first select the PUs #2
  36. to #5 in the machine before keeping one of them per core.
  37. To rather get PUs #2 to #5 after filtering one per core,
  38. you should combine invocations:
  39. hwloc-calc --restrict $(hwloc-calc --no-smt all) pu:2-5
  40. .TP
  41. \fB\-\-cpukind\fR <n>, \fB\-\-cpukind\fR <infoname>=<infovalue>
  42. Only keep PUs whose CPU kind matches.
  43. Either a single CPU kind is specified as an index,
  44. or the info attribute name-value will select matching kinds.
  45. When specified by index, it corresponds to hwloc ranking of CPU kinds
  46. which returns energy-efficient cores first, and high-performance
  47. power-hungry cores last.
  48. The full list of CPU kinds may be seen with \fIlstopo --cpukinds\fR.
  49. Note that this option is applied after searching locations.
  50. Hence \fB\-\-cpukind 0 core:1\fR will return the second core of
  51. the machine if it is of kind 0, and nothing otherwise.
  52. To rather get the second core among those of kind 0, you should
  53. combine invocations:
  54. hwloc-calc --restrict $(hwloc-calc --cpukind 0 all) core:1
  55. .TP
  56. \fB\-\-restrict\fR <cpuset>
  57. Restrict the topology to the given cpuset.
  58. This removes some PUs and their now-child-less parents.
  59. This is useful when combining invocations to filter some objects
  60. before selecting among them.
  61. Beware that restricting the PUs in a topology may change the
  62. logical indexes of many objects, including NUMA nodes.
  63. .TP
  64. \fB\-\-restrict\fR nodeset=<nodeset>
  65. Restrict the topology to the given nodeset
  66. (unless \fB\-\-restrict\-flags\fR specifies something different).
  67. This removes some NUMA nodes and their now-child-less parents.
  68. Beware that restricting the NUMA nodes in a topology may change the
  69. logical indexes of many objects, including PUs.
  70. .TP
  71. \fB\-\-restrict\-flags\fR <flags>
  72. Enforce flags when restricting the topology.
  73. Flags may be given as numeric values or as a comma-separated list of flag names
  74. that are passed to \fIhwloc_topology_restrict()\fR.
  75. Those names may be substrings of actual flag names as long as a single one matches,
  76. for instance \fBbynodeset,memless\fR.
  77. The default is \fB0\fR (or \fBnone\fR).
  78. .TP
  79. \fB\-\-disallowed\fR
  80. Include objects disallowed by administrative limitations.
  81. .TP
  82. \fB\-i\fR <path>, \fB\-\-input\fR <path>
  83. Read the topology from <path> instead of discovering the topology of the local machine.
  84. If <path> is a file,
  85. it may be an XML file exported by a previous hwloc program.
  86. If <path> is "\-", the standard input may be used as an XML file.
  87. On Linux, <path> may be a directory containing the topology files
  88. gathered from another machine topology with hwloc-gather-topology.
  89. On x86, <path> may be a directory containing a cpuid dump gathered
  90. with hwloc-gather-cpuid.
  91. When the archivemount program is available, <path> may also be a tarball
  92. containing such Linux or x86 topology files.
  93. .TP
  94. \fB\-i\fR <specification>, \fB\-\-input\fR <specification>
  95. Simulate a fake hierarchy (instead of discovering the topology on the
  96. local machine). If <specification> is "node:2 pu:3", the topology will
  97. contain two NUMA nodes with 3 processing units in each of them.
  98. The <specification> string must end with a number of PUs.
  99. .TP
  100. \fB\-\-if\fR <format>, \fB\-\-input\-format\fR <format>
  101. Enforce the input in the given format, among \fBxml\fR, \fBfsroot\fR,
  102. \fBcpuid\fR and \fBsynthetic\fR.
  103. .
  104. .SH OPTIONS
  105. .
  106. All these options must be given after all topology options above.
  107. .
  108. .TP 10
  109. \fB\-p\fR \fB\-\-physical\fR
  110. Use OS/physical indexes instead of logical indexes for both input and output.
  111. .TP
  112. \fB\-l\fR \fB\-\-logical\fR
  113. Use logical indexes instead of physical/OS indexes for both input and output (default).
  114. .TP
  115. \fB\-\-pi\fR \fB\-\-physical\-input\fR
  116. Use OS/physical indexes instead of logical indexes for input.
  117. .TP
  118. \fB\-\-li\fR \fB\-\-logical\-input\fR
  119. Use logical indexes instead of physical/OS indexes for input (default).
  120. .TP
  121. \fB\-\-po\fR \fB\-\-physical\-output\fR
  122. Use OS/physical indexes instead of logical indexes for output.
  123. .TP
  124. \fB\-\-lo\fR \fB\-\-logical\-output\fR
  125. Use logical indexes instead of physical/OS indexes for output (default, except for cpusets which are always physical).
  126. .TP
  127. \fB\-n\fR \fB\-\-nodeset\fR
  128. Interpret both input and output sets as nodesets instead of CPU sets.
  129. See \fB\-\-nodeset\-output\fR and \fB\-\-nodeset\-input\fR below for details.
  130. .TP
  131. \fB\-\-no\fR \fB\-\-nodeset\-output\fR
  132. Report nodesets instead of CPU sets.
  133. This output is more precise than the default CPU set output when memory
  134. locality matters because it properly describes CPU-less NUMA nodes,
  135. as well as NUMA-nodes that are local to multiple CPUs.
  136. .TP
  137. \fB\-\-ni\fR \fB\-\-nodeset\-input\fR
  138. Interpret input sets as nodesets instead of CPU sets.
  139. .TP
  140. \fB\-N \-\-number\-of <type|depth>\fR
  141. Report the number of objects of the given type or depth that intersect the CPU set.
  142. This is convenient for finding how many cores, NUMA nodes or PUs are available
  143. in a machine.
  144. When combined with \fB\-\-nodeset\fR or \fB\-\-nodeset-output\fR,
  145. the nodeset is considered instead of the CPU set for finding matching objects.
  146. This is useful when reporting the output as a number or set of NUMA nodes.
  147. If an OS device subtype such as \fIgpu\fR is given instead of \fIosdev\fR,
  148. only the OS devices of that subtype will be counted.
  149. .TP
  150. \fB\-I \-\-intersect <type|depth>\fR
  151. Find the list of objects of the given type or depth that intersect the CPU set and
  152. report the comma-separated list of their indexes instead of the cpu mask string.
  153. This may be used for determining the list of objects above or below the input
  154. objects.
  155. When combined with \fB\-\-physical\fR, the list is convenient to pass to external
  156. tools such as taskset or numactl \fB\-\-physcpubind\fR or \fB\-\-membind\fR.
  157. This is different from \-\-largest since the latter requires that all reported
  158. objects are strictly included inside the input objects.
  159. When combined with \fB\-\-nodeset\fR or \fB\-\-nodeset-output\fR,
  160. the nodeset is considered instead of the CPU set for finding matching objects.
  161. This is useful when reporting the output as a number or set of NUMA nodes.
  162. If an OS device subtype such as \fIgpu\fR is given instead of \fIosdev\fR,
  163. only the OS devices of that subtype will be returned.
  164. .TP
  165. \fB\-H \-\-hierarchical <type1>.<type2>...\fR
  166. Find the list of objects of type <type2> that intersect the CPU set and
  167. report the space-separated list of their hierarchical indexes with respect
  168. to <type1>, <type2>, etc.
  169. For instance, if \fIpackage.core\fR is given, the output would be
  170. \fIPackage:1.Core:2 Package:2.Core:3\fR if the input contains the third
  171. core of the second package and the fourth core of the third package.
  172. Only normal CPU-side object types should be used.
  173. NUMA nodes may be used but they may cause redundancy in the output
  174. on heterogeneous memory platforms. For instance, on a platform with both
  175. DRAM and HBM memory on a package, the first core will be considered both
  176. as first core of first NUMA node (DRAM) and
  177. as first core of second NUMA node (HBM).
  178. .TP
  179. \fB\-\-largest\fR
  180. Report (in a human readable format) the list of largest objects which exactly
  181. include all input objects (by looking at their CPU sets).
  182. None of these output objects intersect each other, and the sum of them is
  183. exactly equivalent to the input. No larger object is included in the input.
  184. This is different from \-\-intersect where reported objects may not be
  185. strictly included in the input.
  186. .TP
  187. \fB\-\-local\-memory\fR
  188. Report the list of NUMA nodes that are local to the input objects.
  189. This option is similar to \fB\-I numa\fR but the way nodes are selected
  190. is different:
  191. The selection performed by \fB\-\-local\-memory\fR may be precisely
  192. configured with \fB\-\-local\-memory\-flags\fR,
  193. while \fB\-I numa\fR just selects all nodes that are somehow local to
  194. any of the input objects.
  195. .TP
  196. \fB\-\-local\-memory\-flags\fR
  197. Change the flags used to select local NUMA nodes.
  198. Flags may be given as numeric values or as a comma-separated list of flag names
  199. that are passed to \fIhwloc_get_local_numanode_objs()\fR.
  200. Those names may be substrings of actual flag names as long as a single one matches.
  201. The default is \fB3\fR (or \fBsmaller,larger\fR)
  202. which means NUMA nodes are displayed
  203. if their locality either contains or is contained
  204. in the locality of the given object.
  205. This option enables \fB\-\-local\-memory\fR.
  206. .TP
  207. \fB\-\-best\-memattr\fR <name>
  208. Enable the listing of local memory nodes with \fB\-\-local\-memory\fR,
  209. but only display the local node that has the best value for the memory
  210. attribute given by \fI<name>\fR (or as an index).
  211. If the memory attribute values depend on the initiator, the hwloc-calc
  212. input objects are used as the initiator.
  213. Standard attribute names are \fICapacity\fR, \fILocality\fR,
  214. \fIBandwidth\fR, and \fILatency\fR.
  215. All existing attributes in the current topology may be listed with
  216. $ lstopo --memattrs
  217. .TP
  218. \fB\-\-sep <sep>\fR
  219. Change the field separator in the output.
  220. By default, a space is used to separate output objects
  221. (for instance when \fB\-\-hierarchical\fR or \fB\-\-largest\fR is given)
  222. while a comma is used to separate indexes
  223. (for instance when \fB\-\-intersect\fR is given).
  224. .TP
  225. \fB\-\-single\fR
  226. Singlify the output to a single CPU.
  227. .TP
  228. \fB\-\-taskset\fR
  229. Display CPU set strings in the format recognized by the taskset command-line
  230. program instead of hwloc-specific CPU set string format.
  231. This option has no impact on the format of input CPU set strings,
  232. both formats are always accepted.
  233. .TP
  234. \fB\-q\fR \fB\-\-quiet\fR
  235. Hide non-fatal error messages.
  236. These are mostly locations pointing to non-existent objects.
  237. .TP
  238. \fB\-v\fR \fB\-\-verbose\fR
  239. Verbose output.
  240. .TP
  241. \fB\-\-version\fR
  242. Report version and exit.
  243. .TP
  244. \fB\-h\fR \fB\-\-help\fR
  245. Display help message and exit.
  246. .
  247. .\" **************************
  248. .\" Description Section
  249. .\" **************************
  250. .SH DESCRIPTION
  251. .
  252. hwloc-calc generates and manipulates CPU mask strings or objects.
  253. Both input and output may be either objects (with physical or logical
  254. indexes), CPU lists (with physical or logical indexes), or CPU mask strings
  255. (always physically indexed).
  256. Input location specification is described in hwloc(7).
  257. .
  258. .PP
  259. If objects or CPU mask strings are given on the command-line,
  260. they are combined and a single output is printed.
  261. If no object or CPU mask strings are given on the command-line,
  262. the program will read the standard input.
  263. It will combine multiple objects or CPU mask strings that are
  264. given on the same line of the standard input, with spaces
  265. as separators.
  266. Different input lines will be processed separately.
  267. .
  268. .PP
  269. Command-line arguments and options are processed in order.
  270. First topology configuration options should be given.
  271. Then, for instance, changing the type of input indexes
  272. with \fB\-\-li\fR or changing the input topology with \fB\-i\fR
  273. only affects the processing of the following arguments.
  274. .
  275. .PP
  276. .B NOTE:
  277. It is highly recommended that you read the hwloc(7) overview page
  278. before reading this man page. Most of the concepts described in
  279. hwloc(7) directly apply to the hwloc-calc utility.
  280. .
  281. .
  282. .\" **************************
  283. .\" Examples Section
  284. .\" **************************
  285. .SH EXAMPLES
  286. .PP
  287. hwloc-calc's operation is best described through several examples.
  288. .
  289. .PP
  290. To display the (physical) CPU mask corresponding to the second package:
  291. $ hwloc-calc package:1
  292. 0x000000f0
  293. To display the (physical) CPU mask corresponding to the third package, excluding
  294. its even numbered logical processors:
  295. $ hwloc-calc package:2 ~PU:even
  296. 0x00000c00
  297. To convert a cpu mask to human-readable output, the -H option can be
  298. used to emit a space-delimited list of locations:
  299. $ echo 0x000000f0 | hwloc-calc -H package.core
  300. Package:1.Core:0 Package:1.Core:1 Package:1.Core:2 Package:1.Core:3
  301. To use some other character (e.g., a comma) instead of spaces in
  302. output, use the --sep option:
  303. $ echo 0x000000f0 | hwloc-calc -H package.core --sep ,
  304. Package:1.Core:0,Package:1.Core:1,Package:1.Core:2,Package:1.Core:3
  305. To combine two (physical) CPU masks:
  306. $ hwloc-calc 0x0000ffff 0xff000000
  307. 0xff00ffff
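Combining locations in this way amounts to a bitwise OR of the masks. The following is a minimal sketch in plain shell arithmetic, not hwloc-calc itself; the helper name mask_union is hypothetical, and it assumes each mask fits in a single 32-bit word (no comma-separated substrings):

```shell
# Hypothetical helper: union of two single-word CPU masks.
# Assumes both arguments are plain hex numbers such as 0x0000ffff.
mask_union() {
    # Bitwise OR, printed as 0x followed by at least 8 hex digits.
    printf '0x%08x\n' "$(( $1 | $2 ))"
}

mask_union 0x0000ffff 0xff000000
```

For masks wider than 32 bits, hwloc-calc itself should be used since it understands the comma-separated format.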
  308. To display the list of logical numbers of processors included in the second
  309. package:
  310. $ hwloc-calc --intersect PU package:1
  311. 4,5,6,7
  312. To bind GNU OpenMP threads logically over the whole machine, we need to use
  313. physical number output instead:
  314. $ export GOMP_CPU_AFFINITY=`hwloc-calc --physical-output --intersect PU all`
  315. $ echo $GOMP_CPU_AFFINITY
  316. 0,4,1,5,2,6,3,7
  317. To display the list of NUMA nodes, by physical indexes, that intersect a given (physical) CPU mask:
  318. $ hwloc-calc --physical --intersect NUMAnode 0xf0f0f0f0
  319. 0,2
  320. To find how many cores are in the second CPU kind
  321. (those cores are likely higher-performance and more power-hungry than cores of the first kind):
  322. $ hwloc-calc --cpukind 1 -N core all
  323. 4
  324. To display the list of NUMA nodes, by physical indexes,
  325. whose locality is exactly equal to a Package:
  326. $ hwloc-calc --local-memory-flags 0 pack:1
  327. 4,7
  328. To display the best-capacity NUMA node, by physical index,
  329. whose locality is exactly equal to a Package:
  330. $ hwloc-calc --local-memory-flags 0 --best-memattr capacity pack:1
  331. 4
  332. Converting object logical indexes (default) from/to physical/OS indexes
  333. may be performed with \fB--intersect\fR combined with either \fB--physical-output\fR
  334. (logical to physical conversion) or \fB--physical-input\fR (physical to logical):
  335. $ hwloc-calc --physical-output PU:2 --intersect PU
  336. 3
  337. $ hwloc-calc --physical-input PU:3 --intersect PU
  338. 2
  339. One should add \fB--nodeset\fR when converting indexes of memory objects
  340. to make sure a single NUMA node index is returned on platforms
  341. with heterogeneous memory:
  342. $ hwloc-calc --nodeset --physical-output node:2 --intersect node
  343. 3
  344. $ hwloc-calc --nodeset --physical-input node:3 --intersect node
  345. 2
  346. To display the set of CPUs near network interface eth0:
  347. $ hwloc-calc os=eth0
  348. 0x00005555
  349. To display the indexes of packages near PCI device whose bus ID is 0000:01:02.0:
  350. $ hwloc-calc pci=0000:01:02.0 --intersect Package
  351. 1
  352. To display the list of per-package cores that intersect the input:
  353. $ hwloc-calc 0x00003c00 --hierarchical package.core
  354. Package:2.Core:1 Package:3.Core:0
  355. To display the (physical) CPU mask of the entire topology except the third package:
  356. $ hwloc-calc all ~package:2
  357. 0x0000f0ff
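Excluding a location with the ~ prefix clears its bits from the current set. As a rough sketch in plain shell arithmetic (not hwloc-calc itself), assuming the whole machine is 0x0000ffff and the excluded package covers 0x00000f00, which are the bits cleared in the output above; the helper name mask_exclude is hypothetical:

```shell
# Hypothetical helper: remove the bits of the second mask from the first.
# Assumes single-word hex masks (no comma-separated substrings).
mask_exclude() {
    # Bitwise AND with the complement of the excluded mask.
    printf '0x%08x\n' "$(( $1 & ~$2 ))"
}

mask_exclude 0x0000ffff 0x00000f00
```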
  358. To combine both physical and logical indexes as input:
  359. $ hwloc-calc PU:2 --physical-input PU:3
  360. 0x0000000c
  361. To synthesize a set of cores into largest objects on a 2-node 2-package 2-core machine:
  362. $ hwloc-calc core:0 --largest
  363. Core:0
  364. $ hwloc-calc core:0-1 --largest
  365. Package:0
  366. $ hwloc-calc core:4-7 --largest
  367. NUMANode:1
  368. $ hwloc-calc core:2-6 --largest
  369. Package:1 Package:2 Core:6
  370. $ hwloc-calc pack:2 --largest
  371. Package:2
  372. $ hwloc-calc package:2-3 --largest
  373. NUMANode:1
  374. To get the set of first threads of all cores:
  375. $ hwloc-calc core:all.pu:0
  376. $ hwloc-calc --no-smt all
  377. This can also be very useful in order to make GNU OpenMP use exactly one thread
  378. per core, and in logical core order:
  379. $ export OMP_NUM_THREADS=`hwloc-calc --number-of core all`
  380. $ echo $OMP_NUM_THREADS
  381. 4
  382. $ export GOMP_CPU_AFFINITY=`hwloc-calc --physical-output --intersect PU --no-smt all`
  383. $ echo $GOMP_CPU_AFFINITY
  384. 0,2,1,3
  385. To export a bitmask in a format accepted by the resctrl Linux subsystem
  386. (for configuring cache partitioning, etc.), apply a sed regexp to the output of hwloc-calc:
  387. $ hwloc-calc pack:all.core:7-9.pu:0
  388. 0x00000380,,0x00000380 <this format cannot be given to resctrl>
  389. $ hwloc-calc pack:all.core:7-9.pu:0 | sed -e 's/0x//g' -e 's/,,/,0,/g' -e 's/,,/,0,/g'
  390. 00000380,0,00000380
  391. # echo 00000380,0,00000380 > /sys/fs/resctrl/test/cpus
  392. # cat /sys/fs/resctrl/test/cpus
  393. 00000000,00000380,00000000,00000380 <the modified bitmask was correctly parsed by resctrl>
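The sed conversion above can be wrapped in a small shell function so that it is easy to reuse and to check without a real topology. This is only a sketch: the function name to_resctrl is hypothetical, and the input is a literal string rather than actual hwloc-calc output. The second ,, substitution is presumably there because the g flag does not rescan overlapping matches, so a run of three or more commas needs a second pass.

```shell
# Hypothetical wrapper around the sed expression from the example above.
# Reads an hwloc CPU mask string on stdin, strips the "0x" prefixes and
# fills empty 32-bit substrings with 0.
to_resctrl() {
    sed -e 's/0x//g' -e 's/,,/,0,/g' -e 's/,,/,0,/g'
}

# Literal input taken from the example above, no hwloc needed.
printf '%s\n' '0x00000380,,0x00000380' | to_resctrl
```

With hwloc installed, the same function could be fed directly, for instance with hwloc-calc pack:all.core:7-9.pu:0 | to_resctrl.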
  394. OS devices may also be filtered by subtype. In this example, there are
  395. 9 OS devices in the system, 4 of them are near NUMA node #1, and only
  396. 2 of these are CoProcessors:
  397. $ utils/hwloc/hwloc-calc -I osdev all
  398. 0,1,2,3,4,5,6,7,8
  399. $ utils/hwloc/hwloc-calc -I osdev node:1
  400. 5,6,7,8
  401. $ utils/hwloc/hwloc-calc -I coproc node:1
  402. 7,8
  403. .
  404. .\" **************************
  405. .\" Return value section
  406. .\" **************************
  407. .SH RETURN VALUE
  408. Upon successful execution, hwloc-calc displays the (physical) CPU mask string,
  409. (physical or logical) object list, or (physical or logical) object number list.
  410. The return value is 0.
  411. .
  412. .
  413. .PP
  414. hwloc-calc will return nonzero if any kind of error occurs, such as
  415. (but not limited to): failure to parse the command line.
  416. .
  417. .\" **************************
  418. .\" See also section
  419. .\" **************************
  420. .SH SEE ALSO
  421. .
  422. .ft R
  423. hwloc(7), lstopo(1), hwloc-info(1)
  424. .sp