| 123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498 |
- .\" -*- nroff -*-
- .\" Copyright © 2010-2023 Inria. All rights reserved.
- .\" Copyright © 2009-2020 Cisco Systems, Inc. All rights reserved.
- .\" See COPYING in top-level directory.
- .TH HWLOC-CALC "1" "Sep 07, 2023" "2.9.3" "hwloc"
- .SH NAME
- hwloc-calc \- Operate on cpu mask strings and objects
- .
- .\" **************************
- .\" Synopsis Section
- .\" **************************
- .SH SYNOPSIS
- .
- .B hwloc-calc
- [\fItopology options\fR] [\fIoptions\fR] \fI<location1> [<location2> [...] ]
- .
- .PP
- Note that hwloc(7) provides a detailed explanation of the hwloc system
- and of valid <location> formats;
- it should be read before reading this man page.
- .
- .\" **************************
- .\" Options Section
- .\" **************************
- .SH TOPOLOGY OPTIONS
- .
- All topology options must be given before all other options.
- .
- .TP 10
- \fB\-\-no\-smt\fR, \fB\-\-no\-smt=<N>\fR
- Only keep the first PU per core in the input locations.
- If \fI<N>\fR is specified, keep the <N>-th instead, if any.
- PUs are ordered by physical index during this filtering.
- Note that this option is applied after searching locations.
- Hence \fB\-\-no\-smt pu:2-5\fR will first select the PUs #2
- to #5 in the machine before keeping one of them per core.
- To rather get PUs #2 to #5 after filtering one per core,
- you should combine invocations:
- hwloc-calc --restrict $(hwloc-calc --no-smt all) pu:2-5
- .TP
- \fB\-\-cpukind\fR <n>, \fB\-\-cpukind\fR <infoname>=<infovalue>
- Only keep PUs whose CPU kind match.
- Either a single CPU kind is specified as an index,
- or the info attribute name-value will select matching kinds.
- When specified by index, it corresponds to hwloc ranking of CPU kinds
- which returns energy-efficient cores first, and high-performance
- power-hungry cores last.
- The full list of CPU kinds may be seen with \fIlstopo --cpukinds\fR.
- Note that this option is applied after searching locations.
- Hence \fB\-\-cpukind 0 core:1\fR will return the second core of
- the machine if it is of kind 0, and nothing otherwise.
- To rather get the second core among those of kind 0, you should
- combine invocations:
- hwloc-calc --restrict $(hwloc-calc --cpukind 0 all) core:1
- .TP
- \fB\-\-restrict\fR <cpuset>
- Restrict the topology to the given cpuset.
- This removes some PUs and their now-child-less parents.
- This is useful when combining invocations to filter some objects
- before selecting among them.
- Beware that restricting the PUs in a topology may change the
- logical indexes of many objects, including NUMA nodes.
- .TP
- \fB\-\-restrict\fR nodeset=<nodeset>
- Restrict the topology to the given nodeset
- (unless \fB\-\-restrict\-flags\fR specifies something different).
- This removes some NUMA nodes and their now-child-less parents.
- Beware that restricting the NUMA nodes in a topology may change the
- logical indexes of many objects, including PUs.
- .TP
- \fB\-\-restrict\-flags\fR <flags>
- Enforce flags when restricting the topology.
- Flags may be given as numeric values or as a comma-separated list of flag names
- that are passed to \fIhwloc_topology_restrict()\fR.
- Those names may be substrings of actual flag names as long as a single one matches,
- for instance \fBbynodeset,memless\fR.
- The default is \fB0\fR (or \fBnone\fR).
- .TP
- \fB\-\-disallowed\fR
- Include objects disallowed by administrative limitations.
- .TP
- \fB\-i\fR <path>, \fB\-\-input\fR <path>
- Read the topology from <path> instead of discovering the topology of the local machine.
- If <path> is a file,
- it may be a XML file exported by a previous hwloc program.
- If <path> is "\-", the standard input may be used as a XML file.
- On Linux, <path> may be a directory containing the topology files
- gathered from another machine topology with hwloc-gather-topology.
- On x86, <path> may be a directory containing a cpuid dump gathered
- with hwloc-gather-cpuid.
- When the archivemount program is available, <path> may also be a tarball
- containing such Linux or x86 topology files.
- .TP
- \fB\-i\fR <specification>, \fB\-\-input\fR <specification>
- Simulate a fake hierarchy (instead of discovering the topology on the
- local machine). If <specification> is "node:2 pu:3", the topology will
- contain two NUMA nodes with 3 processing units in each of them.
- The <specification> string must end with a number of PUs.
- .TP
- \fB\-\-if\fR <format>, \fB\-\-input\-format\fR <format>
- Enforce the input in the given format, among \fBxml\fR, \fBfsroot\fR,
- \fBcpuid\fR and \fBsynthetic\fR.
- .
- .SH OPTIONS
- .
- All these options must be given after all topology options above.
- .
- .TP 10
- \fB\-p\fR \fB\-\-physical\fR
- Use OS/physical indexes instead of logical indexes for both input and output.
- .TP
- \fB\-l\fR \fB\-\-logical\fR
- Use logical indexes instead of physical/OS indexes for both input and output (default).
- .TP
- \fB\-\-pi\fR \fB\-\-physical\-input\fR
- Use OS/physical indexes instead of logical indexes for input.
- .TP
- \fB\-\-li\fR \fB\-\-logical\-input\fR
- Use logical indexes instead of physical/OS indexes for input (default).
- .TP
- \fB\-\-po\fR \fB\-\-physical\-output\fR
- Use OS/physical indexes instead of logical indexes for output.
- .TP
- \fB\-\-lo\fR \fB\-\-logical\-output\fR
- Use logical indexes instead of physical/OS indexes for output (default, except for cpusets which are always physical).
- .TP
- \fB\-n\fR \fB\-\-nodeset\fR
- Interpret both input and output sets as nodesets instead of CPU sets.
- See \fB\-\-nodeset\-output\fR and \fB\-\-nodeset\-input\fR below for details.
- .TP
- \fB\-\-no\fR \fB\-\-nodeset\-output\fR
- Report nodesets instead of CPU sets.
- This output is more precise than the default CPU set output when memory
- locality matters because it properly describes CPU-less NUMA nodes,
- as well as NUMA-nodes that are local to multiple CPUs.
- .TP
- \fB\-\-ni\fR \fB\-\-nodeset\-input\fR
- Interpret input sets as nodesets instead of CPU sets.
- .TP
- \fB\-N \-\-number\-of <type|depth>\fR
- Report the number of objects of the given type or depth that intersect the CPU set.
- This is convenient for finding how many cores, NUMA nodes or PUs are available
- in a machine.
- When combined with \fB\-\-nodeset\fR or \fB\-\-nodeset-output\fR,
- the nodeset is considered instead of the CPU set for finding matching objects.
- This is useful when reporting the output as a number or set of NUMA nodes.
- If an OS device subtype such as \fIgpu\fR is given instead of \fIosdev\fR,
- only the os devices of that subtype will be counted.
- .TP
- \fB\-I \-\-intersect <type|depth>\fR
- Find the list of objects of the given type or depth that intersect the CPU set and
- report the comma-separated list of their indexes instead of the cpu mask string.
- This may be used for determining the list of objects above or below the input
- objects.
- When combined with \fB\-\-physical\fR, the list is convenient to pass to external
- tools such as taskset or numactl \fB\-\-physcpubind\fR or \fB\-\-membind\fR.
- This is different from \-\-largest since the latter requires that all reported
- objects are strictly included inside the input objects.
- When combined with \fB\-\-nodeset\fR or \fB\-\-nodeset-output\fR,
- the nodeset is considered instead of the CPU set for finding matching objects.
- This is useful when reporting the output as a number or set of NUMA nodes.
- If an OS device subtype such as \fIgpu\fR is given instead of \fIosdev\fR,
- only the os devices of that subtype will be returned.
- .TP
- \fB\-H \-\-hierarchical <type1>.<type2>...\fR
- Find the list of objects of type <type2> that intersect the CPU set and
- report the space-separated list of their hierarchical indexes with respect
- to <type1>, <type2>, etc.
- For instance, if \fIpackage.core\fR is given, the output would be
- \fIPackage:1.Core:2 Package:2.Core:3\fR if the input contains the third
- core of the second package and the fourth core of the third package.
- Only normal CPU-side object types should be used.
- NUMA nodes may be used but they may cause redundancy in the output
- on heterogeneous memory platform. For instance, on a platform with both
- DRAM and HBM memory on a package, the first core will be considered both
- as first core of first NUMA node (DRAM) and
- as first core of second NUMA node (HBM).
- .TP
- \fB\-\-largest\fR
- Report (in a human readable format) the list of largest objects which exactly
- include all input objects (by looking at their CPU sets).
- None of these output objects intersect each other, and the sum of them is
- exactly equivalent to the input. No larger object is included in the input.
- This is different from \-\-intersect where reported objects may not be
- strictly included in the input.
- .TP
- \fB\-\-local\-memory\fR
- Report the list of NUMA nodes that are local to the input objects.
- This option is similar to \fB\-I numa\fR but the way nodes are selected
- is different:
- The selection performed by \fB\-\-local\-memory\fR may be precisely
- configured with \fB\-\-local\-memory\-flags\fR,
- while \fB\-I numa\fR just selects all nodes that are somehow local to
- any of the input objects.
- .TP
- \fB\-\-local\-memory\-flags\fR
- Change the flags used to select local NUMA nodes.
- Flags may be given as numeric values or as a comma-separated list of flag names
- that are passed to \fIhwloc_get_local_numanode_objs()\fR.
- Those names may be substrings of actual flag names as long as a single one matches.
- The default is \fB3\fR (or \fBsmaller,larger\fR)
- which means NUMA nodes are displayed
- if their locality either contains or is contained
- in the locality of the given object.
- This option enables \fB\-\-local\-memory\fR.
- .TP
- \fB\-\-best\-memattr\fR <name>
- Enable the listing of local memory nodes with \fB\-\-local\-memory\fR,
- but only display the local node that has the best value for the memory
- attribute given by \fI<name>\fR (or as an index).
- If the memory attribute values depend on the initiator, the hwloc-calc
- input objects are used as the initiator.
- Standard attribute names are \fICapacity\fR, \fILocality\fR,
- \fIBandwidth\fR, and \fILatency\fR.
- All existing attributes in the current topology may be listed with
- $ lstopo --memattrs
- .TP
- \fB\-\-sep <sep>\fR
- Change the field separator in the output.
- By default, a space is used to separate output objects
- (for instance when \fB\-\-hierarchical\fR or \fB\-\-largest\fR is given)
- while a comma is used to separate indexes
- (for instance when \fB\-\-intersect\fR is given).
- .TP
- \fB\-\-single\fR
- Singlify the output to a single CPU.
- .TP
- \fB\-\-taskset\fR
- Display CPU set strings in the format recognized by the taskset command-line
- program instead of hwloc-specific CPU set string format.
- This option has no impact on the format of input CPU set strings,
- both formats are always accepted.
- .TP
- \fB\-q\fR \fB\-\-quiet\fR
- Hide non-fatal error messages.
- It mostly includes locations pointing to non-existing objects.
- .TP
- \fB\-v\fR \fB\-\-verbose\fR
- Verbose output.
- .TP
- \fB\-\-version\fR
- Report version and exit.
- .TP
- \fB\-h\fR \fB\-\-help\fR
- Display help message and exit.
- .
- .\" **************************
- .\" Description Section
- .\" **************************
- .SH DESCRIPTION
- .
- hwloc-calc generates and manipulates CPU mask strings or objects.
- Both input and output may be either objects (with physical or logical
- indexes), CPU lists (with physical or logical indexes), or CPU mask strings
- (always physically indexed).
- Input location specification is described in hwloc(7).
- .
- .PP
- If objects or CPU mask strings are given on the command-line,
- they are combined and a single output is printed.
- If no object or CPU mask strings are given on the command-line,
- the program will read the standard input.
- It will combine multiple objects or CPU mask strings that are
- given on the same line of the standard input line with spaces
- as separators.
- Different input lines will be processed separately.
- .
- .PP
- Command-line arguments and options are processed in order.
- First topology configuration options should be given.
- Then, for instance, changing the type of input indexes
- with \fB\-\-li\fR or changing the input topology with \fB\-i\fR
- only affects the processing the following arguments.
- .
- .PP
- .B NOTE:
- It is highly recommended that you read the hwloc(7) overview page
- before reading this man page. Most of the concepts described in
- hwloc(7) directly apply to the hwloc-calc utility.
- .
- .
- .\" **************************
- .\" Examples Section
- .\" **************************
- .SH EXAMPLES
- .PP
- hwloc-calc's operation is best described through several examples.
- .
- .PP
- To display the (physical) CPU mask corresponding to the second package:
- $ hwloc-calc package:1
- 0x000000f0
- To display the (physical) CPU mask corresponding to the third pacakge, excluding
- its even numbered logical processors:
- $ hwloc-calc package:2 ~PU:even
- 0x00000c00
- To convert a cpu mask to human-readable output, the -H option can be
- used to emit a space-delimited list of locations:
- $ echo 0x000000f0 | hwloc-calc -H package.core
- Package:1.Core1 Package:1.Core:1 Package:1.Core:2 Package:1.Core:3
- To use some other character (e.g., a comma) instead of spaces in
- output, use the --sep option:
- $ echo 0x000000f0 | hwloc-calc -H package.core --sep ,
- Package:1.Core1,Package:1.Core:1,Package:1.Core:2,Package:1.Core:3
- To combine two (physical) CPU masks:
- $ hwloc-calc 0x0000ffff 0xff000000
- 0xff00ffff
- To display the list of logical numbers of processors included in the second
- package:
- $ hwloc-calc --intersect PU package:1
- 4,5,6,7
- To bind GNU OpenMP threads logically over the whole machine, we need to use
- physical number output instead:
- $ export GOMP_CPU_AFFINITY=`hwloc-calc --physical-output --intersect PU all`
- $ echo $GOMP_CPU_AFFINITY
- 0,4,1,5,2,6,3,7
- To display the list of NUMA nodes, by physical indexes, that intersect a given (physical) CPU mask:
- $ hwloc-calc --physical --intersect NUMAnode 0xf0f0f0f0
- 0,2
- To find how many cores are in the second CPU kind
- (those cores are likely higher-performance and more power-hungry than cores of the first kind):
- $ hwloc-calc --cpukind 1 -N core all
- 4
- To display the list of NUMA nodes, by physical indexes,
- whose locality is exactly equal to a Package:
- $ hwloc-calc --local-memory-flags 0 pack:1
- 4,7
- To display the best-capacity NUMA node, by physical indexe,
- whose locality is exactly equal to a Package:
- $ hwloc-calc --local-memory-flags 0 --best-memattr capacity pack:1
- 4
- Converting object logical indexes (default) from/to physical/OS indexes
- may be performed with \fB--intersect\fR combined with either \fB--physical-output\fR
- (logical to physical conversion) or \fB--physical-input\fR (physical to logical):
- $ hwloc-calc --physical-output PU:2 --intersect PU
- 3
- $ hwloc-calc --physical-input PU:3 --intersect PU
- 2
- One should add \fB--nodeset\fR when converting indexes of memory objects
- to make sure a single NUMA node index is returned on platforms
- with heterogeneous memory:
- $ hwloc-calc --nodeset --physical-output node:2 --intersect node
- 3
- $ hwloc-calc --nodeset --physical-input node:3 --intersect node
- 2
- To display the set of CPUs near network interface eth0:
- $ hwloc-calc os=eth0
- 0x00005555
- To display the indexes of packages near PCI device whose bus ID is 0000:01:02.0:
- $ hwloc-calc pci=0000:01:02.0 --intersect Package
- 1
- To display the list of per-package cores that intersect the input:
- $ hwloc-calc 0x00003c00 --hierarchical package.core
- Package:2.Core:1 Package:3.Core:0
- To display the (physical) CPU mask of the entire topology except the third package:
- $ hwloc-calc all ~package:3
- 0x0000f0ff
- To combine both physical and logical indexes as input:
- $ hwloc-calc PU:2 --physical-input PU:3
- 0x0000000c
- To synthetize a set of cores into largest objects on a 2-node 2-package 2-core machine:
- $ hwloc-calc core:0 --largest
- Core:0
- $ hwloc-calc core:0-1 --largest
- Package:0
- $ hwloc-calc core:4-7 --largest
- NUMANode:1
- $ hwloc-calc core:2-6 --largest
- Package:1 Package:2 Core:6
- $ hwloc-calc pack:2 --largest
- Package:2
- $ hwloc-calc package:2-3 --largest
- NUMANode:1
- To get the set of first threads of all cores:
- $ hwloc-calc core:all.pu:0
- $ hwloc-calc --no-smt all
- This can also be very useful in order to make GNU OpenMP use exactly one thread
- per core, and in logical core order:
- $ export OMP_NUM_THREADS=`hwloc-calc --number-of core all`
- $ echo $OMP_NUM_THREADS
- 4
- $ export GOMP_CPU_AFFINITY=`hwloc-calc --physical-output --intersect PU --no-smt all`
- $ echo $GOMP_CPU_AFFINITY
- 0,2,1,3
- To export bitmask in a format that is acceptable by the resctrl Linux subsystem
- (for configuring cache partitioning, etc), apply a sed regexp to the output of hwloc-calc:
- $ hwloc-calc pack:all.core:7-9.pu:0
- 0x00000380,,0x00000380 <this format cannot be given to resctrl>
- $ hwloc-calc pack:all.core:7-9.pu:0 | sed -e 's/0x//g' -e 's/,,/,0,/g' -e 's/,,/,0,/g'
- 00000380,0,00000380
- # echo 00000380,0,00000380 > /sys/fs/resctrl/test/cpus
- # cat /sys/fs/resctrl/test/cpus
- 00000000,00000380,00000000,00000380 <the modified bitmask was corrected parsed by resctrl>
- OS devices may also be filtered by subtype. In this example, there are
- 8 OS devices in the system, 4 of them are near NUMA node #1, and only
- 2 of these are CoProcessors:
- $ utils/hwloc/hwloc-calc -I osdev all
- 0,1,2,3,4,5,6,7,8
- $ utils/hwloc/hwloc-calc -I osdev node:1
- 5,6,7,8
- $ utils/hwloc/hwloc-calc -I coproc node:1
- 7,8
- .
- .\" **************************
- .\" Return value section
- .\" **************************
- .SH RETURN VALUE
- Upon successful execution, hwloc-calc displays the (physical) CPU mask string,
- (physical or logical) object list, or (physical or logical) object number list.
- The return value is 0.
- .
- .
- .PP
- hwloc-calc will return nonzero if any kind of error occurs, such as
- (but not limited to): failure to parse the command line.
- .
- .\" **************************
- .\" See also section
- .\" **************************
- .SH SEE ALSO
- .
- .ft R
- hwloc(7), lstopo(1), hwloc-info(1)
- .sp
|