hwloc-calc(1)                        hwloc                       hwloc-calc(1)


NAME

       hwloc-calc - Operate on cpu mask strings and objects


SYNOPSIS

       hwloc-calc [topology options] [options] <location1> [<location2> [...]]

       Note that hwloc(7) provides a detailed explanation of the hwloc system
       and of valid <location> formats; it should be read before reading this
       man page.


TOPOLOGY OPTIONS

       All topology options must be given before all other options.

       --no-smt, --no-smt=<N>
                 Only keep the first PU per core in the input locations.  If
                 <N> is specified, keep the <N>-th instead, if any.  PUs are
                 ordered by physical index during this filtering.

                 Note that this option is applied after searching locations.
                 Hence --no-smt pu:2-5 will first select the PUs #2 to #5 in
                 the machine before keeping one of them per core.  To instead
                 get PUs #2 to #5 after filtering one per core, you should
                 combine invocations:

                   hwloc-calc --restrict $(hwloc-calc --no-smt all) pu:2-5


       --cpukind <n>, --cpukind <infoname>=<infovalue>
                 Only keep PUs whose CPU kind matches.  Either a single CPU
                 kind is specified as an index, or an info attribute
                 name-value pair selects all matching kinds.

                 When specified by index, it corresponds to hwloc ranking of
                 CPU kinds which returns energy-efficient cores first, and
                 high-performance power-hungry cores last.  The full list of
                 CPU kinds may be seen with lstopo --cpukinds.

                 Note that this option is applied after searching locations.
                 Hence --cpukind 0 core:1 will return the second core of the
                 machine if it is of kind 0, and nothing otherwise.  To instead
                 get the second core among those of kind 0, you should combine
                 invocations:

                   hwloc-calc --restrict $(hwloc-calc --cpukind 0 all) core:1


       --restrict <cpuset>
                 Restrict the topology to the given cpuset.  This removes some
                 PUs and their now-child-less parents.

                 This is useful when combining invocations to filter some
                 objects before selecting among them.

                 Beware that restricting the PUs in a topology may change the
                 logical indexes of many objects, including NUMA nodes.

       --restrict nodeset=<nodeset>
                 Restrict the topology to the given nodeset (unless
                 --restrict-flags specifies something different).  This
                 removes some NUMA nodes and their now-child-less parents.

                 Beware that restricting the NUMA nodes in a topology may
                 change the logical indexes of many objects, including PUs.

       --restrict-flags <flags>
                 Enforce flags when restricting the topology.  Flags may be
                 given as numeric values or as a comma-separated list of flag
                 names that are passed to hwloc_topology_restrict().  Those
                 names may be substrings of actual flag names as long as a
                 single one matches, for instance bynodeset,memless.  The
                 default is 0 (or none).

       --disallowed
                 Include objects disallowed by administrative limitations.

       -i <path>, --input <path>
                 Read the topology from <path> instead of discovering the
                 topology of the local machine.

                 If <path> is a file, it may be an XML file exported by a
                 previous hwloc program.  If <path> is "-", the standard input
                 may be used as an XML file.

                 On Linux, <path> may be a directory containing the topology
                 files gathered from another machine with
                 hwloc-gather-topology.

                 On x86, <path> may be a directory containing a cpuid dump
                 gathered with hwloc-gather-cpuid.

                 When the archivemount program is available, <path> may also
                 be a tarball containing such Linux or x86 topology files.

       -i <specification>, --input <specification>
                 Simulate a fake hierarchy (instead of discovering the
                 topology on the local machine). If <specification> is "node:2
                 pu:3", the topology will contain two NUMA nodes with 3
                 processing units in each of them.  The <specification> string
                 must end with a number of PUs.

       --if <format>, --input-format <format>
                 Enforce the input in the given format, among xml, fsroot,
                 cpuid and synthetic.
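
       As a rough sketch of how a synthetic specification can be explored
       without touching the local machine, the fragment below predicts the PU
       count of a specification and, when hwloc-calc is installed, asks the
       tool for the same number.  The helper name expected_pus is
       hypothetical, and it assumes a simple specification made only of
       "type:count" levels, where the PU total is the product of the counts.

```shell
# Hypothetical helper: multiply the per-level counts of a simple
# synthetic specification such as "node:2 pu:3".
# Assumption: every level is written as type:count, with no attributes.
expected_pus() {
    total=1
    for level in $1; do
        total=$((total * ${level##*:}))
    done
    echo "$total"
}

spec="node:2 pu:3"
echo "expected PUs: $(expected_pus "$spec")"

# Only query hwloc-calc when it is installed.
if command -v hwloc-calc >/dev/null 2>&1; then
    echo "hwloc-calc reports: $(hwloc-calc -i "$spec" -N pu all)"
fi
```

       The -i option is placed first because topology options must precede
       all other options.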


OPTIONS

       All these options must be given after all topology options above.

       -p --physical
                 Use OS/physical indexes instead of logical indexes for both
                 input and output.

       -l --logical
                 Use logical indexes instead of physical/OS indexes for both
                 input and output (default).

       --pi --physical-input
                 Use OS/physical indexes instead of logical indexes for input.

       --li --logical-input
                 Use logical indexes instead of physical/OS indexes for input
                 (default).

       --po --physical-output
                 Use OS/physical indexes instead of logical indexes for
                 output.

       --lo --logical-output
                 Use logical indexes instead of physical/OS indexes for output
                 (default, except for cpusets which are always physical).

       -n --nodeset
                 Interpret both input and output sets as nodesets instead of
                 CPU sets.  See --nodeset-output and --nodeset-input below for
                 details.

       --no --nodeset-output
                 Report nodesets instead of CPU sets.  This output is more
                 precise than the default CPU set output when memory locality
                 matters because it properly describes CPU-less NUMA nodes, as
                 well as NUMA-nodes that are local to multiple CPUs.

       --ni --nodeset-input
                 Interpret input sets as nodesets instead of CPU sets.

       --oo --object-output
                 When reporting object indexes (e.g. with -I or
                 --local-memory), this option prefixes these indexes with
                 types (e.g. Core:0 instead of 0).

       -N --number-of <type|depth>
                 Report the number of objects of the given type or depth that
                 intersect the CPU set.  This is convenient for finding how
                 many cores, NUMA nodes or PUs are available in a machine.

                 When combined with --nodeset or --nodeset-output, the nodeset
                 is considered instead of the CPU set for finding matching
                 objects.  This is useful when reporting the output as a
                 number or set of NUMA nodes.

                 <type> may contain a filter to select specific objects among
                 the type. For instance -N "numa[hbm]" counts NUMA nodes
                 marked with subtype "HBM", while -N "numa[mcdram]" only
                 counts MCDRAM NUMA nodes on KNL.

                 If an OS device subtype such as gpu is given instead of
                 osdev, only the OS devices of that subtype will be counted.

       -I --intersect <type|depth>
                 Find the list of objects of the given type or depth that
                 intersect the CPU set and report the comma-separated list of
                 their indexes instead of the cpu mask string.  This may be
                 used for determining the list of objects above or below the
                 input objects.

                 When combined with --physical, the list is convenient to pass
                 to external tools such as taskset or numactl --physcpubind or
                 --membind.  This is different from --largest since the latter
                 requires that all reported objects are strictly included
                 inside the input objects.

                 When combined with --nodeset or --nodeset-output, the nodeset
                 is considered instead of the CPU set for finding matching
                 objects.  This is useful when reporting the output as a
                 number or set of NUMA nodes.

                 <type> may contain a filter to select specific objects among
                 the type. For instance -I "numa[hbm]" lists NUMA nodes marked
                 with subtype "HBM", while -I "numa[mcdram]" only lists MCDRAM
                 NUMA nodes on KNL.

                 If an OS device subtype such as gpu is given instead of
                 osdev, only the OS devices of that subtype will be returned.

                 If combined with --object-output, object indexes are prefixed
                 with types (e.g. Core:0 instead of 0).

       -H --hierarchical <type1>.<type2>...
                 Find the list of objects of type <type2> that intersect the
                 CPU set and report the space-separated list of their
                 hierarchical indexes with respect to <type1>, <type2>, etc.
                 For instance, if package.core is given, the output would be
                 Package:1.Core:2 Package:2.Core:3 if the input contains the
                 third core of the second package and the fourth core of the
                 third package.

                 Only normal CPU-side object types should be used.

                 NUMA nodes may be used but they may cause redundancy in the
                 output on heterogeneous memory platforms. For instance, on a
                 platform with both DRAM and HBM memory on a package, the
                 first core will be considered both as first core of first
                 NUMA node (DRAM) and as first core of second NUMA node (HBM).

       --largest Report (in a human readable format) the list of largest
                 objects which exactly include all input objects (by looking
                 at their CPU sets).  None of these output objects intersect
                 each other, and the sum of them is exactly equivalent to the
                 input. No larger object is included in the input.

                 This is different from --intersect where reported objects may
                 not be strictly included in the input.

       --local-memory
                 Report the list of NUMA nodes that are local to the input
                 objects.

                 This option is similar to -I numa but the way nodes are
                 selected is different: The selection performed by
                 --local-memory may be precisely configured with
                 --local-memory-flags, while -I numa just selects all nodes
                 that are somehow local to any of the input objects.

                 If combined with --object-output, object indexes are prefixed
                 with types (e.g. NUMANode:0 instead of 0).

       --local-memory-flags
                 Change the flags used to select local NUMA nodes.  Flags may
                 be given as numeric values or as a comma-separated list of
                 flag names that are passed to
                 hwloc_get_local_numanode_objs().  Those names may be
                 substrings of actual flag names as long as a single one
                 matches.  The default is 3 (or smaller,larger) which means
                 NUMA nodes are displayed if their locality either contains or
                 is contained in the locality of the given object.

                 This option enables --local-memory.

       --best-memattr <name>
                 Enable the listing of local memory nodes with --local-memory,
                 but only display the local nodes that have the best value for
                 the memory attribute given by <name> (or as an index).

                 If the memory attribute values depend on the initiator, the
                 hwloc-calc input objects are used as the initiator.

                 Standard attribute names are Capacity, Locality, Bandwidth,
                 and Latency.  All existing attributes in the current topology
                 may be listed with

                     $ lstopo --memattrs

                 If combined with --object-output, the object index is
                 prefixed with its type (e.g. NUMANode:0 instead of 0).

                 <name> may be suffixed with flags to tune the selection of
                 best nodes, for instance as bandwidth,strict,default.
                 default means that all local nodes are reported if no best
                 could be found.  strict means that nodes are selected only if
                 their performance is the best for all the input CPUs. On a
                 dual-socket machine with HBM in each socket, both HBMs are
                 the best for their local socket, but not for the remote
                 socket.  Hence both HBMs are also considered best for the
                 entire machine by default, but none if strict.

       --sep <sep>
                 Change the field separator in the output.  By default, a
                 space is used to separate output objects (for instance when
                 --hierarchical or --largest is given) while a comma is used
                 to separate indexes (for instance when --intersect is given).

       --single  Singlify the output to a single CPU.

       --cpuset-output-format <hwloc|list|taskset> --cof <hwloc|list|taskset>
                 Change the format of displayed CPU set strings.  By default,
                 the hwloc-specific format is used.  If list is given, the
                 output is a comma-separated list of numbers or ranges, e.g.
                 2,4-5,8 .  If taskset is given, the output is compatible with
                 the taskset program (replaces the former --taskset option).

                 This option has no impact on the format of input CPU set
                 strings, see --cpuset-input-format.

       --cpuset-input-format <hwloc|list|taskset> --cif <hwloc|list|taskset>
                 Change the format of input CPU set strings.  By default, the
                 tool tries to guess the type automatically between hwloc,
                 list or taskset formats.  This option forces the parsing
                 format to avoid ambiguity for instance when "1,3,5" may be
                 parsed as a hwloc cpuset "0x1,0x00000003,0x00000005" or as
                 list "1-1,3-3,5-5".

                 This option has no impact on the format of output CPU set
                 strings, see --cpuset-output-format.

       -q --quiet
                 Hide non-fatal error messages.  It mostly includes locations
                 pointing to non-existing objects.

       -v --verbose
                 Verbose output.

       --version Report version and exit.

       -h --help Display help message and exit.
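
       To make the list format described under --cpuset-output-format more
       concrete, here is a minimal sketch that expands a list string such as
       2,4-5,8 into individual indexes.  The function expand_list is a
       hypothetical helper written for illustration and is not part of hwloc.
       It assumes a well-formed list of non-negative numbers and ranges.

```shell
# Hypothetical helper: expand "2,4-5,8" into "2 4 5 8".
# Assumption: the input only contains numbers and first-last ranges.
expand_list() {
    out=""
    old_ifs=$IFS
    IFS=,
    for item in $1; do
        case $item in
            *-*) first=${item%-*}; last=${item#*-} ;;
            *)   first=$item; last=$item ;;
        esac
        i=$first
        while [ "$i" -le "$last" ]; do
            out="$out${out:+ }$i"
            i=$((i + 1))
        done
    done
    IFS=$old_ifs
    echo "$out"
}

expand_list "2,4-5,8"

# When hwloc-calc is installed, expand the list it reports for the machine.
if command -v hwloc-calc >/dev/null 2>&1; then
    machine_list=$(hwloc-calc --cof list all 2>/dev/null) || machine_list=""
    [ -n "$machine_list" ] && expand_list "$machine_list"
fi
```

       Such an expansion can help when a downstream script needs one index
       per loop iteration rather than a compact range.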


DESCRIPTION

       hwloc-calc generates and manipulates CPU mask strings or objects.  Both
       input and output may be either objects (with physical or logical
       indexes), CPU lists (with physical or logical indexes), or CPU mask
       strings (always physically indexed).  Input location specification is
       described in hwloc(7).

       If objects or CPU mask strings are given on the command-line, they are
       combined and a single output is printed.  If no object or CPU mask
       strings are given on the command-line, the program will read the
       standard input.  It will combine multiple objects or CPU mask strings
       that are given on the same line of the standard input, with spaces
       as separators.  Different input lines will be processed separately.

       Command-line arguments and options are processed in order.  First
       topology configuration options should be given.  Then, for instance,
       changing the type of input indexes with --li or changing the input
       topology with -i only affects the processing of the following arguments.

       NOTE: It is highly recommended that you read the hwloc(7) overview page
       before reading this man page.  Most of the concepts described in
       hwloc(7) directly apply to the hwloc-calc utility.


EXAMPLES

       hwloc-calc's operation is best described through several examples.

       To display the (physical) CPU mask corresponding to the second package:

           $ hwloc-calc package:1
           0x000000f0

       To display the (physical) CPU mask corresponding to the third package,
       excluding its even numbered logical processors:

           $ hwloc-calc package:2 ~PU:even
           0x00000c00

       To convert a cpu mask to human-readable output, the -H option can be
       used to emit a space-delimited list of locations:

           $ echo 0x000000f0 | hwloc-calc -q -H package.core
           Package:1.Core:0 Package:1.Core:1 Package:1.Core:2 Package:1.Core:3

       To use some other character (e.g., a comma) instead of spaces in
       output, use the --sep option:

           $ echo 0x000000f0 | hwloc-calc -q -H package.core --sep ,
           Package:1.Core:0,Package:1.Core:1,Package:1.Core:2,Package:1.Core:3

       To combine two (physical) CPU masks:

           $ hwloc-calc 0x0000ffff 0xff000000
           0xff00ffff
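
       Combining masks amounts to a bitwise union, and excluding a location
       with ~ amounts to clearing its bits.  The sketch below reproduces that
       arithmetic in plain shell for illustration only.  The helper names
       combine_masks and exclude_mask are hypothetical, and the sketch assumes
       masks that fit in a single shell integer, whereas hwloc-calc itself
       handles longer comma-separated masks.

```shell
# Hypothetical helpers, valid only for masks that fit in one shell integer.
combine_masks() {
    result=0
    for mask in "$@"; do
        result=$((result | $mask))      # union of all given masks
    done
    printf '0x%08x\n' "$result"
}

exclude_mask() {
    # Keep the bits of $1 that are not set in $2.
    printf '0x%08x\n' $(( $1 & ~$2 ))
}

combine_masks 0x0000ffff 0xff000000
exclude_mask 0x0000ffff 0x00000f00
```
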

       To display the list of logical numbers of processors included in the
       second package:

           $ hwloc-calc --intersect PU package:1
           4,5,6,7

       To bind GNU OpenMP threads logically over the whole machine, we need to
       use physical number output instead:

           $ export GOMP_CPU_AFFINITY=`hwloc-calc --physical-output --intersect PU all`
           $ echo $GOMP_CPU_AFFINITY
           0,4,1,5,2,6,3,7

       To display the list of NUMA nodes, by physical indexes, that intersect
       a given (physical) CPU mask:

           $ hwloc-calc --physical --intersect NUMAnode 0xf0f0f0f0
           0,2

       To find how many cores are in the second CPU kind (those cores are
       likely higher-performance and more power-hungry than cores of the first
       kind):

           $ hwloc-calc --cpukind 1 -N core all
           4

       To display the list of NUMA nodes, by physical indexes, whose locality
       is exactly equal to a Package:

           $ hwloc-calc --local-memory-flags 0 --physical-output pack:1
           4,7

       To display the best-capacity NUMA node(s), by physical indexes, whose
       locality is exactly equal to a Package:

           $ hwloc-calc --local-memory-flags 0 --best-memattr capacity --physical-output pack:1
           4

       To find the number of NUMA nodes with subtype "HBM":

           $ hwloc-calc -N "numa[hbm]" all
           4

       To find the number of NUMA nodes in memory tier 1 (DRAM nodes on a
       server with HBM and DRAM):

           $ hwloc-calc -N "numa[tier=1]" all
           4

       To find the NUMA node of subtype MCDRAM (on KNL) near a PU:

           $ hwloc-calc -I "numa[mcdram]" pu:157
           1

       Converting object logical indexes (default) from/to physical/OS indexes
       may be performed with --intersect combined with either
       --physical-output (logical to physical conversion) or --physical-input
       (physical to logical):

           $ hwloc-calc --physical-output PU:2 --intersect PU
           3
           $ hwloc-calc --physical-input PU:3 --intersect PU
           2

       One should add --nodeset when converting indexes of memory objects to
       make sure a single NUMA node index is returned on platforms with
       heterogeneous memory:

           $ hwloc-calc --nodeset --physical-output node:2 --intersect node
           3
           $ hwloc-calc --nodeset --physical-input node:3 --intersect node
           2

       To display the set of CPUs near network interface eth0:

           $ hwloc-calc os=eth0
           0x00005555

       To display the indexes of packages near PCI device whose bus ID is
       0000:01:02.0:

           $ hwloc-calc pci=0000:01:02.0 --intersect Package
           1

       To display the list of per-package cores that intersect the input:

           $ hwloc-calc 0x00003c00 --hierarchical package.core
           Package:2.Core:1 Package:3.Core:0

       To display the (physical) CPU mask of the entire topology except the
       third package:

           $ hwloc-calc all ~package:2
           0x0000f0ff

       To combine both physical and logical indexes as input:

           $ hwloc-calc PU:2 --physical-input PU:3
           0x0000000c

       To synthesize a set of cores into largest objects on a 2-node 2-package
       2-core machine:

           $ hwloc-calc core:0 --largest
           Core:0
           $ hwloc-calc core:0-1 --largest
           Package:0
           $ hwloc-calc core:4-7 --largest
           NUMANode:1
           $ hwloc-calc core:2-6 --largest
           Package:1 Package:2 Core:6
           $ hwloc-calc pack:2 --largest
           Package:2
           $ hwloc-calc package:2-3 --largest
           NUMANode:1

       To get the set of first threads of all cores:

           $ hwloc-calc core:all.pu:0
           $ hwloc-calc --no-smt all

       This can also be very useful in order to make GNU OpenMP use exactly
       one thread per core, and in logical core order:

           $ export OMP_NUM_THREADS=`hwloc-calc --number-of core all`
           $ echo $OMP_NUM_THREADS
           4
           $ export GOMP_CPU_AFFINITY=`hwloc-calc --physical-output --intersect PU --no-smt all`
           $ echo $GOMP_CPU_AFFINITY
           0,2,1,3
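
       When these two variables are set from a script, it may be worth
       checking that the affinity list holds exactly one entry per thread.
       The fragment below is a sketch of such a check.  The helper
       count_indexes is hypothetical, and --no-smt is placed first here
       because it is documented above as a topology option.

```shell
# Hypothetical helper: count the entries of a comma-separated index list.
count_indexes() {
    old_ifs=$IFS
    IFS=,
    set -- $1          # split the list on commas into positional parameters
    IFS=$old_ifs
    echo "$#"
}

count_indexes "0,2,1,3"

# Only run the real commands when hwloc-calc is installed.
if command -v hwloc-calc >/dev/null 2>&1; then
    OMP_NUM_THREADS=$(hwloc-calc --number-of core all)
    GOMP_CPU_AFFINITY=$(hwloc-calc --no-smt --physical-output --intersect PU all)
    if [ "$(count_indexes "$GOMP_CPU_AFFINITY")" = "$OMP_NUM_THREADS" ]; then
        export OMP_NUM_THREADS GOMP_CPU_AFFINITY
    else
        echo "warning: thread count and affinity list differ" >&2
    fi
fi
```
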

       To export a bitmask in a format accepted by the resctrl Linux
       subsystem (for configuring cache partitioning, etc.), apply a sed
       regexp to the output of hwloc-calc:

           $ hwloc-calc pack:all.core:7-9.pu:0
           0x00000380,,0x00000380   <this format cannot be given to resctrl>
           $ hwloc-calc pack:all.core:7-9.pu:0 | sed -e 's/0x//g' -e 's/,,/,0,/g' -e 's/,,/,0,/g'
           00000380,0,00000380
           # echo 00000380,0,00000380 > /sys/fs/resctrl/test/cpus
           # cat /sys/fs/resctrl/test/cpus
           00000000,00000380,00000000,00000380   <the modified bitmask was correctly parsed by resctrl>
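
       The sed conversion above can be wrapped in a small function so that it
       is easy to reuse and to check on sample masks.  This is only a sketch,
       and the name to_resctrl_mask is hypothetical.  The substitution is
       applied twice because a single pass leaves one empty field behind when
       three or more commas are adjacent.

```shell
# Hypothetical wrapper around the sed expressions shown above:
# strip the 0x prefixes and fill empty fields with 0.
to_resctrl_mask() {
    printf '%s\n' "$1" | sed -e 's/0x//g' -e 's/,,/,0,/g' -e 's/,,/,0,/g'
}

to_resctrl_mask "0x00000380,,0x00000380"

# With hwloc-calc installed, the live output can be converted the same way.
if command -v hwloc-calc >/dev/null 2>&1; then
    to_resctrl_mask "$(hwloc-calc all)"
fi
```

       Writing the result into a resctrl group still requires root privileges
       and a mounted resctrl filesystem, as in the example above.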

       OS devices may also be filtered by subtype. In this example, there are
       9 OS devices in the system, 4 of them are near NUMA node #1, and only 2
       of these are CoProcessors:

           $ utils/hwloc/hwloc-calc -I osdev all
           0,1,2,3,4,5,6,7,8
           $ utils/hwloc/hwloc-calc -I osdev node:1
           5,6,7,8
           $ utils/hwloc/hwloc-calc -I coproc node:1
           7,8



RETURN VALUE

       Upon successful execution, hwloc-calc displays the (physical) CPU mask
       string, (physical or logical) object list, or (physical or logical)
       object number list.  The return value is 0.

       hwloc-calc will return nonzero if any kind of error occurs, such as
       (but not limited to): failure to parse the command line.
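
       Scripts can rely on this return value to fall back to a safe default
       when hwloc-calc fails or is not installed.  The following is a minimal
       sketch.  The function run_or_fallback is a hypothetical helper, and the
       fallback value of 1 core is an arbitrary assumption.

```shell
# Hypothetical helper: print the command output on success, or the
# fallback value when the command fails or prints nothing.
run_or_fallback() {
    fallback=$1
    shift
    if out=$("$@" 2>/dev/null) && [ -n "$out" ]; then
        echo "$out"
    else
        echo "$fallback"
    fi
}

# Falls back to 1 when hwloc-calc returns nonzero or is missing.
cores=$(run_or_fallback 1 hwloc-calc --number-of core all)
echo "using $cores core(s)"
```
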


SEE ALSO

       hwloc(7), lstopo(1), hwloc-info(1)


2.11.0                           June 25, 2024                   hwloc-calc(1)
