11.2.2 Sorting Array Values and Indices with gawk
In most awk implementations, sorting an array requires writing a sort function. While this can be educational for exploring different sorting algorithms, usually that’s not the point of the program. gawk provides the built-in asort() and asorti() functions (see section String-Manipulation Functions) for sorting arrays. For example:
    # populate the array data
    n = asort(data)
    for (i = 1; i <= n; i++) {
        # do something with data[i]
    }
After the call to asort(), the array data is indexed from 1 to some number n, the total number of elements in data. (This count is asort()’s return value.) The values are in ascending order: data[1] <= data[2] <= data[3], and so on. The array elements are compared as strings.
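As a concrete illustration of this pattern, here is a minimal sketch that sorts a small array in place. The array contents and the index names are made up for the example.

```awk
BEGIN {
    # Hypothetical sample data; the indices are arbitrary strings.
    data["x"] = "pear"
    data["y"] = "apple"
    data["z"] = "fig"

    n = asort(data)     # data is now indexed from 1 to n

    for (i = 1; i <= n; i++)
        print i, data[i]
}
```

Run with gawk, this prints apple, fig, and pear, numbered 1 through 3.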
An important side effect of calling asort() is that the array’s original indices are irrevocably lost. As this isn’t always desirable, asort() accepts a second argument:
    # populate the array source
    n = asort(source, dest)
    for (i = 1; i <= n; i++) {
        # do something with dest[i]
    }
In this case, gawk copies the source array into the dest array and then sorts dest, destroying its indices. However, the source array is not affected.
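The two-argument form can be sketched as follows. The sample data is invented; the point is only that source keeps its original indices while dest receives the sorted values.

```awk
BEGIN {
    # Hypothetical sample data.
    source["x"] = "pear"
    source["y"] = "apple"
    source["z"] = "fig"

    n = asort(source, dest)

    for (i = 1; i <= n; i++)
        print i, dest[i]        # sorted values, indexed from 1 to n

    print source["x"]           # the original index still works
}
```

This prints apple, fig, and pear from dest, followed by pear from the untouched source array.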
asort() accepts a third string argument to control comparison of array elements. As with PROCINFO["sorted_in"], this argument may be the name of a user-defined function, or one of the predefined names that gawk provides (see section Array Scanning Using A User-defined Function).
NOTE: In all cases, the sorted element values consist of the original array’s element values. The ability to control comparison merely affects the way in which they are sorted.
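The following sketch shows both kinds of third argument, assuming a gawk version that supports it. The function by_length() and the sample values are hypothetical. A comparison function receives the index and value of each of the two elements being compared and returns a negative, zero, or positive number.

```awk
# Hypothetical comparison function: order values by string length.
# The values used below all have different lengths, so ties never arise.
function by_length(i1, v1, i2, v2)
{
    return length(v1) - length(v2)
}

BEGIN {
    data["a"] = "kiwi"
    data["b"] = "fig"
    data["c"] = "banana"

    # User-defined comparison, passed by name.
    n = asort(data, shortest_first, "by_length")
    for (i = 1; i <= n; i++)
        print shortest_first[i]

    # Predefined comparison: values as strings, descending.
    n = asort(data, descending, "@val_str_desc")
    for (i = 1; i <= n; i++)
        print descending[i]
}
```

The first loop prints fig, kiwi, banana; the second prints kiwi, fig, banana. In both cases the result arrays hold the original values, only in a different order.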
Often, what’s needed is to sort on the values of the indices instead of the values of the elements. To do that, use the asorti() function. The interface is identical to that of asort(), except that the index values are used for sorting, and become the values of the result array:
    { source[$0] = some_func($0) }
    END {
        n = asorti(source, dest)
        for (i = 1; i <= n; i++) {
            # Work with sorted indices directly:
            # do something with dest[i]
            …
            # Access original array via sorted indices:
            # do something with source[dest[i]]
        }
    }
Similar to asort(), in all cases, the sorted element values consist of the original array’s indices. The ability to control comparison merely affects the way in which they are sorted.
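A small self-contained sketch of asorti() follows. The array names and contents are invented for illustration.

```awk
BEGIN {
    # Hypothetical sample data: fruit names as indices, counts as values.
    count["pear"]  = 3
    count["apple"] = 7
    count["fig"]   = 1

    n = asorti(count, names)    # names[1] to names[n] hold the sorted indices

    for (i = 1; i <= n; i++)
        print names[i], count[names[i]]
}
```

This prints "apple 7", "fig 1", and "pear 3": the indices come out in sorted order, and each one is used to look up its value in the unchanged count array.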
Sorting the array by replacing the indices provides maximal flexibility. To traverse the elements in decreasing order, use a loop that goes from n down to 1, either over the elements or over the indices.(60)
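Traversal in decreasing order might look like the sketch below, which simply runs the loop counter backwards over the sorted result. The sample values are hypothetical.

```awk
BEGIN {
    # Hypothetical sample data.
    data["x"] = "pear"
    data["y"] = "apple"
    data["z"] = "fig"

    n = asort(data)

    # Walk the sorted array from the last element down to the first.
    for (i = n; i >= 1; i--)
        print data[i]
}
```

This prints pear, fig, and apple. The same backwards loop works on the result of asorti().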
Copying array indices and elements isn’t expensive in terms of memory. Internally, gawk maintains reference counts to data. For example, when asort() copies the first array to the second one, there is only one copy of the original array elements’ data, even though both arrays use the values.
Because IGNORECASE affects string comparisons, the value of IGNORECASE also affects sorting for both asort() and asorti(). Note also that the locale’s sorting order does not come into play; comparisons are based on character values only.(61) Caveat Emptor.
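The effect of IGNORECASE can be seen in a sketch like this one, which sorts the same made-up values twice. It assumes gawk is not running in a compatibility mode, where IGNORECASE has no special meaning.

```awk
BEGIN {
    # Hypothetical sample data with mixed case.
    data[1] = "banana"
    data[2] = "Cherry"
    data[3] = "apple"

    IGNORECASE = 0
    n = asort(data, plain)
    for (i = 1; i <= n; i++)
        print plain[i]      # Cherry, apple, banana

    IGNORECASE = 1
    n = asort(data, folded)
    for (i = 1; i <= n; i++)
        print folded[i]     # apple, banana, Cherry
}
```

With IGNORECASE off, the uppercase "C" sorts before every lowercase letter because comparison is by character value. With it on, case is disregarded and "Cherry" moves to the end.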