2.5 A Full Example
Let us consider the allocation problem of Gaussian elimination, i.e., we want to distribute the statement instances of the compute kernel onto different processors. The original code is the following:
for (i=1;i<=N-1;i++) {
  for (j=i+1;j<=N;j++) {
    c[i][j] = a[j][i]/a[i][i] ;    /* S1 */
    for (k=i+1;k<=N;k++) {
      a[j][k] -= c[i][j]*a[i][k] ; /* S2 */
    }
  }
}
The best affine allocation functions can be found by any good automatic parallelizer like LooPo (see Gri04):
T_S1(i,j)^T = (i)
T_S2(i,j,k)^T = (k)
To ensure that on each processor, the set of statement instances is executed according to the original ordering, we add as minor scattering dimensions the original scheduling (see section Defining a Scanning Order: Scattering Functions):
T_S1(i,j)^T = (i,0,i,0,j,0)^T
T_S2(i,j,k)^T = (k,0,i,0,j,1,k,0)^T
To ensure that the scattering functions have the same dimensionality, we complete the first function with zeroes (this is a requirement of CLooG 0.18.0 and previous versions that should be removed in a future release; don't worry, it's absolutely legal!):
T_S1(i,j)^T = (i,0,i,0,j,0,0,0)^T
T_S2(i,j,k)^T = (k,0,i,0,j,1,k,0)^T
The input file corresponding to this code generation problem could be the following (this file is provided in the CLooG distribution as test/manual_gauss.cloog):
# ---------------------- CONTEXT ----------------------
c # language is C
# Context (no constraints on one parameter)
1 3 # 1 line and 3 columns
# eq/in n 1
1 0 0 # 0 >= 0, always true
1 # We want to set manually the parameter name
n # parameter name
# --------------------- STATEMENTS --------------------
2 # Number of statements
1 # First statement: one domain
4 5 # 4 lines and 5 columns
# eq/in i j n 1
1 1 0 0 -1 # i >= 1
1 -1 0 1 -1 # i <= n-1
1 -1 1 0 -1 # j >= i+1
1 0 -1 1 0 # j <= n
0 0 0 # for future options
1 # Second statement: one domain
6 6 # 6 lines and 6 columns
# eq/in i j k n 1
1 1 0 0 0 -1 # i >= 1
1 -1 0 0 1 -1 # i <= n-1
1 -1 1 0 0 -1 # j >= i+1
1 0 -1 0 1 0 # j <= n
1 -1 0 1 0 -1 # k >= i+1
1 0 0 -1 1 0 # k <= n
0 0 0 # for future options
0 # We let CLooG set the iterator names
# --------------------- SCATTERING --------------------
2 # Scattering functions
# First function
8 13 # 8 lines and 13 columns
# eq/in p1 p2 p3 p4 p5 p6 p7 p8 i j n 1
0 1 0 0 0 0 0 0 0 -1 0 0 0 # p1 = i
0 0 1 0 0 0 0 0 0 0 0 0 0 # p2 = 0
0 0 0 1 0 0 0 0 0 -1 0 0 0 # p3 = i
0 0 0 0 1 0 0 0 0 0 0 0 0 # p4 = 0
0 0 0 0 0 1 0 0 0 0 -1 0 0 # p5 = j
0 0 0 0 0 0 1 0 0 0 0 0 0 # p6 = 0
0 0 0 0 0 0 0 1 0 0 0 0 0 # p7 = 0
0 0 0 0 0 0 0 0 1 0 0 0 0 # p8 = 0
# Second function
8 14 # 8 lines and 14 columns
# eq/in p1 p2 p3 p4 p5 p6 p7 p8 i j k n 1
0 1 0 0 0 0 0 0 0 0 0 -1 0 0 # p1 = k
0 0 1 0 0 0 0 0 0 0 0 0 0 0 # p2 = 0
0 0 0 1 0 0 0 0 0 -1 0 0 0 0 # p3 = i
0 0 0 0 1 0 0 0 0 0 0 0 0 0 # p4 = 0
0 0 0 0 0 1 0 0 0 0 -1 0 0 0 # p5 = j
0 0 0 0 0 0 1 0 0 0 0 0 0 -1 # p6 = 1
0 0 0 0 0 0 0 1 0 0 0 -1 0 0 # p7 = k
0 0 0 0 0 0 0 0 1 0 0 0 0 0 # p8 = 0
1 # We want to set manually the scattering dimension names
p1 p2 p3 p4 p5 p6 p7 p8 # scattering dimension names
Calling CLooG, with for instance the command line cloog -fsp 2 gauss.cloog for a better view of the allocation (the processor number is given by p1), will result in the following target code, which actually implements the transformation. A minor post-processing on the dimension p1, not shown here, to implement e.g. MPI calls may result in dramatic speedups!
if (n >= 2) {
  p1 = 1 ;
  for (p5=2;p5<=n;p5++) {
    S1(i = 1,j = p5) ;
  }
}
for (p1=2;p1<=n-1;p1++) {
  for (p3=1;p3<=p1-1;p3++) {
    for (p5=p3+1;p5<=n;p5++) {
      S2(i = p3,j = p5,k = p1) ;
    }
  }
  for (p5=p1+1;p5<=n;p5++) {
    S1(i = p1,j = p5) ;
  }
}
if (n >= 2) {
  p1 = n ;
  for (p3=1;p3<=n-1;p3++) {
    for (p5=p3+1;p5<=n;p5++) {
      S2(i = p3,j = p5,k = n) ;
    }
  }
}
This document was generated on August 20, 2013 using texi2html 5.0.
