2.5 A Full Example
Let us consider the allocation problem of Gaussian elimination, i.e., we want to distribute the statement instances of the compute kernel onto different processors. The original code is the following:
for (i=1;i<=N-1;i++) {
  for (j=i+1;j<=N;j++) {
    c[i][j] = a[j][i]/a[i][i] ;    /* S1 */
    for (k=i+1;k<=N;k++) {
      a[j][k] -= c[i][j]*a[i][k] ; /* S2 */
    }
  }
}
The best affine allocation functions can be found by any good automatic parallelizer like LooPo (see Gri04):
T_S1(i,j)^T = (i)
T_S2(i,j,k)^T = (k)
To ensure that on each processor, the set of statement instances is executed according to the original ordering, we add as minor scattering dimensions the original scheduling (see section Defining a Scanning Order: Scattering Functions):
T_S1(i,j)^T = (i,0,i,0,j,0)^T
T_S2(i,j,k)^T = (k,0,i,0,j,1,k,0)^T
To ensure that the scattering functions have the same dimensionality, we complete the first function with zeroes (this is a requirement of CLooG 0.18.0 and previous versions that should be removed in a future release; don't worry, it's absolutely legal!):
T_S1(i,j)^T = (i,0,i,0,j,0,0,0)^T
T_S2(i,j,k)^T = (k,0,i,0,j,1,k,0)^T
The input file corresponding to this code generation problem could be the following (this file is provided in the CLooG distribution as test/manual_gauss.cloog):
# ---------------------- CONTEXT ----------------------
c # language is C
# Context (no constraints on one parameter)
1 3 # 1 line and 3 columns
# eq/in n 1
1 0 0 # 0 >= 0, always true
1 # We want to set manually the parameter name
n # parameter name
# --------------------- STATEMENTS --------------------
2 # Number of statements
1 # First statement: one domain
4 5 # 4 lines and 5 columns
# eq/in i j n 1
1 1 0 0 -1 # i >= 1
1 -1 0 1 -1 # i <= n-1
1 -1 1 0 -1 # j >= i+1
1 0 -1 1 0 # j <= n
0 0 0 # for future options
1 # Second statement: one domain
6 6 # 6 lines and 6 columns
# eq/in i j k n 1
1 1 0 0 0 -1 # i >= 1
1 -1 0 0 1 -1 # i <= n-1
1 -1 1 0 0 -1 # j >= i+1
1 0 -1 0 1 0 # j <= n
1 -1 0 1 0 -1 # k >= i+1
1 0 0 -1 1 0 # k <= n
0 0 0 # for future options
0 # We let CLooG set the iterator names
# --------------------- SCATTERING --------------------
2 # Scattering functions
# First function
8 13 # 8 lines and 13 columns
# eq/in p1 p2 p3 p4 p5 p6 p7 p8 i j n 1
0 1 0 0 0 0 0 0 0 -1 0 0 0 # p1 = i
0 0 1 0 0 0 0 0 0 0 0 0 0 # p2 = 0
0 0 0 1 0 0 0 0 0 -1 0 0 0 # p3 = i
0 0 0 0 1 0 0 0 0 0 0 0 0 # p4 = 0
0 0 0 0 0 1 0 0 0 0 -1 0 0 # p5 = j
0 0 0 0 0 0 1 0 0 0 0 0 0 # p6 = 0
0 0 0 0 0 0 0 1 0 0 0 0 0 # p7 = 0
0 0 0 0 0 0 0 0 1 0 0 0 0 # p8 = 0
# Second function
8 14 # 8 lines and 14 columns
# eq/in p1 p2 p3 p4 p5 p6 p7 p8 i j k n 1
0 1 0 0 0 0 0 0 0 0 0 -1 0 0 # p1 = k
0 0 1 0 0 0 0 0 0 0 0 0 0 0 # p2 = 0
0 0 0 1 0 0 0 0 0 -1 0 0 0 0 # p3 = i
0 0 0 0 1 0 0 0 0 0 0 0 0 0 # p4 = 0
0 0 0 0 0 1 0 0 0 0 -1 0 0 0 # p5 = j
0 0 0 0 0 0 1 0 0 0 0 0 0 -1 # p6 = 1
0 0 0 0 0 0 0 1 0 0 0 -1 0 0 # p7 = k
0 0 0 0 0 0 0 0 1 0 0 0 0 0 # p8 = 0
1 # We want to set manually the scattering dimension names
p1 p2 p3 p4 p5 p6 p7 p8 # scattering dimension names
Calling CLooG, with for instance the command line cloog -fsp 2 gauss.cloog for a better view of the allocation (the processor number is given by p1), will result in the following target code, which actually implements the transformation. A minor post-processing on the dimension p1, not shown here, to implement e.g. MPI calls may result in dramatic speedups!
if (n >= 2) {
  p1 = 1 ;
  for (p5=2;p5<=n;p5++) {
    S1(i = 1,j = p5) ;
  }
}
for (p1=2;p1<=n-1;p1++) {
  for (p3=1;p3<=p1-1;p3++) {
    for (p5=p3+1;p5<=n;p5++) {
      S2(i = p3,j = p5,k = p1) ;
    }
  }
  for (p5=p1+1;p5<=n;p5++) {
    S1(i = p1,j = p5) ;
  }
}
if (n >= 2) {
  p1 = n ;
  for (p3=1;p3<=n-1;p3++) {
    for (p5=p3+1;p5<=n;p5++) {
      S2(i = p3,j = p5,k = n) ;
    }
  }
}
This document was generated on August 20, 2013 using texi2html 5.0.
