Loop Optimization Example

CSCI 410 [Bono]


Note: This example from lecture is a simplified version of the Pascal example given in the Parson's book in Section 6.1.3. We're assuming the array is stored in row-major order, and that all the arrays are the same size. Here we show step by step how each optimization could be applied to this code.

Original Code


for (j = 0; j <= 30; j++) {
  for (k = 0; k <= 30; k++) {
    x[j][k] = y[j][k] + z[j][k];
  }
}

Common-subexpression eliminations applied

The offsets for each of the arrays is the same, so we can eliminate that common subexpression (i.e., compute the value in t2 only once, and use it 3 times).

Computations for array indexing shown using pseudo-high-level-lang instead of 3AC. C1 is a contant representing W*NCOLS. W is the width of an element, and NCOLS is the number of columns in each of the arrays.


for (j = 0; j <= 30; j++) {
  for (k = 0; k <= 30; k++) {
    t2 = C1 * j + W * k;
    x[t2] = y[t2] + z[t2];
  }
}

Code-motion applied

The expression C1*j is constant in the inner loop and can be taken out:

for (j = 0; j <= 30; j++) {
  t1 = C1 * j
  for (k = 0; k <= 30; k++) {
    t2 = t1 + W * k;
    x[t2] = y[t2] + z[t2];
  }
}

Loop reduction in strength applied

We can replace mults by additions in the inner and outer loop, since t1 and t2 are induction variables.

t1 = 0;
for (j = 0; j <= 30; j++) {
  t2 = t1 + 0;
  for (k = 0; k <= 30; k++) {
    x[t2] = y[t2] + z[t2];
    t2 = t2 + W;
  }
  t1 = t1 + C1;
}

The University of Southern California does not screen or control the content on this website and thus does not guarantee the accuracy, integrity, or quality of such content. All content on this website is provided by and is the sole responsibility of the person from which such content originated, and such content does not necessarily reflect the opinions of the University administration or the Board of Trustees