2011-02-22

Optimizing the for loop for fun and profit

The three-expression, C style for loop is one of the most useful iteration statements in many programming languages including C#, javascript and perl. However it is commonly used in a sloppy manner and can often be optimized for better performance which is why I have set out to try and explain the inner workings of the for loop in this blog post. This is pretty basic stuff but it seems a fair share of developers has overseen the details of how the for loop really, really works when learning it.



As usual when discussing optimization in programming we talk about gaining microseconds here and there but considering a lot (perhaps most) of the execution time of a program is spent looping, those microseconds may build up to seconds and perhaps even minutes in large systems. I’ll be using C# and Common Intermediate Language (henceforth CIL) to illustrate my case.

Introduction


The three-expression for loop consists of three optional control parameters separated by semi colons:
for (int i = 0; i < 42; i++)
  • The first parameter is an initializing expression used for setting up new variables (that will be local in the scope of the for loop) or to set values to variables that already exist.
  • The second is the loop control expression, as long as it evaluates to true the code in the for loop will keep executing.
  • The third is commonly known as the counting expression and is usually used for changing the value of the variables set up in the initializing expression.
Now that you know the basics of the for loop (though you probably already did), let’s dive into the depths of it. Something in the lines of the following is what you’ll usually see:
for (int i = 0; i < 10000; i++)
{
    //do stuff
}
The above compiles to the following CIL:
IL_00: ldc.i4.0 //Push 0 to the stack
IL_01: stloc.0 //Pop value from the stack into local variable at index 0
IL_02: br.s IL_08 //Branch to a target instruction at IL_08
IL_04: ldloc.0 //Load local variable @ index 0 onto the evaluation stack
IL_05: ldc.i4.1 //Push 1 to the stack
IL_06: add  //Add two numeric values, return a new numeric value
IL_07: stloc.0  //Pop value from the stack into local variable at index 0
IL_08: ldloc.0 //Load local variable @ index 0 on the evaluation stack
IL_09: ldc.i4 10 27 00 00 //Push 10000 onto the stack
IL_0E: blt.s IL_04 //Branch to the target instruction at IL_04
                   //if the first value is less than the second value
Let’s try and translate the above, line by line (probably skip this if you were able to follow the above CIL):

Instruction label CIL instruction Explanation
IL_00
IL_01
ldc.i4.0
stloc.0
int i = 0 (initializing expression)
IL_02 br.s IL_08 When the loop starts executing, first jump to the end of the loop to evaluate whether the loop control expression is true just to make sure that the for loop is to be started at all.
IL_08
IL_09
IL_0E
ldloc.0
ldc.i4 10 27 00 00
blt.s IL_04
Is it true that i < 1000? If so, start executing the loop by jumping to the beginning of it (loop control expression)
//do stuff, execute code logic inside for loop
IL_04
IL_05
IL_06
IL_07
ldloc.0
ldc.i4.1
add
stloc.0
i++ (counting expression)
IL_08
IL_09
IL_0E
ldloc.0
ldc.i4 10 27 00 00
blt.s IL_04
Again, is it true that i < 1000? Keep jumping to the top of the loop (where the logic of the for loop lives) until this expression evaluates to false. (loop control expression)

As you can tell, the initializing expression of the for loop is executed once before the loop starts executing, the control expression is executed every repetition at the very end of the loop and the counting expression is executed once inside the loop body, on every repetition of the loop just before the loop control expression. The logic of the for loop lives at the very top of the loop body. Note that the first thing that happens after the initializing expression has been executed is that the loop control expression is executed at the very end of the loop body before jumping back to the top of the loop and any logic is actually executed.

The problem


One of the main usages of the for loop is to iterate through lists, tables or collections of various kinds and perform operations on the values or objects within. Using a hard coded value in the loop control expression is sometimes useful but most of the time the for loop will use the size of a table in its loop control expression. Consider the following example, counting the occurrences of the character ‘x’ in the string s:
string s = new String('x', 10000);
int counter = 0;
 
for (int i = 0; i < s.Length; i++)
{
    if (s[i].Equals('x'))
    {
        counter++;
    }
}
 
Console.Write("There are {0} occurrences of x in s.", counter);
Did you notice the opportunity for optimization in the for-expression of the above code example? As you may recall from the CIL listing in the introduction, the initializing expression is executed once before the loop starts: int i = 0; and the loop control is executed inside the loop, every iteration, at the very end of the loop body: i < s.Length;. Assuming checking the Length of the string s uses some CPU cycles (which it does) and that we know the length of the string will not change during the lifetime of the for loop since we are only reading from it, is it really desirable to check the length of the string every iteration? The answer is of course no. Let’s try and adjust the example code so it reads as follows:
string s = new String('x', 10000);
int counter = 0;
 
for (int i = s.Length - 1; i >= 0; --i)
{
    if (s[i].Equals('x'))
    {
        counter++;
    }
}
 
Console.Write("There are {0} occurrences of x in s.", counter);
In the adjusted example above we have moved s.Length to the initializing expression so that it is only executed once, before the loop starts executing. The loop control expression now checks to see if i is 0 or higher, that is, within the boundaries of the string. Since we turned the numbers backwards we count down in the counting expression instead of counting up. Every char in the string will now be evaluated from the end of the string to the front, starting at string index 9999 iterating back to index 0. Since the order in which the chars are checked are of no importance the end result will be exactly the same – only the result will be produced faster. My benchmark code for this example indicates that the backward for loop executes for about 70% of the time the forward loop needs to complete the exact same count. In my opinion, this is a number large enough not to ignore. So, why are people using the forward loop? The short answer is “because order matters (sometimes)”. For the long answer, consider the following example:
string s = "abcdefgh";
 
for (int i = 0; i < s.Length - 1; i++)
{
    Console.Write("In the alphabet {0} comes before {1}", s[i], s[i+1]);
}
The above can only produce the desired output if it is executed forwards (or if the order of the chars in the string comes in a backward order which would be a silly way of solving the problem) since the order of characters in the string are important for the resulting output.

Conclusion


The three-expression for loop takes three control parameters of which the first is executed once and the other two are executed each iteration of the loop body. Most of the time backward style loop logic is faster than a forward loop and works just as well and is usually not less readable for a third party. In systems that process a lot of data in loops there may be seconds to be gained from constructing for loops in a more optimized way alone. Upon implementing a for loop one should at least consider whether the order of things matters to the logic of the loop, if it is possible to place an expression that uses CPU cycles in the initializing expression of the loop since it will only run once and construct the loop so that it is as fast as possible for the given situation.

Have fun optimizing your sloppy for loops!

No comments:

Post a Comment