While listening to this week's embedded.fm episode (chriswhite), I heard this Arduino tip:
If you use down-counters instead of up-counters in your loop(), you save a clock cycle on each iteration,
because counting down is a core operation that takes 1 cycle, while counting up takes 2 cycles.
Handy for those cases where you have to squeeze that last cycle out of your loop().
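
To make the tip concrete, here is a minimal sketch of the two loop shapes side by side. The function names sum_up and sum_down are my own, not from the episode, and the code is plain C++ so it also compiles off-target. Whether the down-counting version really saves a cycle depends on the compiler, the optimisation level and the target chip, so treat this as something to measure, not a guarantee.

```cpp
#include <stdint.h>

// Hypothetical helper: up-counting loop.
// The index is compared against n on every pass.
uint16_t sum_up(const uint8_t *data, uint8_t n) {
    uint16_t total = 0;
    for (uint8_t i = 0; i < n; i++) {
        total += data[i];
    }
    return total;
}

// Hypothetical helper: down-counting loop.
// The loop ends when the counter reaches zero, so the end test
// is a comparison against zero instead of against n.
uint16_t sum_down(const uint8_t *data, uint8_t n) {
    uint16_t total = 0;
    for (uint8_t i = n; i != 0; i--) {
        total += data[i - 1];  // i runs n..1, so index with i - 1
    }
    return total;
}
```

Both functions return the same sum; only the direction of the counter differs. An optimising compiler may already turn one form into the other, so it is worth checking the generated assembly before relying on the saving.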