Analysis of an Empty Arduino Sketch.
(originally posted in the Arduino Forums. http://www.arduino.cc/cgi-bin/yabb2/YaBB.pl?num=1283329855 )
This comes up relatively frequently. A trivial sketch, not even referencing any of the Arduino functions, ends up occupying over 400 bytes of flash memory. Is the compiler THAT inefficient, that a few lines of code that ought to generate single AVR instructions generate 400 bytes instead?
Well, NO! Of course not. Most of the space is occupied by overhead functions that you might need in a sketch, whether or not you actually use them. The Arduino environment ends up doing a pretty good job of excluding unused functions (if you don't use "digitalWrite", the code that implements it is not included), but it's not perfect.
I decided to compile an empty sketch and analyze the code produced, to see which pieces remained, and whether they looked like they were reasonably optimized. Here are the results.
First, the empty sketch:
void setup(){
}
void loop(){
}
Next, the breakdown. I'm not going to boggle everyone with all of the assembly language produced; just some analysis (I'll put the assembly in the next message in this thread.)
Let's get right to those empty setup() and loop() functions. They end up compiling to bare "return" instructions; two bytes each!
;;; Empty setup() function
;;; length 2 bytes
;;; Provided by: user sketch
;;; required by: Arduino environment
;;; Empty loop() function
;;; length 2 bytes
;;; Provided by: user sketch
;;; required by: Arduino environment
The setup and loop functions in a user sketch are called by another function, "main", which is the traditional C/C++ language main function. main() in turn is called by startup code generated by the C compiler. This string of function calls occupies another 26 bytes (8 + 14 + 4):
;;; Linkage to main() program
;;; length 8 bytes
;;; provided by: gcc compiler.
;;; main()
;;; length: 14 bytes
;;; provided by: Arduino environment
;;; required by: C language convention
;;; exit and __stop_program
;;; length: 4 bytes
;;; provided by: gcc C compiler
;;; required by: nothing.
;;; Comment: pretty much unused in the Arduino environment.
main() also calls init(), an Arduino environment function that sets up the AVR peripherals (Timers, A-D converter, etc) to the state that the rest of the Arduino functions are expecting. init() is 114 bytes of code, and is one of the few places where the code and/or the compiler did some obviously inefficient things. But it only executes once anyway, and 114 bytes is not a lot compared to the 7k of space you have even on an ATmega8, so it's not really worth optimizing. Really.
;;; init()
;;; Length: 114 bytes
;;; provided by: Arduino Environment
;;; Required by: Arduino Environment, user sketches
;;; Comments: initializes peripherals (especially timers) as expected by
;;; the ISR and PWM output, and so on.
;;; The compiler seems to do a particularly poor job of optimizing
;;; what ought to be straightforward code.
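A likely reason init() looks wasteful is that every sbi() is its own read-modify-write of a volatile hardware register, and the compiler is not allowed to merge two accesses to a volatile object into one. The sketch below illustrates this on a PC. It assumes a made-up variable fake_reg standing in for a register such as TCCR0A, and an sbi() macro of the same shape as the one in the Arduino core; the bit numbers match the "ori 0x02" and "ori 0x01" seen in the listing.

```c
#include <stdint.h>

/* Hypothetical stand-in for a hardware register such as TCCR0A, so the
   sketch can be compiled on a PC. It is volatile, like the real thing. */
static volatile uint8_t fake_reg;

/* Same shape as the Arduino core's sbi(): set a single bit with |=. */
#define sbi(reg, bit) ((reg) |= (uint8_t)(1u << (bit)))

/* Bit numbers used for TCCR0A in the listing. */
enum { WGM00 = 0, WGM01 = 1 };

/* Two separate volatile read-modify-write sequences, as in init(). */
uint8_t set_bits_separately(void)
{
    fake_reg = 0;
    sbi(fake_reg, WGM01);
    sbi(fake_reg, WGM00);
    return fake_reg;
}

/* One combined read-modify-write that ends with the same value. */
uint8_t set_bits_combined(void)
{
    fake_reg = 0;
    fake_reg |= (uint8_t)((1u << WGM01) | (1u << WGM00));
    return fake_reg;
}
```

Both functions leave the register holding 3. On the AVR, the first form costs two in/ori/out sequences and the second only one, which is the kind of saving a hand-tuned init() could make.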
Now, before the startup code calls main(), it has to do some initialization as "required" by the C standards. It sets up a stack, and makes sure the CPU status register is in a known state. Initialized variables are copied from flash to RAM, and uninitialized variables are cleared to zero. This code will be present whether or not you have any variables at all.
;;; Basic core startup code;
;;; Length 12 bytes;
;;; Provided by: gcc Compiler
;;; required by: gcc Compiler
;;; Copy initialized data from Flash to RAM.
;;; Length 22 bytes
;;; Provided by: gcc compiler
;;; required by: any sketch using initialized data.
;;; Clear uninitialized data to 0s.
;;; length 16 bytes
;;; Provided by: gcc compiler
;;; required by: C language specification.
;;; Comments: not necessary if there are no uninitialized variables.
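In C terms, those two startup loops amount to a memcpy() followed by a memset(). Here is a minimal sketch that can be run on a PC; the arrays flash_image and ram, and the sizes, are invented for illustration and are not the real linker symbols. In the empty sketch the copy loop actually runs zero times (start and end are both 0x0100), while the clear loop covers 0x0100 through 0x0108: 9 bytes, which matches the two 32-bit counters and one fraction byte used by the timer interrupt.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes for illustration only. */
#define DATA_SIZE 4
#define BSS_SIZE  9

/* Stand-in for the initial values of .data, as stored in flash. */
static const uint8_t flash_image[DATA_SIZE] = { 0x12, 0x34, 0x56, 0x78 };

/* Stand-in for RAM: .data first, .bss immediately after it. */
static uint8_t ram[DATA_SIZE + BSS_SIZE];

void startup_init_memory(void)
{
    /* Pretend RAM holds garbage at power-up. */
    memset(ram, 0xAA, sizeof ram);

    /* __do_copy_data: copy initialized variables from flash to RAM. */
    memcpy(ram, flash_image, DATA_SIZE);

    /* __do_clear_bss: zero every variable without an initializer. */
    memset(ram + DATA_SIZE, 0, BSS_SIZE);
}

/* Accessor so callers can inspect the simulated RAM. */
const uint8_t *ram_contents(void)
{
    return ram;
}
```

After calling startup_init_memory(), the first four bytes of ram hold the flash values and the remaining nine are zero, which is what the C language promises before main() starts.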
Now, the Arduino environment uses a timer interrupt to maintain the millis() clock, uses interrupts for the serial port, and allows users to attach interrupts to a couple of the pins. The AVR uses "vectored interrupts", which means that each potential source of interrupts has a function ("vector") registered to it. The AVR used on the Arduino (ATmega168/328) has 26 interrupt vectors, counting RESET. Each table entry is a 4-byte jmp instruction, so the table occupies 104 bytes.
;;; Table of interrupt vectors
;;; Size: 104 bytes.
;;; Provided by: gcc C compiler
;;; Required by: RESET, Timer, UART, etc.
;;; Comment: in theory, unused interrupt vectors could hold other data.
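The arithmetic behind the table is simple enough to write down. This is only an illustrative sketch (the names VECTOR_COUNT, VECTOR_BYTES, and the two helper functions are made up for the example), assuming 26 entries of one 4-byte jmp each, as in the listing.

```c
#include <stdint.h>

enum {
    VECTOR_COUNT = 26, /* interrupt sources, RESET included */
    VECTOR_BYTES = 4   /* each entry is one 4-byte jmp instruction */
};

/* Total flash used by the vector table. */
uint16_t vector_table_size(void)
{
    return VECTOR_COUNT * VECTOR_BYTES;
}

/* Flash address of entry n (entry 0 is RESET). */
uint16_t vector_address(uint8_t n)
{
    return (uint16_t)(n * VECTOR_BYTES);
}
```

vector_table_size() gives 104, and vector_address(16) gives 0x40, which is where the listing shows the jump to __vector_16 (the Timer0 overflow handler). The first byte after the table is at 0x68, where the startup code begins.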
Finally, there is the timer interrupt service routine itself, which is present and running whether you use it or not. This is 142 bytes long, which is pretty long (especially for an interrupt service routine). Unfortunately, this is already the "optimized" version; it ends up having to maintain TWO 32-bit counters in memory for the sake of backward compatibility, and you get to see firsthand just how inefficient 32-bit math can be on an 8-bit CPU. Each load/increment/store takes just under 40 bytes (38 in the listing), plus overhead for saving the registers used, plus the math to keep track of milliseconds when your interrupt happens every 1.024 ms...
;;; Timer0 interrupt service routine
;;; Length: 142 bytes
;;; Provided by: Arduino Environment
;;; Required by: Arduino Environment (millis(), delay(), etc)
;;; Comments: long due to several 32-bit variable modifications.
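The millisecond bookkeeping is easier to follow when run on a PC. The sketch below reuses the constants visible in the listing (add 1 ms and 3 fraction units per overflow, carry at 125) and the same arithmetic as the ISR body; the helper millis_after_overflows() is invented for the example. One fraction unit is 8 microseconds, so the 24 microseconds left over from each 1.024 ms tick are 3 units, and 125 units make a full millisecond.

```c
#include <stdint.h>

/* Constants as they appear in the listing. */
#define MILLIS_INC 1u
#define FRACT_INC  3u
#define FRACT_MAX  125u

static uint32_t timer0_millis;
static uint8_t  timer0_fract;

/* Same arithmetic as the ISR body, minus the interrupt plumbing. */
void timer0_overflow(void)
{
    uint32_t m = timer0_millis;
    uint8_t  f = timer0_fract;

    m += MILLIS_INC;
    f = (uint8_t)(f + FRACT_INC);
    if (f >= FRACT_MAX) {
        f = (uint8_t)(f - FRACT_MAX);
        m += 1;
    }

    timer0_fract  = f;
    timer0_millis = m;
}

/* Hypothetical helper: run n overflows from zero, return the ms count. */
uint32_t millis_after_overflows(uint32_t n)
{
    timer0_millis = 0;
    timer0_fract  = 0;
    while (n--) {
        timer0_overflow();
    }
    return timer0_millis;
}
```

After 125 overflows (125 x 1.024 ms) the count is exactly 128, and after 1000 overflows it is 1024, so the fraction byte keeps millis() from drifting even though the interrupt period is not a whole millisecond.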
Raw Assembly Listing
Here's the actual assembly language code. Note that this is arranged somewhat differently than it was discussed in the previous posting. This message has it just as the compiler produced it, rather than having been re-ordered for clarity of explanation (hah!)
;;; This is the result of compiling an "empty" Arduino sketch (v 0018)
;;;
;;; void setup() {}
;;; void loop() {}
;;;
;;; The idea is to explain why doing nothing takes 400+ bytes.
Disassembly of section .text:
;;; Table of interrupt vectors
;;; Size: 104 bytes.
;;; Provided by: gcc C compiler
;;; Required by: RESET, Timer.
;;; Comment: in theory, unused interrupt vectors could hold other data.
VectorTable
{
0: 0c 94 34 00 jmp 0x68 ; 0x68 <__ctors_end>
4: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>
8: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>
c: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>
10: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>
14: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>
18: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>
1c: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>
20: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>
24: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>
28: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>
2c: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>
30: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>
34: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>
38: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>
3c: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>
40: 0c 94 5c 00 jmp 0xb8 ; 0xb8 <__vector_16>
44: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>
48: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>
4c: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>
50: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>
54: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>
58: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>
5c: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>
60: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>
64: 0c 94 51 00 jmp 0xa2 ; 0xa2 <__bad_interrupt>
__startup:
;;; Basic core startup code;
;;; Length 12 bytes;
;;; Provided by: gcc Compiler
;;; required by: gcc Compiler
68: 11 24 eor r1, r1
6a: 1f be out 0x3f, r1 ; Initialize status reg = 0
6c: cf ef ldi r28, 0xFF
6e: d4 e0 ldi r29, 0x04
70: de bf out 0x3e, r29 ; Initialize Stack Pointer
72: cd bf out 0x3d, r28 ; ...
00000074 <__do_copy_data>:
;;; Copy initialized data from Flash to RAM.
;;; Length 22 bytes
;;; Provided by: gcc compiler
;;; required by: any sketch using initialized data.
74: 11 e0 ldi r17, 0x01 ; 1
76: a0 e0 ldi r26, 0x00 ; 0
78: b1 e0 ldi r27, 0x01 ; 1
7a: e0 ec ldi r30, 0xC0 ; 192
7c: f1 e0 ldi r31, 0x01 ; 1
7e: 02 c0 rjmp .+4 ; 0x84 <.do_copy_data_start>
00000080 <.do_copy_data_loop>:
80: 05 90 lpm r0, Z+
82: 0d 92 st X+, r0
00000084 <.do_copy_data_start>:
84: a0 30 cpi r26, 0x00 ; 0
86: b1 07 cpc r27, r17
88: d9 f7 brne .-10 ; 0x80 <.do_copy_data_loop>
0000008a <__do_clear_bss>:
;;; Clear uninitialized data to 0s.
;;; length 16 bytes
;;; Provided by: gcc compiler
;;; required by: C language specification.
;;; Comments: not necessary if there are no uninitialized variables.
8a: 11 e0 ldi r17, 0x01 ; 1
8c: a0 e0 ldi r26, 0x00 ; 0
8e: b1 e0 ldi r27, 0x01 ; 1
90: 01 c0 rjmp .+2 ; 0x94 <.do_clear_bss_start>
00000092 <.do_clear_bss_loop>:
92: 1d 92 st X+, r1
00000094 <.do_clear_bss_start>:
94: a9 30 cpi r26, 0x09 ; 9
96: b1 07 cpc r27, r17
98: e1 f7 brne .-8 ; 0x92 <.do_clear_bss_loop>
;;; Linkage to main() program
;;; length 8 bytes
;;; provided by: gcc compiler.
9a: 0e 94 55 00 call 0xaa ; 0xaa <main>
9e: 0c 94 de 00 jmp 0x1bc ; 0x1bc <_exit>
000000a2 <__bad_interrupt>:
a2: 0c 94 00 00 jmp 0 ; 0x0 <__vectors>
000000a6 <setup>:
void setup(){
;;; Empty setup() function
;;; length 2 bytes
;;; Provided by: user sketch
;;; required by: Arduino environment
}
a6: 08 95 ret
000000a8 <loop>:
void loop(){
;;; Empty loop() function
;;; length 2 bytes
;;; Provided by: user sketch
;;; required by: Arduino environment
}
a8: 08 95 ret
000000aa <main>:
;;; main()
;;; length: 14 bytes
;;; provided by: Arduino environment
;;; required by: C language convention
int main(void)
{
init();
aa: 0e 94 a4 00 call 0x148 ; 0x148 <init>
setup();
ae: 0e 94 53 00 call 0xa6 ; 0xa6 <setup>
for (;;)
loop();
b2: 0e 94 54 00 call 0xa8 ; 0xa8 <loop>
b6: fd cf rjmp .-6 ; 0xb2 <main+0x8>
;;; Timer0 interrupt service routine
;;; Length: 142 bytes
;;; Provided by: Arduino Environment
;;; Required by: Arduino Environment (millis(), delay(), etc)
;;; Comments: long due to several 32-bit variable modifications.
SIGNAL(TIMER0_OVF_vect)
{
000000b8 <__vector_16>:
b8: 1f 92 push r1
ba: 0f 92 push r0
bc: 0f b6 in r0, 0x3f ; 63
be: 0f 92 push r0
c0: 11 24 eor r1, r1
c2: 2f 93 push r18
c4: 3f 93 push r19
c6: 8f 93 push r24
c8: 9f 93 push r25
ca: af 93 push r26
cc: bf 93 push r27
// copy these to local variables so they can be stored in registers
// (volatile variables must be read from memory on every access)
unsigned long m = timer0_millis;
ce: 80 91 04 01 lds r24, 0x0104
d2: 90 91 05 01 lds r25, 0x0105
d6: a0 91 06 01 lds r26, 0x0106
da: b0 91 07 01 lds r27, 0x0107
unsigned char f = timer0_fract;
de: 30 91 08 01 lds r19, 0x0108
m += MILLIS_INC;
e2: 01 96 adiw r24, 0x01 ; 1
e4: a1 1d adc r26, r1
e6: b1 1d adc r27, r1
f += FRACT_INC;
e8: 23 2f mov r18, r19
ea: 2d 5f subi r18, 0xFD ; 253
if (f >= FRACT_MAX) {
ec: 2d 37 cpi r18, 0x7D ; 125
ee: 20 f0 brcs .+8 ; 0xf8 <__vector_16+0x40>
f -= FRACT_MAX;
f0: 2d 57 subi r18, 0x7D ; 125
m += 1;
f2: 01 96 adiw r24, 0x01 ; 1
f4: a1 1d adc r26, r1
f6: b1 1d adc r27, r1
}
timer0_fract = f;
f8: 20 93 08 01 sts 0x0108, r18
timer0_millis = m;
fc: 80 93 04 01 sts 0x0104, r24
100: 90 93 05 01 sts 0x0105, r25
104: a0 93 06 01 sts 0x0106, r26
108: b0 93 07 01 sts 0x0107, r27
timer0_overflow_count++;
10c: 80 91 00 01 lds r24, 0x0100
110: 90 91 01 01 lds r25, 0x0101
114: a0 91 02 01 lds r26, 0x0102
118: b0 91 03 01 lds r27, 0x0103
11c: 01 96 adiw r24, 0x01 ; 1
11e: a1 1d adc r26, r1
120: b1 1d adc r27, r1
122: 80 93 00 01 sts 0x0100, r24
126: 90 93 01 01 sts 0x0101, r25
12a: a0 93 02 01 sts 0x0102, r26
12e: b0 93 03 01 sts 0x0103, r27
}
132: bf 91 pop r27
134: af 91 pop r26
136: 9f 91 pop r25
138: 8f 91 pop r24
13a: 3f 91 pop r19
13c: 2f 91 pop r18
13e: 0f 90 pop r0
140: 0f be out 0x3f, r0 ; 63
142: 0f 90 pop r0
144: 1f 90 pop r1
146: 18 95 reti
;;; init()
;;; Length: 114 bytes
;;; provided by: Arduino Environment
;;; Required by: Arduino Environment, user sketches
;;; Comments: initializes peripherals (especially timers) as expected by
;;; the ISR and PWM output, and so on.
;;; The compiler seems to do a particularly poor job of optimizing
;;; what ought to be straightforward code.
00000148 <init>:
void init()
{
// this needs to be called before setup() or some functions won't
// work there
sei();
148: 78 94 sei
// on the ATmega168, timer 0 is also used for fast hardware pwm
// (using phase-correct PWM would mean that timer 0 overflowed half as often
// resulting in different millis() behavior on the ATmega8 and ATmega168)
#if !defined(__AVR_ATmega8__)
sbi(TCCR0A, WGM01);
14a: 84 b5 in r24, 0x24 ; 36
14c: 82 60 ori r24, 0x02 ; 2
14e: 84 bd out 0x24, r24 ; 36
sbi(TCCR0A, WGM00);
150: 84 b5 in r24, 0x24 ; 36
152: 81 60 ori r24, 0x01 ; 1
154: 84 bd out 0x24, r24 ; 36
// set timer 0 prescale factor to 64
#if defined(__AVR_ATmega8__)
sbi(TCCR0, CS01);
sbi(TCCR0, CS00);
#else
sbi(TCCR0B, CS01);
156: 85 b5 in r24, 0x25 ; 37
158: 82 60 ori r24, 0x02 ; 2
15a: 85 bd out 0x25, r24 ; 37
sbi(TCCR0B, CS00);
15c: 85 b5 in r24, 0x25 ; 37
15e: 81 60 ori r24, 0x01 ; 1
160: 85 bd out 0x25, r24 ; 37
#endif
// enable timer 0 overflow interrupt
#if defined(__AVR_ATmega8__)
sbi(TIMSK, TOIE0);
#else
sbi(TIMSK0, TOIE0);
162: ee e6 ldi r30, 0x6E ; 110
164: f0 e0 ldi r31, 0x00 ; 0
166: 80 81 ld r24, Z
168: 81 60 ori r24, 0x01 ; 1
16a: 80 83 st Z, r24
// this is better for motors as it ensures an even waveform
// note, however, that fast pwm mode can achieve a frequency of up
// 8 MHz (with a 16 MHz clock) at 50% duty cycle
// set timer 1 prescale factor to 64
sbi(TCCR1B, CS11);
16c: e1 e8 ldi r30, 0x81 ; 129
16e: f0 e0 ldi r31, 0x00 ; 0
170: 80 81 ld r24, Z
172: 82 60 ori r24, 0x02 ; 2
174: 80 83 st Z, r24
sbi(TCCR1B, CS10);
176: 80 81 ld r24, Z
178: 81 60 ori r24, 0x01 ; 1
17a: 80 83 st Z, r24
// put timer 1 in 8-bit phase correct pwm mode
sbi(TCCR1A, WGM10);
17c: e0 e8 ldi r30, 0x80 ; 128
17e: f0 e0 ldi r31, 0x00 ; 0
180: 80 81 ld r24, Z
182: 81 60 ori r24, 0x01 ; 1
184: 80 83 st Z, r24
// set timer 2 prescale factor to 64
#if defined(__AVR_ATmega8__)
sbi(TCCR2, CS22);
#else
sbi(TCCR2B, CS22);
186: e1 eb ldi r30, 0xB1 ; 177
188: f0 e0 ldi r31, 0x00 ; 0
18a: 80 81 ld r24, Z
18c: 84 60 ori r24, 0x04 ; 4
18e: 80 83 st Z, r24
#endif
// configure timer 2 for phase correct pwm (8-bit)
#if defined(__AVR_ATmega8__)
sbi(TCCR2, WGM20);
#else
sbi(TCCR2A, WGM20);
190: e0 eb ldi r30, 0xB0 ; 176
192: f0 e0 ldi r31, 0x00 ; 0
194: 80 81 ld r24, Z
196: 81 60 ori r24, 0x01 ; 1
198: 80 83 st Z, r24
// set a2d prescale factor to 128
// 16 MHz / 128 = 125 KHz, inside the desired 50-200 KHz range.
// XXX: this will not work properly for other clock speeds, and
// this code should use F_CPU to determine the prescale factor.
sbi(ADCSRA, ADPS2);
19a: ea e7 ldi r30, 0x7A ; 122
19c: f0 e0 ldi r31, 0x00 ; 0
19e: 80 81 ld r24, Z
1a0: 84 60 ori r24, 0x04 ; 4
1a2: 80 83 st Z, r24
sbi(ADCSRA, ADPS1);
1a4: 80 81 ld r24, Z
1a6: 82 60 ori r24, 0x02 ; 2
1a8: 80 83 st Z, r24
sbi(ADCSRA, ADPS0);
1aa: 80 81 ld r24, Z
1ac: 81 60 ori r24, 0x01 ; 1
1ae: 80 83 st Z, r24
// enable a2d conversions
sbi(ADCSRA, ADEN);
1b0: 80 81 ld r24, Z
1b2: 80 68 ori r24, 0x80 ; 128
1b4: 80 83 st Z, r24
// here so they can be used as normal digital i/o; they will be
// reconnected in Serial.begin()
#if defined(__AVR_ATmega8__)
UCSRB = 0;
#else
UCSR0B = 0;
1b6: 10 92 c1 00 sts 0x00C1, r1
#endif
1ba: 08 95 ret
;;; exit and __stop_program
;;; length: 4 bytes
;;; provided by: gcc C compiler
;;; required by: nothing.
;;; Comment: pretty much unused in the Arduino environment.
000001bc <_exit>:
1bc: f8 94 cli
000001be <__stop_program>:
1be: ff cf rjmp .-2 ; 0x1be <__stop_program>
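As a sanity check, the lengths quoted in the section comments can be added up and compared with where the listing ends. The tally below is a rough sketch; the array and function names are made up. It assumes, from the addresses in the listing, that the quoted lengths for the Timer0 ISR and init() stop just short of their final reti/ret, and that the 4-byte __bad_interrupt stub was not counted anywhere.

```c
#include <stddef.h>
#include <stdint.h>

/* Lengths quoted in the section comments of the listing. */
static const uint16_t quoted_lengths[] = {
    104, /* interrupt vector table */
    12,  /* core startup */
    22,  /* copy initialized data */
    16,  /* clear uninitialized data */
    8,   /* linkage to main() */
    2,   /* setup() */
    2,   /* loop() */
    14,  /* main() */
    142, /* Timer0 ISR */
    114, /* init() */
    4    /* exit and __stop_program */
};

/* Bytes present in the listing but left out of the quoted lengths. */
static const uint16_t unquoted_lengths[] = {
    4, /* __bad_interrupt: one jmp */
    2, /* reti that ends the Timer0 ISR */
    2  /* ret that ends init() */
};

static uint16_t sum(const uint16_t *v, size_t n)
{
    uint16_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total = (uint16_t)(total + v[i]);
    }
    return total;
}

uint16_t quoted_total(void)
{
    return sum(quoted_lengths, sizeof quoted_lengths / sizeof quoted_lengths[0]);
}

uint16_t listing_total(void)
{
    return (uint16_t)(quoted_total() +
        sum(unquoted_lengths, sizeof unquoted_lengths / sizeof unquoted_lengths[0]));
}
```

quoted_total() comes to 440 bytes and listing_total() to 448, which is 0x1c0: the address just past the final rjmp at 0x1be. So the "400+ bytes" is fully accounted for.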