Introduction: Programming STM32 ARMs under Linux: A Tutorial from Scratch


Programming STM32 F2, F4 ARMs under Linux: A Tutorial from Scratch

You've got a (64 bit) Linux box running Debian (or similar e.g. Ubuntu) and want to get started with software development for the bare metal STM32 ARM microcontrollers from ST? Welcome.
You want to use the Cortex-M4 devices from the STM32F4 series with floating point (FPU)? Even better! Here's how to do it.

Introduction STM32, STM32F2, STM32F4

The STM32 family from ST is an attractive family of ARM-based 32 bit microcontrollers for the following reasons:

Cortex-M3 (STM32F2) or Cortex-M4 (STM32F4) core, the latter with FPU for 32-bit float. The F4 runs up to 168 MHz.
The whole family, especially the F2 and F4, is pretty much pin-compatible. E.g. you can drop-in replace the 176-pin version of the F207 with the F407 by changing no more than 1 pin (the "reserved for future use" pin).
The family offers similar functionality in several different package sizes (64, 100, 144, 176 pins) and several different memory sizes (up to 1 Mbyte for the F4). Write code once, use often.
Lots of peripherals in the high end versions (F207, F407). (E.g. 17 timers, some up to 32 bit, 3x I²C, 3x SPI,...)

Obtaining the GNU Toolchain Software

GCC version 4.6 and later (maybe also 4.5) support the Cortex-M4 including floating point.

I've got a 64-bit Debian (unstable) installation, so forget about precompiled binaries: they seem to be outdated, link to a different set of libraries, or are 32-bit versions. Luckily, in an awesome effort, some other people have put together a fully automated build script for the GNU toolchain. It's called summon-arm-toolchain. And it really works!

First, install required packages. Some might already be installed. Here's a line copied from the summon-arm-toolchain's README, however I usually locate the packages in aptitude and select them by hand.

bash# apt-get install flex bison libgmp3-dev libmpfr-dev libncurses5-dev libmpc-dev autoconf texinfo build-essential libftdi-dev libusb-1.0-0-dev
bash# apt-get build-dep gcc-4.5

Here is what I do: get the summon-arm-toolchain and compile the tools. I'm using the "dev" (development) branch because the "HEAD" branch did not compile successfully and used gcc-4.5 instead of gcc-4.6.

bash# git clone git://github.com/esden/summon-arm-toolchain
bash# cd summon-arm-toolchain
bash# git checkout remotes/origin/dev
bash# ./summon-arm-toolchain PREFIX=/opt/ARM/arm-linaro-eabi-4.6/  OOCD_EN=1 TARGET=arm-none-eabi LIBSTM32_EN=1 USE_LINARO=1 CPUS=2

Side note: I disabled OpenOCD (OOCD_EN=0) and built that separately from the GIT sources because at the time of writing, the STM32F4 support was missing. Only do that if you experience errors with the F4.

The script should complete without error installing binutils, GCC, GDB and several libs.

Note: The PREFIX is the installation directory, which must be writable by the user. You can use a folder in your home dir (~/arm). It must be created before executing the commands above. What I do is the following: Create the install dir (as root) somewhere under /opt/ARM/ so that all users will be able to use it. Before compiling, I chown it to my primary user UID and GID. After compiling, I chown -R the whole PREFIX back to root.

Obtaining and Setting Up OpenOCD

You only need this chapter if you did not install OpenOCD via the summon-arm-toolchain above.

First, get the GIT tree of OpenOCD and compile it:

bash# git clone git://openocd.git.sourceforge.net/gitroot/openocd/openocd
bash# cd openocd
bash# ./bootstrap
bash# ./configure --prefix=/opt/openocd/ --enable-jlink --enable-amtjtagaccel --enable-ft2232_libftdi --enable-buspirate --enable-stlink --enable-ftdi
bash# make -j 2

For me, this fails at the end while building the documentation, with openocd.info not found. You can ignore the error by running make -i; it's the last step anyway and I don't need the documentation.

The compiled OpenOCD binary is in ./src/openocd. To "install" OpenOCD, all I actually do is set up a tiny shell script called ~/bin/openocd which executes the just-compiled binary:

#!/bin/sh
exec ~/path/to/openocd/src/openocd -s ~/path/to/openocd/tcl/ "$@"

The -s argument tells OpenOCD where to find the various configuration scripts. Alternatively, you can install it using make install, which also works as a normal user if the prefix dir (/opt/openocd/) is writable by the user.

NOTE: If you get errors during debugging complaining that the remote 'g' packet reply is too long, then you have to apply this patch to the OpenOCD source. The summon-arm-toolchain does this automatically.

Setting up the Hardware: External JTAG Emulator

[Image: STM32F4 Discovery connected to JTAG emulator.]

For testing, I have the STM32F4 Discovery board and the Amontec JTAGKey-2 USB JTAG "emulator". The latter is based upon the FT2232 and you can use one of the cheaper ones based on the same chip as well.

NOTE: Since I want to use the dedicated JTAG adapter, the on-board adapter of the STM32 Discovery has to be disabled by opening the 2 ST-LINK jumpers near the USB port.

Next, the JTAG connections between the FT2232-based emulator and the STM32 Discovery have to be made. You need 7 pins: GND, VDD, PA13 (TMS), PA14 (TCK), PA15 (TDI), PB3 (TDO), NRST. These are all available on the pin headers.

Connect the USB cable to the STM32 Discovery to power it up and also connect the JTAG emulator's USB to the host computer. (I prefer using a cheap USB hub in between for safety...)

The image shows the STM32F4 Discovery board connected to the JTAG emulator.

I recommend setting up the udev daemon to allow user access to the FT2232-based emulator. You can do this by adding a file z70_ftdi_jtag.rules under /etc/udev/rules.d:

# FT2232 in Amontec JTAGkey2.
SUBSYSTEM=="usb", ATTR{idVendor}=="0403", ATTR{idProduct}=="cff8", GROUP="hackers", MODE="0660"

Of course, you need to replace hackers with your actual group name.

Note that you have to use the actual idVendor and idProduct of your JTAG emulator. The ones above are for the Amontec JTAGKey2. You can find out the IDs by calling dmesg after plugging in the emulator or by calling lsusb.

Here are the udev configuration lines for some other JTAG emulators:

# Segger J-Link JTAG.
SUBSYSTEM=="usb", ATTR{idVendor}=="1366", ATTR{idProduct}=="0101", MODE="660", GROUP="hackers"

# ST-Link/V2 as on STM32Discovery board.
SUBSYSTEM=="usb", ATTR{idVendor}=="0483", ATTR{idProduct}=="3748", MODE="660", GROUP="plugdev"

You may need to run the following as root to make the rules effective:

bash# udevadm control --reload-rules
bash# udevadm trigger

Setting up the Hardware: STM32F4Discovery with integrated ST-Link/V2

Since September 2012, OpenOCD has integrated support for the ST-Link/V2 in the STM32F4Discovery. So, instead of connecting an external JTAG emulator as above, you can directly program and debug the STM32 via the USB connector on the STM32F4Discovery board.

All you have to do for that is use the following openocd.cfg file. Also, don't forget to close the two jumpers next to the text "ST-LINK" on the STM32F4Discovery board.

# openocd.cfg file for STM32F4Discovery board via integrated ST-Link/V2.
source [find interface/stlink-v2.cfg]
source [find target/stm32f4x_stlink.cfg]
reset_config srst_only srst_nogate

And, of course, you may need to add the ST-Link to the udev rules to grant access to normal users: Create a file z70_stlink_jtag.rules under /etc/udev/rules.d:

# ST-Link/V2 as on STM32Discovery board.
SUBSYSTEM=="usb", ATTR{idVendor}=="0483", ATTR{idProduct}=="3748", MODE="660", GROUP="plugdev"

The Minimalistic Hello-World Program

This is a fully functional minimalistic test program for an STM32. Most of it will work with any Cortex-M3 or M4 because it does not make use of any IOs. Since it does not even blink LEDs, the only way to see whether it executes is via the debugger.

The two files are main.c (source code) and stm32.ld (linker script), listed in that order:

// By Wolfgang Wieser, heavily based on:
// http://fun-tech.se/stm32/OlimexBlinky/mini.php

#define STACK_TOP 0x20000800   // just a tiny stack for demo

static void nmi_handler(void);
static void hardfault_handler(void);
int main(void);

// Define the vector table
unsigned int *myvectors[4]
__attribute__ ((section("vectors"))) = {
    (unsigned int *) STACK_TOP,         // stack pointer
    (unsigned int *) main,              // code entry point
    (unsigned int *) nmi_handler,       // NMI handler (not really)
    (unsigned int *) hardfault_handler  // hard fault handler
};


int main(void)
{
    int i=0;

    for(;;)
    {
        i++;
    }
}

void nmi_handler(void)
{
    for(;;);
}

void hardfault_handler(void)
{
    for(;;);
}

stm32.ld (linker script):

/* This will work with STM32 type of microcontrollers.    *
 * The sizes of RAM and flash are specified smaller than  *
 * what most of the STM32 provide to ensure that the demo *
 * program will run on ANY STM32.                         */
MEMORY
{
    ram (rwx) : ORIGIN = 0x20000000, LENGTH = 20K
    rom (rx)  : ORIGIN = 0x00000000, LENGTH = 128K
}

SECTIONS
{
    .  = 0x0;         /* From 0x00000000 */
    .text :
    {
        *(vectors)    /* Vector table */
        *(.text)      /* Program code */
        *(.rodata)    /* Read only data */
    } >rom

    .  = 0x20000000;  /* From 0x20000000 */
    .data :
    {
        *(.data)      /* Data memory */
    } >ram AT > rom

    .bss :
    {
        *(.bss)       /* Zero-filled run time allocate data memory */
    } >ram AT > rom
}

As you can see, the vectors section goes first into the text segment, making sure the stack pointer and reset vector as well as the NMI and hard fault handlers are set up. This is the same for every Cortex-M. Of course, the memory map at the top of the linker script must be adapted to the specific microcontroller in use.
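A side note on the ">ram AT > rom" placement in the linker script: it gives .data a run address in RAM but a load address in flash. The minimal program has no initialised globals, so nothing needs to be copied, but a larger program would typically copy .data and clear .bss before main() runs. What follows is only a sketch of such a loop. The function name and its parameters are hypothetical, and on a real target the pointers would come from linker symbols that the stm32.ld above does not define.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of typical startup work (hypothetical helper, not part of the
 * tarball): copy initialised data from its load address (flash) to its
 * run address (RAM), then zero the .bss range. Ranges are [start, end). */
static void init_data_and_bss(const uint32_t *load,
                              uint32_t *data_start, uint32_t *data_end,
                              uint32_t *bss_start, uint32_t *bss_end)
{
    assert(load && data_start && data_end && bss_start && bss_end);

    /* Word-wise copy of .data from flash to RAM. */
    for (uint32_t *dst = data_start; dst < data_end; )
        *dst++ = *load++;

    /* Zero-fill .bss. */
    for (uint32_t *dst = bss_start; dst < bss_end; )
        *dst++ = 0u;
}
```

On the target, a reset handler would call this once with the section boundaries and then jump to main(); on a PC you can try it out with plain arrays standing in for flash and RAM.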

Here is how to compile this. Make sure the PREFIX is in your PATH.

bash# arm-none-eabi-gcc -I. -c -fno-common -O0 -g -mcpu=cortex-m3 -mthumb main.c
bash# arm-none-eabi-ld -Tstm32.ld -nostartfiles -o main.elf main.o
bash# arm-none-eabi-objcopy -Obinary main.elf main.bin

Look! We have just compiled our first STM32 program. And it's sooooo tiny: Just 56 bytes in the main.bin.

The -mcpu=cortex-m3 will work with both M3 and M4. The 3 steps are compiling, linking and extracting the binary flash content from the ELF file. Of course, you will want to put this into a Makefile at some point.
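Since the vectors section is placed first, main.bin should start with the initial stack pointer followed by the reset vector. As a sanity check on the host, the two words can be decoded from the first 8 bytes of the file. This is a minimal sketch assuming a little-endian image (as on the STM32); the struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* First two entries of a Cortex-M vector table (hypothetical struct). */
struct vector_head {
    uint32_t initial_sp;  /* word 0: initial main stack pointer */
    uint32_t reset;       /* word 1: reset vector, bit 0 set = Thumb */
};

/* Read one little-endian 32-bit word from a byte buffer. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the image is shorter than two words. */
static int decode_vector_head(const uint8_t *img, size_t len,
                              struct vector_head *out)
{
    assert(img != NULL && out != NULL);
    if (len < 8)
        return -1;
    out->initial_sp = read_le32(img);
    out->reset = read_le32(img + 4);
    return 0;
}
```

For the program above, initial_sp should come out as STACK_TOP (0x20000800). The reset value depends on where the linker placed main, so compare it against the listing or the ELF symbols rather than against a fixed number.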

Running the Minimalistic Hello-World Program

We've set up our hardware (the STM32F4 Discovery board) and we've compiled a minimalistic test program. Now, we are going to execute that program on the Cortex-M microcontroller. We will use OpenOCD to write the flash, start and debug the program. The debugger is necessary to see whether the controller actually runs our endless loop incrementing the counter.

First, we need an OpenOCD configuration file called openocd.cfg for our setup:

# Include config files found under /scripts.
source [find interface/jtagkey2.cfg]
source [find target/stm32f2x.cfg]

You can use the stm32f2x.cfg for the F4 as well. This produces some warnings and an error in OpenOCD but works anyway. If you have an F4 and your OpenOCD is new enough, use stm32f4x.cfg instead; that is what was used for the following transcript.

We now need 2 terminals: one for OpenOCD and one for a telnet session to the running OpenOCD daemon. In the transcripts, whatever follows a bash#, > or (gdb) prompt is what is typed into the terminals. Be sure to change to the directory with the minimalistic test program in both terminals.

OpenOCD session:

bash# openocd
Open On-Chip Debugger 0.6.0-dev-00497-ga6cf60c (2012-04-07-21:16)
Licensed under GNU GPL v2
For bug reports, read
http://openocd.sourceforge.net/doc/doxygen/bugs.html
Info : only one transport option; autoselect 'jtag'
1000 kHz
adapter_nsrst_delay: 100
jtag_ntrst_delay: 100
cortex_m3 reset_config sysresetreq
Info : max TCK change to: 30000 kHz
Info : clock speed 1000 kHz
Info : JTAG tap: stm32f4x.cpu tap/device found: 0x4ba00477 (mfg: 0x23b, part: 0xba00, ver: 0x4)
Info : JTAG tap: stm32f4x.bs tap/device found: 0x06413041 (mfg: 0x020, part: 0x6413, ver: 0x0)
Info : stm32f4x.cpu: hardware has 6 breakpoints, 4 watchpoints
Info : accepting 'telnet' connection from 4444
background polling: on
TAP: stm32f4x.cpu (enabled)
target state: halted
target halted due to undefined, current mode: Thread
xPSR: 00000000 pc: 00000000 msp: 00000000

Telnet session:

bash# telnet localhost 4444
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Open On-Chip Debugger
> poll
background polling: on
TAP: stm32f4x.cpu (enabled)
target state: halted
target halted due to undefined, current mode: Thread
xPSR: 00000000 pc: 00000000 msp: 00000000

Now, we will flash the just-compiled binary main.bin onto the STM32 and start the program.

OpenOCD session:

Info : JTAG tap: stm32f4x.cpu tap/device found: 0x4ba00477 (mfg: 0x23b, part: 0xba00, ver: 0x4)
Info : JTAG tap: stm32f4x.bs tap/device found: 0x06413041 (mfg: 0x020, part: 0x6413, ver: 0x0)
target state: halted
target halted due to debug-request, current mode: Thread
xPSR: 0x01000000 pc: 0x08001944 msp: 0x20020000
Info : stm32f4x errata detected - fixing incorrect MCU_IDCODE
Info : device id = 0x10006413
Info : flash size = 1024kbytes
Info : stm32f4x errata detected - fixing incorrect MCU_IDCODE
Info : device id = 0x10006413
Info : flash size = 1024kbytes
flash 'stm32f2x' found at 0x08000000
auto erase enabled
wrote 16384 bytes from file main.bin in 0.799573s (20.011 KiB/s)
Info : JTAG tap: stm32f4x.cpu tap/device found: 0x4ba00477 (mfg: 0x23b, part: 0xba00, ver: 0x4)
Info : JTAG tap: stm32f4x.bs tap/device found: 0x06413041 (mfg: 0x020, part: 0x6413, ver: 0x0)
Info : dropped 'telnet' connection

Telnet session:

> reset halt
JTAG tap: stm32f4x.cpu tap/device found: 0x4ba00477 (mfg: 0x23b, part: 0xba00, ver: 0x4)
JTAG tap: stm32f4x.bs tap/device found: 0x06413041 (mfg: 0x020, part: 0x6413, ver: 0x0)
target state: halted
target halted due to debug-request, current mode: Thread
xPSR: 0x01000000 pc: 0x08001944 msp: 0x20020000
> flash probe 0
stm32f4x errata detected - fixing incorrect MCU_IDCODE
device id = 0x10006413
flash size = 1024kbytes
stm32f4x errata detected - fixing incorrect MCU_IDCODE
device id = 0x10006413
flash size = 1024kbytes
flash 'stm32f2x' found at 0x08000000
> flash write_image erase main.bin 0x08000000
auto erase enabled
wrote 16384 bytes from file main.bin in 0.799573s (20.011 KiB/s)
> reset
JTAG tap: stm32f4x.cpu tap/device found: 0x4ba00477 (mfg: 0x23b, part: 0xba00, ver: 0x4)
JTAG tap: stm32f4x.bs tap/device found: 0x06413041 (mfg: 0x020, part: 0x6413, ver: 0x0)
> exit
Connection closed by foreign host.

(Note that the path to the main.bin is relative to the directory where we have launched OpenOCD.) That's it. The just-compiled minimalistic hello world program is executed by the microcontroller! To verify this, we continue the session with the debugger:

OpenOCD session:

Info : accepting 'gdb' connection from 3333
Warn : acknowledgment received, but no packet pending
Warn : WARNING! The target is already running. All changes GDB did to registers will be discarded! Waiting for target to halt.
Info : dropped 'gdb' connection

GDB session (second terminal):

bash# arm-none-eabi-gdb main.elf
GNU gdb (Linaro GDB) 7.3-2011.10
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "--host=x86_64-unknown-linux-gnu --target=arm-none-eabi".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>...
Reading symbols from /home/blahblahblah/hello-world-stm32/main.elf...done.
(gdb) target remote :3333
Remote debugging using :3333
0x08001944 in ?? ()
(gdb) cont
Continuing.
WARNING! The target is already running. All changes GDB did to registers will be discarded! Waiting for target to halt.
^C
Program received signal SIGINT, Interrupt.
main () at main.c:26
26          i++;
(gdb) print i
$1 = 1685559104
(gdb) quit

Yay, it's running and executing the endless loop! Note that for serious debugging, you would set some additional parameters like the number of hardware breakpoints; see below.

If you see an error complaining about remote 'g' packet reply is too long, then you need to manually compile a self-patched version of OpenOCD; see the chapter about OpenOCD above.

You can download all the files used here and a Makefile and the stm32_flash.pl utility described below via the following tarball:

Source:		stm32-minimalistic-hello-world-1.0.tar.gz [2kb gzipped source tarball]
Version:		1.0 (2012-04-08)
Author:		Wolfgang Wieser (report bugs here)
License:		GNU GPL (Version 2)
Requires:		ARM-GCC

Debugging

As you have seen above, debugging the program can be done at any time. If you feel it misbehaves or got trapped, simply call the debugger and pass the ELF file as argument. Then use target remote :3333 to connect to OpenOCD and start your debugging session.

In order to tell the debugger about available hardware breakpoints, I recommend that you use the following commands as the first commands in the session:

set remote hardware-breakpoint-limit 6
set remote hardware-watchpoint-limit 4

The values are valid for the STM32F2 and STM32F4 microcontrollers, and you will also find them in the OpenOCD logs when connecting to the uC.

Furthermore, for the text console, there is a version of GDB with a text user interface displaying the current code line. It's called gdbtui and comes with the summon-arm-toolchain above.

Automating the Steps

OK, we have seen how things work, but typing all those commands into the telnet and gdb sessions is cumbersome. This is why the minimalistic hello world tarball above also contains a Makefile and the stm32_flash.pl utility.

For flashing, there is a simple way out as found on fun-tech.se: We can write a small perl script to do the work for us:

Put the following script, stm32_flash.pl, into a directory in your search path; I'm using ~/bin. Then use the Makefile listed after it in your source directory.

stm32_flash.pl:

#!/usr/bin/perl
# NOTE: needs libnet-telnet-perl package.
use Net::Telnet;
use Cwd 'abs_path';
 
my $numArgs = $#ARGV + 1;
if($numArgs != 1) {
    die( "Usage ./stm32_flash.pl [main.bin] \n");
}

my $file = abs_path($ARGV[0]);

my $ip = "127.0.0.1";   # localhost
my $port = 4444;

my $telnet = new Net::Telnet (
    Port   => $port,
    Timeout=> 30,
    Errmode=> 'die',
    Prompt => '/>/');

$telnet->open($ip);

print $telnet->cmd('reset halt');
print $telnet->cmd('flash probe 0');
print $telnet->cmd('flash write_image erase '.$file.' 0x08000000');
print $telnet->cmd('reset');
print $telnet->cmd('exit');

print "\n";

Makefile:

TCPREFIX = /opt/ARM/arm-eabi/bin/arm-none-eabi-
CC      = $(TCPREFIX)gcc
LD      = $(TCPREFIX)ld -v
CP      = $(TCPREFIX)objcopy
OD      = $(TCPREFIX)objdump
GDBTUI  = $(TCPREFIX)gdbtui

STM32FLASH = ./stm32_flash.pl

# -mfix-cortex-m3-ldrd should be enabled by default for Cortex M3.
CFLAGS  =  -I. -c -fno-common -O0 -g -mcpu=cortex-m3 -mthumb
LFLAGS  = -Tstm32.ld -nostartfiles
CPFLAGS = -Obinary
ODFLAGS = -S

# NOTE: recipe lines must start with a TAB character, not spaces.
.PHONY: all clean run debug

all: run

clean:
	-rm -f *.o main.elf main.lst main.bin

run: main.bin
	$(STM32FLASH) main.bin

main.bin: main.elf
	@echo "...copying"
	$(CP) $(CPFLAGS) main.elf main.bin
	$(OD) $(ODFLAGS) main.elf > main.lst

main.elf: main.o stm32.ld
	@echo "..linking"
	$(LD) $(LFLAGS) -o main.elf main.o

main.o: main.c
	@echo ".compiling"
	$(CC) $(CFLAGS) main.c

# No trailing spaces after the backslashes, or the continuation breaks.
debug:
	$(GDBTUI) -ex "target remote localhost:3333" \
		-ex "set remote hardware-breakpoint-limit 6" \
		-ex "set remote hardware-watchpoint-limit 4" main.elf

This way, things are becoming much more fun: Just start OpenOCD in a terminal with the correct openocd.cfg file. It's a good idea to have the terminal somewhere on the screen to be able to watch it but you don't need it otherwise. Then, simply call one of the make targets to have everything done:

make all: In the Makefile above, all depends on run, so this compiles, links and flashes. This is the default action when calling make without arguments.
make run: Compile and link (if necessary), flash the program onto the microcontroller and execute it.
make debug: Launch a debugger to debug the current program as it is executed on the microcontroller.
make clean: Clean up all generated files. Forces re-compile next time.

After having changed something in the source code, you will find yourself just typing make run most of the time.

(Of course, you can use -mcpu=cortex-m4 if you have an STM32F4.)

Getting More Serious: Hardware Floating Point (FPU)

Ok, so this is where we absolutely need an STM32F4 Cortex-M4 with integrated FPU. When you want to use the FPU, there are a couple of gotchas:

You must enable the FPU first. This is done with the following C code:

SCB->CPACR |= ((3UL << 10*2)|(3UL << 11*2));  /* set CP10 and CP11 Full Access */

The FPU only supports 32 bit float and no 64 bit double. You have to make sure every constant stays a float. I recommend writing the "f" suffix consistently (e.g. pi=3.14f;) and also pass the compiler option -fsingle-precision-constant in case you do forget the suffix somewhere.
For math functions, you need to include math.h as usual, but only use the single precision versions with the "f" suffix (sqrtf(), sinf(), cosf(),...)
You need the appropriate compiler flags for hardware floating point. Use: -mcpu=cortex-m4 -mfloat-abi=hard -mfpu=fpv4-sp-d16 (The last one is not strictly required.)
When using any of the math functions (sqrtf,...) you need to link to the math library which also depends on the C library. These libraries must have been compiled with the same floating point related settings. To do this, use arm-none-eabi-gcc instead of arm-none-eabi-ld for linking and pass -lm -lc at the end of the call after all object files. If the compiler still complains, give it a hint where to find the correct libraries by adding a path -L/tool_prefix/lib/thumb/cortex-m4/float-abi-hard/fpuv4-sp-d16/
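To make two of the gotchas above concrete, here is a small host-side sketch. The first part computes the value that the CPACR line sets (the register itself only exists on the target, so the helper just works on a plain integer). The second part shows how an unsuffixed constant silently turns an expression into a double. The helper and macro names are made up for illustration, and the size-based type check assumes that float and double have different sizes, as they do on the usual PC and ARM ABIs.

```c
#include <assert.h>
#include <stdint.h>

#define CPACR_CP10_FULL (3UL << (10 * 2))  /* bits 21:20 */
#define CPACR_CP11_FULL (3UL << (11 * 2))  /* bits 23:22 */

/* Return the given CPACR value with full access to CP10 and CP11 set.
 * On the target, the result would be written back to SCB->CPACR. */
static uint32_t cpacr_enable_fpu(uint32_t cpacr)
{
    return cpacr | (uint32_t)(CPACR_CP10_FULL | CPACR_CP11_FULL);
}

/* 1 if the expression has the size of a float, 0 otherwise.
 * sizeof does not evaluate its operand, so this costs nothing. */
#define IS_FLOAT_EXPR(e) (sizeof(e) == sizeof(float))
```

With a float x, IS_FLOAT_EXPR(x * 3.14f) is 1, while IS_FLOAT_EXPR(x * 3.14) is 0 because the unsuffixed constant is a double and promotes the whole product. Compiling with -fsingle-precision-constant changes the second result, which is exactly why the option is a useful safety net.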

Below is a small tarball which can be used in combination with the STM32F4 discovery board and source code from ST. The program endlessly does the following:

Blink all the LEDs.
Compute the Mandelbrot set (the famous "apple man") line-by-line and blink red and yellow LEDs whenever the iterated dot is within the Mandelbrot set (i.e. "the color is black"). It takes about 2 seconds and all you can see is a varying frequency of the red LED.
Compute 1 million square roots (sqrtf) and take the pin for the blue LED high for the duration of each sqrtf call. This way you can see with an oscilloscope that a sqrtf takes 20 cycles. The controller is set to its maximum speed of 168 MHz, of course.
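For readers who want a feeling for the workload before unpacking the tarball, here is a rough sketch of a single-precision Mandelbrot test and of the cycle-to-time conversion behind the oscilloscope measurement. It is not the code from the tarball; the function names, the iteration limit and the escape radius are assumptions chosen for illustration, and it runs on a PC as well as on the target.

```c
#include <assert.h>
#include <stdint.h>

/* Return 1 if c = (cr, ci) has not escaped after max_iter iterations
 * of z = z*z + c, else 0. Only float arithmetic and float constants
 * are used, so everything can stay on the single-precision FPU. */
static int mandel_in_set(float cr, float ci, int max_iter)
{
    float zr = 0.0f, zi = 0.0f;

    assert(max_iter > 0);
    for (int i = 0; i < max_iter; i++) {
        float zr2 = zr * zr, zi2 = zi * zi;
        if (zr2 + zi2 > 4.0f)       /* |z| > 2: the point escapes */
            return 0;
        zi = 2.0f * zr * zi + ci;   /* imaginary part of z*z + c */
        zr = zr2 - zi2 + cr;        /* real part of z*z + c */
    }
    return 1;
}

/* Convert a cycle count to nanoseconds at the given core clock. */
static float cycles_to_ns(uint32_t cycles, float clock_hz)
{
    return (float)cycles * 1.0e9f / clock_hz;
}
```

With the figures quoted above, cycles_to_ns(20, 168.0e6f) gives roughly 119 ns, which is the pulse width to look for on the blue LED pin.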

Note that the tarball contains only my code and makes symlinks into ../STM32F4-Discovery_FW_V1.1.0/ for support routines. You are supposed to unpack the tarball in the same directory as you unpack the stm32f4discovery_fw.zip (version 1.1.0) from ST.

Source:		stm32f4-bit-float_test.tar.gz [6kb gzipped source tarball]
Version:		1.0 (2012-04-08)
Author:		Wolfgang Wieser (report bugs here)
License:		GNU GPL (Version 2)
Requires:		ARM-GCC

If you have the same setup as I do and if you also downloaded the minimalistic hello world tarball with the stm32_flash.pl script, you can type make to compile. Use make run to have the STM32F4 execute the program.

References, Links

The content of this page was stitched together from lots of web searches and a couple of very useful pages:

Summon ARM toolchain: Completely automated build script for the toolchain. Excellent work!
STM32/ARM Cortex-M3 HOWTO: Development under Ubuntu. Here I found the nifty perl script to program the STM32 and a good step-by-step introduction. But don't use the GCC toolchains mentioned there and don't build it yourself. Instead go for the summon-arm-toolchain. It's more up-to-date and much less effort to set up.
Regarding STM32 GCC linker scripts: GCC linker script and STM32 and Linker script and startup code
STM32 programming: STM32F10x Standard Peripherals Library (German)
SPI and STM32 (German)


Last modified: 2012-09-22 00:45:40