THIS IS A VERY HIGHLY OPINIONATED ARTICLE. ASIDE FROM THE GRAPH AND NUMBERS, WHICH SPEAK FOR THEMSELVES, THE REST OF THE CONTENT IS MY PERSONAL TASTE AND BELIEFS. FEEL FREE TO LEAVE A COMMENT AND SHARE YOUR OPINION.
AS MENTIONED BY ONE OF MY INSTAGRAM FOLLOWERS (@EMBEDADDY), SOME OF THESE NUMBERS, SPECIFICALLY THE ONES FOR CUBE HAL, CAN BE IMPROVED WITH OPTIMIZED IDE AND COMPILER SETTINGS, SO KEEP THAT IN MIND!
In this highly biased article I will examine a very simple program written using three different programming approaches: STM32 Cube HAL, the STM32 LL API, and direct register access. The program configures three peripherals (GPIO, SPI, and UART) and, inevitably, the RCC, which must always be configured. For the sake of simplicity the peripherals use default or common settings where warranted, such as a 9600 baud rate for the UART and similarly ordinary settings for the SPI clock phase and such. The main loop simply toggles the LED, sends “hello!” through the UART, and sends a byte through SPI. I will examine the code size of the final program with the intention of selling you on the idea of using LL. There, I said it, sue me. But anyways, get in losers, we’re going coding!!!
The main.c file consists of all three programs, conditionally compiled with #ifdef statements. So basically something like this:
// here I only un-comment one to compile the code I want
#define USE_HAL
// #define USE_LL
// #define USE_REG

#ifdef USE_HAL
// ...all HAL code here gets compiled
#endif

#ifdef USE_LL
// ...all LL code here gets compiled
#endif

#ifdef USE_REG
// ...all REG code here gets compiled
#endif
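One weakness of this layout is that nothing stops you from un-commenting two variants at once, or none. A minimal sketch of a guard for that, assuming the same three macro names; the selected_variant() helper is hypothetical and only there so the choice can be checked:

```c
/* Un-comment exactly one, same as in main.c. */
#define USE_HAL
// #define USE_LL
// #define USE_REG

/* defined(X) evaluates to 1 or 0 inside #if, so the sum counts the variants. */
#if (defined(USE_HAL) + defined(USE_LL) + defined(USE_REG)) != 1
#error "Define exactly one of USE_HAL, USE_LL or USE_REG"
#endif

/* Hypothetical helper: reports which variant was compiled in. */
static const char *selected_variant(void)
{
#if defined(USE_HAL)
    return "HAL";
#elif defined(USE_LL)
    return "LL";
#else
    return "REG";
#endif
}
```

With the guard in place a bad selection fails at compile time instead of silently building an empty or doubled-up program.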
The code is posted at the very bottom in its entirety. The graph below shows the difference in overall code size of the entire application.
Let's crunch some numbers given the graph above.
HAL code is 122% larger than direct register access code.
LL API code is 44% larger than direct register access code.
HAL code is 53% larger than LL API code.
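As a quick sanity check, the three percentages can be cross-checked against each other. The sketch below normalises the register build to 1.0 (an assumption, since the byte counts live in the graph) and derives the HAL versus LL figure from the two rounded percentages; the function names are made up for illustration.

```c
/* How much larger `size` is than `baseline`, in percent. */
static double percent_larger(double size, double baseline)
{
    return (size / baseline - 1.0) * 100.0;
}

/* HAL vs LL, derived only from the two "vs register" percentages. */
static double hal_over_ll_percent(void)
{
    const double reg = 1.00;       /* register build, normalised */
    const double hal = reg * 2.22; /* 122% larger than register code */
    const double ll  = reg * 1.44; /* 44% larger than register code */
    return percent_larger(hal, ll);
}

/* Size added by each step, as a fraction of the register build. */
static double step_reg_to_ll(void) { return 1.44 - 1.00; }
static double step_ll_to_hal(void) { return 2.22 - 1.44; }
```

From the rounded inputs this gives roughly 54%, which agrees with the 53% above once you allow for rounding in the two source percentages. The step from LL to HAL (about 0.78 of the register build) is also clearly bigger than the step from register code to LL (0.44).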
The jump from register access code to HAL is significant, and even the jump from LL to HAL is larger than the jump from register code to LL. The images below help show where the code increase is coming from.
Register vs Cube HAL : 122% increase
The image below is a capture of the Embedded Memory Explorer feature of VisualGDB in the Visual Studio IDE. The compilation just before this image was the register version of the application, followed by a compilation of the Cube HAL version. The red text is the size increase from one compilation to the next, in other words the size increase when going from the register access code to the Cube HAL code. As you can see, the largest chunk of the increase comes from Cube's implementation of the RCC driver.
If you want to save yourself 2 KB, just access the RCC yourself to enable peripherals. It's literally 1 line of code to do so. And if you want to learn how to change the core clock via the registers, I have tutorials for that too.
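To give an idea of what that one line looks like, here is a minimal sketch that assumes an STM32F1-class part, where the GPIOA clock enable is bit 2 (IOPAEN) of RCC->APB2ENR. The RCC block is mocked with a plain struct so the snippet also builds on a PC; on the target you would drop the mock and include the CMSIS device header, which already defines RCC and RCC_APB2ENR_IOPAEN.

```c
#include <stdint.h>

/* Host-side stand-in for the RCC register block (hypothetical, mock only). */
typedef struct {
    volatile uint32_t APB2ENR; /* only the register this sketch touches */
} mock_rcc_t;

static mock_rcc_t mock_rcc;
#define RCC (&mock_rcc)

/* Assumed STM32F1 layout: IOPAEN is bit 2 of APB2ENR. */
#define RCC_APB2ENR_IOPAEN (1UL << 2)

static void enable_gpioa_clock(void)
{
    RCC->APB2ENR |= RCC_APB2ENR_IOPAEN; /* the "one line" */
}
```

Other families put the GPIO enable bits in a different register (AHB1ENR on many parts), so check the reference manual for your chip before copying the bit position.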
I was wondering why there would be a jump in size for the core_cm3.h file. It turns out HAL calls functions in that file to access the NVIC and to set up the SysTick timer, which it uses for its delay function, so the memory viewer credits the extra memory usage to that file since it contains the NVIC and SysTick functions.
Also worth noting is the SRAM increase going from register code to Cube HAL. You will notice in the next image that with the LL API there is no SRAM increase over the register access code.
Register vs LL API : 44% increase
And here we have the memory usage increase when going from register access to the LL API. In this case the largest culprit seems to be the GPIO driver, which makes sense because it's a structure that gets used extensively in setting up all the pins for the peripherals. That I understand, but HAL's massive RCC module I do not understand. Also note there is no increase in SRAM usage over the register access code.
At the time of this writing there are few tutorials on how to use STM’s Low Level drivers (LL), so I will be coming out with a series on that shortly... which really means eventually. One thing I am noticing is that people are confusing the term “Low Level” with register level code. Yes, register level code is low level, but “Low Level” is the actual name of a HAL offered by STM, just as Cube is the name of another HAL offered by STM. In fact, Cube HAL relies heavily on the LL drivers. After all the asserts and error checking inside a Cube function, you will often find that it ultimately calls an LL function.
By using the LL drivers on their own you remove an entire layer of abstraction and bloat. (If you appreciate the error checking, full asserts, and ease of use?? Then it's not bloat.)
So why would I, being a strong advocate for register level code, now opt for an abstraction layer? Well, I am not opposed to hardware abstraction. Most of my tutorials that show register level code are for learning purposes, so that you can perhaps design your own abstraction layer with as much bloat as you want or do not want.
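The layering can be pictured with a toy example. This is not ST's code: every name below is made up, and the GPIO port is a mock struct so it runs on a PC. It only shows the shape of a checked outer function that defers to a thin inner one, which is the layer you skip when you call LL directly.

```c
#include <stddef.h>
#include <stdint.h>

/* Host-side stand-in for a GPIO port (hypothetical, mock only). */
typedef struct {
    volatile uint32_t ODR; /* output data register */
} mock_gpio_t;

typedef enum { DEMO_OK = 0, DEMO_ERROR = 1 } demo_status_t;

/* "LL-style" layer: no checks, just the register access. */
static inline void demo_ll_toggle_pin(mock_gpio_t *port, uint32_t pin_mask)
{
    port->ODR ^= pin_mask;
}

/* "HAL-style" layer: validates the arguments, then calls the thin layer. */
static demo_status_t demo_hal_toggle_pin(mock_gpio_t *port, uint32_t pin_mask)
{
    /* 16 pins per port assumed, so only bits 0..15 are valid. */
    if (port == NULL || pin_mask == 0u || pin_mask > 0xFFFFu) {
        return DEMO_ERROR;
    }
    demo_ll_toggle_pin(port, pin_mask);
    return DEMO_OK;
}
```

Both calls end in the same register write. The outer one costs extra flash for the checks and the status return, and that is the trade you are making either way.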
Simple peripherals like GPIO, I2C, USART, SPI, DMA, TIMERS, CRC do not necessarily need an abstraction layer because they are simple enough to configure directly at the register level. For learning purposes these peripherals are a great way to get intimate with the hardware.
Once you start using things like an RTOS, Ethernet, USB, and even CAN, you will cry your eyes out trying to manage the hundreds of settings that need to be configured properly to get an error-free implementation. At this point an abstraction and/or a well written library is pretty much a must. For those applications LL is not enough, and Cube HAL is alright if and when you can get it working.
The clear benefit I see with the LL API versus Cube HAL is that LL still requires you to know what you are doing. That is why this article is biased: knowing what you are doing is something that is important to ME and should be important to any real engineer or student. Cube HAL, on the other hand, hides too much, and by doing so ends up bloating the code a lot more. You get bigger than necessary code size and lower efficiency, and at the end of the day you have no clue how it works.
Saving a few KB of code space is not such a huge deal. The tragedy is when you call yourself a programmer or engineer and have no idea what the code is doing or why the HAL generated code does not work. Debugging is a nightmare because you do not know how the hardware is supposed to be configured in the first place, so how can you spot a bug?
Should your boss or company ask you to not use Cube HAL and to write very specific and efficient code, because they have opted for the cheapest MCU with the smallest memory space to keep cost down, you will not know where to start, because you have no idea how to get anything going without calling a pre-written init function that some other smart fella/gal wrote.
One thing I like about using LL is that I do not have to dig into the reference manual looking for bit locations. After all the programming I do, it is impossible to remember the location of bits or exact register names. LL helps with this since you can just initialize a structure and have it configure the desired peripheral (this is also my preferred method for personal driver development). In conjunction with code completion features and the naming conventions used in the LL API, it is easy to find the desired configuration settings.
It is my opinion that the little bit of extra code size added by LL is worth not digging through the reference manual. At least LL does not promote ignorance of the hardware. However, Cube HAL does have its uses and targeted users as well.
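Here is a rough sketch of that structure-based style for a personal driver. All names are hypothetical and the USART is a mock struct so it runs on a PC. The bit positions (RE bit 2, TE bit 3, UE bit 13 in CR1) and the BRR formula assume an STM32F1-style USART with 16x oversampling, and the 8 MHz peripheral clock in the usage note is just an example value. LL's own take on the same idea is the LL_USART_InitTypeDef structure passed to LL_USART_Init().

```c
#include <stdint.h>

/* Host-side stand-in for a USART register block (hypothetical, mock only). */
typedef struct {
    volatile uint32_t BRR; /* baud rate register */
    volatile uint32_t CR1; /* control register 1 */
} mock_usart_t;

/* Assumed STM32F1-style CR1 bit positions. */
#define DEMO_USART_CR1_RE (1UL << 2)  /* receiver enable */
#define DEMO_USART_CR1_TE (1UL << 3)  /* transmitter enable */
#define DEMO_USART_CR1_UE (1UL << 13) /* USART enable */

/* Everything the caller has to decide, with readable field names. */
typedef struct {
    uint32_t clock_hz;  /* peripheral clock feeding the USART */
    uint32_t baud_rate; /* must be non-zero, e.g. 9600 */
    uint8_t  enable_tx;
    uint8_t  enable_rx;
} demo_usart_config_t;

static void demo_usart_init(mock_usart_t *usart, const demo_usart_config_t *cfg)
{
    uint32_t cr1 = DEMO_USART_CR1_UE;

    if (cfg->enable_tx) { cr1 |= DEMO_USART_CR1_TE; }
    if (cfg->enable_rx) { cr1 |= DEMO_USART_CR1_RE; }

    /* With 16x oversampling BRR is clock / baud, rounded to nearest. */
    usart->BRR = (cfg->clock_hz + cfg->baud_rate / 2u) / cfg->baud_rate;
    usart->CR1 = cr1;
}
```

Filling the structure with an 8 MHz clock, 9600 baud, and transmit only, then calling demo_usart_init(), leaves BRR at 833 and sets the UE and TE bits in CR1. The bit hunting happens once, inside the driver, and the calling code only deals with field names that code completion can find for you.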