How to Optimize MCU Programs: A Detailed Overview

May 04, 2023

A single-chip microcomputer (MCU) cannot be compared with a PC in terms of storage space, memory, or operating frequency. PC programming rarely has to consider how much space or memory a program occupies; the goal is simply to implement the function. On an MCU, Flash and RAM are measured in KB, so resources are scarce and every byte and cycle has to be used well. To get the most out of the chip, keep the following points in mind when designing a program:

1. Use the smallest data type that fits. A variable that can be defined as a character (char) should not be defined as an integer (int); a variable that can be defined as an int should not be defined as a long int; and floating-point variables should be avoided wherever they are not needed. Of course, a variable must never be assigned a value outside the range of its type. If an assignment exceeds that range, the C compiler does not report an error, but the program produces wrong results, and such errors are hard to find.
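The silent out-of-range assignment described in point 1 can be seen in a small sketch. It assumes the fixed-width types from <stdint.h> instead of the UINT8/UINT16 typedefs used in this article, and the function names are made up for illustration.

```c
#include <stdint.h>

/* Adds two byte values but stores the result in a byte: the sum wraps
 * modulo 256, and the compiler is not required to warn about it. */
uint8_t add_u8_wrapping(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* Same addition with a result type wide enough for every possible sum. */
uint16_t add_u8_widened(uint8_t a, uint8_t b)
{
    return (uint16_t)(a + b);
}
```

The narrow version saves one byte of RAM per result, which is the point of the tip, but it is only safe when the caller can guarantee that the sum stays below 256.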

2. Use increment, decrement, and compound assignment. Increment and decrement operators and compound assignment expressions (such as a-=1 and a+=1) usually produce high-quality code: the compiler can typically emit a single instruction such as inc or dec. For a=a+1 or a=a-1, many C compilers generate two to three bytes of instructions.

3. Reduce the strength of operations. A complex expression can often be replaced by a cheaper one that computes the same result.

(1) Remainder. N = N % 8 can be changed to N = N & 7. Note: a bit operation completes in a single instruction cycle, whereas most C compilers implement "%" by calling a subroutine, which is long and slow. Whenever the remainder of a division by a power of two (2^n) is needed, a bit mask can be used instead. The two forms are only equivalent for non-negative values, so N should be an unsigned type.

(2) Squaring. N = pow(3, 2) can be changed to N = 3 * 3. Note: on MCUs with a built-in hardware multiplier (such as the 51 series), multiplication is much faster than calling pow, because floating-point squaring is implemented by calling a subroutine. Even where multiplication is itself done by a subroutine, that subroutine is shorter than the one for squaring and executes faster.

(3) Shifts instead of multiplication and division. N = M * 8 can be changed to N = M << 3, and N = M / 8 can be changed to N = M >> 3 (for unsigned M). Note: whenever you need to multiply or divide by 2^n, a shift can be used instead. Multiplying by 2^n lets the compiler generate left-shift code, whereas multiplying by other integers, or dividing by any number, calls the multiplication or division subroutine. Code based on shifts is more efficient than code that calls those subroutines. In fact, multiplication by any integer constant can be built from shifts and additions. For example, N = M * 9 can be changed to N = (M << 3) + M.

(4) Counting up versus counting down. For example, the delay function we usually use counts up:

    void DelayNms(UINT16 t)
    {
        UINT16 i, j;
        for (i = 0; i < t; i++)
            for (j = 0; j < 1000; j++)
                ;
    }

can be changed to

    void DelayNms(UINT16 t)
    {
        UINT16 i, j;
        for (i = t; i > 0; i--)
            for (j = 1000; j > 0; j--)
                ;
    }

Description: the delay produced by the two functions is similar, but almost all C compilers generate 1 to 3 bytes less code for the second version, because almost all MCUs have an instruction that branches on zero, and the count-down form lets the compiler use it. Because i and j are unsigned, the loop condition must be > 0; a condition of >= 0 is always true and the loop would never terminate.
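The replacements in (1) and (3) can be checked on a host machine with a short sketch like the following. It assumes unsigned 16-bit operands and the <stdint.h> types; the helper names are hypothetical.

```c
#include <stdint.h>

/* N % 8 for an unsigned N: keep the low three bits. */
uint16_t mod8(uint16_t n) { return n & 7u; }

/* M * 8 and M / 8 for an unsigned M, written as shifts. */
uint16_t mul8(uint16_t m) { return (uint16_t)(m << 3); }
uint16_t div8(uint16_t m) { return m >> 3; }

/* M * 9 decomposed as M * 8 + M. */
uint16_t mul9(uint16_t m) { return (uint16_t)((m << 3) + m); }
```

Many modern compilers perform these rewrites themselves when optimization is enabled, so it is worth comparing the generated code before hand-optimizing.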

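4. The difference between while and do...while.

    void DelayNus(UINT16 t)
    {
        while (t--) { NOP(); }
    }

can be changed to

    void DelayNus(UINT16 t)
    {
        do { NOP(); } while (--t);
    }

Description: the code generated for the do...while loop is shorter than the code generated for the while loop. The two versions are equivalent only for t > 0: the do...while version always executes the body at least once, so calling it with t = 0 makes the counter wrap around.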
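One way to compare the two loop forms is to count how often the body runs. The sketch below replaces NOP() with a counting stub, which is an assumption made purely so the behavior can be observed on a host machine; the function names are illustrative.

```c
#include <stdint.h>

static uint32_t g_nop_count;                 /* number of simulated NOPs */
static void nop_stub(void) { g_nop_count++; }

/* while form: the body runs exactly t times, including zero times. */
uint32_t delay_while(uint16_t t)
{
    g_nop_count = 0;
    while (t--) { nop_stub(); }
    return g_nop_count;
}

/* do...while form: the body runs t times for t > 0, but for t == 0 the
 * 16-bit counter wraps and the body runs 65536 times. */
uint32_t delay_do_while(uint16_t t)
{
    g_nop_count = 0;
    do { nop_stub(); } while (--t);
    return g_nop_count;
}
```

If a zero argument is possible, the caller has to guard against it before using the do...while form.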

5. The register keyword.

    void UARTPrintfString(INT8 *str)
    {
        while (str && *str) { UARTSendByte(*str++); }
    }

can be changed to

    void UARTPrintfString(INT8 *str)
    {
        register INT8 *pstr = str;
        while (pstr && *pstr) { UARTSendByte(*pstr++); }
    }

Description: the register keyword can be used when declaring local variables. It asks the compiler to keep the variable in a general-purpose register instead of on the stack, and sensible use of it can increase execution speed. The more frequently the function is called, the more likely the gain is to be noticeable. Note that the register keyword is only a suggestion to the compiler. (The pointer is tested before it is dereferenced, so a null pointer is never read.)

6. The volatile keyword. volatile is always related to optimization. The compiler uses a technique called data flow analysis, which determines where the variables in a program are assigned, where they are used, and where they are no longer live. The results are used for constant folding, constant propagation, and similar optimizations, and can further be used to eliminate dead code. Generally speaking, the volatile keyword is only needed in the following three situations:

a) Variables that are modified in an interrupt service routine and checked by other code need volatile (refer to the advanced experimental procedures in this book).

b) Flags shared among tasks in a multitasking environment should be volatile.

c) Memory-mapped hardware registers are usually declared volatile as well, because each read or write may have a different meaning.

In short, volatile is a type qualifier. It says that the variable it declares can be changed by factors unknown to the compiler, such as the operating system, the hardware, or other threads. When the compiler encounters a variable declared with this keyword, it no longer optimizes the code that accesses it, so that every access to the special address really takes place.
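Case a) can be sketched as follows. The interrupt handler is simulated by an ordinary function here, because the way a real handler is registered depends on the compiler and the MCU; all names are hypothetical.

```c
#include <stdint.h>

/* Written by the (simulated) interrupt handler, read by the main loop.
 * Without volatile, an optimizing compiler may read the flag once and
 * keep the value in a register. */
static volatile uint8_t g_rx_ready = 0;

/* Stand-in for an interrupt service routine. */
void uart_rx_isr(void)
{
    g_rx_ready = 1;
}

/* Polls the flag once; returns 1 and clears it if the event was seen. */
uint8_t poll_rx_ready(void)
{
    if (g_rx_ready) {
        g_rx_ready = 0;
        return 1;
    }
    return 0;
}
```

volatile only guarantees that each access is performed; it does not make a read-modify-write sequence atomic, so wider shared data may still need interrupts to be disabled around the access.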

7. Trade space for time. In practical data verification there is another way to compute the CRC16 cyclic redundancy check: the look-up table method. With a look-up table the check value is obtained more quickly and efficiently. The larger the amount of data to check, the more obvious the advantage; the only disadvantage is that the table takes up a lot of space.

//Look-up table method:

    code UINT16 szCRC16Tbl[256] = {
        0x0000, 0x1021, 0x2042, 0x3063, 0x4084, 0x50a5, 0x60c6, 0x70e7,
        0x8108, 0x9129, 0xa14a, 0xb16b, 0xc18c, 0xd1ad, 0xe1ce, 0xf1ef,
        0x1231, 0x0210, 0x3273, 0x2252, 0x52b5, 0x4294, 0x72f7, 0x62d6,
        0x9339, 0x8318, 0xb37b, 0xa35a, 0xd3bd, 0xc39c, 0xf3ff, 0xe3de,
        0x2462, 0x3443, 0x0420, 0x1401, 0x64e6, 0x74c7, 0x44a4, 0x5485,
        0xa56a, 0xb54b, 0x8528, 0x9509, 0xe5ee, 0xf5cf, 0xc5ac, 0xd58d,
        0x3653, 0x2672, 0x1611, 0x0630, 0x76d7, 0x66f6, 0x5695, 0x46b4,
        0xb75b, 0xa77a, 0x9719, 0x8738, 0xf7df, 0xe7fe, 0xd79d, 0xc7bc,
        0x48c4, 0x58e5, 0x6886, 0x78a7, 0x0840, 0x1861, 0x2802, 0x3823,
        0xc9cc, 0xd9ed, 0xe98e, 0xf9af, 0x8948, 0x9969, 0xa90a, 0xb92b,
        0x5af5, 0x4ad4, 0x7ab7, 0x6a96, 0x1a71, 0x0a50, 0x3a33, 0x2a12,
        0xdbfd, 0xcbdc, 0xfbbf, 0xeb9e, 0x9b79, 0x8b58, 0xbb3b, 0xab1a,
        0x6ca6, 0x7c87, 0x4ce4, 0x5cc5, 0x2c22, 0x3c03, 0x0c60, 0x1c41,
        0xedae, 0xfd8f, 0xcdec, 0xddcd, 0xad2a, 0xbd0b, 0x8d68, 0x9d49,
        0x7e97, 0x6eb6, 0x5ed5, 0x4ef4, 0x3e13, 0x2e32, 0x1e51, 0x0e70,
        0xff9f, 0xefbe, 0xdfdd, 0xcffc, 0xbf1b, 0xaf3a, 0x9f59, 0x8f78,
        0x9188, 0x81a9, 0xb1ca, 0xa1eb, 0xd10c, 0xc12d, 0xf14e, 0xe16f,
        0x1080, 0x00a1, 0x30c2, 0x20e3, 0x5004, 0x4025, 0x7046, 0x6067,
        0x83b9, 0x9398, 0xa3fb, 0xb3da, 0xc33d, 0xd31c, 0xe37f, 0xf35e,
        0x02b1, 0x1290, 0x22f3, 0x32d2, 0x4235, 0x5214, 0x6277, 0x7256,
        0xb5ea, 0xa5cb, 0x95a8, 0x8589, 0xf56e, 0xe54f, 0xd52c, 0xc50d,
        0x34e2, 0x24c3, 0x14a0, 0x0481, 0x7466, 0x6447, 0x5424, 0x4405,
        0xa7db, 0xb7fa, 0x8799, 0x97b8, 0xe75f, 0xf77e, 0xc71d, 0xd73c,
        0x26d3, 0x36f2, 0x0691, 0x16b0, 0x6657, 0x7676, 0x4615, 0x5634,
        0xd94c, 0xc96d, 0xf90e, 0xe92f, 0x99c8, 0x89e9, 0xb98a, 0xa9ab,
        0x5844, 0x4865, 0x7806, 0x6827, 0x18c0, 0x08e1, 0x3882, 0x28a3,
        0xcb7d, 0xdb5c, 0xeb3f, 0xfb1e, 0x8bf9, 0x9bd8, 0xabbb, 0xbb9a,
        0x4a75, 0x5a54, 0x6a37, 0x7a16, 0x0af1, 0x1ad0, 0x2ab3, 0x3a92,
        0xfd2e, 0xed0f, 0xdd6c, 0xcd4d, 0xbdaa, 0xad8b, 0x9de8, 0x8dc9,
        0x7c26, 0x6c07, 0x5c64, 0x4c45, 0x3ca2, 0x2c83, 0x1ce0, 0x0cc1,
        0xef1f, 0xff3e, 0xcf5d, 0xdf7c, 0xaf9b, 0xbfba, 0x8fd9, 0x9ff8,
        0x6e17, 0x7e36, 0x4e55, 0x5e74, 0x2e93, 0x3eb2, 0x0ed1, 0x1ef0
    };

    UINT16 CRC16CheckFromTbl(UINT8 *buf, UINT8 len)
    {
        UINT16 i;
        UINT16 uncrcReg = 0, uncrcConst = 0xffff;

        for (i = 0; i < len; i++)
        {
            uncrcReg = (uncrcReg << 8) ^ szCRC16Tbl[(((uncrcConst ^ uncrcReg) >> 8) ^ *buf++) & 0xFF];
            uncrcConst <<= 8;
        }
        return uncrcReg;
    }

The table is the standard 256-entry table for the CRC16 polynomial 0x1021. If the system has strict real-time requirements, the look-up table method is recommended for the CRC16 cyclic redundancy check, trading space for time.
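To make the space-for-time trade concrete, the sketch below shows the bit-by-bit calculation that the table replaces, and how a 256-entry table for the polynomial 0x1021 can be generated from it. This is an illustration, not the routine above: the function names, the explicit init parameter, and the run-time table (held in RAM instead of ROM) are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

#define CRC16_POLY 0x1021u

/* Bit-by-bit CRC: no table, eight shift/XOR steps per input byte. */
uint16_t crc16_bitwise(const uint8_t *buf, size_t len, uint16_t init)
{
    uint16_t crc = init;
    while (len--) {
        crc ^= (uint16_t)((uint16_t)*buf++ << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000u)
                crc = (uint16_t)((crc << 1) ^ CRC16_POLY);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}

static uint16_t crc16_table[256];

/* Fills the table: entry i is the bit-by-bit CRC of the single byte i. */
void crc16_build_table(void)
{
    for (unsigned i = 0; i < 256; i++) {
        uint8_t b = (uint8_t)i;
        crc16_table[i] = crc16_bitwise(&b, 1, 0);
    }
}

/* Table-driven CRC: one lookup per input byte. */
uint16_t crc16_lookup(const uint8_t *buf, size_t len, uint16_t init)
{
    uint16_t crc = init;
    while (len--)
        crc = (uint16_t)((crc << 8) ^ crc16_table[((crc >> 8) ^ *buf++) & 0xFFu]);
    return crc;
}
```

The table costs 512 bytes, and in exchange the inner eight-step loop disappears from the per-byte work. On a small MCU the table would normally be a constant array placed in Flash, as in the listing above, so that no RAM is spent on it.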

8. Replace functions with macros. First of all, it is not recommended to turn every function into a macro, because that invites unnecessary errors. Some small, basic functions, however, are well worth replacing.

    UINT8 Max(UINT8 A, UINT8 B) { return (A > B ? A : B); }

can be changed to

    #define MAX(A, B) ((A) > (B) ? (A) : (B))

Explanation: the difference between a function and a macro is that the macro costs space while the function costs time. A function call uses the system stack to save data. If the compiler has a stack-check option, some assembly statements are usually inserted at the head of the function to check the current stack. The CPU must also save and restore the current context around the call, pushing to and popping from the stack, so a function call takes some CPU time. A macro does not have this problem: it is simply pasted into the program as pre-written code and no call is generated, so it only takes up space. That space cost becomes particularly noticeable when the same macro is used in many places.
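The following sketch puts a function and a macro for the same job side by side. The parenthesization shown is the usual defensive style for expression macros; the names max_u8 and max_plus_one are hypothetical.

```c
#include <stdint.h>

/* Function version: each argument is evaluated exactly once. */
uint8_t max_u8(uint8_t a, uint8_t b) { return (a > b) ? a : b; }

/* Macro version: expanded in place, with every argument and the whole
 * expression parenthesized so it can sit inside a larger expression.
 * An argument may be evaluated twice, so avoid side effects like i++. */
#define MAX(A, B) (((A) > (B)) ? (A) : (B))

/* Uses the macro inside an expression; this relies on the outer
 * parentheses of the macro body. */
uint8_t max_plus_one(uint8_t a, uint8_t b)
{
    return (uint8_t)(MAX(a, b) + 1);
}
```

Double evaluation is the main class of "unnecessary errors" that makes wholesale conversion to macros risky.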

9. Choose the algorithm appropriately. Suppose the problem is to find the sum of 1 to 100. As programmers, we would not hesitate to type the following:

    UINT16 Sum(void)
    {
        UINT8 i;
        UINT16 s = 0;
        for (i = 1; i <= 100; i++) { s += i; }
        return s;
    }

Everyone thinks of this method, but its efficiency is not satisfactory. A little mathematics solves the problem and raises the efficiency by an order of magnitude:

    UINT16 Sum(void)
    {
        UINT16 s;
        s = (100 * (100 + 1)) >> 1;
        return s;
    }

The result is the same, but the running cost of the two methods differs greatly, so it pays to look for a mathematical shortcut that maximizes execution efficiency. (The accumulator s must be 16 bits wide and initialized to 0; the sum 5050 does not fit in a UINT8.)
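Generalized to an arbitrary upper bound n, the two approaches look like this. The sketch assumes <stdint.h> types, and the 32-bit result type is chosen so that every 16-bit n is safe.

```c
#include <stdint.h>

/* Loop version: n additions. */
uint32_t sum_loop(uint16_t n)
{
    uint32_t s = 0;
    for (uint32_t i = 1; i <= n; i++)
        s += i;
    return s;
}

/* Closed form n(n+1)/2: one multiplication and one shift. */
uint32_t sum_formula(uint16_t n)
{
    return ((uint32_t)n * ((uint32_t)n + 1u)) >> 1;
}
```

The closed form takes constant time regardless of n, while the loop grows linearly.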

10. Use pointers instead of arrays. In many cases pointer arithmetic can replace array indexing, and doing so often produces faster and shorter code. The difference is more obvious with multidimensional arrays. The two loops below do the same thing, but their efficiency differs.

    UINT8 szArrayA[64];
    UINT8 szArrayB[64];
    UINT8 i;
    UINT8 *p = szArrayA;

    for (i = 0; i < 64; i++) szArrayB[i] = szArrayA[i];

    for (i = 0; i < 64; i++) szArrayB[i] = *p++;

The advantage of the pointer method is that once the address of szArrayA has been loaded into the pointer p, each iteration only needs to increment p. With array indexing, the address has to be computed from the value of i in every iteration.
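Written as reusable functions, the indexed and pointer-based copies might look like the sketch below. The function names are hypothetical, and both the source and the destination are walked with pointers in the second version.

```c
#include <stdint.h>

/* Indexed copy: the element address is derived from i on every pass. */
void copy_indexed(uint8_t *dst, const uint8_t *src, uint8_t n)
{
    uint8_t i;
    for (i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Pointer copy: both pointers are simply incremented on every pass. */
void copy_pointer(uint8_t *dst, const uint8_t *src, uint8_t n)
{
    while (n--)
        *dst++ = *src++;
}
```

Optimizing compilers often turn the indexed form into the pointer form on their own, so the gain is largest on simple compilers or with optimization turned off.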

11. Type casts. The first essential technique of the C language is the use of pointers, and the second is the use of casts (forced type conversions). Used appropriately, pointers and casts not only improve program efficiency but also make the program more concise. Because casts occupy such an important place in C programming, five typical examples are explained below.

Example 1: Convert a signed byte integer to an unsigned byte integer.

    UINT8 a = 0;
    INT8 b = -3;
    a = (UINT8)b;

Example 2: In big-endian mode (8051 series microcontrollers are big-endian), convert the array a[2] into an unsigned 16-bit integer value.

Method 1: Use shifts.

    UINT8 a[2] = {0x12, 0x34};
    UINT16 b = 0;
    b = (a[0] << 8) | a[1];

Result: b = 0x1234

Method 2: Use a cast.

    UINT8 a[2] = {0x12, 0x34};
    UINT16 b = 0;
    b = *(UINT16 *)a;   //Cast

Result: b = 0x1234. This relies on the target being big-endian and on the access through a UINT16 pointer being permitted at that address.

Example 3: Save the contents of a structure.

Method 1: Save the members one by one.

    typedef struct _ST { UINT8 a; UINT8 b; UINT8 c; UINT8 d; UINT8 e; } ST;
    ST s;
    UINT8 a[5] = {0};
    s.a = 1; s.b = 2; s.c = 3; s.d = 4; s.e = 5;
    a[0] = s.a; a[1] = s.b; a[2] = s.c; a[3] = s.d; a[4] = s.e;

Result: the content of array a is 1, 2, 3, 4, 5.

Method 2: Use a cast.

    typedef struct _ST { UINT8 a; UINT8 b; UINT8 c; UINT8 d; UINT8 e; } ST;
    ST s;
    UINT8 a[5] = {0};
    UINT8 *p = (UINT8 *)&s;   //Cast
    UINT8 i = 0;
    s.a = 1; s.b = 2; s.c = 3; s.d = 4; s.e = 5;
    for (i = 0; i < sizeof(s); i++) { a[i] = *p++; }

Result: the content of array a is 1, 2, 3, 4, 5.

Example 4: In big-endian mode (8051 series MCUs are big-endian), assign a structure containing bit fields to an unsigned byte integer.

Method 1: Assign bit by bit.

    typedef struct __BYTE2BITS
    {
        UINT8 _bit7:1; UINT8 _bit6:1; UINT8 _bit5:1; UINT8 _bit4:1;
        UINT8 _bit3:1; UINT8 _bit2:1; UINT8 _bit1:1; UINT8 _bit0:1;
    } BYTE2BITS;

    BYTE2BITS Byte2Bits;
    Byte2Bits._bit7 = 0; Byte2Bits._bit6 = 0; Byte2Bits._bit5 = 1; Byte2Bits._bit4 = 1;
    Byte2Bits._bit3 = 1; Byte2Bits._bit2 = 1; Byte2Bits._bit1 = 0; Byte2Bits._bit0 = 0;
    UINT8 a = 0;
    a |= Byte2Bits._bit7 << 7;
    a |= Byte2Bits._bit6 << 6;
    a |= Byte2Bits._bit5 << 5;
    a |= Byte2Bits._bit4 << 4;
    a |= Byte2Bits._bit3 << 3;
    a |= Byte2Bits._bit2 << 2;
    a |= Byte2Bits._bit1 << 1;
    a |= Byte2Bits._bit0 << 0;

Result: a = 0x3C

Method 2: Use a cast. With the same BYTE2BITS type and the same bit values:

    UINT8 a = 0;
    a = *(UINT8 *)&Byte2Bits;

Result: a = 0x3C. The order in which bit fields are packed into a byte is defined by the compiler, so check the compiler manual before relying on this layout.

Example 5: In big-endian mode (8051 series MCUs are big-endian), assign an unsigned byte integer to a structure containing bit fields.

Method 1: Assign bit by bit. With the same BYTE2BITS type:

    BYTE2BITS Byte2Bits;
    UINT8 a = 0x3C;
    Byte2Bits._bit7 = (a >> 7) & 1;
    Byte2Bits._bit6 = (a >> 6) & 1;
    Byte2Bits._bit5 = (a >> 5) & 1;
    Byte2Bits._bit4 = (a >> 4) & 1;
    Byte2Bits._bit3 = (a >> 3) & 1;
    Byte2Bits._bit2 = (a >> 2) & 1;
    Byte2Bits._bit1 = (a >> 1) & 1;
    Byte2Bits._bit0 = a & 1;

Each bit is shifted down to 0 or 1 before it is stored; assigning a masked value such as a & 0x80 directly to a 1-bit field would be truncated to 0.

Method 2: Use a cast.

    BYTE2BITS Byte2Bits;
    UINT8 a = 0x3C;
    Byte2Bits = *(BYTE2BITS *)&a;
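The casts in Examples 2 and 3 depend on the byte order and layout chosen by the target and the compiler. Where the code has to run on more than one platform, a more conservative sketch like the following can be used instead. It is an alternative technique, not the method of this article; ST5 is a hypothetical stand-in for the ST structure, and the function names are made up.

```c
#include <stdint.h>
#include <string.h>

/* Big-endian byte pair -> 16-bit value, independent of the CPU byte order. */
uint16_t be16_from_bytes(const uint8_t a[2])
{
    return (uint16_t)(((uint16_t)a[0] << 8) | a[1]);
}

typedef struct {
    uint8_t a, b, c, d, e;
} ST5;   /* hypothetical stand-in for the ST struct above */

/* Copies the struct's bytes into a plain buffer in one library call. */
void st5_to_bytes(uint8_t out[sizeof(ST5)], const ST5 *s)
{
    memcpy(out, s, sizeof *s);
}
```

On an 8051 the direct cast is usually the shorter code, which is why the article favors it; the portable forms trade a few bytes for independence from the target.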

12. Reduce function call parameters. Using global variables is more efficient than passing parameters to a function, because it removes the time needed to push the parameters onto the stack before the call and pop them off after the function returns. However, global variables hurt the modularity and reentrancy of the program, so use them with care.

13. Order switch cases by frequency. The switch statement is a common programming construct. The compiler often translates it into a chain of if-else-if comparisons that are tested in order; when a match is found, execution jumps to the statements for that case. Each test and jump in machine code exists only to decide what to do next, and it uses precious processor time. To increase speed, order the cases by their relative frequency: put the most likely case first and the least likely case last.

14. Convert a large switch statement into nested switch statements. When a switch statement has many case labels, converting it into nested switch statements reduces the number of comparisons. Put the frequently occurring case labels in the outermost switch statement, and put the less frequent case labels in a second switch statement nested inside it. For example, the following program segment moves the less frequent cases under the default label.

    UINT8 ucCurTask = 1;
    void Task1(void); void Task2(void); void Task3(void); void Task4(void);
    ...
    void Task16(void);

    switch (ucCurTask)
    {
        case 1: Task1(); break;
        case 2: Task2(); break;
        case 3: Task3(); break;
        case 4: Task4(); break;
        ...
        case 16: Task16(); break;
        default: break;
    }

can be changed to

    switch (ucCurTask)
    {
        case 1: Task1(); break;
        case 2: Task2(); break;
        default:
            switch (ucCurTask)
            {
                case 3: Task3(); break;
                case 4: Task4(); break;
                ...
                case 16: Task16(); break;
                default: break;
            }
            break;
    }

Since the switch statement is equivalent to a chain of if-else-if tests, a large if statement can be converted into nested if statements in the same way.

    if (ucCurTask == 1) Task1();
    else if (ucCurTask == 2) Task2();
    else
    {
        if (ucCurTask == 3) Task3();
        else if (ucCurTask == 4) Task4();
        ...
        else Task16();
    }

15. Make good use of function pointers. When a switch statement has many case labels, or an if statement needs too many comparisons, function pointers can replace the switch or if statement to improve execution speed. For examples of this usage, refer to the electronic menu experiment code, the USB experiment code, and the network experiment code.

    UINT8 ucCurTask = 1;
    void Task1(void); void Task2(void); void Task3(void); void Task4(void);
    ...
    void Task16(void);

    switch (ucCurTask)
    {
        case 1: Task1(); break;
        case 2: Task2(); break;
        case 3: Task3(); break;
        case 4: Task4(); break;
        ...
        case 16: Task16(); break;
        default: break;
    }

can be changed to

    UINT8 ucCurTask = 1;
    void Task1(void); void Task2(void); void Task3(void); void Task4(void);
    ...
    void Task16(void);

    void (*szTaskTbl[16])(void) = {Task1, Task2, Task3, Task4, ..., Task16};

Calling method 1: (*szTaskTbl[ucCurTask - 1])();

Calling method 2: szTaskTbl[ucCurTask - 1]();

Because the tasks are numbered from 1 and the table is indexed from 0, the index is ucCurTask - 1. Unlike the switch statement, the table has no default branch, so ucCurTask must be checked to lie in the range 1 to 16 before the call.

16. Loop nesting. Loops are used all the time in programming, and they are often nested. Take the for loop as an example.

    UINT8 i, j;
    for (i = 0; i < 255; i++)
    {
        for (j = 0; j < 25; j++)
        {
            ...
        }
    }

Putting the larger loop outside and the smaller loop inside wastes more time. The recommended approach is to put the smaller loop outside and the larger loop inside.

    UINT8 i, j;
    for (j = 0; j < 25; j++)
    {
        for (i = 0; i < 255; i++)
        {
            ...
        }
    }
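Returning to the function-pointer dispatch of point 15, the sketch below shows a complete, range-checked version with four tasks instead of sixteen. The task bodies, the g_last_task variable, and the run_task helper are assumptions added so that the effect of the dispatch can be observed.

```c
#include <stdint.h>

#define TASK_COUNT 4u

static uint8_t g_last_task;   /* records which task ran, for demonstration */

static void Task1(void) { g_last_task = 1; }
static void Task2(void) { g_last_task = 2; }
static void Task3(void) { g_last_task = 3; }
static void Task4(void) { g_last_task = 4; }

typedef void (*TaskFn)(void);

/* Task numbers start at 1, table indices at 0. */
static const TaskFn task_table[TASK_COUNT] = { Task1, Task2, Task3, Task4 };

/* Runs task number id (1-based) and returns the number of the task that
 * ran, or 0 if id is out of range (the switch statement's default case). */
uint8_t run_task(uint8_t id)
{
    if (id < 1u || id > TASK_COUNT)
        return 0;
    task_table[id - 1u]();
    return g_last_task;
}
```

The dispatch cost is one range check and one indexed call, no matter how many tasks the table holds.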

17. Inline functions. In C++, the keyword inline can be added to the declaration of any function. It asks the compiler to replace every call to the function with the code inside the function. This is faster than a function call in two respects: first, it saves the execution time of the call instruction; second, it saves the time needed to pass the arguments and transfer control. But while this method improves speed, the program becomes longer, so more ROM is needed. The optimization is most effective when the inline function is called frequently and contains only a few lines of code. If the compiler supports the inline keyword in C programming (note: C, not C++), and the ROM of the microcontroller is large enough, you can consider adding the inline keyword. Compilers that support the inline keyword include ADS1.2 and RealView MDK.
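In C, a common way to write such a helper is static inline, assuming a compiler that supports C99 or an equivalent extension. The function names below are made up for illustration.

```c
#include <stdint.h>

/* Small helper that is a good inlining candidate: a few instructions,
 * called from hot code. static inline keeps it local to this file. */
static inline uint8_t clamp_u8(uint16_t v)
{
    return (v > 255u) ? (uint8_t)255u : (uint8_t)v;
}

/* Multiplies a sample by a gain and saturates the result at 255. */
uint8_t scale_and_clamp(uint8_t sample, uint8_t gain)
{
    return clamp_u8((uint16_t)((uint16_t)sample * gain));
}
```

As with register, inline is a request: the compiler may still emit an ordinary call, and it may inline small functions that are not marked at all.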

18. Start with the compiler. Many compilers offer optimization settings that favor either execution speed or code size. In the Keil development environment, for example, you can choose whether to favor execution speed (Favor Speed) or small code size (Favor Size). GCC-based development environments generally provide the optimization options -O0, -O1, -O2, -O3, and -Os. Code optimized with -O2 generally gives the most satisfactory execution speed, while -Os produces the smallest code.

19. Embedded assembly: the last resort. Assembly language is the most efficient computer language. In ordinary project development, C is generally used, because embedded assembly hurts the portability and readability of the program: the assembly instructions of different platforms are incompatible. Programmers who insist on the ultimate in run-time efficiency nevertheless embed assembly in their C code, which is known as "hybrid programming". Note: if you want to embed assembly, you must understand assembly deeply. Do not use embedded assembly unless it is a last resort.
