7.1.2 Language-specific voice control engine
7.1.2.1 Specification
The speech recognition engine is based on a state-of-the-art deep neural network technique. Note that the engine is not
intended for natural language understanding, but for keyword spotting, which is useful for various MCU-based applications.
The engine uses fixed-point operations, so its computing resource consumption is almost constant. The specification of an
inference engine instance is described in Table 8. Because the Chinese language requires tone recognition, its voice engine
requires more resources than the engines for the other languages. The CPU consumption can increase with the number of
commands. The rule of thumb is 0.08 MIPS per 4-syllable command.
Table 8. Specification of an inference engine instance

| | Chinese (with tone recognition) | Other languages |
|---|---|---|
| Code size | 150 KB | 30 KB |
| Data size | 170 KB + 32 × M bytes | 155 KB + 32 × M bytes |
| RAM | 85 KB + 128 × M bytes | 45 KB + 128 × M bytes |
| CPU | 68 MIPS* | 45 MIPS* |

M: The number of wake words or commands.
*: Optimized for the SIMD instructions. The values of 68 and 45 represent typical voice control applications.
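
The figures in Table 8 can be turned into a quick budget estimate for a given number of wake words or commands. The following is a minimal sketch, not part of the SDK: the names `EngineSpec`, `SPECS`, and `estimate` are hypothetical, and it assumes that "KB" means 1024 bytes. Because the guide does not state how the 0.08 MIPS rule of thumb combines with the typical CPU figures, the sketch reports the two values separately instead of adding them.

```python
from dataclasses import dataclass

KB = 1024  # assumption: the guide's "KB" is 1024 bytes


@dataclass(frozen=True)
class EngineSpec:
    """Per-instance figures copied from Table 8 (hypothetical container)."""
    code_kb: int       # fixed code size
    data_base_kb: int  # data size before the per-item term
    ram_base_kb: int   # RAM before the per-item term
    typical_mips: int  # typical CPU load, SIMD-optimized


SPECS = {
    "chinese": EngineSpec(code_kb=150, data_base_kb=170, ram_base_kb=85, typical_mips=68),
    "other": EngineSpec(code_kb=30, data_base_kb=155, ram_base_kb=45, typical_mips=45),
}

DATA_BYTES_PER_ITEM = 32    # "+ 32 x M bytes" in the data size row
RAM_BYTES_PER_ITEM = 128    # "+ 128 x M bytes" in the RAM row
MIPS_PER_4_SYLLABLE = 0.08  # rule of thumb per 4-syllable command


def estimate(language: str, m: int) -> dict:
    """Estimate the footprint of one engine instance for m wake words or commands."""
    if m < 0:
        raise ValueError("m must be non-negative")
    spec = SPECS[language]
    return {
        "code_bytes": spec.code_kb * KB,
        "data_bytes": spec.data_base_kb * KB + DATA_BYTES_PER_ITEM * m,
        "ram_bytes": spec.ram_base_kb * KB + RAM_BYTES_PER_ITEM * m,
        "typical_mips": spec.typical_mips,
        # Reported on its own: treats every command as a 4-syllable command.
        "command_mips": MIPS_PER_4_SYLLABLE * m,
    }


if __name__ == "__main__":
    for lang in SPECS:
        print(lang, estimate(lang, 10))
```

With these assumptions, the per-item terms are small next to the fixed base sizes: even 100 commands add only about 3 KB of data and 12.5 KB of RAM per instance.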
7.1.2.2 Architecture
Figure 55. ASR software architecture
NXP Semiconductors
Far-field local voice control framework
SLN-LOCAL2-IOT Developer’s Guide, Rev. 0, 19 April 2021
User's Guide
46 / 87