The specific NPU doesn't seem to be mentioned in TFA, but my guess is that the blessed way to deal with it is the Neon SDK: https://www.arm.com/technologies/neon
I've not found Neon to be fun or easy to use, and I frequently see devices ignoring the NPU and inferring on CPU because it's easier. Maybe you get lucky and someone has made a backend for something specific you want, but it's not common.
TFA does directly mention the NPU: "Arm-China Zhouyi: 30 TOPS (Dedicated)"
"you cannot simply use standard versions of PyTorch or TensorFlow out of the box. You must use the NeuralONE AI SDK."
Neon is a SIMD instruction set for the CPU, not a separate accelerator. It doesn't need an SDK to use; it's supported through compiler intrinsics and assembly language in any modern Arm compiler.
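To illustrate the intrinsics point, here's a minimal sketch of Neon vector addition in plain C. It assumes an AArch64 (or ARMv7 with Neon) GCC/Clang toolchain; the array names and helper function are just for illustration, and the length is kept to a multiple of 4 to avoid a scalar tail loop.

```c
// Minimal sketch: using Neon via compiler intrinsics, no SDK required.
// <arm_neon.h> ships with GCC and Clang for Arm targets.
#include <arm_neon.h>
#include <stdio.h>

// Add two float arrays four lanes at a time.
// Assumes n is a multiple of 4 to keep the example short.
static void add_f32(const float *a, const float *b, float *out, int n) {
    for (int i = 0; i < n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);      // load 4 floats from a
        float32x4_t vb = vld1q_f32(b + i);      // load 4 floats from b
        vst1q_f32(out + i, vaddq_f32(va, vb));  // store the elementwise sum
    }
}

int main(void) {
    float a[4] = {1, 2, 3, 4}, b[4] = {10, 20, 30, 40}, out[4];
    add_f32(a, b, out, 4);
    printf("%g %g %g %g\n", out[0], out[1], out[2], out[3]);
    return 0;
}
```

None of that touches the NPU, which is the point: the Zhouyi block is a separate accelerator and only reachable through the vendor's NeuralONE SDK per TFA.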