We are looking for an applied deep learning engineer, with an emphasis on accelerating model inference on an embedded device.
- Accelerating DL models to run efficiently on our embedded device. This includes benchmarking candidate models for efficiency and implementing/optimizing specific NN layers from scratch in C++/CUDA.
- Converting DL models for deployment.
- As part of the DL team, you will also work on the full development cycle of applied deep learning problems such as object detection, fine-grained classification, and change detection. This includes research, implementation, and deployment to the edge device.
- 4+ years of experience in algorithm development and performance tuning/optimization, or an equivalent academic background.
- Hands-on experience in accelerating inference of deep learning models.
- Hands-on experience with TensorRT is an advantage.
- Excellent coding skills in C/C++.
- Knowledge of Python and CUDA is an advantage.
- Expert knowledge of at least one DL framework, such as TensorFlow, PyTorch, or MXNet.
- Experience using GPU-accelerated libraries (e.g., cuDNN and cuBLAS).
- Ability to work collaboratively in a team environment and to communicate complex ideas effectively.