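I have a few Jetson Nano and Jetson TX2 boards on hand. I recently ran some very basic performance tests on them, and afterwards found an RTAS paper that offered some insight.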

The paper is set in the context of autonomous driving, i.e., safety-critical real-time systems.

Safety-critical embedded systems are undergoing an evolution towards greater autonomy. This evolution is perhaps best exemplified by current trends in the automotive industry.

DRAM organization: an integrated GPU shares DRAM with the CPU. Integrated GPUs vs. discrete GPUs.

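An integrated GPU supports a feature called zero-copy memory: the GPU can use the host's main memory directly, with no data movement. In essence, only a pointer is passed. Unified memory is different: data may still be copied behind the scenes, but from the user's point of view GPU memory and host memory appear as one.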
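For contrast with zero-copy, here is a minimal sketch of unified (managed) memory, assuming a CUDA-capable device and the standard CUDA runtime. The kernel `increment` and the helper `run_managed` are hypothetical names made up for illustration; they are not from the paper. The program never calls `cudaMemcpy`, yet the runtime is free to move the data between CPU and GPU on its own.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: adds 1 to every element.
__global__ void increment(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Returns true if the CPU sees the GPU's updates through the same pointer.
bool run_managed(int n) {
    int *data = nullptr;
    // One allocation, one pointer, usable from both CPU and GPU.
    if (cudaMallocManaged(&data, n * sizeof(int)) != cudaSuccess) return false;

    for (int i = 0; i < n; ++i) data[i] = i;           // written by the CPU

    increment<<<(n + 255) / 256, 256>>>(data, n);      // updated by the GPU
    bool ok = (cudaDeviceSynchronize() == cudaSuccess);

    for (int i = 0; ok && i < n; ++i) ok = (data[i] == i + 1);  // read by the CPU
    cudaFree(data);
    return ok;
}

int main() {
    printf("managed memory %s\n", run_managed(1 << 20) ? "ok" : "failed");
    return 0;
}
```

Whether the runtime actually copies anything here depends on the platform, which is why this should be measured on the board rather than assumed.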

With zero-copy memory, the CPU and the GPU can access the same memory area, avoiding GPU memory allocations and data copies between CPU and GPU memory.
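A minimal sketch of zero-copy memory with the CUDA runtime API, assuming a device that can map host memory (as integrated GPUs such as those on Jetson boards can). The kernel `scale` and the helper `run_zero_copy` are hypothetical names for illustration. The point to notice is that there is no `cudaMalloc` and no `cudaMemcpy`: the GPU works on the pinned host buffer through a device-side pointer to it.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: doubles every element in place.
__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Returns true if the GPU modified the host buffer in place.
bool run_zero_copy(int n) {
    float *host_ptr = nullptr;
    float *dev_ptr = nullptr;

    // Request support for mapped pinned memory. On recent CUDA versions this
    // is usually already enabled, so the return value is not checked here.
    cudaSetDeviceFlags(cudaDeviceMapHost);

    // Pinned host allocation that is also mapped into the GPU address space.
    if (cudaHostAlloc((void **)&host_ptr, n * sizeof(float),
                      cudaHostAllocMapped) != cudaSuccess) return false;
    for (int i = 0; i < n; ++i) host_ptr[i] = (float)i;

    // Only a pointer is obtained; no data is moved.
    if (cudaHostGetDevicePointer((void **)&dev_ptr, host_ptr, 0) != cudaSuccess) {
        cudaFreeHost(host_ptr);
        return false;
    }

    scale<<<(n + 255) / 256, 256>>>(dev_ptr, n);
    bool ok = (cudaDeviceSynchronize() == cudaSuccess);

    // The CPU reads the results straight from the same buffer.
    for (int i = 0; ok && i < n; ++i) ok = (host_ptr[i] == 2.0f * i);
    cudaFreeHost(host_ptr);
    return ok;
}

int main() {
    printf("zero-copy %s\n", run_zero_copy(1 << 20) ? "ok" : "failed");
    return 0;
}
```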

GPU Scheduling:

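Whether GPU co-scheduling should be allowed, i.e., whether different tasks should be able to access the GPU concurrently. Letting different tasks use the GPU at the same time matters a great deal for autonomous driving. Moreover, a recognition rate of 30 frames per second is still not enough for an autonomous driving system, especially in certain emergencies.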

By default there is only one stream, and all computations (kernels) execute in FIFO order, which is determined by the GPU.
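The difference between one stream and several can be sketched with CUDA streams. This is a minimal illustration, not the paper's experimental setup: the kernel `add_one` and the helper `run_two_streams` are hypothetical names, and the two streams merely stand in for two independent tasks. Kernels launched into the same stream run one after another; kernels in different streams are allowed to overlap, though whether they actually do is up to the hardware and the runtime.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: adds 1 to every element.
__global__ void add_one(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Returns true if both "tasks" produced the expected results.
bool run_two_streams(int n) {
    int *a = nullptr, *b = nullptr;
    if (cudaMalloc((void **)&a, n * sizeof(int)) != cudaSuccess) return false;
    if (cudaMalloc((void **)&b, n * sizeof(int)) != cudaSuccess) {
        cudaFree(a);
        return false;
    }
    cudaMemset(a, 0, n * sizeof(int));
    cudaMemset(b, 0, n * sizeof(int));
    cudaDeviceSynchronize();  // make sure both buffers are zeroed first

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    int block = 256, grid = (n + block - 1) / block;
    add_one<<<grid, block, 0, s1>>>(a, n);  // task 1, first kernel
    add_one<<<grid, block, 0, s1>>>(a, n);  // same stream: runs after the first
    add_one<<<grid, block, 0, s2>>>(b, n);  // other stream: may run concurrently

    bool ok = (cudaStreamSynchronize(s1) == cudaSuccess) &&
              (cudaStreamSynchronize(s2) == cudaSuccess);

    int ha = -1, hb = -1;
    cudaMemcpy(&ha, a, sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(&hb, b, sizeof(int), cudaMemcpyDeviceToHost);
    ok = ok && ha == 2 && hb == 1;  // two increments on a, one on b

    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(a);
    cudaFree(b);
    return ok;
}

int main() {
    printf("two streams %s\n", run_two_streams(1 << 20) ? "ok" : "failed");
    return 0;
}
```

A profiler timeline is the usual way to check whether the kernels in the two streams really overlapped on a given board.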


Previous work: GPU co-scheduling must be avoided because concurrently executing kernels might adversely interfere with each other.

Jetson TX1 Architecture




Google Assistant is moving from cloud to local.



An Evaluation of the NVIDIA TX1 for Supporting Real-time Computer-Vision Workloads. - RTAS(CCF-B)