NVIDIA TX1 Knowledge

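I have a few Jetson Nano and Jetson TX2 boards on hand and recently ran some very basic performance tests on them. Afterwards I found an RTAS paper that offered some insights.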

The paper is set in the context of autonomous driving, i.e., safety-critical real-time systems.

Safety-critical embedded systems are undergoing an evolution towards greater autonomy. This evolution is perhaps best exemplified by current trends in the automotive industry.

DRAM organization: an integrated GPU shares DRAM with the CPU, whereas a discrete GPU has its own device memory (integrated GPUs vs. discrete GPUs).

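As an integrated GPU, it supports a feature called zero-copy memory: the GPU uses the host's main memory directly without moving any data, because only a pointer is passed. Unified memory is different: the runtime can still copy data behind the scenes, and it is only from the user's point of view that GPU memory and host memory look like a single memory.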

With zero-copy memory, the CPU and the GPU can access the same memory area, avoiding GPU memory allocations and data copies between CPU and GPU memory.
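The difference between the two approaches can be sketched with the CUDA runtime API. This is a minimal illustration, not code from the paper: the kernel `scale` and the helpers `zero_copy_scale` and `managed_scale` are hypothetical names, and the buffer size is arbitrary. The zero-copy path maps a pinned host buffer into the GPU's address space and passes the kernel a pointer to it; the unified-memory path uses one managed pointer and leaves any data movement to the runtime.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical example kernel: doubles every element in place.
__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Zero-copy: the kernel works directly on mapped host memory.
// Returns the first element after the kernel ran, or -1 on a CUDA error.
float zero_copy_scale(int n) {
    float *h_buf = nullptr, *d_alias = nullptr;
    if (cudaHostAlloc((void **)&h_buf, n * sizeof(float),
                      cudaHostAllocMapped) != cudaSuccess) return -1.0f;
    for (int i = 0; i < n; ++i) h_buf[i] = 1.0f;
    // No cudaMalloc and no cudaMemcpy: only a device-side pointer is obtained.
    bool ok = cudaHostGetDevicePointer((void **)&d_alias, h_buf, 0) == cudaSuccess;
    if (ok) {
        scale<<<(n + 255) / 256, 256>>>(d_alias, n);
        ok = cudaDeviceSynchronize() == cudaSuccess;
    }
    float first = ok ? h_buf[0] : -1.0f;
    cudaFreeHost(h_buf);
    return first;
}

// Unified memory: one pointer for CPU and GPU; the runtime manages placement.
float managed_scale(int n) {
    float *m_buf = nullptr;
    if (cudaMallocManaged((void **)&m_buf, n * sizeof(float)) != cudaSuccess)
        return -1.0f;
    for (int i = 0; i < n; ++i) m_buf[i] = 1.0f;
    scale<<<(n + 255) / 256, 256>>>(m_buf, n);
    // Synchronize before the CPU touches the managed buffer again.
    bool ok = cudaDeviceSynchronize() == cudaSuccess;
    float first = ok ? m_buf[0] : -1.0f;
    cudaFree(m_buf);
    return first;
}

int main() {
    // Allow host allocations to be mapped into the GPU address space.
    cudaSetDeviceFlags(cudaDeviceMapHost);
    const int n = 1 << 20;
    printf("zero-copy first element: %.1f\n", zero_copy_scale(n));
    printf("managed   first element: %.1f\n", managed_scale(n));
    return 0;
}
```

Compiled with `nvcc`, both paths should yield the same values. The two differ only in where the data lives and who moves it, which is why the comparison is most interesting on a board where CPU and GPU share DRAM.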

GPU Scheduling:

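Whether GPU co-scheduling should be allowed, i.e., whether different tasks should be able to access the GPU concurrently. This matters a great deal for an autonomous driving system, where several tasks need the GPU at once. In addition, a recognition rate of 30 frames per second (one result roughly every 33 ms) is still not enough for autonomous driving, especially in certain emergency situations.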

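Normally there is only one stream, and all computations (kernels) run in FIFO order, as determined by GPU scheduling. With co-scheduling enabled, multiple streams can execute concurrently (although this generally brings little performance improvement).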
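The single-stream and multi-stream cases can be compared with a small experiment. The sketch below is an assumption-laden illustration rather than the paper's benchmark: the kernel `busy`, the helper `run_tasks`, and the task count, thread count, and iteration count are all made up for the example. It launches the same kernels once in the default stream (FIFO) and once with one stream per task, and reports wall-clock time for each.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for one task's GPU work: a small kernel that spins.
__global__ void busy(float *out, int iters) {
    float x = static_cast<float>(threadIdx.x);
    for (int i = 0; i < iters; ++i) x = x * 1.0001f + 0.5f;
    out[threadIdx.x] = x;  // store the result so the loop is not optimized away
}

// Launches kTasks kernels either all in the default stream (FIFO order)
// or in one stream per task. Returns elapsed milliseconds, or -1 on error.
double run_tasks(bool use_streams) {
    const int kTasks = 4, kThreads = 128, kIters = 1 << 18;
    float *out[kTasks] = {};
    cudaStream_t streams[kTasks] = {};
    bool ok = true;
    for (int t = 0; t < kTasks; ++t) {
        ok = ok && cudaMalloc((void **)&out[t], kThreads * sizeof(float)) == cudaSuccess;
        ok = ok && cudaStreamCreate(&streams[t]) == cudaSuccess;
    }

    double ms = -1.0;
    if (ok) {
        busy<<<1, kThreads>>>(out[0], 1);  // warm-up launch, not timed
        cudaDeviceSynchronize();

        auto t0 = std::chrono::steady_clock::now();
        for (int t = 0; t < kTasks; ++t) {
            cudaStream_t s = use_streams ? streams[t] : nullptr;
            busy<<<1, kThreads, 0, s>>>(out[t], kIters);
        }
        ok = cudaDeviceSynchronize() == cudaSuccess;  // wait for every kernel
        auto t1 = std::chrono::steady_clock::now();
        if (ok) ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
    }

    for (int t = 0; t < kTasks; ++t) {
        if (streams[t]) cudaStreamDestroy(streams[t]);
        cudaFree(out[t]);
    }
    return ms;
}

int main() {
    printf("default stream (FIFO): %.3f ms\n", run_tasks(false));
    printf("one stream per task  : %.3f ms\n", run_tasks(true));
    return 0;
}
```

Using separate streams only permits overlap; whether kernels actually run side by side depends on the GPU and on how many resources each kernel occupies. The kernels here are deliberately tiny (one block each) so that overlap is at least possible; with kernels that fill the GPU, the two timings may be nearly identical, which matches the observation above.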

Previous work: GPU co-scheduling must be avoided because concurrently executing kernels might adversely interfere with each other.

Jetson TX1 Architecture

Insights:

A voice system could in fact also serve as an aid to autonomous driving, running locally without relying on the cloud.

To investigate:

Google Assistant is moving from cloud to local.

What kind of solution do Siri and similar assistants use?

Reference

An Evaluation of the NVIDIA TX1 for Supporting Real-Time Computer-Vision Workloads. RTAS (CCF-B).