Figure 2.1.6: Azure Kinect 1024x1024 K4aviewer results
visualizing the scenes from the RGB, Infrared & Depth
cameras corresponding to 720p resolution.
 
Figure 2.1.7: Azure Kinect 1024x1024 K4aviewer results
visualizing the Point Cloud scene corresponding to 720p
resolution.
Figures 2.1.4 to 2.1.7 show the RGB, Infrared, Depth
and Point Cloud results obtained from the Azure Kinect
DK, respectively.
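The relationship between the depth image and the point cloud view can be illustrated with a minimal sketch. It assumes an ideal pinhole camera without lens distortion, depth values in millimetres with 0 marking an invalid pixel, and hypothetical intrinsics fx, fy, cx and cy; a real pipeline would rely on the device calibration rather than these illustrative values.

```python
def depth_to_points(depth_mm, fx, fy, cx, cy):
    """Back-project a depth image (millimetres) into (x, y, z) points in metres.

    Assumes an ideal pinhole model; fx, fy, cx, cy are hypothetical intrinsics.
    """
    points = []
    for v, row in enumerate(depth_mm):
        for u, d in enumerate(row):
            if d == 0:  # 0 is treated as an invalid depth reading
                continue
            z = d / 1000.0           # millimetres to metres
            x = (u - cx) * z / fx    # horizontal offset scaled by depth
            y = (v - cy) * z / fy    # vertical offset scaled by depth
            points.append((x, y, z))
    return points


# Tiny 2x2 depth image with two valid pixels (illustrative values only).
demo_points = depth_to_points([[0, 1000], [2000, 0]], fx=500.0, fy=500.0, cx=0.0, cy=0.0)
```

Each valid depth pixel yields one 3D point, so the size of the point cloud is bounded by the depth resolution.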
2.2  NVIDIA Jetson Tx2 
The NVIDIA Jetson Tx2 is designed explicitly for
high-performance AI on-the-edge applications. It houses
both the GPU and the CPU on the same System-on-Chip
(SoC), which speeds up processing for emerging deep
neural networks (NVIDIA, 2021).
The Jetson Tx2 leverages the Pascal GPU architecture
to improve Streaming Multiprocessor (SM) performance.
The Graphics Processing Cluster (GPC) includes
multiple SM units and a Raster Engine for computing,
rasterization, shading and texturing.
The Jetson Tx2 draws between 5 watts at maximum
efficiency and 15 watts at maximum performance
(NVIDIA, 2021). This power efficiency makes it well
suited to AI on-the-edge applications such as
autonomous vehicles, drones, virtual reality and
mixed reality.
Figure 2.2.1 shows the NVIDIA Jetson Tx2 connected
to the Azure Kinect DK, applying the GPU's parallel
computing capabilities to the data obtained from the
camera.
 
 
Figure 2.2.1: Edge AI Setup Leveraging Azure Kinect DK 
Accelerated with NVIDIA Jetson Tx2. 
2.3  Emergence of AI-on-the-Edge 
With the rise of Industry 5.0, we are at the cusp of a
technology explosion that is driving innovation. The
convergence of technologies such as Big Data,
Artificial Intelligence and Deep Learning enables AI
on-the-edge applications.
Hardware such as Intel Movidius VPUs, NVIDIA GPUs,
Intel Nervana Neural Network Processors (NNP), Google
Tensor Processing Units (TPUs), and Intel and Xilinx
FPGAs is evolving rapidly. Alongside it, deep learning
algorithms are emerging with advanced architectures
that deliver better performance, lower latency and
higher throughput, opening up new possibilities for
disruptive technologies.
Combining state-of-the-art hardware with deep
learning architectures opens new product markets and
enables immersive business solutions. Approaches
explored for computer vision include CNN, DNN, cuDNN,
Mask R-CNN, Mesh R-CNN, LSTM, GoogLeNet, ResNet,
SegNet and YOLO, which are trained on 2D data.
With the focus of Industry 5.0 (Atwell, 2017) on
real-time applications, point-cloud-based 3D deep
learning is rapidly gaining pace, as it involves an
end-to-end deep learning network that acquires features
directly from the point clouds.
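To illustrate what acquiring features directly from a point cloud can look like, the following is a minimal sketch in the spirit of PointNet-style networks, which the text above does not name and which is used here only as an assumed example. A shared layer is applied to every point, and the per-point features are max-pooled into one global vector, so the result does not depend on the order of the points. The weights are random placeholders, not a trained model, and all function names are hypothetical.

```python
import random


def point_features(point, weights, biases):
    """Shared per-point layer: ReLU(W * p + b), identical for every point."""
    return [max(0.0, sum(w * c for w, c in zip(row, point)) + b)
            for row, b in zip(weights, biases)]


def global_feature(points, weights, biases):
    """Max-pool per-point features channel-wise into one order-independent vector."""
    per_point = [point_features(p, weights, biases) for p in points]
    return [max(channel) for channel in zip(*per_point)]


# Random placeholder parameters: 4 output channels, 3 input coordinates.
rng = random.Random(0)
weights = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
biases = [rng.uniform(-1, 1) for _ in range(4)]

cloud = [(0.0, 0.0, 1.0), (0.5, -0.2, 1.5), (-0.3, 0.4, 2.0)]
shuffled = [cloud[2], cloud[0], cloud[1]]

# Reordering the points leaves the pooled feature unchanged.
same = global_feature(cloud, weights, biases) == global_feature(shuffled, weights, biases)
```

Max pooling is chosen here because a point cloud is an unordered set; any symmetric function, such as a sum or mean, would give the same order independence.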
3D object recognition, 3D object segmentation and
point-wise semantic segmentation are crucial
components of applications that are tightly
constrained by hardware resources and battery life.
It is therefore important to design efficient and fast
3D deep learning models for real-time edge
applications such as virtual reality, mixed reality and
autonomous driving.