Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.
Deep Neural Networks (DNNs) are increasingly used on the edge in mobile and other resource-constrained devices for inference tasks ranging from object detection and image recognition to video processing. Many of these tasks have low-latency requirements that cannot be satisfied by local processing due to their high computational complexity. Offloading computation to the edge and cloud offers a way to reduce this processing latency, but introduces communication delays, making offloading a balancing act between the benefits of reduced processing time and the communication delays incurred. Existing algorithms for DNN offloading based on DNN partitioning are optimised for handling successive tasks on a single remote server, and perform sub-optimally when tasks are interleaved, when multiple servers are available, or when privacy concerns require local processing of certain DNN layers. A viable alternative is generic computational offloading algorithms (GOAs), which can break down DNN tasks into their components and perform fine-grained offloading. We perform a simulation-based comparison of traditional GOAs, with various levels of proactivity and offloading constraints, against a naive DNN partitioning approach. We identify two key requirements for offloading algorithms: the ability to create processing chains across several remote servers to reduce communication overheads, and the ability to prioritise already-running tasks. The results confirm the expected shortcomings of DNN partitioning and show that GOAs can provide a significant performance improvement.
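To illustrate the trade-off the abstract describes, the single-split partitioning baseline can be sketched as a small latency model: layers before the split run on the device, the intermediate activation is transmitted, and the remaining layers run on one remote server. This is a minimal illustrative sketch, not the paper's simulator; all function names, per-layer timings, activation sizes, and bandwidth figures are assumptions for demonstration.

```python
# Illustrative single-split DNN partitioning latency model (not the paper's code).
# Layers [0, split) run locally, the activation feeding layer `split` is sent
# over the network, and layers [split, n) run on a single remote server.

def partition_latency(local_ms, remote_ms, act_kb, bandwidth_kbps, split):
    """Total latency in ms for a given split point.

    local_ms[i]  : time (ms) to run layer i on the device
    remote_ms[i] : time (ms) to run layer i on the remote server
    act_kb[i]    : size (KB) of the activation that is the input to layer i
    """
    compute_local = sum(local_ms[:split])
    compute_remote = sum(remote_ms[split:])
    # Fully local execution (split == n) needs no network transfer.
    if split == len(local_ms):
        transfer = 0.0
    else:
        transfer = act_kb[split] * 8 / bandwidth_kbps * 1000  # KB -> kb -> ms
    return compute_local + transfer + compute_remote

def best_split(local_ms, remote_ms, act_kb, bandwidth_kbps):
    """Exhaustively choose the split point minimising end-to-end latency."""
    return min(range(len(local_ms) + 1),
               key=lambda s: partition_latency(local_ms, remote_ms,
                                               act_kb, bandwidth_kbps, s))
```

Because the model fixes a single split and a single server, the optimal split shifts with bandwidth: at low bandwidth it degenerates to fully local execution, which is exactly the rigidity that fine-grained GOAs avoid by dispatching individual components across several servers.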
Jamie Cotter, Ignacio Castiñeiras, Donna O’Shea, Victor Cionca, A comparative analysis of proactive and reactive methods for privacy-aware interleaved DNN offloading, Computer Networks, 2023, 109999, https://doi.org/10.1016/j.comnet.2023.109999.