
PieSlicer: Dynamically Improving Response Time for Cloud-based CNN Inference
12th ACM/SPEC International Conference on Performance Engineering (ICPE'21)
Cloud-based inference is often bottlenecked by poor mobile or network performance. PieSlicer improves response time by dynamically deciding where to preprocess the inference input, based on empirically driven performance models.

CiNet: Redesigning Deep Neural Networks for Efficient Mobile-Cloud Collaborative Inference
SIAM International Conference on Data Mining (SDM'21)
We design a collaboration-aware neural network called CiNet, built from the outset for low on-device computation and network transmission cost. CiNet makes it easy and efficient to partition inference computation between a mobile device and a remote server.

GRAD: Learning for Overhead-aware Adaptive Video Streaming with Scalable Video Coding
28th ACM International Conference on Multimedia (ACM MM'20)
We provide a new mechanism for bitrate adaptation algorithms, enabling finer-grained bitrate adjustments to both buffered and incoming video chunks. Our deep reinforcement learning based approach outperforms state-of-the-art methods, especially under highly variable network conditions.

Challenges and Opportunities of DNN Model Execution Caching
Third Workshop on Distributed Infrastructures for Deep Learning (DIDL'19)

Update in progress, coming soon... Papers prior to 2019 can be found on my Google Scholar profile.