Efficient Mobile Deep Inference
An ever-increasing number of mobile applications leverage deep learning models to provide novel and useful features, such as real-time language translation and object recognition. However, the current mobile inference paradigm requires application developers to statically trade off inference accuracy against inference speed at development time. As a result, the mobile user experience is negatively impacted under dynamic inference scenarios and heterogeneous device capacity. The MODI project proposes new research on designing and implementing a mobile-aware deep inference platform that combines innovations in both algorithm and system optimizations.
Acknowledgements: This project is generously supported by NSF Grants #1755659 and #1815619, and by Google Cloud Research Credits.
Project Personnel
Papers
Memory-Efficient Deep Learning Inference in Trusted Execution Environments
Jean-Baptiste Truong, William Gallagher, Tian Guo, Robert J. Walls
arXiv'21
@inproceedings{truong_arxiv2021,
author = {Truong, Jean-Baptiste and Gallagher, William and Guo, Tian and Walls, Robert J.},
title = {Memory-Efficient Deep Learning Inference in Trusted Execution Environments},
year = {2021},
series = {arXiv'21}
}
CINET: Redesigning Deep Neural Networks for Efficient Mobile-Cloud Collaborative Inference
Xin Dai, Xiangnan Kong, Tian Guo, Yixian Huang
SIAM International Conference on Data Mining (SDM'21)
@inproceedings{cinet_sdm21,
author = {Dai, Xin and Kong, Xiangnan and Guo, Tian and Huang, Yixian},
title = {CINET: Redesigning Deep Neural Networks for Efficient Mobile-Cloud Collaborative Inference},
year = {2021},
booktitle = {Proceedings of 2021 SIAM International Conference on Data Mining},
series = {SDM '21}
}
EPNet: Learning to Exit with Flexible Multi-Branch Network
Xin Dai, Xiangnan Kong and Tian Guo
ACM International Conference on Information and Knowledge Management (CIKM'20)
@inproceedings{epnet_cikm2020,
author = {Dai, Xin and Kong, Xiangnan and Guo, Tian},
title = {EPNet: Learning to Exit with Flexible Multi-Branch Network},
year = {2020},
isbn = {9781450368599},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3340531.3411973},
doi = {10.1145/3340531.3411973},
pages = {235--244},
numpages = {10},
keywords = {early exiting, dynamic neural network, efficient neural network},
location = {Virtual Event, Ireland},
series = {CIKM '20}
}
Recurrent Networks for Guided Multi-Attention Classification
Xin Dai, Xiangnan Kong, Tian Guo, John Lee, Xinyue Liu, Constance Moore
ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'20)
@inproceedings{garn_kdd2020,
title={Recurrent Networks for Guided Multi-Attention Classification},
author={Xin Dai and Xiangnan Kong and Tian Guo and John Lee and Xinyue Liu and Constance Moore},
booktitle = {Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery \& Data Mining},
series = {KDD'20},
year={2020}
}
MDInference: Balancing Inference Accuracy and Latency for Mobile Applications
Samuel S. Ogden, Tian Guo
2020 IEEE International Conference on Cloud Engineering (IC2E'20), Invited paper
@inproceedings{mdinference_ic2e2020,
title={MDInference: Balancing Inference Accuracy and Latency for Mobile Applications},
author={Samuel S. Ogden and Tian Guo},
booktitle={2020 IEEE International Conference on Cloud Engineering},
year={2020},
}
Perseus: Characterizing Performance and Cost of Multi-Tenant Serving for CNN Models
Matthew LeMay, Shijian Li, Tian Guo
2020 IEEE International Conference on Cloud Engineering (IC2E'20)
@inproceedings{perseus_ic2e2020,
title={Perseus: Characterizing Performance and Cost of Multi-Tenant Serving for CNN Models},
author={Matthew LeMay and Shijian Li and Tian Guo},
booktitle={2020 IEEE International Conference on Cloud Engineering},
year={2020},
}
MODI: Mobile Deep Inference Made Efficient by Edge Computing
Samuel S. Ogden, Tian Guo
The USENIX Workshop on Hot Topics in Edge Computing (HotEdge'18)
@inproceedings {modi_hotedge2018,
author = {Samuel S. Ogden and Tian Guo},
title = {{MODI}: Mobile Deep Inference Made Efficient by Edge Computing},
booktitle = {{USENIX} Workshop on Hot Topics in Edge Computing (HotEdge 18)},
year = {2018},
address = {Boston, MA},
url = {https://www.usenix.org/conference/hotedge18/presentation/ogden},
publisher = {{USENIX} Association},
month = jul,
}
Cloud-based or On-device: An Empirical Study of Mobile Deep Inference
Tian Guo
2018 IEEE International Conference on Cloud Engineering (IC2E'18)
@inproceedings{mdmeasurement_ic2e2018,
title={Cloud-Based or On-Device: An Empirical Study of Mobile Deep Inference},
author={Tian Guo},
booktitle={2018 IEEE International Conference on Cloud Engineering (IC2E)},
year={2018},
pages={184--190}
}