Efficient Mobile Deep Inference

An ever-increasing number of mobile applications leverage deep learning models to provide novel and useful features, such as real-time language translation and object recognition. However, the current mobile inference paradigm requires application developers to statically trade off inference accuracy against inference speed at development time. As a result, the mobile user experience suffers when inference scenarios are dynamic and device capacity is heterogeneous. The MODI project proposes new research in designing and implementing a mobile-aware deep inference platform that combines innovations in both algorithm and system optimizations.
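To make the static trade-off concrete, the sketch below shows one way a runtime could instead pick a model per request from a latency budget. It is a minimal illustration under stated assumptions, not the project's actual algorithm: the `ModelProfile` type, the `select_model` function, and all accuracy and latency numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ModelProfile:
    """Hypothetical profile of one model variant on the current device."""
    name: str
    accuracy: float    # top-1 accuracy in [0, 1] (illustrative values below)
    latency_ms: float  # estimated inference latency on this device


def select_model(models: Sequence[ModelProfile],
                 budget_ms: float) -> Optional[ModelProfile]:
    """Return the most accurate model whose latency fits the budget, or None."""
    feasible = [m for m in models if m.latency_ms <= budget_ms]
    if not feasible:
        return None
    # Prefer higher accuracy; break ties in favor of lower latency.
    return max(feasible, key=lambda m: (m.accuracy, -m.latency_ms))


# Illustrative numbers only, not measurements from the project.
CANDIDATES = [
    ModelProfile("small", accuracy=0.70, latency_ms=30.0),
    ModelProfile("medium", accuracy=0.76, latency_ms=90.0),
    ModelProfile("large", accuracy=0.80, latency_ms=250.0),
]

choice = select_model(CANDIDATES, budget_ms=100.0)
print(choice.name if choice else "no model fits the budget")
```

A static design would hard-code one of these variants at build time; selecting at request time lets the same application use a larger model on a fast device and a smaller one under a tight budget.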

Acknowledgements: This project is generously supported by NSF Grants #1755659 and #1815619, and by Google Cloud Research Credits.

Project Personnel

Papers

Memory-Efficient Deep Learning Inference in Trusted Execution Environments

Jean-Baptiste Truong, William Gallagher, Tian Guo, Robert J. Walls

arXiv'21

@inproceedings{truong_arxiv2021,
  author = {Truong, Jean-Baptiste and Gallagher, William and Guo, Tian and Walls, Robert J.},
  title = {Memory-Efficient Deep Learning Inference in Trusted Execution Environments},
  year = {2021},
  series = {arXiv'21}
}

CINET: Redesigning Deep Neural Networks for Efficient Mobile-Cloud Collaborative Inference

Xin Dai, Xiangnan Kong, Tian Guo, Yixian Huang

SIAM International Conference on Data Mining (SDM'21)

@inproceedings{cinet_sdk21,
  author = {Dai, Xin and Kong, Xiangnan and Guo, Tian and Huang, Yixian},
  title = {CINET: Redesigning Deep Neural Networks for Efficient Mobile-Cloud Collaborative Inference},
  year = {2021},
  booktitle = {Proceedings of the 2021 SIAM International Conference on Data Mining},
  series = {SDM '21}
}

EPNet: Learning to Exit with Flexible Multi-Branch Network

Xin Dai, Xiangnan Kong and Tian Guo

ACM International Conference on Information and Knowledge Management (CIKM'20)

@inproceedings{epnet_cikm2020,
  author = {Dai, Xin and Kong, Xiangnan and Guo, Tian},
  title = {EPNet: Learning to Exit with Flexible Multi-Branch Network},
  year = {2020},
  isbn = {9781450368599},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3340531.3411973},
  doi = {10.1145/3340531.3411973},
  pages = {235--244},
  numpages = {10},
  keywords = {early exiting, dynamic neural network, efficient neural network},
  location = {Virtual Event, Ireland},
  series = {CIKM '20}
}

Recurrent Networks for Guided Multi-Attention Classification

Xin Dai, Xiangnan Kong, Tian Guo, John Lee, Xinyue Liu, Constance Moore

ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'20)

@inproceedings{garn_kdd2020,
  title={Recurrent Networks for Guided Multi-Attention Classification},
  author={Xin Dai and Xiangnan Kong and Tian Guo and John Lee and Xinyue Liu and Constance Moore},
  booktitle = {Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery \& Data Mining},
  series = {KDD'20},
  year={2020}
}

MDInference: Balancing Inference Accuracy and Latency for Mobile Applications

Samuel S. Ogden, Tian Guo

2020 IEEE International Conference on Cloud Engineering (IC2E'20), Invited paper

@inproceedings{mdinference_ic2e2020,
  title={MDInference: Balancing Inference Accuracy and Latency for Mobile Applications},
  author={Samuel S. Ogden and Tian Guo},
  booktitle={2020 IEEE International Conference on Cloud Engineering},
  year={2020}
}

Perseus: Characterizing Performance and Cost of Multi-Tenant Serving for CNN Models

Matthew LeMay, Shijian Li, Tian Guo

2020 IEEE International Conference on Cloud Engineering (IC2E'20)

@inproceedings{perseus_ic2e2020,
  title={Perseus: Characterizing Performance and Cost of Multi-Tenant Serving for CNN Models},
  author={Matthew LeMay and Shijian Li and Tian Guo},
  booktitle={2020 IEEE International Conference on Cloud Engineering},
  year={2020}
}

MODI: Mobile Deep Inference Made Efficient by Edge Computing

Samuel S. Ogden, Tian Guo

The USENIX Workshop on Hot Topics in Edge Computing (HotEdge'18)

@inproceedings {modi_hotedge2018,
  author = {Samuel S. Ogden and Tian Guo},
  title = {{MODI}: Mobile Deep Inference Made Efficient by Edge Computing},
  booktitle = {{USENIX} Workshop on Hot Topics in Edge Computing (HotEdge 18)},
  year = {2018},
  address = {Boston, MA},
  url = {https://www.usenix.org/conference/hotedge18/presentation/ogden},
  publisher = {{USENIX} Association},
  month = jul,
}

Cloud-based or On-device: An Empirical Study of Mobile Deep Inference

Tian Guo

2018 IEEE International Conference on Cloud Engineering (IC2E'18)

@inproceedings{mdmeasurement_ic2e2018,
  title={Cloud-Based or On-Device: An Empirical Study of Mobile Deep Inference},
  author={Tian Guo},
  booktitle={2018 IEEE International Conference on Cloud Engineering (IC2E)},
  year={2018},
  pages={184--190}
}