Photo of Tian Guo

 Tian Guo

Assistant Professor
Computer Science Department
Worcester Polytechnic Institute
100 Institute Road
Worcester, MA 01609
Office: Fuller Labs 138

tian@wpi.edu

(508)831-6860

github.com/belindanju


Publications [Google Scholar Profile] [Bibtex Entries]

MODI: Mobile Deep Inference Made Efficient by Edge Computing
Samuel S. Ogden, Tian Guo
The USENIX Workshop on Hot Topics in Edge Computing (HotEdge '18)
abstract

In this paper, we propose a novel mobile deep inference platform, MODI, that delivers good inference performance. MODI improves deep learning powered mobile applications performance with optimizations in three complementary aspects. First, MODI provides a number of models and dynamically selects the best one during runtime. Second, MODI extends the set of models each mobile application can use by storing high quality models at the edge servers. Third, MODI manages a centralized model repository and periodically updates models at edge locations ensuring up-to-date models for mobile applications without incurring high network latency. Our evaluation demonstrate the feasibility of trading off inference accuracy for improved inference speed, as well as the acceptable performance of edge-based inference.

Cloud-based or On-device: An Empirical Study of Mobile Deep Inference
Tian Guo
2018 IEEE International Conference on Cloud Engineering (IC2E'18)
abstract

Modern mobile applications benefit significantly from the advancement in deep learning, e.g., implementing real-time image recognition and conversational system. Given a trained deep learning model, applications usually need to perform a series of matrix operations based on the input data, in order to infer possible output values. Because of computation complexity and size constrained, these trained models are often hosted in the cloud. When utilizing these cloud-based models, mobile apps will have to send input dat over the network. While cloud-based deep learning can provide reasonable response time for mobile apps, it also restricts the use case scenarios, e.g. mobile apps need to have access to network. With mobile specific deep learning optimizations, it is now possible to employ on-device inference. However, because mobile hardware, e.g. GPU and memory size, can be very limited when compared to desktop counterpart, it is important to understand the feasibility of this new on-device deep learning inference architecture. In this paper, we empirically evaluate the inference efficiency of three Convolutional Neural Networks using a benchmark Android application we developed. Our measurement and analysis suggest that on-device inference can cost up to two orders of magnitude response time and energy when compared to cloud-based inference, and loading model and computing probability are two performance bottlenecks for on- device deep inferences.

Paper Slides
Providing Geo-Elasticity in Geographically Distributed Clouds
Tian Guo, and Prashant Shenoy
ACM Transactions on Internet Technology (TOIT'17)
abstract

Geographically distributed cloud platforms are well suited for serving a geographically diverse user base. However traditional cloud provisioning mechanisms that make local scaling decisions are not adequate for delivering best possible performance for modern web applications that observe both temporal and spatial workload fluctuations. In this paper, we propose GeoScale, a system that provides geo-elasticity by combining model-driven proactive and agile reactive provisioning approaches. GeoScale can dynamically provision server capacity at any location based on workload dynamics. We conduct a detailed evaluation of GeoScale on Amazon’s geo-distributed cloud, and show up to 40% improvement in the 95th percentile response time when compared to traditional elasticity techniques.

Paper
On the Feasibility of Cloud-Based SDN Controllers for Residential Networks
Curtis R. Taylor, Tian Guo, Craig A. Shue, and Mohamed E. Najd
2017 IEEE Conference on Network Function Virtualization and Software Defined Networks (NFV-SDN'17)
abstract

Residential networks are home to increasingly diverse devices, including embedded devices that are part of the Internet of Things phenomenon, leading to new management and security challenges. However, current residential solutions that rely on customer premises equipment (CPE), which often remains deployed in homes for years without updates or maintenance, are not evolving to keep up with these emerging demands. Recently, researchers have proposed to outsource the tasks of managing and securing residential networks to cloud-based security services by leveraging software-defined networking (SDN). However, the use of cloud-based infrastructure may have performance implications. In this paper, we measure the performance impact and perception of a residential SDN using a cloud-based controller through two measurement studies. First, we recruit 270 residential users located across the United States to measure residential latency to cloud providers. Our measurements suggest the cloud controller architecture provides 90% of end-users with acceptable performance with judiciously selected public cloud locations. When evaluating web page loading times of popular domains, which are particularly latency-sensitive, we found an increase of a few seconds at the median. However, optimizations could reduce this overhead for top websites in practice.

Paper
Towards Efficient Deep Inference for Mobile Applications
Tian Guo
arXiv:1707.04610
abstract

Mobile applications are benefiting significantly from the advancement in deep learning, e.g. providing new features. Given a trained deep learning model, applications usually need to perform a series of matrix operations based on the input data, in order to infer possible output values. Because of model computation complexity and increased model sizes, those trained models are usually hosted in the cloud. When mobile apps need to utilize those models, they will have to send input data over the network. While cloud-based deep learning can provide reasonable response time for mobile apps, it also restricts the use case scenarios, e.g. mobile apps need to have access to network. With mobile specific deep learning optimizations, it is now possible to employ device-based inference. However, because mobile hardware, e.g. GPU and memory size, can be very different and limited when compared to desktop counterpart, it is important to understand the feasibility of this new device-based deep learning inference architecture. In this paper, we empirically evaluate the inference efficiency of three Convolutional Neural Networks using a benchmark Android application we developed. Based on our application-driven analysis, we have identified several performance bottlenecks for mobile applications powered by on-device deep learning inference.

Paper
Performance and Cost Considerations for Providing Geo-Elasticity in Database Clouds
Tian Guo, Prashant Shenoy
Transactions on Autonomous and Adaptive Systems (TAAS'17)
abstract

Online applications that serve global workload have become a norm and those applications are experiencing not only temporal but also spatial workload variations. In addition, more applications are hosting their backend tiers separately for benefits such as ease of management. To provision for such applications, traditional elasticity approaches that only consider temporal workload dynamics and assume well-provisioned backends are insufficient. Instead, in this paper, we propose a new type of provisioning mechanisms---geo-elasticity, by utilizing distributed clouds with different locations. Centered this idea, we build a system called DBScale that tracks geographic variations in the workload to dynamically provision database replicas at different cloud locations across the globe. Our geo-elastic provisioning approach comprises a regression-based model that infers database query workload from spatially distributed front-end workload, a two-node open queueing network model that estimates the capacity of databases serving both CPU and I/O-intensive query workloads, and greedy algorithms for selecting best cloud locations based on latency and cost. We implement a prototype of our DBScale system on Amazon EC2’s distributed cloud. Our experiments with our prototype show up to a 66% improvement in response time when compared to local elasticity approaches.

Paper
Latency-aware Virtual Desktops Optimization in Distributed Clouds
Tian Guo, Prashant Shenoy, K. K. Ramakrishnan, Vijay Gopalakrishnan
Multimedia Systems (MMSJ'17)
abstract

Distributed clouds offer a choice of data center locations for providers to host their applications. In this paper we consider distributed clouds that host virtual desktops which are then accessed by users through remote desktop protocols. Virtual desktops have different levels of latency-sensitivity, primarily determined by the actual applications running and affected by the end users’ locations. In the scenario of mobile users, even switching between 3G and WiFi networks affects the latency sensitivity. We design VMShadow, a system to automatically optimize the location and performance of latency-sensitive VMs in the cloud. VMShadow performs black-box fingerprinting of a VM’s network traffic to infer the latency-sensitivity and employs both ILP and greedy heuristic based algorithms to move highly latency-sensitive VMs to cloud sites that are closer to their end users. VMShadow employs a WAN-based live migration and a new network connection migration protocol to ensure that the VM migration and subsequent changes to the VM’s network address are transparent to end-users. We implement a prototype of VMShadow in a nested hypervisor and demonstrate its effectiveness for optimizing the performance of VM-based desktops in the cloud. Our experiments on a private as well as the public EC2 cloud show that VMShadow is able to discriminate between latency-sensitive and insensitive desktop VMs and judiciously moves only those that will benefit the most from the migration. For desktop VMs with video activity, VMShadow improves VNC’s refresh rate by 90% by migrating virtual desktop to the closer location. Transcontinental remote desktop migrations only take about 4 minutes and our connection migration proxy imposes 13µs overhead per packet.

Paper
Managing Risk in a Derivative IaaS Cloud
Prateek Sharma, Stephen Lee, Tian Guo, David Irwin, and Prashant Shenoy
IEEE Transactions on Parallel and Distributed Systems (TPDS'17)
abstract

Infrastructure-as-a-Service (IaaS) cloud platforms rent computing resources with different cost and availability tradeoffs. For example, users may acquire virtual machines (VMs) in the spot market that are cheap, but can be unilaterally terminated by the cloud operator. Because of this revocation risk, spot servers have been conventionally used for delay and risk tolerant batch jobs. In this paper, we develop risk mitigation policies which allow even interactive applications to run on spot servers.

Our System, SpotCheck is a derivative cloud platform, and provides the illusion of an IaaS platform that offers always-available VMs on demand for a cost near that of spot servers, and supports unmodified applications. SpotCheck’s design combines virtualization-based mechanisms for fault-tolerance, and bidding and server selection policies for managing the risk and cost. We implement SpotCheck on EC2 and show that it i) provides nested VMs with 99.9989% availability, ii) achieves nearly 5× cost savings compared to using on-demand VMs, and iii) eliminates any risk of losing VM state.

Paper
Placement Strategies for Virtualized Network Functions in a NFaaS Cloud
Xin He, Tian Guo, Erich Nahum and Prashant Shenoy
Fourth IEEE Workshop on Hot Topics in Web Systems and Technologies (HotWeb'16)
abstract

Enterprises that host services in the cloud need to protect their cloud resources using network services such as firewalls and deep packet inspection systems. While middleboxes have typically been used to implement such network functions in traditional enterprise networks, their use in cloud environments by cloud tenants is problematic due to the boundary between cloud providers and cloud tenants. Instead we argue that network function virtualization is a natural fit in cloud environments, where the cloud provider can implement Network Functions as a Service using virtualized network functions running on cloud servers, and enterprise cloud tenants can employ these services to implement security and performance optimizations for their cloud resources. In this paper, we focus on placement issues in the design of a NFaaS cloud and present two placement strategies---tenant-centric and service-centric---for deploying virtualized network services in multi-tenant settings. We discuss several trade-offs of these two strategies. We implement a prototype NFaaS testbed and conduct a series of experiments to show to quantify the benefits and drawbacks of our two strategies. Our results suggest that the tenant-centric placement provides lower latencies while service-centric approach is more flexible for reconfiguration and capacity scaling.

Paper
Elastic Resource Management in Distributed Clouds
Tian Guo
Ph.D. thesis, University of Massachusetts Amherst.
abstract

The ubiquitous nature of computing devices and their increasing reliance on remote resources have driven and shaped public cloud platforms into unprecedented large-scale, distributed data centers. Concurrently, a plethora of cloud-based applications are experiencing multi-dimensional workload dynamics—workload volumes that vary along both time and space axes and with higher frequency. The interplay of diverse workload characteristics and distributed clouds raises several key challenges for efficiently and dynamically managing server resources. First, current cloud platforms impose certain restrictions that might hinder some resource management tasks. Second, an application-agnostic approach might not entail appropriate performance goals, therefore, requires numerous specific methods. Third, provisioning resources outside LAN boundary might incur huge delay which would impact the desired agility. In this dissertation, I investigate the above challenges and present the design of automated systems that manage resources for various applications in distributed clouds. The intermediate goal of these automated systems is to fully exploit potential benefits such as reduced network latency offered by increasingly distributed server resources. The ultimate goal is to improve end-to-end user response time with novel resource management approaches, within a certain cost budget. Centered around these two goals, I first investigate how to optimize the location and performance of virtual machines in distributed clouds. I use virtual desktops, mostly serving a single user, as an example use case for developing a black-box approach that ranks virtual machines based on their dynamic latency requirements. Those with high latency sensitivities have a higher priority of being placed or migrated to a cloud location closest to their users. Next, I relax the assumption of well-provisioned virtual machines and look at how to provision enough resources for applications that exhibit both temporal and spatial workload fluctuations. I propose an application-agnostic queueing model that captures the resource utilization and server response time. Building upon this model, I present a geo-elastic provisioning approach—referred as geo-elasticity—for replicable multi-tier applications that can spin up an appropriate amount of server resources in any cloud locations. Last, I explore the benefits of providing geo-elasticity for database clouds, a popular platform for hosting application backends. Performing geo-elastic provisioning for backend database servers entails several challenges that are specific to database workload, and therefore requires tailored solutions. In addition, cloud platforms offer resources at various prices for different locations. Towards this end, I propose a cost-aware geo-elasticity that combines a regression-based workload model and a queueing network capacity model for database clouds. In summary, hosting a diverse set of applications in an increasingly distributed cloud makes it interesting and necessary to develop new, efficient and dynamic resource management approaches.

Paper
Flint: Batch-Interactive Data-Intensive Processing on Transient Servers
Prateek Sharma, Tian Guo, Xin He, David Irwin, Prashant Shenoy
Procceedings of the Eleventh European Conference on Computer Systems (EuroSys'16)
abstract

Cloud providers now offer transient servers, which they may revoke at anytime, for significantly lower prices than on-demand servers, which they cannot revoke. Transient servers’ low price is particularly attractive for executing an emerging class of workload, which we call Batch-Interactive Data-Intensive (BIDI), that is becoming increasingly impor- tant for data analytics. BIDI workloads require large sets of servers to cache massive datasets in memory to enable low latency operation. In this paper, we illustrate the challenges of executing BIDI workloads on transient servers, where re- vocations (akin to failures) are the common case. To address these challenges, we design Flint, which is based on Spark and includes automated checkpointing and server selection policies that i) support batch and interactive applications and ii) dynamically adapt to application characteristics. We evaluate a prototype of Flint using EC2 spot instances, and show that it yields cost savings of up to 90% compared to using on-demand servers, while increasing running time by < 2%.

Paper
GeoScale: Providing Geo-Elasticity in Distributed Clouds
Tian Guo, Prashant Shenoy, Hakan Hacigumus
Proceedings of 2016 IEEE International Conference on Cloud Engineering (IC2E'16)
abstract

Distributed cloud platforms are well suited for serving a geographically diverse user base. However traditional cloud provisioning mechanisms that make local scaling decisions are not well suited for temporal and spatial workload fluctuations seen by modern web applications. In this paper, we argue the need of geo-elasticity and present GeoScale, a system to provide geo-elasticity in distributed clouds. We describe GeoScale’s model-driven proactive provisioning ap- proach and conduct an initial evaluation of GeoScale on Amazon’s distributed EC2 cloud. Our results show up to 31% improvement in the 95th percentile response time when compared to traditional elasticity techniques.

Paper
Analyzing the Efficiency of a Green University Data Center
Patrick Pegus II, Benoy Varghese, Tian Guo, David Irwin, Prashant Shenoy, Anirban Mahanti, James Culbert, John Goodhue, Chris Hill
Proceedings of 2016 ACM International Conference on Performance Engineering (ICPE'16)
abstract

Data centers are an indispensable part of today’s IT infrastructure. To keep pace with modern computing needs, data centers continue to grow in scale and consume increasing amounts of power. While prior work on data centers has led to significant improvements in their energy-efficiency, detailed measurements from these facilities’ operations are not widely available, as data center design is often considered part of a company’s competitive advantage. However, such detailed measurements are critical to the research community in motivating and evaluating new energy-efficiency optimizations. In this paper, we present a detailed analysis of a state-of-the-art 15MW green multi-tenant data center that incorporates many of the technological advances used in commercial data centers. We analyze the data center’s computing load and its impact on power, water, and carbon usage using standard effectiveness metrics, including PUE, WUE, and CUE. Our results reveal the benefits of optimizations, such as free cooling, and provide insights into how the various effectiveness metrics change with the seasons and increasing capacity usage. More broadly, our PUE, WUE, and CUE analysis validate the green design of this LEED Platinum data center.

Paper
SpotOn: A Batch Computing Service for the Spot Market
Supreeth Subramanya, Tian Guo, Prateek Sharma, David Irwin, and Prashant Shenoy
Proceedings of the 6th Annual Symposium on Cloud Computing (SoCC'15)
abstract

Cloud spot markets enable users to bid for compute resources, such that the cloud platform may revoke them if the market price rises too high. Due to their increased risk, revocable resources in the spot market are often significantly cheaper (by as much as 10X) than the equivalent non-revocable on-demand resources. One way to mitigate spot market risk is to use various fault-tolerance mechanisms, such as checkpointing or replication, to limit the work lost on revocation. However, the additional performance overhead and cost for a particular fault-tolerance mechanism is a complex function of both an application’s resource usage and the magnitude and volatility of spot market prices.

We present the design of a batch computing service for the spot market, called SpotOn, that automatically selects a spot market and fault-tolerance mechanism to mitigate the impact of spot revocations without requiring application modification. SpotOn’s goal is to execute jobs with the performance of on-demand resources, but at a cost near that of the spot market. We implement and evaluate SpotOn in simulation and using a prototype on Amazon’s EC2 that packages jobs in Linux Containers. Our simulation results using a job trace from a Google cluster indicate that SpotOn lowers costs by 91.9% compared to using on-demand resources with little impact on performance.

Paper Slides Poster
Model-driven Geo-Elasticity In Database Clouds
Tian Guo and Prashant Shenoy
International Conference on Autonomic Computing and Communications (ICAC'15)
abstract

Motivated by the emergence of distributed clouds, we argue for the need for geo-elastic provisioning of application replicas to effectively handle temporal and spatial workload fluctuations seen by such applications. We present DBScale, a system that tracks geographic variations in the workload to dynamically provision database replicas at different cloud locations across the globe. Our geo-elastic provisioning approach comprises a regression-based model to infer the database query workload from observations of the spatially distributed frontend workload and a two-node open queueing network model to provision databases with both CPU and I/O-intensive query workloads. We implement a prototype of our DBScale system on Amazon EC2’s distributed cloud. Our experiments with our prototype show up to a 66% improvement in response time when compared to local elasticity approaches.

Paper Slides
SpotCheck: Designing a Derivative IaaS Cloud on the Spot Market
Prateek Sharma, Stephen Lee, Tian Guo, David Irwin, and Prashant Shenoy
Procceedings of the Tenth European Conference on Computer Systems (EuroSys'15)
abstract

Infrastructure-as-a-Service (IaaS) cloud platforms rent resources, in the form of virtual machines (VMs), under a variety of contract terms that offer different levels of risk and cost. For example, users may acquire VMs in the spot market that are often cheap but entail significant risk, since their price varies over time based on market supply and demand and they may terminate at any time if the price rises too high. Currently, users must manage all the risks associated with using spot servers. As a result, conventional wisdom holds that spot servers are only appropriate for delay-tolerant batch applications. In this paper, we propose a derivative cloud platform, called SpotCheck, that transparently manages the risks associated with using spot servers for users.

SpotCheck provides the illusion of an IaaS platform that offers always-available VMs on demand for a cost near that of spot servers, and supports all types of applications, including interactive ones. SpotCheck’s design combines the use of nested VMs with live bounded-time migration and novel server pool management policies to maximize availability, while balancing risk and cost. We implement SpotCheck on Amazon’s EC2 and show that it i) provides nested VMs to users that are 99.9989% available, ii) achieves nearly 5X cost savings compared to using equivalent types of on-demand VMs, and iii) eliminates any risk of losing VM state.

Paper Slides Poster
Cost-Aware Cloud Bursting for Enterprise Applications
Tian Guo, Upendra Sharma, Prashant Shenoy, Timothy Wood, and Sambit Sahu
ACM Transactions on Internet Technology, Volume 13 Issue 3, May 2014, Article No. 10 (TOIT'14)
abstract

The high cost of provisioning resources to meet peak application demands has led to the widespread adoption of pay-as-you-go cloud computing services to handle workload fluctuations. Some enterprises with existing IT infrastructure employ a hybrid cloud model where the enterprise uses its own private resources for the majority of its computing, but then “bursts” into the cloud when local resources are insufficient. However, current commercial tools rely heavily on the system administrator’s knowledge to answer key questions such as when a cloud burst is needed and which applications must be moved to the cloud. In this paper we describe Seagull, a system designed to facilitate cloud bursting by determining which applications should be transitioned into the cloud and automating the movement process at the proper time. Seagull optimizes the bursting of applications using an optimization algorithm as well as a more efficient but approximate greedy heuristic. Seagull also optimizes the overhead of deploying applications into the cloud using an intelligent precopying mechanism that proactively replicates virtualized applications, lowering the bursting time from hours to minutes. Our evaluation shows over 100% improvement compared to naive solutions but produces more expensive solutions compared to ILP. However, the scalability of our greedy algorithm is dramatically better as the number of VMs increase. Our evaluation illustrates scenarios where our prototype can reduce cloud costs by more than 45% when bursting to the cloud, and that the incremental cost added by precopying applications is offset by a burst time reduction of nearly 95%.

Paper
VMShadow: Optimizing the Performance of Latency-sensitive Virtual Desktops in Distributed Clouds
Tian Guo, Vijay Gopalakrishnan, K. K. Ramakrishnan, Prashant Shenoy, Arun Venkataramani, and Seungjoon Lee
Proceedings of the 5th ACM Multimedia Systems Conference (MMSys'14)
abstract

Distributed clouds offer a choice of data center locations to application providers to host their applications. In this paper we consider distributed clouds that host virtual desktops which are then accessed by their users through remote desktop protocols. We argue that virtual desktops that run latency-sensitive applications such as games or video players are particularly sensitive to the choice of the cloud data center location. We design VMShadow, a system to automatically optimize the location and performance of location-sensitive virtual desktops in the cloud. VMShadow performs black-box fingerprinting of a VM’s network traffic to infer its location-sensitivity and employs a greedy heuristic based algorithm to move highly location-sensitive VMs to cloud sites that are closer to their end-users. VMShadow employs WAN-based live migration and a new network connection migration protocol to ensure that the VM migration and subsequent changes to the VM’s network address are transparent to end-users. We implement a prototype of VMShadow in a nested hypervisor and demonstrate its effectiveness for optimizing the performance of VM-based desktops in the cloud. Our experiments on a private and the public EC2 cloud show that VMShadow is able to discriminate between location-sensitive and insensitive desktop applications and judiciously move only those VMs that will benefit the most. For desktop VMs with video activity, VMShadow improves VNC’s refresh rate by 90%. Further our connection migration proxy, which utilizes dynamic rewriting of packet headers, imposes a rewriting overhead of only 13µs per packet. Trans-continental VM migrations take about 4 minutes.

Paper
VMShadow: Optimizing The Performance of Virtual Desktops in Distributed Clouds
Tian Guo, Vijay Gopalakrishnan, K. K. Ramakrishnan, Prashant Shenoy, Arun Venkataramani, and Seungjoon Lee
Proceedings of the 4th Annual Symposium on Cloud Computing (SOCC '13)
abstract

We present VMShadow, a system that automatically optimizes the location and performance of applications based on their dynamic workloads. We prototype VMShadow and demonstrate its efficacy using VM-based desktops in the cloud as an example application. Our experiments on a private cloud as well as the EC2 cloud, using a nested hypervisor, show that VMShadow is able to discriminate between location-sensitive and location-insensitive desktop VMs and judiciously moves only those that will benefit the most from the migration. For example, VMShadow performs transcontinental VM migrations in ∼4 mins and can improve VNC’s video refresh rate by up to 90%.

Paper Poster
Seagull: Intelligent Cloud Bursting for Enterprise Applications
Tian Guo, Upendra Sharma, Timothy Wood, Sambit Sahu, and Prashant Shenoy
Proceedings of the 2012 USENIX conference on Annual Technical Conference (ATC'12)
abstract

Enterprises with existing IT infrastructure are beginning to employ a hybrid cloud model where the enterprise uses its own private resources for the majority of its computing, but then “bursts” into the cloud when local resources are insufficient. However, current approaches to cloud bursting cannot be effectively automated because they heavily rely on system administrator knowledge to make decisions. In this paper we describe Seagull, a system designed to facilitate cloud bursting by determining which applications can be transitioned into the cloud most economically, and automating the movement process at the proper time. We further optimize the deployment of applications into the cloud using an intelligent precopying mechanism that proactively replicates virtualized applications, lowering the bursting time from hours to minutes. Our evaluation illustrates how our prototype can reduce cloud costs by more than 45% when bursting to the cloud, and the incremental cost added by precopying applications is offset by a burst time reduction of nearly 95%.

Paper Slides Videos