The research in my group strives to make the networked systems that govern our world sustainable and resilient. We develop new mathematical tools in machine learning, optimization, control, and economics and apply these tools to design new algorithms and markets with provable guarantees that can be deployed in data centers, the electricity grid, transportation systems, and beyond. Some of the areas we have focused on in the past are listed below.
Reinforcement Learning (RL) for Networked Systems
The success of RL in recent years has led to excitement about the potential for applications to networked systems such as the power grid. However, in multi-agent settings RL suffers from seemingly unavoidable scalability problems. However, there is the potential for properties of the network structure in systems like the power grid to help. So, the question becomes: Can the network structure be exploited to make multi-agent RL tractable? Our work on this topic develops new algorithms that exploit “exponential decay” properties of networked systems to achieve provable guarantees for multi-agent RL. Find out more here, here, and here.
Online Optimization & Energy
Online optimization is a powerful framework in machine learning that has seen numerous applications to problems in energy and sustainability. We began by applying online optimization to ‘right-size’ capacity in data centers nearly a decade ago; and by now tools from online optimization have been applied to develop algorithms for geographical load balancing among data centers, demand response, generation planning, energy storage management, and beyond. Over a decade, we have moved from designing algorithms for one-dimensional problems with restrictive assumptions on costs to general results for high-dimensional, non-convex problems that highlight the role of constraints, predictions, delay, multi-timescale control, and more. Further, we have uncovered a deep connection between online optimization and the control of dynamical systems that we are only beginning to understand. Find out more here, here, and here. You can also check out the CarbonFirst project page.
Learning & Control
The contrast between the grounded, model-based approach of traditional control theory and the free-wheeling model-free approach common in ML/AI is extreme. It is clear that the model-free viewpoint in ML/AI can be valuable for control, but it must also respect the physical and safety constraints on the systems. So, the question becomes: How can model-free tools in ML/AI be combined with the model-based tools in control theory? Our work looks at this in a variety of contexts, from seeking connections between online optimization and online control (see here) to understanding how reinforcement learning can be used to refine model-based control (see here). Some examples of our papers on these topics are here, here, and here.
Algorithms for Sustainable IT
Everyone has heard the statistics about how much of an energy hog IT has become: The emissions of a server are nearly that of a car! The electricity usage of data centers is growing 12 times faster than that of the US as a whole! While the last decade has led to significant improvements in energy-efficiency across IT, there is still a long ways to go to be truly sustainable. The goal of the work in our group is to design the algorithms needed for 100% renewable-driven, carbon free data centers. You can find out more by watching a few talks I have given on the topic here and here, and by reading a profile of the work from the ENG magazine at Caltech. You can also check out the CarbonFirst project page
Mitigating Cascading Failures in the Power Grid
The economic impacts of cascading failures is enormous, and the frequency of failure events is likely to increase as the penetration of intermittent renewable generation grows. However, cascading failures propagate non-locally, making their prevention, detection, and mitigation difficult. In fact, very little is known about how to analyze their propagation mathematically. Our work uses a novel spectral view of the power system to develop a mathematical theory of line failure localization and propagation. Using this theory, we are working to develop and evaluate techniques that mitigate the spread of cascading failures. An introduction to our approach can be found here (Part I, Part II, Part III).
Optimizing Large-Scale Heterogeneous ML Cloud Platforms
Democratizing large-scale ML requires more than just better algorithms, it requires cloud platforms that make building and training models accessible. But, optimizing these systems requires rethinking scheduling, resource allocation, data management, power usage, and more. Our work on this topic focuses on topics such as mitigating the impact of stragglers, optimization the scheduling of the DAG workflows that underlie systems like TensorFlow, and distributed load balancing policies. Our research has been deployed in AzureML and Google Cloud platforms. You can see examples of our papers here, here, and here.
Markets for the Smart Grid
Over the coming decade, the electricity network will undergo a complete architectural transformation, similar to what has happened to the communication network over the last decades. However, there are huge engineering and economic challenges in making this transformation possible. In fact, unlike in the case of communication networks, the economic market structure and engineering architecture are inherently intertwined in the electricity grid, which necessitates a new architectural theory for guiding this transformation. To find out more, take a look at some papers here, here, and here; as well as some presentations here and here.
Competition in School Choice
In many public school districts lotteries are crucial tools for giving families choice to find programs that best fit their child. But, the design of these lotteries is crucial and could lead to integration or exacerbate self-segregation. Further, as public schools face increasing competition from private and charter schools, how the design of the open enrollment lottery is is essential for allowing public schools to compete. We have partnered with the Pasadena schools to rethink open enrollment lotteries with these issues in mind. Read about our work here and watch news coverage here.
Platforms and Networked Markets
These days it is almost impossible to study networks without considering economic incentives. From net neutrality to the sharing economy, understanding the economic incentives in networked systems is crucial. However, our understanding of the interaction of economics and networks is still in its infancy. Our work focuses on understanding how to design market platforms for cloud computing, ride-sharing, renewable energy, and beyond. To find out more, you can take a look at a few talks on the topic related to networked Cournot competition (part II is here), cloud computing markets, and electricity markets
Demystifying Heavy Tails
Heavy-tails are a continual source of excitement and confusion across disciplines as they are repeatedly “discovered” in new contexts. This is especially true within computer systems, where heavy-tails seemingly pop up everywhere — from degree distributions in the internet and social networks to file sizes and inter-arrival times of workloads. However, despite decades of work on heavy-tails they are still treated as mysterious, surprising, and even controversial. We are in the process of writing a new book on the topic that will (hopefully) highlight that heavy-tailed distributions need not be mysterious and should not be surprising or controversial. For details about our research on this topic you can check out my papers in this area here, here, and here; as well as slides from a tutorial I gave on the topic.
“Empirical” Algorithmic Game Theory
One of the fundamental tasks for research at the intersection of CS and Economics is to understand how to incorporate “computation” into classic economic theories. The task of understanding the computational complexity of economics models has been a cornerstone of the field and an understanding has emerged that many standard economic models are “hard” in the worst case. But, do these worst case examples appear in real-world settings. That is, can data reveal hard instances of economic models? Our work strives to quantify what properties of data sets necessitate hardness in the inferred economic models. For details, check out our papers here, here, and here.
Speed scaling: An Algorithmic Perspective
Speed scaling has long been used as a power-saving mechanism at a chip level. However, nowadays, speed scaling is used as an approach for trading off energy usage and performance throughout all levels of computer systems. This wide-spread use of speed scaling has motivated many to revisit foundational questions about the design of speed scaling policies, including questions like: What is the structure of the optimal speed scaling algorithm? How does speed scaling interact with scheduling? What is the impact of the sophistication of speed scaling algorithms? Does speed scaling have any unintended consequences? Our work has delved into these questions both in the domain of worst-case and stochastic scheduling analysis. For an overview, see the survey here and some of our papers here and here.
Fairness & Scheduling
Modern designs often improve user response times by giving priority to small job sizes. But, this leads to worries about whether large job sizes receive fair performance. So, we need to ask: How much starvation/unfairness is caused to large jobs by prioritizing small jobs? Answering this question was a theme in my PhD thesis.
Funding for the publications listed above has come in large part from the NSF through CNS-2146814, CPS-2136197, CNS-2106403, NGSDI-2105648, AitF-1637598, CNS-1518941, CPS-154471, CNS-1319820, EPAS-1307794, CCF-1101470, CNS-0846025, CCF-0830511, and a Career award. Additional funding has come from ARPA-E through the GENI and NODES programs. Other external sponsors have included Amazon AWS, Beyond Limits, PIMCO, Citadel, Google, HP, IBM, Microsoft, MIT Lincoln Laboratory, the Okawa Foundation, Pasadena Water & Power, the Simons Institute, Southern California Edison, VMWare, Yahoo, and Facebook. Internal Caltech funding has come from IST, CAST, the Resnick Institute, the Linde Institute, the Carver Mead New Adventures Fund, the Moore Foundation, and the Lee Center.