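On April 25, 2015, Dr. Feng Qin (秦锋) visited the Cloud Computing Research Center of the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, by invitation for an academic exchange, and gave a lecture titled "Is Your Storage System Reliable?" The talk was hosted by Research Professor Chengzhong Xu (须成忠).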

Abstract

Modern storage technologies (SSDs, NoSQL databases, commoditized RAID hardware, etc.) bring new reliability challenges to the already complicated storage stack. At the higher layer of the stack, databases provide the strongest reliability guarantees, including the atomicity, consistency, isolation, and durability (ACID) properties. However, the ACID properties are far from trivial to provide, particularly when high performance must be achieved. At the lower layer of the stack, new components such as solid-state drives (SSDs) are often ignored or under-studied under adverse conditions.


In this talk, I will mainly present our recent work on exposing and diagnosing violations of the ACID properties provided by databases and on studying the behavior of SSDs, in an ostensibly easy context: power faults. More specifically, our framework for torturing databases includes workloads to exercise the ACID guarantees, a record/replay subsystem to allow the controlled injection of simulated power faults, a ranking algorithm to prioritize where to fault based on our experience, and a multi-layer tracer to diagnose root causes. Additionally, our framework for testing SSDs includes specially designed hardware to inject power faults directly into devices, workloads to stress storage components, and techniques to detect various types of failures. After applying our frameworks to 8 widely used databases and 15 SSDs, respectively, the results were surprising.


Bio

Feng Qin received his Ph.D. degree from the University of Illinois at Urbana-Champaign. He joined the Department of Computer Science and Engineering at The Ohio State University as an Assistant Professor in 2006 and was promoted to Associate Professor with tenure in 2013. His research interests include software reliability, operating systems, high-performance computing, and security. He is particularly interested in developing system mechanisms to improve software availability and reliability at different software development stages. He has published papers in top systems conferences over the past decade. One of his papers received a best paper award at SOSP'05. Two of his papers were selected as IEEE Micro Top Picks, in 2004 and 2007, respectively. Three of his papers were nominated for best paper at HPCA'05, SC'07, and SC'10, respectively. He received the NSF CAREER Award in 2010 and the OSU Lumley Research Award in 2015.