Distributed Data Storage using NC

Introduction

Modern storage systems store massive amounts of data across a set of storage nodes. To achieve reliability, redundancy is added to the stored data, and to maintain the required level of redundancy the storage system must support data repair. A critical issue is how to make the repair process timely, so as to minimize the chance of data loss if more nodes fail. Recent work has shown that regenerating codes, which are based on the concept of network coding, can improve data repair performance when some storage nodes fail, compared with traditional storage schemes such as erasure coding. However, there remain open issues regarding the feasibility of deploying regenerating codes in practice.

This project, led by Prof. Patrick Lee (http://www.cse.cuhk.edu.hk/~pclee), focuses on studying the practicality of network coding data storage. The main objectives of this project are:

- To realize network coding data storage in a practical implementation.

- To conduct extensive experimental studies and evaluate the performance in a real storage environment.

- To provide insights into deploying network coding data storage in practice.

Building on this success, INC held the First Workshop on Network Coding and Data Storage at CUHK (NCDS 2011) on July 21-22, 2011. The purpose of the workshop was to bring together researchers in information theory, data storage, and distributed systems to explore the potential of NC applications in distributed storage systems.

 

Projects

1. NCCloud

NCCloud is a proof-of-concept prototype of a network-coding-based file system that aims to provide fault tolerance and reduce data repair cost when files are stored across multiple cloud storage services (or any other kind of raw storage device). NCCloud is a proxy-based file system that interconnects multiple (cloud) storage nodes. It can be mounted as a directory on Linux, and files are uploaded and downloaded by copying them to and from the mounted directory. NCCloud is built on FUSE, an open-source, programmable user-space file system that provides application programming interfaces (APIs) for file system operations. From the point of view of user applications, NCCloud presents a file system layer that transparently stripes data across storage nodes.
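To make the notion of striping concrete, the following is a minimal sketch of plain round-robin striping and reassembly. It is a generic illustration with no redundancy or coding, not NCCloud's actual layout; the function names and the chunk size are assumptions made for this example.

```python
def stripe(data, num_nodes, chunk_size):
    """Split data into fixed-size chunks and assign them round-robin to nodes."""
    nodes = [[] for _ in range(num_nodes)]
    for offset in range(0, len(data), chunk_size):
        index = offset // chunk_size
        nodes[index % num_nodes].append(data[offset:offset + chunk_size])
    return nodes


def unstripe(nodes):
    """Reassemble the data by reading chunks back in round-robin order."""
    total = sum(len(chunks) for chunks in nodes)
    out = bytearray()
    for index in range(total):
        # Chunk number `index` lives on node (index mod n) at position (index div n).
        out += nodes[index % len(nodes)][index // len(nodes)]
    return bytes(out)


layout = stripe(b"abcdefghij", num_nodes=3, chunk_size=2)
restored = unstripe(layout)
```

A user application sees only the original byte string; which node holds which chunk is hidden behind the file system layer.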

Network codes for storage repair require that storage nodes encode the stored data during the repair process. However, this may not be feasible in storage systems where nodes provide only basic I/O functionality and have no encoding capability. Our work brings the benefits of network codes to storage repair in a practical storage setting by relaxing the encoding requirement on storage nodes.

NCCloud supports a variety of coding schemes, in particular the Functional Minimum Storage Regenerating (FMSR) codes. Compared to traditional optimal erasure codes (e.g., Reed-Solomon), FMSR codes maintain the same storage overhead under the same data redundancy level, but use less repair traffic during the recovery of a single failed storage node. NCCloud realizes regenerating codes in a practical cloud storage system that does not require any encoding/decoding intelligence on the cloud storage nodes.
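As a worked illustration of the repair-traffic saving, the sketch below compares conventional repair of an MDS code such as Reed-Solomon (download k chunks of size M/k, i.e. the whole file of size M) with the minimum-storage regenerating point, where a repair contacts d = n - 1 surviving nodes and downloads M·d / (k(d - k + 1)) in total. This formula is the standard MSR repair bandwidth from the regenerating-codes literature and is assumed here rather than taken from this page; the function names and the (n, k) pairs are hypothetical.

```python
from fractions import Fraction


def storage_per_node(file_size, k):
    """An (n, k) MDS code stores file_size / k on each node."""
    return Fraction(file_size, k)


def conventional_repair_traffic(file_size, k):
    """Conventional MDS repair reads k chunks of size file_size / k: the whole file."""
    return Fraction(file_size)


def msr_repair_traffic(file_size, n, k):
    """Repair traffic at the MSR point with d = n - 1 helpers (assumed formula)."""
    d = n - 1
    return Fraction(file_size * d, k * (d - k + 1))


M = 1  # file size, normalized to 1
for n, k in [(4, 2), (6, 4), (12, 10)]:
    rs = conventional_repair_traffic(M, k)
    msr = msr_repair_traffic(M, n, k)
    saving = 1 - msr / rs
    print(f"n={n}, k={k}: per-node storage {storage_per_node(M, k)}, "
          f"conventional repair {rs}, MSR repair {msr}, saving {float(saving):.1%}")
```

Under these assumptions the saving is 25% for (n, k) = (4, 2), 37.5% for (6, 4), and 45% for (12, 10), while the per-node storage M/k is the same for both schemes.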

Publications

Yuchong Hu, Patrick P. C. Lee, and Kenneth W. Shum
"Analysis and Construction of Functional Regenerating Codes with Uncoded Repair for Distributed Storage Systems"
CoRR abs/1208.2787, August 2012.

Yuchong Hu, Henry C. H. Chen, Patrick P. C. Lee, and Yang Tang
"NCCloud: Applying Network Coding for the Storage Repair in a Cloud-of-Clouds"
Proceedings of the 10th USENIX Conference on File and Storage Technologies (FAST '12), San Jose, CA, February 2012.
pdf | talk | poster | source code

Source
http://ansrlab.cse.cuhk.edu.hk/software/nccloud/

 

2. FMSR-DIP

FMSR-DIP is a proof-of-concept prototype aimed at providing data integrity protection atop today's cloud storage. Regenerating codes are a recently proposed class of erasure codes that require less data to be downloaded when repairing node failures, compared to conventional codes such as Reed-Solomon codes. Functional minimum storage regenerating (FMSR) codes are a type of regenerating code that is also maximum distance separable. FMSR-DIP augments FMSR codes with a data checking capability that allows stored data to be sampled for checking in a flexible manner, without adding to the download traffic required during file downloads or repairs. In short, our work adds an efficient data integrity checking capability to FMSR codes to provide a more comprehensive data protection solution, without giving up the advantages of FMSR codes.
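The idea of sampling stored data for checking can be sketched with a generic spot-checking scheme: the client keeps a keyed MAC for every fixed-size block of a chunk and later verifies a random sample of blocks. This is a simplified illustration only, not the FMSR-DIP construction itself; the block size, key handling, and function names are assumptions made for this example.

```python
import hashlib
import hmac
import random

BLOCK_SIZE = 4  # bytes; deliberately tiny for the demo (assumed value)


def split_blocks(chunk, block_size=BLOCK_SIZE):
    """Cut a chunk into fixed-size blocks (the last one may be shorter)."""
    return [chunk[i:i + block_size] for i in range(0, len(chunk), block_size)]


def block_tag(key, index, block):
    """MAC over (position, content) so blocks cannot be swapped undetected."""
    message = index.to_bytes(8, "big") + block
    return hmac.new(key, message, hashlib.sha256).digest()


def make_tags(key, chunk):
    """Compute one tag per block before the chunk is uploaded."""
    return [block_tag(key, i, b) for i, b in enumerate(split_blocks(chunk))]


def spot_check(key, stored_chunk, tags, sample_size, rng):
    """Verify a random sample of blocks against the locally kept tags."""
    blocks = split_blocks(stored_chunk)
    if len(blocks) != len(tags):
        return False
    indices = rng.sample(range(len(blocks)), min(sample_size, len(blocks)))
    return all(
        hmac.compare_digest(block_tag(key, i, blocks[i]), tags[i])
        for i in indices
    )


key = b"demo-key"                      # hypothetical secret kept by the client
chunk = bytes(range(32))               # stand-in for a stored code chunk
tags = make_tags(key, chunk)

corrupted = bytearray(chunk)
corrupted[5] ^= 0xFF                   # flip bits in one byte of block 1

intact_ok = spot_check(key, chunk, tags, 3, random.Random(0))
corrupt_ok = spot_check(key, bytes(corrupted), tags, len(tags), random.Random(0))
```

For simplicity the whole stored chunk is passed to `spot_check`; a remote check would fetch only the sampled blocks. Sampling fewer blocks lowers the checking cost but also lowers the chance of catching a small corruption, which is the trade-off that flexible sampling exposes.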

Publications

Henry C. H. Chen and Patrick P. C. Lee
"Enabling Data Integrity Protection in Regenerating-Coding-Based Cloud Storage"
Proceedings of the 31st International Symposium on Reliable Distributed Systems (SRDS 2012), Irvine, CA, October 2012. 
Tech Report pdf | pdf | talk | source code

Source
http://ansrlab.cse.cuhk.edu.hk/software/fmsrdip/

 

3. NCFS

NCFS is a proof-of-concept prototype of a Network-Coding-based Distributed File System. NCFS is a proxy-based file system that interconnects multiple storage nodes. It relays regular read/write operations between user applications and storage nodes, and relays data among storage nodes during the data repair process. NCFS is built on FUSE, an open-source, programmable user-space file system that provides application programming interfaces (APIs) for file system operations. From the point of view of user applications, NCFS presents a file system layer that transparently stripes data across physical storage nodes.

NCFS supports a specific regenerating coding scheme called Exact Minimum Bandwidth Regenerating (E-MBR) codes [Rashmi et al.; 2009], which seek to minimize repair bandwidth. One key property of E-MBR is that it does not require any encoding/decoding intelligence on the storage nodes, as long as the storage nodes provide the standard I/O interfaces. NCFS also supports RAID-based erasure coding schemes, which enables a comprehensive empirical study of different classes of data recovery schemes for distributed storage under real network settings. To the best of our knowledge, NCFS is the first work that realizes regenerating codes in a practical distributed storage system.
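Why E-MBR needs no coding on the storage nodes can be seen from a simplified sketch of the repair-by-transfer layout described by Rashmi et al.: there is one coded chunk per pair of nodes, stored on exactly those two nodes, so a failed node is rebuilt by copying one chunk from each surviving node. The sketch treats chunks as opaque labels and omits the precoding step that produces them; the function names are hypothetical and this is not the NCFS implementation.

```python
from itertools import combinations


def place_chunks(n):
    """Create one chunk per unordered node pair; each node ends up with n - 1 chunks."""
    nodes = {i: {} for i in range(n)}
    for chunk_id, (a, b) in enumerate(combinations(range(n), 2)):
        data = f"chunk-{chunk_id}"
        nodes[a][b] = data  # node a's copy, keyed by the partner node
        nodes[b][a] = data  # node b's copy of the same chunk
    return nodes


def repair(nodes, failed):
    """Rebuild the failed node using plain reads: one chunk from each survivor."""
    rebuilt = {}
    for helper, store in nodes.items():
        if helper != failed:
            rebuilt[helper] = store[failed]  # the chunk shared with the failed node
    return rebuilt


nodes = place_chunks(4)
lost = nodes[2]          # remember what node 2 held
nodes[2] = {}            # simulate the failure
nodes[2] = repair(nodes, failed=2)
assert nodes[2] == lost  # every lost chunk was recovered without any encoding
```

In this layout the repair reads n - 1 chunks, exactly the amount the failed node stored, and each surviving node only serves a read request.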

Publications

Yuchong Hu, Chiu-Man Yu, Yan Kit Li, Patrick P. C. Lee, and John C. S. Lui
"NCFS: On the Practicality and Extensibility of a Network-Coding-Based Distributed File System"
Proceedings of the 2011 International Symposium on Network Coding (NETCOD), Beijing, China, July 2011. 
pdf | talk | source code

Source
http://ansrlab.cse.cuhk.edu.hk/software/ncfs/

 

Ongoing Work

NCVFS

With the proliferation of video data, and hence growing demand for data storage in general, it is critical to provide a fault-tolerant and secure cloud storage solution for storing massive amounts of video files. We propose the network-coding-based video file system (NCVFS), a distributed storage system that leverages the concept of network coding to provide scalable, fault-tolerant data storage. NCVFS not only achieves the same level of reliability as traditional RAID-like systems, but also minimizes the repair bandwidth during data recovery. The latter implies a significant improvement in overall system reliability. Aiming to support Internet streaming of video content, NCVFS is developed as a platform with the broadcasting industry in mind. The underlying technologies are applicable to other (non-video) file types. Thus, we expect that NCVFS can also find applications in other areas and industries, and hence fuel the data storage industry as a whole.

 

Grants

The following grants were obtained based on this project.

"Distributed Archival Storage Systems using Non-Systematic Codes: Theory and Practice." RGC Early Career Scheme (ECS), 01/2013 - 12/2015. Amount: HKD 550,000.

"Network Coding Distributed Storage of Video Files." Innovation and Technology Fund (ITF), 07/2012 - 12/2013. Amount: HKD 994,750.