Author ORCID Identifier

https://orcid.org/0000-0003-4541-6918

Year of Publication

2018

Degree Name

Doctor of Philosophy (PhD)

Document Type

Doctoral Dissertation

College

Engineering

Department

Computer Science

First Advisor

Dr. Chen Qian

Second Advisor

Dr. Jinze Liu

Abstract

Systems that process big data (e.g., high-traffic networks and large-scale storage) prefer data structures and algorithms with small memory and fast processing speed. Efficient and fast algorithms play an essential role in system design, despite the improvement of hardware. This dissertation is organized around a novel algorithm called Othello Hashing. Othello Hashing supports ultra-fast and memory-efficient key-value lookup, and it fits the requirements of the core algorithms of many large-scale systems and big data applications. Using Othello hashing, combined with domain expertise in cloud, computer networks, big data, and bioinformatics, I developed the following applications that resolve several major challenges in the area.

Concise: Forwarding Information Base. A Forwarding Information Base is a data structure used by the data plane of a forwarding device to determine the proper forwarding actions for packets. The polymorphic property of Othello Hashing the separation of its query and control functionalities, which is a perfect match to the programmable networks such as Software Defined Networks. Using Othello Hashing, we built a fast and scalable FIB named \textit{Concise}. Extensive evaluation results on three different platforms show that Concise outperforms other FIB designs.

SDLB: Cloud Load Balancer. In a cloud network, the layer-4 load balancer servers is a device that acts as a reverse proxy and distributes network or application traffic across a number of servers. We built a software load balancer with Othello Hashing techniques named SDLB. SDLB is able to accomplish two functionalities of the SDLB using one Othello query: to find the designated server for packets of ongoing sessions and to distribute new or session-free packets.

MetaOthello: Taxonomic Classification of Metagenomic Sequences. Metagenomic read classification is a critical step in the identification and quantification of microbial species sampled by high-throughput sequencing. Due to the growing popularity of metagenomic data in both basic science and clinical applications, as well as the increasing volume of data being generated, efficient and accurate algorithms are in high demand. We built a system to support efficient classification of taxonomic sequences using its k-mer signatures.

SeqOthello: RNA-seq Sequence Search Engine. Advances in the study of functional genomics produced a vast supply of RNA-seq datasets. However, how to quickly query and extract information from sequencing resources remains a challenging problem and has been the bottleneck for the broader dissemination of sequencing efforts. The challenge resides in both the sheer volume of the data and its nature of unstructured representation. Using the Othello Hashing techniques, we built the SeqOthello sequence search engine. SeqOthello is a reference-free, alignment-free, and parameter-free sequence search system that supports arbitrary sequence query against large collections of RNA-seq experiments, which enables large-scale integrative studies using sequence-level data.

Digital Object Identifier (DOI)

https://doi.org/10.13023/ETD.2018.222

Share

COinS