Author ORCID Identifier

https://orcid.org/0000-0003-1880-5977

Date Available

4-2-2020

Year of Publication

2019

Document Type

Master's Thesis

Degree Name

Master of Science (MS)

College

Engineering

Department/School/Program

Computer Science

Advisor

Dr. Jinze Liu

Abstract

A rapidly growing volume of electrophysiological signals is being generated for clinical research on neurological disorders. European Data Format (EDF) is a standard format for storing such signals. However, existing signal analysis tools handle large-scale datasets poorly because they load large EDF files sequentially before performing any analysis. To overcome this bottleneck, we develop Hadoop-EDF, a distributed signal processing tool that loads EDF data in parallel using Hadoop MapReduce. Hadoop-EDF uses a robust data partition algorithm that makes EDF data processable in parallel. We evaluate the scalability and performance of Hadoop-EDF using two datasets from the National Sleep Research Resource and experiments run on Amazon Web Services clusters. On a 20-node cluster, Hadoop-EDF achieves 27-fold and 47-fold performance improvements over sequential processing of 200 small files and 200 large files, respectively. The results demonstrate that Hadoop-EDF is a suitable and effective tool for processing large EDF files.
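
The abstract does not describe the partition algorithm itself, so the following is only a minimal sketch of one way an EDF file could be cut into independently readable pieces, not the method used in the thesis. It assumes the standard EDF layout: a 256-byte fixed header, 256 header bytes per signal, and fixed-size data records of 2-byte samples. The function names `parse_edf_layout` and `record_aligned_splits` are hypothetical.

```python
def parse_edf_layout(header: bytes):
    """Read the fields needed for partitioning from a raw EDF header.

    Assumes the standard EDF layout; files that store -1 as the
    number of data records (unknown length) are not handled here.
    """
    def field(start, end):
        return int(header[start:end].decode("ascii").strip())

    header_bytes = field(184, 192)   # total header size in bytes
    num_records = field(236, 244)    # number of data records
    num_signals = field(252, 256)    # number of signals (ns)

    # Per-signal fields are stored field by field. The "samples per
    # data record" field follows 216 bytes of other fields per signal.
    start = 256 + num_signals * 216
    samples = [field(start + 8 * i, start + 8 * i + 8)
               for i in range(num_signals)]

    record_bytes = 2 * sum(samples)  # each sample is a 2-byte integer
    return header_bytes, record_bytes, num_records


def record_aligned_splits(header_bytes, record_bytes, num_records, num_splits):
    """Return (byte_offset, byte_length) pairs covering whole data records."""
    num_splits = max(1, min(num_splits, num_records))
    base, extra = divmod(num_records, num_splits)
    splits, first_record = [], 0
    for i in range(num_splits):
        count = base + (1 if i < extra else 0)
        splits.append((header_bytes + first_record * record_bytes,
                       count * record_bytes))
        first_record += count
    return splits
```

Because every split starts and ends on a data record boundary, a worker that knows the header can decode its byte range without reading the rest of the file, which is the property a parallel loader needs.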

Digital Object Identifier (DOI)

https://doi.org/10.13023/etd.2019.386
