Date Available
4-2-2020
Year of Publication
2019
Degree Name
Master of Science (MS)
Document Type
Master's Thesis
College
Engineering
Department/School/Program
Computer Science
First Advisor
Dr. Jinze Liu
Abstract
A rapidly growing volume of electrophysiological signals is being generated for clinical research on neurological disorders. European Data Format (EDF) is a standard format for storing such signals. However, existing signal analysis tools load large EDF files sequentially before performing any analysis, which becomes a bottleneck when handling large-scale datasets. To overcome this, we develop Hadoop-EDF, a distributed signal processing tool that loads EDF data in parallel using Hadoop MapReduce. Hadoop-EDF uses a robust data partition algorithm that makes EDF data processable in parallel. We evaluate Hadoop-EDF’s scalability and performance using two datasets from the National Sleep Research Resource and running experiments on Amazon Web Services clusters. On a 20-node cluster, Hadoop-EDF achieves 27-fold and 47-fold performance improvements over sequential processing of 200 small files and 200 large files, respectively. The results demonstrate that Hadoop-EDF is more suitable and effective for processing large EDF files.
Digital Object Identifier (DOI)
https://doi.org/10.13023/etd.2019.386
Recommended Citation
Wu, Yuanyuan, "HADOOP-EDF: LARGE-SCALE DISTRIBUTED PROCESSING OF ELECTROPHYSIOLOGICAL SIGNAL DATA IN HADOOP MAPREDUCE" (2019). Theses and Dissertations--Computer Science. 88.
https://uknowledge.uky.edu/cs_etds/88