Year of Publication


Degree Name

Master of Science (MS)

Document Type

Master's Thesis




Computer Science

First Advisor

Dr. Tingting Yu

Second Advisor

Dr. Jane Hayes


Concurrency faults are difficult to reproduce and localize because they usually occur under specific inputs and thread interleavings. Most existing fault localization techniques focus on sequential programs but fail to identify faulty memory access patterns across threads, which are usually the root causes of concurrency faults. Moreover, existing techniques for sequential programs cannot be adapted to identify faulty paths in concurrent programs. While concurrency fault localization techniques have been proposed to analyze passing and failing executions obtained from running a set of test cases to identify faulty access patterns, they primarily focus on using statistical analysis. We present a novel approach to fault localization using feature selection techniques from machine learning. Our insight is that the concurrency access patterns obtained from a large volume of coverage data generally constitute high dimensional data sets, yet existing statistical analysis techniques for fault localization are usually applied to low dimensional data sets. Each additional failing or passing run can provide more diverse information, which can help localize faulty concurrency access patterns in code. The patterns with maximum feature diversity information can point to the most suspicious pattern. We then apply data mining technique and identify the interleaving patterns that are occurred most frequently and provide the possible faulty paths. We also evaluate the effectiveness of fault localization using test suites generated from different test adequacy criteria. We have evaluated Cadeco on 10 real-world multi-threaded Java applications. Results indicate that Cadeco outperforms state-of-the-art approaches for localizing concurrency faults.

Digital Object Identifier (DOI)