Year of Publication

2004

Document Type

Dissertation

College

Engineering

Department

Computer Science

First Advisor

Alexander Dekhtyar

Second Advisor

Judy Goldsmith

Abstract

Probabilistic reasoning in databases has been an active area of research during the last two decades. However, the previously proposed database approaches, including the probabilistic relational approach and the probabilistic object approach, are not good fits for storing and managing diverse probability distributions along with their auxiliary information.

The work in this dissertation extends significantly the initial semistructured probabilistic database framework proposed by Dekhtyar, Goldsmith and Hawkes in [20]. We extend the formal Semistructured Probabilistic Object (SPO) data model of [20]. Accordingly, we also extend the Semistructured Probabilistic Algebra (SP-algebra), the query algebra proposed for the SPO model.

Based on the extended framework, we have designed and implemented a Semistructured Probabilistic Database Management System (SPDBMS) on top of a relational DBMS. The SPDBMS is flexible enough to meet the need of storing and manipulating diverse probability distributions along with their associated information. Its query language supports standard database queries as well as queries specific to probabilities, such as conditionalization and marginalization. Currently the SPDBMS serves as a storage backbone for the project Decision Making and Planning under Uncertainty with Constraints¹, that involves managing large quantities of probabilistic information. We also report our experimental results evaluating the performance of the SPDBMS.

We describe an extension of the SPO model for handling interval probability distributions. The Extended Semistructured Probabilistic Object (ESPO) framework improves the flexibility of the original semistructured data model in two important features: (i) support for interval probabilities and (ii) association of context and conditionals with individual random variables. An extended SPO (ESPO) data model has been developed, and an extended query algebra for ESPO has also been introduced to manipulate probability distributions for probability intervals.

The Bayesian Network Development Suite (BaNDeS), a system which builds Bayesian networks with full data management support of the SPDBMS, has been described. It allows experts with particular expertise to work only on specific subsystems during the Bayesian network construction process independently and asynchronously while updating the model in real-time.

There are three major foci of our ongoing and future work: (1) implementation of a query optimizer and performance evaluation of query optimization, (2) extension of the SPDBMS to handle interval probability distributions, and (3) incorporation of machine learning techniques into the BaNDeS.

¹ This project is partially supported by the National Science Foundation under Grant No. ITR-0325063.
