MayBMS

MayBMS - A Probabilistic Database Management System

Download MayBMS
Latest version: 2.1-beta

Quick Links

What is MayBMS?

The MayBMS system (note: MayBMS is read as “maybe-MS”, like DBMS) is a complete probabilistic database management system that leverages robust relational database technology: MayBMS is an extension of the Postgres server backend. MayBMS is open source and the source code is available under the BSD license.

MayBMS stands alone as a complete probabilistic database management system that supports a powerful, compositional query language for which nevertheless worst-case efficiency and result quality guarantees can be made. The MayBMS backend is accessible through several APIs, with efficient internal operators for computing and managing probabilistic data.

In summary, MayBMS has the following features:

  • Full support of all features of PostgreSQL 8.3.3, including unrestricted query functionality, query optimization, APIs, updates, concurrency control and recovery, etc.
  • Essentially no performance loss on PostgreSQL 8.3.3 functionality: After parsing a query or DML statement, a fast syntactic check is made to decide whether the statement uses the extended functionality of MayBMS. If it does not, the subsequently executed code is exactly that of PostgreSQL 8.3.3.
  • Support for efficiently creating and updating probabilistic databases, i.e., uncertain databases in which degrees of belief can be associated with uncertain data.
  • A powerful query and update language for processing uncertain data that gracefully extends SQL with a small number of well-designed language constructs.
  • State-of-the-art efficient techniques for exact and approximate probabilistic inference.

Applications

Database systems for uncertain and probabilistic data promise to have many applications. Query processing on uncertain data occurs in the contexts of data warehousing, data integration, and of processing data extracted from the Web. Data cleaning can be fruitfully approached as a problem of reducing uncertainty in data and requires the management and processing of large amounts of uncertain data. Decision support and diagnosis systems employ hypothetical (what-if) queries. Scientific databases, which store outcomes of scientific experiments, frequently contain uncertain data such as incomplete observations or imprecise measurements. Sensor and RFID data is inherently uncertain. Applications in the contexts of fighting crime or terrorism, tracking moving objects, surveillance, and plagiarism detection essentially rely on techniques for processing and managing large uncertain datasets. Beyond that, many further potential applications of probabilistic databases exist and will manifest themselves once such systems become available.

The MayBMS distribution comes with a number of examples that illustrate its use in these application domains. Some of these examples are described in the tutorial chapter of our manual.

For Researchers

Cite: L. Antova, T. Jansen, C. Koch, and D. Olteanu. "Fast and Simple Relational Processing of Uncertain Data". Proc. 24th International Conference on Data Engineering, ICDE 2008, April 7-12, 2008, Cancun, Mexico, pp. 983-992.

[Read as "maybe-em-es"; rhymes with DBMS]

Overview

MayBMS is a state-of-the-art probabilistic database management system developed as an extension of the Postgres server backend.

The MayBMS project is founded on the thesis that a principled effort to use and extend mature relational database technology will be essential for creating robust and scalable systems for managing and querying large uncertain datasets.

MayBMS stands alone as a complete probabilistic database management system that supports a very powerful, compositional query language (examples) for which nevertheless worst-case efficiency and result quality guarantees can be made. Central to this is our choice of essentially using probabilistic versions of conditional tables as the representation system, but in a form engineered for admitting the efficient evaluation and automatic optimization of most operations of our language using robust and mature relational database technology.

Central themes in our research include the creation of foundations of query languages for probabilistic databases by developing analogs of relational algebra and SQL and the development of efficient query processing techniques. In practice, the efficient evaluation of queries on probabilistic data requires approximation techniques, and another important goal is to understand which approximation guarantees can be made for complex,realistic query languages. Apart from data representation and storage mechanisms, a query language, and query processing techniques, our work covers query optimization, an update language, concurrency control and recovery, and APIs for uncertain data.

Overview Papers and Slides

  • C. Koch. "MayBMS: A System for Managing Large Uncertain and Probabilistic Databases" [pdf]. Chapter 6 of Charu Aggarwal, ed., Managing and Mining Uncertain Data, Springer-Verlag, 2008/9.
  • Slides on MayBMS2.
  • Slides on MayBMS1.

People

Alumni: Viktor Ivanov, Thomas Jansen, Ali Baran Sari, Yin Lou, Anton Morozov, Lucja Kot.

Overview Papers

  • MayBMS: A System for Managing Large Uncertain and Probabilistic Databases [pdf]
    C. Koch. Chapter 6 of Charu Aggarwal, ed., Managing and Mining Uncertain Data, Springer-Verlag, 2009.
    • This is currently the best document giving an overview of the project.

New: Pip (MayBMS with continuous distributions)

  • PIP: A Database System for Great and Small Expectations
    Oliver Kennedy, Christoph Koch.
    Proc. ICDE 2010. Long paper. (pdf)

Papers on the MayBMS2 System

  • Approximate Confidence Computation in Probabilistic Databases
    Dan Olteanu, Jiewen Huang, Christoph Koch.
    Proc. ICDE 2010. Long paper. (pdf)

  • MayBMS: A Probabilistic Database Management System [pdf]
    Jiewen Huang, Lyublena Antova, Christoph Koch, Dan Olteanu. Proc. SIGMOD 2009. Demo paper.

  • A Compositional Framework for Complex Queries over Uncertain Data [pdf]
    M. Goetz and C. Koch. Proc. ICDT 2009.

  • SPROUT: Lazy vs. Eager Query Plans for Tuple-Independent Probabilistic Databases [pdf]
    D. Olteanu, J. Huang, C. Koch. Proc. ICDE, 2009. Long paper.
    • This paper shows how to find query plans that are more efficient than safe plans for hierarchical queries on tuple-independent databases. The paper also introduces a special operator for efficiently processing such plans.

  • Conditioning Probabilistic Databases [arXiv:0803.2212]
    C. Koch and D. Olteanu. Proc. VLDB 2008.
    • This paper is the first to consider the problem of conditioning a probabilistic database outside of the context of graphical models. The core contribution is an exact confidence computation algorithm that seems to perform well in practice.

  • Approximating Predicates and Expressive Queries on Probabilistic Databases [pdf]
    C. Koch. Proc. PODS 2008.
    • This paper shows that queries in our expressive compositional query language can be efficiently arbitrarily closely approximated.

  • Fast and Simple Relational Processing of Uncertain Data [pdf]
    L. Antova, T. Jansen, C. Koch, D. Olteanu. Proc. ICDE 2008. Best paper runner-up.
    • This paper presents the representation system of MayBMS2 and the efficient SQL-only evaluation of a large fragment of our query language.

Papers on the MayBMS Query Language and on APIs

  • A Compositional Query Algebra for Second-Order Logic and Uncertain Databases [pdf]
    C. Koch. Proc. ICDT 2009. Technical Report arXiv:0807.4620.
    • This paper proves that world-set algebra, (the nonprobabilistic version of) the core of the MayBMS query language, has exactly the same expressive power as second-order logic. It also provides some useful insights into query languages for uncertain databases in general.

  • On Query Algebras for Probabilistic Databases
    Christoph Koch. SIGMOD Record 37(4): 78-85, 2008. (pdf).

  • On APIs for Probabilistic Databases [pdf]
    L. Antova and C. Koch. Proc. MUD 2008.
    • This paper studies the challenge of defining an application programming interface for probabilistic databases. This is difficult because the goal of keeping the API independent from database internals (specifically, the representation system) clashes with the desire for efficiency.

  • Query language support for incomplete information in the MayBMS system (Demonstration) [pdf]
    L. Antova, C. Koch, D. Olteanu. Proc. VLDB 2007.
    • This was a MayBMS2 demo, but the paper focuses on the query language of MayBMS. The PODS 2008 paper is a much better reference for (the algebraic version of) the language.

  • From Complete to Incomplete Information and Back [pdf]
    L. Antova, C. Koch, D. Olteanu. Proc. SIGMOD 2007.
    • This paper presents the nonprobabilistic version of the MayBMS query language and studies its properties.

Papers on the MayBMS1 Prototype

Note: We are currently working on the second prototype of MayBMS -- MayBMS2 -- which is based on U-relations as the representation system (see our ICDE 2008 paper). The first prototype, MayBMS1, was based on world-set decompositions (WSDs). U-relations allow for more efficient query processing than WSDs and are more succinct.

  • World-set Decompositions: Expressiveness and Efficient Algorithms [arxiv 0705.4442]
    L. Antova, C. Koch, D. Olteanu. Theoretical Computer Science 403 (2-3):265-284 (2008) Preliminary version in Proc. ICDT 2007.
    • This paper studies the theory of world-set decompositions. Of particular interest is the factorization algorithm, which does a form of minimization of representations.

  • MayBMS: Managing Incomplete Information with Probabilistic World-Set Decompositions (Demonstration) [pdf]
    L. Antova, C. Koch, D. Olteanu. Proc. ICDE 2007. Demo Paper.
    • This was a demo of MayBMS1. The paper is the first to discuss world-set decompositions for representing probabilistic databases.

  • 10^(10^6) Worlds and Beyond: Efficient Representation and Processing of Incomplete Information. [pdf]
    L. Antova, C. Koch, D. Olteanu. Proc. ICDE 2007. Technical Report INFOSYS-TR-2005-4. Journal version in VLDB Journal 18(5): 1021-1040 (2009), Special Issue on Uncertain and Probabilistic Databases (pdf).
    • This paper introduces world-set decompositions, the representation formalism of MayBMS1, and studies query evaluation on these representations. World-set decompositions are based on factorizations to exploit independence and, at least in their probabilistic form, can be thought of as shallow Bayesian Networks.

Posters

Additional Resources

Acknowledgments

The MayBMS project was supported by grant IIS-0812272 of the US National Science Foundation. It was previously supported by the German Science Foundation (DFG) under grant KO 3491/1-1.