Data Models

The Object-Relational Mismatch

if data is stored in relational tables, an awkward translation layer is required between the objects in the application code and the database model of tables, rows, and columns. The disconnect between the models is sometimes called an impedance mismatch.

Schema flexibility in the document model

as the code that reads the data usually assumes some kind of structure—i.e., there is an implicit schema, but it is not enforced by the database.

Relational Model Versus Document Model

  • Relational model: data is organized into relations (tables), where each relation is an unordered collection of tuples (rows).
    • Schema on write.
    • Easy for many to many / one relations.
    • query optimizer automatically decides which parts of the query to execute in which order, and which indexes to use.
  • Document Model: data is organized into nested documents (records).
    • Schema on read.
    • Easy for localize one to many relations.

The main arguments in favor of the document data model are schema flexibility, better performance due to locality, and that for some applications it is closer to the data structures used by the application.

The relational model counters by providing better support for joins, and many-to-one and many-to-many relationships.

It’s not possible to say in general which data model leads to simpler application code; it depends on the kinds of relationships that exist between data items.

a schema may hurt more than it helps, and schemaless documents can be a much more natural data model. But in cases where all records are expected to have the same structure, schemas are a useful mechanism for documenting and enforcing that structure.

  1. Document databases target use cases where data comes in self-contained documents and relationships between one document and another are rare.
  2. Graph databases go in the opposite direction, targeting use cases where anything is potentially related to everything.

Many Relational database has XML and/or JSON support. And some document database have relational like join support.

Data locality for queries

If your application often needs to access the entire document (for example, to render it on a web page), there is a performance advantage to this storage locality. If data is split across multiple tables, like in Figure 2-1, multiple index lookups are required to retrieve it all, which may require more disk seeks and take more time.

The locality advantage only applies if you need large parts of the document at the same time. The database typically needs to load the entire document, even if you access only a small portion of it, which can be wasteful on large documents.

Query Languages for Data

Declarative Query Language

In a declarative query language, like SQL or relational algebra, you just specify the pattern of the data you want—what conditions the results must meet, and how you want the data to be transformed (e.g., sorted, grouped, and aggregated)—but not how to achieve that goal.

  • XPath, CSS selector are examples of declarative language on the web.
  • MongoDB 2.2 added support for a declarative query language called the aggregation pipeline.

MapReduce Querying

MapReduce is a fairly low-level programming model for distributed execution on a cluster of machines. Higher-level query languages like SQL can be implemented as a pipeline of MapReduce operations

Graph-Like Data Models

  • Triple Stores / Semantic Web / RDF
  • Cypher Query language
  • SPARQL