NoSQL (Not Only SQL)

ISHMEET KAUR
5 min read · May 25, 2020

Not using the relational model: Data is not stored in tables of rows and columns, but in a variety of different formats.

Built for the 21st-century web explosion: The needs of data-centric companies (for example, Facebook and Twitter) differ from those of typical ERP applications.

Developed to support the real-time processing needs of high-volume, high-velocity big data.

Supports agile development, where the nature of the data changes rapidly over the cycles of a software development project.

Challenges of Relational databases

  • Require a well-defined data structure, which makes them a poor fit for the high variety of big data.

  • The database schema is defined up front, before the application is built. This pattern does not provide flexibility in an agile development project that deals with highly dynamic applications.

  • Relational databases traditionally scale vertically: to grow, more resources have to be added to the existing servers.

Benefits of NoSQL over Relational databases

  • NoSQL databases are schemaless: they do not enforce a strict data structure.

  • Highly agile: they can adapt to a variety of data formats, including structured, semi-structured and unstructured data.

  • Can scale horizontally by adding more servers, using sharding and replication.

    - Sharding: Distributes data over multiple servers so that each server acts as the source for a subset of the data.

    - Replication: Copies data across multiple servers so that data can be found in multiple places and recovered if a server fails.

  • Can use the cloud computing model, in which virtual servers are scaled on demand.
  • Often perform better than relational databases for unstructured data.
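Sharding and replication can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the `ShardedStore` class, its method names and the in-memory dictionaries standing in for servers are all hypothetical, and the simple "hash modulo number of servers" placement is chosen for brevity rather than taken from any particular database.

```python
import hashlib


class ShardedStore:
    """Toy cluster: each key lives on one primary server plus replicas."""

    def __init__(self, num_servers=4, replicas=2):
        # Each "server" is just a dict here; real servers are separate machines.
        self.servers = [dict() for _ in range(num_servers)]
        self.replicas = replicas

    def _owners(self, key):
        # Sharding: a stable hash of the key picks the primary server.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        primary = int(digest, 16) % len(self.servers)
        # Replication: the next servers in the ring hold extra copies.
        return [(primary + i) % len(self.servers) for i in range(self.replicas)]

    def put(self, key, value):
        for idx in self._owners(key):
            self.servers[idx][key] = value

    def get(self, key, failed=()):
        # Read from the first owner that is not marked as failed.
        for idx in self._owners(key):
            if idx not in failed:
                return self.servers[idx].get(key)
        raise RuntimeError("all replicas for this key are unavailable")


store = ShardedStore(num_servers=4, replicas=2)
store.put("user:42", {"name": "Asha"})

# Simulate losing the primary server: the replica still answers the read.
primary = store._owners("user:42")[0]
print(store.get("user:42", failed={primary}))
```

Each key is stored on only two of the four servers, so adding servers spreads the data further, while the second copy keeps a key readable when its primary is down. Production systems generally use more careful placement schemes so that adding a server does not reshuffle most keys.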

CAP theorem: Describes three basic properties of a distributed system. Only two of the three can be fully guaranteed at the same time, so trade-offs must be made depending on the task.

  • Consistency: All servers maintain the same state of data. Any query for data yields the same answer regardless of which server answers it.

  • Availability: The system is always available to answer queries for data.

  • Partition tolerance: The system continues to operate as a whole even when network failures cut off communication between servers or individual servers become unreachable.

ACID describes a set of properties that apply to data transactions and that databases can choose to follow.

  • Atomicity: Either all of a transaction happens successfully or none of it does.

  • Consistency: Data is committed only if it passes all rules imposed by data types, triggers, constraints, etc.

  • Isolation: Transactions are ordered so that one transaction is not affected by another that is changing data at the same time.

  • Durability: Once data is committed, it is durably stored and safe against errors, crashes or any other malfunctions within the database.
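Atomicity and consistency can be seen in a small experiment with Python's built-in `sqlite3` module. This is a minimal sketch, not a recommendation of any schema: the `accounts` table, the `transfer` and `balances` helpers and the sample names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
with conn:
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )


def transfer(conn, src, dst, amount):
    """Move money in one transaction; return True if it committed."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance.
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


print(transfer(conn, "alice", "bob", 500), balances(conn))
print(transfer(conn, "alice", "bob", 30), balances(conn))
```

In the first call the credit to `bob` has already been applied when the debit from `alice` violates the constraint, yet the rollback undoes both updates, so neither balance changes. The second call satisfies the rules and is committed as a whole.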

Eventual consistency in NoSQL databases

  • Many NoSQL databases relax the strict consistency guarantee of the ACID principles, promising “eventual consistency” instead.

  • In terms of the CAP theorem, availability and partition tolerance are chosen over consistency, and “eventual consistency” is offered in its place.

  • Database changes are propagated to all nodes “eventually” (typically within milliseconds), so a query issued right after a write might return stale data.

  • NoSQL databases offer options to tune the database to specific requirements. For a read-heavy workload that can tolerate slightly stale data, eventual consistency is often preferred.

For example, DynamoDB in AWS lets you choose between eventually consistent and strongly consistent reads.

  • Eventually Consistent Reads (Default): maximizes your read throughput. However, an eventually consistent read might not reflect the results of a recently completed write. Consistency across all copies of data is usually reached within a second. Repeating a read after a short time should return the updated data.
  • Strongly Consistent Reads: A strongly consistent read returns a result that reflects all writes that received a successful response prior to the read.
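The difference between the two read modes can be illustrated with a toy model. This sketch does not use or reproduce DynamoDB itself: the `ReplicatedItem` class, its leader and follower copies and the explicit `replicate()` step are assumptions made only to show why an eventually consistent read can be stale. In the real DynamoDB API the choice is made per read request through the `ConsistentRead` parameter.

```python
class ReplicatedItem:
    """One item with a leader copy and a lagging follower copy."""

    def __init__(self, value):
        self.leader = value
        self.follower = value
        self._pending = []  # writes not yet applied to the follower

    def write(self, value):
        # The write is acknowledged as soon as the leader has it.
        self.leader = value
        self._pending.append(value)

    def replicate(self):
        # Stand-in for background propagation to the other copy.
        for value in self._pending:
            self.follower = value
        self._pending.clear()

    def read(self, consistent=False):
        # A strong read goes to the leader; an eventual read may hit the follower.
        return self.leader if consistent else self.follower


item = ReplicatedItem("v1")
item.write("v2")
print(item.read())                 # eventual read, follower not yet updated
print(item.read(consistent=True))  # strong read sees the acknowledged write
item.replicate()
print(item.read())                 # after propagation both modes agree
```

Before `replicate()` runs, the eventual read returns the old value while the strong read returns the new one; afterwards both return the new value, which is the "repeat the read after a short time" behaviour described above.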

Popular NoSQL Databases

Types of NoSQL Databases

Key-value NoSQL databases

1. Perform operations based on a unique key, where the value is a blob that can be of any format.

2. Can store a blob under a key, and read, update or delete the blob for a specific key.

3. Scalable and fast.

4. Generally useful for storing session information, user profiles, preferences and shopping cart data.

5. Avoid key-value databases when relationships exist between data items or when you need to operate on multiple keys at a time.

6. Some keep data only in memory (example: Memcached), while others can persist it to disk (example: Redis, which holds data in memory and offers optional persistence). Choose as per application needs.

7. Other examples include Amazon DynamoDB, Riak and Couchbase.
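The key-value model is small enough to sketch directly. The `KeyValueStore` class below is a hypothetical in-memory stand-in, not the client API of Redis, Memcached or any other product; it only shows that the store handles opaque blobs and that the application decides how to encode them.

```python
import json


class KeyValueStore:
    """Toy key-value store: the value is an opaque blob of bytes."""

    def __init__(self):
        self._data = {}

    def put(self, key, blob):
        # Create or overwrite; the store never looks inside the blob.
        self._data[key] = bytes(blob)

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        # Returns True if the key existed.
        return self._data.pop(key, None) is not None


store = KeyValueStore()
cart = {"items": ["book", "pen"], "total": 12.5}
store.put("session:abc123", json.dumps(cart).encode("utf-8"))

# The application, not the store, knows how to decode the blob.
loaded = json.loads(store.get("session:abc123").decode("utf-8"))
print(loaded["items"])

store.delete("session:abc123")
print(store.get("session:abc123"))
```

Every operation names exactly one key, which is why this model is fast and easy to distribute, and also why queries that span many keys or follow relationships are awkward.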

Document store NoSQL databases

1. Documents can be of type XML, JSON, EDI, SWIFT, etc.

2. Document databases are like key-value stores in which the value is a document whose content the database can examine.

3. Documents are often self-describing, hierarchical tree structures that consist of maps, collections and scalar values.

4. Documents are typically indexed using B-trees, and in some databases (for example, CouchDB) queries are written as JavaScript functions.

5. Examples include MongoDB and CouchDB.
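The idea of a self-describing, hierarchical document that the database can look inside can be sketched with nested Python dictionaries. The `get_path` and `find` helpers and the sample `customers` collection are hypothetical; real document databases have their own query languages and use indexes instead of scanning every document.

```python
def get_path(doc, path):
    """Follow a dotted path such as 'address.city' through nested dicts."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(collection, path, expected):
    """Return every document whose value at `path` equals `expected`."""
    return [doc for doc in collection.values() if get_path(doc, path) == expected]


# Documents in one collection do not have to share the same fields.
customers = {
    "c1": {"name": "Asha", "address": {"city": "Pune"}, "tags": ["gold"]},
    "c2": {"name": "Ravi", "address": {"city": "Delhi"}},
    "c3": {"name": "Meera", "address": {"city": "Pune"}, "phone": "555-0100"},
}

print([doc["name"] for doc in find(customers, "address.city", "Pune")])
# prints ['Asha', 'Meera']
```

Unlike a plain key-value store, the query reaches into the value, which is the main practical difference between the two models.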

Column store NoSQL Databases

1. Column-family databases store data in column families, as rows that have many columns associated with a row key.

2. A column family is a container for an ordered collection of rows. Column families are groups of related data that are often accessed together. For a Customer, we would often access their Profile information at the same time, but not their Orders.

3. Each column family can be compared to a container of rows in an RDBMS table, where the key identifies the row and the row consists of multiple columns. The difference is that different rows do not have to have the same columns, and a column can be added to any row at any time without having to add it to the other rows.

4. When the value of a column is itself a map of columns, we have a super column. A super column consists of a name and a value that is a map of columns. Think of a super column as a container of columns.
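The Customer example above can be modelled roughly as follows. This is an illustrative sketch only: the `ColumnFamily` class, the row keys and the column names are invented here, and a plain dictionary does not capture how real column-family databases order or store rows on disk.

```python
from collections import defaultdict


class ColumnFamily:
    """Toy column family: row key -> {column name: value}."""

    def __init__(self, name):
        self.name = name
        self.rows = defaultdict(dict)

    def insert(self, row_key, column, value):
        # A column can be added to one row without touching any other row.
        self.rows[row_key][column] = value

    def get_row(self, row_key):
        # Copy so callers cannot mutate stored data by accident.
        return dict(self.rows[row_key]) if row_key in self.rows else {}


# Related data that is read together lives in the same column family.
profile = ColumnFamily("Profile")
orders = ColumnFamily("Orders")

profile.insert("cust-1", "name", "Asha")
profile.insert("cust-1", "email", "asha@example.com")
profile.insert("cust-2", "name", "Ravi")  # this row has no email column

# Super column: a name ("address") whose value is itself a map of columns.
profile.insert("cust-1", "address", {"city": "Pune", "zip": "411001"})

orders.insert("cust-1", "order-1001", "2 books")

print(sorted(profile.get_row("cust-1")))  # ['address', 'email', 'name']
print(sorted(profile.get_row("cust-2")))  # ['name']
```

Reading a customer's profile touches only the `Profile` family, so the order history does not have to be loaded along with it.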

Graph NoSQL databases

1. Based on graph theory. They use nodes, edges and properties on both nodes and edges.

2. Graph databases allow you to store entities and the relationships between these entities. Entities are also known as nodes, which have properties.

3. Relationships are known as edges, which can also have properties. Edges have directional significance; nodes are organized by relationships, which allows you to find interesting patterns between the nodes.

4. In graph databases, traversing the joins or relationships is very fast. The relationship between nodes is not calculated at query time but is actually persisted as a relationship. Traversing persisted relationships is faster than calculating them for every query.

5. The power of graph databases comes from the relationships and their properties. Modeling relationships needs thorough design.

6. Examples include Neo4j, InfiniteGraph and OrientDB.
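A property graph with persisted, directed relationships can be sketched with an adjacency list. The `Graph` class, the `FRIEND` relation and the sample people are hypothetical and are not the API or query language of Neo4j or any other product; the point is only that a traversal follows stored edges instead of computing joins.

```python
from collections import deque


class Graph:
    """Toy property graph: nodes and directed edges both carry properties."""

    def __init__(self):
        self.nodes = {}  # node id -> properties
        self.edges = {}  # node id -> list of (relation, target id, properties)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props
        self.edges.setdefault(node_id, [])

    def add_edge(self, src, relation, dst, **props):
        # The relationship is stored once and simply followed at query time.
        self.edges[src].append((relation, dst, props))

    def neighbors(self, node_id, relation):
        return [dst for rel, dst, _ in self.edges[node_id] if rel == relation]

    def reachable(self, start, relation, max_hops):
        """Breadth-first traversal along one relation type, up to max_hops."""
        seen, frontier, found = {start}, deque([(start, 0)]), []
        while frontier:
            node, hops = frontier.popleft()
            if hops == max_hops:
                continue
            for nxt in self.neighbors(node, relation):
                if nxt not in seen:
                    seen.add(nxt)
                    found.append(nxt)
                    frontier.append((nxt, hops + 1))
        return found


g = Graph()
for person in ("asha", "ravi", "meera", "john"):
    g.add_node(person, kind="person")
g.add_edge("asha", "FRIEND", "ravi", since=2015)
g.add_edge("ravi", "FRIEND", "meera", since=2018)
g.add_edge("meera", "FRIEND", "john", since=2020)

print(g.reachable("asha", "FRIEND", max_hops=2))  # ['ravi', 'meera']
```

A "friends of friends" query is a two-hop walk over stored edges, whereas the same question in a relational schema would typically need a self-join for every hop.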
