Data storage and retrieval

In the previous lesson you learned about many sources of data. Now let's discuss efficient ways of storing the data that your organization might collect.

Parallel Storage Solutions

Your business may generate far more data than a single computer can store. To make sure that all of that data is saved and easy to access, you'll want to store it across many computers.
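One common way to spread records across several computers is hash partitioning (often called sharding). You hash each record's key and use the result to pick the computer that stores it. The Python sketch below assumes a hypothetical three-node cluster (`node-a`, `node-b`, `node-c`). In-memory dictionaries stand in for each computer's storage. It illustrates the idea only. Real systems also replicate data and handle computers joining or leaving.

```python
import hashlib

# Hypothetical node names; in a real cluster these would be separate machines.
NODES = ["node-a", "node-b", "node-c"]


def node_for_key(key: str, nodes: list[str]) -> str:
    """Choose a node for a record by hashing its key (simple hash partitioning)."""
    # sha256 gives the same result on every run, unlike Python's salted built-in hash() for str.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def partition(records: dict[str, str], nodes: list[str]) -> dict[str, dict[str, str]]:
    """Split records so that each node stores only its own share."""
    shards: dict[str, dict[str, str]] = {node: {} for node in nodes}
    for key, value in records.items():
        shards[node_for_key(key, nodes)][key] = value
    return shards


# Hypothetical sample data: ten customer records.
customers = {f"customer-{i}": f"record {i}" for i in range(10)}
shards = partition(customers, NODES)
for node, data in shards.items():
    print(node, sorted(data))
```

To read a record back, call `node_for_key` with the same key to find the computer that holds it. A plain modulo scheme like this one moves most keys to different computers whenever a node is added. For that reason, production systems often use consistent hashing instead.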

A company might keep its own storage computers (servers) on premises. These servers are networked together into what is called a "cluster".

The Cloud

Alternatively, a company might pay another company to store data for them. This is referred to as "cloud storage".

Common cloud storage providers include Microsoft Azure, Amazon Web Services (AWS), and Google Cloud. These services provide more than just data storage; they can also help an organization with data analytics, machine learning, and deep learning.

For now, we'll just focus on data storage.

Types of data storage

Different types of data require different storage solutions. Some data is unstructured, like email, text, video and audio files, web pages, and social media messages. This kind of data is often stored in a document database. A document database keeps each record as a flexible, self-contained document rather than as a row in a fixed table.
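A document database typically stores each record as a document, often in a JSON-like format. Documents in the same collection do not have to share the same fields. The Python sketch below uses the standard `json` module and two hypothetical records. It shows only the shape of the data, not any particular database's API.

```python
import json

# Two hypothetical documents in the same collection. They have different fields,
# and a field can hold a nested object or a list.
email_doc = {
    "kind": "email",
    "from": "asha@example.com",
    "subject": "Quarterly report",
    "attachments": ["report.pdf"],
}
post_doc = {
    "kind": "social_post",
    "author": {"handle": "@example", "followers": 120},
    "text": "New store opening soon!",
    "likes": 42,
}

# Serialize each document to JSON text, as a document store might keep it, then read it back.
stored = [json.dumps(doc) for doc in (email_doc, post_doc)]
loaded = [json.loads(text) for text in stored]

for doc in loaded:
    print(doc["kind"], sorted(doc))
```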

Other data can be expressed as tables of information, like what you might find in a spreadsheet. A database that stores information in tables is called a relational database.

Both types of database are offered by the cloud storage providers mentioned earlier.

Data Querying

Once data has been stored in a document database or a relational database, we'll need to access it.

At a basic level, we'll want to be able to request a specific piece of data, such as "All of the images that were created on March 3rd" or "All of the customer addresses in Lucknow". In addition, we might even want to do some analysis, such as summing, counting, or averaging data.
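For tabular data, a request like "All of the customer addresses in Lucknow" maps directly onto SQL. So do the summing, counting, and averaging mentioned above. The sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `customers` table, its columns, and its rows are hypothetical examples.

```python
import sqlite3

# In-memory SQLite database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (name TEXT, address TEXT, city TEXT, total_spent REAL)"
)
# Hypothetical rows for illustration only.
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        ("Asha", "12 Hazratganj", "Lucknow", 250.0),
        ("Ravi", "4 Gomti Nagar", "Lucknow", 150.0),
        ("Meera", "9 MG Road", "Pune", 300.0),
    ],
)

# Retrieval: all customer addresses in Lucknow.
lucknow = conn.execute(
    "SELECT address FROM customers WHERE city = ? ORDER BY address", ("Lucknow",)
).fetchall()
print(lucknow)

# Analysis: count, sum, and average spending per city.
summary = conn.execute(
    "SELECT city, COUNT(*), SUM(total_spent), AVG(total_spent) "
    "FROM customers GROUP BY city ORDER BY city"
).fetchall()
print(summary)
conn.close()
```

The `?` placeholders let SQLite insert the values safely instead of pasting them into the query string.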

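Each type of database has its own way of being queried. Relational databases use SQL, which stands for "Structured Query Language". Document databases belong to a wider family called NoSQL ("Not only SQL") databases. NoSQL databases do not share one standard language; each usually provides its own query language or API.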
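Document databases usually express queries differently from SQL. MongoDB, for example, describes a query as a filter document such as `{"type": "image", "created": "2024-03-03"}`. The Python sketch below imitates only simple field-equality filters over in-memory documents. The `match_all` function and the sample documents, including the year in the date, are hypothetical. They are not any database's real API.

```python
from typing import Any

# Hypothetical documents describing stored files; fields vary between documents.
documents: list[dict[str, Any]] = [
    {"type": "image", "file": "beach.jpg", "created": "2024-03-03", "tags": ["travel"]},
    {"type": "email", "subject": "Invoice", "created": "2024-03-03"},
    {"type": "image", "file": "office.png", "created": "2024-03-05"},
]


def match_all(docs: list[dict[str, Any]], query: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the documents whose fields equal every value in the filter (equality only)."""
    return [
        doc for doc in docs
        if all(doc.get(field) == value for field, value in query.items())
    ]


# "All of the images that were created on March 3rd", written as a filter document.
march_3_images = match_all(documents, {"type": "image", "created": "2024-03-03"})
print([doc["file"] for doc in march_3_images])
```

Real document databases support much richer filters, such as ranges and matches inside nested fields, and use indexes so they don't scan every document.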

Putting it all together: Location

Storing your company's data is like building a library.

First, you need to decide where to build your library. That corresponds to choosing where your data will live: either an on-premises cluster or one of the cloud providers discussed earlier (Azure, AWS, or Google Cloud).

Putting it all together: Data type

Next, you need to decide what types of shelves to install to hold your books. The types of shelves will depend on the types of books.

This is analogous to choosing between a document database for unstructured data and a relational database for tabular data.

Just like a library might have multiple types of shelves, you might need to store some data in a document database and other data in a relational database.

Putting it all together: Queries

Finally, you'll need a system for referencing and checking out books. The way you locate and retrieve each book depends on how that book is stored.

Similarly, you need a query language to communicate with the database. For relational databases, we generally use SQL. For document databases, we generally use the query language or API provided by that particular NoSQL database.