MongoDB workshop: LinuxWorld
MongoDB: The database for modern applications
On the 1st and 8th of May 2021, I attended a workshop on MongoDB by LinuxWorld Informatics Pvt Ltd. In this workshop we learnt the ins and outs of MongoDB. MongoDB is a document database, which means it stores data in JSON-like documents and is therefore much more flexible than traditional schema-based databases.
In this workshop we learnt:
A data model is the way we organise data. It shapes how and where a database should be used.
SQL is short for Structured Query Language. In the relational databases that use it, data is arranged in tables with a fixed, predefined schema, so the data model is not flexible.
NoSQL is short for "not only SQL". There is no fixed schema, and data is often stored as key-value pairs. It is also known as a flexible data model.
A document-oriented database is a data model in which a single record (what a table would store as a row) is treated as one document.
On Windows we only need to download the software, install it, and then add it to the PATH for convenience.
JSON is short for JavaScript Object Notation, a text format for writing data in one consistent notation so that it stays uniform when it moves from one tool to another.
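To make the idea of a JSON-like document concrete, here is a minimal sketch using only Python's built-in json module. The record and its field names are made up for illustration and are not from the workshop.

```python
import json

# A hypothetical "student" record; every field name here is an assumption.
student = {
    "name": "Asha",
    "age": 27,
    "gender": "female",
    "skills": ["python", "mongodb"],   # a document can hold arrays
    "address": {"city": "Jaipur"},     # and embedded (nested) documents
}

# Serialise the record to JSON text, then parse it back into a dict.
as_text = json.dumps(student)
round_tripped = json.loads(as_text)
```

The same nested shape is what a document database stores as a single record, without the fields having to be declared in advance.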
CRUD (create, read, update, delete) operations in MongoDB can be done in several different ways, depending on the requirements.
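As a rough sketch of one such way, the function below runs one operation of each kind through pymongo's collection methods. The collection is passed in as a parameter, and the document contents are hypothetical.

```python
def crud_demo(collection):
    """Run one create, read, update and delete on a pymongo collection."""
    # Create: insert a single document.
    collection.insert_one({"name": "Asha", "age": 27})
    # Read: fetch the first document that matches the filter.
    found = collection.find_one({"name": "Asha"})
    # Update: change one field using the $set operator.
    collection.update_one({"name": "Asha"}, {"$set": {"age": 28}})
    # Delete: remove the document again.
    collection.delete_one({"name": "Asha"})
    return found
```

With a running server, it could be called with a collection object obtained from pymongo's MongoClient. Each method also has a bulk counterpart (insert_many, update_many, delete_many) for working on several documents at once.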
To configure Compass, we install it from the official website and then enter the connection URL of the server.
To upload a dataset into MongoDB we can use the mongoimport command-line tool, which is run from the system terminal rather than from inside the mongo prompt.
To integrate MongoDB with Python we can use the pymongo library, and then use its MongoClient class to declare that our code is a client of the server.
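A minimal sketch of that connection step might look like this. The URL, database name and collection name are placeholder assumptions, and pymongo is imported inside the function so the file still loads on a machine where it is not installed.

```python
def get_collection(uri="mongodb://localhost:27017",
                   db_name="workshop",
                   coll_name="students"):
    """Return a pymongo collection handle for the given server URL."""
    # Imported lazily; requires `pip install pymongo` to actually run.
    from pymongo import MongoClient

    # Creating the client does not contact the server yet;
    # the connection is made on the first real operation.
    client = MongoClient(uri)
    return client[db_name][coll_name]
```

Calling get_collection() with no arguments assumes a MongoDB server listening locally on the default port 27017.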
Every document in a collection gets a unique _id value, and MongoDB indexes this field automatically.
The _id also acts as the primary key, as it always uniquely identifies one and only one document.
Indexing is a way of arranging data so that searches in the database are faster: because the data is pre-arranged by some criterion, much less of it has to be traversed to find something.
Sharding in MongoDB is a way to divide data across several nodes (shards) based on some criterion, which gives us better I/O speed.
By default, the find function in MongoDB goes to every document, compares it against the query, and then returns the result. This is quite time-consuming on a huge database. This default behaviour is known as a collection scan (COLLSCAN).
An index stores the values of a particular field in sorted order, together with a pointer to each document. Once an index has been created, a search that uses it is known as an index scan (IXSCAN).
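The difference between the two scans can be illustrated without a server. The sketch below is a pure-Python stand-in, not MongoDB's implementation: a plain list plays the collection and a dict plays the index (MongoDB itself uses B-tree indexes), and all the data is generated for the example.

```python
# 1000 made-up documents; ages cycle through 18..67.
docs = [{"_id": i, "age": 18 + i % 50} for i in range(1000)]

def collection_scan(docs, age):
    """Look at every document, like COLLSCAN."""
    examined = 0
    hits = []
    for doc in docs:
        examined += 1
        if doc["age"] == age:
            hits.append(doc)
    return hits, examined

def build_index(docs, field):
    """Map each field value to the positions of the documents holding it."""
    index = {}
    for pos, doc in enumerate(docs):
        index.setdefault(doc[field], []).append(pos)
    return index

def index_scan(docs, index, age):
    """Jump straight to the matching documents, like IXSCAN."""
    positions = index.get(age, [])
    return [docs[p] for p in positions], len(positions)
```

Searching for age 30 returns the same documents either way, but the collection scan examines all 1000 documents while the index lookup touches only the ones that match.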
By default MongoDB creates one index, on the _id field, sorted in ascending order.
MongoDB also supports user-defined indexes on multiple fields, i.e. compound indexes. For example, if the requirement is to search for females older than 25, we need to examine two keys, age and gender, so an index covering both fields is useful.
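For the age and gender example, a compound index could be sketched with pymongo as below. The field order (the equality field gender first, then the range field age) follows a commonly recommended ordering and is my assumption, not something fixed by the workshop; the collection is passed in as a parameter.

```python
# Compound index keys: 1 means ascending order.
INDEX_KEYS = [("gender", 1), ("age", 1)]

# Females with age greater than 25.
QUERY = {"gender": "female", "age": {"$gt": 25}}

def find_with_compound_index(collection):
    """Create the compound index, then return the query plan for QUERY."""
    collection.create_index(INDEX_KEYS)
    # explain() reports how the server intends to run the query,
    # including whether the plan uses COLLSCAN or IXSCAN.
    return collection.find(QUERY).explain()
```

On a real server, the returned plan can be inspected to confirm that the query is served by an IXSCAN stage instead of a COLLSCAN.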
For aggregation, MongoDB has the aggregation framework. Aggregation is done by creating a pipeline. A pipeline consists of multiple stages, and the output of every stage is passed to the next stage.
$match: for filtering documents
$group: for grouping documents by a key
$sort: for sorting the results
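Put together, these three stages could form a pipeline like the sketch below, which counts people older than 25 per gender. The field names are hypothetical, and the collection is passed in as a parameter.

```python
PIPELINE = [
    # Stage 1: keep only documents with age greater than 25.
    {"$match": {"age": {"$gt": 25}}},
    # Stage 2: group by gender and count the documents in each group.
    {"$group": {"_id": "$gender", "count": {"$sum": 1}}},
    # Stage 3: sort the groups by count, largest first.
    {"$sort": {"count": -1}},
]

def run_pipeline(collection):
    """Run PIPELINE on a pymongo collection and return the result list."""
    return list(collection.aggregate(PIPELINE))
```

Because each stage only sees the output of the one before it, putting $match first keeps the later stages working on as few documents as possible.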
In an enterprise we need a multi-node setup to avoid a single point of failure. With a single node, any issue that brings it down makes the entire database unavailable. Therefore we always use a multi-node cluster.
The data is copied to multiple nodes, and this group of nodes is known as a replica set.
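From the client side, connecting to a replica set usually means listing several nodes in one connection string. The helper below is a small sketch that builds such a string in the standard mongodb:// format; the host names and the set name rs0 are assumptions for illustration.

```python
def replica_set_uri(hosts, replica_set):
    """Build a connection string that lists every node of a replica set."""
    # Hosts are comma-separated; the set name goes in the replicaSet option.
    return "mongodb://" + ",".join(hosts) + "/?replicaSet=" + replica_set

# Hypothetical three-node setup.
uri = replica_set_uri(["node1:27017", "node2:27017", "node3:27017"], "rs0")
```

Listing more than one node lets the driver still find the set when one of the listed nodes is down.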
In databases we can also store data on different nodes on the basis of groups and categories.
MongoDB supports a clustered architecture for this, described in the workshop as a master-slave architecture. The master stores the metadata and runs a router program, which tells the client where the data it is looking for is located. In MongoDB's own terms, the metadata lives on config servers and the router is the mongos process.
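The routing idea can be sketched in a few lines of pure Python. This is a deliberate simplification: a real cluster maps ranges of shard-key values to chunks using the stored metadata, whereas the sketch just hashes the key and takes a modulo. The shard names are hypothetical.

```python
import hashlib

SHARDS = ["shard0", "shard1", "shard2"]  # hypothetical shard names

def pick_shard(shard_key_value, shards=SHARDS):
    """Decide which shard should hold a document with this key value."""
    # md5 is used only to get a stable, evenly spread number from the key.
    digest = hashlib.md5(str(shard_key_value).encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]
```

The same key always maps to the same shard, so the router can send a query for that key to one node instead of asking all of them.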
AWS provides a MongoDB-compatible service, Amazon DocumentDB.
MongoDB Atlas is a fully managed cloud database developed by the same people who built MongoDB. It is a MongoDB cloud that can set up an entire MongoDB database for us, and behind the scenes it uses cloud providers such as AWS, GCP and Azure to set up the clusters.
Sometimes several of our documents have some fields in common. Instead of repeating those fields, or an embedded document, in every main document, we put that information in a separate centralised document and store a reference to it in the main document. This is known as the reference data model.
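A small sketch of the reference data model, using plain Python dicts in place of collections. The employee and department records, and the dept_id field name, are made up for illustration.

```python
# Shared information lives once, in its own "collection".
departments = {
    "d1": {"_id": "d1", "name": "Engineering", "location": "Jaipur"},
}

# Main documents store only a reference (dept_id) to the shared document.
employees = [
    {"_id": 1, "name": "Asha", "dept_id": "d1"},
    {"_id": 2, "name": "Ravi", "dept_id": "d1"},
]

def resolve(employee, departments):
    """Return the employee with the referenced department filled in."""
    return {**employee, "department": departments[employee["dept_id"]]}
```

If the department moves to another location, only the one department document has to change, and every employee that references it sees the new value.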
I would like to thank Mr. Vimal Daga Sir (Vimal13) for teaching us such a great tool in such detail, and LinuxWorld Informatics Pvt Ltd for arranging this amazing workshop.