The overall realm of this topic is about DATA. in general the traditional way of storing the data is in form of databases SQL – RDBMS ( Oracle, MSSQL, MYSQL and etc..) and NoSQL (MongoDB, Cassandra, Hbase and etc..), these databases are divided into tables for normalization based on Primary and Foreign keys, now for fetching the data we join these tables and display the information. this process of combining the tables is called joins , this is one of the most important feature of databases. We use Inner join, Outer join, Right join, Left join to join the tables and fetch the data based on the requirement.
This same feature is also available in pandas, pandas is a python library which is used for Data Engineering and Data Analytics. We might have data in different format like ( excel, csv, tsv, .txt and etc…) , databases and etc… Assume we have data spread across 2 datasets and in different file formats like excel and csv . here using pandas we can fetch the data into different data frames and then join the data. It behaves much the same as database joins. Lets see how this works.
I have 2 Datasets here Age (Age.xlsx) and Weight (Weight.csv) of People, here we have Names of the people common in both the datasets and few records which are not common. This is to test the Right Joins and Left Joins.
Lets see how the Inner join works, Inner join / Equi join joins the common records in both the tables. The default join is Inner , for other kinds of joins you need to specifically mention the type of join .
The Outer join joins both the tables Irrespective, of common names in both the datasets, here the columns with no values is represented as NAN ( Not a Number). The below out displays the NAN underlined in RED.
Left join joins all the rows of the Left dataset ( weight) that are common and also the columns that does not match in the Right dataset ( Age)
Right join joins all the rows of the Right dataset ( Age) that are common and also the columns that does not match in the Left dataset ( weight)
Here in pandas we have an indicator that tells us about which kind of join it performed, just to ensure the it performed the correct join on the rows.