Load Dataset into Pandas DataFrame

Kailash Chandra Behera | Tuesday, July 07, 2020

Introduction

A DataFrame is a very efficient two-dimensional flat data structure and arranging data in rows and columns. The rows and columns can be an index or name. It can be imagined as a table in SQL. The data frame is inherited into Python by the Pandas library, hence this DataFrame is commonly known as Pandas DataFrame.

Here in this blog, we will discuss how to create DataFrame pandas or use python DataFrame for loading, filtering, sorting, grouping, and so on.

Load Dataset into Pandas DataFrame

load iris dataset in python

Load Dataset into Pandas DataFrame

Getting Started

Pandas is a very popular and most widely used library for data exploration and presentation, it provides DataFrame for loading and presenting data in the structure.

Pandas DataFrame can be used for loading, filtering, sorting, grouping, and joining dataset, more it also supports for dealing with missing data. Pandas Library provides the below method for the loading dataset.

  1. read_excel to read excel file.
  2. read_csv to read comma separated values.
  3. read_json to read data with JSON format.
  4. read_table to read database tables.
  5. read_fwf to read data with the fixed-width format.

To use pandas in python, the Pandas module needs to be imported into the environment using the import keyword. The Pandas method can be invoked using the format pandas.method name (pandas.read_cvc). Instead of using Pandass full name, the library can be imported using alias like the below code and the same alias can be used to invoke pandas methods as well.

 Import pandas as pdf
 pdf.read_csv('file name with path')   

Load Dataset into Pandas DataFrame

Demonstration

As mentioned in the Characterstic session, the Pandas library provides 5 methods to load Dataset. Here I have loaded Dataset into python DataFrame using read_excel. The below example reads an xls file from 'D' drive, loads data from sheet1 into Pandas DataFrame, and prints on the screen.

 # Demonstration for Reading and Loading data from excel    
  # Importing Pandas library    
  import pandas as xl    
  #Loading data from excel    
  data = xl.read_excel (r'D:\pandas.xls', sheet_name='Sheet1')    
  #Displaying data in the screen    
  print (data)   

The above example reads an excel sheet from 'D' drive, loads data from sheet1 into Python DataFrame and prints in the screen. You can read any excel data sheet using the above code.

Display the first few Records of the DataFrame.

Let’s say you have DataFrame with the data structured as like image on the top. Where you have 20 records (Rows) and 8 columns. The below example will display 10 rows and a maximum of 5 columns.

To select a few rows or columns the DataFrame can be accessed or sliced by index or names. The row and column index always starts from 0. For example from the DataFrame, you want only to display the first 5 rows and columns. The iloc method of DataFrame takes row range as the first parameter and column range as the second parameter to select a few rows and columns from DataFrame. See the example.

 # Demonstration for Reading and Loading data from excel    
  # Importing Pandas library    
  import pandas as xl    
  #Loading data from excel    
  data = xl.read_excel (r'D:\pandas.xlsx', sheet_name='Sheet1')    
  #Displaying data in the screen    
  print (data[0:5,0:5])   

Display a few Rows and all Columns of the DataFrame.

To select the first few records from DataFrame pandas library introduced head(n) function The function head(n) (the parameter n is the number of records to display) limits the number of columns to be displayed.

The below example prints a maximum of 10 rows and all the columns.

 # Demonstration for Reading and Loading data from excel    
  # Importing Pandas library    
  import pandas as xl    
  #Loading data from excel    
  data = xl.read_excel (r'D:\pandas.xlsx', sheet_name='Sheet1')    
  #Displaying data in the screen    
  print (data.head(10))   

List out Column Names

 # Demonstration for Reading and Loading data from excel    
  # Importing Pandas library    
  import pandas as xl    
  #Loading data from excel    
  data = xl.read_excel (r'D:\pandas.xlsx', sheet_name='Sheet1')    
  #Displaying data in the screen    
 for col in data.Columns:  
  print(col)  

Sort DataFrame by Column Value

The sort_values() function in the Pandas library sorts DataFrame based on the column valued provided. By default, the sorting is done in the ascending order.

The below example sorts the DataFrame in descending order by value of column 'Math".

 # Demonstration for Reading and Loading data from excel    
  # Importing Pandas library    
  import pandas as xl    
  #Loading data from excel    
  data = xl.read_excel (r'D:\pandas.xlsx', sheet_name='Sheet1')    
  #Displaying data in the screen    
  print(data.sort_values('Math',ascending=False))  

Transpose DataFrame Columns to Rows

The best way to display the columns as rows and rows as columns of DataFrame is transpose a DataFrame where the row indexes will be shown as column headers and columns names will be shown as row indexes. The below example transposes a DataFrame using transpose method.

 # Demonstration for Reading and Loading data from excel    
  # Importing Pandas library    
  import pandas as xl    
  #Loading data from excel    
  data = xl.read_excel (r'D:\pandas.xlsx', sheet_name='Sheet1')    
  #Displaying data in the screen    
  print(data.head(2).transpose())   

The above code example transposes the DataFrame with two rows and all the columns of dataset.

Filter Records from DataFrame

Asume that you want to filter records from DataFrame based on the certain condition For example, you want list out the students who have scored more than 80 in math. The DataFrame records can be filtered using a condition as indexing mechanism, for those records which the condition returns true are selected as part of the result.

 # Demonstration for Reading and Loading data from excel    
  # Importing Pandas library    
  import pandas as xl    
  #Loading data from excel    
  data = xl.read_excel (r'D:\pandas.xlsx', sheet_name='Sheet1')    
  #Displaying data in the screen    
  print(data[data['Math']<80][['First Name','Last Name','Math']])   

Summary

Here in this blog, we saw how to load, filter, sort, group a DataFrame, I hope you have enjoyed it a lot.

Thanks