How to Access Dataset in Python

Kailash Chandra Behera | Saturday, July 04, 2020

Introduction

A SQL table represents data as rows and columns, which makes the data convenient to explore. Python supports a similar structure through the pandas DataFrame.

This article discusses how to install the pandas library, what a DataFrame is, and how to load a dataset into a pandas DataFrame in Python and display it on the screen in various ways.

Getting Started

In SQL, tables organize and represent data as rows and columns, which makes the data convenient to explore and transform. In Python, the same structure for organizing and representing data is provided by the DataFrame.

A DataFrame is an efficient two-dimensional data structure that arranges data in rows and columns. Rows and columns can be referred to by integer position or by label. It can be thought of as a table in SQL. The DataFrame is provided in Python by the pandas library, so it is commonly known as the pandas DataFrame.
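To make the table analogy concrete, here is a minimal sketch that builds a small DataFrame by hand and reads values back by column name, row label, and integer position. The sample names, ages, and row labels are hypothetical and only serve the illustration.

```python
import pandas as pd

# Hypothetical sample data, not taken from any real dataset
df = pd.DataFrame(
    {"name": ["Asha", "Ravi", "Meena"], "age": [31, 25, 42]},
    index=["r1", "r2", "r3"],
)

print(df)             # the whole table
print(df["age"])      # one column, selected by name
print(df.loc["r2"])   # one row, selected by its index label
print(df.iloc[0, 1])  # a single cell, selected by integer position
```

Here loc selects by label and iloc selects by position, which corresponds to the two ways of addressing rows and columns described above.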

Getting Started with Pandas DataFrame

Pandas is a popular and widely used library for data exploration and presentation. It provides the DataFrame for loading data and presenting it in tabular form.

A pandas DataFrame can be used for loading, filtering, sorting, grouping, and joining datasets, and it also supports handling missing data. The pandas library provides different methods for loading data from datasets or files.
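As a rough illustration of a few of these operations, the sketch below fills a missing value, filters rows, and sorts them. The data and the choice to replace the missing salary with 0 are assumptions made for the example. Other strategies, such as dropping the row, are equally valid.

```python
import pandas as pd

# Hypothetical sample data with one missing salary
people = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meena", "John"],
    "city": ["Pune", "Delhi", "Pune", "Delhi"],
    "salary": [52000.0, None, 61000.0, 48000.0],
})

# Missing data: replace the missing salary with 0 (one of several options)
filled = people.fillna({"salary": 0})

# Filtering: keep rows whose salary exceeds 50000
high_paid = filled[filled["salary"] > 50000]

# Sorting: order rows by salary, highest first
ranked = filled.sort_values("salary", ascending=False)

print(high_paid)
print(ranked)
```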

Pandas is an open-source data analysis tool for the Python programming language that makes it easy to structure, analyze, and present data. It is a high-performance, popular, and widely used library.

Characteristics of Pandas DataFrame

  1. The pandas DataFrame is a fast and efficient object for data manipulation with integrated indexing.

  2. It includes tools for reading and writing data between in-memory data structures and different formats: CSV and text files, Microsoft Excel, SQL databases, and the fast HDF5 format. The pandas library provides the following methods for loading datasets.

    1. read_csv to read comma-separated values.
    2. read_json to read data in JSON format.
    3. read_excel to read Excel files.
    4. read_table to read general delimited text files (tab-separated by default).
    5. read_fwf to read data in fixed-width format.
  3. Flexible reshaping and pivoting of data sets.

  4. Intelligent data alignment and integrated handling of missing data.

  5. Intelligent label-based slicing, fancy indexing, and subsetting of large data sets.

  6. Aggregating or transforming data with a powerful group by engine allowing split-apply-combine operations on data sets.

  7. High-performance merging and joining of data sets.

  8. Columns can be inserted and deleted from data structures for size mutability.
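A few of the characteristics listed above (split-apply-combine grouping, joining, and inserting or deleting columns) can be sketched as follows. The two small tables and their column names are hypothetical.

```python
import pandas as pd

# Hypothetical sales figures and a lookup table of regional managers
sales = pd.DataFrame({
    "region": ["East", "West", "East", "West"],
    "amount": [100, 200, 150, 75],
})
managers = pd.DataFrame({
    "region": ["East", "West"],
    "manager": ["Asha", "Ravi"],
})

# Split-apply-combine: total amount per region
totals = sales.groupby("region", as_index=False)["amount"].sum()

# Joining: attach the manager of each region to the totals
report = totals.merge(managers, on="region", how="left")

# Size mutability: insert a derived column, then delete another
report["amount_k"] = report["amount"] / 1000
report = report.drop(columns=["manager"])

print(report)
```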

To use pandas in Python, the pandas module must be imported into the environment with the import keyword. A pandas method can then be invoked in the form pandas.method_name (for example, pandas.read_csv). Instead of the full name, the library can be imported under an alias, as in the code below, and the same alias is then used to invoke pandas methods.

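 import pandas as pd

In the above code, the pandas library is imported under the alias pd, which is the alias conventionally used for pandas. Note that the import keyword must be written in lowercase.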

Installing Pandas Library

Pandas can be installed with conda or from PyPI with pip. The commands to install pandas are given below.

Command to install pandas via conda

 conda install pandas   

Command to install pandas via PyPI

 pip install pandas   

Note that the package manager tool (pip) must already be installed on your machine. On Windows, follow the steps below to install the pandas library.

  1. Press Windows Key+R.
  2. Enter cmd.exe and press Enter.
  3. The command prompt will appear.
  4. Type the appropriate command from above and press Enter.
  5. The installation process will start. If everything goes well, the package will be installed successfully.
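Once the installation finishes, one quick way to confirm that it worked is to import the library in Python and print its version. This is only a minimal check, and the alias pd is simply the conventional choice.

```python
import pandas as pd

# Prints the installed version string; an ImportError on the line above
# means pandas is not installed for this Python interpreter
print(pd.__version__)
```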

Access dataset examples

The pandas library provides the following methods for loading datasets. Examples of three popular and commonly used methods (read_excel, read_csv, and read_json) are given below.

  1. read_csv to read comma-separated values.
  2. read_json to read data in JSON format.
  3. read_excel to read Excel files.
  4. read_table to read general delimited text files (tab-separated by default).
  5. read_fwf to read data in fixed-width format.
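The last two methods in the list are less familiar than the others, so here is a minimal sketch of each. To keep the example self-contained, it reads from in-memory strings through io.StringIO instead of files on disk. The sample text is hypothetical.

```python
from io import StringIO

import pandas as pd

# Tab-delimited text: read_table uses a tab as its default separator
tabbed = pd.read_table(StringIO("name\tage\nAsha\t31\nRavi\t25\n"))

# Fixed-width text: every column occupies the same character positions,
# and read_fwf infers the column boundaries from the whitespace
fixed = pd.read_fwf(StringIO("name   age\nAsha   31\nRavi   25\n"))

print(tabbed)
print(fixed)
```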

Access Dataset in Python using read_excel

The example below shows how to load a dataset into a pandas DataFrame using read_excel.

It reads an Excel workbook from the D: drive, loads the data from Sheet1 into a pandas DataFrame, and prints it on the screen.

  # Demonstration of reading and loading data from Excel
  # Importing the pandas library
  # (reading .xlsx files also requires an Excel engine such as openpyxl)
  import pandas as xl
  # Loading data from Sheet1 of the Excel workbook
  data = xl.read_excel(r'D:\pandas.xlsx', sheet_name='Sheet1')
  # Displaying the data on the screen
  print(data)

read_excel takes the full path of the Excel file as its first parameter, and the sheet_name parameter names the sheet to read. Note that the first row of the sheet is expected to be the header of the dataset.

Access Dataset in Python using read_csv

The example below loads a dataset from a CSV file using the read_csv method.

  # Demonstration of reading and loading data from CSV
  # Importing the pandas library
  import pandas as xl
  # Loading data from the CSV file
  data = xl.read_csv(r'D:\pandas.csv')
  # Displaying the data on the screen
  print(data)

read_csv takes the file name as a parameter and uses a comma as the separator by default. If the file uses any other separator, set the sep parameter to the appropriate character. The first line of the dataset is expected to be the header. If the dataset has no header, set the header parameter to None.
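The sep and header parameters can be sketched as follows. The semicolon-separated sample and the column names passed through names are hypothetical, and io.StringIO stands in for a file on disk so that the example runs as is.

```python
from io import StringIO

import pandas as pd

# Hypothetical semicolon-separated data with no header row
raw = StringIO("1;Asha;31\n2;Ravi;25\n")

data = pd.read_csv(
    raw,
    sep=";",                      # separator other than a comma
    header=None,                  # the first line is data, not column names
    names=["id", "name", "age"],  # column names supplied by hand
)
print(data)
```

Without the names argument, pandas would label the columns 0, 1, and 2.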

Access Dataset in Python using read_json

The example below loads a dataset from a JSON file using the read_json method.

  # Demonstration of reading and loading data from JSON
  # Importing the pandas library
  import pandas as xl
  # Loading data from the JSON file
  data = xl.read_json(r'D:\pandas.json')
  # Displaying the data on the screen
  print(data)
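What read_json produces depends on the shape of the JSON. As a minimal sketch, assuming the input is a list of objects with one object per row, each key becomes a column. The sample records are hypothetical and are read from an in-memory string instead of a file.

```python
from io import StringIO

import pandas as pd

# Hypothetical JSON: a list of records, one object per row
raw = StringIO('[{"name": "Asha", "age": 31}, {"name": "Ravi", "age": 25}]')

data = pd.read_json(raw)
print(data)
```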

Summary

This article discussed how to read data from various sources such as Excel, CSV, and JSON files in Python and load it into a pandas DataFrame. I hope you enjoyed it.

Thanks