This is the sixth article in the ‘Data Science with Python’ series. In this article, we will learn about Pandas, a Python library used extensively in data science.
Table of Contents
Introduction of Pandas in Python
Pandas Installation
DataFrame Creation
Pandas Operations on DataFrame
Final Thoughts
Introduction of Pandas in Python
Pandas is one of the most widely used Python libraries for data science. It is versatile and makes it easy to work with series and data frames, supporting data analysis operations from basic to advanced.
A Series is essentially a single column, and a data frame is a two-dimensional data structure shown in tabular format, with rows and columns. Many datasets are stored in a data frame structure, and Pandas provides many operations that can be performed on a DataFrame.
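To make the difference between the two structures concrete, here is a minimal sketch. The names and values (`heights`, `people`, and the data inside them) are made up for illustration and are not part of this article's dataset.

```python
import pandas as pd

# A Series is a one-dimensional labelled array (think of a single column).
heights = pd.Series([150, 162, 171], name="height")

# A DataFrame is a table; each of its columns is a Series.
# The values below are hypothetical.
people = pd.DataFrame({
    "name": ["Asha", "Ben", "Chen"],
    "height": [150, 162, 171],
})

print(type(people["height"]))  # selecting one column gives back a Series
print(people.shape)            # (number of rows, number of columns)
```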
Pandas Installation
To install and use Pandas, we first need a Python environment on our system. To install Python and an IDE, refer to our first article ‘Data Science with Python: Introduction’, where we explained this process in detail. After the installation is complete, open the command prompt and use pip to install Pandas as shown below.
pip install pandas
DataFrame Creation
To experiment and learn various operations in Pandas, we need a sample data frame with data. Let’s first create a data frame using Python Code.
- The first step is to create data for the data frame. Below is the Python code for the same.
Python Code: data = { |
- Now, the second step is to import the pandas library as shown below:
import pandas as pd
- The third step is to pass the data to the dataframe constructor:
Python Code:
import pandas as pd
data_df = pd.DataFrame(data)
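The three steps above can be sketched end to end as follows. The dataset here is a small made-up one: the column names (`name`, `age`, `height`) and all values are assumptions for illustration, and the variables are called `sample_data` and `sample_df` so they are not confused with the article's own `data` and `data_df`.

```python
import pandas as pd

# Step 1: hypothetical data, one list per column (all lists the same length).
sample_data = {
    "name": ["Asha", "Ben", "Chen", "Dia", "Eli", "Fay"],
    "age": [23, 31, 27, 45, 38, 29],
    "height": [150, 162, 171, 158, 180, 165],
}

# Step 2 is the import at the top of this block.
# Step 3: pass the dictionary to the DataFrame constructor.
# Dictionary keys become column names; rows get a default 0-based index.
sample_df = pd.DataFrame(sample_data)

print(sample_df)
```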
Pandas Operations on DataFrame
Let’s perform some basic operations on the dataframe.
1. Saving DataFrame to CSV
Data frame data can be easily saved in CSV file format using pandas.
Python Code:
data_df.to_csv('data.csv')
This saves the dataframe to a file named data.csv in the current working directory, because only a file name (and no folder path) was given.
2. Reading CSV to the DataFrame
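With the help of Pandas, CSV files (plain-text files that can also be opened in Excel) can easily be read and stored in data frame format in Python. Let’s read the CSV file saved in the above section into a dataframe.
Python Code:
df = pd.read_csv('data.csv', index_col=0)
Reading the file back gives the same data that we saved earlier. Passing index_col=0 tells Pandas to use the first column of the file, which holds the saved row index, as the index again.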
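The save-and-read round trip can be checked with a short sketch like the one below. It uses a made-up three-row frame and writes to a temporary folder (an assumption made here so the example does not leave files behind); the article's own code writes to the current working directory instead.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data for illustration only.
sample_df = pd.DataFrame({
    "name": ["Asha", "Ben", "Chen"],
    "height": [150, 162, 171],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "data.csv")

    # to_csv writes the row index as the first, unnamed column by default.
    sample_df.to_csv(csv_path)

    # index_col=0 turns that first column back into the index.
    restored_df = pd.read_csv(csv_path, index_col=0)

    # Without index_col, the saved index comes back as an extra column.
    raw_df = pd.read_csv(csv_path)

print(restored_df)
print(list(raw_df.columns))
```

If the extra index column is not wanted in the file at all, `to_csv` also accepts `index=False`.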
3. Viewing your Data
Since datasets in Machine Learning are often very large, printing all of the data is rarely practical. Instead, we can get a glimpse of the data using Pandas commands.
Python Code:
print("Showing top 5 rows of df \n", data_df.head())  # shows the top 5 rows of the dataframe
print("Showing bottom 5 rows of df \n", data_df.tail())  # shows the bottom 5 rows of the dataframe
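Both `head()` and `tail()` return 5 rows by default, and both accept a number if we want more or fewer. A minimal sketch, using a made-up ten-row frame called `numbers_df`:

```python
import pandas as pd

# Hypothetical ten-row frame holding the values 1 to 10.
numbers_df = pd.DataFrame({"value": range(1, 11)})

first_three = numbers_df.head(3)  # first 3 rows
last_two = numbers_df.tail(2)     # last 2 rows

print(first_three)
print(last_two)
print(len(numbers_df.head()))     # number of rows returned by default
```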
4. Extract Information
We can get information about the data using two commands, each of which gives a different kind of summary.
- The info() method of Pandas reports the number of rows and columns, the data type of each feature, and the count of non-null values in each column. A column with no missing data has a non-null count equal to the number of rows.
Python Code:
data_df.info()
- The Pandas describe() method gives statistical information about the numeric features: count, mean, standard deviation, minimum and maximum values, and the 25th, 50th, and 75th percentiles.
Python Code:
data_df.describe()
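To see how the two commands differ, here is a small sketch on a made-up frame with one missing value. The column names and values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data; None in 'height' becomes a missing value (NaN).
sample_df = pd.DataFrame({
    "name": ["Asha", "Ben", "Chen", "Dia"],
    "height": [150.0, 162.0, None, 158.0],
})

# info() prints its report directly and returns None.
sample_df.info()

# describe() returns a new DataFrame; by default it covers numeric columns only,
# and missing values are left out of the count and the statistics.
summary = sample_df.describe()
print(summary)

# Number of missing values per column, as a cross-check on info().
print(sample_df.isna().sum())
```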
5. Data Frame Indexing
Indexing a data frame works much like indexing a matrix: we give a row selector and a column selector. Pandas provides two indexers for this, both used with square brackets: iloc selects by integer position (starting at 0), while loc selects by row and column labels.
Python Code:
print("Accessing first row and first column \n", data_df.iloc[0, 0])  # 1st row, 1st column by position using `iloc[]`
print("Accessing all rows with 'height' column \n", data_df.loc[:, 'height'])  # all rows, 'height' column
print("Accessing the row labelled 2 with all columns \n", data_df.loc[[2], :])  # row with index label 2 (the 3rd row under the default 0-based index), all columns
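The difference between position and label is easiest to see when the row labels are not 0, 1, 2. The sketch below uses a made-up frame whose index labels (10, 20, 30) are chosen only for illustration.

```python
import pandas as pd

# Hypothetical frame with custom row labels instead of the default 0, 1, 2.
sample_df = pd.DataFrame(
    {"age": [23, 31, 27], "height": [150, 162, 171]},
    index=[10, 20, 30],
)

first_cell = sample_df.iloc[0, 0]        # by position: first row, first column
by_label = sample_df.loc[20, "height"]   # by label: row 20, column 'height'
height_col = sample_df.loc[:, "height"]  # all rows of the 'height' column
second_row = sample_df.iloc[[1], :]      # a list selector keeps the result a DataFrame

print(first_cell)
print(by_label)
print(height_col)
print(second_row)
```

Here `sample_df.loc[1, "height"]` would raise a KeyError because no row is labelled 1, while `sample_df.iloc[1, 1]` works because it counts positions.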
Final Thoughts
These are the basic Pandas operations used in Data Science. As we learn further, we will explore more of the Pandas operations used in Data Science and Machine Learning. Keep practicing these operations in Python; hands-on practice is the key to better understanding.
Stay Tuned!!
Learn the basics of NumPy and its numerical operations from our last article, ‘Data Science with Python: Numpy’.
Keep learning and keep implementing!!