Pandas DataFrames

This article explains what Pandas DataFrames are, how to create them, and how to perform some key operations on them.

Pandas DataFrame – what is it?

A Pandas DataFrame is a 2-dimensional, labeled data structure, similar to a 2-dimensional array or to a table with rows and columns, as in a spreadsheet.

There are various ways to create a Pandas DataFrame. The most common method is to pass a dictionary of equal-length lists or NumPy arrays as the data parameter to the DataFrame constructor.
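The constructor also accepts NumPy arrays in place of lists. The following is a minimal sketch, assuming NumPy is installed; the product_detail name and its values are made up for illustration and do not come from the examples in this article.

```python
import numpy as np
import pandas as pds

# Hypothetical data: two equal-length NumPy arrays, one per column.
product_detail = {
    "ID": np.array([1, 2, 3]),
    "PRICE": np.array([9.5, 12.0, 7.25]),
}

mrx_df = pds.DataFrame(product_detail)

print(mrx_df)
print(mrx_df.shape)  # (number of rows, number of columns)
```

Each dictionary key becomes a column name, and every array must have the same length.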

Use Pandas to create an employee_detail DataFrame as follows:

Example: 

import pandas as pds

employee_detail = {
    "ID": [1, 2, 3, 4, 5],
    "NAME": ["Harry", "Mike", "Steve", "Jonathan", "Dustin"]
}

# Load the above employee_detail data into a DataFrame object:
mrx_df = pds.DataFrame(employee_detail)

print(mrx_df)

Make a Pandas student_detail DataFrame by calling the DataFrame() function as follows:

Example: 

import pandas as pds

student_detail = {
    "NAME": ["Joe", "Kane", "Lia", "Kate", "Tim"],
    "AGE": [19, 20, 19, 21, 20]
}

# Load the above student_detail data into a DataFrame object:
mrx_df = pds.DataFrame(student_detail)

print(mrx_df)


Pandas DataFrames – Rows

As the output above shows, a Pandas DataFrame resembles a table with rows and columns.

Pandas retrieves one or more rows through the loc attribute, which selects rows by their index labels. With the default index, the labels are the integers 0, 1, 2, and so on.

Display the first row (index 0) of the employee_detail DataFrame:

Example: 

import pandas as pds

employee_detail = {
    "ID": [1, 2, 3, 4, 5],
    "NAME": ["Harry", "Mike", "Steve", "Jonathan", "Dustin"]
}

mrx_df = pds.DataFrame(employee_detail)

print(mrx_df.loc[0])

Show the details of the student at index 4:

Example: 

import pandas as pds

student_detail = {
    "NAME": ["Joe", "Kane", "Lia", "Kate", "Tim"],
    "AGE": [19, 20, 19, 21, 20]
}

mrx_df = pds.DataFrame(student_detail)

print(mrx_df.loc[4])

Reminder: Selecting a single row this way returns a Pandas Series.

Retrieve the data of the first three employees from the employee_detail data set:

Example: 

import pandas as pds

employee_detail = {
    "ID": [1, 2, 3, 4, 5],
    "NAME": ["Harry", "Mike", "Steve", "Jonathan", "Dustin"]
}

mrx_df = pds.DataFrame(employee_detail)

print(mrx_df.loc[[0, 1, 2]])

From the student_detail data set, display the rows at index 3 and 4:

Example: 

import pandas as pds

student_detail = {
    "NAME": ["Joe", "Kane", "Lia", "Kate", "Tim"],
    "AGE": [19, 20, 19, 21, 20]
}

mrx_df = pds.DataFrame(student_detail)

print(mrx_df.loc[[3, 4]])

Reminder: Passing a list of labels to loc (the inner [] in the examples above) returns a Pandas DataFrame.
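
The difference between passing a single label and passing a list of labels can be checked with type(). The following is a small sketch that reuses the student_detail data; the single_row and row_subset names are chosen here for illustration.

```python
import pandas as pds

student_detail = {
    "NAME": ["Joe", "Kane", "Lia", "Kate", "Tim"],
    "AGE": [19, 20, 19, 21, 20]
}

mrx_df = pds.DataFrame(student_detail)

single_row = mrx_df.loc[4]       # one label -> Series
row_subset = mrx_df.loc[[3, 4]]  # list of labels -> DataFrame

print(type(single_row))
print(type(row_subset))

# A Series built from a row is indexed by the column names.
print(single_row["NAME"])
```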

Named Indexes

With the index argument, you can replace the default integer labels with your own row labels.

Give each row a name by providing a list of names:

Example: 

import pandas as pds

student_detail = {
    "NAME": ["Joe", "Kane", "Lia", "Kate", "Tim"],
    "AGE": [19, 20, 19, 21, 20]
}

mrx_df = pds.DataFrame(student_detail, index=["101", "102", "103", "104", "105"])

print(mrx_df)

Customize the indexes of the course_detail data set:

Example: 

import pandas as pds

course_detail = {
    "COURSE NAME": ["Data Structures", "Object Oriented Programming", "Database Management", "Artificial Intelligence"],
    "CREDIT HOURS": [3, 2, 3, 2]
}

mrx_df = pds.DataFrame(course_detail, index=["9991", "9992", "9993", "9994"])

print(mrx_df)

Pass a named index to the loc attribute to retrieve the matching row(s).

Show the information of the student with index “102”:

Example: 

import pandas as pds

student_detail = {
    "NAME": ["Joe", "Kane", "Lia", "Kate", "Tim"],
    "AGE": [19, 20, 19, 21, 20]
}

mrx_df = pds.DataFrame(student_detail, index=["101", "102", "103", "104", "105"])

# Access the customized index:
print(mrx_df.loc["102"])

From the course_detail data set, retrieve the data of index “9991” and “9994”:

Example: 

import pandas as pds

course_detail = {
    "COURSE NAME": ["Data Structures", "Object Oriented Programming", "Database Management", "Artificial Intelligence"],
    "CREDIT HOURS": [3, 2, 3, 2]
}

mrx_df = pds.DataFrame(course_detail, index=["9991", "9992", "9993", "9994"])

# Access the customized indexes:
print(mrx_df.loc[["9991", "9994"]])
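
Once custom labels are set, loc looks rows up by those labels only, so the default integer labels are no longer available. The following sketch, which reuses the student_detail data, shows one possible way to check whether a label exists before accessing it:

```python
import pandas as pds

student_detail = {
    "NAME": ["Joe", "Kane", "Lia", "Kate", "Tim"],
    "AGE": [19, 20, 19, 21, 20]
}

mrx_df = pds.DataFrame(student_detail, index=["101", "102", "103", "104", "105"])

# The index attribute holds the row labels.
print(mrx_df.index)

# Membership tests work on the labels.
print("102" in mrx_df.index)
print(0 in mrx_df.index)

# Asking loc for a label that does not exist raises a KeyError.
try:
    mrx_df.loc[0]
except KeyError:
    print("Label 0 is not in the index")
```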

Loading File Data Into a DataFrame

Pandas can load data sets that are contained in files into DataFrames.

To use a CSV (comma-separated values) file as the input for a DataFrame, load it with read_csv() as follows:

Example: 

import pandas as pds

mrx_df = pds.read_csv('language_data.csv')

print(mrx_df)
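
The contents of language_data.csv are not shown in this article, so the example above only runs if that file is present. As a rough stand-in, read_csv() can also read from an in-memory text buffer. In the sketch below, the CSV text, including the LANGUAGE column and all of its values, is hypothetical:

```python
import io

import pandas as pds

# Hypothetical stand-in for the contents of 'language_data.csv'.
csv_text = """RANKING,LANGUAGE
1,Python
2,Java
3,C++
"""

# io.StringIO wraps the string so read_csv() can treat it like a file.
mrx_df = pds.read_csv(io.StringIO(csv_text))

print(mrx_df)
print(mrx_df.columns.tolist())
```

The first line of the CSV text supplies the column names, and each following line becomes one row.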

Set the “RANKING” column as the index:

Example: 

import pandas as pds

mrx_df = pds.read_csv('language_data.csv').set_index("RANKING")

print(mrx_df)

Example Explanation

The above example reads a CSV file named ‘language_data.csv’ with the read_csv() function from the pandas library, which returns a DataFrame. The set_index() method is then called on that DataFrame to make the ‘RANKING’ column the index, and the result is stored in mrx_df.

The resulting DataFrame mrx_df is then printed to the console using the print() function. This DataFrame will contain all the data from the CSV file, but with the ‘RANKING’ column used as the index.

With the ‘RANKING’ column as the index, rows can be looked up directly by ranking through the loc attribute, which simplifies later analysis of the programming-language data.
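
A lookup by ranking might then look like the following sketch. Because language_data.csv is not included here, the CSV text is a hypothetical stand-in, and the LANGUAGE column and its values are assumptions:

```python
import io

import pandas as pds

# Hypothetical stand-in for the contents of 'language_data.csv'.
csv_text = """RANKING,LANGUAGE
1,Python
2,Java
3,C++
"""

mrx_df = pds.read_csv(io.StringIO(csv_text)).set_index("RANKING")

# The RANKING values are now the row labels used by loc.
print(mrx_df.loc[2])

# A list of rankings returns a DataFrame with the matching rows.
print(mrx_df.loc[[1, 3]])
```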
