This article explains what Pandas DataFrames are, how to create and manipulate them, and which key operations they support.
Pandas DataFrame – what is it?
A Pandas DataFrame is a two-dimensional, labeled data structure, comparable to a 2D array or to a spreadsheet-style table with rows and columns.
There are various ways to create a Pandas DataFrame. The most common method is to pass a dictionary of equal-length lists or NumPy arrays as the data parameter to the DataFrame constructor.
Create an employee_detail DataFrame with Pandas as follows:
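The snippet below is a minimal sketch of the dictionary-of-lists approach described above. The column names and values are hypothetical sample data, not taken from a real data set.

```python
import pandas as pd

# Hypothetical sample data: column names and values are illustrative only.
data = {
    "name": ["Asha", "Ben", "Carla", "Deepak"],
    "department": ["Sales", "IT", "HR", "IT"],
    "salary": [52000, 61000, 48000, 67000],
}

# Each dictionary key becomes a column; each list supplies that column's values.
employee_detail = pd.DataFrame(data)

print(employee_detail)
```

Because no index is given, Pandas labels the rows 0, 1, 2, and 3 by default.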
Create a student_detail DataFrame by calling the DataFrame() constructor as follows:
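A possible version of this call is sketched here. It assumes hypothetical columns (name, age, grade) and mixes plain lists with a NumPy array to show that both are accepted as column data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: five students with made-up names, ages, and grades.
student_detail = pd.DataFrame(
    {
        "name": ["Ali", "Bea", "Chen", "Dina", "Eli"],
        "age": np.array([20, 21, 19, 22, 20]),  # a NumPy array works like a list
        "grade": ["A", "B", "A", "C", "B"],
    }
)

print(student_detail)
```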
Pandas DataFrames – Rows
As the output above shows, a Pandas DataFrame resembles a table with rows and columns.
Pandas lets you retrieve one or more rows through the loc attribute, which selects rows by index label. With the default integer index, the labels are simply 0, 1, 2, and so on.
Display the first row (0th index) of the employee_detail DataFrame:
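As a minimal sketch, assuming a small employee_detail DataFrame built from hypothetical sample data, the first row can be selected like this:

```python
import pandas as pd

# Hypothetical sample data, recreated here so the snippet runs on its own.
employee_detail = pd.DataFrame(
    {
        "name": ["Asha", "Ben", "Carla", "Deepak"],
        "department": ["Sales", "IT", "HR", "IT"],
        "salary": [52000, 61000, 48000, 67000],
    }
)

# loc with a single label returns that row as a Series,
# whose own index holds the column names.
first_row = employee_detail.loc[0]

print(first_row)
```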
Show the details of the student at index 4:
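The lookup might look like the following sketch. The student data is hypothetical; note that index 4 is the fifth row, because the default index starts at 0.

```python
import pandas as pd

# Hypothetical sample data, recreated here so the snippet runs on its own.
student_detail = pd.DataFrame(
    {
        "name": ["Ali", "Bea", "Chen", "Dina", "Eli"],
        "age": [20, 21, 19, 22, 20],
        "grade": ["A", "B", "A", "C", "B"],
    }
)

# Label 4 refers to the fifth row under the default 0-based index.
student = student_detail.loc[4]

print(student)
```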
Retrieve the data of the first three employees from the employee_detail data set:
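One way to do this, sketched with hypothetical employee data, is to pass a list of labels to loc. A list of labels returns a DataFrame rather than a Series.

```python
import pandas as pd

# Hypothetical sample data, recreated here so the snippet runs on its own.
employee_detail = pd.DataFrame(
    {
        "name": ["Asha", "Ben", "Carla", "Deepak"],
        "department": ["Sales", "IT", "HR", "IT"],
        "salary": [52000, 61000, 48000, 67000],
    }
)

# A list of labels selects several rows at once and returns a DataFrame.
first_three = employee_detail.loc[[0, 1, 2]]

print(first_three)
```

A label slice such as employee_detail.loc[0:2] selects the same rows; unlike ordinary Python slices, loc slices include both endpoints.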
Display the rows at indexes 3 and 4 of the student_detail data set:
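A short sketch of this selection, again using hypothetical student data:

```python
import pandas as pd

# Hypothetical sample data, recreated here so the snippet runs on its own.
student_detail = pd.DataFrame(
    {
        "name": ["Ali", "Bea", "Chen", "Dina", "Eli"],
        "age": [20, 21, 19, 22, 20],
        "grade": ["A", "B", "A", "C", "B"],
    }
)

# The double brackets pass a list of labels to loc.
selected = student_detail.loc[[3, 4]]

print(selected)
```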
You can assign custom labels to the rows with the index argument.
Give each row its own name by passing a list of names:
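The sketch below assumes the names are applied to a student_detail DataFrame and uses string labels "101" to "105" as hypothetical student IDs. The list passed to index must contain one label per row.

```python
import pandas as pd

# Hypothetical sample data and labels; the IDs "101" to "105" are made up.
student_detail = pd.DataFrame(
    {
        "name": ["Ali", "Bea", "Chen", "Dina", "Eli"],
        "grade": ["A", "B", "A", "C", "B"],
    },
    index=["101", "102", "103", "104", "105"],  # one label per row
)

print(student_detail)
```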
Customize the indexes of the course_detail data set:
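A comparable sketch for course_detail follows. The course names, durations, and the string labels "9991" to "9994" are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; course names, durations, and labels are made up.
course_detail = pd.DataFrame(
    {
        "course": ["Python Basics", "Data Analysis", "Web Design", "Databases"],
        "duration_weeks": [6, 8, 5, 7],
    },
    index=["9991", "9992", "9993", "9994"],
)

print(course_detail)
```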
Once custom labels are in place, pass a label to the loc attribute to retrieve the matching row(s).
Show the information of the student with index “102”:
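Assuming the index labels are strings, as the quotation marks suggest, the lookup could be sketched as follows. The student data is hypothetical.

```python
import pandas as pd

# Hypothetical sample data with string labels as the index.
student_detail = pd.DataFrame(
    {
        "name": ["Ali", "Bea", "Chen", "Dina", "Eli"],
        "grade": ["A", "B", "A", "C", "B"],
    },
    index=["101", "102", "103", "104", "105"],
)

# The label must match the index type: "102" is a string here, not the integer 102.
student = student_detail.loc["102"]

print(student)
```

Passing the integer 102 instead would raise a KeyError, because no row carries that label.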
From the course_detail data set, retrieve the rows with indexes “9991” and “9994”:
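A minimal sketch of selecting several custom labels at once, using hypothetical course data:

```python
import pandas as pd

# Hypothetical sample data with string labels as the index.
course_detail = pd.DataFrame(
    {
        "course": ["Python Basics", "Data Analysis", "Web Design", "Databases"],
        "duration_weeks": [6, 8, 5, 7],
    },
    index=["9991", "9992", "9993", "9994"],
)

# A list of labels returns the matching rows in the order requested.
selected = course_detail.loc[["9991", "9994"]]

print(selected)
```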
Loading File Data into a DataFrame
Pandas can load data sets that are contained in files into DataFrames.
To use a CSV (comma-separated values) file as the input for a DataFrame, load it with read_csv() as follows:
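The sketch below first writes a small sample file so that it can run on its own. The file contents, including the LANGUAGE and FIRST_RELEASED columns and the rankings, are hypothetical; with a real language_data.csv you would skip the writing step and call read_csv() directly.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical file contents; only the RANKING column name comes from the article.
csv_text = """RANKING,LANGUAGE,FIRST_RELEASED
1,Python,1991
2,C,1972
3,Java,1995
"""

with tempfile.TemporaryDirectory() as tmp_dir:
    # Write the sample file so the example is self-contained.
    csv_path = Path(tmp_dir) / "language_data.csv"
    csv_path.write_text(csv_text)

    # read_csv parses the header row into column names and the rest into rows.
    mrx_df = pd.read_csv(csv_path)

print(mrx_df)
```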
Set the “RANKING” column as the index:
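A sketch of this step is given here. It creates a small sample language_data.csv in a temporary directory so that it runs on its own; the file contents are hypothetical apart from the RANKING column name.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical file contents; only the RANKING column name comes from the article.
csv_text = """RANKING,LANGUAGE,FIRST_RELEASED
1,Python,1991
2,C,1972
3,Java,1995
"""

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / "language_data.csv"
    csv_path.write_text(csv_text)
    mrx_df = pd.read_csv(csv_path)

# set_index returns a new DataFrame, so the result is assigned back to mrx_df.
mrx_df = mrx_df.set_index("RANKING")

print(mrx_df)
```

After this step, a row can be looked up by its rank, for example mrx_df.loc[1].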
The above example code reads a CSV file named ‘language_data.csv’ using the read_csv() function from the pandas library and creates a pandas DataFrame called mrx_df. The set_index() method is then used to set the ‘RANKING’ column as the index of the DataFrame.
The resulting DataFrame mrx_df is then printed to the console using the print() function. This DataFrame will contain all the data from the CSV file, but with the ‘RANKING’ column used as the index.
With the ‘RANKING’ column as the index, rows can be looked up directly by the ranking of each programming language through loc. This makes it easier to perform analyses and visualizations on the data.