Aim: To use the Pandas library in Python to create and manipulate its core data structures, Series and DataFrames, and to explore the essential data inspection methods (describe, head, tail, info) for preliminary data analysis.
#week1a
import pandas as pd
series = pd.Series([65, 97, 83, 74, 58], index=['john', 'kim', 'bush', 'putin', 'biden'])
print(series)
#week1b
import pandas as pd
# named 'marks' rather than 'dict' so the built-in dict type is not shadowed
marks = {'john': 75, 'bush': 62, 'modi': 78, 'putin': 84, 'biden': 58}
series = pd.Series(marks)
print(series)
#week1c
import pandas as pd
cai = {'roll': [1, 2, 3], 'name': ['ab', 'cd', 'ef']}
info = pd.DataFrame(cai)
print(info)
#week1d
import pandas as pd
# 'age' contains one missing value (None) to show how pandas handles it
data = {
    'name': ['a', 'b', 'c', 'b', 'z', 'y', 'm', 't', 'w', '10'],
    'place': ['kkd', 'kkd', 'rjy', 'kkd', 'rjy', 'kkd', 'kkd', 'ptp', 'rjy', 'kkd'],
    'age': [None, 18, 14, 24, 12, 24, 22, 18, 24, 13],
}
xyz = pd.DataFrame(data)
print('The DataFrame is: \n', xyz)
print('describe function \n', xyz.describe())
print('head function \n', xyz.head())
print('tail function \n', xyz.tail())
# info() prints its report directly and returns None, so it is called on its own
print('info function')
xyz.info()
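-----> Because the 'age' column contains a missing value, pandas stores the whole column as float64, so all the integers are converted to floats.
-----> None becomes NaN (Not a Number), which info() reports as one fewer non-null value in that column.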
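The conversion noted above can be checked directly. The following is a minimal sketch using made-up values (not the lab data); the variable names are illustrative, and the last step assumes a pandas version that provides the nullable 'Int64' dtype.

```python
import pandas as pd

# A column of whole numbers with no missing values keeps an integer dtype.
complete = pd.Series([18, 14, 24])

# The same values plus one None: the missing entry becomes NaN, and the
# column is stored as float64 because NaN is a floating-point value.
with_missing = pd.Series([None, 18, 14, 24])

print(complete.dtype)
print(with_missing.dtype)
print(with_missing.isna().sum())   # number of missing entries

# Optional: the nullable 'Int64' dtype keeps whole numbers and marks the
# gap as <NA> instead of NaN.
as_nullable_int = with_missing.astype('Int64')
print(as_nullable_int)
```

If whole-number ages matter for later steps, the nullable dtype is one option; filling or dropping the missing row first is another.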
Questions:
What is the Pandas library, and why is it preferred over basic Python data structures for data analysis?
What is a Pandas Series? How is it different from a Python list or NumPy array?
What is a Pandas DataFrame, and how does it conceptually differ from a Series?
How do you create a Series and a DataFrame in Pandas? Mention at least two data sources for each.
What is the role of an index in a Series and in a DataFrame?
What does the head() method do? When would you typically use it during data analysis?
What is the difference between head() and tail()? Give a practical use case for tail().
What information does the info() method provide, and why is it important before performing data analysis?
What does the describe() method return? Which types of columns are included by default?
How do head(), tail(), info(), and describe() together help in preliminary data inspection?
Answers:
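Pandas is a Python library used for data analysis and manipulation. It provides fast, flexible data structures, Series and DataFrames, which support labeled access, vectorized operations, and built-in handling of missing values. This makes them more convenient, and usually faster, than basic Python lists and dictionaries for tabular data.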
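A Pandas Series is a one-dimensional labeled array capable of holding any data type. Unlike a list, it has an index and supports vectorized operations. Unlike a plain NumPy array, which is accessed only by integer position, its elements can be accessed by label, and operations between two Series align values by label.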
A Pandas DataFrame is a two-dimensional labeled data structure with rows and columns. A Series represents a single column, while a DataFrame represents a full table of data.
A Series can be created from a list, array, or dictionary. A DataFrame can be created from dictionaries, lists of lists, CSV files, or Excel files.
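A few of the data sources mentioned above can be sketched as follows. The values are made up for illustration, and the CSV content is read from an in-memory string via io.StringIO only so that the example needs no file on disk; a real file path would be passed to read_csv in the same way.

```python
import io
import numpy as np
import pandas as pd

# Series from a NumPy array (gets the default integer index 0, 1, 2).
s_from_array = pd.Series(np.array([65, 97, 83]))

# DataFrame from a list of lists; column names are supplied separately.
df_from_rows = pd.DataFrame([[1, 'ab'], [2, 'cd'], [3, 'ef']],
                            columns=['roll', 'name'])

# DataFrame from CSV text held in memory.
csv_text = "roll,name\n1,ab\n2,cd\n3,ef\n"
df_from_csv = pd.read_csv(io.StringIO(csv_text))

print(s_from_array)
print(df_from_rows)
print(df_from_csv)
```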
The index labels each element in a Series and each row in a DataFrame. Index labels are usually unique, although pandas does not require this. The index enables fast label-based access and automatic alignment of data in operations between objects.
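Label-based access and alignment can be illustrated with a short sketch. The two Series and their values below are hypothetical and chosen only to show that arithmetic matches entries by label rather than by position.

```python
import pandas as pd

marks_a = pd.Series([65, 97, 83], index=['john', 'kim', 'bush'])
marks_b = pd.Series([10, 20, 30], index=['bush', 'john', 'kim'])

# Label-based access with .loc.
print(marks_a.loc['kim'])

# Addition aligns on labels: john's values are added together even though
# they sit at different positions in the two Series.
total = marks_a + marks_b
print(total)

# An index may contain repeated labels; is_unique reports whether it does.
dup = pd.Series([1, 2], index=['x', 'x'])
print(dup.index.is_unique)
```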
The head() method displays the first five rows of a DataFrame by default; head(n) displays the first n rows. It is typically used right after loading a dataset to quickly inspect its structure and values.
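head() returns rows from the start of the data, while tail() returns the last five rows by default, or the last n rows with tail(n). tail() is useful for checking recently appended records or confirming that a file was read through to its end.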
The info() method prints the index range (number of rows), column names, data types, non-null counts, and memory usage. It helps identify missing values and data type issues before any analysis is performed.
The describe() method returns statistical summaries: count, mean, standard deviation, minimum, the 25%, 50%, and 75% percentiles, and maximum. By default, it includes only numerical columns when the DataFrame contains any.
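The default behaviour can be seen on a small mixed DataFrame. This is a minimal sketch with made-up values; include='all' is shown as one way to bring non-numeric columns into the summary.

```python
import pandas as pd

df = pd.DataFrame({
    'place': ['kkd', 'kkd', 'rjy', 'ptp'],
    'age': [18, 14, 24, 12],
})

# Default: only the numeric column 'age' is summarised.
numeric_summary = df.describe()
print(numeric_summary)

# include='all' also summarises 'place' (count, unique, top, freq);
# statistics that do not apply to a column are shown as NaN.
full_summary = df.describe(include='all')
print(full_summary)
```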
head() and tail() show sample records, info() shows structure and data quality, and describe() provides statistical insights. Together, they enable quick and effective preliminary data analysis.