Python Tutorial #2 – Read Data in Python Using Panda
Hello Again, In this article i will show you how to read data in python and manipulate it, but first of all, if you havent yet install the python tool, Click here to read how to get ready.
How to write code in python language?
We will use spyder to write python program, spyder is a very nice IDE for python development. You have easy shortcut keys to run codes. you can either run entire code or run the selected lines of code. it has a good interface.
Declaring variables in python
You dont need to write separate line for delcaring variable in python. you just take any name and assign values or objects to it. python itself will take care of declaring varibale in memory. you need to remember that python is a case sensitive language. if you take variable in small letter, and later use it in any other case (proper or uppar) it won’t return anything. becuase that will be taken as another variable.
What Library/Package should you use?
Panda is package which makes reading and writing data python very easy. lets see what you can you do with panda
How to use Panda Library?
To use any library in python, we need to import it first. once you import any library, it will actually import all the functionality written in that library and we can use the library name or alias name to access them. see example here
import panda as pd
Now, select this line and hit Ctrl+Enter
Read data from excel
Once panda is imported, we will use it alias name pd to use it methods. see the example below.
varDataFrame = pd.read_excel(“ClassArray.xlsm”)
In above code, we are using pd.read_excel() where we need to supply the excel file path, but you notice that i am only using the file name, because i have set the folder in IDE to read all the files from there, to do so, see the screenshot below.
Once you set the folder here, you can use any file name in that folder to read data. but if you dont want to use this method you need to give full path of file. see the example below.
varDataFrame = pd.read_excel(r”C:\Users\User\Desktop\ClassArray.xlsm”)
Python by default use the backslash in file paths, thats why we need to r following by path here. Now select this line and hit Ctrl+Enter to execute this line.
Notice that we didn’t specify which sheet and which range panda should read. so by default , it will read the used range on the first tab and will create a dataframe object in python.
If you want to read data from specific tab you can specify that. you can even specify if you want to skip rows from the top
varDataFrame = pd.read_excel(“ClassArray.xlsm”,sheets_name=”sheet1″,skiprows=5)
How to concat two dataframes?
You can use panda method to concat two dataframes objects into one. first of all, you need to read two data sets into two variables
df_1 = pd.read_excel(“Book1.xlsx”)
df_2 = pd.read_excel(“Book2.xlsx”)
there are two ways you can concatenate these dataframs.
1. append second dataframe at bottom of first dataframe
df_append_bottom = pd.concat([df_1,df_2],axis=0)
2. you can concat them side by side.
df_append_right = pd.concat([df_1,df_2],axis=1)
axis is what makes the differnece, by default this axis is 0, so you can skip it if you want to concat by first method. select all these four lines and hit ctrl+enter.
When you hit ctrl enter, you will the variables listed into variable explorer window in sypder like this
You can double click on any dataframe variable to see what data they contains.
How to merge two dataframes?
Dont get confused with concat and merge, concat is just about to combine both data set and merege is about to match and then join them together, merge works like sql joins, you can apply left, right, outer joins here. see the example below
df_merge_left = pd.merge(df_1,df_2,on=’Header 1′,how=’left’)
df_merge_left = pd.merge(df_1,df_2,on=’Header 1′,how=’right’)
df_merge_outer = pd.merge(df_1,df_2,on=’Header 1′,how=’outer’)
if matching column name is different in both dataset then we need to use left_on and right_on to merge both dataframes.
df_merge_left = pd.merge(df_1,df_2,left_on=’Country’,right_on=’CountryName’,how=’left’)
Panda is a amazing library to work with data, in my next post i will share cleaning and modifying data, i will tell you how to export dataframe in different file formats as output. follow the blogs