Python Tutorial #2 – Read Data in Python Using Panda

Hello Again, In this article i will show you how to read data in python and manipulate it, but first of all, if you havent yet install the python tool, Click here to read how to get ready.

How to write code in python language?

We will use spyder to write python program, spyder is a very nice IDE for python development. You have easy shortcut keys to run codes. you can either run entire code or run the selected lines of code. it has a good interface.

Declaring variables in python

You dont need to write separate line for delcaring variable in python.  you just take any name and assign values or objects to it. python itself will take care of declaring varibale in memory. you need to remember that python is a case sensitive language. if you take variable in small letter, and later use it in any other case (proper or uppar) it won’t return anything. becuase that will be taken as another variable.

What Library/Package should you use?

Panda is package which makes reading and writing data python very easy. lets see what you can you do with panda

How to use Panda Library?

To use any library in python, we need to import it first. once you import any library, it will actually import all the functionality written in that library and we can use the library name or alias name to access them. see example here

import panda as pd

Now, select this line and hit Ctrl+Enter

Read data from excel

Once panda is imported, we will use it alias name pd to use it methods.  see the example below.

varDataFrame = pd.read_excel(“ClassArray.xlsm”)

In above code, we are using pd.read_excel() where we need to supply the excel file path, but you notice that i am only using the file name, because i have set the folder in IDE to read all the files from there, to do so, see the screenshot below.

Python Tutorial

Once you set the folder here, you can use any file name in that folder to read data. but if you dont want to use this method you need to give full path of file. see the example below.

varDataFrame = pd.read_excel(r”C:\Users\User\Desktop\ClassArray.xlsm”)

Python by default use the backslash in file paths, thats why we need to r following by path here. Now select this line and hit Ctrl+Enter to execute this line.

Notice that we didn’t specify which sheet and which range panda should read. so by default , it will read the used range on the first tab and will create a dataframe object in python.

If you want to read data from specific tab you can specify that. you can even specify if you want to skip rows from the top

varDataFrame = pd.read_excel(“ClassArray.xlsm”,sheets_name=”sheet1″,skiprows=5)

How to concat two dataframes?

You can use panda method to concat two dataframes objects into one. first of all, you need to read two data sets into two variables

df_1 = pd.read_excel(“Book1.xlsx”)
df_2 = pd.read_excel(“Book2.xlsx”)

there are two ways you can concatenate these dataframs.

1.  append second dataframe at bottom of first dataframe

df_append_bottom = pd.concat([df_1,df_2],axis=0)

2. you can concat them side by side.

df_append_right = pd.concat([df_1,df_2],axis=1)

axis is what makes the differnece, by default this axis is 0, so you can skip it if you want to concat by first method. select all these four lines and hit ctrl+enter.

When you hit ctrl enter, you will the variables listed into variable explorer window in sypder like this

Python Tutorial

You can double click on any dataframe variable to see what data they contains.

How to merge two dataframes?

Dont get confused with concat and merge, concat is just about to combine both data set and merege is about to match and then join them together, merge works like sql joins, you can apply left, right, outer joins here. see the example below

df_merge_left = pd.merge(df_1,df_2,on=’Header 1′,how=’left’)
df_merge_left = pd.merge(df_1,df_2,on=’Header 1′,how=’right’)

df_merge_outer = pd.merge(df_1,df_2,on=’Header 1′,how=’outer’)

if matching column name is different in both dataset then we need to use left_on and right_on to merge both dataframes.

df_merge_left = pd.merge(df_1,df_2,left_on=’Country’,right_on=’CountryName’,how=’left’)

Panda is a amazing library to work with data, in my next post i will share cleaning and modifying data, i will tell you how to export dataframe in different file formats as output. follow the blogs

Subscrible for newsletters

2 replies

Leave a Reply

Want to join the discussion?
Feel free to contribute!

Leave a Reply

Your email address will not be published. Required fields are marked *