How to read CSV files in Python

Python provides built-in csv module using which you can read csv files. In case, you don't know what is CSV? Then let me explain it to you.

CSV(Comma-Separated Values) is a simple way of storing text data in a file. It is generally used for storing tabular data. Here, each line contains data. Each data consists of one or more than one fields that are separated by commas. Below is an example of csv formatted data-

Date,Open,High,Low,Close,Volume
5/18/2018,1581.33,1583.5903,1572.1,1574.37,2612564
5/17/2018,1580.56,1594.04,1573,1581.76,2131238
5/16/2018,1577.5,1594.43,1576.67,1587.28,2535458
5/15/2018,1587.8047,1587.8047,1565.2201,1576.12,5066420
5/14/2018,1604,1611.18,1600.05,1601.54,2460302
5/11/2018,1610.99,1611.1,1597.891,1602.91,2258583
5/10/2018,1608.475,1615.6,1603.44,1609.08,2785394

As you can see above, csv files are typical for humans to read but reader() method of csv module will make it easier for us to read it and help in the data analysis process.

Example of reading CSV file in Python

#importing csv module
import csv
#specify csv file name
filename = 'amzn.csv'
#creating header and rows list
header = []
rows = []
#reading csv file
with open(filename, 'r') as mycsvfile:
	#created a stock reader object
	amznreader = csv.reader(mycsvfile)

	#extracting the field names from the first row
	header = amznreader.__next__()

	#extracting data from rows one by one by iterating through amznreader
	for row in amznreader:
		rows.append(row)
	
	print('Total number of rows: %d'%amznreader.line_num)

print('Fields present in header are: '+', '.join(field for field in header))
print('Data present in the first 10 rows are:')
for row in rows[:10]:
	for col in row:
		print("%10s"%col,end='')
	print('\n')

Output

How to read CSV file in Python

Run the above program by keeping amzn.csv in the same directory otherwise you will get an error. If you want to download amzn.csv then click here.

Explanation of the above code:

with open(filename, 'r') as mycsvfile:
	amznreader = csv.reader(mycsvfile)

Here, we have opened amzn.csv file in read mode. The file object is represented as mycsvfile. Later, mycsvfile object is transformed to csv.reader object and saved it as amznreader. Here I am using amznreader because I am processing amzn.csv, if you are analyzing weather data, then you can name it weatherreader.

header = amznreader.__next__()

amznreader is an iterable object and when we call __next__() method on it, it returns the current row and moves the iterator to the next row. Here, we have saved the fields name in a header list.

for row in amznreader:
	rows.append(row)

In this code snippet, we are iterating the remaining rows one by one and appending each row to a list called rows.

print('Total number of rows: %d'%amznreader.line_num)

By calling line_num, you can print the total number of rows in a csv file.

Situations that you may encounter while reading CSV files

Following are the situations that you may find while dealing with CSV files-

  • We all know that fields in CSV file are separated by commas. But you may find certain files in which fields are separated by semicolons(;) or pipe symbol(|) or colons(:).
  • ProductName|ProductCategory|Price|Retailer
    Samsung TV 60 LED|TV|1200|Bestbuy
    iPad Air|Tablet|499|Target
    Samsung Galaxy Tab 4|Tablet|399|Walmart
    Dell XP 13|Laptop|1099|Amazon
    HTC One	Smart|Phone|299|Target
  • You may find whitespaces after the delimiter. In the below example, comma is the delimiter and whitespaces are present after Product field.
  • Product, 	Price, Rating
    HTC One, 199,5
    Surface 3,  499,3
    Lenovo Y50,  1349,4
    HP Pavilion 15.6,    599, 4
  • In some CSV files, you may find that certain fields are enclosed in quotes.
  • ProductName,ProductCategory,Price,Retailer
    "Samsung TV 60 LED",TV,1200,Bestbuy
    "iPad Air",Tablet,499,Target
    "Samsung Galaxy Tab 4",Tablet,399,Walmart
    "Dell XP 13",Laptop,1099,Amazon

    To read such type of CSV files, Python provides csv.register_dialect() method. In this method, you can pass a set of parameters specific to a particular CSV file. Let's see an example of this method-

    #importing csv module
    import csv
    #specify csv file name
    filename = 'products.csv'
    csv.register_dialect(
    	'mydialect',
    	delimiter = '|',
    	skipinitialspace = True,
    	quoting = csv.QUOTE_ALL
    )
    #reading csv file
    with open(filename, 'r') as mycsvfile:
    	#created a stock reader object
    	productsreader = csv.reader(mycsvfile, dialect='mydialect')
    
    	#extracting data from rows one by one by iterating through productsreader
    	for row in productsreader:
    		print("%s %s %s %s"%(row[0],row[1],row[2],row[3]))

    Output of the above program-

    Python read csv file

    Content of products.csv file-

    ProductName| ProductCategory| Price|   Retailer
    "Samsung TV 60 LED"|TV| 1200|	Bestbuy
    "iPad Air"| Tablet|499| Target
    "Samsung Galaxy Tab 4"|Tablet|   399|Walmart
    "Dell XP 13"|Laptop|1099|Amazon

    Parameters that you can pass to register_dialect() method

    Parameters Description
    delimiter It is used to specify the delimiter. Example- delimiter=';'
    skipinitialspace It is used to remove whitespaces after the delimiter. You can pass True or False to it. By default, its value is False.
    quoting When fields are enclosed in double quotes then use this parameter to get rid of it.