Python provides built-in csv module using which you can read csv files. In case, you don't know what is CSV? Then let me explain it to you.
CSV(Comma-Separated Values) is a simple way of storing text data in a file. It is generally used for storing tabular data. Here, each line contains data. Each data consists of one or more than one fields that are separated by commas. Below is an example of csv formatted data-
Date,Open,High,Low,Close,Volume 5/18/2018,1581.33,1583.5903,1572.1,1574.37,2612564 5/17/2018,1580.56,1594.04,1573,1581.76,2131238 5/16/2018,1577.5,1594.43,1576.67,1587.28,2535458 5/15/2018,1587.8047,1587.8047,1565.2201,1576.12,5066420 5/14/2018,1604,1611.18,1600.05,1601.54,2460302 5/11/2018,1610.99,1611.1,1597.891,1602.91,2258583 5/10/2018,1608.475,1615.6,1603.44,1609.08,2785394
As you can see above, csv files are typical for humans to read but reader() method of csv module will make it easier for us to read it and help in the data analysis process.
#importing csv module import csv #specify csv file name filename = 'amzn.csv' #creating header and rows list header = [] rows = [] #reading csv file with open(filename, 'r') as mycsvfile: #created a stock reader object amznreader = csv.reader(mycsvfile) #extracting the field names from the first row header = amznreader.__next__() #extracting data from rows one by one by iterating through amznreader for row in amznreader: rows.append(row) print('Total number of rows: %d'%amznreader.line_num) print('Fields present in header are: '+', '.join(field for field in header)) print('Data present in the first 10 rows are:') for row in rows[:10]: for col in row: print("%10s"%col,end='') print('\n')
Output
Run the above program by keeping amzn.csv in the same directory otherwise you will get an error. If you want to download amzn.csv then click here.
Explanation of the above code:
with open(filename, 'r') as mycsvfile: amznreader = csv.reader(mycsvfile)
Here, we have opened amzn.csv file in read mode. The file object is represented as mycsvfile. Later, mycsvfile object is transformed to csv.reader object and saved it as amznreader. Here I am using amznreader because I am processing amzn.csv, if you are analyzing weather data, then you can name it weatherreader.
header = amznreader.__next__()
amznreader is an iterable object and when we call __next__() method on it, it returns the current row and moves the iterator to the next row. Here, we have saved the fields name in a header list.
for row in amznreader: rows.append(row)
In this code snippet, we are iterating the remaining rows one by one and appending each row to a list called rows.
print('Total number of rows: %d'%amznreader.line_num)
By calling line_num, you can print the total number of rows in a csv file.
Following are the situations that you may find while dealing with CSV files-
ProductName|ProductCategory|Price|Retailer Samsung TV 60 LED|TV|1200|Bestbuy iPad Air|Tablet|499|Target Samsung Galaxy Tab 4|Tablet|399|Walmart Dell XP 13|Laptop|1099|Amazon HTC One Smart|Phone|299|Target
Product, Price, Rating HTC One, 199,5 Surface 3, 499,3 Lenovo Y50, 1349,4 HP Pavilion 15.6, 599, 4
ProductName,ProductCategory,Price,Retailer "Samsung TV 60 LED",TV,1200,Bestbuy "iPad Air",Tablet,499,Target "Samsung Galaxy Tab 4",Tablet,399,Walmart "Dell XP 13",Laptop,1099,Amazon
To read such type of CSV files, Python provides csv.register_dialect() method. In this method, you can pass a set of parameters specific to a particular CSV file. Let's see an example of this method-
#importing csv module import csv #specify csv file name filename = 'products.csv' csv.register_dialect( 'mydialect', delimiter = '|', skipinitialspace = True, quoting = csv.QUOTE_ALL ) #reading csv file with open(filename, 'r') as mycsvfile: #created a stock reader object productsreader = csv.reader(mycsvfile, dialect='mydialect') #extracting data from rows one by one by iterating through productsreader for row in productsreader: print("%s %s %s %s"%(row[0],row[1],row[2],row[3]))
Output of the above program-
Content of products.csv file-
ProductName| ProductCategory| Price| Retailer "Samsung TV 60 LED"|TV| 1200| Bestbuy "iPad Air"| Tablet|499| Target "Samsung Galaxy Tab 4"|Tablet| 399|Walmart "Dell XP 13"|Laptop|1099|Amazon
Parameters | Description |
---|---|
delimiter | It is used to specify the delimiter. Example- delimiter=';' |
skipinitialspace | It is used to remove whitespaces after the delimiter. You can pass True or False to it. By default, its value is False. |
quoting | When fields are enclosed in double quotes then use this parameter to get rid of it. |