Chances are you wouldn’t have clicked on this article without some understanding of basic and intermediate-level Python concepts. You have that covered? Then it is fair enough to go ahead with this article.

Confused about some of the basics, or perhaps forgot them? I have you covered with my previous articles. Check out the notebook for beginners to make your base strong, and then climb up the ladder with the intermediate-level notebooks, which I have divided into two parts (1st part and 2nd part) so that you don’t find them lengthy and get exhausted midway.

ATTENTION!!!
Pandas will drop support for Python 2 from 1st January 2019. This comes after Python’s core team announced it will stop supporting Python 2.7 from 2020 onward. Hence, start working on Python 3.x.

As always, I have embedded the notebook at the end of the article. While writing this article, I faced an issue with the code snippets: I was unable to print the DataFrames properly. So it’s better to go through the notebook rather than this article.

To explain some of the pandas functionality, I have used a Google Play Store dataset from Kaggle. The dataset contains web-scraped data of 10,000 Play Store apps for analyzing the Android market. I have performed some of the mentioned methods for each operation.

Table of Contents:

  • Pandas
  • Importing Data
  • Creating Test Object
  • Viewing Data
  • Data Cleaning
  • Selection
  • Filter, Sort & Group by
  • Iteration
  • Join, Merging
  • Statistics
  • Visualization
  • Exporting Data

Pandas

Pandas is an open-source data analysis library providing easy-to-use data structures and data analysis tools.
A DataFrame is an m*n table where
* m is the number of rows
* n is the number of columns

A Series is an m*1 vector. Hence, each column in a DataFrame is known as a pandas Series.

NOTE
* df — A pandas DataFrame object
* s — A pandas Series object
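A quick way to see the DataFrame/Series relationship in action (toy data, just for illustration):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4.0, 5.0, 6.0]})

print(df.shape)       # (3, 2): m=3 rows, n=2 columns
print(type(df['a']))  # each column is a pandas Series
```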

Importing Data

>>> import pandas as pd
>>> import numpy as np
#read from csv
>>> df = pd.read_csv('google-play-store-apps/googleplaystore.csv')

Other ways of importing data, depending on the file type:

  • pd.read_table(filename) — From a delimited text file (like TSV)
  • pd.read_excel(filename) — From an Excel file
  • pd.read_sql(query, connection_object) — Reads from a SQL table/database
  • pd.read_json(json_string) — Reads from a JSON-formatted string or file

Create Test Objects

  • pd.DataFrame(dict) — From a dict; keys for column names, values for data as lists
  • pd.DataFrame(np.random.rand(20,5)) — 5 columns and 20 rows of random floats
  • pd.Series(my_list) — Creates a series from an iterable my_list
>>> df_dict = pd.DataFrame(columns=['City','State'], data=[['Kolkata','West Bengal'], ['Bangalore','Karnataka']])
>>> df_dict
| |   City    |    State    |
|0|  Kolkata  | West Bengal |
|1| Bangalore |  Karnataka  |
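The three constructors from the list above can be sketched together on toy data:

```python
import pandas as pd
import numpy as np

# DataFrame from a dict: keys become column names, values the column data
df_dict = pd.DataFrame({'City': ['Kolkata', 'Bangalore'],
                        'State': ['West Bengal', 'Karnataka']})

# Series from an iterable
s = pd.Series([10, 20, 30])

# 20 rows x 5 columns of random floats
df_rand = pd.DataFrame(np.random.rand(20, 5))

print(df_dict.shape)   # (2, 2)
print(s.iloc[0])       # 10
print(df_rand.shape)   # (20, 5)
```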

Viewing Data

  • df.head(n) — First n rows of the DataFrame (replace head with tail to get the last n rows)
  • df.shape — Number of rows and columns
  • df.info() — Index, Datatype and Memory
  • df.describe() — Summary statistics for numerical columns
  • df.apply(pd.Series.value_counts) — Unique values and counts for all columns

  • s.value_counts(dropna=False) — Views unique values and counts

>>> print("df shape\n")
>>> print(df.shape)
>>> print("\n================")
>>> print("df info\n")
>>> df.info()
df shape

(10841, 13)

================
df info

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10841 entries, 0 to 10840
Data columns (total 13 columns):
App 10841 non-null object
Category 10841 non-null object
Rating 9367 non-null float64
Reviews 10841 non-null object
Size 10841 non-null object
Installs 10841 non-null object
Type 10840 non-null object
Price 10841 non-null object
Content Rating 10840 non-null object
Genres 10841 non-null object
Last Updated 10841 non-null object
Current Ver 10833 non-null object
Android Ver 10838 non-null object
dtypes: float64(1), object(12)
memory usage: 1.1+ MB
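s.value_counts(dropna=False) is easiest to see on a small hand-made Series (the toy data below is illustrative, not from the Play Store dataset):

```python
import pandas as pd
import numpy as np

s = pd.Series(['Free', 'Paid', 'Free', 'Free', np.nan])

# dropna=False includes missing values in the counts
counts = s.value_counts(dropna=False)
print(counts)
```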

Selection

  • df[col] or df.col — Returns the column with label col as a Series
  • df[[col1, col2]] — Returns Columns as a new DataFrame
  • s.iloc[0] — Selection by position
  • s.loc[0] — Selection by index label
  • df.loc[:, :] and df.iloc[:, :] — The first argument selects rows, the second selects columns
  • df.ix[0:a, 0:b] — Argument notation is the same as above but returns a rows and (b-1) columns [ix is deprecated since pandas 0.20]
# row 0, all columns
>>> df.loc[0, :]
App               Photo Editor & Candy Camera & Grid & ScrapBook
Category ART_AND_DESIGN
Rating 4.1
Reviews 159
Size 19M
Installs 10,000+
Type Free
Price 0
Content Rating Everyone
Genres Art & Design
Last Updated January 7, 2018
Current Ver 1.0.0
Android Ver 4.0.3 and up
Name: 0, dtype: object

# rows 0 to 4; all columns
>>> df.loc[0:4,:] # : for columns is optional here since we are asking for all columns

# rows 0 to 4; selective columns
>>> df.loc[0:4,['App','Category']]
| | App | Category |
|0|Photo Editor & Candy Camera & Grid & ScrapBook |ART_AND_DESIGN|
|1|Coloring book moana |ART_AND_DESIGN|
|2|U Launcher Lite – FREE Live Cool Themes, Hide |ART_AND_DESIGN|
|3|Sketch - Draw & Paint |ART_AND_DESIGN|
|4|Pixel Draw - Number Art Coloring Book |ART_AND_DESIGN|

# rows 0 to 4; selective columns using iloc
>>> df.iloc[0:4,[0,1]]
| | App | Category |
|0|Photo Editor & Candy Camera & Grid & ScrapBook |ART_AND_DESIGN|
|1|Coloring book moana |ART_AND_DESIGN|
|2|U Launcher Lite – FREE Live Cool Themes, Hide |ART_AND_DESIGN|
|3|Sketch - Draw & Paint |ART_AND_DESIGN|

NOTE:

  • In loc, we mention column names for selection, while in iloc we specify column positions
  • In loc, the row slice includes the upper bound, while in iloc it excludes it

Also NOTE the following:

  • For creating a new DataFrame using column names

df[[col1, col2]]

is same as

df.loc[:,[col1, col2]]

  • For printing the first 5 rows of the DataFrame

df[0:n]

is same as

df.iloc[0:n, :]
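Both equivalences can be checked directly with DataFrame.equals on a toy frame:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': list('wxyz')})

# Positional row slicing: both expressions select the first two rows
print(df[0:2].equals(df.iloc[0:2, :]))     # True

# Column selection: both return the same one-column DataFrame
print(df[['a']].equals(df.loc[:, ['a']]))  # True
```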

Data Cleaning

  • df.drop([col1, col2, col3], inplace = True, axis=1) — Removes the given column(s)
  • df.columns = [‘a’,’b’,’c’] — Renames columns
  • df.isnull() — Checks for null values; returns a Boolean DataFrame
  • df.isnull().any() — Returns a Boolean for each column; True if any null value is detected in that column
  • df.dropna() — Drops all rows that contain null values
  • df.dropna(axis=1) — Drops all columns that contain null values
  • df.fillna(x) — Replaces all null values with x
  • s.replace(1,’one’) — Replaces all values equal to 1 with ‘one’
  • s.replace([1,3], [‘one’,’three’]) — Replaces all 1 with ‘one’ and 3 with ‘three’
  • df.rename(columns = lambda x: x + ‘_1’) — Mass renaming of columns
  • df.rename(columns = {‘old_name’: ‘new_name’}) — Selective renaming
  • df.rename(index = lambda x: x + 1) — Mass renaming of index
  • df[new_col] = df.col1 + ‘, ‘ + df.col2 — Add two columns to create a new column in the same DataFrame
>>> df.drop(['Category'], inplace=True, axis = 1)
>>> df_any_null = df.isnull().any()
>>> df_any_null
App               False
Rating True
Reviews False
Size False
Installs False
Type True
Price False
Content Rating True
Genres False
Last Updated False
Current Ver True
Android Ver True
dtype: bool
>>> df.dropna(axis=0, inplace=True)
>>> df_check_null = df.isnull().any()
>>> print(df.shape)
>>> df_check_null
(9360, 12)
App               False
Rating False
Reviews False
Size False
Installs False
Type False
Price False
Content Rating False
Genres False
Last Updated False
Current Ver False
Android Ver False
dtype: bool
# mass renaming of columns: column names are made lower case and any blank space is replaced with _
>>> df_new_cols_name = df.rename(columns = lambda x: (x.lower()).replace(' ','_'))
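A minimal sketch of fillna and replace on a toy Series (illustrative values, not the Play Store data):

```python
import pandas as pd
import numpy as np

s = pd.Series([1.0, np.nan, 3.0, 1.0])

# fillna replaces every NaN with the given value
filled = s.fillna(0)
print(filled.tolist())  # [1.0, 0.0, 3.0, 1.0]

# replace maps specific values to new ones
replaced = s.replace([1.0, 3.0], ['one', 'three'])
print(replaced.tolist())
```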

Filter, Sort & Group By

  • df[df[col] > 0.5] — Rows where the values in col > 0.5
  • df[(df[col] > 0.5) & (df[col] < 0.7)] — Rows where 0.7 > col > 0.5
  • df.sort_values(col1) — Sorts values by col1 in ascending order
  • df.sort_values(col2,ascending=False) — Sorts values by col2 in descending order
  • df.sort_values([col1,col2],ascending=[True,False]) — Sorts values by col1 in ascending order then col2 in descending order
  • df.groupby(col) — Returns a groupby object for values from one column
  • df.groupby([col1,col2]) — Returns a groupby object values from multiple columns
  • df.groupby(col1)[col2].mean() — (Aggregation) Returns the mean of the values in col2, grouped by the values in col1
  • df.pivot_table(index=col1, values=[col2,col3], aggfunc=np.mean) — Creates a pivot table that groups by col1 and calculates the mean of col2 and col3
  • df.apply(np.mean) — Applies a function across each column
  • df.apply(np.max, axis=1) — Applies a function across each row
  • df.applymap(lambda x: expression) — Applies the expression to each value of the DataFrame
  • df[col].map(lambda x: expression) — Applies the expression to each value of the column col
>>> df_high_rating = df[(df['Rating'] > 4) & (df['Rating'] < 5)]
>>> print(df_high_rating.shape)
(6522, 12)
>>> print(df.groupby('Genres').size())
Genres
Action 358
Action;Action & Adventure 17
Adventure 73
Adventure;Action & Adventure 13
Adventure;Brain Games 1
Adventure;Education 2
Arcade 207
Arcade;Action & Adventure 15
Arcade;Pretend Play 1
Art & Design 55
Art & Design;Creativity 7
Art & Design;Pretend Play 2
Auto & Vehicles 73
Beauty 42
Board 41
Board;Action & Adventure 3
Board;Brain Games 15
Board;Pretend Play 1
Books & Reference 178
Books & Reference;Education 2
Business 303
Card 45
Card;Action & Adventure 2
Card;Brain Games 1
Casino 37
Casual 185
Casual;Action & Adventure 21
Casual;Brain Games 13
Casual;Creativity 7
Casual;Education 3
...
Puzzle;Education 1
Racing 93
Racing;Action & Adventure 20
Racing;Pretend Play 1
Role Playing 106
Role Playing;Action & Adventure 7
Role Playing;Brain Games 1
Role Playing;Pretend Play 5
Shopping 238
Simulation 194
Simulation;Action & Adventure 11
Simulation;Education 3
Simulation;Pretend Play 4
Social 259
Sports 333
Sports;Action & Adventure 4
Strategy 103
Strategy;Action & Adventure 2
Strategy;Creativity 1
Strategy;Education 1
Tools 732
Tools;Education 1
Travel & Local 225
Travel & Local;Action & Adventure 1
Trivia 28
Video Players & Editors 158
Video Players & Editors;Creativity 2
Video Players & Editors;Music & Video 3
Weather 75
Word 28
Length: 115, dtype: int64
>>> print(df_high_rating.groupby(['Rating','Type']).size())
Rating  Type
4.1 Free 675
Paid 32
4.2 Free 889
Paid 62
4.3 Free 1025
Paid 51
4.4 Free 1031
Paid 77
4.5 Free 964
Paid 73
4.6 Free 741
Paid 82
4.7 Free 446
Paid 53
4.8 Free 195
Paid 39
4.9 Free 81
Paid 6
dtype: int64
# store the grouped by table in a dataframe
>>> df_group_by = pd.DataFrame({'count': df_high_rating.groupby(['Rating','Type']).size()}).reset_index()
# iterating through groupby
>>> grouped = df_high_rating.groupby('Rating')
>>> for name, group in grouped:
...     if name >= 4.8: # only printing for ratings 4.8 and 4.9
...         print(name)
...         print(group)
...         print("="*50)
# aggregation
>>> grouped = df.groupby('Genres')
>>> print(grouped['Rating'].agg(np.mean))
Genres
Action 4.285475
Action;Action & Adventure 4.311765
Adventure 4.180822
Adventure;Action & Adventure 4.423077
Adventure;Brain Games 4.600000
Adventure;Education 4.100000
Arcade 4.304348
Arcade;Action & Adventure 4.346667
Arcade;Pretend Play 4.500000
Art & Design 4.380000
Art & Design;Creativity 4.400000
Art & Design;Pretend Play 3.900000
Auto & Vehicles 4.190411
Beauty 4.278571
Board 4.292683
Board;Action & Adventure 4.033333
Board;Brain Games 4.340000
Board;Pretend Play 4.800000
Books & Reference 4.346067
Books & Reference;Education 4.200000
Business 4.121452
Card 4.086667
Card;Action & Adventure 4.300000
Card;Brain Games 4.400000
Casino 4.286486
Casual 4.150811
Casual;Action & Adventure 4.266667
Casual;Brain Games 4.469231
Casual;Creativity 4.314286
Casual;Education 4.266667
...
Puzzle;Education 4.600000
Racing 4.173118
Racing;Action & Adventure 4.300000
Racing;Pretend Play 4.500000
Role Playing 4.275472
Role Playing;Action & Adventure 4.342857
Role Playing;Brain Games 4.300000
Role Playing;Pretend Play 4.020000
Shopping 4.259664
Simulation 4.151546
Simulation;Action & Adventure 4.418182
Simulation;Education 4.366667
Simulation;Pretend Play 4.350000
Social 4.255598
Sports 4.236637
Sports;Action & Adventure 4.350000
Strategy 4.245631
Strategy;Action & Adventure 4.600000
Strategy;Creativity 4.400000
Strategy;Education 4.500000
Tools 4.046585
Tools;Education 4.500000
Travel & Local 4.109333
Travel & Local;Action & Adventure 4.100000
Trivia 4.039286
Video Players & Editors 4.063924
Video Players & Editors;Creativity 4.100000
Video Players & Editors;Music & Video 4.000000
Weather 4.244000
Word 4.410714
Name: Rating, Length: 115, dtype: float64

# applying multiple aggregation functions at once
>>> print(grouped['Rating'].agg([np.sum, np.mean, np.std]))
                                        sum      mean       std
Genres
Action 1534.2 4.285475 0.291353
Action;Action & Adventure 73.3 4.311765 0.172780
Adventure 305.2 4.180822 0.312542
Adventure;Action & Adventure 57.5 4.423077 0.148064
Adventure;Brain Games 4.6 4.600000 NaN
Adventure;Education 8.2 4.100000 0.000000
Arcade 891.0 4.304348 0.351323
Arcade;Action & Adventure 65.2 4.346667 0.306749
Arcade;Pretend Play 4.5 4.500000 NaN
Art & Design 240.9 4.380000 0.321685
Art & Design;Creativity 30.8 4.400000 0.404145
Art & Design;Pretend Play 7.8 3.900000 0.000000
Auto & Vehicles 305.9 4.190411 0.543692
Beauty 179.7 4.278571 0.362603
Board 176.0 4.292683 0.417367
Board;Action & Adventure 12.1 4.033333 0.057735
Board;Brain Games 65.1 4.340000 0.297129
Board;Pretend Play 4.8 4.800000 NaN
Books & Reference 773.6 4.346067 0.429046
Books & Reference;Education 8.4 4.200000 0.707107
Business 1248.8 4.121452 0.624422
Card 183.9 4.086667 0.708263
Card;Action & Adventure 8.6 4.300000 0.000000
Card;Brain Games 4.4 4.400000 NaN
Casino 158.6 4.286486 0.310163
Casual 767.9 4.150811 0.442218
Casual;Action & Adventure 89.6 4.266667 0.367877
Casual;Brain Games 58.1 4.469231 0.209701
Casual;Creativity 30.2 4.314286 0.291139
Casual;Education 12.8 4.266667 0.152753
... ... ... ...
Puzzle;Education 4.6 4.600000 NaN
Racing 388.1 4.173118 0.327089
Racing;Action & Adventure 86.0 4.300000 0.194666
Racing;Pretend Play 4.5 4.500000 NaN
Role Playing 453.2 4.275472 0.340815
Role Playing;Action & Adventure 30.4 4.342857 0.229907
Role Playing;Brain Games 4.3 4.300000 NaN
Role Playing;Pretend Play 20.1 4.020000 0.426615
Shopping 1013.8 4.259664 0.404577
Simulation 805.4 4.151546 0.401710
Simulation;Action & Adventure 48.6 4.418182 0.252262
Simulation;Education 13.1 4.366667 0.152753
Simulation;Pretend Play 17.4 4.350000 0.331662
Social 1102.2 4.255598 0.413809
Sports 1410.8 4.236637 0.423600
Sports;Action & Adventure 17.4 4.350000 0.191485
Strategy 437.3 4.245631 0.373583
Strategy;Action & Adventure 9.2 4.600000 0.000000
Strategy;Creativity 4.4 4.400000 NaN
Strategy;Education 4.5 4.500000 NaN
Tools 2962.1 4.046585 0.616731
Tools;Education 4.5 4.500000 NaN
Travel & Local 924.6 4.109333 0.505816
Travel & Local;Action & Adventure 4.1 4.100000 NaN
Trivia 113.1 4.039286 0.808904
Video Players & Editors 642.1 4.063924 0.554566
Video Players & Editors;Creativity 8.2 4.100000 0.000000
Video Players & Editors;Music & Video 12.0 4.000000 0.000000
Weather 318.3 4.244000 0.331353
Word 123.5 4.410714 0.325849

[115 rows x 3 columns]
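sort_values and pivot_table from the list above, sketched on a tiny hand-made frame (illustrative data, not the Play Store dataset):

```python
import pandas as pd

toy = pd.DataFrame({'Type': ['Free', 'Paid', 'Free', 'Paid'],
                    'Rating': [4.5, 4.1, 3.9, 4.8],
                    'Reviews': [100, 20, 50, 10]})

# sort by Rating in descending order
ordered = toy.sort_values('Rating', ascending=False)
print(ordered['Rating'].tolist())  # [4.8, 4.5, 4.1, 3.9]

# pivot table: mean Rating and Reviews per Type
pivot = toy.pivot_table(index='Type', values=['Rating', 'Reviews'], aggfunc='mean')
print(pivot)
```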

Iteration

To iterate over the rows of the DataFrame, we can use the following functions:

  • df.iteritems() − iterates over the columns as (column name, Series) pairs
  • df.iterrows() − iterates over the rows as (index, Series) pairs
  • df.itertuples() − returns an iterator yielding a named tuple for each row in the DataFrame. The first element of the tuple is the row’s index value, while the remaining elements are the row’s values.
>>> iterated_df = pd.DataFrame(np.random.randn(4,3),columns=['col1','col2','col3'])
>>> for key, value in iterated_df.iteritems():
...     print(key)
...     print(value)
col1
0 -0.031657
1 -2.031456
2 2.820815
3 -0.153405
Name: col1, dtype: float64
col2
0 0.152370
1 -1.157595
2 2.817094
3 -0.227610
Name: col2, dtype: float64
col3
0 1.047588
1 -0.461455
2 -0.177125
3 -0.698067
Name: col3, dtype: float64
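The other two iterators can be sketched the same way on a small frame:

```python
import pandas as pd

small = pd.DataFrame({'col1': [1, 2], 'col2': [10, 20]})

# iterrows yields (index, Series) pairs
for idx, row in small.iterrows():
    print(idx, row['col1'], row['col2'])

# itertuples yields named tuples; the first field is the index
rows = list(small.itertuples())
print(rows[0].Index, rows[0].col1, rows[0].col2)
```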

Operations with text data

String operations can be performed on a Series in the form s.str.op, where op can be:

  1. swapcase() — Swaps the case of each character (lower to upper and vice versa).
  2. lower() / upper() — Converts strings in the Series/Index to lower / upper case.
  3. len() — Computes String length.
  4. strip() — Helps strip whitespace(including newline) from each string in the Series/index from both the sides.
  5. split(‘ ‘) — Splits each string with the given pattern.
  6. cat(sep=’ ‘) — Concatenates the series/index elements with given separator.
  7. get_dummies() — Returns the DataFrame with One-Hot Encoded values.
  8. contains(pattern) — Returns True for each element that contains the substring, else False.
  9. replace(a,b) — Replaces the value a with the value b.
  10. repeat(value) — Repeats each element the specified number of times.
  11. count(pattern) — Returns count of appearance of pattern in each element.
  12. startswith(pattern) / endswith(pattern) — Returns true if the element in the Series/Index starts / ends with the pattern.
  13. find(pattern) — Returns the position of the first occurrence of the pattern; returns -1 if not found.
  14. findall(pattern) — Returns a list of all occurrences of the pattern.
  15. islower() / isupper() / isnumeric() — Checks whether all characters in each string of the Series/Index are lower case / upper case / numeric. Returns Boolean.
>>> s = pd.Series(['Tom ', ' William Rick', 'John', 'Alber@t'])
>>> print(s.str.contains(' '))
0     True
1 True
2 False
3 False
dtype: bool
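A few more of the string methods, sketched on the same Series:

```python
import pandas as pd

s = pd.Series(['Tom ', ' William Rick', 'John', 'Alber@t'])

# strip removes surrounding whitespace from each element
stripped = s.str.strip()
print(stripped.tolist())        # ['Tom', 'William Rick', 'John', 'Alber@t']

# len gives the length of each string (including spaces)
print(s.str.len().tolist())     # [4, 13, 4, 7]

# split breaks each element on the given pattern
print(s.str.split(' ').iloc[1]) # ['', 'William', 'Rick']
```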

Joining, Merging

  1. df1.append(df2) — Adds the rows of df2 to the end of df1 (columns should be identical)
  2. pd.concat([df1, df2], axis=1) — Adds the columns of df2 to the end of df1 (rows should be identical)
  3. pd.merge(left, right, how=’inner’, on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=True) — where
  • left − A DataFrame object.
  • right − Another DataFrame object.
  • how − One of ‘left’, ‘right’, ‘outer’, ‘inner’. Defaults to inner. Each method has been described below.
  • on − Columns (names) to join on. Must be found in both the left and right DataFrame objects.
  • left_on − Columns from the left DataFrame to use as keys.
  • right_on − Columns from the right DataFrame to use as keys.
  • left_index − If True, use the index (row labels) from the left DataFrame as its join key(s). In case of a DataFrame with a MultiIndex (hierarchical), the number of levels must match the number of join keys from the right DataFrame.
  • right_index − Same usage as left_index for the right DataFrame.
  • sort − Sort the result DataFrame by the join keys in lexicographical order. Defaults to True, setting to False will improve the performance substantially in many cases.
>>> left = pd.DataFrame({
'id':[1,2,3,4,5],
'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayan'],
'subject_id':['sub1','sub2','sub4','sub6','sub5']})
>>> right = pd.DataFrame(
{'id':[1,2,3,4,5],
'Name': ['Billy', 'Brian', 'Brock', 'Bryce', 'Betty'],
'subject_id':['sub2','sub4','sub3','sub6','sub5']})
>>> print(pd.concat([left,right],ignore_index=True)) # merging by rows
>>> print('='*50)
>>> print(left.append(right)) # also merging by rows
>>> print('='*50)
>>> print(pd.concat([left,right],axis=1)) # merging by columns
   id   Name subject_id
0 1 Alex sub1
1 2 Amy sub2
2 3 Allen sub4
3 4 Alice sub6
4 5 Ayan sub5
5 1 Billy sub2
6 2 Brian sub4
7 3 Brock sub3
8 4 Bryce sub6
9 5 Betty sub5
==================================================
id Name subject_id
0 1 Alex sub1
1 2 Amy sub2
2 3 Allen sub4
3 4 Alice sub6
4 5 Ayan sub5
0 1 Billy sub2
1 2 Brian sub4
2 3 Brock sub3
3 4 Bryce sub6
4 5 Betty sub5
==================================================
id Name subject_id id Name subject_id
0 1 Alex sub1 1 Billy sub2
1 2 Amy sub2 2 Brian sub4
2 3 Allen sub4 3 Brock sub3
3 4 Alice sub6 4 Bryce sub6
4 5 Ayan sub5 5 Betty sub5

# merge two dataframes on a key
>>> print(pd.merge(left,right,on='id'))
>>> print('='*50)
# merge two dataframes on multiple keys
>>> print(pd.merge(left,right,on=['id','subject_id']))
   id Name_x subject_id_x Name_y subject_id_y
0 1 Alex sub1 Billy sub2
1 2 Amy sub2 Brian sub4
2 3 Allen sub4 Brock sub3
3 4 Alice sub6 Bryce sub6
4 5 Ayan sub5 Betty sub5
==================================================
id Name_x subject_id Name_y
0 4 Alice sub6 Bryce
1 5 Ayan sub5 Betty

| Merge Method | SQL Equivalent | Description |
| left | LEFT OUTER JOIN | Use keys from left object |
| right | RIGHT OUTER JOIN | Use keys from right object |
| outer | FULL OUTER JOIN | Use union of keys |
| inner | INNER JOIN | Use intersection of keys |

>>> print(pd.merge(left, right, on='subject_id', how='inner'))
    id_x Name_x subject_id  id_y Name_y
0 2 Amy sub2 1 Billy
1 3 Allen sub4 2 Brian
2 4 Alice sub6 4 Bryce
3 5 Ayan sub5 5 Betty
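For comparison, an outer join keeps the union of keys (the two-row frames below are trimmed-down versions of left and right, just for illustration):

```python
import pandas as pd

left = pd.DataFrame({'id': [1, 2], 'Name': ['Alex', 'Amy'],
                     'subject_id': ['sub1', 'sub2']})
right = pd.DataFrame({'id': [1, 2], 'Name': ['Billy', 'Brian'],
                      'subject_id': ['sub2', 'sub4']})

# outer join keeps the union of keys; unmatched cells become NaN
out = pd.merge(left, right, on='subject_id', how='outer')
print(out.shape)                  # (3, 5)
print(sorted(out['subject_id']))  # ['sub1', 'sub2', 'sub4']
```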

Statistics

  • df.mean() — Returns the mean of all columns
  • df.corr() — Returns the correlation between columns in a DataFrame
  • df.count() — Returns the number of non-null values in each DataFrame column
  • df.max() — Returns the highest value in each column
  • df.min() — Returns the lowest value in each column
  • df.median() — Returns the median of each column
  • df.std() — Returns the standard deviation of each column
>>> df['Rating'].mean()
4.191837606837606
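A sketch of a few more of these statistics on a toy frame (illustrative values):

```python
import pandas as pd

stats = pd.DataFrame({'Rating': [4.1, 4.5, 3.9], 'Reviews': [100, 250, 40]})

print(stats.mean())            # column-wise means
print(stats.count().tolist())  # [3, 3] non-null values per column

# correlation between two columns (this toy data is strongly correlated)
print(stats['Rating'].corr(stats['Reviews']) > 0.9)  # True
```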

Visualization

The following charts can be generated straight from pandas:

  • bar or barh for bar plots
  • hist for histogram
  • area for area plots
  • scatter for scatter plots
>>> df_group_by[df_group_by['Type']=='Free']['count'].plot()
<matplotlib.axes._subplots.AxesSubplot at 0x7f0dd22b64e0>
>>> df_bar = df_high_rating[['Rating','Reviews']][:10]
>>> df_bar['Reviews'] = df_bar['Reviews'].astype(int)
>>> df_bar.plot.bar(stacked = True)
<matplotlib.axes._subplots.AxesSubplot at 0x7f0dd221cdd8>
>>> df_bar.plot.scatter(x='Rating', y='Reviews')
<matplotlib.axes._subplots.AxesSubplot at 0x7f0dd21487b8>
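A histogram works the same way; a minimal sketch, assuming matplotlib is installed:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend; no display window needed
import pandas as pd

ratings = pd.DataFrame({'Rating': [4.1, 4.5, 3.9, 4.8, 4.2]})

# histogram straight from pandas
ax = ratings['Rating'].plot.hist(bins=3)
print(type(ax).__name__)
```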

Exporting the Data

  • df.to_csv(filename) — Writes to a CSV file
  • df.to_excel(filename) — Writes to an Excel file
  • df.to_sql(table_name, connection_object) — Writes to a SQL table
  • df.to_json(filename) — Writes to a file in JSON format
  • df.to_html(filename) — Saves as an HTML table
>>> df_new_cols_name.to_csv('./google-play-store-apps/cleaned_googleplay_data.csv')

I hope you liked this article. If yes, then do share it with friends or colleagues who you think will benefit from it.
