Why Google Sheets and Pandas are a Good Combination?
Pandas is a powerful Python library that provides data manipulation and analysis tools. When used in conjunction with the Google API, Pandas becomes even more valuable for handling data-related tasks.
-
Data Retrieval: With the Google API, you can fetch data from various Google services like Google Sheets, Google Drive, or Google Analytics. Pandas provides functionality to read data from these sources into a DataFrame, allowing you to easily access and work with the data in a tabular format.
-
Data Manipulation: Pandas offers a wide range of functions and methods for manipulating data. You can use these tools to clean, transform, and reshape the data obtained from the Google API. Pandas' flexible data structures and intuitive syntax make it convenient to perform operations such as filtering, aggregating, merging, and sorting data.
-
Data Analysis: Pandas provides numerous statistical and analytical functions that enable you to gain insights from your data. You can compute descriptive statistics, perform data visualization, and conduct advanced data analysis using Pandas' built-in methods. This allows you to explore and understand the data retrieved from the Google API effectively.
-
Data Integration: Combining the Google API with Pandas allows you to seamlessly integrate Google data with other data sources. You can merge, join, or concatenate data from multiple Google Sheets or other external sources using Pandas, enabling you to create comprehensive datasets for analysis.
-
Data Export: After manipulating and analyzing the data using Pandas, you can export the results back to the Google API. Pandas supports writing data to various file formats, including CSV, Excel, and Google Sheets. This makes it convenient to store your processed data back into Google Sheets or other Google services for further use or sharing.
Import the following modules and use the service credentials created in a pervious article. Here we have created a workbook called Tsla Info , named so because we will be using Tesla stock data in order demonstrate usage of pandas and Google Sheets.
Write Pandas DataFrame to Google Sheets
#import packages
import yfinance as yf
import pandas as pd
import datetime as dt
import gspread
from oauth2client.service_account import ServiceAccountCredentials
#service account through credentials saved in working directory
gp = gspread.service_account(filename='credentials.json')
# pass in name of workbook we created
workbook = gp.open('Tsla Info')
#get the sheet
price_sheet = workbook.worksheet('Prices')
tsla = yf.Ticker('TSLA')
df = tsla.history(period='1mo').reset_index()
df['Date'] = [dt.datetime(x.year, x.month, x.day).isoformat() for x in df.Date]
price_sheet.update([df.columns.values.tolist()] + df.values.tolist())
Date | Open | High | Low | Close | Volume | Dividends | Stock Splits |
---|---|---|---|---|---|---|---|
2023-05-10T00:00:00 | 172.5500031 | 174.4299927 | 166.6799927 | 168.5399933 | 119840700 | 0 | 0 |
2023-05-11T00:00:00 | 168.6999969 | 173.5700073 | 166.7899933 | 172.0800018 | 103889900 | 0 | 0 |
2023-05-12T00:00:00 | 176.0700073 | 177.3800049 | 167.2299957 | 167.9799957 | 157577100 | 0 | 0 |
2023-05-15T00:00:00 | 167.6600037 | 169.7599945 | 164.5500031 | 166.3500061 | 105592500 | 0 | 0 |
Read Data as Pandas DataFrame
# retrieve sheet back from Google sheet
data =pd.DataFrame(price_sheet.get_all_records())
print(data.head())
'''
Date Open ... Dividends Stock Splits
0 2023-05-10T00:00:00 172.550003 ... 0 0
1 2023-05-11T00:00:00 168.699997 ... 0 0
2 2023-05-12T00:00:00 176.070007 ... 0 0
3 2023-05-15T00:00:00 167.660004 ... 0 0
4 2023-05-16T00:00:00 165.649994 ... 0 0
[5 rows x 8 columns]
'''