gspread-dataframe
gspread-dataframe (version 4.0.0) is a Python library that simplifies reading from and writing to Google Sheets using pandas DataFrames. It acts as an extension to the `gspread` library, providing convenient functions to convert worksheet data into DataFrames and vice-versa. The library is actively maintained, with a focus on seamless integration between Google Sheets and pandas.
Warnings
- breaking In version 4.0.0, the `get_as_dataframe` function's `drop_empty_rows` and `drop_empty_columns` parameters changed their default value to `True`. This means empty rows and columns at the end of a sheet will now be automatically discarded when reading, which might alter the DataFrame structure for existing code expecting these rows/columns.
- gotcha Version 4.0.0 and later of `gspread-dataframe` officially support Python 3 only. If you are using Python 2.7, you must use `gspread-dataframe` releases prior to 4.0.0 (e.g., 2.1.1 or earlier).
- gotcha `gspread-dataframe` requires specific versions of its core dependencies. For `gspread-dataframe` versions 4.0.0+, you need `gspread>=3.0.0` and `pandas>=0.24.0`. Using older `gspread-dataframe` versions (2.1.1 or earlier) is necessary if you are tied to older `gspread` versions.
- gotcha The `get_as_dataframe` function uses the 'python' engine for pandas' text parsing, which means only options supported by this engine can be passed via the `**options` argument (e.g., `parse_dates`, `skiprows`, `header`). Some advanced `pandas.read_csv` options might not work.
- gotcha The underlying `gspread` library, and by extension `gspread-dataframe`, requires careful handling of Google Sheets API authentication (e.g., Service Account or OAuth Client ID) and explicit sharing of the target spreadsheet with the authenticated client email. Misconfiguration is a common source of `gspread.exceptions.SpreadsheetNotFound` or permission errors.
- gotcha When reading data from Google Sheets, `get_as_dataframe` may initially return all columns with a pandas 'object' dtype. This requires manual type conversion (e.g., to `int`, `float`, `datetime`) if numeric or date-time operations are intended.
Install
-
pip install gspread-dataframe
Imports
- get_as_dataframe
from gspread_dataframe import get_as_dataframe
- set_with_dataframe
from gspread_dataframe import set_with_dataframe
Quickstart
import gspread
import pandas as pd
from gspread_dataframe import get_as_dataframe, set_with_dataframe
import os
# --- Gspread Authentication (replace with your actual setup) ---
# For service account authentication, ensure 'service_account.json' is in your path
# and shared with the client_email in that file.
# For a runnable example, we'll mock gspread client/worksheet objects.
# In a real application, you would use:
# gc = gspread.service_account(filename=os.environ.get('GOOGLE_APPLICATION_CREDENTIALS'))
# spreadsheet = gc.open('Your Spreadsheet Name')
# worksheet = spreadsheet.worksheet('Sheet1')
class MockWorksheet:
def __init__(self, data=None):
self._values = [list(row) for row in data] if data else []
def get_all_values(self):
return self._values
def update(self, range_name, values):
# Simple mock update for demonstration
if range_name == 'A1': # Assume A1 starts the update
for r_idx, row in enumerate(values):
if r_idx < len(self._values):
for c_idx, val in enumerate(row):
if c_idx < len(self._values[r_idx]):
self._values[r_idx][c_idx] = val
else:
self._values[r_idx].append(val)
else:
self._values.append(list(row))
# Create a mock worksheet with some initial data
mock_data = [
['Name', 'Age', 'City'],
['Alice', '30', 'New York'],
['Bob', '24', 'London'],
['Charlie', '35', 'Paris']
]
worksheet = MockWorksheet(mock_data)
# Read worksheet into a DataFrame
df = get_as_dataframe(worksheet)
print("DataFrame from Worksheet:")
print(df)
# Modify the DataFrame
df['Age'] = df['Age'].astype(int) + 1
df['Country'] = ['USA', 'UK', 'France']
# Write DataFrame back to worksheet
# Note: resize=True will clear existing data and resize the sheet.
# include_index=False by default.
set_with_dataframe(worksheet, df, resize=True, include_column_header=True)
print("\nUpdated Worksheet (mocked values):")
print(worksheet.get_all_values())
# Example of reading with pandas options
# df_parsed = get_as_dataframe(worksheet, parse_dates=['birth_date'], skiprows=1, header=None)
# print(df_parsed)