Google Sheet and Drive Integration Sync
Author(s)
- Reshmi Karan
Last Updated Date
2026-04-21
SRS References
Version History
| Version | Date | Changes | Author |
|---|---|---|---|
| 1.0 | 2026-04-21 | Initial draft for Google Sheet and Drive sync integration | Reshmi Karan |
Feature Overview
Objective:
Synchronize spreadsheet data from Google Drive into the application database so downstream services can query normalized row snapshots without reading Google APIs directly.
Scope:
This feature discovers a spreadsheet file from a configured Drive folder, resolves or converts it into a Google Sheet if needed, reads one or more tabs, normalizes row values, and stores them in Postgres. It also prunes rows that no longer exist in the source sheet when configured to do so.
Dependencies:
- Google Drive API and Google Sheets API
- Service account credentials
- Postgres database
- Dapper executor abstraction
- Hangfire recurring jobs
- Database migration for
tblgoogle_sheet_rowsand thegoogle-sheet-syncjob config
Requirements
- The system must authenticate to Google APIs using a service account JSON key.
- The system must discover the spreadsheet file from a configured Drive folder.
- The system must support both explicit sheet configuration and sync-all-sheets mode.
- The system must normalize headers and map flat rows into a consistent database record model.
- The system must store rows in
tblgoogle_sheet_rowsusingspreadsheet_id,sheet_name, androw_keyas the unique identity. - The system must update existing rows and insert new rows on every sync.
- The system must optionally delete rows that are no longer present in the source sheet.
- The system must run as a recurring background job.
- The system must log success, warning, and failure states without exposing secrets.
- The system must fail safely when configuration is missing or the source file cannot be resolved.
Design Specifications
-
UI/UX Design:
No user-facing UI is required. This is a background synchronization feature. Operational visibility is provided through logs and Hangfire job execution history. -
Data Models:
The sync stores row snapshots in a table-backed model mapped from spreadsheet data. The persisted record contains spreadsheet metadata plus selected business fields such as clientid, city, state, zip, purchase_date, vin, year, make, model, cust_type, match_type, and ql_number. The row key is derived from configured key columns or a fallback key strategy.public record GoogleSheetRowRecord
{
public string SpreadsheetId { get; init; } = string.Empty;
public string SheetName { get; init; } = string.Empty;
public string RowKey { get; init; } = string.Empty;
public int RowNumber { get; init; }
public string ClientId { get; init; } = string.Empty;
public string? City { get; init; }
public string? State { get; init; }
public string? Zip { get; init; }
public DateOnly? PurchaseDate { get; init; }
public string? Vin { get; init; }
public int? Year { get; init; }
public string? Make { get; init; }
public string? Model { get; init; }
public string? CustType { get; init; }
public string? MatchType { get; init; }
public string? QlNumber { get; init; }
public DateTime SyncedAt { get; init; } = DateTime.UtcNow;
} -
API Interfaces:
No public HTTP API is required for the core sync path. The feature is executed by a recurring Hangfire job through the integration service.Endpoint Method Parameters Response Response Status Codes None None None Background job only N/A -
Third-Party Integrations:
- Google Drive API for file discovery and optional file conversion
- Google Sheets API for tab and row retrieval
- Hangfire for recurring execution
-
Workflow:
- The recurring job
google-sheet-syncruns on thecdkqueue. - The job calls
SyncGoogleSheetsAsyncin the integration sync service. - The service validates Google Sheet configuration.
- The service resolves the target spreadsheet from the configured Drive folder.
- If the discovered file is not already a Google Sheet, the service converts it by copying it into Google Sheets format.
- The service resolves the set of tabs to sync from configuration or spreadsheet metadata.
- For each tab, the service reads the sheet range, normalizes headers, and maps each row into a
GoogleSheetRowRecord. - The DAL upserts rows into
tblgoogle_sheet_rowsusing thespreadsheet_id,sheet_name, androw_keyuniqueness rule. - If delete-missing behavior is enabled, rows that no longer exist in the sheet are removed from the table.
- The job logs row counts, errors, and status for operational tracking.
- The recurring job
Development Tasks & Estimates
| No | Task Name | Estimate (Hours) | Dependencies | Notes |
|---|---|---|---|---|
| 1 | Define sync workflow and config contract | 3 | None | Confirm sheet discovery and row-key rules |
| 2 | Implement Drive and Sheets read path | 5 | Google service account setup | Includes file resolution and tab loading |
| 3 | Implement database persistence and pruning | 4 | Table migration | Upsert plus delete-missing behavior |
| 4 | Add Hangfire schedule and job registration | 2 | Sync service complete | Uses google-sheet-sync job key |
| 5 | Add tests and validation | 4 | Implementation complete | Mapper, DAL, and service coverage |
| 6 | Total | 18 | All | Estimate only |
Testing & Quality Assurance
-
Unit Tests:
- Header normalization and row-key generation
- Row mapping from sheet values to
GoogleSheetRowRecord - Config validation and sheet definition selection
- Upsert and delete query behavior in the DAL
-
Integration Tests:
- Sync against a test spreadsheet in a sandbox Drive folder
- Verify rows are inserted, updated, and pruned correctly
- Verify recurring job execution path completes and logs counts
-
Acceptance Criteria:
- A valid spreadsheet in the configured Drive folder is discovered successfully
- Sheet rows are stored in
tblgoogle_sheet_rows - Re-running the sync updates existing rows instead of duplicating them
- Missing rows are deleted when delete-missing is enabled
- The job runs on schedule through Hangfire
- Failures are logged and do not leak secrets
-
Testing Tools:
- xUnit
- Mocking framework for service and DAL abstractions
- Hangfire dashboard or execution logs
- Test Google Drive folder and spreadsheet
Deployment Considerations
-
Configuration Changes:
- Set
GoogleSheet:ServiceAccountKeyPath - Set
GoogleSheet:DriveFolderId - Confirm
GoogleSheet:SyncAllSheetsand per-sheet settings - Ensure the service account has access to the target Drive folder and spreadsheet.
CREATE TABLE IF NOT EXISTS tblgoogle_sheet_rows (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
spreadsheet_id TEXT NOT NULL,
sheet_name TEXT NOT NULL,
row_key TEXT NOT NULL,
row_number INT NOT NULL,
clientid TEXT,
city TEXT,
state TEXT,
zip TEXT,
purchase_date DATE,
vin TEXT,
year INT,
make TEXT,
model TEXT,
cust_type TEXT,
match_type TEXT,
ql_number TEXT,
synced_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
CONSTRAINT uq_google_sheet_rows UNIQUE (spreadsheet_id, sheet_name, row_key)
);
CREATE INDEX IF NOT EXISTS idx_google_sheet_rows_spreadsheet_id ON tblgoogle_sheet_rows (spreadsheet_id);
CREATE INDEX IF NOT EXISTS idx_google_sheet_rows_sheet_name ON tblgoogle_sheet_rows (sheet_name);
CREATE INDEX IF NOT EXISTS idx_google_sheet_rows_synced_at ON tblgoogle_sheet_rows (synced_at);
COMMENT ON TABLE tblgoogle_sheet_rows IS 'Row-level snapshots imported from Google Sheets tabs';
COMMENT ON COLUMN tblgoogle_sheet_rows.spreadsheet_id IS 'Google Spreadsheet ID';
COMMENT ON COLUMN tblgoogle_sheet_rows.sheet_name IS 'Worksheet/tab name';
COMMENT ON COLUMN tblgoogle_sheet_rows.row_key IS 'Stable business key derived from one or more columns';
COMMENT ON COLUMN tblgoogle_sheet_rows.city IS 'City value from the spreadsheet row';
COMMENT ON COLUMN tblgoogle_sheet_rows.state IS 'State value from the spreadsheet row';
COMMENT ON COLUMN tblgoogle_sheet_rows.zip IS 'ZIP value from the spreadsheet row';
COMMENT ON COLUMN tblgoogle_sheet_rows.purchase_date IS 'Purchase date from the spreadsheet row';
COMMENT ON COLUMN tblgoogle_sheet_rows.vin IS 'VIN from the spreadsheet row';
COMMENT ON COLUMN tblgoogle_sheet_rows.year IS 'Vehicle year from the spreadsheet row';
COMMENT ON COLUMN tblgoogle_sheet_rows.make IS 'Vehicle make from the spreadsheet row';
COMMENT ON COLUMN tblgoogle_sheet_rows.model IS 'Vehicle model from the spreadsheet row';
COMMENT ON COLUMN tblgoogle_sheet_rows.cust_type IS 'Customer type from the spreadsheet row';
COMMENT ON COLUMN tblgoogle_sheet_rows.match_type IS 'Match type from the spreadsheet row';
COMMENT ON COLUMN tblgoogle_sheet_rows.ql_number IS 'QL number from the spreadsheet row'; - Set
-
Rollout Plan:
- Deploy the database migration first
- Deploy the integration service with the new Google Sheet sync job
- Enable the recurring job in
tblintegration_job_config - Validate one test spreadsheet before enabling production data sources
Risks & Mitigations
| Risk | Impact | Likelihood | Mitigation Strategy |
|---|---|---|---|
| Incorrect row key selection causes churn or duplicates | High | Medium | Use stable key columns and validate source headers |
| Service account lacks Drive or Sheets access | High | Medium | Pre-approve access to the shared folder and spreadsheet |
| File conversion creates an unexpected duplicate spreadsheet | Medium | Low | Prefer native Google Sheets files and monitor conversion logs |
| Delete-missing mode removes valid rows due to source filtering changes | High | Medium | Restrict delete-missing to controlled sheets and review tab definitions |
Review & Approval
-
Reviewer: Ayon Das
-
Approval Date: 2026-04-30
Notes
The implementation persists row snapshots rather than raw files. The database schema and job schedule are already defined in DatabaseService/Scripts/000041-GoogleSheetSyncJobAndTable.sql, and the recurring job is mapped to google-sheet-sync in IntegrationService/Scheduler/Infrastructure/JobSchedulerService.cs.