Version: MuddVision

Google Sheet and Drive Integration Sync

Author(s)

Reshmi Karan

Last Updated Date

2026-04-21

SRS References

Version History

Version	Date	Changes	Author
1.0	2026-04-21	Initial draft for Google Sheet and Drive sync integration	Reshmi Karan

Feature Overview

Objective:
Synchronize spreadsheet data from Google Drive into the application database so downstream services can query normalized row snapshots without reading Google APIs directly.

Scope:
This feature discovers a spreadsheet file from a configured Drive folder, resolves or converts it into a Google Sheet if needed, reads one or more tabs, normalizes row values, and stores them in Postgres. It also prunes rows that no longer exist in the source sheet when configured to do so.

Dependencies:

Google Drive API and Google Sheets API
Service account credentials
Postgres database
Dapper executor abstraction
Hangfire recurring jobs
Database migration for tblgoogle_sheet_rows and the google-sheet-sync job config

Requirements

The system must authenticate to Google APIs using a service account JSON key.
The system must discover the spreadsheet file from a configured Drive folder.
The system must support both explicit sheet configuration and sync-all-sheets mode.
The system must normalize headers and map flat rows into a consistent database record model.
The system must store rows in tblgoogle_sheet_rows using spreadsheet_id, sheet_name, and row_key as the unique identity.
The system must update existing rows and insert new rows on every sync.
The system must optionally delete rows that are no longer present in the source sheet.
The system must run as a recurring background job.
The system must log success, warning, and failure states without exposing secrets.
The system must fail safely when configuration is missing or the source file cannot be resolved.

Design Specifications

UI/UX Design:
No user-facing UI is required. This is a background synchronization feature. Operational visibility is provided through logs and Hangfire job execution history.

Data Models:
The sync stores row snapshots in a table-backed model mapped from spreadsheet data. The persisted record contains spreadsheet metadata plus selected business fields such as clientid, city, state, zip, purchase_date, vin, year, make, model, cust_type, match_type, and ql_number. The row key is derived from configured key columns or a fallback key strategy.

public record GoogleSheetRowRecord
{
  public string SpreadsheetId { get; init; } = string.Empty;
  public string SheetName { get; init; } = string.Empty;
  public string RowKey { get; init; } = string.Empty;
  public int RowNumber { get; init; }
  public string ClientId { get; init; } = string.Empty;
  public string? City { get; init; }
  public string? State { get; init; }
  public string? Zip { get; init; }
  public DateOnly? PurchaseDate { get; init; }
  public string? Vin { get; init; }
  public int? Year { get; init; }
  public string? Make { get; init; }
  public string? Model { get; init; }
  public string? CustType { get; init; }
  public string? MatchType { get; init; }
  public string? QlNumber { get; init; }
  public DateTime SyncedAt { get; init; } = DateTime.UtcNow;
}

API Interfaces:
No public HTTP API is required for the core sync path. The feature is executed by a recurring Hangfire job through the integration service.

Endpoint Method Parameters Response Response Status Codes
None None None Background job only N/A
Third-Party Integrations:
- Google Drive API for file discovery and optional file conversion
- Google Sheets API for tab and row retrieval
- Hangfire for recurring execution
Workflow:
1. The recurring job google-sheet-sync runs on the cdk queue.
2. The job calls SyncGoogleSheetsAsync in the integration sync service.
3. The service validates Google Sheet configuration.
4. The service resolves the target spreadsheet from the configured Drive folder.
5. If the discovered file is not already a Google Sheet, the service converts it by copying it into Google Sheets format.
6. The service resolves the set of tabs to sync from configuration or spreadsheet metadata.
7. For each tab, the service reads the sheet range, normalizes headers, and maps each row into a GoogleSheetRowRecord.
8. The DAL upserts rows into tblgoogle_sheet_rows using the spreadsheet_id, sheet_name, and row_key uniqueness rule.
9. If delete-missing behavior is enabled, rows that no longer exist in the sheet are removed from the table.
10. The job logs row counts, errors, and status for operational tracking.

Endpoint	Method	Parameters	Response	Response Status Codes
None	None	None	Background job only	N/A

Development Tasks & Estimates

No	Task Name	Estimate (Hours)	Dependencies	Notes
1	Define sync workflow and config contract	3	None	Confirm sheet discovery and row-key rules
2	Implement Drive and Sheets read path	5	Google service account setup	Includes file resolution and tab loading
3	Implement database persistence and pruning	4	Table migration	Upsert plus delete-missing behavior
4	Add Hangfire schedule and job registration	2	Sync service complete	Uses `google-sheet-sync` job key
5	Add tests and validation	4	Implementation complete	Mapper, DAL, and service coverage
6	Total	18	All	Estimate only

Testing & Quality Assurance

Unit Tests:
- Header normalization and row-key generation
- Row mapping from sheet values to GoogleSheetRowRecord
- Config validation and sheet definition selection
- Upsert and delete query behavior in the DAL
Integration Tests:
- Sync against a test spreadsheet in a sandbox Drive folder
- Verify rows are inserted, updated, and pruned correctly
- Verify recurring job execution path completes and logs counts
Acceptance Criteria:
- A valid spreadsheet in the configured Drive folder is discovered successfully
- Sheet rows are stored in tblgoogle_sheet_rows
- Re-running the sync updates existing rows instead of duplicating them
- Missing rows are deleted when delete-missing is enabled
- The job runs on schedule through Hangfire
- Failures are logged and do not leak secrets
Testing Tools:
- xUnit
- Mocking framework for service and DAL abstractions
- Hangfire dashboard or execution logs
- Test Google Drive folder and spreadsheet

Deployment Considerations

Configuration Changes:

Set GoogleSheet:ServiceAccountKeyPath
Set GoogleSheet:DriveFolderId
Confirm GoogleSheet:SyncAllSheets and per-sheet settings
Ensure the service account has access to the target Drive folder and spreadsheet.

  CREATE TABLE IF NOT EXISTS tblgoogle_sheet_rows (
      id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
      spreadsheet_id TEXT NOT NULL,
      sheet_name TEXT NOT NULL,
      row_key TEXT NOT NULL,
      row_number INT NOT NULL,
      clientid TEXT,
      city TEXT,
      state TEXT,
      zip TEXT,
      purchase_date DATE,
      vin TEXT,
      year INT,
      make TEXT,
      model TEXT,
      cust_type TEXT,
      match_type TEXT,
      ql_number TEXT,
      synced_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
      created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
      updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
      CONSTRAINT uq_google_sheet_rows UNIQUE (spreadsheet_id, sheet_name, row_key)
  );
  
  CREATE INDEX IF NOT EXISTS idx_google_sheet_rows_spreadsheet_id ON tblgoogle_sheet_rows (spreadsheet_id);
  CREATE INDEX IF NOT EXISTS idx_google_sheet_rows_sheet_name ON tblgoogle_sheet_rows (sheet_name);
  CREATE INDEX IF NOT EXISTS idx_google_sheet_rows_synced_at ON tblgoogle_sheet_rows (synced_at);
  
  COMMENT ON TABLE tblgoogle_sheet_rows IS 'Row-level snapshots imported from Google Sheets tabs';
  COMMENT ON COLUMN tblgoogle_sheet_rows.spreadsheet_id IS 'Google Spreadsheet ID';
  COMMENT ON COLUMN tblgoogle_sheet_rows.sheet_name IS 'Worksheet/tab name';
  COMMENT ON COLUMN tblgoogle_sheet_rows.row_key IS 'Stable business key derived from one or more columns';
  COMMENT ON COLUMN tblgoogle_sheet_rows.city IS 'City value from the spreadsheet row';
  COMMENT ON COLUMN tblgoogle_sheet_rows.state IS 'State value from the spreadsheet row';
  COMMENT ON COLUMN tblgoogle_sheet_rows.zip IS 'ZIP value from the spreadsheet row';
  COMMENT ON COLUMN tblgoogle_sheet_rows.purchase_date IS 'Purchase date from the spreadsheet row';
  COMMENT ON COLUMN tblgoogle_sheet_rows.vin IS 'VIN from the spreadsheet row';
  COMMENT ON COLUMN tblgoogle_sheet_rows.year IS 'Vehicle year from the spreadsheet row';
  COMMENT ON COLUMN tblgoogle_sheet_rows.make IS 'Vehicle make from the spreadsheet row';
  COMMENT ON COLUMN tblgoogle_sheet_rows.model IS 'Vehicle model from the spreadsheet row';
  COMMENT ON COLUMN tblgoogle_sheet_rows.cust_type IS 'Customer type from the spreadsheet row';
  COMMENT ON COLUMN tblgoogle_sheet_rows.match_type IS 'Match type from the spreadsheet row';
  COMMENT ON COLUMN tblgoogle_sheet_rows.ql_number IS 'QL number from the spreadsheet row';

Rollout Plan:
- Deploy the database migration first
- Deploy the integration service with the new Google Sheet sync job
- Enable the recurring job in tblintegration_job_config
- Validate one test spreadsheet before enabling production data sources

Risks & Mitigations

Risk	Impact	Likelihood	Mitigation Strategy
Incorrect row key selection causes churn or duplicates	High	Medium	Use stable key columns and validate source headers
Service account lacks Drive or Sheets access	High	Medium	Pre-approve access to the shared folder and spreadsheet
File conversion creates an unexpected duplicate spreadsheet	Medium	Low	Prefer native Google Sheets files and monitor conversion logs
Delete-missing mode removes valid rows due to source filtering changes	High	Medium	Restrict delete-missing to controlled sheets and review tab definitions

Review & Approval

Reviewer: Ayon Das
Approval Date: 2026-04-30

Notes
The implementation persists row snapshots rather than raw files. The database schema and job schedule are already defined in DatabaseService/Scripts/000041-GoogleSheetSyncJobAndTable.sql, and the recurring job is mapped to google-sheet-sync in IntegrationService/Scheduler/Infrastructure/JobSchedulerService.cs.

Google Sheet and Drive Integration Sync

Author(s)​

Last Updated Date​

SRS References​

Version History​

Feature Overview​

Requirements​

Design Specifications​

Development Tasks & Estimates​

Testing & Quality Assurance​

Deployment Considerations​

Risks & Mitigations​

Review & Approval​