Skip to main content
Version: MuddVision

Google Sheet and Drive Integration Sync

Author(s)

  • Reshmi Karan

Last Updated Date

2026-04-21


SRS References


Version History

VersionDateChangesAuthor
1.02026-04-21Initial draft for Google Sheet and Drive sync integrationReshmi Karan

Feature Overview

Objective:
Synchronize spreadsheet data from Google Drive into the application database so downstream services can query normalized row snapshots without reading Google APIs directly.

Scope:
This feature discovers a spreadsheet file from a configured Drive folder, resolves or converts it into a Google Sheet if needed, reads one or more tabs, normalizes row values, and stores them in Postgres. It also prunes rows that no longer exist in the source sheet when configured to do so.

Dependencies:

  • Google Drive API and Google Sheets API
  • Service account credentials
  • Postgres database
  • Dapper executor abstraction
  • Hangfire recurring jobs
  • Database migration for tblgoogle_sheet_rows and the google-sheet-sync job config

Requirements

  1. The system must authenticate to Google APIs using a service account JSON key.
  2. The system must discover the spreadsheet file from a configured Drive folder.
  3. The system must support both explicit sheet configuration and sync-all-sheets mode.
  4. The system must normalize headers and map flat rows into a consistent database record model.
  5. The system must store rows in tblgoogle_sheet_rows using spreadsheet_id, sheet_name, and row_key as the unique identity.
  6. The system must update existing rows and insert new rows on every sync.
  7. The system must optionally delete rows that are no longer present in the source sheet.
  8. The system must run as a recurring background job.
  9. The system must log success, warning, and failure states without exposing secrets.
  10. The system must fail safely when configuration is missing or the source file cannot be resolved.

Design Specifications

  • UI/UX Design:
    No user-facing UI is required. This is a background synchronization feature. Operational visibility is provided through logs and Hangfire job execution history.

  • Data Models:
    The sync stores row snapshots in a table-backed model mapped from spreadsheet data. The persisted record contains spreadsheet metadata plus selected business fields such as clientid, city, state, zip, purchase_date, vin, year, make, model, cust_type, match_type, and ql_number. The row key is derived from configured key columns or a fallback key strategy.

    public record GoogleSheetRowRecord
    {
    public string SpreadsheetId { get; init; } = string.Empty;
    public string SheetName { get; init; } = string.Empty;
    public string RowKey { get; init; } = string.Empty;
    public int RowNumber { get; init; }
    public string ClientId { get; init; } = string.Empty;
    public string? City { get; init; }
    public string? State { get; init; }
    public string? Zip { get; init; }
    public DateOnly? PurchaseDate { get; init; }
    public string? Vin { get; init; }
    public int? Year { get; init; }
    public string? Make { get; init; }
    public string? Model { get; init; }
    public string? CustType { get; init; }
    public string? MatchType { get; init; }
    public string? QlNumber { get; init; }
    public DateTime SyncedAt { get; init; } = DateTime.UtcNow;
    }
  • API Interfaces:
    No public HTTP API is required for the core sync path. The feature is executed by a recurring Hangfire job through the integration service.

    EndpointMethodParametersResponseResponse Status Codes
    NoneNoneNoneBackground job onlyN/A
  • Third-Party Integrations:

    • Google Drive API for file discovery and optional file conversion
    • Google Sheets API for tab and row retrieval
    • Hangfire for recurring execution
  • Workflow:

    1. The recurring job google-sheet-sync runs on the cdk queue.
    2. The job calls SyncGoogleSheetsAsync in the integration sync service.
    3. The service validates Google Sheet configuration.
    4. The service resolves the target spreadsheet from the configured Drive folder.
    5. If the discovered file is not already a Google Sheet, the service converts it by copying it into Google Sheets format.
    6. The service resolves the set of tabs to sync from configuration or spreadsheet metadata.
    7. For each tab, the service reads the sheet range, normalizes headers, and maps each row into a GoogleSheetRowRecord.
    8. The DAL upserts rows into tblgoogle_sheet_rows using the spreadsheet_id, sheet_name, and row_key uniqueness rule.
    9. If delete-missing behavior is enabled, rows that no longer exist in the sheet are removed from the table.
    10. The job logs row counts, errors, and status for operational tracking.

Development Tasks & Estimates

NoTask NameEstimate (Hours)DependenciesNotes
1Define sync workflow and config contract3NoneConfirm sheet discovery and row-key rules
2Implement Drive and Sheets read path5Google service account setupIncludes file resolution and tab loading
3Implement database persistence and pruning4Table migrationUpsert plus delete-missing behavior
4Add Hangfire schedule and job registration2Sync service completeUses google-sheet-sync job key
5Add tests and validation4Implementation completeMapper, DAL, and service coverage
6Total18AllEstimate only

Testing & Quality Assurance

  • Unit Tests:

    • Header normalization and row-key generation
    • Row mapping from sheet values to GoogleSheetRowRecord
    • Config validation and sheet definition selection
    • Upsert and delete query behavior in the DAL
  • Integration Tests:

    • Sync against a test spreadsheet in a sandbox Drive folder
    • Verify rows are inserted, updated, and pruned correctly
    • Verify recurring job execution path completes and logs counts
  • Acceptance Criteria:

    • A valid spreadsheet in the configured Drive folder is discovered successfully
    • Sheet rows are stored in tblgoogle_sheet_rows
    • Re-running the sync updates existing rows instead of duplicating them
    • Missing rows are deleted when delete-missing is enabled
    • The job runs on schedule through Hangfire
    • Failures are logged and do not leak secrets
  • Testing Tools:

    • xUnit
    • Mocking framework for service and DAL abstractions
    • Hangfire dashboard or execution logs
    • Test Google Drive folder and spreadsheet

Deployment Considerations

  • Configuration Changes:

    • Set GoogleSheet:ServiceAccountKeyPath
    • Set GoogleSheet:DriveFolderId
    • Confirm GoogleSheet:SyncAllSheets and per-sheet settings
    • Ensure the service account has access to the target Drive folder and spreadsheet.
      CREATE TABLE IF NOT EXISTS tblgoogle_sheet_rows (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    spreadsheet_id TEXT NOT NULL,
    sheet_name TEXT NOT NULL,
    row_key TEXT NOT NULL,
    row_number INT NOT NULL,
    clientid TEXT,
    city TEXT,
    state TEXT,
    zip TEXT,
    purchase_date DATE,
    vin TEXT,
    year INT,
    make TEXT,
    model TEXT,
    cust_type TEXT,
    match_type TEXT,
    ql_number TEXT,
    synced_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    CONSTRAINT uq_google_sheet_rows UNIQUE (spreadsheet_id, sheet_name, row_key)
    );

    CREATE INDEX IF NOT EXISTS idx_google_sheet_rows_spreadsheet_id ON tblgoogle_sheet_rows (spreadsheet_id);
    CREATE INDEX IF NOT EXISTS idx_google_sheet_rows_sheet_name ON tblgoogle_sheet_rows (sheet_name);
    CREATE INDEX IF NOT EXISTS idx_google_sheet_rows_synced_at ON tblgoogle_sheet_rows (synced_at);

    COMMENT ON TABLE tblgoogle_sheet_rows IS 'Row-level snapshots imported from Google Sheets tabs';
    COMMENT ON COLUMN tblgoogle_sheet_rows.spreadsheet_id IS 'Google Spreadsheet ID';
    COMMENT ON COLUMN tblgoogle_sheet_rows.sheet_name IS 'Worksheet/tab name';
    COMMENT ON COLUMN tblgoogle_sheet_rows.row_key IS 'Stable business key derived from one or more columns';
    COMMENT ON COLUMN tblgoogle_sheet_rows.city IS 'City value from the spreadsheet row';
    COMMENT ON COLUMN tblgoogle_sheet_rows.state IS 'State value from the spreadsheet row';
    COMMENT ON COLUMN tblgoogle_sheet_rows.zip IS 'ZIP value from the spreadsheet row';
    COMMENT ON COLUMN tblgoogle_sheet_rows.purchase_date IS 'Purchase date from the spreadsheet row';
    COMMENT ON COLUMN tblgoogle_sheet_rows.vin IS 'VIN from the spreadsheet row';
    COMMENT ON COLUMN tblgoogle_sheet_rows.year IS 'Vehicle year from the spreadsheet row';
    COMMENT ON COLUMN tblgoogle_sheet_rows.make IS 'Vehicle make from the spreadsheet row';
    COMMENT ON COLUMN tblgoogle_sheet_rows.model IS 'Vehicle model from the spreadsheet row';
    COMMENT ON COLUMN tblgoogle_sheet_rows.cust_type IS 'Customer type from the spreadsheet row';
    COMMENT ON COLUMN tblgoogle_sheet_rows.match_type IS 'Match type from the spreadsheet row';
    COMMENT ON COLUMN tblgoogle_sheet_rows.ql_number IS 'QL number from the spreadsheet row';
  • Rollout Plan:

    • Deploy the database migration first
    • Deploy the integration service with the new Google Sheet sync job
    • Enable the recurring job in tblintegration_job_config
    • Validate one test spreadsheet before enabling production data sources

Risks & Mitigations

RiskImpactLikelihoodMitigation Strategy
Incorrect row key selection causes churn or duplicatesHighMediumUse stable key columns and validate source headers
Service account lacks Drive or Sheets accessHighMediumPre-approve access to the shared folder and spreadsheet
File conversion creates an unexpected duplicate spreadsheetMediumLowPrefer native Google Sheets files and monitor conversion logs
Delete-missing mode removes valid rows due to source filtering changesHighMediumRestrict delete-missing to controlled sheets and review tab definitions

Review & Approval

  • Reviewer: Ayon Das

  • Approval Date: 2026-04-30


Notes
The implementation persists row snapshots rather than raw files. The database schema and job schedule are already defined in DatabaseService/Scripts/000041-GoogleSheetSyncJobAndTable.sql, and the recurring job is mapped to google-sheet-sync in IntegrationService/Scheduler/Infrastructure/JobSchedulerService.cs.