Version: SyncExpress

Auto-Sync POC

Author(s)

  • Kaustubh Paul
  • Sarnick Chakraborty

Submission Date

2026-01-29


Version History

| Version | Date | Changes | Author |
| --- | --- | --- | --- |
| 1.0.0 | 2026-01-29 | Initial POC Documentation | Kaustubh Paul, Sarnick Chakraborty |

Objective

The objective of the Auto-Sync POC is to develop a desktop application that automates the synchronization of video depositions with their transcripts. The tool aims to eliminate manual timestamping by using audio extraction, AI-based alignment, and a "Human-in-the-Loop" editing interface, ensuring high-accuracy outputs (.smi, .dvt, .syn) for legal review platforms.


Scope

The POC covers the end-to-end workflow for a single user session:

  1. Media Ingestion: Drag-and-drop support for Video (.mp4) and Transcript (.txt) files.
  2. Preprocessing: Local audio extraction using FFmpeg and text sanitization.
  3. Synchronization: Automated alignment of text to audio via backend API and local Dynamic Time Warping (DTW).
  4. Review & Edit: A purely frontend editor to adjust timestamps and text before finalization.
  5. Project Persistence: Ability to save/load sessions and import existing packages via .syn metadata files.
  6. Export: Generation of industry-standard synchronized files (.smi, .dvt, .syn) zipped with media.
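
The local DTW pass in step 3 can be illustrated with a minimal sketch. It assumes the backend returns a list of recognized words with start times (the `TimedWord` shape and the function name are hypothetical, not taken from the project) and uses a simple exact-match cost of 0 or 1; a fuzzy similarity score could be substituted for that cost.

```typescript
// Hypothetical shape: a recognized word with its start time in milliseconds.
interface TimedWord {
  word: string;
  startMs: number;
}

// Normalize a token so that case and punctuation do not affect matching.
const normalize = (w: string): string =>
  w.toLowerCase().replace(/[^a-z0-9]/g, "");

// Align transcript words to timed words with classic DTW.
// Returns one start time (ms) per transcript word.
function alignWithDtw(transcript: string[], timed: TimedWord[]): number[] {
  const n = transcript.length;
  const m = timed.length;
  if (n === 0 || m === 0) return [];

  // cost[i][j]: cheapest cumulative cost of aligning the first i transcript
  // words with the first j timed words.
  const cost: number[][] = Array.from({ length: n + 1 }, () =>
    new Array<number>(m + 1).fill(Infinity)
  );
  cost[0][0] = 0;

  for (let i = 1; i <= n; i++) {
    for (let j = 1; j <= m; j++) {
      const same = normalize(transcript[i - 1]) === normalize(timed[j - 1].word);
      const d = same ? 0 : 1; // simplification: exact match only
      cost[i][j] =
        d + Math.min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1]);
    }
  }

  // Backtrack from the end of both sequences. A transcript word that spans
  // several timed words keeps the earliest start time it was paired with.
  const result = new Array<number>(n).fill(0);
  let i = n;
  let j = m;
  while (i > 0 && j > 0) {
    result[i - 1] = timed[j - 1].startMs;
    const diag = cost[i - 1][j - 1];
    const up = cost[i - 1][j];
    const left = cost[i][j - 1];
    if (diag <= up && diag <= left) {
      i--;
      j--;
    } else if (up <= left) {
      i--;
    } else {
      j--;
    }
  }
  return result;
}
```

Because DTW tolerates insertions on either side, filler words in the audio that are missing from the transcript do not shift the timestamps of the words that follow.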

Out of Scope

  1. Multiple Video Handling
  2. Fault-Tolerant Mechanism

Key Features

  • Start Line Selection: Users can visually select a specific line in the transcript preview to define the start point for synchronization, ignoring header text.
  • Session Recovery: Auto-save functionality using localStorage to recover progress (video paths and transcript text) in case of app closure.
  • Import Mode: Re-opening a project by parsing the .syn file to load video metadata and subtitles into the editor without re-processing.
  • Job Summary: A post-process view displaying processing metrics (confidence, duration), with options to return to the editor or export the project.
  • Auto Update: The desktop app receives version updates automatically, and the user can install them at any time.
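
Session recovery and start line selection can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the `SessionSnapshot` shape, the storage key, and the function names are hypothetical. The storage is passed in as a small interface that `window.localStorage` satisfies, so the logic can be exercised outside the browser.

```typescript
// Minimal subset of the Web Storage API; window.localStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical snapshot of the data the document says is auto-saved.
interface SessionSnapshot {
  videoPaths: string[];
  transcriptText: string;
  startLine: number; // zero-based index of the line where audio begins
}

const SESSION_KEY = "autosync.session"; // hypothetical key name

function saveSession(store: KeyValueStore, snapshot: SessionSnapshot): void {
  store.setItem(SESSION_KEY, JSON.stringify(snapshot));
}

// Returns null when nothing is stored or the stored value is unusable.
function loadSession(store: KeyValueStore): SessionSnapshot | null {
  const raw = store.getItem(SESSION_KEY);
  if (raw === null) return null;
  try {
    const parsed = JSON.parse(raw) as Partial<SessionSnapshot>;
    if (
      !Array.isArray(parsed.videoPaths) ||
      typeof parsed.transcriptText !== "string"
    ) {
      return null;
    }
    return {
      videoPaths: parsed.videoPaths, // array order is preserved by JSON
      transcriptText: parsed.transcriptText,
      startLine: typeof parsed.startLine === "number" ? parsed.startLine : 0,
    };
  } catch {
    return null; // corrupted or non-object JSON
  }
}

// Drop header lines (e.g. "Page 1") that precede the selected start line.
function transcriptFromStartLine(text: string, startLine: number): string[] {
  return text.split(/\r?\n/).slice(startLine);
}
```

Validating the parsed value before use means a stale or corrupted entry results in a fresh session rather than a crash on startup.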

Technical Approach

The application is built using the Tauri framework for a secure, performant desktop experience.

  • Frontend: React (TypeScript) with Material UI for the interface.
  • Core Logic: Custom React Hooks (useTranscriptionWorkflow, usePackageImport) manage the complex state machines.
  • Local Processing:
    • FFmpeg Sidecar: Executed via Tauri shell to strip audio from video files locally.
    • File System: Direct OS-level file manipulation (read/write/mkdir) for project structure creation.
  • Backend: A C# API endpoint (/finaltranscript) receives audio and text to perform the heavy alignment calculation.
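
As an illustration of the local audio-extraction step, the sketch below builds an FFmpeg argument list that drops the video stream and writes uncompressed WAV audio. The function name, the 16 kHz mono defaults, and the WAV output format are assumptions made for this example; the document does not state which settings the POC uses.

```typescript
// Hypothetical options; the defaults are assumptions, not project settings.
interface AudioExtractOptions {
  sampleRateHz?: number;
  channels?: number;
}

// Build the FFmpeg arguments for stripping audio from a video file.
function buildAudioExtractArgs(
  inputVideo: string,
  outputWav: string,
  opts: AudioExtractOptions = {}
): string[] {
  const { sampleRateHz = 16000, channels = 1 } = opts;
  return [
    "-y",                      // overwrite the output file if it exists
    "-i", inputVideo,          // input video
    "-vn",                     // discard the video stream
    "-acodec", "pcm_s16le",    // 16-bit PCM audio
    "-ar", String(sampleRateHz), // audio sample rate
    "-ac", String(channels),   // number of audio channels
    outputWav,
  ];
}
```

In the app, an array like this would be passed to the FFmpeg sidecar through the Tauri shell plugin. Keeping the argument construction in a pure function makes it testable without launching a process.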

Workflow

[Figure: Auto Sync Workflow]


Dependencies

Frontend (Node/React)

Managed via package.json

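| Category | Key Packages | Version | Purpose |
| --- | --- | --- | --- |
| Core Framework | react, react-dom | ^19.1.0 | UI Rendering |
| Core Framework | vite | ^7.0.4 | Build Tooling |
| Logic & Data | axios | ^1.13.2 | API Requests |
| Logic & Data | jszip | ^3.10.1 | Zip Generation |
| Logic & Data | fuzzball | ^2.2.3 | Fuzzy string matching for Text Alignment (DTW) |
| UI Components | @mui/material | ^7.3.7 | Design System |
| UI Components | @dnd-kit/core | ^6.3.1 | Drag & Drop |
| Tauri Bindings | @tauri-apps/api | ^2.0.0 | OS Interaction |
| Tauri Bindings | @tauri-apps/plugin-shell | ^2.3.4 | FFmpeg Process |
| Tauri Bindings | @tauri-apps/plugin-fs | ^2.4.5 | File System |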

Backend (Rust/Tauri)

Managed via src-tauri/Cargo.toml

| Crate | Version | Purpose |
| --- | --- | --- |
| tauri | 2.0 | Core Application Framework |
| tauri-plugin-shell | 2.0 | Shell command execution |
| tauri-plugin-fs | 2.0 | File system operations |

Development Tasks & Estimates

| No | Task Name | Estimate (Hours) | Dependencies | Notes |
| --- | --- | --- | --- | --- |
| 1 | Project Setup & Tauri Config | 4 | None | Config sidecars (FFmpeg) and permissions. |
| 2 | UI Shell & Navigation | 6 | Task 1 | Setup App.tsx and routing. |
| 3 | Transcription Workflow Hook | 12 | Task 2 | Implement useTranscriptionWorkflow.ts. |
| 4 | FFmpeg Integration | 6 | Task 1 | Implement binary execution and error handling. |
| 5 | Editor & Preview Components | 10 | Task 3 | Sync player and text correction UI. |
| 6 | Import/Export Logic | 8 | Task 5 | Parsing .syn/.smi and Zipping output. |

Testing & Validation

  • Basic Tests:

    • Upload: Verify drag-and-drop accepts only valid extensions (.mp4, .txt) and rejects others.
    • Sync Flow: Run a 5-minute video; confirm .smi output timestamps match audio.
    • Edit Loop: Modify a subtitle in "Step 3", export, and verify the change persists in the generated file.
    • Import: Load a folder containing a .syn file and ensure the editor populates correctly.
  • Validation Criteria:

    • The application must not crash if the API returns a 500 error (Error handling check).
    • Exported folders must contain the following items:
      • media/video.mp4
      • media/subtitle.smi
      • transcription/transcript.txt
      • project.dvt
      • project.syn
    • Resuming a session must restore the exact video list order.
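
Two of the checks above lend themselves to small pure functions: the upload extension filter and the export layout check. The sketch below is illustrative only; the function names are hypothetical, and treating extensions as case-insensitive is an assumption.

```typescript
// Extensions accepted by the upload step, per the Basic Tests list.
const ACCEPTED_UPLOAD_EXTENSIONS = [".mp4", ".txt"];

// True when the file name ends in an accepted extension.
// Assumption: the comparison is case-insensitive.
function isAcceptedUpload(fileName: string): boolean {
  const lower = fileName.toLowerCase();
  return ACCEPTED_UPLOAD_EXTENSIONS.some(
    (ext) => lower.endsWith(ext) && lower.length > ext.length
  );
}

// Items every exported folder must contain, per the Validation Criteria.
const REQUIRED_EXPORT_PATHS = [
  "media/video.mp4",
  "media/subtitle.smi",
  "transcription/transcript.txt",
  "project.dvt",
  "project.syn",
] as const;

// Returns the required paths that are absent from the given listing.
// Paths are relative to the export root; backslashes and a leading "./"
// are normalized so Windows-style listings compare correctly.
function findMissingExportPaths(actualPaths: string[]): string[] {
  const present = new Set(
    actualPaths.map((p) => p.replace(/\\/g, "/").replace(/^\.\//, ""))
  );
  return REQUIRED_EXPORT_PATHS.filter((p) => !present.has(p));
}
```

An empty result from `findMissingExportPaths` means the export satisfies the folder-content criterion; any returned entries can be reported directly in a test failure message.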

Risks & Mitigations

| Risk | Impact | Likelihood | Mitigation Strategy |
| --- | --- | --- | --- |
| Misalignment of Start Time: User provides a transcript with headers (e.g., "Page 1") that don't match audio start. | High | High | Preview Selection: Allow user to scroll through the transcript preview and click the exact line where audio begins. Pass this startLine index to the backend to offset alignment. |
| Redundant Uploads in Edit Mode: User goes back from "Job Summary" to "Edit" and triggers a re-upload/re-process. | High | Medium | State Persistence: The "Back to Results" path in TranscriptionPage maintains the mappedResult in memory. The workflow checks for existing processing data so the heavy API/FFmpeg steps are skipped during the loop. |
| Unmaintainable State Logic: Complex "Upload -> Sync -> Edit -> Export" flows leading to spaghetti code. | Medium | High | Modular Architecture: Separate concerns into distinct files (e.g., useTranscriptionWorkflow.ts for logic, ResultsDisplay.tsx for UI). Use useEffect cleanup to handle component unmounting. |
| Import Integrity Failure: User imports a folder with missing media or corrupt metadata. | High | Low | Metadata-Driven Init: The process is initiated strictly by the .syn file. The usePackageImport hook validates that the .syn file exists and that the referenced media files are actually present in the /media directory before allowing the editor to load. |
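
The state-persistence and modular-architecture mitigations can be pictured as an explicit reducer. The following is a minimal sketch with hypothetical step and event names, not the actual contents of useTranscriptionWorkflow.ts. It shows how keeping `mappedResult` in state lets the "Job Summary" to "Edit" loop skip the processing step.

```typescript
// Hypothetical workflow steps for the Upload -> Sync -> Edit -> Export flow.
type Step = "upload" | "processing" | "editing" | "summary";

interface WorkflowState {
  step: Step;
  // Aligned subtitle lines kept in memory; null until the first sync finishes.
  mappedResult: string[] | null;
}

type WorkflowEvent =
  | { type: "START_SYNC" }
  | { type: "SYNC_DONE"; result: string[] }
  | { type: "FINISH_EDIT" }
  | { type: "BACK_TO_EDIT" };

const initialWorkflowState: WorkflowState = { step: "upload", mappedResult: null };

function workflowReducer(
  state: WorkflowState,
  event: WorkflowEvent
): WorkflowState {
  switch (event.type) {
    case "START_SYNC":
      // If a result already exists, go straight to the editor and skip the
      // heavy API/FFmpeg steps.
      return state.mappedResult !== null
        ? { ...state, step: "editing" }
        : { ...state, step: "processing" };
    case "SYNC_DONE":
      return { step: "editing", mappedResult: event.result };
    case "FINISH_EDIT":
      return { ...state, step: "summary" };
    case "BACK_TO_EDIT":
      // The in-memory result is preserved, so nothing is re-uploaded.
      return { ...state, step: "editing" };
    default:
      return state;
  }
}
```

A pure reducer of this shape can be plugged into React's `useReducer` inside a custom hook, which keeps the transition rules in one place and makes them unit-testable without rendering components.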

Conclusion

The POC demonstrates that a hybrid approach, using local processing for heavy media handling (FFmpeg) and cloud APIs for alignment intelligence, balances performance and accuracy. The "Edit & Resume" loop lets users refine AI outputs without restarting the workflow, which positions the tool for production use by legal professionals.