Auto-Sync POC
Author(s)
- Kaustubh Paul
- Sarnick Chakraborty
Submission Date
2026-01-29
Version History
| Version | Date | Changes | Author |
|---|---|---|---|
| 1.0.0 | 2026-01-29 | Initial POC Documentation | Kaustubh Paul, Sarnick Chakraborty |
Objective
The objective of the Auto-Sync POC is to develop a desktop application that automates the synchronization of video depositions with their transcripts. The tool aims to eliminate manual timestamping by using audio extraction, AI-based alignment, and a "Human-in-the-Loop" editing interface, ensuring high-accuracy outputs (.smi, .dvt, .syn) for legal review platforms.
Scope
The POC covers the end-to-end workflow for a single user session:
- Media Ingestion: Drag-and-drop support for Video (.mp4) and Transcript (.txt) files.
- Preprocessing: Local audio extraction using FFmpeg and text sanitization.
- Synchronization: Automated alignment of text to audio via backend API and local Dynamic Time Warping (DTW).
- Review & Edit: A purely frontend editor to adjust timestamps and text before finalization.
- Project Persistence: Ability to save/load sessions and import existing packages via .syn metadata files.
- Export: Generation of industry-standard synchronized files (.smi, .dvt, .syn) zipped with media.
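The export step can be illustrated with a minimal sketch of .smi (SAMI) generation. The Cue shape, the buildSmi name, the ENUSCC class, and the header contents are assumptions for illustration; the POC's actual output template is not specified in this document.

```typescript
// A single aligned subtitle cue. This shape is an assumption for illustration.
interface Cue {
  startMs: number; // start time in milliseconds from the beginning of the media
  text: string;
}

// Escape characters that a SAMI reader would otherwise treat as markup.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Build a minimal SAMI document: one <SYNC> block per cue, ordered by start time.
function buildSmi(cues: Cue[], title = "Deposition"): string {
  const body = [...cues]
    .sort((a, b) => a.startMs - b.startMs)
    .map(
      (cue) =>
        `<SYNC Start=${Math.round(cue.startMs)}><P Class=ENUSCC>${escapeHtml(cue.text)}</P></SYNC>`
    )
    .join("\n");

  return [
    "<SAMI>",
    "<HEAD>",
    `<TITLE>${escapeHtml(title)}</TITLE>`,
    '<STYLE TYPE="text/css"><!--',
    ".ENUSCC { Name: English; lang: en-US; }",
    "--></STYLE>",
    "</HEAD>",
    "<BODY>",
    body,
    "</BODY>",
    "</SAMI>",
  ].join("\n");
}
```

Sorting a copy of the cues keeps the caller's array untouched, which matters if the same array also backs the editor view.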
Out of Scope
- Multiple Video Handling
- Fault Tolerant Mechanism
Related Features
- Start Line Selection: Users can visually select a specific line in the transcript preview to define the start point for synchronization, ignoring header text.
- Session Recovery: Auto-save functionality using localStorage to recover progress (video paths and transcript text) in case of app closure.
- Import Mode: Re-opening a project by parsing the .syn file to load video metadata and subtitles into the editor without re-processing.
- Job Summary: A post-process view displaying processing metrics (confidence, duration) with an option to loop back to the editor or let the user export the project.
- Auto Update: The desktop app receives version updates automatically, and the user can install them at any time.
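Start Line Selection and the text sanitization step could be combined roughly as follows. This is a sketch under stated assumptions: the prepareTranscript name is hypothetical, startLine is taken to be a zero-based index into the raw preview lines, and the sanitization rules shown (newline normalization, whitespace collapsing, blank-line removal) are illustrative and not necessarily the POC's.

```typescript
// Normalize raw transcript text and drop everything before the selected line.
// `startLine` is assumed to be the zero-based index of the line the user clicked.
function prepareTranscript(raw: string, startLine: number): string[] {
  // Treat Windows (\r\n) and old Mac (\r) line endings the same as \n.
  const lines = raw.replace(/\r\n?/g, "\n").split("\n");

  if (!Number.isInteger(startLine) || startLine < 0 || startLine >= lines.length) {
    throw new RangeError(`startLine ${startLine} is outside 0..${lines.length - 1}`);
  }

  return lines
    .slice(startLine) // ignore header text above the selection
    .map((line) => line.replace(/\s+/g, " ").trim()) // collapse runs of whitespace
    .filter((line) => line.length > 0); // drop blank lines
}
```

Validating the index up front means a stale selection (for example, after the transcript file is replaced) fails loudly instead of silently aligning from the wrong line.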
Technical Approach
The application is built using the Tauri framework for a secure, performant desktop experience.
- Frontend: React (TypeScript) with Material UI for the interface.
- Core Logic: Custom React Hooks (useTranscriptionWorkflow, usePackageImport) manage the complex state machines.
- Local Processing:
  - FFmpeg Sidecar: Executed via the Tauri shell plugin to extract audio from video files locally.
  - File System: Direct OS-level file manipulation (read/write/mkdir) for project structure creation.
- Backend: A C# API endpoint (/finaltranscript) receives audio and text to perform the heavy alignment calculation.
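The local audio extraction step comes down to an FFmpeg argument list. The sketch below builds one. The function name, the WAV/PCM output, and the 16 kHz mono defaults are assumptions chosen for illustration; the document does not state which audio format the backend expects.

```typescript
// Options for the extracted audio. The defaults are assumptions, not POC settings.
interface AudioOptions {
  sampleRateHz?: number;
  channels?: number;
}

// Build arguments equivalent to:
//   ffmpeg -y -i <input> -vn -acodec pcm_s16le -ar <rate> -ac <channels> <output>
function buildExtractAudioArgs(
  inputPath: string,
  outputPath: string,
  options: AudioOptions = {}
): string[] {
  const { sampleRateHz = 16000, channels = 1 } = options;
  return [
    "-y", // overwrite the output file if it already exists
    "-i", inputPath, // source video
    "-vn", // drop the video stream
    "-acodec", "pcm_s16le", // 16-bit PCM audio
    "-ar", String(sampleRateHz), // output sample rate
    "-ac", String(channels), // output channel count
    outputPath,
  ];
}
```

Passing paths as separate array elements, instead of concatenating a command string, avoids quoting problems with spaces in file names. In a Tauri app this array would typically be handed to the shell plugin's Command.sidecar call, using whatever sidecar name the project configures.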
Workflow
Dependencies
Frontend (Node/React)
Managed via package.json
| Category | Key Packages | Version | Purpose |
|---|---|---|---|
| Core Framework | react, react-dom | ^19.1.0 | UI Rendering |
| Core Framework | vite | ^7.0.4 | Build Tooling |
| Logic & Data | axios | ^1.13.2 | API Requests |
| Logic & Data | jszip | ^3.10.1 | Zip Generation |
| Logic & Data | fuzzball | ^2.2.3 | Text Alignment (DTW) |
| UI Components | @mui/material | ^7.3.7 | Design System |
| UI Components | @dnd-kit/core | ^6.3.1 | Drag & Drop |
| Tauri Bindings | @tauri-apps/api | ^2.0.0 | OS Interaction |
| Tauri Bindings | @tauri-apps/plugin-shell | ^2.3.4 | FFmpeg Process |
| Tauri Bindings | @tauri-apps/plugin-fs | ^2.4.5 | File System |
Backend (Rust/Tauri)
Managed via src-tauri/Cargo.toml
| Crate | Version | Purpose |
|---|---|---|
| tauri | 2.0 | Core Application Framework |
| tauri-plugin-shell | 2.0 | Shell command execution |
| tauri-plugin-fs | 2.0 | File system operations |
Development Tasks & Estimates
| No | Task Name | Estimate (Hours) | Dependencies | Notes |
|---|---|---|---|---|
| 1 | Project Setup & Tauri Config | 4 hours | None | Configure sidecars (FFmpeg) and permissions. |
| 2 | UI Shell & Navigation | 6 hours | Task 1 | Set up App.tsx and routing. |
| 3 | Transcription Workflow Hook | 12 hours | Task 2 | Implement useTranscriptionWorkflow.ts. |
| 4 | FFmpeg Integration | 6 hours | Task 1 | Implement binary execution and error handling. |
| 5 | Editor & Preview Components | 10 hours | Task 3 | Sync player and text correction UI. |
| 6 | Import/Export Logic | 8 hours | Task 5 | Parsing .syn/.smi and zipping output. |
Testing & Validation
- Basic Tests:
  - Upload: Verify drag-and-drop accepts only valid extensions (.mp4, .txt) and rejects others.
  - Sync Flow: Run a 5-minute video; confirm .smi output timestamps match audio.
  - Edit Loop: Modify a subtitle in "Step 3", export, and verify the change persists in the generated file.
  - Import: Load a folder containing a .syn file and ensure the editor populates correctly.
- Validation Criteria:
  - The application must not crash if the API returns a 500 error (Error handling check).
  - Exported folders must contain the following items:
    - media/video.mp4
    - media/subtitle.smi
    - transcription/transcript.txt
    - project.dvt
    - project.syn
  - Resuming a session must restore the exact video list order.
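The export-contents criterion lends itself to an automated check. The sketch below takes the list of entry names from an exported package and reports which required items are missing. The function and constant names are hypothetical, and the path normalization (backslashes and a leading "./") is an assumption about how entry names might arrive.

```typescript
// Required package contents, taken from the validation criteria above.
const REQUIRED_EXPORT_ENTRIES = [
  "media/video.mp4",
  "media/subtitle.smi",
  "transcription/transcript.txt",
  "project.dvt",
  "project.syn",
] as const;

// Return the required entries that are absent from `entries`.
// An empty result means the package layout passes the check.
function missingExportEntries(entries: Iterable<string>): string[] {
  const present = new Set(
    Array.from(entries, (entry) =>
      entry
        .replace(/\\/g, "/") // normalize Windows separators
        .replace(/^\.?\//, "") // strip a leading "./" or "/"
    )
  );
  return REQUIRED_EXPORT_ENTRIES.filter((required) => !present.has(required));
}
```

With JSZip, the entry names could come from Object.keys(zip.files) after loading the generated archive, so the same check works on the zipped output as well as on a directory listing.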
Risks & Mitigations
| Risk | Impact | Likelihood | Mitigation Strategy |
|---|---|---|---|
| Misalignment of Start Time: User provides a transcript with headers (e.g., "Page 1") that don't match audio start. | High | High | Preview Selection: Allow user to scroll through the transcript preview and click the exact line where audio begins. Pass this startLine index to the backend to offset alignment. |
| Redundant Uploads in Edit Mode: User goes back from "Job Summary" to "Edit" and triggers a re-upload/re-process. | High | Medium | State Persistence: The "Back to Results" path in TranscriptionPage maintains the mappedResult in memory. The workflow checks for existing processing data so the heavy API/FFmpeg steps are skipped during the loop. |
| Unmaintainable State Logic: Complex "Upload -> Sync -> Edit -> Export" flows leading to spaghetti code. | Medium | High | Modular Architecture: Separate concerns into distinct files (e.g., useTranscriptionWorkflow.ts for logic, ResultsDisplay.tsx for UI). Use useEffect cleanup to handle component unmounting. |
| Import Integrity Failure: User imports a folder with missing media or corrupt metadata. | High | Low | Metadata-Driven Init: The process is initiated strictly by the .syn file. The usePackageImport hook validates that the .syn file exists and that the referenced media files are actually present in the /media directory before allowing the editor to load. |
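The import integrity mitigation can be sketched as a pure check that runs before the editor loads. Everything here is illustrative: the checkImportFolder name and ImportCheck shape are hypothetical, the list of referenced media is assumed to have been read from the .syn file already (its format is not described in this document), and the case-insensitive comparison is an assumption.

```typescript
// Result of validating an import folder. This shape is an assumption.
interface ImportCheck {
  ok: boolean;
  errors: string[];
}

// `folderEntries`: relative paths found in the folder the user selected.
// `referencedMedia`: media file names listed in the .syn file (parsing not shown).
function checkImportFolder(folderEntries: string[], referencedMedia: string[]): ImportCheck {
  // Normalize separators and case so "media\Video.MP4" matches "media/video.mp4".
  const entries = new Set(
    folderEntries.map((entry) => entry.replace(/\\/g, "/").toLowerCase())
  );
  const errors: string[] = [];

  const hasSyn = Array.from(entries).some((entry) => entry.endsWith(".syn"));
  if (!hasSyn) {
    errors.push("No .syn metadata file found");
  }

  for (const name of referencedMedia) {
    if (!entries.has(`media/${name.toLowerCase()}`)) {
      errors.push(`Missing media file: media/${name}`);
    }
  }

  return { ok: errors.length === 0, errors };
}
```

Collecting every problem, instead of stopping at the first one, lets the UI show the user a complete list of what to fix in a single pass.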
Conclusion
The POC demonstrates that a hybrid approach, using local processing for heavy media handling (FFmpeg) and cloud APIs for AI-based alignment, provides a good balance of performance and accuracy. The "Edit & Resume" loop lets users refine AI outputs without restarting the workflow, which gives the tool a practical path toward production use by legal professionals.