Crawler Empty-Result Screenshot Upload
Author(s)
- Alapan Das
Last Updated Date
2026-04-23
Version History
| Version | Date | Changes | Author |
|---|---|---|---|
| 1.0 | 2026-04-23 | Initial draft for crawler empty-result screenshot upload flow | Alapan |
Purpose
When a listing crawl returns no results or no usable page content, the crawler captures a screenshot and uploads it to an external API for later debugging.
Scope
This applies to list-page crawling in VehicleScraper.scrape_listings(...) for both Facebook Marketplace and Craigslist.
High-Level Flow
- Crawl page is loaded and extraction is attempted.
- If the scraper detects an empty or failed result, it captures a full-page screenshot.
- The screenshot is uploaded through a multipart/form-data API call.
- The upload target path is derived as error/{jobId}/{image}.
Trigger Conditions
A screenshot upload is attempted when:
- Extracted listings are empty.
- Page content is missing (for example, no HTML is returned).
- Craigslist list selector is not found.
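The trigger conditions above can be sketched as a single predicate. This is a minimal illustration, not the code in scraper.py: the function name should_capture_screenshot and its parameters are hypothetical, and treating whitespace-only HTML as missing content is an assumption.

```python
from typing import Optional, Sequence


def should_capture_screenshot(
    listings: Optional[Sequence[dict]],
    html: Optional[str],
    list_selector_found: bool = True,
) -> bool:
    """Return True when any of the documented trigger conditions holds.

    Hypothetical helper; the names and signature are illustrative only.
    """
    # Page content is missing, including no HTML at all.
    # Assumption: whitespace-only HTML also counts as missing.
    if not html or not html.strip():
        return True
    # Craigslist list selector was not found on the page.
    if not list_selector_found:
        return True
    # Extraction ran but produced no listings.
    return not listings
```

Keeping the check in one place would let both the Facebook Marketplace and Craigslist paths share the same decision, with list_selector_found left at its default for platforms that do not use a list selector.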
Upload API Contract
The uploader sends a multipart request equivalent to:
import requests

url = f"{BASE_URL}/upload"

# Keep the file open while the request is sent so the body can be read.
with open(file_path, "rb") as f:
    files = {
        "fileContent": (file_name, f, "image/png"),
    }
    data = {
        "fileName": file_name,
        "folderName": f"error/{job_id}/{file_name}",
    }
    response = requests.post(url, files=files, data=data, timeout=30)

# Raise on 4xx/5xx responses so the caller can log the failure.
response.raise_for_status()
Field Mapping
- fileContent: binary screenshot payload (image/png)
- fileName: screenshot filename
- folderName: logical storage path in backend storage, formatted as error/{jobId}/{image}
URL Construction
The upload URL is built from the crawler's BASE_URL:
- Upload endpoint: {BASE_URL}/upload
If BASE_URL is empty, upload is skipped and a warning is logged.
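A minimal sketch of the URL construction and the skip-on-empty behaviour follows. The helper name build_upload_url is hypothetical, the warning text is illustrative, and stripping a trailing slash from BASE_URL is an assumption not stated in this document.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def build_upload_url(base_url: Optional[str]) -> Optional[str]:
    """Return the upload endpoint, or None when BASE_URL is not configured.

    Hypothetical helper; the real crawler may read BASE_URL differently.
    """
    base = (base_url or "").strip()
    if not base:
        # Upload is skipped and a warning is logged, as described above.
        logger.warning("BASE_URL is not configured; skipping screenshot upload")
        return None
    # Assumption: a trailing slash is tolerated so the path has no "//".
    return f"{base.rstrip('/')}/upload"
```

Returning None lets the caller skip the upload with a simple truthiness check instead of handling an exception.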
Naming and Storage
Screenshots are first written locally under:
crawler/artifacts/screenshots/
Filename format:
{platform}_{timestamp}_{random}.png
Examples:
fbm_20260422T093455123456Z_abc123.png
cgl_20260422T093500654321Z_def456.png
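One way to produce names in this shape is sketched below. The timestamp layout (UTC date, time, and microseconds followed by Z) and the six-character lowercase alphanumeric suffix are inferred from the examples above, not confirmed by the implementation; build_screenshot_filename is a hypothetical name.

```python
import secrets
import string
from datetime import datetime, timezone
from typing import Optional


def build_screenshot_filename(platform: str, now: Optional[datetime] = None) -> str:
    """Build a name of the form {platform}_{timestamp}_{random}.png.

    Hypothetical helper; format details are inferred from the examples.
    """
    # Use the supplied timezone-aware datetime, or the current UTC time.
    now = now or datetime.now(timezone.utc)
    # Assumption: UTC with microsecond precision, e.g. 20260422T093455123456Z.
    timestamp = now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    # Assumption: six random lowercase letters or digits.
    alphabet = string.ascii_lowercase + string.digits
    suffix = "".join(secrets.choice(alphabet) for _ in range(6))
    return f"{platform}_{timestamp}_{suffix}.png"
```

The random suffix keeps names unique when two captures for the same platform happen to share a timestamp.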
Job Correlation
job_id is passed from start_filter_crawl(...) into scrape_listings(..., job_id=job_id).
When available, the upload path uses:
error/{jobId}/{image}
If job_id is missing, the fallback segment unknown-job is used in place of {jobId}.
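The path derivation with its fallback can be sketched as follows. build_folder_name is a hypothetical helper, and treating an empty or whitespace-only job_id the same as a missing one is an assumption.

```python
from typing import Optional


def build_folder_name(job_id: Optional[str], file_name: str) -> str:
    """Return the storage path error/{jobId}/{image}.

    Hypothetical helper; falls back to 'unknown-job' when job_id is absent.
    """
    # Assumption: empty or whitespace-only IDs are treated as missing.
    segment = str(job_id).strip() if job_id is not None else ""
    return f"error/{segment or 'unknown-job'}/{file_name}"
```

For example, build_folder_name(None, "fbm_x.png") yields error/unknown-job/fbm_x.png, so screenshots from crawls without a job ID still land under a predictable prefix.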
Logging
The feature logs:
- screenshot capture success or failure
- upload success or failure
- skip reason when BASE_URL is not configured
Current Implementation Files
- crawler/app/scraper/scraper.py
- crawler/app/scraper/agent.py
Operational Note
This feature does not block crawl completion status publishing. If screenshot upload fails, crawl flow continues and the failure is logged.
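The non-blocking behaviour can be illustrated with a small wrapper that logs and swallows upload errors. This is a sketch under assumptions: upload_screenshot_safely and the log messages are hypothetical, and the actual crawler may structure its error handling differently.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def upload_screenshot_safely(upload: Callable[[], None]) -> bool:
    """Run an upload callable without letting its failure stop the crawl.

    Hypothetical wrapper; returns True on success and False on failure.
    """
    try:
        upload()
    except Exception:
        # Log the failure with its traceback, then let the crawl continue.
        logger.exception("Screenshot upload failed; continuing crawl")
        return False
    logger.info("Screenshot upload succeeded")
    return True
```

Because the wrapper never re-raises, the caller can go on to publish crawl completion status regardless of the return value.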