Skip to main content
Version: MarketPulse

Crawler Empty-Result Screenshot Upload

Author(s)

  • Alapan Das

Last Updated Date

2026-04-23


Version History

VersionDateChangesAuthor
1.02026-04-23Initial draft for crawler empty-result screenshot upload flowAlapan

Purpose

When a listing crawl returns no results or no usable page content, the crawler captures a screenshot and uploads it to an external API for later debugging.

Scope

This applies to list-page crawling in VehicleScraper.scrape_listings(...) for both Facebook Marketplace and Craigslist.

High-Level Flow

  1. Crawl page is loaded and extraction is attempted.
  2. If the scraper detects an empty or failed result, it captures a full-page screenshot.
  3. The screenshot is uploaded through a multipart/form-data API call.
  4. The upload target path is derived as error/{jobId}/{image}.

Trigger Conditions

A screenshot upload is attempted when:

  • Extracted listings are empty.
  • Page content is missing, including no HTML.
  • Craigslist list selector is not found.

Upload API Contract

The uploader sends a multipart request equivalent to:

import requests

url = f"{BASE_URL}/upload"

with open(file_path, "rb") as f:
files = {
"fileContent": (file_name, f, "image/png")
}
data = {
"fileName": file_name,
"folderName": f"error/{job_id}/{file_name}"
}

response = requests.post(url, files=files, data=data, timeout=30)
response.raise_for_status()

Field Mapping

  • fileContent: binary screenshot payload (image/png)
  • fileName: screenshot filename
  • folderName: logical storage path in backend storage, formatted as error/{jobId}/{image}

URL Construction

Upload URL is built from crawler BASE_URL:

  • Upload endpoint: {BASE_URL}/upload

If BASE_URL is empty, upload is skipped and a warning is logged.

Naming and Storage

Screenshots are first written locally under:

  • crawler/artifacts/screenshots/

Filename format:

  • {platform}_{timestamp}_{random}.png

Examples:

  • fbm_20260422T093455123456Z_abc123.png
  • cgl_20260422T093500654321Z_def456.png

Job Correlation

job_id is passed from start_filter_crawl(...) into scrape_listings(..., job_id=job_id).

When available, the upload path uses:

  • error/{jobId}/{image}

If missing, the fallback segment unknown-job is used.

Logging

The feature logs:

  • screenshot capture success or failure
  • upload success or failure
  • skip reason when BASE_URL is not configured

Current Implementation Files

  • crawler/app/scraper/scraper.py
  • crawler/app/scraper/agent.py

Operational Note

This feature does not block crawl completion status publishing. If screenshot upload fails, crawl flow continues and the failure is logged.