Using Screenshots for AI Training Data Collection

Machine learning models for web layout analysis, UI component detection, and visual understanding require large datasets of website screenshots. SnapAPI enables efficient, automated collection of diverse visual training data.

Layout Analysis Training Data

Train models to understand web page structure by capturing screenshots across diverse websites. Vary viewport sizes, color schemes, and content types to build robust datasets.

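// Run as an ES module (e.g. a .mjs file) so top-level await works.
import fs from 'node:fs';

// API key read from the environment; the SNAPAPI_KEY variable name is an assumption.
const API_KEY = process.env.SNAPAPI_KEY;

const urls = [
  'https://github.com', 'https://stripe.com', 'https://airbnb.com',
  'https://nytimes.com', 'https://reddit.com', 'https://stackoverflow.com'
];

// Make sure the output directory exists before writing to it.
fs.mkdirSync('dataset', { recursive: true });

for (const url of urls) {
  // Capture each page at mobile, tablet, laptop, and desktop widths.
  for (const width of [375, 768, 1280, 1920]) {
    const res = await fetch(
      `https://apisnap.dev/api/screenshot?url=${encodeURIComponent(url)}&width=${width}&format=png`,
      { headers: { Authorization: `Bearer ${API_KEY}` } }
    );
    // Skip failed captures so error responses are not saved as images.
    if (!res.ok) {
      console.error(`Failed ${url} at ${width}px: HTTP ${res.status}`);
      continue;
    }
    const buffer = Buffer.from(await res.arrayBuffer());
    const filename = `${new URL(url).hostname}_${width}.png`;
    fs.writeFileSync(`dataset/${filename}`, buffer);
  }
}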

UI Component Detection

Use SnapAPI's selector parameter to capture individual UI components for object detection training. Target buttons, navigation bars, forms, cards, and modals across different websites to build a diverse component dataset.
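
A minimal sketch of how component capture might be scripted. It assumes the selector is passed as a `selector` query parameter on the same endpoint used in the layout example; the exact parameter format is not confirmed here. The `COMPONENT_SELECTORS` map and the `buildComponentUrl` and `captureComponent` helpers are hypothetical names, and the CSS selectors are illustrative and will not match every site.

```javascript
// Hypothetical label -> CSS selector map; real sites usually need per-site selectors.
const COMPONENT_SELECTORS = new Map([
  ['button', 'button'],
  ['navbar', 'nav'],
  ['form', 'form'],
]);

// Build the request URL for one component on one page.
// Assumption: the endpoint accepts a `selector` query parameter.
function buildComponentUrl(pageUrl, label) {
  const selector = COMPONENT_SELECTORS.get(label);
  if (!selector) throw new Error(`Unknown component label: ${label}`);
  const params = new URLSearchParams({ url: pageUrl, selector, format: 'png' });
  return `https://apisnap.dev/api/screenshot?${params}`;
}

// Fetch one component screenshot and return it with its class label,
// so the label can be stored next to the image for detection training.
async function captureComponent(pageUrl, label, apiKey) {
  const res = await fetch(buildComponentUrl(pageUrl, label), {
    headers: { Authorization: `Bearer ${apiKey}` },
  });
  if (!res.ok) throw new Error(`Capture failed for ${label}: HTTP ${res.status}`);
  return { label, pageUrl, image: Buffer.from(await res.arrayBuffer()) };
}
```

Keeping the label next to the image bytes means each capture already carries its class name, which saves a separate annotation pass for single-component crops.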

Data Augmentation

Capture the same pages at different viewport widths to create responsive layout variations. Use dark_mode=true for color scheme variants. This increases dataset diversity without needing more source URLs.
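
The variant grid described above could be generated along these lines. This is a sketch, not a confirmed recipe: `buildVariants` is a hypothetical helper, and it assumes that omitting `dark_mode` yields the default light rendering.

```javascript
// Build every width x color-scheme combination for one page.
// Assumption: dark_mode=true selects the dark variant; omitting it gives the light default.
function buildVariants(pageUrl, widths = [375, 768, 1280, 1920]) {
  const host = new URL(pageUrl).hostname;
  const variants = [];
  for (const width of widths) {
    for (const darkMode of [false, true]) {
      const params = new URLSearchParams({ url: pageUrl, width: String(width), format: 'png' });
      if (darkMode) params.set('dark_mode', 'true');
      variants.push({
        width,
        darkMode,
        requestUrl: `https://apisnap.dev/api/screenshot?${params}`,
        // Encode the variant in the filename so captures do not overwrite each other.
        filename: `${host}_${width}_${darkMode ? 'dark' : 'light'}.png`,
      });
    }
  }
  return variants;
}
```

With the four default widths, one source URL expands to eight capture requests, each with a distinct filename.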

Batch Processing Tips

Rate limit your batch collection to avoid overwhelming the API. Use async/await with a concurrency limiter. Save metadata (URL, dimensions, timestamp) alongside each screenshot for proper dataset labeling.
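
One way to apply these tips is a small hand-rolled worker pool plus a metadata record per capture. Both `mapWithConcurrency` and `buildMetadata` are hypothetical helpers written for this sketch, not part of SnapAPI. The record stores only the requested width, since the rendered height is not known without decoding the image.

```javascript
// Run `worker` over `items` with at most `limit` calls in flight at once.
// Results are returned in the same order as `items`.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;
  // Each runner repeatedly claims the next unprocessed index until none remain.
  async function runner() {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }
  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}

// Build the label record to save alongside a screenshot (e.g. as a JSON sidecar file).
function buildMetadata(url, width, filename) {
  return { url, width, filename, capturedAt: new Date().toISOString() };
}
```

A batch run would then pass the capture function as the worker, for example `mapWithConcurrency(jobs, 3, captureJob)`, where `captureJob` is whatever function performs one request and writes the image and its metadata. If any worker throws, the returned promise rejects, so wrap the worker in a try/catch when one failed page should not abort the batch.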