
September 16, 2025

When 2500+ Pages Need Summaries: Automating Content Previews with Local AI

At Happy Cog, we've been working with a client on their expansive content ecosystem. Their team has been incredibly productive, creating hundreds of case studies, webinars, blog posts, and resources over the years. But there was one growing challenge: most of their content lacked the concise summaries needed for listing pages, search results, and content discovery.

With 2500+ active entries across multiple content types in their Craft CMS site, manually writing summaries would take weeks of dedicated work and likely never get prioritized. At roughly 2-3 minutes per summary, we were looking at 125+ hours of manual labor—and that's assuming perfect efficiency.

We needed an automated solution that could generate high-quality, consistent content previews at scale while keeping costs reasonable. Our answer? Local AI models running through Docker, eliminating per-request API costs while maintaining full control over our data.

Why not just use OpenAI or Anthropic?

This was a perfect use case for generative AI content summarization, but we quickly realized the potential cost pitfalls during our proof of concept. Cloud-based AI services like OpenAI's GPT-4 or Anthropic's Claude charge per token, and with thousands of entries containing rich, block-based content, costs could easily spiral into hundreds or even thousands of dollars, especially since we wanted to run multiple test passes with different context prompts.

Enter Docker Model Runner (DMR). This tool streamlined the process of safely pulling, running, and serving AI models locally, making it possible to experiment and iterate on large datasets without incurring ongoing API costs. Once the model is downloaded, processing is essentially free—perfect for batch operations like ours.

How do we process 2500+ entries?

We built our solution around a custom Craft CLI command that automates the entire content summarization pipeline using PHP and the Guzzle HTTP client. Here's how it works (a simplified sketch of the command follows the list):

  1. Query entries without summaries across all relevant content sections
  2. Extract and clean content from Craft's flexible Matrix block structure, removing HTML tags and formatting
  3. Generate AI summaries using the Qwen2.5 model via local REST API calls
  4. Export results to JSON format for validation and import back into Craft using the FeedMe plugin
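Below is a simplified sketch of how the command ties those steps together. The section handles, the summary field handle, the output path, and the cleanEntryContent() helper are hypothetical placeholders rather than the client's real schema; generateAiSummary() is the method shown later in this post.

use Craft;
use craft\console\Controller;
use craft\elements\Entry;
use yii\console\ExitCode;

class SummariesController extends Controller
{
    /**
     * Generates summaries for entries that don't have one yet.
     */
    public function actionGenerate(): int
    {
        // 1. Query entries without summaries across the relevant sections
        //    (section and field handles here are illustrative).
        $entries = Entry::find()
            ->section(['caseStudies', 'webinars', 'blog', 'resources'])
            ->summary(':empty:')
            ->all();

        $results = [];

        foreach ($entries as $entry) {
            // 2. Extract and clean the entry's Matrix content (hypothetical helper).
            $content = $this->cleanEntryContent($entry);

            // 3. Generate the summary via the local model.
            $summary = $this->generateAiSummary($content);

            if ($summary === null) {
                $this->stderr("Skipping entry {$entry->id}: generation failed\n");
                continue;
            }

            $results[] = [
                'id' => $entry->id,
                'title' => $entry->title,
                'summary' => $summary,
            ];
        }

        // 4. Export to JSON for validation and a later import via the FeedMe plugin.
        $path = Craft::getAlias('@storage/ai-summaries.json');
        file_put_contents($path, json_encode($results, JSON_PRETTY_PRINT));

        $this->stdout(count($results) . " summaries written to {$path}\n");

        return ExitCode::OK;
    }
}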

This batch-processing approach allowed us to validate results before committing changes and iterate on our prompts without accumulating API costs.

Which model should we use?

We chose the Qwen2.5 model for its excellent balance of performance, quality, and hardware efficiency. Unlike larger models that require expensive GPU infrastructure, Qwen2.5 runs smoothly on standard development machines while producing high-quality summaries.

After installing Docker Model Runner, pulling and running the model was straightforward. We validated the setup using Postman to test API responses before integrating it into our Craft command. DMR serves an OpenAI-compatible API, making integration seamless.
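If you'd rather script that sanity check, a few lines of Guzzle will do. This sketch assumes DMR mirrors the standard OpenAI models listing under the same /engines/v1 prefix it uses for chat completions:

use GuzzleHttp\Client;

// Quick smoke test: list the models the local runner is serving.
$client = new Client(['base_uri' => 'http://localhost:12434']);

$response = $client->get('/engines/v1/models');
$models = json_decode($response->getBody()->getContents(), true);

foreach ($models['data'] ?? [] as $model) {
    echo $model['id'] . PHP_EOL; // should include ai/qwen2.5 once the pull has finished
}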

How do we actually implement this?

Here's the core method that handles AI summary generation. It lives in the console command class and relies on a few imports at the top of the file: use Craft;, use craft\helpers\App;, use Exception;, and use GuzzleHttp\Client as GuzzleClient;.

/**
 * Generate AI summary for given content using local model
 *
 * @param string $content Cleaned content text (no HTML)
 * @return string|null Generated summary or null on failure
 */
public function generateAiSummary(string $content): ?string
{
    // Environment variables:
    // LLM_BASE_URL: http://localhost:12434 (or http://model-runner.docker.internal for Docker)
    // LLM_MODEL_NAME: ai/qwen2.5

    $client = new GuzzleClient([
        'base_uri' => App::env('LLM_BASE_URL'),
        'timeout' => 30, // 30-second timeout for processing
    ]);

    try {
        $response = $client->request('POST', '/engines/v1/chat/completions', [
            'headers' => [
                'Content-Type' => 'application/json',
            ],
            'json' => [
                'model' => App::env('LLM_MODEL_NAME'),
                'messages' => [
                    // Sample system message
                    [
                        'role' => 'system',
                        'content' => 'You are a content copywriter. You must write exactly one paragraph starting with "Discover" or "Learn about". Maximum 120 words. Use simple, clear language. No bullet points or special formatting.'
                    ],
                    // Sample prompt
                    [
                        'role' => 'user',
                        'content' => "Write ONE concise paragraph under 120 words summarizing this content:\n\n```\n{$content}\n```"
                    ]
                ],
                'max_tokens' => 150,
                'temperature' => 0.7, // Slight creativity while maintaining consistency
            ]
        ]);

        if ($response->getStatusCode() === 200) {
            $json = json_decode($response->getBody()->getContents(), true);

            if (isset($json['choices'][0]['message']['content'])) {
                return trim($json['choices'][0]['message']['content']);
            }
        }
    } catch (Exception $e) {
        // Log error for debugging
        Craft::error("AI summary generation failed: " . $e->getMessage());
    }

    return null;
}

Performance and Results

The system processes approximately 15 entries per minute, with each summary taking 3-4 seconds to generate. For our 2500+ entry backlog, this translated to about 3 hours of processing time—dramatically better than the 125+ hours of manual work we'd initially calculated.

More importantly, the quality was consistently high. Our validation process (using cloud AI to review generated summaries against our prompt requirements) showed an initial 79% success rate, which went up to 98% after we stripped the formatting and made further prompt refinements.

What would we do differently?

Working through this project taught us several valuable lessons that we'll carry forward to similar challenges:

The raw HTML approach was a mistake

Initially, we attempted to feed raw HTML content directly to the model, but this was inefficient and produced inconsistent results. We adopted an approach similar to our Algolia search implementations, extracting clean text from Craft's Matrix blocks and removing formatting that could confuse the model.
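As a rough illustration, the hypothetical cleanEntryContent() helper referenced in the earlier sketch might look something like this (assuming use craft\elements\Entry; is imported); the Matrix field handle (contentBlocks) and block field handle (text) are placeholders for whatever your field layout actually uses:

/**
 * Flatten an entry's Matrix content into plain text for the model.
 */
public function cleanEntryContent(Entry $entry): string
{
    $parts = [];

    foreach ($entry->contentBlocks->all() as $block) {
        // Keep only the rich-text value from each block and discard markup.
        $parts[] = strip_tags((string) $block->text);
    }

    // Decode entities and collapse whitespace so the prompt stays compact.
    $text = html_entity_decode(implode("\n\n", $parts), ENT_QUOTES | ENT_HTML5);

    return trim(preg_replace('/\s+/u', ' ', $text));
}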

Prompt engineering is where the magic happens

We spent the majority of our development time refining prompts rather than writing code. Small changes in wording, structure, and instructions had significant impacts on output quality. Testing incrementally with small batches (10-20 entries) helped us isolate the effects of each change.

Start small, then scale

Each summary took 3-4 seconds to process, so we began with small test batches to establish baseline quality before scaling up. This approach helped us identify potential issues early and build confidence in the system's reliability.

Let AI validate AI

After generating our JSON export, we used more robust cloud AI models to validate the generated summaries against our original prompt instructions. This meta-validation approach helped us quickly identify outliers and further refine our prompts.
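A minimal sketch of that validation pass is below; it assumes OpenAI's chat completions API called through Guzzle, and the model name, rule wording, and PASS/FAIL convention are illustrative choices rather than the exact script we ran:

use GuzzleHttp\Client;

// Re-check each generated summary against the original prompt rules
// using a cloud model, and flag anything that drifts.
$client = new Client([
    'base_uri' => 'https://api.openai.com',
    'headers' => ['Authorization' => 'Bearer ' . getenv('OPENAI_API_KEY')],
]);

$summaries = json_decode(file_get_contents('ai-summaries.json'), true);

foreach ($summaries as $row) {
    $response = $client->post('/v1/chat/completions', [
        'json' => [
            'model' => 'gpt-4o',
            'messages' => [
                ['role' => 'system', 'content' => 'Answer PASS or FAIL only.'],
                ['role' => 'user', 'content' => "Rules: exactly one paragraph, starts with \"Discover\" or \"Learn about\", under 120 words, no bullet points or special formatting.\n\nSummary:\n{$row['summary']}"],
            ],
            'temperature' => 0,
        ],
    ]);

    $json = json_decode($response->getBody()->getContents(), true);
    $verdict = trim($json['choices'][0]['message']['content'] ?? 'FAIL');

    if ($verdict !== 'PASS') {
        echo "Entry {$row['id']} flagged for review\n";
    }
}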

Reflecting on the Process

This project demonstrated that local AI can be a powerful, cost-effective solution for large-scale content processing tasks. Docker Model Runner's simplicity and the ability to run different models locally made implementation straightforward.

For teams facing similar challenges with large content libraries, local AI offers an attractive alternative to cloud services—especially when data privacy, cost control, and processing volume are key considerations.

The key is starting small, investing time in prompt engineering, and building robust validation processes. With the right approach, you can achieve professional-quality results while maintaining full control over your content and costs.
