BRUZZ scraping instructions
- Primary list source: `https://www.bruzz.be/rss.xml`.
- Important: the RSS feed has title/URL/date/short description only. It does not contain the full story.
- Workflow:
  - Fetch RSS.
  - Filter items to the target date and URLs under `https://www.bruzz.be/actua/`.
  - Scrape every selected article page with Lightpanda: `/usr/local/bin/lightpanda fetch --dump markdown --obey_robots <article-url>`.
  - Use the scraped page text for summaries/excerpts, falling back to RSS description only if page scraping fails.
- Do not use `https://www.bruzz.be/actua` as the main listing page: as of 2026-05-14 it returns a Drupal 404 page.
- Keep Brussels news only; skip navigation, live radio blocks, ads, and repeated teasers.
- Language: source is Dutch. Translate summaries into the target review language when writing the final review.
- Cache paths:
  - raw RSS: `docs/cache/press-raw/bruzz-rss.xml`
  - article metadata: `docs/cache/press-raw/bruzz-articles.json`
  - full scraped article pages: `docs/cache/press-raw/bruzz-pages/*.md`
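
The filter, scrape and fallback steps of the workflow could be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not a definitive implementation. It assumes the feed uses the standard RSS 2.0 element names (`item`, `title`, `link`, `pubDate`, `description`) and that the target date is compared in the feed's own timezone offset. The function names and the 60-second timeout are hypothetical; only the Lightpanda command line is taken from the instructions above. Fetching the feed and writing the cache files are left out.

```python
import subprocess
import xml.etree.ElementTree as ET
from datetime import date
from email.utils import parsedate_to_datetime

ACTUA_PREFIX = "https://www.bruzz.be/actua/"
LIGHTPANDA = "/usr/local/bin/lightpanda"


def select_items(rss_xml, target):
    """Return RSS items published on `target` whose link is under /actua/."""
    selected = []
    for item in ET.fromstring(rss_xml).iter("item"):
        link = (item.findtext("link") or "").strip()
        pub = (item.findtext("pubDate") or "").strip()
        if not link.startswith(ACTUA_PREFIX) or not pub:
            continue
        try:
            # Assumption: pubDate is an RFC 822 style date, as in RSS 2.0.
            published = parsedate_to_datetime(pub)
        except (TypeError, ValueError):
            continue  # skip items whose date cannot be parsed
        if published.date() == target:
            selected.append({
                "title": (item.findtext("title") or "").strip(),
                "url": link,
                "date": published.isoformat(),
                "description": (item.findtext("description") or "").strip(),
            })
    return selected


def scrape_with_lightpanda(url):
    """Run the Lightpanda command from the instructions and return its stdout."""
    result = subprocess.run(
        [LIGHTPANDA, "fetch", "--dump", "markdown", "--obey_robots", url],
        capture_output=True, text=True, timeout=60, check=True,
    )
    return result.stdout


def article_text(item, scrape=None):
    """Return scraped page text, or the RSS description if scraping fails."""
    scrape = scrape or scrape_with_lightpanda
    try:
        text = scrape(item["url"])
    except (OSError, subprocess.SubprocessError):
        text = ""  # missing binary, non-zero exit, or timeout
    return text.strip() or item["description"]
```

Passing the scraper in as a parameter keeps the fallback logic testable without the Lightpanda binary. A caller would typically run `select_items(rss_text, date(2026, 5, 14))` and then `article_text(item)` for each selected item.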