Lightfeed Releases Open-Source LLM Web Extractor in TypeScript

megaone_admin, Mar 26, 2026
Engine Score 7/10 — Important

This is an actionable new developer tool for LLM-powered web extraction, and it matters most to developers who build scraping and data pipelines. Novelty and timeliness are high for that audience, though the broader industry impact is limited.

Lightfeed has released an open-source TypeScript library that combines large language models with browser automation to extract structured data from websites. The Lightfeed Extractor allows developers to use natural language prompts to navigate web pages and extract data according to predefined schemas.

The library integrates Playwright browser automation with LLM providers including OpenAI, Google Gemini, Anthropic, and local Ollama models. According to the project documentation, it converts HTML to “LLM-ready markdown” and uses “LLMs in JSON mode to extract structured data according to input Zod schema.”
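To make the "LLM-ready markdown" step concrete, here is a minimal sketch of reducing HTML to a compact markdown-like text before it is sent to a model. It is a crude, regex-based stand-in and not the library's converter; the function name htmlToMarkdown and the sample page are hypothetical.

```typescript
// Hypothetical helper: a crude stand-in for an HTML-to-markdown step.
// Assumption: this is NOT the Lightfeed converter, only an illustration.
function htmlToMarkdown(html: string): string {
  const text = html
    // Drop script and style elements together with their contents.
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, "")
    // Turn <h1>..<h6> into markdown headings on their own lines.
    .replace(
      /<h([1-6])\b[^>]*>([\s\S]*?)<\/h\1>/gi,
      (_match: string, level: string, inner: string) =>
        "\n" + "#".repeat(Number(level)) + " " + inner.trim() + "\n"
    )
    // Turn anchors with a double-quoted href into [text](href) links.
    .replace(/<a\b[^>]*href="([^"]*)"[^>]*>([\s\S]*?)<\/a>/gi, "[$2]($1)")
    // Replace every remaining tag with a space.
    .replace(/<[^>]+>/g, " ");

  // Collapse whitespace inside each line and drop empty lines.
  return text
    .split("\n")
    .map((line) => line.replace(/[ \t]+/g, " ").trim())
    .filter((line) => line.length > 0)
    .join("\n");
}

// Usage with a made-up page fragment.
const samplePage =
  '<html><head><style>p{color:red}</style></head><body>' +
  '<h1>Shoes</h1><p>Runner <a href="/p/1">details</a></p>' +
  "<script>track()</script></body></html>";
console.log(htmlToMarkdown(samplePage));
```

The point of such a step is that headings and links survive while scripts, styles, and layout markup do not, so fewer tokens are spent on content the model cannot use.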

Key technical features include stealth-mode browser automation with anti-bot patches, JSON recovery mechanisms for failed extractions, and URL validation for handling relative links. The library tracks token usage and lets developers cap it for production deployments. It can launch browsers locally, in serverless environments, or connect to remote browser servers.
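Two of these features, handling relative links and recovering from malformed JSON, can be illustrated with a short sketch. It uses only the standard URL and JSON APIs; the function names resolveLink and recoverJson are hypothetical, and the project's own recovery logic may work differently.

```typescript
// Hypothetical helper: resolve a possibly relative link against the page URL.
// Returns null when the result is not a usable http(s) URL.
function resolveLink(href: string, pageUrl: string): string | null {
  try {
    const url = new URL(href, pageUrl);
    return url.protocol === "http:" || url.protocol === "https:"
      ? url.href
      : null;
  } catch {
    return null; // href could not be parsed at all
  }
}

// Hypothetical helper: best-effort recovery of JSON from raw model output.
// It trims surrounding text (such as code fences) and removes trailing
// commas. Limitation: a comma followed by a closing bracket inside a string
// value would also be rewritten, so this is only a sketch.
function recoverJson(raw: string): unknown {
  const start = raw.search(/[{\[]/);
  const end = Math.max(raw.lastIndexOf("}"), raw.lastIndexOf("]"));
  if (start === -1 || end < start) return null;

  const candidate = raw
    .slice(start, end + 1)
    .replace(/,\s*([}\]])/g, "$1");
  try {
    return JSON.parse(candidate);
  } catch {
    return null;
  }
}

// Usage with made-up inputs.
console.log(resolveLink("/p/1", "https://shop.example/catalog?page=2"));
console.log(recoverJson('Result:\n{"name": "Runner", "price": 59.9,}'));
```

Returning null instead of throwing lets a pipeline skip a bad link or a failed record and keep the rest of the extraction.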

The project targets e-commerce use cases, with example code showing product catalog extraction that captures names, brands, prices, and ratings from retail websites. The documentation states the tool is designed for “complete, accurate results with great token efficiency — critical for production data pipelines.”
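As an illustration of schema-driven catalog extraction, the sketch below validates model output against a product shape. The documentation describes Zod schemas; this version uses a hand-written type guard only to stay dependency-free, and the field names and the functions parseProduct and parseCatalog are assumptions rather than the project's actual example.

```typescript
// Hypothetical product shape based on the fields named in the article.
interface Product {
  name: string;
  brand: string;
  price: number;
  rating: number | null; // rating is treated as optional in this sketch
}

// Validate one record; return null when a required field is missing or mistyped.
function parseProduct(value: unknown): Product | null {
  if (typeof value !== "object" || value === null) return null;
  const { name, brand, price, rating } = value as Record<string, unknown>;
  if (typeof name !== "string" || typeof brand !== "string") return null;
  if (typeof price !== "number" || !Number.isFinite(price)) return null;
  return {
    name,
    brand,
    price,
    rating: typeof rating === "number" ? rating : null,
  };
}

// Parse a JSON array of products, keeping only the records that validate.
function parseCatalog(json: string): Product[] {
  const data: unknown = JSON.parse(json);
  if (!Array.isArray(data)) return [];
  return data
    .map((item) => parseProduct(item))
    .filter((p): p is Product => p !== null);
}

// Usage with made-up model output: the second record lacks brand and price.
const modelOutput =
  '[{"name":"Runner","brand":"Acme","price":59.9,"rating":4.5},' +
  '{"name":"Broken"}]';
console.log(parseCatalog(modelOutput));
```

Dropping invalid records one by one, rather than rejecting the whole response, is one way to keep partial results usable in a data pipeline.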

At the time of writing, the GitHub repository shows 208 stars and 8 forks, with 52 commits to the main branch. Lightfeed also operates a commercial platform at app.lightfeed.ai for retail competitor intelligence, which suggests the open-source release grew out of its production web-scraping infrastructure.

MegaOne AI Editorial Team
