Lightfeed has released an open-source TypeScript library that combines large language models with browser automation to extract structured data from websites. The Lightfeed Extractor allows developers to use natural language prompts to navigate web pages and extract data according to predefined schemas.
The library integrates Playwright browser automation with LLM providers including OpenAI, Google Gemini, Anthropic, and local Ollama models. According to the project documentation, it converts HTML to “LLM-ready markdown” and uses “LLMs in JSON mode to extract structured data according to input Zod schema.”
Key technical features include stealth-mode browser automation with anti-bot patches, JSON recovery mechanisms for failed extractions, and URL validation for handling relative links. The library tracks token usage and includes usage limits for production deployments. It can launch browsers locally, in serverless environments, or connect to remote browser servers.
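To make the JSON recovery and relative-link handling more concrete, here is a minimal TypeScript sketch of how such steps could work. It is an illustration under stated assumptions, not the library's actual implementation: the function names `recoverJson` and `resolveLink` are hypothetical, and the repair strategy (trimming surrounding text and removing trailing commas) is only one simple approach.

```typescript
// Hypothetical helpers for illustration; not part of the Lightfeed Extractor API.

/**
 * Try to parse model output as JSON. If that fails, keep only the text
 * between the first opening and last closing bracket, strip trailing
 * commas, and try once more. Returns undefined when nothing parses.
 */
function recoverJson(raw: string): unknown {
  try {
    return JSON.parse(raw);
  } catch {
    // Fall through to the repair attempt below.
  }
  const start = raw.search(/[{\[]/);
  const end = Math.max(raw.lastIndexOf("}"), raw.lastIndexOf("]"));
  if (start === -1 || end <= start) return undefined;
  // Naive repair: this would also rewrite a string value that happens
  // to contain ",}" or ",]", which a real implementation should avoid.
  const repaired = raw.slice(start, end + 1).replace(/,\s*([}\]])/g, "$1");
  try {
    return JSON.parse(repaired);
  } catch {
    return undefined;
  }
}

/**
 * Resolve a possibly relative link against the page it came from and
 * accept only http(s) results. Returns undefined for anything else.
 */
function resolveLink(href: string, sourceUrl: string): string | undefined {
  try {
    const url = new URL(href, sourceUrl);
    const isWeb = url.protocol === "http:" || url.protocol === "https:";
    return isWeb ? url.href : undefined;
  } catch {
    return undefined;
  }
}
```

Under these assumptions, a call such as `resolveLink("/products/42", "https://shop.example.com/catalog")` yields an absolute URL on the same host, while values like `javascript:void(0)` are rejected.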
The project targets e-commerce use cases, with example code showing product catalog extraction that captures names, brands, prices, and ratings from retail websites. The documentation states the tool is designed for “complete, accurate results with great token efficiency — critical for production data pipelines.”
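As a rough illustration of what a product catalog extraction might produce, the sketch below defines a record with the four fields mentioned above and filters raw output down to entries that match it. The library itself takes a Zod schema as input; this version is a dependency-free stand-in written for illustration, and the `Product` field types and the `parseProducts` helper are assumptions rather than anything taken from the project's example code.

```typescript
// Hypothetical record shape based on the fields the article mentions.
interface Product {
  name: string;
  brand: string;
  price: number;
  rating: number | null; // assumed optional: not every listing shows a rating
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

/** Keep only entries that match the Product shape and drop the rest. */
function parseProducts(data: unknown): Product[] {
  if (!Array.isArray(data)) return [];
  const products: Product[] = [];
  for (const item of data) {
    if (!isRecord(item)) continue;
    const { name, brand, price, rating } = item;
    if (typeof name !== "string" || typeof brand !== "string") continue;
    if (typeof price !== "number" || !Number.isFinite(price)) continue;
    if (rating !== null && rating !== undefined && typeof rating !== "number") continue;
    products.push({
      name,
      brand,
      price,
      rating: typeof rating === "number" ? rating : null,
    });
  }
  return products;
}
```

Validating every record against a fixed shape is what lets downstream code rely on the output: an entry whose price came back as a string, for example, is rejected here instead of flowing into a pipeline.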
The GitHub repository shows 208 stars and 8 forks, with 52 commits to the main branch. Lightfeed also operates a commercial platform at app.lightfeed.ai for retail competitor intelligence, which suggests the open-source release stems from the company's production web scraping infrastructure.
