WARC-GPT: An open-source tool for exploring web archives using AI


The Harvard Library Innovation Lab has released WARC-GPT, an open-source tool that uses AI to explore web archives. WARC-GPT applies Retrieval-Augmented Generation (RAG) to build chatbots whose answers are grounded in the contents of web archive files, so archived content can be explored in natural language rather than through traditional keyword search alone.

  • WARC-GPT is a tool for exploring web archives using AI.
  • It enables creating chatbots with web archives as knowledge bases.
  • It uses Retrieval-Augmented Generation (RAG) for search and summarization.
  • It aims to complement traditional keyword-based search methods.
  • It is open-source and customizable, and it can interact with various LLM APIs.
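
To make the Retrieval-Augmented Generation idea concrete, here is a minimal sketch in Python. It is not WARC-GPT's code: the archive excerpts, URLs, and function names (`retrieve`, `build_prompt`) are hypothetical, and a simple bag-of-words cosine similarity stands in for the embedding-based retrieval a real RAG pipeline would typically use. The sketch only shows the general pattern: rank archived text against a question, then pass the best matches to a language model as context.

```python
import math
import re
from collections import Counter

# Hypothetical excerpts, standing in for text extracted from web archive files.
ARCHIVE_CHUNKS = [
    {"url": "https://example.org/apollo",
     "text": "The Apollo 11 mission landed on the Moon in July 1969."},
    {"url": "https://example.org/recipes",
     "text": "Knead the bread dough for ten minutes before baking."},
    {"url": "https://example.org/moon",
     "text": "The Moon orbits Earth roughly every 27 days."},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question (the retrieval step)."""
    query = Counter(tokenize(question))
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine(query, Counter(tokenize(chunk["text"]))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, retrieved):
    """Combine the retrieved excerpts and the question into one LLM prompt."""
    context = "\n".join(f"[{c['url']}] {c['text']}" for c in retrieved)
    return (
        "Answer using only the archived excerpts below.\n\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


question = "When did Apollo 11 land on the Moon?"
top = retrieve(question, ARCHIVE_CHUNKS, k=2)
prompt = build_prompt(question, top)
print(prompt)
```

The resulting prompt would then be sent to whichever LLM API is configured; that generation step is omitted here because it depends on the provider. Unlike a keyword search, which returns a list of matching documents, this pattern hands the model only the most relevant excerpts and asks it to answer or summarize from them.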