In cultural heritage, reports hold valuable spatial, temporal, and domain-specific knowledge about heritage sites. However, these textual archives are large and vary widely in structure, which makes them hard to analyze. Our project addresses this challenge by combining Information Retrieval (IR) and Large Language Model (LLM) techniques for text comprehension and generation.

Our approach automatically identifies heritage site locations within these reports and enriches the extracted data with relevant contextual information. We have also built a user interface that presents the generated and extracted data as accessible visualizations and supports interactive dialogue about the report content.

Our results demonstrate the effectiveness of integrating IR, LLM, and geocoding methods. By producing coherent descriptions of heritage site locations, we aim to support deeper exploration of our cultural heritage. Dive into our project to see how advanced technology can illuminate historical narratives and enhance our understanding of cultural treasures.

Check out our GitHub repository for the code, results, and a demo of our work: https://github.com/hamzeiehsan/LLM_Heritage_Documents

Special thanks to:

  • Reza Arabsheibani
  • Mohammad Kazemi Byedokhti
  • Brian Armstrong
  • Rui Xing
  • Ming-Bin Chen

It is a pleasure working with you!