Object Recognition and Conversational AI in Real-World Contexts
Enhancing Museum Experiences through Interactive Systems
DOI: https://doi.org/10.62161/sauc.v11.6010

Keywords: Artificial Intelligence, Generative AI, Cultural Heritage, Museums, Object Detection, Information Retrieval, Context Awareness, RAG

Abstract
This project addresses the challenge of improving museum visitors’ experiences by moving beyond static, traditional methods of information access. It presents the design and validation of an interactive system that combines real-time object detection with a Retrieval-Augmented Generation (RAG) pipeline to offer a context-aware, personalized, and immersive conversational guide.
The results verify accurate spatial and conversational understanding, as well as a significant improvement in the veracity and relevance of generated responses compared with standard LLM responses. The project demonstrates the system’s potential to offer dynamic and engaging access to cultural heritage.
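To make the architecture concrete, the abstract's core idea can be sketched as follows: the label of a detected artwork selects a pool of curatorial snippets, the visitor's question ranks them, and the top snippets are assembled into a grounded prompt for the language model. This is a minimal, hypothetical illustration; the knowledge base, the word-overlap ranking, and all function names are assumptions, not the authors' implementation (which uses a full detection model and retrieval stack).

```python
# Hypothetical sketch: detected-object label + visitor question -> RAG-style prompt.

# Toy knowledge base keyed by the label the object detector would emit.
KNOWLEDGE_BASE = {
    "mona lisa": [
        "The Mona Lisa was painted by Leonardo da Vinci in the early 16th century.",
        "It is displayed in the Louvre Museum in Paris.",
    ],
    "venus de milo": [
        "The Venus de Milo is an ancient Greek marble sculpture.",
    ],
}

def retrieve(detected_label: str, question: str, k: int = 2) -> list[str]:
    """Rank the detected artwork's snippets by word overlap with the question."""
    snippets = KNOWLEDGE_BASE.get(detected_label.lower(), [])
    q_words = set(question.lower().split())
    ranked = sorted(snippets, key=lambda s: -len(q_words & set(s.lower().split())))
    return ranked[:k]

def build_prompt(detected_label: str, question: str) -> str:
    """Assemble a grounded prompt: retrieved context followed by the question."""
    context = "\n".join(retrieve(detected_label, question))
    return (
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        f"Answer using only the context."
    )

prompt = build_prompt("Mona Lisa", "Who painted this and where is it displayed?")
print(prompt)
```

In a real deployment the dictionary lookup would be replaced by vector or hybrid retrieval over the museum's catalogue, and the prompt sent to an LLM, but the data flow from detection label to grounded answer is the same.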
License
Copyright (c) 2025. Authors retain copyright and transfer to the journal the right of first publication and publishing rights.

This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.

Authors who publish in this journal accept the following terms:
- Authors retain copyright.
- Authors transfer to the journal the right of first publication. The journal also owns the publishing rights.
- All published content is governed by an Attribution-NoDerivatives 4.0 International License. Access the informative version and legal text of the license. Under this license, third parties may use the published material provided they credit the authorship of the work and its first publication in this journal; if the material is transformed, the modified work may not be distributed.
- Authors may make other independent, additional contractual arrangements for non-exclusive distribution of the version of the article published in this journal (e.g., inclusion in an institutional repository or publication in a book), provided they clearly indicate that the work was first published in this journal.
- Authors are allowed and encouraged to publish their work online (for example, on institutional and personal websites) after publication, referencing the journal, as this can lead to constructive exchanges and a wider, faster circulation of published work (see The Effect of Open Access).