Object Recognition and Conversational AI in Real-World Contexts

Enhancing Museum Experiences through Interactive Systems

Authors

Ortiz, A., Illana, Á., & Salas, M.

DOI:

https://doi.org/10.62161/sauc.v11.6010

Keywords:

Artificial Intelligence, Generative AI, Cultural Heritage, Museums, Object Detection, Information Retrieval, Context Awareness, RAG

Abstract

This project addresses the challenge of improving museum visitors’ experiences by moving beyond static, traditional methods of information access. It presents the design and validation of an interactive system that combines real-time object detection with a Retrieval-Augmented Generation (RAG) pipeline to offer a context-aware, personalized, and immersive conversational guide.

The results confirm accurate spatial and conversational understanding, as well as a significant improvement in the veracity and relevance of generated responses compared with standard LLM responses. The project demonstrates the system’s potential to offer dynamic and engaging access to cultural heritage.
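The coupling described above can be sketched as follows. This is a minimal illustration, not the authors’ implementation: the toy corpus, the label names, and the keyword-based retrieval stand in for the paper’s actual detector and retriever, which are not reproduced here.

```python
# Minimal sketch of a detection-grounded RAG prompt builder.
# A detector (e.g. a YOLO-family model) would emit artwork labels;
# here the labels, corpus, and retrieval are illustrative placeholders.

CORPUS = {
    "mona_lisa": "The Mona Lisa is a 16th-century portrait by Leonardo da Vinci.",
    "venus_de_milo": "The Venus de Milo is an ancient Greek marble sculpture.",
}

def retrieve(labels):
    """Fetch the curated passage for each artwork label the detector emitted."""
    return [CORPUS[label] for label in labels if label in CORPUS]

def build_prompt(labels, question):
    """Ground the visitor's question in passages tied to the detected artworks."""
    context = "\n".join(retrieve(labels))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )

prompt = build_prompt(["mona_lisa"], "Who painted this?")
```

The point of the pattern is that the detector, not the visitor, supplies the retrieval key: the generated answer is constrained to passages about the artwork actually in view, which is what yields the context awareness and factual grounding the abstract reports.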


Published

2025-11-28

How to Cite

Ortiz, A., Illana, Á., & Salas, M. (2025). Object Recognition and Conversational AI in Real-World Contexts: Enhancing Museum Experiences through Interactive Systems. Street Art & Urban Creativity, 11(7), 21–47. https://doi.org/10.62161/sauc.v11.6010