Object Recognition and Conversational AI in Real-World Contexts: Enhancing Museum Experiences through Interactive Systems

Adrián Ortiz; Álvaro Illana; Marta Salas

doi:10.62161/sauc.v11.6010

Authors

Adrián Ortiz UFV https://orcid.org/0009-0001-2059-2424
Álvaro Illana UFV https://orcid.org/0009-0000-4268-8147
Marta Salas UFV https://orcid.org/0009-0000-6624-6896

DOI:

https://doi.org/10.62161/sauc.v11.6010

Keywords:

Artificial intelligence, Generative AI, Cultural heritage, Museums, Object detection, Information retrieval, Context awareness, RAG

Abstract

This project aims to improve the museum experience for visitors by moving beyond traditional methods of accessing information. It presents an interactive system that combines real-time object detection with Retrieval-Augmented Generation (RAG) to offer an immersive, personalized, and context-aware conversational guide.

The results show accurate spatial and conversational understanding, as well as a significant improvement in the factual accuracy and relevance of the generated responses compared with those of a standard LLM. The project demonstrates the system's potential to provide dynamic and engaging access to cultural heritage.




ISSN 2183-9956
