Mind the gaps: cognitive-inspired AI for high-level visual sensemaking. Towards abstract concept image classification

Martinez Pandiani, Delfina Sol (2024) Mind the gaps: cognitive-inspired AI for high-level visual sensemaking. Towards abstract concept image classification, [Dissertation thesis], Alma Mater Studiorum Università di Bologna. Dottorato di ricerca in Computer science and engineering, 36 Ciclo. DOI 10.48676/unibo/amsdottorato/11284.
Documenti full-text disponibili:
[img] Documento PDF (English) - Richiede un lettore di PDF come Xpdf o Adobe Acrobat Reader
Disponibile con Licenza: Creative Commons Attribution Non-commercial No Derivatives 4.0 (CC BY-NC-ND 4.0) .
Download (192MB)

Abstract

The abundance of visual data and the push for robust AI are driving the need for automated visual sensemaking. Computer Vision (CV) faces growing demand for models that can discern not only what images "represent," but also what they "evoke." This is a demand for tools mimicking human perception at a high semantic level, categorizing images based on concepts like freedom, danger, or safety. However, automating this process is challenging due to entropy, scarcity, subjectivity, and ethical considerations. These challenges not only impact performance but also underscore the critical need for interoperability. This dissertation focuses on abstract concept-based (AC) image classification, guided by three technical principles: situated grounding, performance enhancement, and interpretability. We introduce ART-stract, a novel dataset of cultural images annotated with ACs, serving as the foundation for a series of experiments across four key domains: assessing the effectiveness of the end-to-end DL paradigm, exploring cognitive-inspired semantic intermediaries, incorporating cultural and commonsense aspects, and neuro-symbolic integration of sensory-perceptual data with cognitive-based knowledge. Our results demonstrate that integrating CV approaches with semantic technologies yields methods that surpass the current state of the art in AC image classification, outperforming the end-to-end deep vision paradigm. The results emphasize the role semantic technologies can play in developing both effective and interpretable systems, through the capturing, situating, and reasoning over knowledge related to visual data. Furthermore, this dissertation explores the complex interplay between technical and socio-technical factors. By merging technical expertise with an understanding of human and societal aspects, we advocate for responsible labeling and training practices in visual media. These insights and techniques not only advance efforts in CV and explainable artificial intelligence but also propel us toward an era of AI development that harmonizes technical prowess with deep awareness of its human and societal implications.

Abstract
Tipologia del documento
Tesi di dottorato
Autore
Martinez Pandiani, Delfina Sol
Supervisore
Co-supervisore
Dottorato di ricerca
Ciclo
36
Coordinatore
Settore disciplinare
Settore concorsuale
Parole chiave
computer vision, multimodal understanding, abstract concepts, computational studies, high level semantics, semantic technologies, explainable AI, interpretability, cultural heritage, cultural images, cognitive science, knowledge representation, cognitive semantics, semantic web, situannotate, artstract dataset
URN:NBN
DOI
10.48676/unibo/amsdottorato/11284
Data di discussione
21 Marzo 2024
URI

Altri metadati

Statistica sui download

Gestione del documento: Visualizza la tesi

^