ICPRAM 2026 Abstracts


Area 1 - Theory and Methods

Full Papers
Paper Nr: 21
Title:

A CNN-Based Hybrid Biometric Verification System: Towards Green and Robust Identity Authentication

Authors:

Natalia Ellen Nicholas and Adejuyigbe O. Fajemisin

Abstract: Existing biometric authentication systems face challenges in trait diversity, missing-modality handling, and energy efficiency. This study proposes a hybrid multimodal framework using Siamese Convolutional Neural Networks (CNNs) across nine traits (ear, face, ECG, opisthenar, periocular, palm touch, palm touchless, voice, and finger), one of the broadest scopes in biometric research. Feature- and score-level fusion improves verification accuracy and robustness. Trait-specific models were trained and evaluated on the LUT Biometric (LUTBIO) dataset, with high-performing traits—face, finger, palm touchless, periocular (VGG16), and ECG (MobileNetV3Small)—achieving AUC > 0.98. Feature-level fusion across all nine traits attains AUC 0.9547, while score-level fusion via XGBoost improves performance to AUC 0.9895, demonstrating that selective late fusion is robust to noisy or unstable traits. Energy and carbon-footprint profiling using CodeCarbon shows emissions ≤ 0.03 kg CO2 per trait, supporting feasibility on modest hardware. The system offers a flexible, sustainable approach for identity verification in healthcare, education, and border control, with full deployment and field validation left for future work.
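The score-level fusion idea described above can be sketched as follows. This is a hedged illustration only: the paper trains an XGBoost meta-classifier over per-trait similarity scores, while the stand-in below uses a fixed weighted sum, and the trait names, weights, and acceptance threshold are hypothetical.

```python
# Sketch of score-level fusion for multimodal verification.
# Each trait-specific Siamese model emits a similarity score in [0, 1];
# a meta-classifier (XGBoost in the paper) combines them. A fixed
# weighted sum stands in here so the example stays self-contained.

def fuse_scores(trait_scores, weights):
    """Weighted score-level fusion; higher fused score = more likely genuine."""
    assert set(trait_scores) == set(weights)
    total = sum(weights.values())
    return sum(trait_scores[t] * weights[t] for t in trait_scores) / total

# Hypothetical per-trait similarity scores for one verification attempt.
scores = {"face": 0.97, "finger": 0.93, "ecg": 0.55, "voice": 0.40}
# Down-weight noisier traits (illustrative values, not from the paper).
weights = {"face": 3.0, "finger": 3.0, "ecg": 1.0, "voice": 0.5}

fused = fuse_scores(scores, weights)
decision = "accept" if fused >= 0.8 else "reject"
```

Down-weighting unstable traits is what makes selective late fusion robust: in this toy attempt the noisy ECG and voice scores barely move the fused decision.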

Paper Nr: 52
Title:

Dist-CBMIR: A Semantics-Guided Distributed Metric Learning Framework for Lung Nodule Retrieval in CT Scans

Authors:

Mahbouba Hattab and Ahmed Maalel

Abstract: Content-Based Medical Image Retrieval (CBMIR) plays an increasing role in assisting clinical decision-making by identifying visually and semantically related cases from large imaging repositories. A persistent obstacle, however, is the semantic gap between the visual patterns captured by algorithms and the diagnostic concepts interpreted by radiologists. This work proposes a semantic metric learning framework based on a Siamese Triplet Network designed for the retrieval of lung nodules in CT scans. Unlike conventional triplet formation, the proposed approach employs radiological annotations, specifically malignancy scores and morphological descriptors, to guide semantic triplet generation, ensuring clinical relevance in the learned representations. The framework further integrates batch-hard mining to refine discrimination among embeddings and a scalable FAISS retrieval backend for efficient similarity search on large datasets. Experiments conducted on the LIDC-IDRI dataset demonstrate superior retrieval accuracy over radiomics-based models, CNN classifiers, and recent self-supervised methods such as SimCLR and Vision Transformers. Beyond quantitative gains, qualitative analysis confirms that the learned embeddings align with radiologists’ reasoning, underlining the potential of semantics-guided metric learning to enhance both the interpretability and reliability of CBMIR systems.
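The batch-hard mining step mentioned above can be sketched as follows. This is a generic illustration, not the authors' code: the integer labels stand in for the semantic groups derived from malignancy scores and morphological descriptors, and the distance matrix is a toy example.

```python
# Sketch of batch-hard triplet mining: for each anchor, pick the hardest
# (farthest) positive and the hardest (closest) negative within the batch.

def batch_hard_triplets(dist, labels):
    """dist[i][j] = distance between embeddings i and j (symmetric, 0 diagonal)."""
    triplets = []
    n = len(labels)
    for a in range(n):
        pos = [j for j in range(n) if j != a and labels[j] == labels[a]]
        neg = [j for j in range(n) if labels[j] != labels[a]]
        if not pos or not neg:
            continue  # anchor needs at least one positive and one negative
        hardest_pos = max(pos, key=lambda j: dist[a][j])  # farthest same-label
        hardest_neg = min(neg, key=lambda j: dist[a][j])  # closest other-label
        triplets.append((a, hardest_pos, hardest_neg))
    return triplets

# Toy 4-sample batch: samples 0, 1 share a label; samples 2, 3 share another.
labels = [0, 0, 1, 1]
dist = [
    [0.0, 0.4, 0.9, 1.2],
    [0.4, 0.0, 0.8, 1.1],
    [0.9, 0.8, 0.0, 0.3],
    [1.2, 1.1, 0.3, 0.0],
]
trips = batch_hard_triplets(dist, labels)
```

Training the triplet loss on only these hardest in-batch pairs concentrates gradient signal on the examples the current embedding confuses most.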

Paper Nr: 54
Title:

DisasterSynth: High-Resolution Disaster Scene Generation and Reliable Pseudo Labeling with Masked Feature Reconstruction

Authors:

Wang Yubo, Ishii Hiroyuki and Ohya Jun

Abstract: Aerial image-based post-disaster analysis faces a critical challenge: data scarcity. Collecting and annotating disaster scene data is extremely laborious and time-consuming. To address this problem, we propose a comprehensive approach that combines synthetic data generation with enhanced pseudo labeling. First, we introduce DisasterSynth, an image-to-image diffusion model that generates realistic 1024 × 1024 post-disaster scenes. It takes real pre-disaster images, disaster types, and remote sensing metadata as conditions, without requiring additional image condition encoders. Second, we propose Self-Masked Feature Reconstruction (SMFREC), a generic self-supervised mechanism that improves pseudo label quality through masked autoencoder-based feature reconstruction within the teacher-student framework. Extensive experiments on the xBD dataset show that our method achieves +1.35% mIoU on the Minor Damage class and +1.00% mIoU overall. Extended experiments on PASCAL VOC 2012 further demonstrate the generality of our approach. Our methodology offers a practical solution for remote sensing tasks such as city surveillance and disaster response in unseen scenarios.

Paper Nr: 66
Title:

Distance Alignment Loss for Single Image Geolocalization

Authors:

Naoto Suzuki and Tetsuya Suzuki

Abstract: Classification-based methods dominate single-image geolocalization due to their training efficiency and inference speed. However, Cross Entropy (CE) loss optimizes cell separation without considering the geographic distance structure in the embedding space. This leads to large-scale errors where misclassifications predict cells thousands of kilometers away from the true location. To address this limitation, we propose Distance Alignment Loss (DAL), an auxiliary loss that aligns angular distances in embedding space with great-circle distances between coordinates. While preserving the classification framework, DAL regularizes the embedding space toward geographic distance structures. Experiments on OpenStreetView-5M (OSV5M) show that DAL reduces mean error by 63.2 km and improves large-scale error metrics (Accuracy@2500–15000 km). However, short-range error metrics degrade, indicating a distance-dependent trade-off. This work demonstrates that large-scale errors can be suppressed through alignment between embedding space and geographic distance structures.
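The two quantities that DAL aligns can be sketched as follows. The squared-error penalty and the rescaling constants below are illustrative stand-ins, not the paper's exact loss formulation.

```python
import math

# Sketch of the quantities Distance Alignment Loss relates: the
# great-circle distance between two coordinates and the angular
# (cosine) distance between the corresponding embeddings.

EARTH_RADIUS_KM = 6371.0

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def angular_distance(u, v):
    """1 - cosine similarity between two embedding vectors; lies in [0, 2]."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)

def alignment_penalty(emb_a, emb_b, coord_a, coord_b, max_km=20015.0):
    """Toy penalty: both distances rescaled to [0, 1], then squared error."""
    geo = great_circle_km(*coord_a, *coord_b) / max_km  # half circumference
    ang = angular_distance(emb_a, emb_b) / 2.0
    return (geo - ang) ** 2
```

Minimising such a penalty pulls embeddings of nearby places together and pushes embeddings of far-apart places toward large angles, which is how large-scale misclassifications get suppressed.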

Paper Nr: 68
Title:

Sensor Generalization for Adaptive Sensing in Event-Based Object Detection via Joint Distribution Training

Authors:

Aheli Saha, René Schuster and Didier Stricker

Abstract: Bio-inspired event cameras have recently attracted significant research attention due to their asynchronous and low-latency capabilities. These features provide a high dynamic range and significantly reduce motion blur. However, because their output signals are novel in nature, the available data lack variability and the parameters characterizing these signals have not been analyzed extensively. This paper addresses these issues by providing readers with an in-depth understanding of how intrinsic parameters affect the performance of a model trained on event data, specifically for object detection. We also use our findings to extend the capabilities of the downstream model towards sensor-agnostic robustness.

Paper Nr: 71
Title:

Optimizing Knowledge Placement in Small Language Models: Combining Fine-Tuning and Retrieval-Augmented Generation

Authors:

Sigal Shaked

Abstract: We explore how small language models can best distribute knowledge between fine-tuned parameters and external retrieval mechanisms. Our hybrid approach fine-tunes on stable topics while retrieving dynamic facts at inference time. Using HotpotQA with topic-based partitions, the hybrid system outperforms fine-tuning-only and retrieval-only baselines in accuracy, while achieving a favourable balance between semantic richness and factual grounding. The results show that combining understanding (stable, learned reasoning) with remembering (retrieved, dynamic evidence) enables generalization to unseen domains, much like a person who searches for new information before applying prior knowledge to solve a novel problem.

Paper Nr: 80
Title:

Assessing Reconstruction Techniques for Estimating Evapotranspiration Time Series under Varying Data Availability

Authors:

José Ramón Torres-Martín, Yolanda Carrión-García, José Manuel Velarde-Gestera, Mihaela I. Chidean and Inmaculada Mora-Jiménez

Abstract: Evapotranspiration is widely used in agriculture to determine crop water requirements and to support irrigation scheduling, water budgeting, and drought monitoring. Accurate evapotranspiration estimation requires complete and reliable meteorological and soil data, typically collected by IoT sensors in orchards and commercial fields. However, sensors often produce missing values due to faults, calibration drift, power outages, and network latency, which can compromise estimation accuracy. In this work, we propose a reconstruction method for the four key meteorological variables (air temperature, relative humidity, wind speed and solar radiation) required for reference evapotranspiration estimation using the FAO-Penman-Monteith equation. The method operates on the residual component of each time series after removing linear and seasonal trends at a 20-minute sampling resolution. Its performance is evaluated against two baseline approaches, Linear Interpolation and Last Observation Carried Forward, using IoT data (2022–2025) from an orchard in Seville, Spain, and a nearby reference weather station as ground truth. Results show lower reconstruction errors and higher correlation across all data-availability scenarios, particularly when availability drops below 70%. Correlation improves by up to 40%, and the error measurements are notably reduced. Reconstructed reference evapotranspiration values preserve seasonal dynamics even with extended gaps, demonstrating robustness and suitability for precision irrigation systems.
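The detrending step that precedes reconstruction can be sketched as a least-squares fit. This is a minimal illustration only: the paper additionally removes a seasonal component and works at 20-minute resolution, and the temperature series below is invented.

```python
# Sketch of the preprocessing step: remove a least-squares linear trend
# from a meteorological series so reconstruction operates on residuals.
# (Seasonal removal and the reconstruction model itself are omitted.)

def linear_fit(y):
    """Least-squares slope and intercept of y against index 0..n-1."""
    n = len(y)
    mx = (n - 1) / 2
    my = sum(y) / n
    sxx = sum((x - mx) ** 2 for x in range(n))
    sxy = sum((x - mx) * (yi - my) for x, yi in enumerate(y))
    slope = sxy / sxx
    return slope, my - slope * mx

def residuals(y):
    """Series minus its fitted linear trend."""
    slope, intercept = linear_fit(y)
    return [yi - (slope * i + intercept) for i, yi in enumerate(y)]

# Toy air-temperature series: a warming trend plus small fluctuations.
temps = [15.0 + 0.1 * i + w for i, w in enumerate([0.2, -0.1, 0.0, 0.3, -0.4, 0.0])]
res = residuals(temps)
```

Reconstructing the residual rather than the raw series means gap-filling only has to model the short-term fluctuations, while trend and seasonality are restored deterministically afterwards.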

Paper Nr: 82
Title:

Reducing Spurious Detections in Video-Based Aerial Object Recognition: A Multi-Stage Contextual, Temporal, and Multi-Scale Attention Architecture

Authors:

Shubham Kumar Dubey, J V Satyanarayana and C Krishna Mohan

Abstract: Detecting objects in video, especially when they are small, rapidly moving, or partially hidden, is still highly vulnerable to false alarms. Such erroneous detections directly compromise the trustworthiness of UAV surveillance, automated inspection systems, and other safety-sensitive perception pipelines. In this work, we introduce a multi-stage refinement architecture specifically designed to suppress false positives by exploiting complementary cues: spatial context, temporal coherence, anchor adaptation, non-maximum suppression behavior, and multi-scale feature enhancement. Starting from a Faster R-CNN backbone with an FPN, we augment the detector with a Graph Attention Contextual R-CNN to model relations between proposals, a bidirectional LSTM module for temporal feature aggregation, a K-means++ driven dynamic anchor generator, a differentiable Soft-NMS layer for score modulation, and a Multi-Level Dual Attention Refinement (MLDAR) block that operates across feature pyramid levels. Each component is grounded in a clear design motivation, accompanied by a mathematical formulation, and evaluated empirically on the Drone-vs-Bird (DVB) benchmark. The resulting system achieves notable gains in mAP, precision, recall, and false-positive rate compared with YOLOv8s, Faster R-CNN, and RetinaNet, with especially strong improvements for small airborne targets.

Paper Nr: 88
Title:

A Comparative Study of Autoencoder Models on Latent Space Organization: An Evaluation with SVHN

Authors:

Wilson Bagni Junior, Gabriel Bianchin de Oliveira, Helio Pedrini and Zanoni Dias

Abstract: Autoencoders are neural networks used for compression, reconstruction, and pattern extraction in unlabeled data, and are fundamental to dimensionality reduction tasks. This paper presents a comparative analysis of four variants (FCAE, CAE, VAE, and AAE) applied to the Street View House Numbers (SVHN) dataset, using standardized architectures and different latent dimensions. The study evaluates data replication efficiency and the quality of the latent space through specific metrics, including neighborhood preservation. The results indicate that CAE excels in reconstruction and neighborhood preservation, FCAE exhibits greater sparsity and stability, while VAE and AAE produce more organized and robust latent spaces. The conclusions guide the choice of architectures in computer vision and unsupervised learning applications.

Paper Nr: 90
Title:

X-CViT: An Explainable Vision Transformer Architecture for Classification of Cloud Images

Authors:

Ştefan Alexandrescu, Gabriela Czibula, Alexandra-Ioana Albu and Eugen Mihuleţ

Abstract: As a source of precipitation and a factor in Earth’s energy balance, clouds play an important role in our planet’s weather and climate, and cloud classification is essential in weather forecasting and climate monitoring. In this paper, we propose X-CViT, an explainable vision transformer architecture for uncovering relevant features of different cloud types. We also present an analysis of the interpretations provided by Local Interpretable Model-agnostic Explanations from both computational and meteorological perspectives. The experiments highlighted a statistically significant performance improvement compared to other deep learning architectures used in the literature. In terms of the Area Under the Receiver Operating Characteristic Curve performance metric, our approach outperforms other methods from the literature, replicated and tested using our proposed methodology, by 0.2%-10.5%.

Short Papers
Paper Nr: 11
Title:

RoBERTa-HS: A Fine Tuned Transformer Model for Hate Speech Detection with Sentiment and Contextual Features

Authors:

Sehrash Safdar, Muhammad Wasim, Paulo Jorge Coelho and Ivan Miguel Pires

Abstract: Reducing harmful content on social media platforms requires the detection of hate speech. Traditional machine learning algorithms such as Logistic Regression, Naïve Bayes, and Support Vector Machines (SVM) frequently fail to recognize the nuanced linguistic patterns associated with hate speech, resulting in higher misclassification rates. Although deep learning models, especially those based on transformer architectures, have improved detection performance, current research does not adequately evaluate the impact of additional linguistic variables, such as sentiment scores and contextual embeddings, on classification accuracy. To address this problem, we fine-tuned the RoBERTa model and assessed its performance on the Twitter Hate Speech dataset against both traditional and state-of-the-art techniques. Our results demonstrate that RoBERTa-HS outperformed previous approaches with a maximum accuracy of 0.92 and an F1-score of 0.91. Furthermore, the model demonstrated outstanding discrimination ability with the highest AUC-ROC score of 0.91. Combining sentiment scores with Word2Vec embeddings improved classification performance, emphasizing the value of using a variety of linguistic cues. These findings highlight how well transformer-based models identify hate speech.

Paper Nr: 13
Title:

Two-Stage Angular Alignment for Positive-Unlabeled Learning

Authors:

Vasileios Sevetlidis, George Pavlidis and Antonios Gasteratos

Abstract: Positive-Unlabeled (PU) learning addresses the binary classification problem where only positive and unlabeled data are available, a setting common in applications such as medical diagnosis and web mining. We introduce a novel two-stage approach based on angular alignment in feature space, where a learnable prototype vector represents the directional centroid of the positive class. In the first stage, the model aligns labeled positives toward this prototype to promote angular compactness; in the second, it repels overly similar unlabeled instances to refine the decision boundary without prematurely assigning negative labels. Our method employs a directional loss inspired by von Mises–Fisher geometry, a dynamic stage-switching curriculum, and maintains a highly parameter-efficient design. Experiments on CIFAR-10 and SVHN demonstrate strong performance and competitive results compared to state-of-the-art PU learning methods. The approach also yields semantically structured latent spaces, highlighting the value of angular geometry for interpretable and effective representation-based PU learning in visual domains.

Paper Nr: 18
Title:

Going beyond Majority Vote: Composite-Aware Soft Voting in Random Forest

Authors:

Cody Laurie, Martin Ha, Elijah Sagaran and Rashida Hasan

Abstract: Ensemble learning methods, particularly Random Forests (RF), are widely adopted for their interpretability, robustness, and strong performance across diverse domains. However, traditional majority voting schemes in RF assume uniform predictive competence among constituent trees, a limitation that undermines performance in imbalanced or noisy datasets and high-stakes applications. In this work, we propose a novel voting strategy, Softmax-Weighted Average Voting (SWAV), to enhance ensemble decision-making; it leverages temperature-scaled softmax to weight trees based on composite performance metrics, including accuracy, precision, recall, and F1-score. Extensive experiments across 22 public datasets from domains including healthcare, business, and beyond validate the proposed method. The results demonstrate performance superior or comparable to the baseline RF, which allows the forest size to be scaled down without losing quality. Our approach incurs minimal computational overhead and improves scalability, making it a viable enhancement for real-world ensemble learning applications.
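A minimal sketch of the SWAV weighting idea follows. The per-tree composite scores, class probabilities, and temperature below are hypothetical; the paper's composite combines accuracy, precision, recall, and F1-score.

```python
import math

# Sketch of SWAV: weight each tree's vote by a temperature-scaled
# softmax over a composite performance score, then soft-vote.

def softmax(scores, temperature=0.1):
    """Lower temperature concentrates weight on the best-performing trees."""
    m = max(scores)  # subtract max for numerical stability
    exps = [math.exp((s - m) / temperature) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def swav_predict(tree_probas, composite_scores, temperature=0.1):
    """tree_probas[t] = class-probability vector emitted by tree t."""
    w = softmax(composite_scores, temperature)
    n_classes = len(tree_probas[0])
    fused = [sum(w[t] * tree_probas[t][c] for t in range(len(w)))
             for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: fused[c])

# Three trees, binary task: one strong tree disagrees with two weak ones.
probas = [[0.9, 0.1], [0.4, 0.6], [0.45, 0.55]]
scores = [0.95, 0.60, 0.55]  # hypothetical composite metric per tree
pred = swav_predict(probas, scores)
```

Note that plain majority voting would side with the two weak trees here; the competence-weighted soft vote follows the strong tree instead, which is the behaviour SWAV is designed to recover.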

Paper Nr: 24
Title:

Lie to Me: Knowledge Graphs for Robust Hallucination Self-Detection in LLMs

Authors:

Sahil Kale and Antonio Luca Alfeo

Abstract: Hallucinations, the generation of apparently convincing yet false statements, remain a major barrier to the safe deployment of LLMs. Building on the strong performance of self-detection methods, we examine the use of structured knowledge representations, namely knowledge graphs, to improve hallucination self-detection. Specifically, we propose a simple yet powerful approach that enriches hallucination self-detection by (i) converting LLM responses into knowledge graphs of entities and relations, and (ii) using these graphs to estimate the likelihood that a response contains hallucinations. We evaluate the proposed approach using two widely used LLMs, GPT-4o and Gemini-2.5-Flash, across two hallucination detection datasets. To support more reliable future benchmarking, one of these datasets has been manually curated and enhanced, and will be released as a secondary outcome of this work. Compared to standard self-detection methods and SelfCheckGPT, a state-of-the-art approach, our method achieves up to 16% relative improvement in accuracy and 20% in F1-score. Our results show that LLMs can better analyse atomic facts when they are structured as knowledge graphs, even when initial outputs contain inaccuracies. This low-cost, model-agnostic approach paves the way toward safer and more trustworthy language models. Our dataset and code are publicly available.

Paper Nr: 25
Title:

Real-Time Proactive Anomaly Detection via Forward and Backward Forecast Modeling

Authors:

Luis Olmos and Rashida Hasan

Abstract: Reactive anomaly detection methods, which are commonly deployed to identify anomalies after they occur based on observed deviations, often fall short in applications that demand timely intervention, such as industrial monitoring, finance, and cybersecurity. Proactive anomaly detection, by contrast, aims to detect early warning signals before failures fully manifest, but existing methods struggle with handling heterogeneous multivariate data and maintaining precision under noisy or unpredictable conditions. In this work, we introduce two proactive anomaly detection frameworks: the Forward Forecasting Model (FFM) and the Backward Reconstruction Model (BRM). Both models leverage a hybrid architecture combining Temporal Convolutional Networks (TCNs), Gated Recurrent Units (GRUs), and Transformer encoders to model directional temporal dynamics. FFM forecasts future sequences to anticipate disruptions, while BRM reconstructs recent history from future context to uncover early precursors. Anomalies are flagged based on forecasting error magnitudes and directional embedding discrepancies. Our models support both continuous and discrete multivariate features, enabling robust performance in real-world settings. Extensive experiments on four benchmark datasets, MSL, SMAP, SMD, and PSM, demonstrate that FFM and BRM outperform state-of-the-art baselines across detection metrics and significantly improve the timeliness of anomaly anticipation. These properties make our approach well-suited for deployment in time-sensitive domains requiring proactive monitoring.

Paper Nr: 38
Title:

HDLSS Raman Spectroscopy Data Generation Using GANs and Genetic Algorithms

Authors:

Thomas Poudevigne-Durance, Sahil Sharma, Sayantan Tripathy, Ng Ka Wai, Muskaan Singh, Liam McDaid, Gerard L. Cote, Samuel B. Mabbott and Saugat Bhattacharyya

Abstract: Early detection of acute myocardial infarction (AMI) is critical for improving clinical outcomes. Raman spectroscopy (RS), especially surface-enhanced resonance Raman scattering (SERRS), can detect ultra-low amounts of cardiac troponin I (cTnI), an AMI biomarker, but the resulting high-dimensional, low-sample-size (HDLSS) data currently produces poor prediction models. Generative models such as Generative Adversarial Networks (GANs) can enlarge the sample size by synthesising additional data. This study explores the use of two variants of Genetic Algorithms (GAs), a basic GA and NSGA-II, to optimise hyperparameters for three GANs (WGAN-GP, CTGAN, and CTAB-GAN+) aimed at generating synthetic SERRS data. GAN performance was assessed using Cosine similarity, Likeness score, Wasserstein distance, and Euclidean distance; a weighted sum of these measures served as the fitness function of the Genetic Algorithm. GA-optimised models outperform manually adjusted models, with NSGA-II performing best and most consistently. WGAN-GP paired with NSGA-II was particularly effective for HDLSS data, outperforming domain-specific GANs. These findings demonstrate the potential of GA-optimised GANs for augmenting biomedical datasets and improving early-stage diagnostic modelling.

Paper Nr: 40
Title:

DASNet: A Dual Adaptive Subtle-Feature Network for Diabetic Retinopathy Detection in Fundus Images

Authors:

Yadynesh D. Sonale, Preeth Raguraman and Muneeswaran Packiaraj

Abstract: The task of identifying Diabetic Retinopathy (DR) plays a crucial role in medical image processing. Early detection of DR is essential for preventing irreversible vision loss. A significant challenge in this task arises from subtle lesions, such as microaneurysms and hemorrhages, in fundus images, as well as from variability in image quality due to different imaging devices. Existing deep learning approaches typically employ single-pooling CNN architectures, which struggle to effectively extract both fine-grained and prominent features. In this paper, we propose DASNet (Dual Adaptive Subtle-feature Network), designed to capture complex feature representations in fundus images, using a preprocessing pipeline to enhance image quality. The proposed architecture employs a dual-branch convolutional neural network that integrates MaxPooling, Adaptive MaxPooling, and Spatial Pyramid Pooling to simultaneously capture subtle and dominant retinal features. The preprocessing pipeline includes HSV color conversion, CLAHE enhancement on the Value channel, and RGB reconversion to improve contrast under varying lighting conditions. DASNet achieves accuracies of 95.34% on the BiDR dataset, 95.65% on APTOS, and 97.46% on the Eye Disease Image dataset. Evaluation based on metrics such as F1-score, Precision, and Recall demonstrates that DASNet outperforms existing techniques, with accuracy improvements ranging from 1.85-6.69% over DenseNet121, 2.08-5.27% over VGG16, 2.80-11.28% over ResNet50, and 2.74-8.06% over InceptionV3. These results highlight DASNet’s potential for scalable, automated DR screening in telemedicine systems.

Paper Nr: 43
Title:

Sugar-Beet Stress Detection Using Satellite Image Time Series

Authors:

Bhumika Laxman Sadbhave, Philipp Vaeth, Denise Dejon, Gunther Schorcht and Magda Gregorová

Abstract: Satellite Image Time Series (SITS) data has proven effective for agricultural tasks due to its rich spectral and temporal nature. In this study, we tackle the task of stress detection in sugar-beet fields using a fully unsupervised approach. We propose a 3D convolutional autoencoder model to extract meaningful features from Sentinel-2 image sequences, combined with acquisition-date-specific temporal encodings to better capture the growth dynamics of sugar-beets. The learned representations are used in a downstream clustering task to separate stressed from healthy fields. The resulting stress detection system can be directly applied to data from different years, offering a practical and accessible tool for stress detection in sugar-beets.

Paper Nr: 44
Title:

Flexible Distributed System for Multi-Camera 3D Human Pose Estimation

Authors:

Rúben Costa Viana, Ana Filipa Rodrigues Nogueira, Hélder P. Oliveira and Luís F. Teixeira

Abstract: 3D human pose estimation is an essential task for many real-world applications, such as surveillance systems or action recognition. To be deployed in real-world settings, however, solutions must balance accuracy and efficiency. In this study, we therefore focus on improving the efficiency of current models for estimating the 3D pose. We propose a distributed system that spreads the computational workload across its components, allowing it to scale easily and provide real-time inference. In contrast with current distributed systems, which estimate the 3D pose using simpler geometry-based methods, our system uses a state-of-the-art model, Faster VoxelPose, to estimate the 3D pose accurately. In real-time tests with 5 views, we obtained a constant inference rate of 30 FPS, with our edge devices running at 45 FPS.

Paper Nr: 48
Title:

Feature Selection for Hyperspectral Data Using Genetic Algorithms and Simulated Annealing

Authors:

Diaa Addeen Abuhani, Raghad Aldamani, Meriem Aoudia and Omar Arif

Abstract: Hyperspectral images (HSI) provide rich spectral detail, but their high dimensionality critically challenges machine learning classifiers. This paper addresses this by investigating meta-heuristic feature selection methods, Genetic Algorithms (GA) and Simulated Annealing (SA), as robust alternatives to conventional filter-based techniques. This constitutes a significant AI contribution by enhancing the effectiveness of machine learning in processing complex, high-dimensional data. Using five benchmark HSI datasets, we rigorously benchmarked GA and SA against Chi-square, Information Gain, and ReliefF. Empirical results show that GA and SA achieved mean classification accuracies of up to 96% (Pavia) and 89% (Salinas), outperforming Chi-square and Information Gain, which typically ranged between 51% and 83%. Despite its improved accuracy, GA exhibited high computational overhead, with convergence times exceeding 375 seconds on Indian Pines, whereas SA maintained competitive accuracy with substantially faster convergence (around 29 seconds). Across datasets, SA provided a balanced trade-off between accuracy and efficiency, achieving an aggregated mean accuracy of 89% with convergence times under 22 seconds on average. These findings underscore the effectiveness of meta-heuristics in optimizing feature selection for HSI. Highlighting SA as a practical choice for real-time or resource-constrained scenarios provides direct engineering applications in fields like remote sensing. Future work will explore hybrid strategies to further enhance scalability, robustness, and computational speed. The work presented in this paper can be found at: https://github.com/Diaa340/Feature-Selection-for-Hyperspectral-Data-using-Meta-heuristics-Algorithms.
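Simulated-annealing feature selection can be sketched as below. The toy objective, cooling schedule, and parameters are illustrative stand-ins; a real run would score a classifier on the spectral bands selected by each candidate mask.

```python
import math
import random

# Sketch of SA feature selection: flip one feature in/out per step,
# always keep improving moves, and keep worsening moves with a
# temperature-dependent probability that shrinks as the run cools.

def anneal_features(n_features, score_fn, steps=500, t0=1.0, cooling=0.99, seed=0):
    rng = random.Random(seed)
    mask = [rng.random() < 0.5 for _ in range(n_features)]
    best, best_score = mask[:], score_fn(mask)
    cur_score, temp = best_score, t0
    for _ in range(steps):
        cand = mask[:]
        cand[rng.randrange(n_features)] = not cand[rng.randrange(0, 1) or 0] if False else cand  # placeholder
        return None
```

(The placeholder above is replaced by the real loop below.)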

Paper Nr: 49
Title:

AAHL: Attention-Guided Adaptive Hypergraph Learning for Multi-Label Image Classification

Authors:

Dong Wang, Hao Zhu, Ziyi Zhang, Zhengshen Gu and Songhua Xu

Abstract: Multi-label image classification is a challenging task due to issues such as severe label imbalance and low-saliency target recognition. Most existing methods rely on statistical pairwise label co-occurrence and fail to capture dynamic high-order label dependencies. To address this limitation, we propose an Attention-guided Adaptive Hypergraph Learning (AAHL) model. This model employs a channel-attentive adaptive hypergraph to dynamically weight vertex and hyperedge features, effectively modeling high-order label co-occurrences and enhancing semantic representation. Additionally, we design a cross-attentive multi-scale feature fusion module to explicitly capture interactions between high-level semantic features and low-level detailed features. Experiments on two benchmark datasets demonstrate that AAHL outperforms most existing methods, validating its effectiveness and robustness.

Paper Nr: 56
Title:

A Fast Algorithm for Euclidean Maximum Weight Non-Bipartite Matching

Authors:

Philipp Baumann, Olivier Goldschmidt and Dorit S. Hochbaum

Abstract: The maximum weight matching (MWM) on general, non-bipartite graphs is solvable in polynomial time. Yet none of the exact optimization algorithms for the problem scales well enough for dealing with large data sets. We present here a heuristic algorithm for solving MWM on graphs where the nodes are vectors in a d-dimensional space and the edge weights are Euclidean distances or distances induced by any other metric norm. The algorithm is based on a recent anticlustering method called Assignment-Based Anticlustering (ABA) that for anticlusters of size 2 generates a maximum weight matching of high quality, often within less than 1% of the optimum, at a speed that is several orders of magnitude faster than exact optimization algorithms and other heuristic algorithms. This significant performance advantage makes the ABA algorithm a powerful tool for large-scale geometric applications.

Paper Nr: 57
Title:

Invisible yet Lethal: Revealing Least Significant Bit Backdoors in Medical Images through Explainable Segmentation

Authors:

Arturs Nikulins, Kaspars Sudars and Inese Poļaka

Abstract: Backdoor attacks represent a severe threat to the reliability and safety of artificial intelligence models, particularly in the medical domain where misclassification can directly affect patient outcomes. This paper introduces a hybrid segmentation–explainability framework designed to detect and localize invisible Least Significant Bit (LSB) backdoors in medical imaging models. The LSB trigger investigated converts all zero-valued pixels to ones, an imperceptible alteration to the human eye yet easily exploitable by neural networks. The proposed method integrates saliency maps generated by Explainable AI (XAI) tools with a segmentation network trained to recognize trigger regions only when the classification model is compromised. Using ResNet-50 classifiers and U-Net segmentation architecture on the BraTS dataset, we evaluate the ability of the segmentation model to generalize across different LSB trigger types and across distinct poisoned classifiers. The experimental evaluation shows that the hybrid framework achieved over 90% generalization accuracy in distinguishing saliency maps of poisoned and clean models when employing the Attention U-Net architecture.
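The trigger itself is simple to reproduce in a sketch (illustrative only, using plain nested lists in place of image tensors):

```python
# Sketch of the LSB-style trigger described above: every zero-valued
# pixel becomes one, a change invisible to a human viewer of an 8-bit
# image yet perfectly consistent for a network to latch onto.

def apply_lsb_trigger(image):
    """image: 2D list of 8-bit pixel values; returns a poisoned copy."""
    return [[1 if px == 0 else px for px in row] for row in image]

clean = [[0, 17, 255], [0, 0, 128]]
poisoned = apply_lsb_trigger(clean)

# The largest per-pixel change is 1/255 of full scale -- imperceptible.
max_delta = max(abs(a - b)
                for ra, rb in zip(clean, poisoned)
                for a, b in zip(ra, rb))
```

Because the perturbation is bounded by one intensity level, no visual inspection of the training set will reveal it; that is why the paper turns to saliency maps and segmentation to expose the compromised model instead.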

Paper Nr: 61
Title:

Explanation Uncertainty in Tabular Data Models

Authors:

Yingsi Gao, Ömer Tarik Özyilmaz and Matias Valdenegro-Toro

Abstract: This study investigates the interpretability of models suited for tabular data, focusing on gradient boosted decision trees (GBDT) and transformers, across regression and classification tasks, when explainable AI (XAI) methods are combined with uncertainty estimation methods. The study is motivated by the significance of explaining tabular data models due to their extensive real-world applications, such as in healthcare and finance, emphasizing the necessity for reliable and interpretable AI models in these critical domains. Each model incorporates an uncertainty estimation method: ensembling for GBDT and dropout for transformers. Two XAI methods, the model-agnostic LIME and the data-type-specific SHAP, were applied to both models. The study examines the consistency of explanations with associated uncertainties. Findings indicate that, for simpler datasets, explanations from different XAI methods for the same model on the same instance are generally in agreement: features of high contribution and low uncertainty are in consensus or closely related. Explanations across different models for the same task and instance can likewise agree. For complex datasets, the explanations are harder to interpret without domain expertise.
Download
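The ensemble-based uncertainty estimation mentioned for GBDTs can be sketched by training several boosted models that differ only in random seed and reading their disagreement as an uncertainty proxy. A scikit-learn illustration on toy data (not the paper's setup):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Ensemble of GBDTs differing only in seed; subsample < 1 injects variance.
members = [
    GradientBoostingClassifier(subsample=0.7, random_state=s).fit(X, y)
    for s in range(5)
]
probs = np.stack([m.predict_proba(X[:5])[:, 1] for m in members])
mean_prob = probs.mean(axis=0)    # ensemble prediction
uncertainty = probs.std(axis=0)   # disagreement as an uncertainty proxy
```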

Paper Nr: 64
Title:

A Unified Framework Combining Clustering Algorithms, SMT-Based Reasoning, and Automated Semantic Analysis to Reduce False Positives in Video Analysis

Authors:

Shubham Kumar Dubey, J. V. Satyanarayana and C. Krishna Mohan

Abstract: False positives remain a major obstacle to deploying reliable video analysis systems in safety-critical domains such as UAV surveillance, industrial inspection, and traffic monitoring. Purely data-driven approaches reduce error rates but struggle to exploit explicit structural knowledge about the scene and domain. In contrast, symbolic reasoning techniques such as Satisfiability Modulo Theory (SMT) solvers and semantic analysis are powerful at handling constraints but cannot operate directly on raw high-dimensional visual data. This paper proposes a unified three-stage framework that combines (i) clustering-based structural grouping of video features, (ii) SMT-based reasoning over spatial–temporal constraints, and (iii) automated semantic analysis for high-level context validation. Clustering groups visually similar detections and isolates outliers; SMT-based reasoning prunes predictions that violate hard logical constraints; and semantic analysis enforces domain-specific behavioral rules over surviving candidates. We evaluate our method on the Drone-vs-Bird (DvB) dataset and demonstrate a substantial reduction in false positives compared to clustering-only, SMT-only, and semantic-only baselines. The framework offers a principled and interpretable route to bridge sub-symbolic perception and symbolic reasoning for robust video understanding.
Download

Paper Nr: 65
Title:

An Enhanced Customer Segmentation Algorithm Based on the RFM Model Using K-Means Clustering

Authors:

Ahmed Omrane, Mohamed Amine Mezghich and Slim Mhiri

Abstract: Customer segmentation plays a crucial role in designing effective marketing strategies and understanding consumer behavior. While traditional RFM (Recency, Frequency, Monetary) analysis combined with K-Means clustering has been widely used to group customers based on past purchasing patterns, it often lacks the ability to capture the temporal evolution of customer behavior. In this study, we present an applied and progressive extension of the traditional RFM framework by integrating additional behavioral and temporal indicators, namely the trend in purchase frequency (Slope F), the trend in monetary value (Slope M), the average transaction value, and the average basket size. These features are designed to capture customer behavioral dynamics that are typically absent from static RFM models. K-Means clustering is applied to the enriched feature space, and the resulting segments are evaluated using standard internal validation metrics. The experimental results show that the inclusion of behavioral and trend-based variables leads to more coherent and better-separated customer segments, thereby supporting more effective and targeted marketing strategies.
Download
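The trend features described above (Slope F, Slope M, average transaction value) can be sketched as least-squares slopes over per-period aggregates before clustering. An illustrative scikit-learn example on synthetic data; the per-month arrays are stand-ins, not the authors' pipeline:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def trend_slope(values):
    """Least-squares slope of a per-period series (e.g. monthly frequency)."""
    t = np.arange(len(values))
    return np.polyfit(t, values, 1)[0]

rng = np.random.default_rng(0)
# Toy per-customer monthly purchase counts and spend (50 customers, 6 months).
freq = rng.poisson(3, size=(50, 6)).astype(float)
spend = rng.gamma(2.0, 30.0, size=(50, 6))

features = np.column_stack([
    freq.sum(axis=1),                                     # Frequency
    spend.sum(axis=1),                                    # Monetary
    [trend_slope(f) for f in freq],                       # Slope F
    [trend_slope(s) for s in spend],                      # Slope M
    spend.sum(axis=1) / np.maximum(freq.sum(axis=1), 1),  # avg transaction value
])
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(
    StandardScaler().fit_transform(features))
```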

Paper Nr: 72
Title:

Question-Answering System by Large Language Models Applied to Spanish Electronic Health Records in Maternal Healthcare

Authors:

D. Vallejo-Sanchez, A. F. Giraldo-Forero and A. Orozco-Duque

Abstract: High maternal mortality reflects persistent health inequalities in Latin America and the Caribbean. Innovating in systems that improve access to clinical information would help reduce this inequality gap. However, extracting information from electronic health records (EHRs) is difficult due to the unstructured nature of this information. To address this gap, we built a Spanish Question Answering (QA) dataset and evaluated a set of prompts across different large language models (LLMs). The results show that Llama3.3 with a basic zero-shot prompt obtains a 79% F1-score, along with 79% precision and 80% recall, making it the best-performing model and prompt combination. Moreover, our study shows that larger models or newer knowledge cutoffs do not necessarily improve performance when answers must rely on documents provided in the prompt.
Download
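The F1, precision, and recall figures quoted above are typically computed at the token level in extractive QA evaluation. A minimal sketch of SQuAD-style token F1 (an assumption about the exact metric used, not the authors' evaluation code):

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted answer and a gold answer."""
    pred, gold = prediction.lower().split(), reference.lower().split()
    overlap = sum((Counter(pred) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)
```

For example, `token_f1("parto por cesárea", "parto por cesárea programada")` has precision 1.0 and recall 0.75, giving F1 of 6/7.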

Paper Nr: 84
Title:

ML-Assisted Fast Segment Detection and PID Gain Scheduling for Competitive Line-Following Robots

Authors:

Mohanad Abu-Romoh

Abstract: Line-following robots (LFRs) must be fast and stable across varied track segments, yet a single fixed PID configuration cannot satisfy all geometries. We propose Proportional-Derivative (PD) gain scheduling assisted by a real-time lightweight neural network (NN) or decision tree (DT) classifier that detects track segments. The NN detector attains 95.48% accuracy, while the DT detector reaches 88.88%. The machine-learning-based detectors are trained offline over 5.5 million PID-controlled runs with randomized offsets, tilts, and sensor noise. To our knowledge, this work represents the first large-scale study demonstrating real-time PD gain scheduling via on-board segment detection for competitive LFRs.
Download
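Gain scheduling via a segment classifier amounts to a lookup from predicted segment type to a (Kp, Kd) pair. The gains, features, and data below are hypothetical stand-ins, not the paper's trained detector:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical per-segment PD gains (straight, gentle curve, sharp curve).
GAINS = {0: (0.8, 0.10), 1: (1.2, 0.25), 2: (2.0, 0.50)}

rng = np.random.default_rng(1)
# Toy sensor-array windows labelled with a segment type.
X = rng.normal(size=(600, 8))
y = rng.integers(0, 3, size=600)
X[y == 1] += 1.0   # shift class means so the sketch is separable
X[y == 2] -= 1.0

clf = DecisionTreeClassifier(max_depth=6, random_state=0).fit(X, y)

def scheduled_gains(sensor_window):
    """Classify the upcoming segment and return the matching (Kp, Kd)."""
    segment = int(clf.predict(sensor_window.reshape(1, -1))[0])
    return GAINS[segment]

kp, kd = scheduled_gains(X[0])
```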

Paper Nr: 85
Title:

A Zero-Reference Approach Employing γ-Correction + Dilated-ZeroDCE++ for Handwritten Cheque Image Enhancement

Authors:

Prabhat Dansena, Ashish Ranjan, Soumen Bag and Prasun Chandra Tripathi

Abstract: The Cheque Truncation System (CTS) is at the core of the banking sector in India, enabling the digital processing of cheques. However, CTS performance degrades under limiting illumination conditions in cheque images, including poor, shadowed, and uneven illumination. Moreover, there is no prior work addressing this problem, and the lack of datasets capturing these conditions remains an even bigger challenge. We therefore introduce a new dataset and a novel (γ-correction + Dilated-ZeroDCE++) method. The dataset covers cheque images with varying illumination, and the proposed method efficiently handles these illumination conditions. Comparisons with several popular low-light image models, such as ZeroDCE and ZeroDCE++, strongly indicate the potential of the proposed approach. Notably, the proposed solution is extremely lightweight and based on the notion of zero-reference, which makes it suitable even for small devices.
Download

Paper Nr: 94
Title:

A Graph Theory-Driven Analysis of EEG Patterns in Epilepsy Patients versus Healthy Controls

Authors:

Divya Eshwar, Jeevana Reddy C, Greeshma D Holeyannavar, B Harini and Surabhi Narayan

Abstract: Epilepsy is a chronic brain disorder characterised by abnormal electrical activity in the brain that triggers seizures. Electroencephalography (EEG) measures the brain’s electrical activity by recording voltage fluctuations in neurons. This paper proposes a graph network-driven pattern analysis using spectral features and functional connectivity measures derived from the EEG of epilepsy patients and healthy controls. Topographic mapping of spectral features shows that epilepsy patients have significantly lower gamma relative power and spectral entropy, notably in the central-parietal and temporal regions. Functional connectivity measures characterise the interaction between brain regions. Graph metrics computed from these functional connectivity measures indicate pathological hyperconnectivity characterised by focal hubs and hypersynchronisation in gamma bands, especially in frontal-central regions for epilepsy patients, whereas balanced, diffuse connectivity is observed in healthy controls. The proposed work is based on the TUH-EEG Epilepsy corpus, a dataset containing EEG recordings of 100 epilepsy patients and 100 healthy controls. The dataset is preprocessed to obtain consistent 22-channel EEG recordings across all individuals. The proposed work lays the foundation for future research focused on network-based biomarkers for diagnosing and monitoring epilepsy.
Download

Paper Nr: 98
Title:

Thyroid Nodule Classification via Weak Self-Supervision and Transfer Learning

Authors:

Alessio Fagioli, Marco Cascio, Gian Luca Foresti and Luigi Cinque

Abstract: Accurate classification of thyroid nodules in ultrasound images is essential for effective medical diagnostics. Segmentation maps can assist in this endeavor; however, the limited availability of annotated masks in medical imaging datasets hampers the performance of deep learning models, which rely heavily on large volumes of labeled data for training. To address this challenge, we propose a two-stage segmentation approach enhanced by transfer learning from a different domain. First, a segmentation model is trained on a large colonoscopy dataset to generate segmentation masks. Despite the domain differences, this model captures generalizable features that are then fine-tuned on a smaller thyroid ultrasound dataset with coarse annotations. The refined model is subsequently used as a weak self-supervisor to produce segmentation maps that guide the training of a DenseNet classifier to distinguish between benign and malignant nodules. Experimental results on a private collection show that, although coarse, our approach effectively extracts relevant information from thyroid images. Specifically, it achieved an accuracy of 87%, a notable increase from the baseline accuracy of 53% observed when the DenseNet classifier was trained directly on ultrasound images without segmentation guidance. By leveraging transfer learning to generate informative segmentation masks, this framework successfully mitigates the problem of data scarcity in thyroid ultrasound imaging.
Download

Paper Nr: 100
Title:

HKAN-LogAD: Hierarchical Kolmogorov-Arnold Networks for Log Anomaly Detection

Authors:

Aristotelis Charalampous and Andreas Economides

Abstract: In this paper we present an extension of the Hierarchical Kolmogorov–Arnold Network (HKAN), dubbed the HKAN-LogAD methodology, tailored to large-scale software-intensive systems log anomaly detection. Our approach replaces deep Transformer stacks with a single shared HKAN encoder and a lightweight masked log modeling (MLM) head, trained via multi-output regression over one-hot event targets. Unlike Transformer-based log methods, which rely on backpropagation through deep attention layers, HKAN-LogAD learns the encoder in closed form and solves only small convex problems at the top, yielding substantial gains in performance and controllability. More specifically, on the HDFS benchmark, HKAN-LogAD improves F-1 score by 2.59 percentage points and recall by 4.37 percentage points over LogBERT. The additive, basis-function structure of HKAN and the head-wise anomaly scores together provide a favorable accuracy–transparency trade-off for operators seeking interpretable and robust log anomaly detection.
Download

Paper Nr: 105
Title:

Benchmarking Building Energy Performance Using AI and Customer Data

Authors:

Ahmed Mabrouk and Mihir Sarkar

Abstract: In 2022, France’s building sector accounted for 44% of national energy consumption and emitted over 125,000 tonnes of CO2, positioning it as a critical focus for climate action. Prioritizing this sector is essential not only to reduce greenhouse gas emissions and support the transition to a zero-carbon economy, but also to alleviate household energy expenses. However, addressing such a challenge is not straightforward and often leads to several difficulties, including the need for detailed and accurate data (such as energy consumption, building typology, construction year, etc.) and the availability of a representative data sample across a given territory. In practice, available datasets are frequently biased, incomplete, or contain inaccurate measurements, which undermines the reliability of analyses. In this paper, we propose an automated and comprehensive approach to facilitate the generation and analysis of building energy profiles, a task traditionally carried out manually by operators. This method leverages energy consumption data and artificial intelligence techniques to make the analysis more efficient and insightful. The effectiveness and advantages of this approach are demonstrated through quantitative experiments.
Download

Paper Nr: 106
Title:

Deep Learning-Based Early Detection of Breast Cancer in Mammography Images: Impact of Image Enhancement Techniques

Authors:

Chaima Athimni, Mohamed Amine Mezghich, Seif Eddine Amara and Slim Mhiri

Abstract: Breast cancer detection through mammography remains challenging due to low image contrast, noise, and subtle lesion characteristics. While convolutional neural networks have demonstrated strong potential for automated detection, their performance depends critically on preprocessing strategies-an aspect insufficiently explored in comparative studies. This work systematically evaluates Negative Transform and Adaptive Histogram Equalization across four CNN architectures (ResNet50, DenseNet121, InceptionV3, ConvNeXtBase) on CBIS-DDSM and external Mini-DDSM datasets. Negative Transform consistently outperforms alternatives, with ResNet50 achieving 98.56% accuracy on CBIS-DDSM and maintaining 94.83% on Mini-DDSM, demonstrating robust cross-dataset generalization. Results highlight preprocessing as a critical factor for clinically deployable computer-aided detection systems, with Negative Transform emerging as the superior approach for mammography analysis.
Download

Paper Nr: 110
Title:

Hybrid CNN–GNN Model for Deepfake Image Forensics

Authors:

Hanadi Elsablaoui, Mohamed Amine Mezghich, Ridha Ghayoula and Lasaad Latrach

Abstract: In legal and forensic contexts, the increasing use of digital images raises serious concerns regarding image manipulation and deepfake generation, which threaten the integrity and admissibility of visual evidence in judicial procedures. This work proposes a hybrid deep learning framework for deepfake image detection that combines convolutional neural networks (CNNs) and graph neural networks (GNNs) within a transfer learning strategy. Pre-trained CNN backbones (DenseNet121, VGG16, and ResNet50) are fine-tuned to extract discriminative features, which are subsequently used to construct dynamic k-nearest neighbor graphs processed by a GraphSAGE-based GNN. Experimental results demonstrate that the proposed CNN+GNN architecture significantly outperforms CNN-only baselines. In particular, the DenseNet121-based hybrid model achieves an accuracy of 93.25% and an F1-score of 92.78%, highlighting the effectiveness of graph-based relational learning in enhancing deepfake detection robustness for forensic applications.
Download
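The dynamic k-nearest-neighbor graph construction over CNN features can be illustrated with scikit-learn. This sketch covers only the graph-building step on random stand-in embeddings; the GraphSAGE GNN that processes the graph is omitted:

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(0)
feats = rng.normal(size=(32, 128))   # stand-in for fine-tuned CNN embeddings

# Sparse adjacency: each image is connected to its k most similar neighbours,
# giving the structure a GraphSAGE-style GNN would message-pass over.
adj = kneighbors_graph(feats, n_neighbors=5, mode="connectivity",
                       include_self=False)
rows, cols = adj.nonzero()
edge_index = np.vstack([rows, cols])  # COO edge list, shape (2, 32 * 5)
```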

Paper Nr: 115
Title:

Is Hierarchical Quantization Essential for Optimal Reconstruction?

Authors:

Shirin Reyhanian and Laurenz Wiskott

Abstract: Vector-Quantized variational autoencoders (VQ-VAEs) are central to models that rely on high reconstruction fidelity, from neural compression to generative pipelines. Hierarchical extensions, such as VQ-VAE2, are often credited with superior reconstruction performance because they split global and local features across multiple latent levels. However, since higher-level latents derive all their information from lower levels, they should not carry additional reconstructive content beyond what the lower level already encodes. Combined with recent advances in training objectives and quantization mechanisms, this leads us to ask whether a single-level VQ-VAE, with matched representational budget and no codebook collapse, can equal the reconstruction fidelity of its hierarchical counterpart. Although the multi-scale structure of hierarchical models may improve perceptual quality in downstream tasks, the effect of hierarchy on reconstruction accuracy, isolated from codebook utilization and overall representational capacity, remains empirically underexamined. We revisit this question by comparing a two-level hierarchical VQ-VAE and a capacity-matched single-level model on high-resolution ImageNet images. Consistent with prior observations, we confirm that inadequate codebook utilization limits single-level VQ-VAEs and that overly high-dimensional embeddings destabilize quantization and worsen codebook collapse. We show that lightweight interventions such as initialization from data, periodic reset of inactive code vectors, and systematic tuning of codebook size and dimension significantly reduce collapse and enable the single-level model to make effective use of its available capacity.
Our results demonstrate that when representational budgets are matched, and codebook collapse is mitigated, single-level VQ-VAEs can match the reconstruction fidelity of hierarchical variants, challenging the assumption that hierarchical quantization is inherently superior for high-quality reconstructions. The code for reproducing our experiments is available at https://github.com/wiskott-lab/single-vs-hier-recon.
Download
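One of the lightweight interventions mentioned, periodic reset of inactive code vectors, can be sketched in NumPy. The reset policy below (resample dead codes from the current batch of encoder outputs) is an illustrative assumption, not the authors' exact procedure:

```python
import numpy as np

def reset_dead_codes(codebook, usage_counts, batch, rng, min_uses=1):
    """Reinitialize code vectors used fewer than `min_uses` times by
    sampling replacement vectors from a batch of encoder outputs."""
    dead = np.flatnonzero(usage_counts < min_uses)
    if dead.size:
        replacements = batch[rng.integers(0, len(batch), size=dead.size)]
        codebook[dead] = replacements
        usage_counts[dead] = 0
    return codebook, dead

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))          # 8 codes of dimension 4
usage = np.array([9, 0, 5, 0, 3, 7, 0, 2])  # per-code usage since last reset
batch = rng.normal(size=(64, 4))            # stand-in encoder outputs
codebook, dead = reset_dead_codes(codebook, usage, batch, rng)
```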

Paper Nr: 12
Title:

A Machine Learning Approach Using Logistic Regression to Analyze and Predict Online Consumer Behavior in E-Commerce

Authors:

Zahra Ali, Abdul Ahad, Hina Tufail, Abdul Hannan, Paulo Jorge Coelho and Ivan Miguel Pires

Abstract: Machine learning is a powerful data analysis tool. Yet, even with abundant big data, predicting online consumer behavior remains challenging. This research analyzes how customers act on e-commerce sites. We aim to predict whether visitors will return and make purchases. We use a logistic regression model with the Google Merchandise Store dataset, which contains user session data. While logistic regression is our main tool, we also test other techniques to improve performance. Our model predicts purchase likelihood using data that is often unavailable in physical stores. By analyzing past behavior, we gain insights into customer attitudes toward online shopping. The study’s goal is to compare predictive models and find which best classifies buying versus non-buying behaviors.
Download
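A purchase-prediction pipeline of the kind described can be sketched with scikit-learn's LogisticRegression. The session features and labeling rule below are synthetic stand-ins for the Google Merchandise Store data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 1000
# Toy session features: pageviews, time on site (s), bounced, returning visitor.
X = np.column_stack([
    rng.poisson(5, n), rng.exponential(120, n),
    rng.integers(0, 2, n), rng.integers(0, 2, n),
]).astype(float)
# Synthetic rule: purchases loosely driven by engagement (for the sketch only).
y = ((X[:, 0] > 5) & (X[:, 2] == 0)).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X_tr, y_tr)
purchase_prob = model.predict_proba(X_te)[:, 1]  # per-visitor purchase likelihood
```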

Paper Nr: 17
Title:

Benchmarking YOLO for Multi-Tissue Fetal Brain MRI Segmentation

Authors:

Dorsaf Sebai and Manel Zouaoui

Abstract: Segmentation of fetal brain tissues from Magnetic Resonance Imaging (MRI) is a key step for assessing in-utero neurodevelopment. While the FeTA benchmark defines the task across seven tissue classes, most leading approaches rely on U-Net variants and 3D architectures, which require heavy preprocessing, volumetric reconstruction, and high computational resources. Such requirements limit their deployment in time-constrained or resource-constrained environments. In this study, we explore YOLOv8-Seg, a lightweight segmentation framework, as an alternative for multi-tissue fetal brain MRI segmentation. We evaluate its performance on the FeTA dataset against U-Net and nnU-Net baselines, focusing on both segmentation accuracy and efficiency. Results demonstrate that YOLOv8-Seg delivers competitive accuracy on major brain structures while offering substantially faster inference, highlighting its potential for real-time applications.
Download

Paper Nr: 31
Title:

Greedy Control Group Selection for Multi-Explanatory Multi-Output Regression Problem

Authors:

Gábor Szűcs, Marcell Németh and Richárd Kiss

Abstract: The problem of multi-output learning involves the simultaneous prediction of multiple outputs based on given inputs. This paper focuses on addressing this challenge, assuming that we can only monitor a subset of variables, which constitutes the control group. This resource constraint led to the definition of a new kind of problem, which we call the Multi-Explanatory Multi-Output Regression (MEMOR) task. The goal of MEMOR is to select explanatory variables that minimize the prediction error for the target group, i.e., the set of output variables. The central question pertains to the optimal choice of a given number of variables to maximize the goodness of the regression. We propose two greedy approaches for identifying good explanatory variables, along with a linear approximation as a baseline. To evaluate the performance of the proposed algorithms, we compared the resulting explanatory variables with the optimal set obtained through exhaustive search. Our greedy algorithms surpass the linear method with better regression results, while being faster than exhaustive search. Both the MEMOR problem and the methods developed for it are well suited to multi-dimensional data analysis under resource constraints.
Download
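A greedy forward selection of explanatory variables, in the spirit of the MEMOR approaches described (though not the authors' exact algorithms), can be sketched as follows:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def greedy_control_group(X, Y, k):
    """Greedily pick k explanatory columns of X that minimize the mean
    squared error of a linear multi-output regression onto targets Y."""
    chosen, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        def mse_with(j):
            cols = chosen + [j]
            pred = LinearRegression().fit(X[:, cols], Y).predict(X[:, cols])
            return np.mean((pred - Y) ** 2)
        best = min(remaining, key=mse_with)
        chosen.append(best)
        remaining.remove(best)
    return chosen

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
# Targets depend only on columns 0, 3, and 5.
Y = np.column_stack([2 * X[:, 0] + X[:, 3], X[:, 3] - X[:, 5]])
selected = greedy_control_group(X, Y, k=3)
```

On this noiseless toy problem, the greedy search recovers exactly the columns the targets depend on.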

Paper Nr: 67
Title:

SwapPF: Correcting HPE Left-Right Swaps for Gait Analysis Using Particle Filters

Authors:

Kosuke Aoyagi, Miho Adachi, Hiroyuki Yomo and Ryusuke Miyamoto

Abstract: The analysis of gait plays a particularly important role in personal identification in public environments. Gait is represented by the relative positions of joints, which are estimated from visible-light camera images using a deep learning-based Human Pose Estimation (HPE) model. With HPE, problems such as left-right swapping and temporal noise degrade the performance of gait-based personal identification and make estimation of the gait cycle unstable. In this study, a novel motion-model-based particle filter is proposed that removes noise from the inference results of the HPE model for gait analysis. Specifically, the HPE results are corrected using a model that exploits the left-right symmetry and periodicity of walking. Compared to using only ViTPose, a state-of-the-art model on COCO, applying our filtering method improved the overall accuracy of personal identification by approximately 3-6%.
Download
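A motion-model-based correction of left-right swaps can be illustrated with a constant-velocity predictor that tests whether swapping two detections fits the predictions better. This simple heuristic stands in for the paper's particle filter, which additionally models walking symmetry and periodicity:

```python
import numpy as np

def fix_lr_swaps(prev_left, prev_right, vel_left, vel_right,
                 det_left, det_right):
    """Predict each joint with a constant-velocity model; if swapping the
    two detections matches the predictions better, undo the swap."""
    pred_l = prev_left + vel_left
    pred_r = prev_right + vel_right
    keep = np.linalg.norm(det_left - pred_l) + np.linalg.norm(det_right - pred_r)
    swap = np.linalg.norm(det_right - pred_l) + np.linalg.norm(det_left - pred_r)
    if swap < keep:
        det_left, det_right = det_right, det_left
    return det_left, det_right

# Left ankle moving right at 5 px/frame, right ankle static; HPE output swapped.
l, r = fix_lr_swaps(np.array([100., 200.]), np.array([140., 200.]),
                    np.array([5., 0.]), np.array([0., 0.]),
                    np.array([140., 200.]), np.array([105., 200.]))
```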

Paper Nr: 83
Title:

Evaluating Generalizability of Convolutional Deep Learning Models for Plant Disease Classification

Authors:

Ludovico Boratto, Gianni Fenu, Francesca Maridina Malloci, Mirko Marras and Marco Tocco

Abstract: Plant disease classification using deep learning has shown promising results on controlled datasets, yet practical deployment in agricultural settings remains challenging due to performance degradation under real world conditions. In this paper, we investigate the extent to which established convolutional neural network architectures (VGG16, ResNet50, DenseNet121, EfficientNetB0 and MobileNetV2) can provide stable and robust performance across both laboratory and field acquisition conditions, using five publicly available datasets (BRACOL, PlantVillage, RoCoLe, DiaMOSPlant and Plant Pathology). Our findings reveal that deep architectures with residual or dense connectivity (ResNet50, DenseNet121, VGG16) maintain stable performance across diverse imaging contexts, achieving F1-scores above 0.93 with negligible degradation when transitioning from laboratory to field settings. In contrast, lightweight architectures designed for computational efficiency experience substantial performance drops of 6 to 10 percentage points under field conditions, indicating reduced robustness to environmental variability.
Download

Paper Nr: 99
Title:

A High-Precision Hybrid Intelligence Framework for Diabetic Retinopathy Grading

Authors:

Nader Belhadj, Mohamed Amine Mezghich, Jaouher Fattahi, Ridha Ghayoula and Lassaad Latrach

Abstract: Diabetic Retinopathy (DR) remains one of the leading causes of preventable blindness worldwide, highlighting the need for screening systems that are accurate, interpretable, and computationally efficient. While deep learning models currently dominate DR analysis, their dependence on extensive annotated datasets and high-end hardware limits their deployment in real-world clinical environments. This paper introduces a compact hybrid-intelligence framework that integrates multiscale handcrafted descriptors-spanning vascular enhancement, fractal geometry, frequency-domain cues, wavelet representations, and log-Euclidean covariance embeddings-into an optimized LightGBM classifier tailored for ordinal DR grading. Evaluated on the Messidor-2 benchmark, the proposed method achieves strong discriminative performance while remaining lightweight, transparent, and training-efficient. These findings demonstrate that a carefully engineered fusion of domain-driven visual cues can serve as a robust and interpretable alternative to deep neural architectures, particularly in resource-constrained DR screening scenarios.
Download

Paper Nr: 108
Title:

Enhancing VGG16 with Analytic Fourier–Mellin Invariants

Authors:

Chaima Dhaouadi, Mohamed Amine Mezghich and Faouzi Ghorbel

Abstract: Medical image classification is sensitive to acquisition-induced geometric variability (rotation, scale, translation). We propose a hybrid framework that injects Analytical Fourier–Mellin Transform (AFMT) similarity-invariant descriptors into a VGG16 backbone, using compact amplitude–phase representations as complementary cues to convolutional features. Three integration strategies are studied at intermediate layers: (i) intermediate fusion by feature concatenation, (ii) FiLM-based conditioning that applies feature-wise affine modulation driven by the invariant descriptor, and (iii) amplitude-guided attention that learns adaptive weights to regulate invariant contributions during aggregation. The proposed framework enables a structured comparison of fusion, modulation, and attention mechanisms for invariant-guided medical image classification.
Download
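The FiLM-based conditioning of strategy (ii) applies a per-channel affine modulation predicted from the invariant descriptor. A minimal NumPy sketch, with random weights standing in for the learned projection and a random vector standing in for the AFMT descriptor:

```python
import numpy as np

def film(feature_map, descriptor, W_gamma, W_beta):
    """Feature-wise linear modulation: per-channel scale and shift are
    predicted from the descriptor and applied to the feature map."""
    gamma = descriptor @ W_gamma   # (C,) scale per channel
    beta = descriptor @ W_beta     # (C,) shift per channel
    return feature_map * gamma[:, None, None] + beta[:, None, None]

rng = np.random.default_rng(0)
C, H, W, D = 16, 8, 8, 10
x = rng.normal(size=(C, H, W))    # intermediate CNN feature map
d = rng.normal(size=D)            # stand-in invariant descriptor
out = film(x, d, rng.normal(size=(D, C)), rng.normal(size=(D, C)))
```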

Paper Nr: 109
Title:

Enterprise Resource Planning Using Multi-Type Transformers in Ferro-Titanium Industry

Authors:

Samira Yazdanpourmoghadam, Mahan Balal Pour and Vahid Partovi Nia

Abstract: Combinatorial optimization problems such as the Job-Shop Scheduling Problem (JSP) and Knapsack Problem (KP) are fundamental challenges in operations research, logistics, and enterprise resource planning (ERP). These problems often require sophisticated algorithms to achieve near-optimal solutions within practical time constraints. Recent advances in deep learning have introduced transformer-based architectures as promising alternatives to traditional heuristics and metaheuristics. We leverage the Multi-Type Transformer (MTT) architecture to address these benchmarks in a unified framework. We present an extensive experimental evaluation across standard benchmark datasets for JSP and KP, demonstrating that MTT achieves competitive performance on different sizes of these benchmark problems. We showcase the potential of multi-type attention on a real application in the Ferro-Titanium industry. To the best of our knowledge, we are the first to apply multi-type transformers in real manufacturing.

Paper Nr: 112
Title:

Facial Emotion Recognition: A Comparative Study with Cross-Corpus and Multi-Corpus Training

Authors:

Sofia Condesso, Artur Ferreira and Nuno Leite

Abstract: Emotion Recognition (ER) is crucial in Human-Computer Interaction (HCI), with applications ranging from mental health support and educational support to adaptive learning and customer feedback. For many of these applications, the analysis of facial expressions, which reveal the emotions a person is experiencing, is very useful. With the proliferation of smartphones, cameras, and other devices, it is common to take photos or record videos in many situations of our daily lives; thus, on many occasions we are able to take a picture of a person's face. In this paper, we conduct an evaluation of deep learning-based Facial Emotion Recognition (FER) techniques. We perform experiments on four benchmark datasets (namely CK+, FER2013, RAF-DB, and AffectNet) and confirm the influence of data quality, color information, and dataset bias on model performance. Our study evaluates single-corpus, cross-corpus, and multi-corpus training of deep learning models. We find that multi-corpus training combining FER2013, CK+, and RAF-DB is the best-performing approach, being able to deal with the variability and dynamics of face image acquisition.
Download

Paper Nr: 116
Title:

Explainable AI for Time Series: An Empirical Comparative Study

Authors:

Faryal Siddique, Faizan Ahmed and Maurice van Keulen

Abstract: Time series is a data modality often associated with critical application domains such as healthcare and predictive maintenance. Due to the criticality of these applications, explaining machine learning models becomes a legal requirement. This paper presents an empirical comparative study of representative Explainable AI (XAI) methods for time-series analysis. Experiments evaluate multiple XAI techniques using two datasets (one univariate and one multivariate) across three traditional machine learning models (Random Forest, Support Vector Machine, and XGBoost) and one deep learning model (a Long Short-Term Memory (LSTM) network). The study highlights qualitative trade-offs across the data modality, explanation granularity (global, local, and temporal), and model dependence, and discusses limitations of popular attribution methods in capturing long-term dependencies in sequential models.
Download

Area 2 - Applications

Full Papers
Paper Nr: 22
Title:

Revisiting Person Re-ID: ConvNeXt with AIBN and TNorm in IICS/IIDS Frameworks

Authors:

Faisal Z. Qureshi and Roya Dehghani

Abstract: This paper investigates the integration of ConvNeXt-a convolutional architecture inspired by vision transformers-into the Intra- and Inter-Camera Similarity (IICS) and Intra- and Inter-Domain Similarity (IIDS) frameworks for unsupervised person re-identification (Re-ID). These frameworks follow a two-stage process that first generates pseudo labels by modeling both intra-camera and inter-camera relationships. These pseudo labels are then used to train feature encoders that learn identity representations consistent across multiple cameras. We improve upon this scheme by replacing the ResNet backbone with ConvNeXt, which combines modern design principles with the efficiency of CNNs to achieve state-of-the-art performance in image recognition tasks. Additionally, we introduce two normalization techniques: (1) Adaptive Instance-Batch Normalization (AIBN) and (2) Transform Normalization (TNorm). Extensive ablation studies demonstrate that applying AIBN in the final ConvNeXt stages (Stages 3 and 4), and inserting TNorm after Stages 1 through 3, leads to significant performance improvements. We also analyze four ConvNeXt variants within the IICS/IIDS framework and demonstrate that larger ConvNeXt models consistently yield better performance. Experimental results on the Market1501, DukeMTMC-reID, and MSMT17 benchmarks show that our method achieves state-of-the-art performance among unsupervised person Re-ID approaches in terms of mean Average Precision (mAP), underscoring the potential of ConvNeXt-based architectures for scalable, label-free re-identification.
Download

Paper Nr: 28
Title:

Beyond Obstacle Avoidance: A Multimodal Goal-Based Wearable Navigator for the Visually Impaired

Authors:

Aditya Bangde, Benjamin Klein and Sanchita Ghose

Abstract: Safe and independent navigation remains a major challenge for visually impaired individuals, especially in unfamiliar environments. While existing assistive devices, e.g., ultrasonic canes and smart glasses, can detect obstacles, they are not designed to help users reach specific goals, such as an exit door. As a result, users may avoid obstacles but still move in the wrong direction. In this research, we propose a multimodal target-aware wearable system that integrates obstacle avoidance with goal-directed guidance for people with visual impairments. The proposed system features a lightweight image processing algorithm that separately detects surfaces and obstacles, enabling a better understanding of the environment. Users can define targets using voice commands, and the system uses object detection to guide them accordingly. The system provides multisensor feedback: a depth camera supports obstacle detection; an inertial measurement unit (IMU) assesses surface conditions (e.g., slipperiness); and a servo-driven haptic actuator delivers directional steering. Additionally, we present a cloud-based image recognition module allowing users to request scene descriptions through voice. Through experiments we show that the system reduces collision incidents by approximately 50% and reliably detects slippery surfaces, highlighting multimodal sensor fusion for practical, goal-oriented mobility assistance.
Download

Paper Nr: 34
Title:

GNNs for Time Series Anomaly Detection: An Open-Source Framework and a Critical Evaluation

Authors:

Federico Bello, Gonzalo Chiarlone, Marcelo Fiori, Gastón García González and Federico Larroca

Abstract: There is growing interest in applying graph-based methods to Time Series Anomaly Detection (TSAD), particularly Graph Neural Networks (GNNs), as they naturally model dependencies among multivariate signals. GNNs are typically used as backbones in score-based TSAD pipelines, where anomalies are identified through reconstruction or prediction errors followed by thresholding. However, and despite promising results, the field still lacks standardized frameworks for evaluation and suffers from persistent issues with metric design and interpretation. We thus present an open-source framework for TSAD using GNNs, designed to support reproducible experimentation across datasets, graph structures, and evaluation strategies. Built with flexibility and extensibility in mind, the framework facilitates systematic comparisons between TSAD models and enables in-depth analysis of performance and interpretability. Using this tool, we evaluate several GNN-based architectures alongside baseline models across two real-world datasets with contrasting structural characteristics. Our results show that GNNs not only improve detection performance but also offer significant gains in interpretability, an especially valuable feature for practical diagnosis. We also find that attention-based GNNs offer robustness when graph structure is uncertain or inferred. In addition, we reflect on common evaluation practices in TSAD, showing how certain metrics and thresholding strategies can obscure meaningful comparisons. Overall, this work contributes both practical tools and critical insights to advance the development and evaluation of graph-based TSAD systems.
Download

Paper Nr: 47
Title:

A Generalized System for Offline Signature Verification Using FSNet and Siamese Networks

Authors:

Jumana Y. Mostafa and Ahmed El-Rafei

Abstract: Even with today’s technologies, offline signature verification remains a challenging task due to the high variability of handwriting styles and the widely differing signature distributions across languages. In this paper, we propose an FSNet-based Siamese network architecture with a Support Vector Machine (SVM) classifier for robust offline writer-independent signature verification. To mitigate performance degradation in cross-dataset evaluation, we propose three solutions for better generalization depending on application needs: fine-tuning the SVM on a few target samples, training the full pipeline on a combined dataset, and using an ensemble feature fusion approach that fuses multiple dataset-specific feature extractors. Among these, the ensemble feature fusion method yielded the most consistent verification performance across varied signature distributions from different datasets, with accuracies of 99.34%, 97.20%, 95.57%, and 98.14% on the CEDAR, Persian, Bengali, and Hindi datasets, respectively, surpassing results of previous works.
Download

Paper Nr: 53
Title:

Real-Time Unsupervised Anomaly Detection in SQL Performance Data via Semantic Query Embeddings, Incremental Clustering, and Random Cut Forest

Authors:

Alaeddine Moussa and Wassim Dhib

Abstract: We propose a novel unsupervised real-time anomaly detection pipeline for SQL performance data. Our approach marries semantic enrichment and embedding of SQL queries using the Universal Sentence Encoder (USE) to learn high-dimensional semantic representations of query patterns. The ensuing embeddings serve as input to an incremental clustering phase, which groups semantically related queries in real time and allows the system to learn on the fly as new query types emerge. Within every group of queries, a series of collective performance indicators is monitored, and anomalous instances are detected using the Random Cut Forest (RCF) algorithm. This three-stage pipeline of semantic query embedding, incremental clustering, and robust anomaly scoring is the first of its kind to handle the heterogeneous, mixed-type nature of SQL workload data. We evaluate the method on a synthetic labeled performance anomaly test set, where it successfully identifies injected performance anomalies, and on a real-world unlabeled SQL performance trace set, demonstrating that the approach can detect unusual performance behavior in an unsupervised manner. The experiments indicate that our pipeline detects anomalies with limited false alarms, and its real-time, unsupervised nature points to substantial practical value for Database Management System (DBMS) monitoring and self-healing systems.
Download

Paper Nr: 77
Title:

Robust Cooperative Localization for Grazing Cattle via Factor Graph Optimization: Jointly Optimizing Data-Driven Dead Reckoning and RSSI Factors

Authors:

Ryoga Tamasaki, Kenji Oyama and Takenao Ohkawa

Abstract: Efficient management of grazing cattle requires accurate, low-power localization. While GPS is the de facto standard, its power consumption is too high for long-term neck-mounted devices. Inertial Measurement Unit (IMU)–based dead reckoning (DR) offers a low-power alternative but suffers from drift. In outdoor pastures, RSSI measurements from BLE tags follow a non-monotonic Two-Ray Ground-Reflection model, rendering RSSI-only trilateration unstable. We propose a cooperative localization framework based on factor graph optimization (FGO) that fuses IMU and RSSI data. First, S2M-LGBM (Static-to-Moving LightGBM), a gradient-boosting DR model, jointly classifies motion states and regresses relative displacement from time–frequency IMU features, applying Zero-Velocity Updates (ZUPT) during static periods to suppress drift. It outperforms deep learning baselines under limited training data. Second, the proposed FGO framework combines the DR factor from S2M-LGBM with RSSI factors for gateway-to-cattle (G2C) and cattle-to-cattle (C2C) links, adaptively scaling C2C weights based on position uncertainty to mitigate error propagation. Experiments on real grazing-cattle data demonstrate that S2M-LGBM surpasses an LSTM-based DR baseline and that the FGO framework significantly reduces the 90th-percentile error, enhancing robustness against catastrophic drift. Simulations further confirm that cooperative fusion via C2C links reduces median RMSE by up to 66% compared with non-cooperative baselines.
Download

Paper Nr: 102
Title:

Knowledge Distillation for Lightweight Models in Wildfire Segmentation

Authors:

Rafael M. Mamede, Leonardo M. Ferreira, Mansur Mustafin, Eduarda Caldeira, Hélder P. Oliveira, Jaime S. Cardoso and Ana F. Sequeira

Abstract: Code smells impair software maintainability, yet their joint occurrence and temporal behaviour remain insufficiently understood. This study investigates co-occurrence patterns and evolutionary trends among twenty-one code smell types across seventeen open-source Java projects. Using association rule mining on SonarQube detections, supported by an external Data Clumps dataset, the analysis identifies recurrent smell combinations and probabilistic dependencies. Established relations such as Data Clumps → Long Parameter List are confirmed, while new strong associations emerge, including Dead Code ↔ Uncommunicative Name. A temporal correlation analysis shows that smell introductions consistently exceed removals, resulting in a monotonic accumulation of smells over project lifecycles. The findings provide empirical evidence of structured interdependencies among code smells and highlight characteristic growth patterns that can inform predictive refactoring and maintainability assessment.
Download

Paper Nr: 103
Title:

Siamese Network-Based Handwritten Pattern Similarity for Few-Shot Automatic Scoring of Very Short Answers

Authors:

Nam Tuan Ly, Hung Tuan Nguyen and Masaki Nakagawa

Abstract: Automatic scoring of handwritten answers is an important technique in computer-aided education. Previous studies have primarily employed OCR-based methods for this task. However, OCR systems typically rely on predefined dictionaries, which makes them less effective when handling out-of-vocabulary characters or non-character symbols that resemble dictionary entries. To address this limitation, the paper proposes a Siamese network-based automatic scoring method to improve the automatic scoring of single/few-character answers. The proposed method consists of two main components: Siamese Network-based Handwritten Pattern Similarity and Similarity-based Automatic Scoring Algorithm. The Siamese network takes two answer patterns as input and outputs the similarity between them. The similarity-based automatic scoring algorithm scores an answer as correct, incorrect, or rejected based on its similarity. We also propose a method for sampling training data to train the Siamese network. We conducted experiments on a collection of handwritten answers from elementary school students, consisting of 98,547 Japanese and 15,896 English answers. The extensive experiments demonstrate the superiority of the proposed method over the previous methods. The evaluation experiments also verify the generalization of our method and its effectiveness for few-shot automatic scoring, useful for low-resource settings.
Download

Short Papers
Paper Nr: 29
Title:

Time-Constrained Recommendations: Reinforcement Learning Strategies for E-Commerce

Authors:

Sayak Chakrabarty and Souradip Pal

Abstract: Unlike traditional recommendation tasks, finite user time budgets introduce a critical resource constraint, requiring the recommender system to balance item relevance and evaluation cost. For example, in a mobile shopping interface, users interact with recommendations by scrolling, where each scroll triggers a list of items called a slate. Users incur an evaluation cost: time spent assessing item features before deciding to click. Highly relevant items with higher evaluation costs may not fit within the user’s time budget, affecting engagement. In this position paper, our objective is to evaluate reinforcement learning algorithms that learn patterns in user preferences and time budgets simultaneously, crafting recommendations with higher engagement potential under resource constraints. Our experiments explore the use of reinforcement learning to recommend items to users, using Alibaba’s Personalized Re-ranking dataset to support slate optimization in e-commerce contexts. Our contributions include (i) a unified formulation of time-constrained slate recommendation modeled as Markov Decision Processes (MDPs) with budget-aware utilities; (ii) a simulation framework to study policy behavior on re-ranking data; and (iii) empirical evidence that on-policy and off-policy control can outperform traditional contextual-bandit methods under tight time budgets.
Download

Paper Nr: 30
Title:

Machine Learning for Hydrogen-Fired Gas Turbine Combustion Monitoring

Authors:

Roland Unterberger, Martin Winter, Werner Bailer, Herwig Zeiner, Fabrice Giuliani, Andrea Hofer and Alexander Schricker

Abstract: This work explores the application of machine learning techniques for monitoring the stability of combustion processes in hydrogen (H2)-fuelled gas turbines. We describe an approach using a sophisticated sensor network for generating live data in a recently developed H2 burner, optimized by computational fluid dynamics (CFD) simulations. The method is based solely on pressure data measured inside the burner and very close to the flame. A key innovation of the approach is the use of an autoencoder (AE) to create a generic compact representation of the combustion monitoring data for the classification of basic burner operation states. The AE’s latent space vectors (representing the compressed state of the burner) are then used to train a regression pipeline estimating the remaining time until a critical abnormal event (for instance a flashback) happens. This time-to-failure estimation is crucial for initiating timely countermeasures and avoiding critical states leading to hazards that could destroy the burner or even the entire turbine system. The proposed approach not only advances the field of combustion monitoring, but also sets a foundation for improved performance and safety of operation.
Download

Paper Nr: 41
Title:

Error-Guided Distillation for Real-Time 2D LiDAR People Detection

Authors:

Avesalon Razvan Marian and Miron Radu

Abstract: Knowledge distillation (KD) has become a leading approach for model compression. For 2D range sensing (e.g., LiDAR, depth), it enables deployment on resource-constrained hardware by training a compact student to emulate a larger teacher’s predictions and internal representations. This can be done with soft-label supervision, feature regression, or relation/contrastive objectives. In 2D LiDAR applications, distillation typically cuts inference time and memory while preserving accuracy across tasks such as object detection, semantic segmentation, and Simultaneous Localization and Mapping (SLAM). For person detection specifically, it supports real-time operation on mobile robots, surveillance systems, and edge devices. We show that an error-weighted distillation scheme, which prioritizes hard examples, allows a small student to learn representations that are more robust than its teacher’s. Our final model achieves real-time speeds on edge hardware and preserves the accuracy of state-of-the-art models.

Paper Nr: 42
Title:

Toward Intelligent E-Learning: Leveraging Multimodal Biometrics and Real-Time Feedback

Authors:

Christine Bukola Asaju and Hima Vadapalli

Abstract: E-learning has undergone a significant transformation over the years. An example of such an advancement is the integration of biometric systems to assess student learning outcomes. Traditionally, these systems rely on a single modality to collect information, typically facial expressions or textual feedback from students. However, this approach often fails to provide a complete picture of the student's learning state. This study proposes a multimodal biometric system that combines facial emotion analysis with voice-to-text input to better estimate student learning affect. The system includes facial emotion classification and voice-based narrations by the student related to their understanding, which are converted to text for further analysis. A CNN-BiLSTM cascade is utilized for the facial emotion classification module, followed by a fine-tuned BERT-based natural language understanding module that takes in the textual summaries extracted from speech-based feedback provided by the students, along with text-based questions, to estimate the learning affect experienced by the students. The facial emotion classification module reported a 92% accuracy on 2,274 samples from the DISFA+ dataset, followed by the BERT-based natural language understanding and response generation model that reported an average F1-score of 66% for six question categories. The final estimate of the learning affect is then obtained using a rule-based decision level fusion. The proposed multimodal system is expected to enhance virtual learning environments by helping teachers better understand and respond to students' levels of comprehension during online teaching and learning.
Download

Paper Nr: 45
Title:

Leveraging Self Supervised Learning for Non-Technical Loss Detection

Authors:

Adrián Nicolás Cardozo, Camilo Mariño and Alvaro Gómez

Abstract: Given the increasing availability of data across multiple domains and the difficulty of generating large labeled datasets, self-supervised learning (SSL) has gained significant interest in recent years. SSL enables the exploitation of vast amounts of unlabeled data by learning useful representations from the data itself. These representations can then be transferred and fine-tuned on small labeled datasets for different downstream tasks. In the field of electric power distribution, non-technical losses (NTL), and in particular those arising from irregularities in measured consumption, are a major concern. Supervised machine learning in this case relies on labels gathered through on-site inspections. However, with the deployment of Advanced Metering Infrastructure (AMI) by power utilities, there is a large volume of unlabeled data that can be exploited by SSL techniques to enhance NTL detection. This article presents the application of SSL algorithms to a large dataset of 15-minute-interval smart-metered data from over 1.6M points of service. Learned representations are used in the downstream task of irregularity classification.
Download

Paper Nr: 51
Title:

EGDF-Net: Edge-Guided Deblurring and Fusion Network for Underwater Image Object Detection

Authors:

E. Goutham, M. Srinivas and R. B. V. Subramanyam

Abstract: Detecting objects underwater is important in robotics, marine biology, and archaeology. However, underwater images are often degraded by the water itself, which scatters and absorbs light and distorts colors. These degradations reduce the efficiency of traditional object detection models, which are usually designed for clear, land-based images. To address this problem, we propose an Edge-Guided Deblurring and Fusion Network (EGDF-Net), a system that combines object detection with image enhancement. Our method first applies the proposed image enhancement module to improve the visual quality of the images. The detection model then receives the refined images to locate the objects. We tested EGDF-Net on the URPC2020 dataset, where it achieved 52.7% mAP50:90, outperforming detectors such as Prior Guided, YOLOv8, YOLOv12, and YSOOB. These results demonstrate that enhancing image quality before detection improves model accuracy, providing a robust solution for underwater object detection tasks.
Download

Paper Nr: 70
Title:

6D Pose Estimation of Pallet Bins for Autonomous Logistics Using a Synthetically Trained Object Detector and FoundationPose

Authors:

Yanming Wu and Eric Demeester

Abstract: Accurate 6D pose estimation of pallet bins is essential for enabling autonomous forklifts and mobile manipulators to operate safely and efficiently in warehouse and agricultural logistics. This work presents a modular perception paradigm that combines a synthetically trained 2D object detector with a transformer-based foundation model to achieve practical and scalable pallet bin pose estimation when CAD models are available at inference time. We generate a diverse synthetic dataset in NVIDIA Isaac Sim, fine-tune state-of-the-art detectors (YOLOv12 and RF-DETR), and identify RF-DETR-S as the most effective initialization module. FoundationPose is then used in a model-based configuration to refine and rank multiple pose hypotheses. To provide an initial evaluation of this pipeline, we collect a new RGB-D dataset of 284 frames featuring diverse distances, illumination settings, viewpoints, and visibility levels. Experimental results show that RF-DETR-S, trained solely on synthetic data, yields bounding boxes sufficiently accurate for FoundationPose to achieve pose accuracy close to that obtained using ground-truth segmentation masks. A visibility-based analysis reveals that pose accuracy remains stable when at least 25% of the object is visible, whereas severe occlusion or truncation is the primary source of failure. Depth has a moderate effect, with reliable performance maintained between 2 m and 7 m. Overall, our findings indicate that the proposed RF-DETR-S + FoundationPose pipeline is a promising direction for pallet bin perception in autonomous logistics. We further outline practical insights and considerations for deploying such perception systems in real forklift applications.
Download

Paper Nr: 78
Title:

DTSPL-BEV: Decomposable Tiled Soccer Player Localization

Authors:

Ivar Persson, Abolfazl Chaman Motlagh and Mikael Nilsson

Abstract: We present an extension of the SPL-BEV (Soccer Player Localization for Bird’s-Eye-View) method, introducing Decomposable Tiled SPL-BEV (DTSPL-BEV), a variant designed to enable efficient inference on embedded devices. We present specialized tiling algorithms that enable inference on a given device with known memory limitations, while ensuring the same detection results as running inference on the entire image. The tiling algorithms optimise the tile configuration to minimise computational cost in terms of FLOPs, given memory constraints. We also introduce several potential network architecture changes that achieve performance on par with SPL-BEV while reducing the computational cost. The code is available at https://github.com/IvarPersson/SPL-BEV.
Download

Paper Nr: 81
Title:

Informative Trait Identification for Verticillium dahliae in Olive through Bootstrap-Based Inference

Authors:

José Ramón Torres-Martín, Laura Teresa Martínez-Marquina, José Manuel Velarde-Gestera, Juan Antonio Navas-Cortés, Miguel Román-Écija, Mihaela I. Chidean and Inmaculada Mora-Jiménez

Abstract: Verticillium dahliae (Vd) is a vascular fungal pathogen that severely affects olive trees. We compiled a dataset of 104 olive trees in Jaén (Spain), collected at five time points from November 2023 to July 2024 using a field spectrometer, a Dualex sensor, and a porometer/fluorometer. A non-parametric bootstrap-based test was applied to identify spectral indices and physiological traits that discriminate among healthy branches, branches with no visual symptoms, and branches with visual symptoms for each time point. Results show that several canopy-structure indices, pigment-related indices, xanthophyll-cycle indicators, and visible-band indices consistently distinguished branches with visual symptoms from the rest. Dualex-derived traits, such as chlorophyll and nitrogen balance indices, decreased markedly in branches with visual symptoms, indicating pigment degradation and impaired nitrogen status. Physiological variables, including stomatal and boundary conductance, transpiration, leaf water content, and fluorescence parameters, also exhibited strong changes consistent with vascular obstruction and reduced photosynthetic efficiency. Some variables even showed sensitivity to early stages before the appearance of visual symptoms. This study provides a detailed characterization of Vd-induced changes at branch scale and identifies robust spectral and physiological markers of infection. Future work will focus on leveraging these variables for early disease detection models.
Download

Paper Nr: 87
Title:

Auto K-Means with Cluster Labelling Using LLM: MDL-Driven HERCULES for Actionable Segmentation

Authors:

Ian K. T. Tan and Zhen Hao Wee

Abstract: The effectiveness of clustering in customer segmentation is fundamentally constrained by two persistent issues: the need to predefine k (the number of clusters) and the difficulty of translating abstract cluster groupings into actionable insights. The original HERCULES algorithm explicitly identified a key limitation: its reliance on heuristic metrics for determining k, which risks suboptimal segmentation. While it uses Large Language Models (LLMs) for interpretability, its use of unanalysed feature statistics in the LLM prompts compromises output consistency and analytical transparency. This paper proposes an Auto-k Strategy and an Enhanced Interpretability Framework. The first improvement integrates k∗-means, which is based on the Minimum Description Length (MDL) principle, to autonomously determine the statistically optimal number of clusters (k∗) at each step. The second improvement is a novel two-step LLM prompting methodology. This includes Feature-Driven Summarisation, which statistically selects defining features for label consistency, followed by Contextualised Labelling, which incorporates user-defined context and business goals to generate immediate and actionable descriptions. This unified Auto-k HERCULES approach handles mixed data, producing statistically sound and semantically coherent segments and directly addressing the core limitations of the original framework.
Download

Paper Nr: 92
Title:

Tactical Overlay Interpretation: A Pattern-Recognition Study of Compact VLMs

Authors:

Alexandre Godinho and Alvaro Figueira

Abstract: This paper benchmarks compact vision-language models (VLMs, < 20B parameters) on a safety-critical task: interpreting a military tactical overlay and drafting structured Courses of Action (COAs). We evaluate five models using three lenses: (i) qualitative rubric-based inspection (OCR, grounding, symbology, doctrinal form, decision utility), (ii) a blind questionnaire where trained military students rate anonymized model dossiers (A–E) on doctrinal criteria and overall adequacy, and (iii) a golden-template fidelity check covering axis, objective sequence, junction, reserve, and fires priority. Results show similar perceived adequacy across models and low agreement in ranking, indicating that subjective ratings weakly discriminate text–overlay alignment. In contrast, template checks reveal systematic omissions of staff-critical clauses and occasional sequence drift, exposing a gap between plausibility and doctrinal fidelity. Compact VLMs can support supervised COA drafting but require checklist validation for operational reliability.
Download

Paper Nr: 101
Title:

Graph Pattern Mining for Anomalous Business Partnerships

Authors:

Moritz Luecke, Héctor Allende-Cid and Stefan Rueping

Abstract: Graph-based anomaly detection has become a key approach in pattern recognition and data mining for identifying irregular structures in complex relational data. This paper introduces a framework for discovering non-obvious collaborations in organizational networks, addressing limitations of traditional partnership identification methods. We propose three complementary algorithms: Industry Ecosystem Anomaly Detection for cross-sector pattern discovery, Strategic Complementarity Surprise Measures for capability-based anomaly scoring, and Competitive Context Local Outlier Factor for context-aware detection. The main contribution lies in integrating strategic management theory, including Porter’s competitive forces, the resource-based view, and dynamic capabilities, directly into graph mining algorithm design. The framework is validated on a real-world dataset of 355 Norwegian public companies comprising 421 documented partnerships. Experimental results demonstrate that the approach successfully detects 36 cross-industry anomalies, with top-scoring partnerships achieving anomaly scores above 0.8 and LOF values exceeding 1.6. These findings illustrate the potential of combining graph pattern mining with domain-specific measures for anomaly detection in business networks.
Download

Paper Nr: 107
Title:

Zero-Shot Urban–Rural Classification of Satellite Imagery Using CLIP and Ensemble Prompt Engineering

Authors:

Houda Khmila

Abstract: Urban–Rural land-use classification from satellite imagery is a key task in remote sensing, yet conventional supervised approaches require large amounts of labeled data, limiting their scalability. Recent vision–language models such as CLIP enable zero-shot image classification by jointly modeling visual and textual representations, but their performance in remote sensing is affected by domain mismatch and prompt sensitivity. This paper proposes an ensemble-based zero-shot framework for urban–rural classification that integrates multiple CLIP inference strategies to improve robustness and reliability. The proposed approach combines a lightweight CLIP model, a larger-capacity variant, and a prompt-engineered semantic expansion module, whose outputs are fused through weighted ensemble learning. The framework is evaluated on a balanced subset of the EuroSAT dataset constructed from diverse land-cover categories. Experimental results demonstrate that the proposed method achieves an accuracy of 95.4% and an F1-score of 0.95, consistently outperforming individual CLIP-based baselines without requiring any labeled training data. These findings highlight the potential of ensemble vision–language models as scalable and annotation-free solutions for land-use classification in remote sensing applications.
Download

Paper Nr: 111
Title:

Clinically Constrained Evaluation of Monocular Markerless Gait Analysis in the Sagittal Plane

Authors:

Deni Kernjus and Marina Ivasic-Kos

Abstract: This study investigates monocular, markerless gait analysis in the sagittal plane under explicit clinical accuracy constraints. We propose a processing pipeline that operates on a single RGB video, applying two state-of-the-art monocular 3D pose estimators (RTMW3D and MeTRAbs) for whole-body reconstruction. Anatomical keypoints are mapped to virtual markers to produce OpenSim-compatible lower-limb kinematics. To reflect downstream biomechanical requirements, geometric accuracy is treated as the primary constraint, with performance evaluated using a conservative Procrustes-aligned mean per-joint position error (PA-MPJPE) metric for both joints and virtual markers. The analysis, conducted on public gait datasets captured at sagittal viewpoints of 45°, 70°, and 90°, reveals that MeTRAbs consistently outperforms RTMW3D across all angles but remains above the 100 mm PA-MPJPE threshold. These findings establish an empirical upper bound on current monocular 3D pose accuracy for clinically meaningful gait assessment and yield practical guidelines for robust monocular camera placement and pipeline design.
Download

Paper Nr: 113
Title:

Lightweight Self-Supervised Detection of Fundamental Frequency and Accurate Probability of Voicing in Monophonic Music

Authors:

Venkat Suprabath Bitra and Homayoon Beigi

Abstract: Reliable fundamental frequency (F0) and voicing estimation is essential for neural synthesis, yet many pitch extractors depend on large labeled corpora and degrade under realistic recording artifacts. We propose a lightweight, fully self-supervised framework for joint F0 estimation and voicing inference, designed for rapid single-instrument training from limited audio. Using transposition-equivariant learning on CQT features, we introduce an EM-style iterative reweighting scheme that uses Shift Cross-Entropy (SCE) consistency as a reliability signal to suppress uninformative noisy/unvoiced frames. The resulting weights provide confidence scores that enable pseudo-labeling for a separate lightweight voicing classifier without manual annotations. Trained on MedleyDB and evaluated on MDB-stem-synth ground truth, our method achieves competitive cross-corpus performance (RPA 95.84, RCA 96.24) and demonstrates cross-instrument generalization.
Download

Paper Nr: 23
Title:

The Hidden Price Tag behind LLMs and Their Environmental Cost

Authors:

Nermeen Abou Baker and Uwe Handmann

Abstract: Artificial intelligence systems consume substantial environmental resources, which are often hidden due to inconsistent measurement methodologies and limited transparency. This study analyses the energy consumption, carbon emissions, and water usage of thirteen major language models to estimate the environmental costs of AI development and deployment. The analysis indicates that training energy consumption varies by 41,000× across models, which far exceeds parameter scaling ratios. Geographic training location creates an additional 6× difference in emissions for similar computational requirements. However, critical transparency gaps exist: seven of the thirteen major models lack verified environmental impact data, which prevents accurate cross-model comparisons. True environmental costs are underestimated by traditional assessments, as shown by a comprehensive measurement methodology. Moreover, infrastructure imposes substantial additional costs. Analysis of water consumption demonstrates a significant shift in impact from training to inference, with significant geographic variations in efficiency, and inference consumption now exceeds training requirements. These findings suggest that strategic choices about measurement methodology, training location, and deployment can produce significant environmental improvements. This study recommends the standardisation of environmental reporting to enable the accurate assessment and systematic optimisation of AI systems. Accurate assessments are essential for the sustainable development of AI as the technology scales globally.
Download
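The reported 6× geographic difference in emissions follows directly from grid carbon intensity: emissions = grid energy drawn × intensity. A back-of-envelope sketch (the intensity and PUE values below are illustrative assumptions, not the paper's figures):

```python
def training_emissions_kg(energy_kwh: float, grid_intensity_kg_per_kwh: float,
                          pue: float = 1.2) -> float:
    """Carbon estimate for a training run: IT energy scaled by the
    data-centre Power Usage Effectiveness (PUE), times the grid's
    carbon intensity in kg CO2 per kWh."""
    return energy_kwh * pue * grid_intensity_kg_per_kwh

# Same 1 GWh training run on two grids (illustrative intensities):
low = training_emissions_kg(1_000_000, 0.05)   # low-carbon grid
high = training_emissions_kg(1_000_000, 0.30)  # fossil-heavy grid
# high / low = 6.0: location alone changes emissions sixfold
```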

Paper Nr: 27
Title:

AMAVA: Adaptive Motion-Aware Video-to-Audio Framework for Visually-Impaired Assistance

Authors:

Benjamin Klein, Kazi Ruslan Rahman and Sanchita Ghose

Abstract: Navigational aids for blind and low-vision individuals struggle to convey dynamic real-world environments, leading to cognitive overload from continuous, undifferentiated feedback. We present AMAVA, a novel real-time video-to-audio framework that converts mobile device video into contextually relevant sound effects or text-to-speech descriptions. We propose a motion-aware pipeline using a lightweight AI classification model to distinguish between low- and high-movement scenes, followed by a real-time text-to-audio synthesis pipeline to enhance environmental perception more efficiently. In static environments, AMAVA generates spoken audio scene descriptions for situational awareness. In high-movement situations, it prioritizes safety by delivering sound cues, such as spoken hazard alerts and environmental sound effects. These audio outputs are produced by a decoder-only transformer-based vision-language model with mixture-of-experts and cross-modal attention for visual understanding, in conjunction with neural text-to-speech and natural-sound synthesis networks. The proposed framework uses prompt-based caching and category-specific throttling to avoid auditory clutter and minimize latency. We present a comprehensive evaluation of the system, including a real-time navigation study comparing navigation with a white cane alone versus with AMAVA, that shows a significant increase in user confidence and perceived safety.
Download
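The category-specific throttling that the abstract mentions can be sketched as a per-category cooldown (class and parameter names are hypothetical, not AMAVA's actual API):

```python
import time

class CategoryThrottle:
    """Per-category rate limiter: suppress an audio event whose category
    already fired within `cooldown` seconds, to avoid auditory clutter."""

    def __init__(self, cooldown=3.0, clock=time.monotonic):
        self.cooldown = cooldown
        self.clock = clock          # injectable for testing
        self._last = {}             # category -> last emission time

    def allow(self, category: str) -> bool:
        now = self.clock()
        if now - self._last.get(category, float("-inf")) >= self.cooldown:
            self._last[category] = now
            return True
        return False
```

Each detected event (e.g. "car", "door") is checked against its own timer, so a frequent hazard does not drown out rarer cues.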

Paper Nr: 37
Title:

Impact of Head Pose Angles on Face Image Quality and Recognition Performance

Authors:

Alexander Kurz, Jacob Carnap and Olaf Henniger

Abstract: The head pose is a critical influencing factor for the utility of a face image in face recognition systems. For reference face images such as passport photographs, the head pose is required to be frontal. For probe images, the question arises as to how much deviation from a frontal head pose would be tolerable without compromising recognition performance. Relaxed head pose requirements can speed up the process of capturing single-use probe images, e.g., at border checkpoints, as adopting a precise frontal pose can take several attempts. Our work validates the threshold values based on the head pose angles computed using the Open Source Face Image Quality (OFIQ) software.
Download

Paper Nr: 46
Title:

Bridging Methods and Metrics: A Practical Framework for Time-Series Anomaly Detection - An Application Case to Energy Distribution Stations

Authors:

Manuel Sánchez-Laguardia, Sol Peluffo, Gastón García González, Alicia Fernández and Alvaro Gómez

Abstract: Anomaly detection in time series is critical for identifying unusual patterns that may indicate system failures, attacks, or other rare events. This is a challenging problem due to the rarity, unpredictability, and temporal variability of anomalies. We present an integrated anomaly detection framework that combines multiple state-of-the-art time-series methods into a unified, user-friendly environment. The framework incorporates range-based evaluation metrics, enabling robust and consistent comparison of different detectors. We demonstrate its applicability through a comprehensive study on power measurements from distribution stations, showing that combining complementary algorithms within a unified framework enhances the analysis. Our results further indicate that, in this domain, appropriately configured simple models can perform competitively, emphasizing the importance of methodological adequacy over model complexity. Overall, the framework provides a flexible environment for evaluating and comparing anomaly detection methods across diverse time-series datasets.
Download
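For intuition on range-based evaluation, here is a minimal existence-only range recall, a deliberate simplification of full range-based metrics (which additionally score overlap size and position); it is not the framework's actual code:

```python
def overlaps(a, b):
    """Do two closed integer index ranges (start, end) overlap?"""
    return a[0] <= b[1] and b[0] <= a[1]

def range_recall(true_ranges, pred_ranges):
    """Existence-only range recall: the share of ground-truth anomaly
    ranges hit by at least one predicted range. Unlike point-wise recall,
    a single detection anywhere inside a long anomaly counts it as found."""
    if not true_ranges:
        return 1.0
    hit = sum(any(overlaps(t, p) for p in pred_ranges) for t in true_ranges)
    return hit / len(true_ranges)
```

This is why range-based metrics suit the domain: an operator cares whether each anomalous episode was flagged at all, not how many of its individual samples were.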

Paper Nr: 60
Title:

Drought Stress Detection via Image Recognition in Lettuce Plants

Authors:

Zühal Wagner, Anubhav Saha, Lukas Munser and Stefan Streif

Abstract: The early detection of drought stress in crops is critical for maintaining yield quality and enabling timely intervention, especially in controlled indoor farming environments. This study proposes a deep learning-based image recognition framework for detecting drought-induced stress symptoms in lettuce plants using advanced convolutional neural network architectures. Two distinct deep learning models, EfficientNet and DenseNet, were integrated with a YOLOv11-based pipeline to perform the classification task. The experimental results demonstrated that EfficientNet-B1 achieved a testing accuracy of 87.8% and an F1-Score of 89.0%, indicating reliable performance with minimal computational overhead. However, DenseNet-169 demonstrated superior performance, attaining a testing accuracy of 95.1%, precision of 95.0%, recall of 95.0%, and an F1-Score of 95.0%. The enhanced performance of DenseNet-169 is attributable to its dense connectivity, which facilitates more effective feature propagation for capturing subtle visual cues associated with drought stress. The findings emphasise the potential of deep learning-based image analysis in automating stress detection and enhancing decision-making in precision agriculture.
Download

Paper Nr: 79
Title:

Mobile Augmented Reality Indoor Navigation with DWC: Multi-Floor Deployment and Usability Evidence

Authors:

Soroush Mostofi Rad, Mohammad Reza Mohebbi and Sophie Jörg

Abstract: Indoor Augmented Reality (AR) navigation on smartphones remains challenged by pose drift in long corridors and staircases, as well as unreliable initialization under low-light conditions. These issues degrade spatial alignment, increase user confusion, and limit the scalability of AR wayfinding in large buildings. This paper presents a multi-floor indoor navigation system that combines QR code–based initialization, AR Foundation visual–inertial SLAM for tracking, Unity NavMesh for path planning, and a lightweight Dynamic Wall Correction (DWC) mechanism. DWC exploits collisions with annotated virtual walls and corridor boundaries to detect systematic misalignment and translate the building model accordingly, without requiring additional sensors or infrastructure. The system was deployed across three floors of a building at the University of Bamberg and evaluated with 25 participants. The user experience was assessed using the System Usability Scale (SUS), and the robustness of tracking was evaluated through repeated QR-based localization attempts. We implemented two visual guidance methods to support indoor navigation: a directional arrow and a virtual agent with a ghost-like appearance. Both guidance metaphors exceeded the SUS benchmark of 68, with the directional arrow achieving higher usability (SUS = 86.1) than the agent-based guidance (SUS = 71.2; Wilcoxon signed-rank test, p = 1.78 × 10⁻⁴). Multiple QR scans were required in low-light areas, highlighting the reliability of initialization as a critical practical factor. Overall, the results show that structure-aware drift correction improves perceived spatial alignment while integrating seamlessly into standard mobile AR pipelines.
Download
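A minimal sketch of the Dynamic Wall Correction idea in 2-D, assuming walls are annotated as planes and the correction is applied as a translation of the building model (function names, the sign convention, and the resolve-by-normal rule are assumptions, not the paper's implementation):

```python
import numpy as np

def wall_penetration(camera_xy, wall_point, wall_normal):
    """Signed distance of the tracked camera pose from a wall plane;
    a negative value means the pose has drifted through the wall."""
    n = np.asarray(wall_normal, dtype=float)
    n = n / np.linalg.norm(n)
    return float(np.dot(np.asarray(camera_xy, dtype=float) - wall_point, n))

def dwc_translate(model_offset, camera_xy, wall_point, wall_normal):
    """On a wall collision, translate the building model along the wall
    normal by just enough to put the pose back on the walkable side."""
    d = wall_penetration(camera_xy, wall_point, wall_normal)
    offset = np.asarray(model_offset, dtype=float)
    if d < 0:
        n = np.asarray(wall_normal, dtype=float)
        n = n / np.linalg.norm(n)
        offset = offset - d * n  # shift model by the penetration depth |d|
    return offset
```

The key property is that only the map is moved, so no extra sensors are needed: the SLAM track stays untouched and the correction accumulates as drift is repeatedly detected.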

Paper Nr: 95
Title:

Semi-Supervised Overtake Detection in Trucks Using CAN Data and BiLSTM Networks

Authors:

Fernando Alonso-Fernandez, Talha Hanif Butt and Prayag Tiwari

Abstract: Overtaking is a critical driving manoeuvre that poses significant safety risks, particularly for heavy-duty vehicles such as trucks. Despite its importance, automatic overtake detection has received limited attention in the literature, especially when relying exclusively on vehicle-internal data. In this paper, we address the challenge of overtake detection in trucks using Controller Area Network (CAN) bus signals collected from real, in-service vehicles. CAN data is readily available onboard, does not require additional sensing hardware, and avoids privacy concerns associated with camera-based or driver-monitoring systems. We propose a semi-supervised learning approach that combines classical machine learning classifiers and deep temporal models. Specifically, Support Vector Machines and Random Forests are first trained on a limited set of annotated data and then used to pseudo-label large amounts of unlabelled CAN data. These pseudo-labelled samples are subsequently employed to train Bidirectional Long Short-Term Memory (BiLSTM) networks, enabling the effective use of data-hungry temporal models despite the scarcity of manual annotations. Experiments conducted on CAN data from multiple operational trucks demonstrate that the proposed strategy improves the overtake detection performance of BiLSTM models compared to purely supervised training. Via classifier fusion, we also demonstrate that a more balanced accuracy between classes (overtake vs. no-overtake) can be achieved compared to the classifiers used separately. The results show that pseudo-labelling enables effective training of BiLSTM models under limited annotation, improving overtake detection performance and robustness.
Download
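The pseudo-labelling step can be sketched with scikit-learn (a simplified stand-in for the paper's pipeline; selecting samples by confident agreement of both classifiers is an assumption about the selection rule):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

def pseudo_label(X_lab, y_lab, X_unlab, conf=0.9):
    """Train an SVM and a Random Forest on the small annotated set, then
    keep only those unlabelled windows where both models agree and are
    confident. The retained (X, y) pairs would then feed BiLSTM training."""
    svm = SVC(probability=True, random_state=0).fit(X_lab, y_lab)
    rf = RandomForestClassifier(random_state=0).fit(X_lab, y_lab)
    p_svm = svm.predict_proba(X_unlab)
    p_rf = rf.predict_proba(X_unlab)
    y_svm, y_rf = p_svm.argmax(axis=1), p_rf.argmax(axis=1)
    keep = (y_svm == y_rf) & (p_svm.max(axis=1) >= conf) & (p_rf.max(axis=1) >= conf)
    return X_unlab[keep], y_svm[keep]
```

Disagreements and low-confidence windows are simply discarded, which trades coverage for label quality, the usual bargain in semi-supervised training.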