ICPRAM 2024 Abstracts


Area 1 - Theory and Methods

Full Papers
Paper Nr: 21
Title:

Tab-VAE: A Novel VAE for Generating Synthetic Tabular Data

Authors:

Syed M. Tazwar, Max Knobbout, Enrique H. Quesada and Mirela Popa

Abstract: Variational Autoencoders (VAEs) suffer from the well-known problem of overpruning or posterior collapse due to strong regularization when working in a sufficiently high-dimensional latent space. When VAEs are used to generate tabular data, one-hot encoded categorical data expand the dimensionality of the feature space dramatically, making the modeling of multi-class categorical data challenging. In this paper, we propose Tab-VAE, a novel VAE-based approach to generate synthetic tabular data that tackles this challenge by introducing a sampling technique at inference for categorical variables. A detailed review of the current state-of-the-art models shows that most tabular data generation approaches draw their methodologies from Generative Adversarial Networks (GANs), while the simpler, more stable VAE approach is overlooked. Our extensive evaluation of Tab-VAE against other leading generative models shows that it improves significantly on state-of-the-art VAEs. It also shows that Tab-VAE outperforms the best GAN-based tabular data generators, paving the way for a powerful and less computationally expensive tabular data generation model.
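The inference-time sampling idea for categorical outputs can be sketched in a few lines (an illustrative toy, not the authors' implementation; the decoder logits and column layout are assumptions): instead of taking the argmax of a decoder's per-column logits, one draws the category from the softmax distribution, so generated rows preserve the diversity of minority categories.

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sample_categorical(logits, rng):
    """Sample one-hot rows from per-row softmax probabilities."""
    probs = softmax(logits)
    n, k = probs.shape
    choices = np.array([rng.choice(k, p=p) for p in probs])
    return np.eye(k)[choices]

rng = np.random.default_rng(0)
# Hypothetical decoder logits for a 3-category column, 5 generated rows.
logits = rng.normal(size=(5, 3))
onehot = sample_categorical(logits, rng)
assert onehot.shape == (5, 3) and np.all(onehot.sum(axis=1) == 1)
```

Sampling rather than taking the mode is what keeps rare categories represented in the synthetic table.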

Paper Nr: 35
Title:

GENUINE: Genomic and Nucleus Information Embedding for Single Cell Genetic Alteration Classification in Microscopic Images

Authors:

Simon Gutwein, Martin Kampel, Sabine Taschner-Mandl and Roxane Licandro

Abstract: Fluorescence in situ hybridization (FISH) is an essential technique in cancer diagnostics, providing valuable insights into the genetic aberrations typical of malignancies. However, the effectiveness of FISH analysis is often impeded by the susceptibility of conventional classification algorithms to variations in image appearance, coupled with a reliance on manually crafted decision rules, limiting their adaptability and precision. To address these challenges, we introduce GENUINE, an innovative two-stream network that combines whole-image information through a convolutional neural network encoder with a single FISH signal stream dedicated to the analysis of individual signals. Our results demonstrate that GENUINE achieves remarkable accuracy not only on datasets resembling the training data distributions, but also on previously unseen data, underscoring its robustness and generalizability. Moreover, we present evidence that the architecture of GENUINE inherently acts as a regularizer during training against label noise. This leads to the extraction of meaningful features and thereby fosters a biologically relevant organization of the feature space. The development of GENUINE marks a significant advancement in the utilization of FISH for cancer diagnostics, providing a robust and versatile tool capable of navigating the complexities of genetic aberrations in malignancies.

Paper Nr: 45
Title:

MAC: Multi-Scales Attention Cascade for Aerial Image Segmentation

Authors:

Yubo Wang, Zhao Wang, Yuusuke Nakano, Katsuya Hasegawa, Hiroyuki Ishii and Jun Ohya

Abstract: Unlike general semantic segmentation, aerial image segmentation has its own particular challenges, three of the most prominent of which are great object scale variation, the scattering of multiple tiny objects across a complex background, and the imbalance between foreground and background. Previous affinity-learning-based methods introduced intractable background noise but lost key-point information due to the additional interaction between different-level features in their Feature Pyramid Network (FPN)-like structure, which caused inferior results. We argue that multi-scale information can be further exploited in each FPN level individually, without cross-level interaction, and propose a Multi-scale Attention Cascade (MAC) model that leverages spatial local contextual information using self-attention modules over non-overlapping windows of multiple sizes, which mitigates the effect of complex and imbalanced backgrounds. Moreover, the multi-scale contextual cues are propagated in a cascade manner to tackle the large scale variation problem while extracting further details. Finally, a local channel attention is presented to achieve cross-channel interaction. Extensive experiments verify the effectiveness of MAC and demonstrate that its performance surpasses that of state-of-the-art approaches by +2.2 mIoU and +3.1 mFscore on the iSAID dataset, and by +2.97 mIoU on the ISPRS Vaihingen dataset. Code has been made available at https://github.com/EricBooob/Multi-scale-Attention-Cascade-for-Aerial-Image-Segmentation.

Paper Nr: 66
Title:

Counterfactual-Based Feature Importance for Explainable Regression of Manufacturing Production Quality Measure

Authors:

Antonio L. Alfeo and Mario A. Cimino

Abstract: Machine learning (ML) methods need to explain their reasoning to allow professionals to validate and trust their predictions and employ them in real-world decision-making processes. To this end, explainable artificial intelligence (XAI) methods based on feature importance can be employed, even though they can be very computationally expensive. Moreover, it can be challenging to determine whether an XAI technique introduces bias into the explanation (e.g., overestimating or underestimating the feature importance) in the absence of a reference feature importance measure or domain knowledge from which to derive an expected importance level for each feature. We address both these issues by (i) employing a counterfactual-based strategy, i.e., deriving a measure of feature importance by checking whether minor changes in one feature's values significantly affect the ML model's regression outcome, and (ii) employing both synthetic and real-world industrial data coupled with the expected degree of importance for each feature. Our experimental results show that the proposed approach (BoCSoRr) is more reliable and far less computationally expensive than DiCE, a well-known counterfactual-based XAI approach able to provide a measure of feature importance.

Paper Nr: 71
Title:

Let Me Take a Better Look: Towards Video-Based Age Estimation

Authors:

Krešimir Bešenić, Igor S. Pandžić and Jörgen Ahlberg

Abstract: Taking a better look at subjects of interest helps humans improve confidence in their age estimation. Unlike still images, sequences offer spatio-temporal dynamic information that contains many cues related to age progression. A review of previous work on video-based age estimation indicates that this is an underexplored field of research. This may be caused by the lack of a well-defined, publicly accessible video benchmark protocol, as well as the absence of video-oriented training data. To address the former issue, we propose a carefully designed video age estimation benchmark protocol and make it publicly available. To address the latter issue, we design a video-specific age estimation method that leverages pseudo-labeling and semi-supervised learning. Our results show that the proposed method outperforms image-based baselines on both offline and online benchmark protocols, while online estimation stability is improved by more than 50%.

Paper Nr: 88
Title:

Large Age Gap Face Verification by Learning GAN Synthesized Prototype Representations

Authors:

Swastik Jena, Bunil K. Balabantaray and Rajashree Nayak

Abstract: A phenomenal growth in the field of face recognition has been witnessed over the last few years. Existing deep-learning-based face recognition methodologies employ auxiliary age classifiers and intermediate age synthesizers to address the discrepancies in facial appearance due to aging. However, even after training on large amounts of annotated data and utilizing prior information, existing methodologies still underperform on the large intra-class age variance posed by images of the same identity. LAG (Large Age Gap) is a challenging face verification benchmark dataset with very few samples per identity, large age variance, and no age annotations. This paper aims to perform face verification on the LAG dataset by learning the large intra-class variance posed by aging. The proposed work integrates a new training regime for the face verification task. SimSwap GAN is used to generate hybrid faces from young and adult images present in the LAG dataset. A Prototype Feature Activation (PFA) network is used to extract the feature embeddings of the hybrid faces, and a modified Siamese Neural Network is trained to learn the face embeddings combined with attention-enhanced feature fusion. Extensive experiments show that the proposed approach outperforms existing baseline face verification methods on the LAG dataset.

Paper Nr: 98
Title:

Classifying Soccer Ball-on-Goal Position Through Kicker Shooting Action

Authors:

Javier T. Artiles, Daniel Hernández-Sosa, Oliverio J. Santana, Javier Lorenzo-Navarro and David Freire-Obregón

Abstract: This research addresses whether the ball’s direction after a soccer free-kick can be accurately predicted solely by observing the shooter’s kicking technique. To investigate this, we meticulously curated a dataset of soccer players executing free kicks and conducted manual temporal segmentation to identify the moment of the kick precisely. Our approach involves utilizing neural networks to develop a model that integrates Human Action Recognition (HAR) embeddings with contextual information, predicting the ball-on-goal position (BoGP) based on two temporal states: the kicker’s run-up and the instant of the kick. The study encompasses a performance evaluation for eleven distinct HAR backbones, shedding light on their effectiveness in BoGP estimation during free-kick situations. An extra tabular metadata input is introduced, leading to an interesting model enhancement without introducing bias. The promising results reveal 69.1% accuracy when considering two primary BoGP classes: right and left. This underscores the model’s proficiency in predicting the ball’s destination towards the goal with high accuracy, offering promising implications for understanding free-kick dynamics in soccer.

Paper Nr: 100
Title:

On Spectrogram Analysis in a Multiple Classifier Fusion Framework for Power Grid Classification Using Electric Network Frequency

Authors:

Georgios Tzolopoulos, Christos Korgialas and Constantine Kotropoulos

Abstract: The Electric Network Frequency (ENF) serves as a unique signature inherent to power distribution systems. Here, a novel approach for power grid classification is developed, leveraging ENF. Spectrograms are generated from audio and power recordings across different grids, revealing distinctive ENF patterns that aid in grid classification through a fusion of classifiers. Four traditional machine learning classifiers plus a Convolutional Neural Network (CNN), optimized using Neural Architecture Search, are developed for One-vs-All classification. This process generates numerous predictions per sample, which are then compiled and used to train a shallow multi-label neural network specifically designed to model the fusion process, ultimately leading to the conclusive class prediction for each sample. Experimental findings reveal that both validation and testing accuracy outperform those of current state-of-the-art classifiers, underlining the effectiveness and robustness of the proposed methodology.
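The spectrogram front-end can be illustrated with a minimal short-time Fourier transform in NumPy (a generic sketch under assumed parameters, not the paper's pipeline): an ENF-like tone shows up as a narrow ridge around the nominal grid frequency.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128, fs=1000.0):
    """Magnitude spectrogram via a Hann-windowed short-time FFT."""
    window = np.hanning(frame_len)
    frames = [signal[i:i + frame_len] * window
              for i in range(0, len(signal) - frame_len + 1, hop)]
    spec = np.abs(np.fft.rfft(np.array(frames), axis=1)).T  # (freq, time)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    return freqs, spec

fs = 1000.0
t = np.arange(0, 2.0, 1.0 / fs)
x = np.sin(2 * np.pi * 50.0 * t)           # a stand-in for a 50 Hz ENF trace
freqs, spec = spectrogram(x, fs=fs)
peak_bin = spec.mean(axis=1).argmax()
assert abs(freqs[peak_bin] - 50.0) < 4.0   # energy concentrates near 50 Hz
```

Classifiers in the fusion framework would consume such time-frequency images, one per recording and per grid.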

Paper Nr: 101
Title:

Vision Transformer Interpretability via Prediction of Image Reflected Relevance Among Tokens

Authors:

Kento Sago and Kazuhiro Hotta

Abstract: The Vision Transformer (ViT) has a complex structure. To use it effectively in critical decision-making settings, it is necessary to visualize the areas that affect the model's predictions so that people can understand them. In this paper, we propose a new visualization method based on Transformer Attribution, which is widely used for visualizing the areas behind ViT's predictions. Our method estimates the influence of each token on the predictions by considering predictions for images that reflect the relevance among tokens, and produces saliency maps. It increased accuracy by about 1.28% and 1.61% for deletion and insertion, and by about 3.01% and 0.94% for average drop and average increase, on the ILSVRC2012 validation data in comparison with conventional methods.

Paper Nr: 117
Title:

Path of Solutions for Fused Lasso Problems

Authors:

Torpong Nitayanont, Cheng Lu and Dorit S. Hochbaum

Abstract: In a fused lasso problem on sequential data, the objective consists of two competing terms: the fidelity term and the regularization term. The two terms are balanced with a tradeoff parameter, the value of which affects the solution, yet the extent of the effect is not known a priori. To address this, there is interest in generating the path of solutions, which maps values of this parameter to a solution. Even though there are infinitely many values of the parameter, we show that for the fused lasso problem with convex piecewise linear fidelity functions, the number of different solutions is bounded by n²q, where n is the number of variables and q is the number of breakpoints in the fidelity functions. Our path-of-solutions algorithm, PoS, is based on an efficient minimum cut technique. We compare our PoS algorithm with a state-of-the-art solver, Gurobi, on synthetic data. The results show that PoS generates all solutions, whereas Gurobi identifies fewer than 22% of the solutions in comparable running time. Even when allowed a time limit hundreds of times larger than that of PoS, Gurobi still cannot generate all the solutions.
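The role of the tradeoff parameter can be made concrete with a tiny brute-force instance (illustrative only; this is not the PoS minimum-cut algorithm). With absolute-loss fidelity the objective is piecewise linear, so an optimal solution exists with every coordinate equal to some data value, which makes exhaustive search feasible for very small n.

```python
import itertools
import numpy as np

def fused_lasso_brute(y, lam):
    """Brute-force fused lasso with absolute-loss fidelity:
    minimize sum|x_i - y_i| + lam * sum|x_{i+1} - x_i|.
    For this piecewise-linear objective an optimum exists with every
    x_i equal to some data value, so a grid over y suffices (tiny n only).
    """
    candidates = sorted(set(y))
    best, best_val = None, np.inf
    for x in itertools.product(candidates, repeat=len(y)):
        x = np.array(x, dtype=float)
        val = np.abs(x - y).sum() + lam * np.abs(np.diff(x)).sum()
        if val < best_val - 1e-12:
            best, best_val = x, val
    return best

y = np.array([0.0, 4.0, 1.0, 5.0])
path = {lam: tuple(fused_lasso_brute(y, lam)) for lam in (0.0, 0.6, 2.0)}
# lam = 0 reproduces the data; a large lam fuses neighbours together.
assert path[0.0] == (0.0, 4.0, 1.0, 5.0)
assert len(set(path[2.0])) < 4
```

Sweeping lam traces exactly the kind of solution path that PoS enumerates efficiently.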

Paper Nr: 131
Title:

A Post-Processing Strategy for Association Rules in Knowledge Discovery

Authors:

Luiz C. Cintra, Rodrigo S. Dias and Rogerio Salvini

Abstract: Association Rule Mining (ARM) is a traditional data mining method that describes associations among elements in transactional databases. A well-known problem of ARM is the large number of rules generated, requiring approaches to post-process these rules so that a human expert can analyze the associations found. In certain scenarios, experts focus on exploring a specific element within the data, and a search based on this item can help reduce the problem. Few methods concentrate on post-processing generated rules targeting a specific item of interest. This study aims to highlight relevant associations of a particular element in order to gain knowledge about its role through its interactions and relationships with other factors. The paper introduces a post-processing strategy for association rules, selecting and grouping rules pertinent to a specific item of interest as provided by a domain expert. Additionally, a graphical representation facilitates the visualization and interpretation of associations between rules and their groupings. A case study demonstrates the applicability of the proposed method, effectively reducing the number of relevant rules to a manageable level for expert analysis.

Paper Nr: 149
Title:

Incremental Whole Plate ALPR Under Data Availability Constraints

Authors:

Markus Russold, Martin Nocker and Pascal Schöttle

Abstract: In the realm of image processing, deep neural networks (DNNs) have proven highly effective, particularly in tasks such as license plate recognition. However, a notable limitation in their application is the dependency on the quality and availability of training data, a frequent challenge in practical settings. Addressing this, our research involves the creation of a comprehensive database comprising over 45,000 license plate images, meticulously designed to reflect real-world conditions. Diverging from conventional character-based approaches, our study centers on the analysis of entire license plates using machine learning algorithms. This novel approach incorporates continual learning and dynamic network adaptation techniques, enhancing existing automatic license plate recognition (ALPR) systems by boosting their overall confidence levels. Our findings validate the utility of machine learning in ALPR, even under stringent constraints, and demonstrate the feasibility and efficiency of recognizing license plates as complete units.

Paper Nr: 151
Title:

XPCA Gen: Extended PCA Based Tabular Data Generation Model

Authors:

Sreekala K. Padinjarekkara, Jessica Alecci and Mirela Popa

Abstract: The proposed method, XPCA Gen, introduces a novel approach for synthetic tabular data generation by utilising relevant patterns present in the data. This is performed using principal components obtained through XPCA (a probabilistic interpretation of standard PCA) decomposition of the original data. Since new data points are obtained by synthesizing the principal components, the generated data is an accurate, noise-reduced representation of the original data with a good diversity of data points. The experimental results obtained on benchmark datasets (e.g. CMC, PID) demonstrate strong performance on ML utility metrics (accuracy, precision, recall), showing its ability to capture inherent patterns in the dataset. Alongside the ML utility metrics, a high Hausdorff distance indicates diversity in the generated data without compromising statistical properties. Moreover, this is not a data-hungry method like other complex neural networks. Overall, XPCA Gen emerges as a promising solution for data privacy preservation and robust model training with diverse samples.
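The general idea of PCA-based generation can be sketched as follows (an illustrative sketch of standard PCA generation, not the paper's XPCA formulation): fit principal components, model the latent scores with a per-component Gaussian, sample new scores, and map them back to data space.

```python
import numpy as np

def pca_generate(X, n_components, n_samples, rng):
    """Generate synthetic rows by sampling in a PCA latent space.
    Illustrative only: standard PCA via SVD, Gaussian latent sampling."""
    mean = X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    comps = Vt[:n_components]                 # principal directions
    scores = (X - mean) @ comps.T             # latent coordinates of the data
    z = rng.normal(scores.mean(axis=0), scores.std(axis=0),
                   size=(n_samples, n_components))
    return z @ comps + mean                   # map samples back to data space

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))  # correlated data
synth = pca_generate(X, n_components=3, n_samples=500, rng=rng)
assert synth.shape == (500, 5)
```

Because sampling happens in the low-dimensional score space, the generated rows inherit the correlation structure captured by the leading components.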

Paper Nr: 159
Title:

Sign Language Recognition Based on Subspace Representations in the Spatio-Temporal Frequency Domain

Authors:

Ryota Sato, Suzana A. Beleza, Erica K. Shimomoto, Matheus Silva de Lima, Nobuko Kato and Kazuhiro Fukui

Abstract: This paper proposes a subspace-based method for sign language recognition in videos. Typical subspace-based methods represent a video as a low-dimensional subspace generated by applying principal component analysis (PCA) to the set of images from the video. Such a representation is compact and practical for motion recognition with little training data. However, given the complex motion and structure in sign languages, subspace-based methods underperform because they do not consider temporal information such as the order of frames. To address this issue, we propose processing time-domain information in the frequency domain by applying the three-dimensional fast Fourier transform (3D-FFT) to sign videos: a sign video is represented as a 3D amplitude spectrum tensor, which is invariant to deviations in the spatial and temporal directions of target objects. Further, a 3D amplitude spectrum tensor is regarded as one point on the Product Grassmann Manifold (PGM). By unfolding the tensor in all three dimensions, the PGM can account for the temporal information. Finally, we calculate video similarity using the distances between two corresponding points on the PGM. The effectiveness of the proposed method is demonstrated on private and public sign language recognition datasets, showing a significant performance improvement over conventional subspace-based methods.
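The claimed shift-invariance of the 3D amplitude spectrum is easy to verify numerically: circularly shifting a video in space or time changes only the phase of its 3D-FFT, not the amplitudes. A minimal check (toy random video, not the paper's data):

```python
import numpy as np

rng = np.random.default_rng(0)
video = rng.normal(size=(8, 16, 16))        # (frames, height, width)

def amplitude_tensor(v):
    """3D amplitude spectrum: invariant to circular spatio-temporal shifts."""
    return np.abs(np.fft.fftn(v))

# Shift the video 3 frames in time and (5, 2) pixels in space.
shifted = np.roll(video, shift=(3, 5, 2), axis=(0, 1, 2))
assert np.allclose(amplitude_tensor(video), amplitude_tensor(shifted))
```

This invariance is what lets the method tolerate spatial and temporal misalignment of the signer before the Grassmann-manifold comparison.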

Paper Nr: 160
Title:

Semantic Properties of Cosine Based Bias Scores for Word Embeddings

Authors:

Sarah Schröder, Alexander Schulz, Fabian Hinder and Barbara Hammer

Abstract: Plenty of works have brought social biases in language models to attention and proposed methods to detect such biases. As a result, the literature contains a great number of different bias tests and scores, each introduced with the premise of uncovering yet more biases that other scores fail to detect. What is severely lacking in the literature, however, are comparative studies that analyse such bias scores and help researchers understand the benefits and limitations of existing methods. In this work, we aim to close this gap for cosine-based bias scores. Building on a geometric definition of bias, we propose requirements for bias scores to be considered meaningful for quantifying biases. Furthermore, we formally analyze cosine-based scores from the literature with regard to these requirements. We underline these findings with experiments showing that the bias scores' limitations have an impact in application.
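A representative member of this family, in the spirit of direct-bias scores (toy 3-d vectors, illustrative only; not a score from the paper), measures the cosine between a word vector and a bias direction defined by two attribute vectors:

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def direct_bias(word_vec, a_vec, b_vec):
    """Cosine of a word vector with the difference of two attribute
    vectors, one common family of cosine-based bias scores."""
    direction = a_vec - b_vec
    return cosine(word_vec, direction)

# Toy embeddings (illustrative, not real word vectors).
he, she = np.array([1.0, 0.2, 0.0]), np.array([-1.0, 0.2, 0.0])
engineer = np.array([0.8, 0.5, 0.1])
score = direct_bias(engineer, he, she)
assert score > 0   # 'engineer' leans toward the 'he' side of the axis
```

The paper's geometric requirements ask, for scores like this one, whether the value changes meaningfully under rescaling, rotation, or shifts of the embedding space.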

Short Papers
Paper Nr: 17
Title:

TenebrioVision: A Fully Annotated Dataset of Tenebrio Molitor Larvae Worms in a Controlled Environment for Accurate Small Object Detection and Segmentation

Authors:

Angelos-Michael Papadopoulos, Paschalis Melissas, Anestis Kastellos, Panagiotis Katranitsiotis, Panagiotis Zaparas, Konstantinos Stavridis and Petros Daras

Abstract: Tenebrio molitor worms have shown extreme nutritional benefits, as they contain useful natural compounds, making them worthwhile as an alternative food source. It is beneficial for insect farms to have automated mechanisms that can detect these worms. Without an explicitly annotated dataset, the task of detecting tenebrio molitor worms remains challenging and underdeveloped. To address this issue, we introduce TenebrioVision, a fully annotated dataset suitable for the detection and segmentation of tenebrio molitor larvae worms. The data acquisition was performed in a controlled environment. The dataset consists of 1,120 images, with a total of 53,600 worm instances. The 1,120 images are equally distributed across 14 distinct levels, each level containing a specific number of tenebrio molitor larvae worms. The dataset is validated in terms of mean average precision, memory allocation, and inference time on several state-of-the-art baseline methods for both detection and segmentation purposes. The results unequivocally show that detection and segmentation accuracy is high on both TenebrioVision and real farm images.

Paper Nr: 22
Title:

Group Importance Estimation Method Based on Group LASSO Regression

Authors:

Yuki Mori, Seiji Yamada and Takashi Onoda

Abstract: There has been rapidly growing interest in penalized least squares problems via l1 regularization. The LASSO (Least Absolute Shrinkage and Selection Operator) regression, which utilizes l1 regularization, has gained popularity as a method for model selection and shrinkage estimation. An important extension of LASSO regression is Group LASSO regression, which generates sparse models at the group level. However, Group LASSO regression does not directly evaluate group importance. In this study, we propose a method to assess group importance based on Group LASSO regression. The method leverages regularization parameters to estimate the importance of each group. We applied it to both synthetically generated data and real-world data, conducting experiments to evaluate its performance. As a result, the method accurately approximated the importance of groups, enhancing the interpretability of models at the group level.
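A minimal Group LASSO fit via proximal gradient descent illustrates how regularization acts at the group level (an illustrative sketch, not the proposed importance estimator; data, penalty level, and step size are assumptions). With a sufficiently large penalty, the coefficient block of an irrelevant group is driven to zero while a relevant group survives, which is the kind of behaviour a regularization-parameter-based importance measure exploits.

```python
import numpy as np

def group_lasso_fit(X, y, groups, lam, lr=0.005, iters=4000):
    """Proximal gradient for 0.5*||y - Xw||^2 + lam * sum_g ||w_g||_2."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (X @ w - y)
        w = w - lr * grad
        for g in groups:                      # block soft-thresholding
            norm = np.linalg.norm(w[g])
            w[g] = 0.0 if norm <= lr * lam else w[g] * (1 - lr * lam / norm)
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = X[:, 0] * 2.0 + X[:, 1] * 1.5 + rng.normal(scale=0.1, size=100)
groups = [[0, 1], [2, 3]]                     # group 0 relevant, group 1 noise
w = group_lasso_fit(X, y, groups, lam=50.0)
assert np.linalg.norm(w[:2]) > 10 * np.linalg.norm(w[2:])
```

Sweeping lam and recording where each group's norm hits zero gives a crude per-group ranking in this toy setting.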

Paper Nr: 30
Title:

Learned Fusion: 3D Object Detection Using Calibration-Free Transformer Feature Fusion

Authors:

Michael Fürst, Rahul Jakkamsetty, René Schuster and Didier Stricker

Abstract: The state of the art in 3D object detection using sensor fusion relies heavily on calibration quality, which is difficult to maintain in large-scale deployment outside a lab environment. We present the first calibration-free approach for 3D object detection, thus eliminating complex and costly calibration procedures. Our approach uses transformers to map features between multiple views of different sensors at multiple abstraction levels. In an extensive evaluation for object detection, we show that our approach outperforms single-modal setups by 14.1% in BEV mAP, and that the transformer indeed learns to map features. By showing that calibration is not necessary for sensor fusion, we hope to motivate other researchers to follow the direction of calibration-free fusion. Additionally, the resulting approaches have substantial resilience against rotation and translation changes.

Paper Nr: 31
Title:

Content Rating Classification in Fan Fiction Using Active Learning and Explainable Artificial Intelligence

Authors:

Yi Sheng Heng and James Pope

Abstract: The emergence of fan fiction websites, where fans write their own stories about a topic or genre, has resulted in serious content rating issues. The websites are accessible to general audiences but often include explicit content. Authors can rate their own fan fiction stories, but this is not required and many stories are unrated. This motivates automatically predicting the content rating using recent natural language processing techniques. The length of the fan fiction text, ambiguity in rating schemes, self-annotated (weak) labels, and style of writing all make automatic content rating prediction very difficult. In this paper, we propose several embedding techniques and classification models to address these problems. Based on a dataset from a popular fan fiction website, we show that binary classification works better than multiclass classification and can achieve nearly 70% accuracy using a transformer-based model. When computation is considered, we show that a traditional word embedding technique and Logistic Regression produce the best results, with 66% accuracy and 0.1 seconds of computation (approximately 15,000 times faster than DistilBERT). We further show that many of the labels are not correct and require subsequent preprocessing techniques to correct them. We propose an Active Learning approach whose results, while not conclusive, suggest further work to address the issue.

Paper Nr: 32
Title:

Improvement of Tensor Representation Label in Image Recognition: Evaluation on Selection, Complexity and Size

Authors:

Shinji Niihara and Minoru Mori

Abstract: One-hot vectors representing correct/incorrect answer classes as {1/0} are usually used as labels for classification problems in Deep Neural Networks. On the other hand, a method using a tensor consisting of speech spectrograms of class names as labels has been proposed and reported to improve resistance to Adversarial Examples. However, effective representations for tensor-based labels have not been sufficiently studied. In this paper, we evaluate the effects of image selection, complexity, and tensor size expansion on tensor representation labels. Evaluation experiments using several databases and DNN models show that higher accuracies and tolerances can be achieved by improving the tensor representations.

Paper Nr: 36
Title:

Practical Deep Feature-Based Visual-Inertial Odometry

Authors:

Charles Hamesse, Michiel Vlaminck, Hiep Luong and Rob Haelterman

Abstract: We present a hybrid visual-inertial odometry system that relies on a state-of-the-art deep feature matching front-end and a traditional visual-inertial optimization back-end. More precisely, we develop a fully fledged feature tracker based on the recent SuperPoint and LightGlue neural networks that can be plugged directly into the estimation back-end of VINS-Mono. By default, this feature tracker returns extremely abundant matches. To bound the computational complexity of the back-end optimization, limiting the number of used matches is desirable. Therefore, we explore various methods to filter the matches while maintaining high visual-inertial odometry performance. We run extensive tests on the EuRoC machine hall and Vicon room datasets, showing that our system achieves state-of-the-art odometry performance according to relative pose errors.

Paper Nr: 42
Title:

Generative Data Augmentation for Few-Shot Domain Adaptation

Authors:

Carlos E. López Fortín and Ikuko Nishikawa

Abstract: Domain adaptation in computer vision focuses on addressing the domain gap between source and target distributions, generally via adversarial methods or feature distribution alignment. However, most such methods assume the availability of sufficient target data to properly teach the model domain-invariant representations. Few-shot scenarios, where target data is scarce, pose a significant challenge for their implementation in real-world settings. Leveraging fine-tuned diffusion models for synthetic data augmentation, we present Generative Data Augmentation for Few-shot Domain Adaptation, a model-agnostic approach to address the few-shot problem in domain adaptation for multi-class classification. Experimental results show that using augmented data from fine-tuned diffusion models with open-source datasets can improve average accuracy by up to 3%, as well as increase per-class accuracy by 3% to 30%, for state-of-the-art domain adaptation methods with respect to their non-augmented counterparts, without requiring any major modifications to their architecture. This provides an easy-to-implement solution for the adoption of domain adaptation methods in practical scenarios.

Paper Nr: 60
Title:

QEBB: A Query-Efficient Black-Box Adversarial Attack on Video Recognition Models Based on Unsupervised Key Frame Selection

Authors:

Kimia Haghjooei and Mansoor Rezghi

Abstract: Despite the success of deep learning models, they remain vulnerable to adversarial attacks that introduce slight perturbations to inputs, resulting in adversarial examples. Black-box attacks, where model details are hidden from the attacker, have gained attention for their real-world applications. Although studying adversarial attacks on video models is crucial due to their importance in surveillance and security applications, most work on adversarial examples focuses on images; videos are rarely studied, since attacking them is more challenging. Recent black-box video attacks involve selecting key frames to reduce the video's dimensionality. This addresses the high cost of attacking the entire video but may require numerous queries, making the attack noticeable. Our work introduces QEBB, a query-efficient black-box video attack. We employ an unsupervised key frame selection method to choose frames carrying vital representative information, and use saliency maps to focus on the salient regions of key frames. QEBB successfully attacks models on the UCF-101 and HMDB-51 datasets with a 100% success rate while reducing query numbers by nearly 90% in comparison to state-of-the-art methods.

Paper Nr: 61
Title:

Investigating the Suitability of Concept Drift Detection for Detecting Leakages in Water Distribution Networks

Authors:

Valerie Vaquet, Fabian Hinder and Barbara Hammer

Abstract: Leakages are a major risk in water distribution networks as they cause water loss and increase contamination risks. Leakage detection is a difficult task due to the complex dynamics of water distribution networks. In particular, small leakages are hard to detect. From a machine-learning perspective, leakages can be modeled as concept drift. Thus, a wide variety of drift detection schemes seems to be a suitable choice for detecting leakages. In this work, we explore the potential of model-loss-based and distribution-based drift detection methods to tackle leakage detection. We additionally discuss the issue of temporal dependencies in the data and propose a way to cope with it when applying distribution-based detection. We evaluate different methods systematically for leakages of different sizes and detection times. Additionally, we propose a first drift-detection-based technique for localizing leakages.
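Distribution-based drift detection of this kind can be sketched by comparing a leak-free reference window against incoming windows with a two-sample statistic (a generic illustration with simulated readings, not the paper's setup): a leak that shifts the sensor distribution produces a large gap between the empirical CDFs.

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic (maximum ECDF gap)."""
    values = np.sort(np.concatenate([a, b]))
    cdf_a = np.searchsorted(np.sort(a), values, side='right') / len(a)
    cdf_b = np.searchsorted(np.sort(b), values, side='right') / len(b)
    return np.abs(cdf_a - cdf_b).max()

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 500)   # leak-free reference window
normal = rng.normal(0.0, 1.0, 500)      # new window, no leak
leak = rng.normal(0.8, 1.0, 500)        # shifted distribution after a leak

assert ks_statistic(reference, normal) < 0.15   # same distribution: small gap
assert ks_statistic(reference, leak) > 0.25     # shifted: large gap flags drift
```

In practice a threshold (or permutation test) on this statistic would trigger the leak alarm; temporal dependencies in real sensor streams are exactly why the abstract discusses extra care when applying such distribution-based tests.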

Paper Nr: 65
Title:

Towards Better Motif Detection: Comparative Analysis of Several Symbolic Methods

Authors:

Nour H. Fodil, Damien Olivier and Pierrick Tranouez

Abstract: Motif discovery in time series is a process aimed at finding significant original structures. Methods like SAX rely on dimensionality reduction techniques to reduce computation time. Their inability to capture amplitude variations is one of their limitations. By introducing a new representation named UniformSAX, we aim to improve this aspect. We compare our approach to SAX, 1d-SAX, and fABBA, also introducing grammatical inference. The results show that approaches relying exclusively on representations are more suitable for fixed-length motifs but lose effectiveness for variable-length motifs.
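For reference, classic SAX symbolization (the baseline the paper extends; UniformSAX itself is not shown) z-normalizes the series, averages it over equal segments (PAA), and maps each segment mean to a symbol via Gaussian breakpoints:

```python
import numpy as np

def sax(series, n_segments, breakpoints=(-0.6745, 0.0, 0.6745)):
    """Classic SAX: z-normalize, piecewise aggregate (PAA), then map each
    segment mean to a symbol via Gaussian breakpoints (alphabet size 4)."""
    x = (series - series.mean()) / series.std()
    paa = x.reshape(n_segments, -1).mean(axis=1)   # length must divide evenly
    symbols = np.searchsorted(breakpoints, paa)
    return ''.join('abcd'[s] for s in symbols)

t = np.linspace(0, 2 * np.pi, 64)
word = sax(np.sin(t), n_segments=8)
assert len(word) == 8
assert word[0] != word[2]   # rising vs peak segments get different symbols
```

Because PAA keeps only segment means, amplitude detail inside a segment is lost, which is precisely the limitation the abstract says motivates UniformSAX.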
Download
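SAX, the baseline representation compared above, discretises a z-normalised series into symbols via piecewise aggregate approximation (PAA) and Gaussian breakpoints. A minimal sketch of classic SAX (not the proposed UniformSAX, whose details are in the paper):

```python
import numpy as np

# Gaussian breakpoints for an alphabet of size 4 (quartiles of N(0, 1)).
BREAKPOINTS = [-0.6745, 0.0, 0.6745]

def sax(series, n_segments, breakpoints=BREAKPOINTS):
    """Classic SAX: z-normalise, PAA-reduce, then discretise to symbols."""
    x = np.asarray(series, dtype=float)
    x = (x - x.mean()) / x.std()                    # z-normalisation
    segments = np.array_split(x, n_segments)        # PAA segments
    means = np.array([s.mean() for s in segments])  # segment means
    # Each mean falls into one of len(breakpoints) + 1 bins -> one symbol.
    symbols = np.searchsorted(breakpoints, means)
    return "".join(chr(ord("a") + s) for s in symbols)
```

For an alphabet of size a, the breakpoints are the a-quantiles of the standard normal; the values above are the quartiles for a = 4, so a steadily rising series maps to "abcd".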

Paper Nr: 68
Title:

Information Theoretic Deductions Using Machine Learning with an Application in Sociology

Authors:

Arunselvan Ramaswamy, Yunpeng Ma, Stefan Alfredsson, Fran Collyer and Anna Brunström

Abstract: Conditional entropy is an important concept that naturally arises in fields such as finance, sociology, and intelligent decision making when solving problems involving statistical inferences. Formally speaking, given two random variables X and Y, one is interested in the amount and direction of information flow between X and Y. It helps to draw conclusions about Y while only observing X. Conditional entropy H(Y|X) quantifies the amount of information flow from X to Y. In practice, calculating H(Y|X) exactly is infeasible. Current estimation methods are complex and suffer from estimation bias issues. In this paper, we present a simple Machine Learning based estimation method. Our method can be used to estimate H(Y|X) for discrete X and bi-valued Y. Given X and Y observations, we first construct a natural binary classification training dataset. We then train a supervised learning algorithm on this dataset, and use its prediction accuracy to estimate H(Y|X). We also present a simple condition on the prediction accuracy to determine if there is information flow from X to Y. We support our ideas using formal arguments and through an experiment involving a gender-bias study using a part of the employee database of Karlstad University, Sweden.
Download
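The abstract does not give the exact estimator, but one plausible reading can be sketched as follows: train a simple predictor of the bi-valued Y from the discrete X, then map its error rate through the binary entropy function (by Jensen's inequality this upper-bounds the plug-in conditional entropy). The majority-vote predictor below is an illustrative stand-in for the paper's supervised learner:

```python
import math
from collections import Counter, defaultdict

def binary_entropy(p):
    """h(p) = -p log2 p - (1-p) log2 (1-p), with h(0) = h(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def estimate_cond_entropy(xs, ys):
    """Estimate H(Y|X) for discrete X and binary Y from a classifier's
    error rate: fit a majority-vote predictor of Y given X, then return
    h(error rate), an upper bound on the plug-in H(Y|X)."""
    by_x = defaultdict(Counter)
    for x, y in zip(xs, ys):
        by_x[x][y] += 1
    # A majority-vote predictor errs on every non-majority label per x.
    errors = sum(sum(c.values()) - max(c.values()) for c in by_x.values())
    return binary_entropy(errors / len(xs))
```

When X determines Y exactly the estimate is 0 bits (full information flow); when Y is independent of X it approaches 1 bit (no information flow), matching the paper's use of prediction accuracy as an information-flow test.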

Paper Nr: 77
Title:

Parallel Tree Kernel Computation

Authors:

Souad Taouti, Hadda Cherroun and Djelloul Ziadi

Abstract: Tree kernels are fundamental tools that have been leveraged in many applications, particularly machine-learning-based Natural Language Processing tasks. In this paper, we devise a parallel implementation of the sequential algorithm for computing certain tree kernels of two finite sets of trees (Ouali-Sebti, 2015). Our comparison is narrowed to a sequential implementation of SubTree kernel computation, which mainly reduces to an intersection of weighted tree automata. Our approach exploits the data parallelism inherent in this computation by deploying both the MapReduce paradigm and the Spark framework. One key benefit of our approach is its versatility: it is adaptable to a wide range of substructure tree kernel-based learning methods. To evaluate the efficacy of our parallel approach, we conducted a series of experiments comparing it against the sequential version on a diverse set of synthetic tree language datasets crafted manually for our analysis. The results clearly demonstrate that the proposed parallel algorithm outperforms the sequential one in terms of latency.
Download

Paper Nr: 82
Title:

YOLOv7E: An Attention-Based Improved YOLOv7 for the Detection of Unmanned Aerial Vehicles

Authors:

Dapinder Kaur, Neeraj Battish, Arnav Bhavsar and Shashi Poddar

Abstract: The detection of Unmanned Aerial Vehicles (UAVs) is a special case of object detection, particularly in air-to-air scenarios with complex backgrounds. The proliferating use of UAVs in commercial, non-commercial, and defense applications has raised concerns about their unauthorized usage and, in certain instances, mishandling. Recently developed deep learning-based architectures can detect UAVs very efficiently against different backgrounds. However, detecting UAVs in complex background environments needs further improvement, and we address it here by incorporating an attention mechanism into the YOLOv7 architecture that considers both channel and spatial attention. The proposed model is trained on the DeTFly dataset, and its performance is evaluated in terms of detection rate, precision, and mean average precision. The experimental results demonstrate the effectiveness of the proposed YOLOv7E architecture for detecting UAVs in aerial scenarios.
Download

Paper Nr: 84
Title:

ShapeAug: Occlusion Augmentation for Event Camera Data

Authors:

Katharina Bendig, René Schuster and Didier Stricker

Abstract: Recently, Dynamic Vision Sensors (DVSs) have sparked a lot of interest due to their inherent advantages over conventional RGB cameras: low latency, high dynamic range, and low energy consumption. Nevertheless, processing DVS data with Deep Learning (DL) methods remains a challenge, particularly since the availability of event training data is still limited. This creates a need for event data augmentation techniques to improve accuracy and avoid over-fitting on the training data. Another challenge, especially in real-world automotive applications, is occlusion: one object hinders the view of the object behind it. In this paper, we present a novel event data augmentation approach that addresses this problem by introducing synthetic events for randomly moving objects in a scene. We test our method on multiple DVS classification datasets, obtaining a relative improvement of up to 6.5% in top-1 accuracy. Moreover, we apply our augmentation technique to the real-world Gen1 Automotive Event Dataset for object detection, where we especially improve the detection of pedestrians by up to 5%.
Download

Paper Nr: 85
Title:

Efficient Solver Scheduling and Selection for Satisfiability Modulo Theories (SMT) Problems

Authors:

David Mojžíšek and Jan Hůla

Abstract: This paper introduces innovative concepts for improving the process of selecting solvers from a portfolio to tackle Satisfiability Modulo Theories (SMT) problems. We propose a novel solver scheduling approach that significantly enhances solving performance, measured by the PAR-2 metric, on selected benchmarks. Our investigation reveals that, in certain cases, scheduling based on a crude statistical analysis of training data can perform just as well, if not better, than a machine learning predictor. Additionally, we present a dynamic scheduling approach that adapts in real-time, taking into account the changing likelihood of solver success. These findings shed light on the nuanced nature of solver selection and scheduling, providing insights into situations where data-driven methods may not offer clear advantages.
Download
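PAR-2, the metric used above to measure solving performance, is the penalised average runtime in which each unsolved instance counts as twice the timeout; a minimal sketch:

```python
def par2(runtimes, timeout):
    """PAR-2 score: average runtime where each unsolved instance
    (runtime is None or exceeds the timeout) is penalised as 2 * timeout."""
    scores = [t if t is not None and t <= timeout else 2 * timeout
              for t in runtimes]
    return sum(scores) / len(scores)
```

For example, one instance solved in 10 s and one timeout at a 100 s limit give (10 + 200) / 2 = 105. A solver schedule is then chosen to minimise this score over a benchmark.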

Paper Nr: 102
Title:

Visualization of the Basis for Decisions by Selecting Layers Based on Model's Predictions Using the Difference Between Two Networks

Authors:

Takahiro Sannomiya and Kazuhiro Hotta

Abstract: Grad-CAM and Score-CAM are methods for improving the interpretability of CNNs, whose internal behaviour is opaque. These methods do not select which layer to use; they simply visualize the basis of the decision from the final layer. We question whether this is really appropriate, and whether important information for the prediction might be hidden in layers other than the final one. In the proposed method, layers are selected based on the prediction probability of the model, and the basis of the decision is visualized. In addition, by taking the difference between a model trained slightly further to increase the confidence of its output class and the model before this training, the proposed method emphasizes the parts that contributed to the prediction and provides a better-quality basis for the decision. Experimental results confirm that the proposed method outperforms existing methods on two evaluation metrics.
Download

Paper Nr: 111
Title:

Applying the Neural Bellman-Ford Model to the Single Source Shortest Path Problem

Authors:

Spyridon Drakakis and Constantine Kotropoulos

Abstract: The Single-Source Shortest Path problem aims to compute the shortest paths from a source node to all other nodes of a graph. It is solved by deterministic algorithms such as Bellman-Ford, Dijkstra’s, and A*. This paper addresses the shortest path problem using a Message-Passing Neural Network model, the Neural Bellman-Ford network, which is modified to conduct Predecessor Prediction. It provides a roadmap for developing models that calculate true optimal paths based on user preferences. Experimental results on real-world maps produced by the Open Street Map package show the ability of a Graph Neural Network to imitate the Bellman-Ford algorithm and solve the Single-Source Shortest Path problem.
Download
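The deterministic algorithm the network is trained to imitate can be sketched as follows, including the predecessor array that Predecessor Prediction targets (a textbook sketch, not the neural model):

```python
def bellman_ford(n, edges, source):
    """Classic Bellman-Ford: shortest distances from `source` plus the
    predecessor of each node on its shortest path.  `edges` is a list of
    (u, v, weight) triples over nodes 0..n-1."""
    INF = float("inf")
    dist = [INF] * n
    pred = [None] * n
    dist[source] = 0.0
    for _ in range(n - 1):                 # at most n-1 relaxation rounds
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                pred[v] = u
                changed = True
        if not changed:                    # early exit when converged
            break
    return dist, pred
```

Following `pred` back from any node to the source reconstructs the shortest path, which is exactly the target the modified network learns to predict.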

Paper Nr: 120
Title:

Surface Extraction in Coherence Scanning Interferometry by Gauss-Markov Monte-Carlo Method and Teager-Kaiser Operator

Authors:

Fabien Salzenstein and Abdel-Ouahab Boudraa

Abstract: This work deals with the problem of surface extraction using a combination of Teager-Kaiser operators and a Gauss-Markov process in the context of coherence scanning (or white-light scanning, WLSI) interferometry. Our approach defines a Markov sequence along multiple surface profiles, extracting their characteristics by means of parameters describing the fringe signals along the optical axis, whereas most studies in the literature are restricted to local extraction of signals in one-dimensional mode. The interest of the proposed strategy is thus to classify the different surfaces present in a material, in particular the information relating to their roughness, by exploiting the statistical dependence between neighboring points, the noise being assumed white Gaussian. The effectiveness of our unsupervised method is illustrated on both synthetic and real images.
Download

Paper Nr: 126
Title:

Study of an Expansion Method Based on an Image-Specific Classifier and Multi-Features for Weakly Supervised Semantic Segmentation

Authors:

Zhengyang Lyu, Pierre Beauseroy and Alexandre Baussard

Abstract: In this paper, we propose a study of an expansion method based on an image-specific classifier and multi-features for Weakly Supervised Semantic Segmentation (WSSS) with only image-level labels. Recent WSSS methods focus mainly on enhancing the pseudo masks to improve segmentation performance, either by obtaining improved Class Activation Maps (CAM) or by applying post-processing that combines expansion and refinement. Most of these methods either fail to consider the balance between resolution and semantics in the features used, or operate globally on the whole dataset without exploiting potential additional improvements based on the specific content of each image. Previously, we proposed an image-specific expansion method using multi-features to alleviate these limitations. This new study aims first at determining the upper performance limit of the proposed method using the ground-truth masks, and second at analysing this performance limit in relation to the chosen features. Experiments show that our expansion method can achieve promising results when used with the ground truth (upper performance) and with features that strike a balance between semantics and resolution.
Download

Paper Nr: 127
Title:

Towards Self-Adaptive Resilient Swarms Using Multi-Agent Reinforcement Learning

Authors:

Rafael Pina, Varuna De Silva and Corentin Artaud

Abstract: Cooperative swarms of intelligent agents have been used recently in several different fields of application. The ability to have several units working together to accomplish a task can drastically extend the range of challenges that can be solved. However, these swarms are composed of machines that are susceptible to suffering external attacks or even internal failures. In cases where some of the elements of the swarm fail, the others must be capable of adjusting to the malfunctions of the teammates and still achieve the objectives. In this paper, we investigate the impact of possible malfunctions in swarms of cooperative agents through the use of Multi-Agent Reinforcement Learning (MARL). More specifically, we investigate how MARL agents react when one or more teammates start acting abnormally during their training and how that transfers to testing. Our results show that, while common MARL methods might be able to adjust to simple flaws, they do not adapt well when these become more complex. In this sense, we show how independent learners can be used as a potential direction of future research to adapt to malfunctions in swarms using MARL. With this work, we hope to motivate further research to create more robust intelligent swarms using MARL.
Download

Paper Nr: 132
Title:

Classification Performance Boosting for Interpolation Kernel Machines by Training Set Pruning Using Genetic Algorithm

Authors:

Jiaqi Zhang and Xiaoyi Jiang

Abstract: Interpolation kernel machines belong to the class of interpolating classifiers that interpolate all the training data and thus have zero training error. Recent research shows that they nevertheless generalize well. Interpolation kernel machines have been demonstrated to be a good alternative to support vector machines and should thus generally be considered in practice. In this work we study training set pruning as a means of performance boosting. Our work is motivated by different perspectives on the curse of dimensionality. We design a genetic algorithm to perform the training set pruning. The experimental results clearly demonstrate its potential for boosting classification performance.
Download
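An interpolation kernel machine fits the training labels exactly by solving a kernel linear system, which is what makes training-set pruning attractive: each retained point adds one row and column to that system. A minimal sketch of the interpolating fit with a Gaussian kernel (the genetic algorithm for pruning is the paper's contribution and is omitted):

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian RBF kernel matrix between row-vector sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def fit_interpolating(X, y, gamma=1.0):
    """Interpolation kernel machine: solve K alpha = y exactly, so every
    training point is reproduced (zero training error)."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K, y)

def predict(X_train, alpha, X_new, gamma=1.0):
    """Evaluate the kernel expansion at new points."""
    return rbf_kernel(X_new, X_train, gamma) @ alpha
```

Since the Gaussian kernel matrix is positive definite for distinct points, the system always has a unique solution; pruning then searches over which training points to keep.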

Paper Nr: 133
Title:

A Mutual Information Based Discretization-Selection Technique

Authors:

Artur J. Ferreira and Mário T. Figueiredo

Abstract: In machine learning (ML) and data mining (DM), one often has to resort to data pre-processing techniques to achieve adequate data representations. Among these techniques we find feature discretization (FD) and feature selection (FS), with many available methods for each. The use of FD and FS techniques improves the data representation for ML and DM tasks. However, these techniques are usually applied independently: one may use a FD technique but not a FS technique, or vice versa, and even applying FD and FS techniques in sequence may not produce the most adequate results. In this paper, we propose a supervised discretization-selection technique; the discretization step works incrementally and keeps information about the features and the number of bits allocated per feature. We then apply a selection criterion based on the discretization bins, yielding a discretized and dimensionality-reduced dataset. We evaluate our technique on different types of data; in most cases the discretized and reduced version of the data is the best-suited version, achieving better classification performance than the original features.
Download
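The generic building blocks of such a pipeline can be illustrated as follows: equal-width discretisation of a numeric feature followed by a plug-in mutual-information score against the labels, which could then rank features for selection. The paper's incremental bit-allocation scheme is specific to it and is not reproduced here:

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(c / n * math.log2((c / n) / (px[x] / n * py[y] / n))
               for (x, y), c in pxy.items())

def discretize(values, n_bins):
    """Equal-width binning of a numeric feature into n_bins integer codes."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0       # guard against constant features
    return [min(int((v - lo) / width), n_bins - 1) for v in values]
```

A feature whose discretized codes carry high mutual information with the class labels is a candidate to keep; an uninformative one scores near zero bits.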

Paper Nr: 137
Title:

Instance Selection Framework for Alzheimer’s Disease Classification Using Multiple Regions of Interest and Atlas Integration

Authors:

Juan A. Castro-Silva, Maria N. Moreno-Garcia, Lorena Guachi-Guachi and Diego H. Peluffo-Ordoñez

Abstract: Optimal selection of informative instances from a dataset is critical for constructing accurate predictive models. As databases expand, leveraging instance selection techniques becomes imperative to condense data into a more manageable size. This research unveils a novel framework designed to strategically identify and choose the most informative 2D brain image slices for Alzheimer’s disease classification. Such a framework integrates annotations from multiple regions of interest across multiple atlases. The proposed framework consists of six core components: 1) Atlas merging for ROI annotation and hemisphere separation. 2) Image preprocessing to extract informative slices. 3) Dataset construction to prevent data leakage, select subjects, and split data. 4) Data generation for memory-efficient batches. 5) Model construction for diverse classification training and testing. 6) Weighted ensemble for combining predictions from multiple models with a single learning algorithm. Our instance selection framework was applied to construct Transformer-based classification models, demonstrating an overall accuracy of approximately 98.33% in distinguishing between Cognitively Normal and Alzheimer’s cases at the subject level. It exhibited enhancements of 3.68%, 3.01%, and 3.62% for the sagittal, coronal, and axial planes, respectively, in comparison with the percentile technique.
Download

Paper Nr: 150
Title:

Mitigating Outlier Activations in Low-Precision Fine-Tuning of Language Models

Authors:

Alireza Ghaffari, Justin Yu, Mahsa G. Nejad, Masoud Asgharian, Boxing Chen and Vahid P. Nia

Abstract: Low-precision fine-tuning of language models has gained prominence as a cost-effective and energy-efficient approach to deploying large-scale models in various applications. However, this approach is susceptible to outlier values in the activations. Outliers can degrade the performance of low-precision fine-tuning because they inflate the scaling factor, making smaller values harder to represent. This paper investigates techniques for mitigating outlier activations in low-precision integer fine-tuning of language models. Our proposed novel approach enables us to represent the outlier activation values as 8-bit integers instead of floating-point (FP16) values. The benefit of using integers for the outlier values is that it enables operator tiling, which avoids 16-bit integer matrix multiplication and thus addresses the problem effectively. We provide theoretical analysis and supporting experiments demonstrating the effectiveness of our approach in improving the robustness and performance of low-precision fine-tuned language models.
Download
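The failure mode described above is easy to reproduce: in symmetric integer quantisation the scaling factor is set by the largest absolute activation, so a single outlier coarsens the grid for all smaller values. An illustrative sketch with made-up activation values (not the authors' tiling scheme):

```python
import numpy as np

def quantize_int8(x):
    """Symmetric 8-bit quantisation: the scale is set by the largest
    absolute value, so one outlier coarsens the grid for every value."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

acts = np.array([0.01, -0.02, 0.03, 0.015])       # typical small activations
q, s = quantize_int8(acts)
ok_err = np.abs(q * s - acts).max()               # fine grid, tiny error

acts_out = np.append(acts, 60.0)                  # one outlier activation
q2, s2 = quantize_int8(acts_out)
bad_err = np.abs(q2 * s2 - acts_out).max()        # coarse grid: the small
                                                  # values all round to zero
```

With the outlier present, the scale grows by orders of magnitude and every small activation quantises to 0, which is exactly why keeping outliers in a separate integer representation helps.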

Paper Nr: 14
Title:

Achieving RGB-D Level Segmentation Performance from a Single ToF Camera

Authors:

Pranav Sharma, Jigyasa S. Katrolia, Jason Rambach, Bruno Mirbach and Didier Stricker

Abstract: Depth is a very important modality in computer vision, typically used as complementary information to RGB, provided by RGB-D cameras. In this work, we show that it is possible to obtain the same level of accuracy as RGB-D cameras on a semantic segmentation task using infrared (IR) and depth images from a single Time-of-Flight (ToF) camera. In order to fuse the IR and depth modalities of the ToF camera, we introduce a method utilizing depth-specific convolutions in a multi-task learning framework. In our evaluation on an in-car segmentation dataset, we demonstrate the competitiveness of our method against the more costly RGB-D approaches.
Download

Paper Nr: 16
Title:

Noise Simulation for the Improvement of Training Deep Neural Network for Printer-Proof Steganography

Authors:

Telmo Cunha, Luiz Schirmer, João Marcos and Nuno Gonçalves

Abstract: In the modern era, images have emerged as powerful tools for concealing information, giving rise to methods such as watermarking and steganography, with end-to-end steganography solutions emerging in recent years. However, these new methods exhibit issues with recovering the hidden message and with decreased image quality. This paper investigates the efficacy of noise simulation and deep learning methods for improving the resistance of steganography to printing. We develop an end-to-end printer-proof steganography solution, with particular focus on a noise simulation module capable of overcoming the distortions introduced by the print-scan transmission medium. Several approaches are employed, from combining the various sources of noise present in the physical environment during printing and capture by image sensors, to introducing data augmentation techniques and self-supervised learning to improve and stabilize the resistance of the network. Rigorous experimentation shows a significant increase in the robustness of the network when noise combinations are added, while the network's performance is maintained. These experiments conclusively demonstrate that noise simulation provides a robust and efficient way to improve printer-proof steganography.
Download

Paper Nr: 27
Title:

A Branch-and-Bound Approach to Efficient Classification and Retrieval of Documents

Authors:

Kotaro Ii, Hiroto Saigo and Yasuo Tabei

Abstract: Text classification and retrieval are crucial tasks in natural language processing. In this paper, we present novel techniques for these tasks that leverage the invariance of the evaluation results to the order of the features. Building on the assumption that text retrieval or classification models have already been constructed from the training documents, we propose efficient approaches that restrict the search space spanned by the test documents. Our approach encompasses two key contributions: the first introduces an efficient method for traversing a search tree, while the second develops novel pruning conditions. Through computational experiments on real-world datasets, we consistently demonstrate that the proposed approach outperforms the baseline method in various scenarios, showcasing its superior speed and efficiency.
Download

Paper Nr: 41
Title:

Deep Learning, Feature Selection and Model Bias with Home Mortgage Loan Classification

Authors:

Hope Hodges, J. A. Connell, Carolyn Garrity and James Pope

Abstract: Analysis of home mortgage applications is critical for financial decision-making in commercial and government lending organisations. The Home Mortgage Disclosure Act (HMDA) requires financial organisations to provide data on loan applications, and accordingly the Consumer Financial Protection Bureau (CFPB) publishes loan application data by year. This data can be used to build regression and classification models. However, the amount of data is too large to train on with modest computational resources; to address this, we used reservoir sampling to take suitable subsets for processing. A second issue is that the features are limited to the original 78 features in the HMDA records, while many other data sources and associated features might improve model accuracy. We augment the HMDA data with ten economic indicator features from an external data source and find that these additional features do not improve the model’s accuracy. We designed and compared several classical and recent classification approaches to predict the loan approval decision. The Decision Tree, XGBoost, Random Forest, and Support Vector Machine classifiers achieve between 82-85% accuracy, while Naive Bayes yields the lowest accuracy at 79%. A Deep Neural Network classifier had the best classification performance, with an F1 score of almost 89% on the HMDA data. We performed feature selection to determine which features are most important for loan classification; the obvious candidates, loan amount and applicant income, were indeed important. Interestingly, when race and gender were left in the feature set, they were unfortunately selected as important features by the machine learning methods. This highlights the need for diligence in financial systems to ensure the machine is not biased.
Download
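Reservoir sampling, used above to subsample the HMDA records, draws a uniform sample of k items from a stream of unknown length in one pass using O(k) memory (Vitter's Algorithm R); a standard sketch:

```python
import random

def reservoir_sample(stream, k, seed=None):
    """Algorithm R: keep a uniform random sample of k items from a
    stream of unknown length using O(k) memory."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)         # fill the reservoir first
        else:
            j = rng.randint(0, i)          # item survives with prob k/(i+1)
            if j < k:
                reservoir[j] = item
    return reservoir
```

This makes it possible to draw an unbiased training subset from the yearly CFPB files without ever loading a full file into memory.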

Paper Nr: 52
Title:

Detecting Overgrown Plant Species Occluding Other Species in Complex Vegetation in Agricultural Fields Based on Temporal Changes in RGB Images and Deep Learning

Authors:

Haruka Ide, Hiroyuki Ogata, Takuya Otani, Atsuo Takanishi and Jun Ohya

Abstract: Synecoculture cultivates useful plants while expanding biodiversity in farmland, but the complexity of its management requires new automated management systems. In particular, pruning overgrown dominant species that reduce diversity is an important task. This paper proposes a method for detecting overgrown plant species that occlude other species, using a camera fixed in a Synecoculture farm. The camera acquires time-series images once a week, starting soon after seeding. Deep learning-based semantic segmentation is then applied to each weekly image. A plant species map, consisting of one layer per segmented species, is built by storing at each pixel the number of weeks that species has been observed there. Finally, we combine the semantic segmentation results with the earlier plant species map to detect occluding overgrown species and occluded species. In experiments with six sets of time-series images acquired over six weeks, (1) UNet-Resnet101 was the most accurate model for semantic segmentation, (2) using both segmentation and the plant species map achieved significantly higher segmentation accuracy than segmentation without the map, and (3) overgrown occluding species and occluded species were successfully detected.
Download

Paper Nr: 55
Title:

Face Blending Data Augmentation for Enhancing Deep Classification

Authors:

Emna Ghorbel, Ghada Maddouri and Faouzi Ghorbel

Abstract: Facial image classification plays a vital role in computer vision applications, particularly face recognition. Convolutional Neural Networks have excelled in this domain; however, their performance declines when dealing with small facial datasets. In that context, data augmentation methods have been proposed. In line with this, we introduce the Face Blending data augmentation method, which increases intra-class variability while preserving image semantics. By interpolating faces, we generate non-linear deformations, resulting in in-between images that maintain the global aspect of the originals. Results show that Face Blending significantly enhances facial classification. Comparisons with the Mix-up and Random Erasing techniques reveal improved accuracy, precision, recall, and F1 score, particularly with limited datasets. This method holds promise for realistic applications, contributing to more reliable and accurate facial classification systems with limited data.
Download

Paper Nr: 58
Title:

Determination of Factors of Interest in Bone Models Based on Ultrasonic Data

Authors:

Marija Chuchalina, Aleksandrs Sisojevs and Alexey Tatarinov

Abstract: Osteoporosis is characterized by increased bone fragility due to a decrease in the thickness of the cortical layer (CTh) and the development of internal porosity within it. Assessing bone models that simulate the state of osteoporosis is difficult because of their complex, multi-layered structure. In the present work, we investigated the possibility of using machine learning to determine internal porosity from ultrasonic data obtained by scanning bone models. The bone models were represented as sets of PMMA plates with CTh gradually varying from 2 to 6 mm. A stepwise progression of porosity from 0 to 100% of CTh was set by increasing the thickness of the porous layer (PTh) in steps of 1 mm. The evaluation method was based on supervised multi-class classification of the raw ultrasonic signals and the magnitude of their DFT spectrum, with PTh used for labeling. The ultrasonic data were split into training and testing datasets while preserving the percentage of samples per class. The experimental results demonstrated the potential effectiveness of PTh classification, while optimization of the datasets and additional signal processing may further improve the results.
Download

Paper Nr: 62
Title:

Cardiac Arrhythmia Detection in Electrocardiogram Signals with CNN-LSTM

Authors:

Igor L. Souza and Daniel O. Dantas

Abstract: Sudden cardiac death and arrhythmia account for a large percentage of all deaths worldwide. Electrocardiography is essential in the clinical evaluation of patients with heart disease. Through the electrocardiogram (ECG), medical doctors can identify whether the cardiac muscle dysfunctions presented by a patient have an inflammatory origin and diagnose early serious diseases that primarily affect the blood vessels and the brain. The basis of arrhythmia diagnosis is the identification of normal and abnormal heartbeats and their classification into different diagnoses based on ECG morphology. Traditionally, ECG signals are classified manually, which requires experience and great skill while being time-consuming and prone to error. Machine learning algorithms have therefore been widely adopted for their ability to perform complex data analysis. The objective of this study is to develop a classifier capable of classifying a patient’s ECG signals for arrhythmia detection in clinical patients. We developed a convolutional neural network (CNN) with long short-term memory (LSTM) to identify five classes of heartbeats in ECG signals. Our experiment was conducted with ECG signals from the publicly available MIT-BIH database, with the number of instances evened out across the five heartbeat classes. The proposed model achieved an accuracy of 98.12% and an F1-score of 99.72% in the classification of ventricular ectopic beats (V), and an accuracy of 97.39% and an F1-score of 95.25% in the classification of supraventricular ectopic beats (S).
Download

Paper Nr: 81
Title:

Fuel Classification in Electronic Tax Documents

Authors:

Yúri F. Dantas de Sant’Anna, Mariana Lira de Farias, Methanias Colaço Júnior, Daniel O. Dantas and Max C. Rodrigues Junior

Abstract: The Tax on the Circulation of Goods and Services (Imposto sobre Circulação de Mercadorias e Serviços, ICMS), a responsibility of the federative units, is the main Brazilian tax collection resource. One way to collect this tax is through a product’s weighted average price to the end consumer (preço médio ponderado ao consumidor final, PMPF). The PMPF is the only resource for charging state fees in the fuel segment, so if improperly calculated, it can lead to losses both in the collection of public funds and in the evolution of prices practiced by merchants. The objective of this work is to perform a comparative analysis of classification algorithms used to calculate the PMPF of fuels in the state of Sergipe and to select the most appropriate technique. The resulting system circumvents deficiencies of the previously applied simple random sampling methodology. The naive Bayes algorithm was considered the most effective approach due to its high accuracy and feasibility of application in a real-life scenario.
Download

Paper Nr: 103
Title:

Improvement of TransUNet Using Word Patches Created from Different Dataset

Authors:

Ayato Takama, Satoshi Kamiya and Kazuhiro Hotta

Abstract: UNet is widely used in medical image segmentation but cannot sufficiently extract global information. TransUNet, on the other hand, achieves better accuracy than conventional UNet by combining a CNN, which is good at local features, with a Transformer, which is good at global features. In general, TransUNet requires a large amount of training data, but training images are limited in the medical domain. In addition, the encoder of TransUNet uses a model pre-trained on ImageNet, which consists of natural images, and the gap between medical and natural images is a problem. In this paper, we propose a method to learn Word Patches from other medical datasets and effectively utilize them for training TransUNet. Experiments on the ACDC dataset, containing 3D MRI images with 4 classes, and the Synapse multi-organ segmentation dataset, containing CT images with 9 classes, show that the proposed method improves accuracy even with little training data, and that the performance of TransUNet is greatly improved by using Word Patches created from different medical datasets.
Download

Paper Nr: 130
Title:

Swap-Deep Neural Network: Incremental Inference and Learning for Embedded Systems

Authors:

Taihei Asai and Koichiro Yamauchi

Abstract: We propose a new architecture called the “swap-deep neural network” that enables the learning and inference of large-scale artificial neural networks on edge devices with low power consumption and computational complexity. The proposed method is based on finding and integrating subnetworks from randomly initialized networks for each incremental learning phase. We demonstrate that our method achieves performance equivalent to that of conventional deep neural networks on a variety of classification tasks.
Download

Paper Nr: 135
Title:

An Improved VGG16 Model Based on Complex Invariant Descriptors for Medical Images Classification

Authors:

Mohamed A. Mezghich, Dorsaf Hmida, Taha M. Nahdi and Faouzi Ghorbel

Abstract: In this paper, we present an improved VGG16 deep learning model based on an invariant and complete set of descriptors constructed by a linear combination of complex moments. First, the invariant features are studied to highlight their stability and completeness properties over rigid transformations, noise and non-rigid transformations. Then, our proposed method for injecting this family of descriptors into the well-known VGG16 deep learning model is presented. Experimental results are satisfactory and the model accuracy is improved.
Download

Paper Nr: 141
Title:

On Function of the Cortical Column and Its Significance for Machine Learning

Authors:

Alexei Mikhailov and Mikhail Karavay

Abstract: Columnar organization of the neocortex is widely adopted to explain the cortical processing of information (Mountcastle, V., 1957, Mountcastle, V., 1997, DeFelipe, J., 2012). Neurons within a minicolumn (feature column) simultaneously respond to a specific feature, whereas neurons within a macrocolumn respond to all values of receptive field parameters (Horton, J., Adams, D., 2005). Hypotheses for cortical column function envisage a massively repeated “canonical” circuit or a spatiotemporal filter (Bastos, A. et al., 2012). However, nearly a century after the neuroanatomical organization of the cortex was first defined, there is still no consensus about what the function of the cortical column is (Marcus, G., Marblestone, A., Dean, T., 2014). That is, why are cortical pyramidal neurons arranged into columns? Here we propose a function of the neocortical column using both neurophysiological and computational evidence. This conjecture of the column’s function helped find a way of evaluating the memory capacity of a cortical region in terms of patterns, as a solution to a suggested connectivity equation. It also allowed us to introduce a connectivity-based machine learning model that accounts for pattern recognition accuracy and noise tolerance, and shows how to build practically instant-learning pattern recognition systems.
Download

Paper Nr: 142
Title:

Evaluation of K-Means Time Series Clustering Based on Z-Normalization and NP-Free

Authors:

Ming-Chang Lee, Jia-Chun Lin and Volker Stolz

Abstract: Despite the widespread use of k-means time series clustering in various domains, there exists a gap in the literature regarding its comprehensive evaluation with different time series preprocessing approaches. This paper seeks to fill this gap by conducting a thorough performance evaluation of k-means time series clustering on real-world open-source time series datasets. The evaluation focuses on two distinct techniques: z-normalization and NP-Free. The former is one of the most commonly used approaches for normalizing time series, and the latter is a real-time time series representation approach. The primary objective of this paper is to assess the impact of these two techniques on k-means time series clustering in terms of its clustering quality. The experiments employ the silhouette score, a well-established metric for evaluating the quality of clusters in a dataset. By systematically investigating the performance of k-means time series clustering with these two preprocessing techniques, this paper addresses the current gap in k-means time series clustering evaluation and contributes valuable insights to the development of time series clustering.
Download

Paper Nr: 156
Title:

Neuromorphic Encoding / Reconstruction of Images Represented by Poisson Counts

Authors:

V. E. Antsiperov

Abstract: The paper discusses one of the possible neuromorphic methods for processing relatively large volumes of streaming data. The method is mainly motivated by the known mechanisms of sensory perception in living systems, in particular, methods of visual perception. In this regard, the main provisions of the method are discussed in the context of problems of encoding/recovering images on the periphery of the visual system. The proposed method is focused on representing input data in the form of a stream of discrete events (counts), like the firing events of retinal neurons. For these purposes, a special representation of data streams is used in the form of controlled-size samples of counts (sampling representations). Based on the specifics of the sampling representation, the generative data model is naturally formalized as a system of components distributed over the field of view. These components are equipped with a “neuromorphic” structure, which models a system of receptive fields (RFs), embodying universal principles (including lateral inhibition) of the neural network of the brain. The mechanism of lateral inhibition is implemented in the model in the form of an antagonistic RF centre/surround structure. Issues of image decoding are considered in the context of restoring spatial contrasts, which partly emulates the work of the so-called simple/complex cells of the primary visual cortex. It is shown that the model of coupled ON-OFF decoding allows for the restoration of sharp image details by emphasizing edges.
Download

Area 2 - Applications

Full Papers
Paper Nr: 39
Title:

Fast Filtering for Similarity Search Using Conjunctive Enumeration of Sketches in Order of Hamming Distance

Authors:

Naoya Higuchi, Yasunobu Imamura, Vladimir Mic, Takeshi Shinohara, Kouichi Hirata and Tetsuji Kuboyama

Abstract: Sketches are compact bit-string representations of points, often employed for speeding up searches through the effects of dimensionality reduction and data compression. In this paper, we propose a novel sketch enumeration method and demonstrate its ability to realize fast filtering for approximate nearest neighbor search in metric spaces. Whereas the Hamming distance between the query’s sketch and the sketches of points to be searched has traditionally been used for sketch prioritization, recent research has introduced asymmetric distances, enabling higher recall rates with fewer candidates. Additionally, sketch enumeration methods have been proposed that speed up the filtering by selecting high-priority solution candidates based on the priority of a sketch with respect to the given query, without the need for direct sketch comparisons. Our primary goal in this paper is to further accelerate sketch enumeration through parallel processing. While Hamming distance-based enumeration can be parallelized relatively easily, achieving high recall rates requires a large number of candidates, and speeding up the filtering alone is insufficient for overall similarity search acceleration. Therefore, we introduce the conjunctive enumeration method, which concatenates two Hamming distance-based enumerations to approximate asymmetric distance-based enumeration. We then validate the effectiveness of the proposed method through experiments using large-scale public datasets. Our approach offers a significant acceleration effect, thereby enhancing the efficiency of similarity search operations.
Download

Paper Nr: 51
Title:

Crossing Domain Borders with Federated Few-Shot Adaptation

Authors:

Manuel Röder, Maximilian Münch, Christoph Raab and Frank-Michael Schleif

Abstract: Federated Learning has gained significant attention as a data protecting paradigm for decentralized, client-side learning in the era of interconnected, sensor-equipped edge devices. However, practical applications of Federated Learning face three major challenges: First, the expensive data labeling process required for target adaptation involves human participation. Second, the data collection process on client devices suffers from covariate shift due to environmental impact on attached sensors, leading to a discrepancy between source and target samples. Third, in resource-limited environments, both continuous or regular model updates are often infeasible due to limited data transmission capabilities or technical constraints on channel availability and energy efficiency. To address these challenges, we propose FedAcross, an efficient and scalable Federated Learning framework designed specifically for real-world client adaptation in industrial environments. It is based on a pre-trained source model that includes a deep backbone, an adaptation module, and a classifier running on a powerful server. By freezing the backbone and the classifier during client adaptation on resource-constrained devices, we enable the domain adaptive linear layer to solely handle target domain adaptation and minimize the overall computational overhead. Our extensive experimental results validate the effectiveness of FedAcross in achieving competitive adaptation on low-end client devices with limited target samples, effectively addressing the challenge of domain shift. Our framework effectively handles sporadic model updates within resource-limited environments, ensuring practical and seamless deployment.
Download

Paper Nr: 54
Title:

CLIP-Assisted Video Anomaly Detection

Authors:

Meng Dong

Abstract: As the main application of intelligent monitoring, video anomaly detection in surveillance has been well developed but remains challenging. The various types of anomalies call for dedicated detectors in general domains, whereas in specific domains users may need to customize normal and abnormal situations through descriptions, such as “pedestrian no entry” or “people fighting”. Moreover, anomalies in unseen videos are usually excluded from the training datasets. Conventional techniques based on computer vision or machine learning are typically data-intensive or limited to specific domains. Aiming to develop a generalized framework for intelligent monitoring, we introduce generative anomaly descriptions to complement the visual branch and open up the possibility of adapting to specific application domains. In particular, we adopt contrastive language-image pre-training (CLIP) with generative anomaly descriptions as our general anomaly detector. Unlike state-of-the-art methods, category-level anomaly descriptions instead of simple category names are adopted as language prompts in this work. A temporal module is developed on top of CLIP to capture temporal correlations of anomaly events. Beyond frame-level anomaly detection, we also support the detection of object-centric anomalies for some specific domains. Extensive experimental results show that the novel framework offers state-of-the-art performance on the UCF-Crime and ShanghaiTech datasets.
Download

Paper Nr: 67
Title:

CaRaCTO: Robust Camera-Radar Extrinsic Calibration with Triple Constraint Optimization

Authors:

Mahdi Chamseddine, Jason Rambach and Didier Stricker

Abstract: The use of cameras and radar sensors is well established in various automation and surveillance tasks. The multimodal nature of the data captured by these two sensors allows for a myriad of applications where one covers for the shortcomings of the other. While cameras can capture high-resolution color data, radar can capture the depth and velocity of targets. Calibration is a necessary step before applying fusion algorithms to the data. In this work, a robust extrinsic calibration algorithm is developed for camera-radar setups. The standard geometric constraints used in calibration are extended with elevation constraints to improve the optimization. Furthermore, the method does not rely on any external measurements beyond the camera and radar data, and does not require complex targets, unlike existing work. The calibration is done in 3D, thus allowing for the estimation of the elevation information that is lost when using 2D radar. The results are evaluated against a sub-millimeter ground truth system and are superior to those of existing, more complex algorithms. Code: https://github.com/mahdichamseddine/CaRaCTO.
Download

Paper Nr: 80
Title:

AirEyeSeg: Teacher-Student Insights into Robust Fisheye UAV Detection

Authors:

Zhenyue Gu, Benedikt Kolbeinsson and Krystian Mikolajczyk

Abstract: Accurate obstacle detection in Unmanned Aerial Vehicles (UAVs) using fisheye lenses is challenged by image distortions. While advanced algorithms like Fast Region-Based Convolutional Neural Network (Fast R-CNN), Spatial Pyramid Pooling-Net (SPP-Net), and You Only Look Once (YOLO) are proficient with standard images, they underperform on fisheye images due to severe distortions. We introduce a real-time fisheye object detection system for UAVs, underpinned by specialized fisheye datasets. Our contributions encompass the creation of UAV-centric fisheye datasets, a distillation-based (also termed Teacher-Student) training method, and AirEyeSeg, a pioneering fisheye detector. AirEyeSeg achieved a Mask(mAP50) of 88.6% for cars on the combined VisDrone and UAVid datasets and 84.5% for people on the SEE dataset using the Box(P) metric. Our results demonstrate AirEyeSeg’s superiority over traditional detectors and validate our Teacher-Student training approach, setting a benchmark in fisheye-lensed UAV object detection. The code is available at https://github.com/Zane-Gu/AirEyeSeg.
Download

Paper Nr: 83
Title:

Modeling Batch Tasks Using Recurrent Neural Networks in Co-Located Alibaba Workloads

Authors:

Hifza Khalid, Arunselvan Ramaswamy, Simone Ferlin and Alva Couch

Abstract: Accurate predictive models for cloud workloads can help improve task scheduling, capacity planning and preemptive resource conflict resolution, especially in the setting of co-located jobs. Alibaba, one of the leading cloud providers, co-locates transient batch tasks and high-priority latency-sensitive online jobs on the same cluster. In this paper, we consider the problem of using a dataset publicly released by Alibaba to model the batch tasks, which are often overlooked compared to online services. The dataset contains the arrivals and resource requirements (CPU, memory, etc.) of both batch and online tasks. Our trained model predicts, with high accuracy, the number of batch tasks that arrive in any 30-minute window, their associated CPU and memory requirements, and their lifetimes. It captures over 94% of arrivals in each 30-minute window within a 95% prediction interval. The F1 scores for the most frequent CPU classes exceed 75%, and our memory and lifetime predictions incur less than 1% test data loss. The prediction accuracy of the lifetime of a batch task drops when the model uses both CPU and memory information, as opposed to only using memory information.
Download

Paper Nr: 108
Title:

Enhanced Segmentation of Deformed Waste Objects in Cluttered Environments

Authors:

Muhammad Ali, Omar Alsuwaidi and Salman Khan

Abstract: Recycling is a crucial process for mitigating environmental pollution; however, due to inefficiencies in waste sorting, a significant portion of recyclable waste is being underutilized. The complexity and disorganization of waste streams make it challenging to efficiently separate recyclable materials. Identifying recyclable items in cluttered environments requires the recognition of highly deformable objects by computer vision systems. To this end, we propose a computer vision-based approach capable of efficiently separating recyclable materials from waste, even in disorganized settings, by recognizing highly deformable objects. We extend an existing large-scale CNN-based model, InternImage, by introducing multi-scale networks and combining cross-entropy and dice loss for improved segmentation. Our focus is on enhancing segmentation on the ZeroWaste-f dataset, an industrial-grade dataset for waste detection and segmentation. We further propose a unique multi-scale feed-forward network configuration and integrate it with the InternImage architecture to effectively model multi-scale information on the challenging ZeroWaste-f dataset for both waste detection and segmentation tasks. This improvement is further enhanced by introducing a novel Freezeconnect module that helps counteract neuron co-adaptation during training by redistributing the learning (gradient signal) across the network. We compare our model with existing state-of-the-art baseline methods on the ZeroWaste-f and TrashCAN datasets to demonstrate the effectiveness of our method.
Download

Paper Nr: 122
Title:

Integrating Structure and Sequence: Protein Graph Embeddings via GNNs and LLMs

Authors:

Francesco Ceccarelli, Lorenzo Giusti, Sean B. Holden and Pietro Liò

Abstract: Proteins perform much of the work in living organisms, and consequently the development of efficient computational methods for protein representation is essential for advancing large-scale biological research. Most current approaches struggle to efficiently integrate the wealth of information contained in the protein sequence and structure. In this paper, we propose a novel framework for embedding protein graphs in geometric vector spaces, by learning an encoder function that preserves the structural distance between protein graphs. Utilizing Graph Neural Networks (GNNs) and Large Language Models (LLMs), the proposed framework generates structure- and sequence-aware protein representations. We demonstrate that our embeddings are successful in the task of comparing protein structures, while providing a significant speed-up compared to traditional approaches based on structural alignment. Our framework achieves remarkable results in the task of protein structure classification; in particular, when compared to other work, the proposed method shows an average F1-Score improvement of 26% on out-of-distribution (OOD) samples and of 32% when tested on samples coming from the same distribution as the training data. Our approach finds applications in areas such as drug prioritization, drug re-purposing, disease sub-type analysis and elsewhere.
Download

Paper Nr: 157
Title:

Identifying Indian Cattle Behaviour Using Acoustic Biomarkers

Authors:

Ruturaj Patil, Hemavathy B, Sanat Sarangi, Dineshkumar Singh, Rupayan Chakraborty, Sanket Junagade and Srinivasu Pappula

Abstract: A system is proposed to recognise sounds from major cattle breeds commonly found in India and link them to intents reflecting specific behaviour and associated needs. Cattle breeds in India comprise a mix of indigenous and exotic breeds, where Sindhi, Sahiwal, and Gir make up a significant fraction of the indigenous breeds; the exotic breeds are Jersey and Holstein Friesian. Vocalisations from the animals in this cattle group are used to create a sound dataset comprising 120 utterances across six intents, where the intents were labelled by domain experts familiar with the animals and their behaviour. MFCCs and OpenSMILE global features extracted from the audio signal, with 6552 properties, are used to model intent recognition. The dataset is scaled and augmented with four different methods to 870 cattle sounds for the six classes. Two model architectures are created and tested on the data for each method independently and for all of them together. The models are also tested on unseen cattle sounds for speaker-independent verification. An accuracy of 97% was obtained for intent classification with MFCCs and OpenSMILE features. This indicates that behaviour recognition from sounds is possible for Indian cattle breeds with a good confidence level.
Download

Short Papers
Paper Nr: 26
Title:

Anomaly Detection Methods for Finding Technosignatures

Authors:

Rohan Loveland and Ryan Sime

Abstract: Four machine learning based anomaly detection methods are used to find technosignatures, in this case human activity on the Moon, in high-resolution imagery: autoencoder-based reconstruction loss, kernel density estimation of probability density, isolation forests, and the Farpoint algorithm. A deep learning variational autoencoder was used, which provided both a reconstruction capability and a means of dimensionality reduction. The resulting lower-dimensional latent space data was used for the probability density and isolation forest methods. For our data, we use Lunar Reconnaissance Orbiter high-resolution imagery of four known mission locations, with large areas broken into smaller tiles. We rank the tiles by anomalousness and determine the gains in efficiency that would result from showing the tiles in that order compared to using random selection. The resulting reduction in necessary analyst time ranges into factors in the hundreds depending on the particular mission, with the Farpoint algorithm generally having the best performance. We also combine the tiles into bounding boxes based on spatial proximity, and demonstrate that this could provide a further improvement in reduction efficiency.
Download

Paper Nr: 37
Title:

Detection of Energy Drifts in Waste Water Treatment Plants Using Dynamic Clustering

Authors:

Lucie Martin, Muriel Dugachard, Yuqi Wang and Guillaume Scherpereel

Abstract: The sanitation process is energy intensive. There are therefore environmental stakes for treated wastewater companies, which must continually optimize and reduce their energy expenditure. This paper aims to characterize the energy consumption patterns of Waste Water Treatment Plants (WWTPs). Once these patterns have been established, their evolution is monitored through time. This work is based on the 78 most energy-intensive wastewater treatment plants in France. Consumption is studied from 2019 to the beginning of 2020. Energy expenditure depends on the operating conditions of the WWTP, such as the volume of treated wastewater, the organic pollution load, the rainfall, the amount of suspended solids, and the temperature and pH of the effluent. This relation is modeled using PLS regression, which can be used to characterize a WWTP’s energy consumption behavior. The WWTPs’ load patterns are grouped into clusters using K-means. Five different consumption patterns are obtained for the year 2019. A dynamic K-means is employed to update the patterns on a daily basis. Potential drifts can be detected using the statistical distances of the treatment plants from the average characteristics of each group.
Download

Paper Nr: 44
Title:

Robust 3D Point Cloud Registration Exploiting Unique LiDAR Scanning Pattern

Authors:

Ahmad K. Aijazi and Paul Checchin

Abstract: The task of 3D point cloud registration is fundamentally about aligning multiple scans or point clouds, obtained from one or more LiDAR sensors, to create a unified and accurate representation of the scanned scene. This process serves as the cornerstone for applications such as map building, autonomous navigation, land surveying and many others. While 3D registration techniques have made significant advancements, several persistent challenges continue to warrant research attention and innovation. Recently, non-repetitive scanning LiDAR sensors have emerged as a promising alternative for 3D data acquisition. In this paper, a novel 3D point cloud registration method is presented that exploits the unique scanning pattern of the sensor to register successive 3D scans. The sensor is first characterized and then, using the characteristic equation of the unique scanning pattern, a perfect scan is reconstructed at the target distance. The real scan is then compared with this reconstructed scan to extract objects in the scene. The displacements of these extracted objects, with respect to the center of the unique scanning pattern, are compared across successive scans to determine the transformations that are then used to register the scans. The proposed method is evaluated on two real and different datasets and compared with other state-of-the-art registration methods. The results show that the method is comparable to other methods in terms of accuracy but surpasses them in terms of processing time.
Download

Paper Nr: 46
Title:

Enhancing the Readability of Palimpsests Using Generative Image Inpainting

Authors:

Mahdi Jampour, Hussein Mohammed and Jost Gippert

Abstract: Palimpsests are manuscripts that have been scraped or washed for reuse, usually as another document. Recovering the undertext of these manuscripts can be of significant interest to scholars in the humanities. Multispectral imaging is a technique often used to make the undertext visible in palimpsests. Nevertheless, this approach is not sufficient in many cases, due to the fact that the undertext in resulting images is still covered by the overtext or other artefacts. Therefore, we propose defining this issue as an inpainting problem and enhancing the readability of the undertext using generative image inpainting. To this end, we introduce a novel method for generating synthetic multispectral palimpsest images and make the generated dataset publicly available. Furthermore, we utilise this dataset in the fine-tuning of a generative inpainting approach to enhance the readability of palimpsest undertext. The evaluation of our approach is provided for both the synthetic dataset and palimpsests from actual research in the humanities. The evaluation results indicate the effectiveness of our method in terms of both quantitative and qualitative measures.
Download

Paper Nr: 47
Title:

Efficient Use of Large Language Models for Analysis of Text Corpora

Authors:

David Adamczyk and Jan Hůla

Abstract: In this paper, we propose an efficient approach for tracking a given phenomenon in a corpus using natural language processing (NLP) methods. The topic of tracking phenomena in a corpus is important, especially in the fields of sociology, psychology, and economics, which study human behavior in society. Unlike existing approaches that rely on universal large language models (LLMs), which are computationally expensive, we focus on using computationally less expensive methods. These methods allow for high data processing speed while maintaining high accuracy. Our approach is inspired by the cascade approach to optimization, where we first roughly filter out unwanted information and then gradually use more accurate models, which are computationally more expensive. In this way, we are able to process large amounts of data with high accuracy using different models, while also reducing the overall cost of computations. To demonstrate the proposed method, we chose a task that consists of finding the frequency of occurrence of a certain phenomenon in a large text corpus, which is divided into individual months of the year. In practice, this means that we can, for example, use Internet discussions to find out how much people are discussing a particular topic. The entire solution is presented as a pipeline, which consists of individual phases that successively process text data using methods selected to minimize the overall cost of processing all data.
Download

Paper Nr: 48
Title:

Quantification of Matching Results for Autofluorescence Intensity Images and Histology Images

Authors:

Malihe Javidi, Qiang Wang and Marta Vallejo

Abstract: Fluorescence lifetime imaging microscopy utilises lifetime contrast to effectively discriminate between healthy and cancerous tissues. The co-registration of autofluorescence images with the gold standard, histology images, is essential for a thorough understanding and clinical diagnosis. As a preliminary step of co-registration, since histology images are whole-slide images covering the entire tissue, the histology patch corresponding to the autofluorescence image must be located using a template matching method. A significant difficulty in a template matching framework is distinguishing correct matching results from incorrect ones. This is extremely challenging due to the different nature of the two images. To address this issue, we provide fully experimental results for quantifying template matching outcomes via a diverse set of metrics. Our research demonstrates that the Kullback-Leibler divergence and misfit-percent are the most appropriate metrics for assessing the accuracy of our matching results. This finding is further supported by statistical analysis utilising the t-test.
Download

Paper Nr: 49
Title:

Information Retrieval Chatbot on Military Policies and Standards

Authors:

Charith Gunasekara, Alaa Sharafeldin, Matthew Triff, Zareen Kabir and Rohan Ben Joseph

Abstract: In the Canadian Armed Forces (CAF), navigating through extensive policies and standards can be a challenging task. To address the need for streamlined access to these vital documents, this paper explores the usage of artificial intelligence (AI) and natural language processing (NLP) to create a question-answering chatbot. This chatbot is specifically tailored to pinpoint and retrieve specific passages from policy documents in response to user queries. Our approach involved first developing a comprehensive and systematic data collection technique for parsing the multi-formatted policy and standard documents. Following this, we implemented an advanced NLP-based information retrieval system to provide the most relevant answers to users’ questions. Preliminary user evaluations showcased a promising accuracy rate of 88.46%. Even though this chatbot is designed to operate on military policy documents, it can be extended for similar use cases to automate information retrieval from long documents.
Download

Paper Nr: 50
Title:

Military Badge Detection and Classification Algorithm for Automatic Processing of Documents

Authors:

Charith Gunasekara, Yash Matharu and Rohan Ben Joseph

Abstract: This paper outlines a robust approach to automate the detection of military badges on official government documents utilizing the YOLOv5 computer vision model. In an era where the rapid classification and management of sensitive documents is paramount, developing a system capable of accurately identifying and classifying distinct badge types plays a crucial role in supporting data management and security protocols. To address the challenges posed by the lack of accessible, real-world government and military documents for research, we introduce a novel method to simulate training data. We employ a technique that automates the data labelling process, facilitating the generation of a comprehensive and versatile dataset while eliminating the risk of compromising sensitive information. Through careful model training and hyper-parameter tuning, the YOLOv5 model demonstrated exemplary performance, successfully detecting a wide spectrum of badge types across various documents.
Download

Paper Nr: 59
Title:

Self-Supervised-Based Multimodal Fusion for Active Biometric Verification on Mobile Devices

Authors:

Youcef Ouadjer, Chiara Galdi, Sid-Ahmed Berrani, Mourad Adnane and Jean-Luc Dugelay

Abstract: This paper focuses on the fusion of multimodal data for effective active biometric verification on mobile devices. Our proposed Multimodal Fusion (MMFusion) framework combines hand movement data and touch screen interactions. Unlike conventional approaches that rely on annotated unimodal data for deep neural network training, our method makes use of contrastive self-supervised learning in order to extract powerful feature representations and to deal with the lack of labeled training data. The fusion is performed at the feature level, by combining information from hand movement data (collected using background sensors like accelerometer, gyroscope and magnetometer) and touch screen logs. Following the self-supervised learning protocol, MMFusion is pre-trained to capture similarities between hand movement sensor data and touch screen logs, effectively attracting similar pairs and repelling dissimilar ones. Extensive evaluations demonstrate its high performance on user verification across diverse tasks compared to unimodal alternatives trained using the SimCLR framework. Moreover, experiments in semi-supervised scenarios reveal the superiority of MMFusion with the best trade-off between sensitivity and specificity.
Download

Paper Nr: 69
Title:

TrajViViT: A Trajectory Video Vision Transformer Network for Trajectory Forecasting

Authors:

Gauthier Rotsart de Hertaing, Dani Manjah and Benoit Macq

Abstract: Forecasting trajectories is a complex task relying on the accuracy of past positions, a correct model of the agent’s motion and an understanding of the social context, which are often challenging to acquire. Deep Neural Networks (DNNs), especially Transformer networks (TFs), have recently evolved as state-of-the-art tools in tackling these challenges. This paper presents TrajViViT (Trajectory Video Vision Transformer), a novel multimodal Transformer network combining images of the scene and positional information. We show that such an approach enhances the accuracy of trajectory forecasting and improves the network’s robustness against inconsistencies and noise in positional data. Our contributions are the design and comprehensive implementation of TrajViViT. A public GitHub repository will be provided.
Download

Paper Nr: 70
Title:

Semantic and Horizon-Based Feature Matching for Optimal Deep Visual Place Recognition in Waterborne Domains

Authors:

Luke Thomas, Matt Roach, Alma Rahat, Austin Capsey and Mike Edwards

Abstract: To tackle specific challenges of place recognition in the shoreline image domain, we develop a novel Deep Visual Place Recognition pipeline minimizing redundant feature extraction and maximizing salient feature extraction by exploiting the shoreline horizon. Optimizing for model performance and scalability, we present Semantic and Horizon-Based Matching for Visual Place Recognition (SHM-VPR). Our approach is motivated by the unique nature of waterborne imagery, namely the tendency for salient land features to make up a minority of the overall image, with the rest being disposable sea and sky regions. We initially attempt to exploit this via unsupervised region proposal, but we later propose a horizon-based approach that provides improved performance. We provide objective results on both a novel in-house shoreline dataset and the already established Symphony Lake dataset, with SHM-VPR providing state-of-the-art results on the former.
Download

Paper Nr: 95
Title:

Person Detection and Geolocation Estimation in UAV Aerial Images: An Experimental Approach

Authors:

Sasa Sambolek and Marina Ivasic-Kos

Abstract: The use of drones in SAR operations has become essential to assist in the search and rescue of missing or injured persons, as it reduces search time and costs, and increases the surveillance area and the safety of the rescue team. Detecting people in aerial images is a demanding and tedious task for trained humans as well as for detection algorithms, due to variations in pose, occlusion, scale, size, and the location of a person in the image, as well as poor shooting conditions, poor visibility, and motion blur. In this paper, the YOLOv8 generic object detection model pre-trained on the COCO dataset is fine-tuned on the customized SARD dataset to optimize the model for person detection in aerial images of mountainous landscapes captured by drone. Different models of the YOLOv8 family fine-tuned on the SARD set were experimentally tested, and it was shown that the YOLOv8x model achieves the highest mean average precision (mAP@0.5:0.95) of 63.8%, with an inference time of 4.6 ms, which shows potential for real-time use in SAR operations. We have tested three geolocation algorithms in real conditions and proposed modifications and recommendations for use in SAR missions for determining the geolocation of a person recorded by drone after automatic detection with the YOLOv8x model.
Download

Paper Nr: 105
Title:

Mobile Phone Identification from Recorded Speech Signals Using Non-Speech Segments and Universal Background Model Adaptation

Authors:

Dimitrios Kritsiolis and Constantine Kotropoulos

Abstract: Mobile phone identification from recorded speech signals is an audio forensic task that aims to establish the authenticity of a speech recording. The typical methodology to address this problem is to extract features from the entire signal, model the distribution of the features of each phone, and then perform classification on the testing data. Here, we demonstrate that extracting features from non-speech segments or extracting features from the entire recording and modeling them using a Universal Background Model (UBM) of speech improves classification accuracy. The paper’s contribution is in the disclosure of experimental results on two benchmark datasets, the MOBIPHONE and the CCNU Mobile datasets, demonstrating that non-speech features and UBM modeling yield higher classification accuracy even under noisy recording conditions and amplified speaker variability.
Download

Paper Nr: 113
Title:

Comparison of Dimension Reduction Methods for Multivariate Time Series Pattern Recognition

Authors:

Patrick Petersen, Hanno Stage, Philipp Reis, Jonas Rauch and Eric Sax

Abstract: Large volumes of time series data are frequently analyzed using unsupervised algorithms to identify patterns. The time and space complexity of multivariate time series poses challenges in this context. Dimensionality reduction, a common technique in data science, provides a viable solution to improve time and space complexity. Nevertheless, a crucial question arises concerning how the time advantage compares to the information loss. This paper compares dimension reduction methods within unsupervised time series pattern recognition, including rule-based, spectral, probabilistic, and unsupervised learning-based approaches. The comparison involves both synthetic and real-world datasets for a comprehensive evaluation. The findings reveal the potential to accelerate pattern recognition algorithms by 90%, with only 18% information loss in terms of the F1 score.
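As a generic illustration of the speed/information trade-off this abstract describes (not the authors' pipeline), a spectral method such as PCA can shrink a multivariate time-series window matrix before pattern recognition while reporting how much variance survives:

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project rows of X onto its top principal components.

    X: (n_samples, n_features) matrix of multivariate time-series windows.
    Returns the reduced data and the fraction of variance retained.
    """
    Xc = X - X.mean(axis=0)                       # centre each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:n_components].T                  # reduced representation
    retained = (S[:n_components] ** 2).sum() / (S ** 2).sum()
    return Z, retained

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
X[:, 1] = 2 * X[:, 0] + rng.normal(scale=0.01, size=500)  # redundant channel
Z, retained = pca_reduce(X, n_components=10)
print(Z.shape, round(retained, 2))
```

Downstream clustering or pattern mining then runs on the smaller `Z`; `retained` is one simple proxy for the information loss the paper quantifies via the F1 score.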
Download

Paper Nr: 115
Title:

Homomorphic Encryption Friendly Multi-GAT for Information Extraction in Business Documents

Authors:

Djedjiga Belhadj, Yolande Belaïd and Abdel Belaïd

Abstract: This paper presents a homomorphic encryption (HE) system to extract information from business documents. We propose a structured method to replace the nonlinear activation functions of a multi-layer graph attention network (Multi-GAT), including ReLU, LeakyReLU, and the attention mechanism Softmax, with polynomials of different degrees. We also replace the normalization layers with an adapted HE algorithm. To solve the problem of accuracy loss during the approximation, we use a partially HE baseline model to train a fully HE model using techniques such as knowledge distillation and model fine-tuning. The proposed HE-friendly Multi-GAT models the document as a graph of words and uses the multi-head attention mechanism to classify the graph nodes. The first partially HE-Multi-GAT contains polynomial approximations of all ReLU, LeakyReLU and attention Softmax activation functions. Normalization layers are used to handle exploding values when approximating all the nonlinear activation functions. These layers are approximated as well, using an adapted algorithm that doesn’t rely on the training data and minimizes performance loss while avoiding connections between the server and the data owner. Experiments show that our approach minimizes the model accuracy loss. We tested the architecture on three different datasets and obtained competitive results (F1-scores greater than 93%).
Download

Paper Nr: 116
Title:

Experimental Application of Semantic Segmentation Models Fine-Tuned with Synthesized Document Images to Text Line Segmentation in a Handwritten Japanese Historical Document

Authors:

Sayaka Mori and Tetsuya Suzuki

Abstract: Because handwritten Japanese historical documents are difficult to read even for Japanese readers, computer-assisted transcription of such documents is helpful. We plan to apply semantic segmentation to text line segmentation for handwritten Japanese historical documents. We use both synthesized document images resembling a Japanese historical document and annotations for them, because it is time-consuming to manually annotate a large set of document images for training data. The purpose of this research is to evaluate the effect of fine-tuning semantic segmentation models with synthesized Japanese historical document images on text line segmentation. The experimental results show that the segmentation results produced by our method are generally satisfactory for test data consisting of synthesized document images and are also satisfactory for Japanese historical document images with straightforward formats.
Download

Paper Nr: 118
Title:

An Algorithmic Approach for Quantitative Motion Artefact Grading in HRpQCT Medical Imaging

Authors:

Thomas A. Cox, Sasan Mahmoodi, Elizabeth M. Curtis, Nicholas R. Fuggle, Rebecca J. Moon, Kate A. Ward, Leo D. Westbury and Nicholas C. Harvey

Abstract: High Resolution Peripheral Quantitative Computed Tomography (HRpQCT) is a modern form of medical imaging that is used to extract detailed internal texture and structure information from non-invasive scans. This greater resolution means HRpQCT images are more vulnerable to motion artefact than other existing bone imaging methods. Current practice is for scan images to be manually reviewed and graded on a one-to-five scale for movement artefact, where analysis of scans with the most severe grades of movement artefact may not be possible. Various approaches to automatically detecting motion artefact in HRpQCT images have been described, but these typically rely on classifying scans based on the qualitative manual gradings instead of determining the amount of artefact. This paper describes research into quantitatively calculating the degree of motion affecting an HRpQCT scan. This is approached by analysing the jumps and shifts present in the raw projection data produced by the HRpQCT instrument scanner, rather than using the reconstructed cross-sectional images. The motivation and methods of this approach are described, and results are provided, along with comparisons to existing work.
Download

Paper Nr: 119
Title:

Sample Size Estimation of Transfer Learning for Colorectal Cancer Detection

Authors:

Ruihao Luo, Shuxia Guo and Thomas Bocklitz

Abstract: Nowadays, deep learning has been widely implemented in biomedical applications, but it is problematic to acquire large annotated medical datasets to train the models. As a technique for reusing knowledge obtained from one domain in another domain, transfer learning can be used with only small datasets. Despite some current research on model transfer methods for medical images, it is still unclear how sample size influences model performance. Therefore, this study focuses on the estimation of the sample size required for a satisfactory performance, and also compares transfer methods with only 200 images randomly chosen from a colorectal cancer dataset. Firstly, based on K-fold cross-validation, the balanced accuracies of 3 transfer learning networks (DenseNet121, InceptionV3 and MobileNetV2) were generated, with each network using 3 model transfer methods, respectively. Afterwards, by curve fitting with an inverse power law, their learning curves were plotted. Furthermore, the estimation of the required sample size as well as the prediction of final performance were calculated for each model. In addition, to investigate how many images are needed for curve fitting, the maximum number of images was also reduced from 200 to smaller numbers. As a result, it is shown that there is a trade-off between predicted final performance and estimated sample size, and that model transfer methods suggested for large datasets do not automatically apply to small datasets. For small datasets, complicated networks are not recommended despite their high final performance, and simple transfer learning methods are more feasible for biomedical applications.
Download

Paper Nr: 123
Title:

Directional Filter for Tree Ring Detection

Authors:

Rémi Decelle, Phuc Ngo, Isabelle Debled-Rennesson, Frédéric Mothe and Fleur Longuetaud

Abstract: This paper presents an approach to automatically detect tree rings and subsequently measure annual ring widths in untreated cross-section images. This approach aims to offer additional insights on wood quality. It is composed of two parts. The first one detects tree rings by using directional filters together with an adaptive refining process, allowing the rings to be extracted from radial information at different angles around the tree pith. The second step consists in building a confidence map by considering a polar quad-tree decomposition, which enables us to identify the relevant regions of the image for conducting the tree ring width measurements. The method is evaluated on two public datasets, demonstrating good performance in both detection and measurement. The source code is available at https://gitlab.com/Ryukhaan/treetrace/-/tree/master/treerings.
Download

Paper Nr: 138
Title:

PatchSVD: A Non-Uniform SVD-Based Image Compression Algorithm

Authors:

Zahra Golpayegani and Nizar Bouguila

Abstract: Storing data is a particular challenge when dealing with image data, which often involves large file sizes due to the high resolution and complexity of images. Efficient image compression algorithms are crucial to better manage data storage costs. In this paper, we propose a novel region-based lossy image compression technique, called PatchSVD, based on the Singular Value Decomposition (SVD) algorithm. We show through experiments that PatchSVD outperforms SVD-based image compression with respect to three popular image compression metrics. Moreover, we compare PatchSVD compression artifacts with those of Joint Photographic Experts Group (JPEG) and SVD-based image compression and illustrate some cases where PatchSVD compression artifacts are preferable compared to JPEG and SVD artifacts.
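For context, a minimal sketch of the plain rank-k SVD compression that PatchSVD is compared against (the paper's own method applies SVD per patch with non-uniform ranks, which this sketch does not reproduce):

```python
import numpy as np

def svd_compress(img, k):
    """Rank-k SVD approximation of a greyscale image.

    Storing U[:, :k], S[:k] and Vt[:k] costs k*(rows + cols + 1)
    values instead of rows*cols, which is where the compression
    comes from; small k discards fine detail (lossy).
    """
    U, S, Vt = np.linalg.svd(img.astype(float), full_matrices=False)
    approx = (U[:, :k] * S[:k]) @ Vt[:k]
    ratio = k * (img.shape[0] + img.shape[1] + 1) / img.size
    return approx, ratio

# A rank-1 gradient image compresses losslessly at k=1.
img = np.outer(np.linspace(0, 255, 64), np.ones(64))
approx, ratio = svd_compress(img, k=1)
print(round(float(np.abs(img - approx).max()), 6), round(ratio, 3))
# near-zero reconstruction error at roughly 3% of the original storage
```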
Download

Paper Nr: 146
Title:

An Evaluation of General-Purpose Optical Character Recognizers and Digit Detectors for Race Bib Number Recognition

Authors:

Modesto Castrillón-Santana, David Freire-Obregón, Daniel Hernández-Sosa, Oliverio J. Santana, Francisco Ortega-Zamorano, José Isern-González and Javier Lorenzo-Navarro

Abstract: Bib numbers are used in mass competitions to identify participants, especially in long-distance races where runners commonly wear tags to verify that they pass mandatory checkpoints. In this paper, we delve deeper into the use of existing computer vision techniques for recognizing the digits present in bib numbers. Our analysis of bib recognition involves evaluating OCR (Optical Character Recognition) techniques and a YOLOv7 digit detector on two public datasets: RBNR and TGCRBNW. The results reveal that the former scenario is solvable, while the latter presents extreme in-the-wild challenges. The findings suggest that, rather than relying solely on RBN for runner identification, other appearance-based cues, e.g., clothing and accessories, may be required due to various circumstances, such as occlusion or incomplete bib recognition. In any case, all those cues do not necessarily imply that the same person is wearing the RBN across the competition track, as they are not biometric traits.
Download

Paper Nr: 147
Title:

Relevant Multi Domain Features Selection Based on Mutual Information for Heart Sound Classification

Authors:

Rima Touahria, Abdenour Hacine-Gharbi, Philippe Ravier and Messaoud Mostefai

Abstract: Many classification systems for heart sound signals use a combination of features from different domains. In a former reference paper, 324 multidomain features were used for classifying segmented phonocardiogram signals. However, the large feature dimension requires high memory space and heavy computation, and probably reduces the classification accuracy due to the curse of dimensionality. In the present work, we propose to reduce the dimensionality of the feature vectors by selecting the relevant features using six heuristic feature selection strategies based on a mutual information maximisation criterion. In order to validate the selected subset of features, a k-NN-based classifier was used and evaluated on the PhysioNet/Computing in Cardiology Challenge 2016 dataset using the same feature sets described in the reference paper. The results demonstrate that the Joint Mutual Information (JMI) selection strategy increases the classification rate from 85.57% to 89.28% and simultaneously reduces the dimension from 324 to 46. Furthermore, this work demonstrates that systolic segment features are the most relevant for murmur/normal classification. It also demonstrates the capability of feature selection algorithms to emphasize specific key areas in signals, which is helpful for aided diagnostic systems and fundamental research.
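To illustrate the family of methods the abstract refers to, here is the simplest mutual-information ranking strategy (MIM-style, not the JMI criterion the paper finds best) with a histogram MI estimate; all names and the toy data are illustrative:

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Histogram estimate of I(X;Y) in nats, for a discrete label y."""
    joint, _, _ = np.histogram2d(x, y, bins=(bins, len(np.unique(y))))
    p = joint / joint.sum()
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float((p[nz] * np.log(p[nz] / (px @ py)[nz])).sum())

def rank_features(X, y, top_k):
    """Keep the top_k features with the highest individual MI with y.

    JMI and the other strategies in the paper additionally account for
    redundancy and complementarity between already-selected features.
    """
    scores = [mutual_information(X[:, j], y) for j in range(X.shape[1])]
    return np.argsort(scores)[::-1][:top_k]

rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=1000)
X = rng.normal(size=(1000, 5))
X[:, 3] += 2 * y                       # only feature 3 is informative
ranked = rank_features(X, y, top_k=1)
print(ranked)                          # feature 3 should rank first
```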
Download

Paper Nr: 152
Title:

Enhancing Railway Safety: An Unsupervised Approach for Detecting Missing Bolts with Deep Learning and 3D Imaging

Authors:

Udith Krishnan Vadakkum Vadukkal, Angelo Cardellicchio, Nicola Mosca, Maria di Summa, Massimiliano Nitti, Ettore Stella and Vito Renò

Abstract: This paper delves into the realm of quality control within railway infrastructure, specifically addressing the critical issue of missing bolts. Leveraging 3D imaging and deep learning, the study compares two approaches: a binary classification method and an anomaly detection task. The results underscore the efficacy of the anomaly detection approach, showcasing its ability to identify missing bolts robustly. Utilizing a dataset of 3D images acquired from a diagnostic train, treated as depth maps, the paper formulates the problem as an unsupervised learning task, training and evaluating autoencoders for anomaly detection. This research contributes to advancing quality control processes by applying deep learning in critical infrastructure monitoring.
Download

Paper Nr: 154
Title:

Surface EMG Signal Segmentation and Classification for Parkinson’s Disease Based on HMM Modelling

Authors:

Hichem Bengacemi, Abdenour H. Gharbi, Philippe Ravier, Karim Abed-Meraim and Olivier Buttelli

Abstract: To increase diagnostic accuracy, artificial intelligence techniques can be used as medical support. Electromyography (EMG) signals are used in the evaluation of neuromuscular dysfunction. This paper proposes a new framework for segmenting and classifying surface EMG (sEMG) signals, segmenting the EMG signal into regions of muscle activity (ACN) and non-activity (NAN) for the control (healthy) group, and muscle activity (ACP) and non-activity (NAP) for the Parkinsonian group. Based on HMM modelling of sEMG signals, an automatic neuromuscular dysfunction identification system for Parkinson’s disease diagnosis is proposed. Discrete Wavelet Transform (DWT), LP coefficients and FLP coefficients have been used for feature extraction. The results have been evaluated on the ECOTECH project database using the signal classification rate (CRS) and the Accuracy (Acc) criterion. The best performance is obtained using 2-state HMM models associated with 6-Gaussian GMMs, combined with the Log Wavelet decomposition based Energy (LWE) descriptor using a Coiflet mother wavelet with decomposition level 4. The proposed methodology leads to an Acc of 99.37% and a CRS of 100%.
Download

Paper Nr: 158
Title:

Enhancing Surgical Visualization: Feasibility Study on GAN-Based Image Generation for Post Operative Cleft Palate Images

Authors:

Daniel A. Atputharuban, Christoph Theopold and Aonghus Lawlor

Abstract: Cleft Lip/Palate (CL/P) is a prevalent maxillofacial congenital anomaly arising from the failure of fusion in the frontonasal and maxillary processes. Currently, no internationally agreed gold-standard procedure for cleft lip repair exists, and surgical approaches are frequently selected based on the surgeon’s past experience and the specific characteristics of individual patient cases. The Asher-McDade score, a widely employed tool in assessing unilateral cleft lip surgeries, relies on criteria related to the aesthetics and symmetry of the maxillofacial region. However, no objective metric has been developed for assessing surgical success. This study aims to incorporate deep learning and Generative Adversarial Network (GAN) methods to construct an image generation framework to produce post-operative lip images that can serve as a standardized reference for assessing surgical success. We introduce an image similarity score based on image embeddings, which we use to validate the generated images. Our method paves the way to a set of techniques for the generation of synthetic faces which can guide surgeons in assessing the outcomes of CL/P surgery.
Download

Paper Nr: 15
Title:

Small Patterns Detection in Historical Digitised Manuscripts Using Very Few Annotated Examples

Authors:

Hussein Mohammed and Mahdi Jampour

Abstract: Historical manuscripts can be challenging for computer vision tasks such as writer identification, style classification and layout analysis due to the degradation of the artefacts themselves and the poor quality of digitization, thereby limiting the scope of analysis. However, recent advances in machine learning have shown promising results in enabling the analysis of vast amounts of data from digitised manuscripts. Nevertheless, the task of detecting patterns in these manuscripts is further complicated by the lack of annotations and the small size of many patterns, which can be smaller than 0.1% of the image size. In this study, we propose to explore the possibility of detecting small patterns in digitised manuscripts using only a few annotated examples. We also propose three detection datasets featuring three types of patterns commonly found in manuscripts: words, seals, and drawings. Furthermore, we employed two state-of-the-art deep learning models on these novel datasets: the FASTER ResNet and the EfficientDet, along with our general approach for standard evaluations as a baseline for these datasets.
Download

Paper Nr: 19
Title:

Offline Text-Independent Arabic and Chinese Writer Identification Using a Multi-Segmentation Codebook-Based Strategy

Authors:

Mohamed N. Abdi and Maher Khemakhem

Abstract: Many approaches rely on segmentation for offline text-independent writer identification. Segmentation schemes based on contours, junctions and projections are widely used and are very effective with Latin alphabet handwriting. However, these schemes seem to be less consistent in capturing writer individuality with Arabic and Chinese. As writing systems, the latter languages are morphologically different and are considered more complex than Latin alphabet languages. In this paper, four different segmentation techniques are tested for the identification of Arabic and Chinese writers. Then, these techniques are combined to increase the accuracy of identification. Experiments were conducted on handwriting samples from 300 writers from the Arabic IFN/ENIT dataset and 300 writers from the Chinese HIT-MW dataset. An additional 300 writers from the English/German CVL dataset were used as a control group. Taken separately, these segmentation techniques, which gave good results with CVL (Top1% = 99.00%), were not as conclusive with IFN/ENIT and HIT-MW. Nevertheless, the use of different types of segmentation in combination proved to be highly efficient for Arabic and Chinese, with Top1% = 96.33% and Top1% = 91.33%, respectively.
Download

Paper Nr: 24
Title:

Impact of Using GAN Generated Synthetic Data for the Classification of Chemical Foam in Low Data Availability Environments

Authors:

Toon Stuyck and Eric Demeester

Abstract: One of the main challenges of using machine learning in the chemical sector is a lack of qualitative labeled data. Data on certain events can be extremely rare, or very costly to generate, e.g. an anomaly during a production process. Even if data is available, it often requires highly educated observers to correctly annotate it. The performance of supervised classification algorithms can be drastically reduced when confronted with limited amounts of training data. Data augmentation is typically used to increase the amount of available training data, but carries a risk of overfitting or loss of information. In recent years, Generative Adversarial Networks have been able to generate realistic-looking synthetic data, even from small amounts of training data. In this paper, the feasibility of utilizing Generative Adversarial Network generated synthetic data to improve classification results is demonstrated via a comparison with and without standard augmentation methods such as scaling and rotation. A methodology is proposed on how to combine original data and synthetic data to achieve the best classifier result and to quantitatively verify the generalization of the classifier using an explainable AI method. The proposed methodology compares favourably to using no or standard augmentation methods in the case of classification of chemical foam.
Download

Paper Nr: 25
Title:

FaceVision-GAN: A 3D Model Face Reconstruction Method from a Single Image Using GANs

Authors:

Danilo Avola, Luigi Cinque, Gian L. Foresti and Marco R. Marini

Abstract: Generative algorithms have been very successful in recent years. This phenomenon derives from the strong computational power that even consumer computers can provide. Moreover, a huge amount of data is available today for feeding deep learning algorithms. In this context, human 3D face mesh reconstruction is becoming an important but challenging topic in computer vision and computer graphics. It could be exploited in different application areas, from security to avatarization. This paper provides a 3D face reconstruction pipeline based on Generative Adversarial Networks (GANs). It can generate high-quality depth and correspondence maps from 2D images, which are exploited for producing a 3D model of the subject’s face.
Download

Paper Nr: 28
Title:

Benchmarking a Wide Range of Unsupervised Learning Methods for Detecting Anomaly in Blast Furnace

Authors:

Kendai Itakura, Dukka Bahadur and Hiroto Saigo

Abstract: Steel plays an important role in our daily lives, as it surrounds us in the form of various products. The blast furnace, one of the main facilities in the steel production process, is traditionally monitored by skilled workers to prevent incidents. However, there is a growing demand to automate the monitoring process by leveraging machine learning. This paper focuses on investigating the suitability of unsupervised learning methods for detecting anomalies in blast furnaces. Extensive benchmarking is conducted using a dataset collected from blast furnaces, encompassing a wide range of unsupervised learning methods, including both traditional approaches and recent deep learning-based techniques. The computational experiments yield results that suggest the effectiveness of traditional methods over deep learning-based methods. To validate this observation, additional experiments are performed on publicly available non-time-series datasets and complex time series datasets. These experiments serve to confirm the superiority of traditional methods in handling non-time-series datasets, while deep learning methods exhibit better performance in dealing with complex time series datasets. We have also discovered that dimensionality reduction before anomaly detection is beneficial in eliminating outliers and effectively modeling the normal data points in the blast furnace dataset.
Download

Paper Nr: 34
Title:

Leveraging VR and Force-Haptic Feedback for an Effective Training with Robots

Authors:

Panagiotis Katranitsiotis, Panagiotis Zaparas, Konstantinos Stavridis and Petros Daras

Abstract: The utilization of robots for numerous tasks defines automation in today’s industrial sector across multiple fields, including insect farming. As an outcome of this progression, human-robot collaboration is becoming increasingly prevalent. Given the precise and delicate handling requirements of high-value machinery such as robots, industrial workers must receive adequate training in order to guarantee optimal operational efficiency and reduce the potential risks connected with their use. Accordingly, we propose a framework that integrates Virtual Reality (VR) technologies with force and haptic feedback equipment. This framework aims to simulate real-world scenarios and human-robot collaboration tasks, with the goal of familiarizing users with the aforementioned technologies, overcoming risks that may arise, and enhancing the effectiveness of their training. The proposed framework was designed with the insect farming automation domain in mind, with the objective of facilitating human-robot collaboration for workers in this field. An experiment was designed and conducted to measure the efficiency and impact of the proposed framework by analyzing questionnaires given to participants to extract valuable insights.
Download

Paper Nr: 38
Title:

Fetal Health Classification Using One-Dimensional Convolutional Neural Network

Authors:

Anton J. Röscher and Dustin van der Haar

Abstract: Within the medical field, machine learning has the potential to allow doctors and medical professionals to make faster, more accurate diagnoses, empowering specialists to take immediate action. Early diagnosis and prevention of fetal health conditions can be achieved based on the biomarker data derived from the cardiotocography signals. The study proposes using a one-dimensional convolutional neural network for fetal health classification and compares it to conventional machine learning algorithms. A one-dimensional convolutional neural network is shown to outperform traditional machine learning algorithms in both data sets (CTU-CHB and UCI), with an accuracy of 89% - 94%.
Download

Paper Nr: 57
Title:

Impute Water Temperature in the Swiss River Network Using LSTMs

Authors:

Benjamin Fankhauser, Vidushi Bigler and Kaspar Riesen

Abstract: Switzerland is home to the sources of major European rivers. As the thermal regime of rivers is crucial for the environment, the Federal Office for the Environment has been collecting discharge and water temperature data at 81 river water stations for several decades. However, despite diligent collection 30% of the water temperature data is missing due to various reasons. These missing data are problematic in many ways – for instance, in predicting water temperatures based on different models. To tackle this problem, we propose to use LSTMs for water temperature imputing. In particular, we introduce three different scenarios – depending on the available input data – to impute possible data gaps. Then, we propose several methods for each scenario. For our empirical evaluation, we engineer a novel dataset (with ground truth) by artificially introducing gaps of sizes 2, 10, 30 and 60 days in the middle of 90-day sequences. A rather simple interpolation baseline achieves a competitive RMSE on gaps of two days. For larger gaps, however, this simple method clearly fails, and the novel, far more sophisticated models significantly outperform both interpolation and the current state of the art in this application.
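The interpolation baseline mentioned in the abstract can be sketched in a few lines; this generic version (not the authors' code) fills a gap by drawing a straight line between the nearest observed temperatures, which is why it stays competitive for 2-day gaps but fails for 30- or 60-day ones:

```python
import numpy as np

def interpolate_gap(series):
    """Fill NaN gaps in a 1-D water-temperature series by linear
    interpolation between the nearest observed values."""
    s = np.asarray(series, dtype=float).copy()
    idx = np.arange(len(s))
    known = ~np.isnan(s)
    s[~known] = np.interp(idx[~known], idx[known], s[known])
    return s

# A 2-day gap inside a 5-day window of daily temperatures (degrees C).
temps = np.array([8.0, 8.5, np.nan, np.nan, 10.0])
filled = interpolate_gap(temps)
print(filled)  # the gap is bridged with 9.0 and 9.5
```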
Download

Paper Nr: 63
Title:

Detecting Manuscript Annotations in Historical Print: Negative Evidence and Evaluation Metrics

Authors:

Jacob Murel and David Smith

Abstract: Early readers’ manuscript annotations in books have been analyzed by bibliographers for evidence about book history and reading practice. Since handwritten annotations are not uniformly distributed across or within books, however, even the compilers of censuses of all copies of a single edition have very seldom produced systematic information about these interventions in the lives of books. This paper analyzes the use of object detection models (ODMs) for detecting handwritten annotations on the pages of printed books. While computer vision developers have dealt widely with imbalanced datasets, none have addressed the effect of negative sample images on model accuracy. We therefore investigate the use of negative evidence—pages with no annotations—in training accurate models for this task. We also consider how different evaluation metrics are appropriate for different modes of bibliographic research. Finally, we create a labeled training dataset of handwritten annotations in early printed books and release it for evaluation purposes.
Download

Paper Nr: 78
Title:

Linux Configuration Tuning: Is Having a Large Dataset Enough?

Authors:

Hifza Khalid, Peter Portante and Alva Couch

Abstract: While it would seem that enough data can solve any problem, data quality determines the appropriateness of data to solve specific problems. We intended to use a large dataset of performance data for the Linux operating system to suggest optimal tuning for network applications. We conducted a series of experiments to select hardware and Linux configuration options that are significant to network performance. Our results showed that network performance was mainly a function of workload and hardware. Investigating these results showed that our dataset did not contain enough diversity in configuration settings to infer the best tuning and was only useful for making hardware recommendations. Others with similar problems can use our tests to save time in concluding that a particular dataset is not suitable for machine learning.
Download

Paper Nr: 86
Title:

Speech Recognition for Indigenous Language Using Self-Supervised Learning and Natural Language Processing

Authors:

Satoshi Tamura, Tomohiro Hattori, Yusuke Kato and Naoki Noguchi

Abstract: This paper proposes a new concept for building a speech recognition system for an indigenous under-resourced language, by using a speech recognizer for a major language together with neural machine translation and a text autoencoder. Developing recognizers for minor languages suffers from the lack of training speech data. Our method uses natural language processing techniques and text data to compensate for the lack of speech data. We focus on a model based on self-supervised learning, and utilize its sub-module as a feature extractor. We develop the recognizer sub-module for indigenous languages by building translation and autoencoder models. We conduct evaluation experiments for every system and for our paradigm. We consequently find that our scheme can build the recognizer successfully and improve performance compared to previous work.
Download

Paper Nr: 107
Title:

Double Trouble? Impact and Detection of Duplicates in Face Image Datasets

Authors:

Torsten Schlett, Christian Rathgeb, Juan Tapia and Christoph Busch

Abstract: Various face image datasets intended for facial biometrics research were created via web-scraping, i.e., the collection of images publicly available on the internet. This work presents an approach to detect both exactly and nearly identical face image duplicates, using file and image hashes. The approach is extended through the use of face image preprocessing. Additional steps based on face recognition and face image quality assessment models reduce false positives, and facilitate the deduplication of the face images for both intra- and inter-subject duplicate sets. The presented approach is applied to five datasets, namely LFW, TinyFace, Adience, CASIA-WebFace, and C-MS-Celeb (a cleaned MS-Celeb-1M variant). Duplicates are detected within every dataset, ranging from hundreds to hundreds of thousands of duplicates for all except LFW. Face recognition and quality assessment experiments indicate that duplicate removal has only a minor impact on the results. The final deduplication data is made available at https://github.com/dasec/dataset-duplicates.
Download
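The exact-duplicate stage of such a pipeline can be sketched with standard-library file hashing (a minimal illustration; the paper additionally uses image hashes, preprocessing, and face recognition models to catch near-duplicates, which this sketch omits):

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def find_exact_duplicates(paths):
    """Group files whose byte content is identical, keyed by SHA-256 digest.
    Returns only groups with more than one member, i.e. duplicate sets."""
    groups = defaultdict(list)
    for p in paths:
        digest = hashlib.sha256(Path(p).read_bytes()).hexdigest()
        groups[digest].append(str(p))
    return [g for g in groups.values() if len(g) > 1]
```

Nearly identical duplicates (re-encoded or resized copies) escape byte-level hashing; detecting them requires perceptual image hashes computed after the kind of face image preprocessing described in the paper.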

Paper Nr: 124
Title:

Towards Small Anomaly Detection

Authors:

Thomas Messerer

Abstract: In this position paper, we describe the design of a camera-based FOD (Foreign Object Debris) detection system intended for use in the parking position at the airport. FOD detection, especially the detection of small objects, requires a great deal of human attention. The transfer of ML (machine learning) from the laboratory to the field calls for adjustments, especially in testing the model. Automated detection requires not only high detection performance and low false alarm rate, but also good generalization to unknown objects. There is not much data available for this use case, so in addition to ML methods, the creation of training and test data is also considered.
Download

Paper Nr: 129
Title:

Discrimination of Signals from Large Covariance Matrix for Pattern Recognition

Authors:

Masaaki Ida

Abstract: Pattern recognition applications and methods are important areas in modern data science. A long-standing issue in such analyses is the selection of the important signal eigenvalues from the many eigenvalues dominated by randomness; however, an appropriate theoretical justification for the selection criteria is rarely given. In this paper, by investigating the eigenvalue distribution of the large covariance matrix of a data matrix, we develop a comprehensive method for discriminating signal eigenvalues from the bulk of eigenvalues that arise from randomness. Applying the discrimination method to the weight matrix of a three-layered neural network, we examine the method on a handwritten character recognition example.
Download
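The abstract does not spell out the criterion, but the standard random-matrix reference point for separating signal eigenvalues from the noise bulk is the Marchenko–Pastur upper edge; a sketch under that assumption, with planted-spike data and a hypothetical finite-sample buffer factor:

```python
import numpy as np

def mp_upper_edge(n, p, sigma2=1.0):
    """Upper edge of the Marchenko-Pastur bulk for the sample covariance
    of an n x p data matrix with i.i.d. entries of variance sigma2."""
    return sigma2 * (1 + np.sqrt(p / n)) ** 2

rng = np.random.default_rng(0)
n, p = 2000, 200
X = rng.standard_normal((n, p))
# Plant one rank-one "signal" direction on top of the noise
u = rng.standard_normal(p)
X += 3.0 * np.outer(rng.standard_normal(n), u / np.linalg.norm(u))

evals = np.linalg.eigvalsh(X.T @ X / n)
# Small buffer above the asymptotic edge absorbs finite-sample fluctuations
signal = evals[evals > 1.1 * mp_upper_edge(n, p)]
print(len(signal))
```

Eigenvalues above the edge are treated as signal; everything below it is statistically indistinguishable from randomness, which is the kind of theoretically grounded cut-off the paper argues for.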

Paper Nr: 134
Title:

Predicting the MGMT Promoter Methylation Status in T2-FLAIR Magnetic Resonance Imaging Scans Using Machine Learning

Authors:

Martyna Kurbiel, Agata M. Wijata and Jakub Nalepa

Abstract: Glioblastoma is the most common form of brain cancer in adults, and is characterized by one of the worst prognoses, with a median survival of less than one year. Magnetic resonance imaging (MRI) plays a key role in detecting and objectively tracking the disease by extracting quantifiable parameters of the tumor, such as its volume or bidimensional measurements. However, it has been shown that the presence of a specific genetic marker in a lesion, namely methylation of the promoter of the DNA repair enzyme O6-methylguanine-DNA methyltransferase (MGMT), may be effectively used to predict the patient’s responsiveness to chemotherapy. The invasive process of analyzing a tissue sample to verify the MGMT promoter methylation status is time-consuming, and may require performing multiple surgical interventions in longitudinal studies. Thus, building non-invasive techniques for predicting the genetic subtype of glioblastoma is of utmost practical importance, not only to accelerate the overall process of determining the MGMT promoter methylation status in glioblastoma patients, but also to minimize the number of necessary surgeries. In this paper, we tackle this problem and propose an end-to-end machine learning classification pipeline benefitting from radiomic features extracted from brain MRI scans, and validate it on the well-established RSNA-MICCAI Brain Tumor Radiogenomic Classification benchmark dataset.
Download
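The radiomic features mentioned above typically include first-order intensity statistics of the tumor region; a minimal hypothetical sketch (real pipelines usually rely on a dedicated library such as pyradiomics, and the feature subset here is purely illustrative):

```python
import numpy as np

def first_order_features(volume, mask):
    """First-order radiomic features computed over the masked tumor region
    of an MRI volume: mean, standard deviation, skewness, and energy."""
    roi = volume[mask > 0].astype(float)
    mu, sigma = roi.mean(), roi.std()
    return {
        "mean": float(mu),
        "std": float(sigma),
        "skewness": float(((roi - mu) ** 3).mean() / (sigma ** 3 + 1e-12)),
        "energy": float((roi ** 2).sum()),
    }

# Toy 3D volume with a two-voxel "lesion"
vol = np.zeros((4, 4, 4))
vol[1, 1, 1], vol[1, 1, 2] = 3.0, 5.0
mask = (vol > 0).astype(int)
feats = first_order_features(vol, mask)
print(feats)
```

One such feature vector per patient would then be fed to a standard classifier to predict the MGMT promoter methylation label.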

Paper Nr: 136
Title:

Performance Evaluation of the Electrical Appliances Identification System Using the PLAID Database in Independent Mode of House

Authors:

Fateh Ghazali, Abdenour Hacine-Gharbi, Khaled Rouabah and Philippe Ravier

Abstract: In Electrical Appliance Identification (EAI), the Plug Load Appliance Identification Dataset (PLAID) is widely used to develop and benchmark new methods for demand management in electricity networks, in particular automated control and non-intrusive load planning and monitoring. This database contains electrical signals of 11 classes of electrical appliances, recorded in several houses. State-of-the-art EAI systems split PLAID into two parts (one for training and the other for testing). These parts can be organized in a house-dependent mode or a house-independent mode. In the first mode, the signals of each appliance class and house in the testing part have examples in the training part. In contrast, in the second mode, the houses in the testing part have no examples in the training part. In this paper, we propose a comparative study between the performance of a house-dependent EAI system and that of a house-independent one. In addition, to further validate the results of the comparison, we use several classifiers: Gaussian Mixture Models (GMM), Linear Discriminant Analysis (LDA) and Artificial Neural Networks (ANN). The results obtained on PLAID demonstrate that performance in the independent mode is considerably lower than in the dependent mode, which shows that the house’s electrical installation leaves a strong footprint on the input current signal.
Download
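The difference between the two protocols comes down to how the signals are split; the house-independent mode can be sketched as leave-one-house-out cross-validation (function and variable names are illustrative, not taken from the paper):

```python
def house_independent_splits(houses):
    """Leave-one-house-out splits: each test house contributes no examples
    to training, unlike the house-dependent protocol where every house
    appears in both parts."""
    for held_out in sorted(set(houses)):
        train = [i for i, h in enumerate(houses) if h != held_out]
        test = [i for i, h in enumerate(houses) if h == held_out]
        yield held_out, train, test

# Signals indexed 0..5, recorded in three houses
houses = ["A", "A", "B", "B", "C", "C"]
splits = list(house_independent_splits(houses))
print(splits)
```

A classifier that scores well under this split must generalize across electrical installations, which is exactly where the paper reports the performance drop.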

Paper Nr: 144
Title:

A Novel Keystroke Dataset for Preventing Advanced Persistent Threats

Authors:

Xiaofei Wang, Rashik Shadman, Daqing Hou, Faraz Hussain and Stephanie Schuckers

Abstract: Computer system security is indispensable in today’s world due to the large amount of sensitive data stored in such systems. Moreover, user authentication is integral to ensuring computer system security. In this paper, we investigate the potential of a novel keystroke dynamics-based authentication approach for preventing Advanced Persistent Threats (APT) and detecting APT actors. An APT is an extended and planned cyber-attack in which the intruder logs into a system many times over a long period of time to gain administrative access and to steal sensitive data or disrupt the system. Since keystroke dynamics can be made to work whenever an APT actor is typing on the keyboard, we hypothesize that it is naturally a good match for APT detection. Furthermore, keystroke dynamics promises to be non-intrusive and cost-effective, as no additional hardware is required other than the keyboard. In this work, we created a novel dataset consisting of keystroke timings of Unix/Linux IT system administration commands. We evaluated the authentication performance on our novel dataset using three algorithms: the Scaled Manhattan distance and the new distance metric of Zhong et al. (2012), with and without fusion. We compared our results with those on the state-of-the-art CMU dataset. The best 95% confidence interval of the EER for our Linux Command dataset was (0.038, 0.044), which was very close to that of the CMU dataset (0.027, 0.031) despite the small size of our dataset.
Download
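Of the evaluated algorithms, the Scaled Manhattan distance is simple enough to sketch (the timing vectors below are synthetic; the new distance metric of Zhong et al. and the fusion step are not reproduced):

```python
import numpy as np

def scaled_manhattan(train, probe):
    """Anomaly score of a probe keystroke-timing vector against a user's
    training samples: Manhattan distance to the training mean, with each
    feature scaled by its mean absolute deviation."""
    mu = train.mean(axis=0)
    mad = np.mean(np.abs(train - mu), axis=0) + 1e-8
    return float(np.sum(np.abs(probe - mu) / mad))

rng = np.random.default_rng(0)
template = rng.uniform(0.05, 0.3, size=20)            # hypothetical hold/latency times
train = template + 0.01 * rng.standard_normal((50, 20))
genuine = template + 0.01 * rng.standard_normal(20)   # same typing rhythm
impostor = rng.uniform(0.05, 0.3, size=20)            # a different typing rhythm

print(scaled_manhattan(train, genuine), scaled_manhattan(train, impostor))
```

Sweeping a threshold over such scores yields the false accept and false reject rates whose crossing point is the EER reported in the paper.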

Paper Nr: 145
Title:

Intrusion Detection at Railway Tunnel Entrances Using Dynamic Vision Sensors

Authors:

Colin Gebler and Regina Pohle-Fröhlich

Abstract: The surveillance of railway tunnel entrances is integral to ensuring the security of both people and infrastructure. Since 24/7 personal surveillance is not economically feasible, it falls to automated solutions to ensure that no person can intrude unseen. We investigate the use of Dynamic Vision Sensors for this task. A Dynamic Vision Sensor differs from a traditional frame-based camera in that it does not record entire images at a fixed rate. Instead, each pixel outputs events independently and asynchronously whenever a change in brightness occurs at that location. We present a dataset recorded over three months at a railway tunnel entrance, with relevant examples labeled as featuring or not featuring intrusions. Furthermore, we investigate intrusion detection by using neural networks to perform image classification on images generated from the event stream, using established methods to represent the temporal information in that format. Of the models tested, MobileNetV2 achieved the best result, with a classification accuracy of 99.55% on our dataset when differentiating between Event Volumes that do or do not contain people.
Download
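The event-to-image conversion described above can be sketched as a voxel-grid accumulation (a common established representation; the exact Event Volume formulation used in the paper may differ):

```python
import numpy as np

def events_to_voxel_grid(events, num_bins, height, width):
    """Accumulate an asynchronous event stream into a dense (num_bins, H, W)
    grid so a frame-based CNN such as MobileNetV2 can classify it.
    `events` holds rows of (timestamp, x, y, polarity); polarity is signed."""
    grid = np.zeros((num_bins, height, width), dtype=np.float32)
    t = events[:, 0].astype(float)
    x = events[:, 1].astype(int)
    y = events[:, 2].astype(int)
    pol = np.where(events[:, 3] > 0, 1.0, -1.0)

    # Normalize timestamps into [0, num_bins) and bin them
    span = max(t.max() - t.min(), 1e-9)
    bins = ((t - t.min()) / span * (num_bins - 1e-6)).astype(int)
    np.add.at(grid, (bins, y, x), pol)
    return grid

# Two events: an early positive one and a late negative one
events = np.array([[0.00, 1, 2, 1],
                   [0.10, 3, 4, -1]])
grid = events_to_voxel_grid(events, num_bins=2, height=5, width=5)
print(grid.shape)
```

The temporal bins preserve when each brightness change happened within the window, which is the temporal information the abstract refers to.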