ICPRAM 2023 Abstracts


Area 1 - Theory and Methods

Full Papers
Paper Nr: 11
Title:

Multi-task Fusion for Efficient Panoptic-Part Segmentation

Authors:

Sravan K. Jagadeesh, René Schuster and Didier Stricker

Abstract: In this paper, we introduce a novel network that generates semantic, instance, and part segmentation using a shared encoder and effectively fuses them to achieve panoptic-part segmentation. Unifying these three segmentation problems allows for mutually improved and consistent representation learning. To fuse the predictions of all three heads efficiently, we introduce a parameter-free joint fusion module that dynamically balances the logits and fuses them to create panoptic-part segmentation. Our method is evaluated on the Cityscapes Panoptic Parts (CPP) and Pascal Panoptic Parts (PPP) datasets. For CPP, the PartPQ of our proposed model with joint fusion surpasses the previous state-of-the-art by 1.6 and 4.7 percentage points for all areas and segments with parts, respectively. On PPP, our joint fusion outperforms a model using the previous top-down merging strategy by 3.3 percentage points in PartPQ and 10.5 percentage points in PartPQ for partitionable classes.

Paper Nr: 13
Title:

"Why Here and not There?": Diverse Contrasting Explanations of Dimensionality Reduction

Authors:

André Artelt, Alexander Schulz and Barbara Hammer

Abstract: Dimensionality reduction is a popular preprocessing step and a widely used tool in data mining. Transparency, which is usually achieved by means of explanations, is nowadays a widely accepted and crucial requirement of machine learning based systems like classifiers and recommender systems. However, the transparency of dimensionality reduction and other data mining tools has not been considered much yet, although it is crucial to understand their behavior; in particular, practitioners might want to understand why a specific sample got mapped to a specific location. In order to (locally) understand the behavior of a given dimensionality reduction method, we introduce the abstract concept of contrasting explanations for dimensionality reduction, and apply a realization of this concept to the specific application of explaining two-dimensional data visualization.

Paper Nr: 19
Title:

Rainfuzz: Reinforcement-Learning Driven Heat-Maps for Boosting Coverage-Guided Fuzzing

Authors:

Lorenzo Binosi, Luca Rullo, Mario Polino, Michele Carminati and Stefano Zanero

Abstract: Fuzzing is a dynamic analysis technique that repeatedly executes the target program with many different inputs to trigger abnormal behavior, such as a crash. One of the most successful techniques consists in generating inputs to increase code coverage by using a mutational approach: fuzzers of this type maintain a population of inputs, perform mutations on the inputs in the current population, and add mutated inputs to the population if they discover new code coverage in the target program. Researchers are continuously looking for techniques to increase the efficiency of fuzzers; one of these techniques consists in generating heat-maps for targeting specific bytes during the mutation of the input, as not all bytes might be useful for controlling the program's workflow. We propose the first approach in the literature that uses reinforcement learning for building heat-maps, by formalizing the problem of choosing the position to be mutated within the input as a reinforcement-learning problem. We model the policy by means of a neural network, and we train it by using Proximal Policy Optimization (PPO). We implement our approach in Rainfuzz, and we show the effectiveness of its heat-maps by comparing Rainfuzz against an equivalent fuzzer that performs mutations at random positions. We achieve the best performance by running AFL++ and Rainfuzz in parallel (in a collaborative fuzzing setting), outperforming a setting where we run two AFL++ instances in parallel.

Paper Nr: 28
Title:

Hard Spatio-Multi Temporal Attention Framework for Driver Monitoring at Nighttime

Authors:

Karam Abdullah, Imen Jegham, Mohamed A. Mahjoub and Anouar Ben Khalifa

Abstract: Driver distraction and inattention have recently been reported to be the major factor in traffic crashes, even with the appearance of various advanced driver assistance systems. In fact, driver monitoring is a challenging vision-based task due to the high number of issues present, including the dynamic and cluttered background and the high similarity between in-vehicle actions. This task becomes even more complex at nighttime because of the low illumination. In this paper, to efficiently recognize driver actions at nighttime, we propose, for the first time, a hard spatio-multi-temporal attention network that focuses exclusively on the dynamic spatial information of the driving scene, and more specifically on driver motion; then, using a batch split unit, only relevant temporal information is considered in the classification. Experiments show that our proposed approach achieves high recognition accuracy compared to state-of-the-art methods on 3MDAD, the only realistic dataset available.

Paper Nr: 31
Title:

Dealing with Overfitting in the Context of Liveness Detection Using FeatherNets with RGB Images

Authors:

Miguel Leão and Nuno Gonçalves

Abstract: With the increased use of machine learning for liveness detection solutions come some shortcomings like overfitting, where the model adapts perfectly to the training set and becomes unusable with the testing set, defeating the purpose of machine learning. This paper proposes how to approach overfitting without altering the model used, by focusing on the input and output information of the model. The input approach focuses on the information obtained from the different modalities present in the datasets used, as well as on how varied the information in these datasets is, not only in the number of spoof types but also in the ambient conditions when the videos were captured. The output approaches focus on both the loss function, which is calculated from the model's output, propagated backwards, and thus affects the actual "learning", and the interpretation of that output to define which predictions are considered bona fide or spoof. Throughout this work, we were able to reduce the overfitting effect, measured as the difference between the best epoch and the average of the last fifty epochs, from 36.57% to 3.63%.

Paper Nr: 36
Title:

Deep Learning for Diagonal Earlobe Crease Detection

Authors:

Sara L. Almonacid-Uribe, Oliverio J. Santana, Daniel Hernández-Sosa and David Freire-Obregón

Abstract: An article published on Medical News Today in June 2022 presented a fundamental question in its title: Can an earlobe crease predict heart attacks? The author explained that end arteries supply the heart and ears. In other words, if they lose blood supply, no other arteries can take over, resulting in tissue damage. Consequently, some earlobes have a diagonal crease, line, or deep fold that resembles a wrinkle. In this paper, we take a step toward detecting this specific marker, commonly known as DELC or Frank’s Sign. For this reason, we have made the first DELC dataset available to the public. In addition, we have investigated the performance of numerous cutting-edge backbones on annotated photos. Experimentally, we demonstrate that it is possible to solve this challenge by combining pre-trained encoders with a customized classifier to achieve 97.7% accuracy. Moreover, we have analyzed the backbone trade-off between performance and size, estimating MobileNet as the most promising encoder.

Paper Nr: 43
Title:

DNN Pruning and Its Effects on Robustness

Authors:

Sven Mantowsky, Firas Mualla, Saqib S. Bukhari and Georg Schneider

Abstract: The popularity of deep neural networks (DNNs) and their application on embedded systems and edge devices is increasing rapidly. Most embedded systems are limited in their computational capabilities and memory space. To meet these restrictions, the DNNs need to be compressed while keeping their accuracy, for instance, by pruning the least important neurons or filters. However, the pruning may introduce other effects on the model, such as influencing the robustness of its predictions. To analyze the impact of pruning on the model robustness, we employ two metrics: heatmap-based correlation coefficient (HCC) and expected calibration error (ECE). On the one hand, the HCC gives insight into the extent to which a model and its compressed version tend to use the same input features. On the other hand, using the difference in the ECE between a model and its compressed version, we can analyze the side effect of pruning on the model's decision reliability. The experiments were conducted for image classification and object detection problems. For both types of problems, our results show that some off-the-shelf pruning methods considerably improve the model calibration without being specifically designed for this purpose. For instance, the ECE of a VGG16 classifier is improved by 35% after being compressed by 50% using the H-Rank pruning method with a negligible loss in accuracy. Larger compression ratios reduce the accuracy as expected but may improve the calibration drastically (e.g. ECE is reduced by 77% under a compression ratio of 70%). Moreover, the HCC measures feature saliency under model compression and tends to correlate positively, as expected, with the model's accuracy. The proposed metrics can be employed for comparing pruning methods from a perspective other than the commonly considered trade-off between accuracy and compression ratio.
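
For readers unfamiliar with the ECE mentioned above, the following is a minimal sketch of the commonly used binned estimate: predictions are grouped by confidence, and the gap between accuracy and mean confidence is averaged over the bins, weighted by bin size. The function name, the number of bins, and the equal-width binning convention are assumptions made for illustration; the abstract does not state which ECE variant the authors used.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: size-weighted mean of |accuracy - mean confidence| per bin.

    confidences: predicted confidence of the chosen class, each in [0, 1].
    correct: truthy/falsy flags saying whether each prediction was right.
    n_bins: number of equal-width bins (an assumed, commonly used default).
    """
    if len(confidences) != len(correct):
        raise ValueError("inputs must have the same length")
    # Each bin accumulates [count, sum of confidences, number correct].
    bins = [[0, 0.0, 0] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Equal-width bins over [0, 1]; a confidence of 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx][0] += 1
        bins[idx][1] += conf
        bins[idx][2] += 1 if ok else 0
    total = len(confidences)
    ece = 0.0
    for count, conf_sum, n_correct in bins:
        if count:
            accuracy = n_correct / count
            mean_conf = conf_sum / count
            ece += (count / total) * abs(accuracy - mean_conf)
    return ece
```

A lower value means that the stated confidences track the observed accuracy more closely, which is the sense in which a pruned model can be "better calibrated" than the original.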

Paper Nr: 55
Title:

Subcaterpillar Isomorphism Between Caterpillars: Subtree Isomorphism Restricted Text and Pattern Trees to Caterpillars

Authors:

Tomoya Miyazaki and Kouichi Hirata

Abstract: In this paper, as a pattern matching problem for rooted labeled caterpillars (caterpillars, for short), we discuss subcaterpillar isomorphism between caterpillars, that is, the problem of deciding whether or not a pattern caterpillar is a subcaterpillar of a text caterpillar. We then design algorithms to solve it by simplifying the algorithms of Miyazaki and Hirata (2022) for subcaterpillar isomorphism between a caterpillar and a tree, which decide whether a pattern caterpillar is a subcaterpillar of a text tree. These algorithms run in O(hHσ) time and O(h) space, where h is the height of a pattern caterpillar, H is the height of a text caterpillar and σ is the number of labels in the caterpillars. Finally, we give experimental results for these algorithms, comparing them with subcaterpillar isomorphism and caterpillar inclusion.

Paper Nr: 69
Title:

Unsupervised Tree Detection and Counting via Region-Based Circle Fitting

Authors:

Smaragda Markaki and Costas Panagiotakis

Abstract: Automatic tree detection and counting is a very important task for many areas such as environmental protection, agricultural planning, crop yield estimation and monitoring of replanted forest areas. This paper presents an unsupervised method for tree detection from high resolution UAV imagery based on a modified version of the Decremental Ellipse Fitting Algorithm (DEFA). The proposed Decremental Circle Fitting Algorithm (DCFA) works similarly to DEFA, with the main difference that DCFA uses circles instead of ellipses. According to DCFA, the skeleton of the 2D shape is calculated first, followed by the initialization of the circle hypotheses and the application of the Gaussian Mixture Model Expectation Maximization algorithm. Finally, model evaluation is performed based on the Akaike Information Criterion. The DCFA method was tested on the Acacia-6 dataset, which depicts six-month-old acacia trees and was collected with Unmanned Aerial Vehicles in Southeast Asia, and it exhibits high performance compared with state-of-the-art unsupervised and supervised methods.
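
The model evaluation step can be illustrated with the textbook form of the Akaike Information Criterion, AIC = 2k - 2 ln L, where k is the number of free parameters and L the maximized likelihood; the candidate with the lowest AIC is preferred. The sketch below is only that generic rule. The helper names, the candidate labels, the parameter counts (three per circle) and the log-likelihood values are all hypothetical, and the exact AIC variant used in DCFA is not given in the abstract.

```python
def aic(num_params, log_likelihood):
    """Textbook Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * num_params - 2 * log_likelihood


def select_by_aic(candidates):
    """Return the label of the candidate with the lowest AIC.

    candidates: iterable of (label, num_params, log_likelihood) tuples.
    """
    return min(candidates, key=lambda c: aic(c[1], c[2]))[0]


# Hypothetical numbers for illustration only: more circles fit the shape
# better (higher log-likelihood) but cost more parameters.
hypotheses = [
    ("2 circles", 6, -120.0),   # AIC = 252
    ("3 circles", 9, -100.0),   # AIC = 218
    ("4 circles", 12, -99.0),   # AIC = 222
]
best = select_by_aic(hypotheses)
```

With these made-up values the third circle pays for itself, while the fourth improves the fit too little to justify its extra parameters.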

Paper Nr: 72
Title:

Gradient Clipping in Deep Learning: A Dynamical Systems Perspective

Authors:

Arunselvan Ramaswamy

Abstract: Neural networks are ubiquitous components of Machine Learning (ML) algorithms. However, training them is challenging due to problems associated with exploding and vanishing loss-gradients. Gradient clipping is shown to effectively combat both the vanishing gradients and the exploding gradients problems. As the name suggests, gradients are clipped in order to prevent large updates. At the same time, very small neural network weights are updated using larger step-sizes. Although widely used in practice, there is very little theory surrounding clipping. In this paper, we analyze two popular gradient clipping techniques: the classic norm-based gradient clipping method and the adaptive gradient clipping technique. We prove that gradient clipping ensures numerical stability with very high probability. Further, clipping-based stochastic gradient descent converges to a set of neural network weights that minimizes the average scaled training loss in a local sense. The averaging is with respect to the distribution that generated the training data. The scaling is a consequence of gradient clipping. We use tools from the theory of dynamical systems for the presented analysis.
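
As a point of reference, classic norm-based clipping is usually written as g <- g * min(1, c / ||g||): a gradient whose Euclidean norm exceeds a threshold c is rescaled to have norm exactly c, and smaller gradients pass through unchanged. The following is a minimal sketch of that standard rule on a plain list of floats. The function name and the flat-list representation are assumptions for illustration, and the adaptive variant analyzed in the paper is not shown.

```python
import math


def clip_by_norm(grad, max_norm):
    """Rescale grad so that its Euclidean norm is at most max_norm.

    grad: list of floats (a flattened gradient; an illustrative assumption).
    max_norm: positive clipping threshold c.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        # Small gradients are left untouched.
        return list(grad)
    # Large gradients keep their direction but are shrunk to norm max_norm.
    scale = max_norm / norm
    return [g * scale for g in grad]
```

For example, `clip_by_norm([3.0, 4.0], 1.0)` shrinks a gradient of norm 5 to norm 1 while preserving its direction.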

Paper Nr: 75
Title:

Distance Transform in Parallel Logarithmic Complexity

Authors:

Majid Banaeyan and Walter G. Kropatsch

Abstract: Nowadays a huge amount of digital data is generated every moment in a broad spectrum of application domains such as biomedical imaging, document processing, geosciences, remote sensing, video surveillance, etc. Processing such big data requires an efficient data structure, encouraging algorithms with lower complexity and parallel operations. In this paper, first, a new method for computing the distance transform (DT) as the fundamental operation in binary images is presented. The method computes the DT with parallel logarithmic complexity O(log(n)), where n is the maximum diameter of the largest foreground region in the 2D binary image. Second, we define the DT in the combinatorial map (CM) structure. In the CM, by replacing each edge with two darts, a smoother DT with double resolution is derived. Moreover, we compute n different distances for the nD-map. Both methods use the hierarchical irregular pyramid structure and have the advantage of preserving topological information between regions. The operations of the proposed algorithms are totally local and lead to parallel implementations. The GPU implementation of the algorithm has high performance, while the bottleneck is the bandwidth of the memory or, equivalently, the number of available independent processing elements. Finally, the logarithmic complexity of the algorithm speeds up the execution and makes it particularly suitable for large images.
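
To make concrete what a distance transform computes, the sketch below is a simple sequential reference: a multi-source breadth-first search that assigns each pixel its city-block distance to the nearest background pixel. It is not the parallel pyramid-based method of the paper. The function name, the 0/1 image encoding and the choice of 4-connectivity are assumptions for illustration.

```python
from collections import deque


def distance_transform(image):
    """City-block distance from every pixel to the nearest background pixel.

    image: list of equal-length rows, with 0 = background and 1 = foreground.
    Returns a grid of ints; entries stay None if the image has no background.
    """
    rows, cols = len(image), len(image[0])
    dist = [[None] * cols for _ in range(rows)]
    queue = deque()
    # Every background pixel is a BFS source at distance 0.
    for r in range(rows):
        for c in range(cols):
            if image[r][c] == 0:
                dist[r][c] = 0
                queue.append((r, c))
    # Expand outward one 4-connected step at a time.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and dist[nr][nc] is None:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist
```

This baseline visits each pixel once, so its running time grows with the image size, which is the cost that a parallel logarithmic scheme aims to avoid.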

Paper Nr: 94
Title:

The Effects of Character-Level Data Augmentation on Style-Based Dating of Historical Manuscripts

Authors:

Lisa Koopmans, Maruf A. Dhali and Lambert Schomaker

Abstract: Identifying the production dates of historical manuscripts is one of the main goals for paleographers when studying ancient documents. Automated methods can provide paleographers with objective tools to estimate dates more accurately. Previously, statistical features have been used to date digitized historical manuscripts based on the hypothesis that handwriting styles change over periods. However, the sparse availability of such documents poses a challenge in obtaining robust systems. Hence, the research of this article explores the influence of data augmentation on the dating of historical manuscripts. Linear Support Vector Machines were trained with k-fold cross-validation on textural and grapheme-based features extracted from historical manuscripts of different collections, including the Medieval Paleographical Scale, early Aramaic manuscripts, and the Dead Sea Scrolls. Results show that training models with augmented data improves the performance of historical manuscript dating by 1% to 3% in cumulative scores. Additionally, this indicates further enhancement possibilities by considering models specific to the features and the documents' scripts.

Paper Nr: 102
Title:

Evaluation of Induced Expert Knowledge in Causal Structure Learning by NOTEARS

Authors:

Jawad Chowdhury, Rezaur Rashid and Gabriel Terejanu

Abstract: Causal modeling provides us with powerful counterfactual reasoning and interventional mechanisms to generate predictions and reason under various what-if scenarios. However, causal discovery using observational data remains a nontrivial task due to unobserved confounding factors, finite sampling, and changes in the data distribution. These can lead to spurious cause-effect relationships. To mitigate these challenges in practice, researchers augment causal learning with known causal relations. The goal of the paper is to study the impact of expert knowledge on causal relations in the form of additional constraints used in the formulation of the nonparametric NOTEARS. We provide a comprehensive set of comparative analyses of biasing the model using different types of knowledge. We found that (i) knowledge that corrects the mistakes of the NOTEARS model can lead to statistically significant improvements, (ii) constraints on active edges have a larger positive impact on causal discovery than inactive edges, and surprisingly, (iii) the induced knowledge does not correct on average more incorrect active and/or inactive edges than expected. We also demonstrate the behavior of the model and the effectiveness of domain knowledge on a real-world dataset.

Paper Nr: 103
Title:

Image Inpainting Network Based on Deep Fusion of Texture and Structure

Authors:

Huilai Liang, Xichong Ling and Siyu Xia

Abstract: With the recent development of deep learning techniques, automatic image inpainting has gained wider applications in computer vision and has also become a challenging topic in image processing. Recent methods typically make use of texture features in the images to make the results more realistic. However, this can lead to artifacts in the processed images; one reason is that the structural features in the image are ignored. To address this problem, we propose an image inpainting method based on deep fusion of texture and structure. Specifically, we design a dual-pyramid encoder-decoder network for preliminary fusion of texture and structure. A layer-by-layer fusion network of texture and structure is then applied to further strengthen the fusion of texture and structure features. In order to strengthen the consistency of texture and structure, we construct a multi-gated feature merging network to achieve a more realistic inpainting effect. Experiments are conducted on the CelebA and Place2 datasets. Qualitative and quantitative comparisons demonstrate that our model outperforms state-of-the-art models.

Paper Nr: 136
Title:

Towards Improved Indoor Location with Unmodified RFID Systems

Authors:

Rui Santos, Ricardo Alexandre, Pedro Marques, Mário Antunes, João P. Barraca, João Silva and Nuno Ferreira

Abstract: The management of health systems has been one of the main challenges in several European countries, especially where the aging population is increasing. This led to the adoption of smarter technologies as a means to automate the processes within hospitals. One of the technologies adopted is active location solutions, which allow the staff within the hospital to quickly find any sort of entity, from key persons to equipment. In this work, we focus on developing a reliable method for active location based on RSSI antennas, passive tags, and ML models. Because the tags are passive, the use of RSSI is discouraged: in our experiments, it did not vary sufficiently. We explored the usage of alternative features, such as the number of activations per tag within a time slot. Throughout our evaluation, we were able to reach an average error of 0.275 m, which is similar to existing RSSI IPS.

Paper Nr: 142
Title:

On the Hardness and Necessity of Supervised Concept Drift Detection

Authors:

Fabian Hinder, Valerie Vaquet, Johannes Brinkrolf and Barbara Hammer

Abstract: The notion of concept drift refers to the phenomenon that the distribution generating the observed data changes over time. If drift is present, machine learning models can become inaccurate and need adjustment. Many technologies for learning with drift rely on the interleaved test-train error to detect drift and trigger model updates. This type of drift detection is also used for monitoring systems aiming to detect anomalies. In this work, we analyze the relationship between concept drift and change of loss on a theoretical level. We focus on the sensitivity, specificity, and localization of change points in drift detection, with an emphasis on the detection of real concept drift. With this focus, we compare the supervised and unsupervised setups which are already studied in the literature. We show that, unlike the unsupervised case, there is no universal supervised drift detector and that the assumed correlation between model loss and concept drift is invalid. We support our theoretical findings with empirical evidence for a combination of different models and data sets. We find that many state-of-the-art supervised drift detection methods suffer from insufficient sensitivity and specificity, and that unsupervised drift detection methods are a promising addition to existing supervised approaches.

Paper Nr: 145
Title:

Noise Robustness of Data-Driven Star Classification

Authors:

Floyd Hepburn-Dickins and Michael Edwards

Abstract: Celestial navigation has fallen into the background in light of newer technologies such as global positioning systems, but research into its core component, star pattern recognition, has remained an active area of study. We examine these methods and the viability of a data-driven approach to detecting and recognising stars within images taken from the Earth's surface. We show that synthetic datasets, necessary due to a lack of labelled real image datasets, are able to appropriately simulate the night sky from a terrestrial perspective and that such an implementation can successfully perform star pattern recognition in this domain. In this work we apply three kinds of noise in a parametric fashion: positional noise, false star noise, and dropped star noise. Results show that a pattern mining approach can accurately identify stars from night sky images, and they show the impact of the above noise types on classifier performance.

Paper Nr: 158
Title:

Generalized Torsion-Curvature Scale Space Descriptor for 3-Dimensional Curves

Authors:

Lynda Ayachi, Majdi Jribi and Faouzi Ghorbel

Abstract: In this paper, we propose a new method for representing 3D curves called the Generalized Torsion Curvature Scale Space (GTCSS) descriptor. This method is based on the calculation of curvature and torsion measures at different scales, and it is invariant under rigid transformations. To address the challenges associated with estimating these measures, we employ a multi-scale technique in our approach. We evaluate the effectiveness of our method through experiments, where we extract space curves from 3D objects and apply our method to pose estimation tasks. Our results demonstrate the effectiveness of the GTCSS descriptor for representing 3D curves and its potential for use in a variety of computer vision applications.

Short Papers
Paper Nr: 3
Title:

MorDeephy: Face Morphing Detection via Fused Classification

Authors:

Iurii Medvedev, Farhad Shadmand and Nuno Gonçalves

Abstract: Face morphing attack detection (MAD) is one of the most challenging tasks in the field of face recognition nowadays. In this work, we introduce a novel deep learning strategy for single-image face morphing detection, which implies the discrimination of morphed face images along with a sophisticated face recognition task in a complex classification scheme. It is directed toward learning the deep facial features, which carry information about the authenticity of these features. Our work also introduces several additional contributions: a public and easy-to-use face morphing detection benchmark and the results of our wild datasets filtering strategy. Our method, which we call MorDeephy, achieved state-of-the-art performance and demonstrated a prominent ability for generalizing the task of morphing detection to unseen scenarios.

Paper Nr: 8
Title:

Debiasing Sentence Embedders Through Contrastive Word Pairs

Authors:

Philip Kenneweg, Sarah Schröder, Alexander Schulz and Barbara Hammer

Abstract: Over the last years, various sentence embedders have been an integral part of the success of current machine learning approaches to Natural Language Processing (NLP). Unfortunately, multiple sources have shown that the bias inherent in the datasets upon which these embedding methods are trained is learned by them. A variety of different approaches to remove biases in embeddings exists in the literature. Most of these approaches are applicable to word embeddings and in fewer cases to sentence embeddings. It is problematic that most debiasing approaches are directly transferred from word embeddings; as a result, they fail to take into account the nonlinear nature of sentence embedders and the embeddings they produce. It has been shown in the literature that bias information is still present if sentence embeddings are debiased using such methods. In this contribution, we explore an approach to remove linear and nonlinear bias information for NLP solutions, without impacting downstream performance. We compare our approach to common debiasing methods on classical bias metrics and on bias metrics which take nonlinear information into account.

Paper Nr: 18
Title:

From Xception to NEXcepTion: New Design Decisions and Neural Architecture Search

Authors:

Hadar Shavit, Filip Jatelnicki, Pol Mor-Puigventós and Wojtek Kowalczyk

Abstract: In this paper, we present a modified Xception architecture, the NEXcepTion network. Our network has significantly better performance than the original Xception, achieving top-1 accuracy of 81.5% on the ImageNet validation dataset (an improvement of 2.5%) as well as a 28% higher throughput. Another variant of our model, NEXcepTion-TP, reaches 81.8% top-1 accuracy, similar to ConvNeXt (82.1%), while having a 27% higher throughput. Our model is the result of applying improved training procedures and new design decisions combined with an application of Neural Architecture Search (NAS) on a smaller dataset. These findings call for revisiting older architectures and reassessing their potential when combined with the latest enhancements. Our code is available at https://github.com/hadarshavit/NEXcepTion.

Paper Nr: 23
Title:

Explainable Outlier Detection Using Feature Ranking for k-Nearest Neighbors, Gaussian Mixture Model and Autoencoders

Authors:

Lucas Krenmayr and Markus Goldstein

Abstract: Outlier detection is the process of detecting individual data points that deviate markedly from the majority of the data. Typical applications include intrusion detection and fraud detection. In contrast to the well-known classification tasks in machine learning, outlier detection commonly uses unsupervised learning techniques with unlabeled data. Recent algorithms mainly focus on detecting the outliers, but do not provide any insight into what caused the outlierness. Therefore, this paper presents two model-dependent approaches to provide explainability in multivariate outlier detection using feature ranking. The approaches are based on the k-nearest neighbors and Gaussian Mixture Model algorithms. In addition, these approaches are compared to an existing method based on an autoencoder neural network. For a qualitative evaluation and to illustrate the strengths and weaknesses of each method, they are applied to one synthetically generated and two real-world data sets. The results show that all methods can identify the most relevant features in synthetic and real-world data. It is also found that the explainability depends on the model being used: the Gaussian Mixture Model shows its strength in explaining outliers caused by not following feature correlations. The k-nearest neighbors and autoencoder approaches are more general and suitable for data that does not follow a Gaussian distribution.

Paper Nr: 24
Title:

Topology-Preserving Reductions on (18,12) Pictures of the Face-Centered Cubic Grid

Authors:

Gábor Karai, Péter Kardos and Kálmán Palágyi

Abstract: Reductions transform binary pictures only by changing some black points to white ones. Topology preservation is a major concern of thinning algorithms that are composed of reductions. For (18,12) binary pictures on the 3D face-centered cubic (FCC) grid, we propose four sufficient conditions for topology-preserving parallel reductions that can change a set of black points simultaneously. The first two conditions examine some configurations of changed points, and they provide methods of verifying that formerly constructed parallel reductions preserve the topology. The other two conditions focus on individual points, directly provide deletion rules of topology-preserving parallel reductions, and make it possible to establish topologically correct parallel thinning algorithms.

Paper Nr: 26
Title:

Severity of Catastrophic Forgetting in Object Detection for Autonomous Driving

Authors:

Christian Witte, René Schuster, Syed S. Bukhari, Patrick Trampert, Didier Stricker and Georg Schneider

Abstract: Incorporating unseen data in pre-trained neural networks remains a challenging endeavor, as complete retraining is often impracticable. Yet, training the networks sequentially on data with different distributions can lead to performance degradation for previously learned data, known as catastrophic forgetting. The sequential training paradigm and the mitigation of catastrophic forgetting are the subject of Continual Learning (CL). The phenomenon of forgetting poses a challenge for applications with changing distributions and prediction objectives, including Autonomous Driving (AD). Our work aims to illustrate the severity of catastrophic forgetting for object detection for class- and domain-incremental learning. We propose four hypotheses, as we investigate the impact of the ordering of sequential increments and the underlying data distribution of AD datasets. Further, the influence of different object detection architectures is examined. The results of our empirical study highlight the major effects of forgetting for class-incremental learning. Moreover, we show that domain-incremental learning suffers less from forgetting but is highly dependent on the design of the experiments and choice of architecture.

Paper Nr: 27
Title:

Co-Incrementation: Combining Co-Training and Incremental Learning for Subject-Specific Facial Expression Recognition

Authors:

Jordan Gonzalez, Thibault Geoffroy, Aurelia Deshayes and Lionel Prevost

Abstract: In this work, we propose to adapt a generic emotion recognizer to a set of individuals in order to improve its accuracy. As this adaptation is weakly supervised, we propose a hybrid framework, the so-called co-incremental learning that combines semi-supervised co-training and incremental learning. The classifier we use is a specific random forest whose internal nodes are nearest class mean classifiers. It has the ability to learn incrementally data covariate shift. We use it in a co-training process by combining multiple view of the data to handle unlabeled data and iteratively learn the model. We performed several personalization and provided a comparative study between these models and their influence on the co-incrementation process. Finally, an in-depth study of the behavior of the models before, during and after the co-incrementation process was carried out. The results, presented on a benchmark dataset, show this hybrid process increases the robustness of the model, with only a few labeled data.
Download
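
The nearest class mean classifier named in the abstract above lends itself to incremental learning, because each class mean can be updated one sample at a time. The following is a minimal sketch of that idea only; it is not the authors' random forest, and the class name `NearestClassMean` and its methods are hypothetical names chosen for illustration.

```python
class NearestClassMean:
    """Nearest class mean classifier with running-mean updates (illustrative)."""

    def __init__(self):
        self.means = {}   # label -> current mean vector (list of floats)
        self.counts = {}  # label -> number of samples seen for that label

    def partial_fit(self, x, label):
        """Update the mean of `label` with a single new sample `x`."""
        if label not in self.means:
            self.means[label] = list(x)
            self.counts[label] = 1
            return
        self.counts[label] += 1
        n = self.counts[label]
        mean = self.means[label]
        for i, xi in enumerate(x):
            # Running mean: m_new = m_old + (x - m_old) / n
            mean[i] += (xi - mean[i]) / n

    def predict(self, x):
        """Return the label whose mean is closest to `x` (squared Euclidean)."""
        def sq_dist(mean):
            return sum((a - b) ** 2 for a, b in zip(x, mean))
        return min(self.means, key=lambda label: sq_dist(self.means[label]))
```

Because only a mean and a count are stored per class, new samples can be absorbed without revisiting earlier data.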

Paper Nr: 48
Title:

Visual Question Answering Analysis: Datasets, Methods, and Image Featurization Techniques

Authors:

Vijay Kumari, Abhimanyu Sethi, Yashvardhan Sharma and Lavika Goel

Abstract: Holistic scene understanding is a long-standing objective of core tenets of Artificial Intelligence (AI). Multimodal tasks that aim to synergize capabilities spanning multiple domains, such as visual-linguistic capabilities, into intelligent systems are thus a desideratum for the next step in AI. Visual Question Answering (VQA) systems that integrate Computer Vision and Natural Language Processing tasks into the task of answering natural language questions about an image represent one such domain. There is a need to explore Deep Learning techniques that can help to improve such systems beyond the language biases of real-world priors that presently hinder them from serving as a veritable touchstone for holistic scene understanding. Furthermore, the effectiveness of Transformer architecture for the image featurization pipeline of VQA systems remains untested. Hence, an exhaustive study on the performance of various model architectures with varied training conditions on VQA datasets like VizWiz and VQA v2 is imperative to further this area of research. This study explores architectures that utilize image and question co-attention for the task of VQA and several CNN architectures, including ResNet, VGG, EfficientNet, and DenseNet. Vision Transformer architecture is also explored for image featurization, and a myriad of loss functions such as cross-entropy, focal loss, and UniLoss are employed for training the models. Finally, the trained model is deployed using Flask, and a GUI for the same has been implemented that lets users input an image and accompanying questions about the image to generate an answer in response.
Download
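
Of the loss functions listed in the abstract above, focal loss has a compact standard definition: cross-entropy scaled by a factor that shrinks the contribution of well-classified examples. The sketch below shows the per-example form for the probability assigned to the true class; the default values of `gamma` and `alpha` are assumptions, since the abstract does not state the settings used.

```python
import math


def cross_entropy(p_true):
    """Cross-entropy for one example, given the probability of the true class."""
    return -math.log(p_true)


def focal_loss(p_true, gamma=2.0, alpha=1.0):
    """Focal loss for one example: -alpha * (1 - p)^gamma * log(p).

    gamma and alpha are illustrative defaults, not values from the paper.
    With gamma = 0 and alpha = 1 this reduces to plain cross-entropy.
    """
    return -alpha * (1.0 - p_true) ** gamma * math.log(p_true)
```

For a confidently correct prediction such as `p_true = 0.9`, the modulating factor `(1 - p_true) ** gamma` is small, so the example contributes far less than it would under cross-entropy.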

Paper Nr: 50
Title:

Two-Step Graph Classification on the Basis of Hierarchical Graphs

Authors:

Anthony Gillioz and Kaspar Riesen

Abstract: A common method to solve the non-trivial task of classifying general graphs is to employ graph matching in conjunction with a distance- or similarity-based classifier. Unfortunately, optimal graph matching has a high computational complexity, hindering its application on large graphs. In order to make matching feasible for larger graphs as well, it has been proposed to work on size-reduced graphs rather than on their original counterparts. In the present paper, we propose a novel method that is based on this idea to further reduce the processing time. In particular, we change the standard classification scheme into a two-step classification method. In the first step, we start with strongly reduced versions of the graphs – having a manageable number of nodes – in order to prune as many graphs as possible. The second step – the actual classification – is then performed on the remaining graphs only (in their original size). We conduct experimental evaluations on five datasets to research the benefits and limitations of this novel two-step graph classification method. The main finding is that we can substantially speed up the graph matching while preserving satisfactory classification accuracy.
Download
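
The two-step scheme described in the abstract above can be sketched generically as "prune with a cheap distance, then classify with the expensive one". The code below is only an outline under stated assumptions: `cheap_dist` stands in for matching on reduced graphs, `exact_dist` for matching on the original graphs, and the `keep` and `k` parameters and the nearest-neighbour vote are hypothetical choices not taken from the paper.

```python
from collections import Counter


def two_step_classify(query, training, cheap_dist, exact_dist, keep=3, k=1):
    """Classify `query` against `training`, a list of (graph, label) pairs.

    Step 1 ranks all training graphs by the cheap distance and keeps only the
    `keep` closest ones. Step 2 applies the expensive distance to the
    survivors and returns the majority label of the `k` nearest.
    """
    # Step 1: pruning on the cheap (reduced-graph) distance.
    candidates = sorted(training, key=lambda item: cheap_dist(query, item[0]))[:keep]
    # Step 2: actual classification on the remaining graphs only.
    nearest = sorted(candidates, key=lambda item: exact_dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

The expensive distance is evaluated `keep` times instead of once per training graph, which is where the speed-up comes from; pruning too aggressively can discard the correct neighbour.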

Paper Nr: 52
Title:

LiDAR and Camera Based 3D Object Classification in Unknown Environments Using Weakly Supervised Learning

Authors:

Siva R. Bairaju, Srinivas Yalagam and Krishna R. Konda

Abstract: Sensor redundancy is often the method relied upon in various applications to ensure robust and secure operation. Autonomous Driving (AD) and Advanced Driver Assistance Systems (ADAS) are no exceptions. Camera and LiDAR are the principal sensors used in both applications. LiDAR is primarily used for object localization due to its active nature. A camera, on the other hand, is used for object classification owing to its dense response. In this paper, we present a novel neural network and training methodology for camera-based reinforcement of LiDAR object classification. The proposed method is also useful as a domain adaptation framework in an unknown environment. A pre-trained LiDAR-based object classification network is iteratively trained based on camera classification output to achieve continual improvement while in operation. The proposed system has been tested on benchmark datasets and performs well when compared with the state of the art.
Download

Paper Nr: 63
Title:

Sequential Spatial Transformer Networks for Salient Object Classification

Authors:

David Dembinsky, Fatemeh Azimi, Federico Raue, Jörn Hees, Sebastian Palacio and Andreas Dengel

Abstract: The standard classification architectures are designed and trained for obtaining impressive performance on dedicated image classification datasets, which usually contain images with a single object located at the image center. However, their accuracy drops when this assumption is violated, e.g., if the target object is cluttered with background noise or if it is not centered. In this paper, we study salient object classification: a more realistic scenario where there are multiple object instances in the scene, and we are interested in classifying the image based on the label corresponding to the most salient object. Inspired by previous works on Reinforcement Learning and Spatial Transformer Networks, we propose a model equipped with a trainable focus mechanism, which improves classification accuracy. Our experiments on the PASCAL VOC dataset show that the method is capable of increasing the intersection-over-union of the salient object, which improves the classification accuracy by 1.82 pp overall, and 3.63 pp for smaller objects. We provide an analysis of the failing cases, discussing different aspects such as dataset bias and saliency definition on the classification output.
Download
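
The abstract above reports results in terms of intersection-over-union (IoU). For reference, a minimal sketch of IoU for two axis-aligned boxes follows; the `(x_min, y_min, x_max, y_max)` box layout is an assumption made for this example, not a detail from the paper.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    # Width and height of the overlap, clamped at zero for disjoint boxes.
    overlap_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    overlap_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = overlap_w * overlap_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0
```

A focus window that covers more of the salient object and less of the background yields a higher IoU with the object's ground-truth box.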

Paper Nr: 66
Title:

Estimating Electric Vehicle Driving Range with Machine Learning

Authors:

David Albuquerque, Artur Ferreira and David Coutinho

Abstract: In the past years, we have witnessed an increase in the use of electric vehicles (EV), which are now widely accepted as reliable and eco-friendly means of transportation. When choosing an EV, usually one of the key parameters of choice for the consumer is its driving range (DR) capability. The DR depends on many factors that should be addressed when predicting its value. In some cases, the existing heuristic techniques for DR estimation provide values with large variation, which may cause driver anxiety. In this paper, we explore the use of machine learning (ML) techniques to estimate the DR. From publicly available data, we build a dataset with EV data suitable to estimate the DR. Then, we resort to regression techniques on models learned on the dataset, evaluated with standard metrics. The experimental results show that regression techniques perform adequate and smooth estimation of the DR value on both short and long trips, avoiding the need to use the previous heuristic techniques, thus minimizing driver anxiety and allowing better trip planning.
Download

Paper Nr: 81
Title:

When Simple Statistical Algorithms Outperform Deep Learning: A Case of Keystroke Dynamics

Authors:

Ahmed A. Wahab and Daqing Hou

Abstract: Keystroke dynamics has gained relevance over the years for its potential in solving practical problems like online fraud and account takeovers. Statistical algorithms such as distance measures have long been a common choice for keystroke authentication due to their simplicity and ease of implementation. However, deep learning has recently started to gain popularity due to its ability to achieve better performance. When should statistical algorithms be preferred over deep learning and vice versa? To answer this question, we set up experiments to evaluate two state-of-the-art statistical algorithms: Scaled Manhattan and the Instance-based Tail Area Density (ITAD) metric, with a state-of-the-art deep learning model called TypeNet, on three datasets (one small and two large). Our results show that on the small dataset, statistical algorithms significantly outperform the deep learning approach (Equal Error Rate (EER) of 4.3% for Scaled Manhattan / 1.3% for ITAD versus 19.18% for TypeNet). However, on the two large datasets, the deep learning approach performs better (22.9% & 28.07% for Scaled Manhattan / 12.25% & 20.74% for ITAD versus 0.93% & 6.77% for TypeNet).
Download
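
To illustrate why distance measures are considered simple to implement, here is a minimal sketch of a Scaled Manhattan detector in its commonly used form: each timing feature's absolute deviation from the user's mean is divided by that feature's mean absolute deviation. The helper names `fit_template` and `scaled_manhattan` and the `eps` guard are assumptions for this example; the paper's exact features and preprocessing are not given in the abstract.

```python
def fit_template(samples):
    """Build a user template: per-feature mean and mean absolute deviation."""
    n = len(samples)
    dims = len(samples[0])
    mean = [sum(s[i] for s in samples) / n for i in range(dims)]
    mad = [sum(abs(s[i] - mean[i]) for s in samples) / n for i in range(dims)]
    return mean, mad


def scaled_manhattan(sample, mean, mad, eps=1e-9):
    """Sum of per-feature absolute deviations, each scaled by its MAD.

    `eps` avoids division by zero for features that never varied in training.
    """
    return sum(abs(x - m) / max(a, eps) for x, m, a in zip(sample, mean, mad))
```

A login attempt is then accepted when its score falls below a threshold chosen on validation data; sweeping that threshold is what produces an Equal Error Rate.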

Paper Nr: 82
Title:

Finger Region Estimation by Boundary Curve Modeling and Bezier Curve Learning

Authors:

Masakazu Fujio, Keiichiro Nakazaki, Naoto Miura, Yosuke Kaga and Kenta Takahashi

Abstract: This paper presents a shape-aware finger region segmentation method from hand images for user authentication. The recent development of encoder-decoder network-based deep learning technologies dramatically improved image segmentation accuracy. Although those methods predict, pixel by pixel, the probability of belonging to each object, they cannot take into account whether the estimated region has a finger-like shape. We adopted a deep learning-based Bezier curve estimation method to realize shape-aware model training. We improved accuracy in cases with warm colors, complex backgrounds, and touching fingers, where target regions are difficult to estimate using color-based heuristics or traditional pixel-by-pixel methods. We prepared ground truth data for each finger region (index finger, middle finger, ring finger, little finger), then trained both the conventional pixel-by-pixel estimation method and our Bezier curve estimation methods. Quantitative results showed that the proposed models outperform traditional methods (pixel-wise IOU 0.935) and run at practical speed.
Download

Paper Nr: 88
Title:

Synthetic Data for Object Classification in Industrial Applications

Authors:

August Baaz, Yonan Yonan, Kevin Hernandez-Diaz, Fernando Alonso-Fernandez and Felix Nilsson

Abstract: One of the biggest challenges in machine learning is data collection. Training data is an important part since it determines how the model will behave. In object classification, capturing a large number of images per object and in different conditions is not always possible and can be very time-consuming and tedious. Accordingly, this work explores the creation of artificial images using a game engine to cope with limited data in the training dataset. We combine real and synthetic data to train the object classification engine, a strategy that has been shown to be beneficial to increase confidence in the decisions made by the classifier, which is often critical in industrial setups. To combine real and synthetic data, we first train the classifier on a massive amount of synthetic data, and then we fine-tune it on real images. Another important result is that the number of real images needed for fine-tuning is not very high, reaching top accuracy with just 12 or 24 images per class. This substantially reduces the requirements of capturing a great amount of real data.
Download

Paper Nr: 89
Title:

Visual Detection of Personal Protective Equipment and Safety Gear on Industry Workers

Authors:

Jonathan Karlsson, Fredrik Strand, Josef Bigun, Fernando Alonso-Fernandez, Kevin Hernandez-Diaz and Felix Nilsson

Abstract: Workplace injuries are common in today’s society due to a lack of adequately worn safety equipment. A system that only admits appropriately equipped personnel can be created to improve working conditions. The goal is thus to develop a system that will improve workers’ safety using a camera that will detect the usage of Personal Protective Equipment (PPE). To this end, we collected and labeled appropriate data from several public sources, which have been used to train and evaluate several models based on the popular YOLOv4 object detector. Our focus, driven by a collaborating industrial partner, is to implement our system into an entry control point where workers must present themselves to obtain access to a restricted area. Combined with facial identity recognition, the system would ensure that only authorized people wearing appropriate equipment are granted access. A novelty of this work is that we increase the number of classes to five objects (hardhat, safety vest, safety gloves, safety glasses, and hearing protection), whereas most existing works only focus on one or two classes, usually hardhats or vests. The AI model developed provides good detection accuracy at a distance of 3 and 5 meters in the collaborative environment where we aim at operating (mAP of 99/89%, respectively). The small size of some objects or the potential occlusion by body parts have been identified as potential factors that are detrimental to accuracy, which we have counteracted via data augmentation and cropping of the body before applying PPE detection.
Download

Paper Nr: 90
Title:

So Can We Use Intrinsic Bias Measures or Not?

Authors:

Sarah Schröder, Alexander Schulz, Philip Kenneweg and Barbara Hammer

Abstract: While text embeddings have become the state-of-the-art in many natural language processing applications, the presence of bias that such models often learn from training data can become a serious problem. As a reaction, a large variety of measures for detecting bias has been proposed. However, an extensive comparison between them does not exist so far. We aim to close this gap for the class of intrinsic bias measures in the context of pretrained language models and propose an experimental setup which allows a fair comparison by using a large set of templates for each bias measure. Our setup is based on the idea of simulating pretraining on a set of differently biased corpora, thereby obtaining a ground truth for the present bias. This allows us to evaluate to what extent bias is detected by different measures and also enables us to judge the robustness of bias scores.
Download

Paper Nr: 110
Title:

Modified kNN Classifier in the Output Vector Space for Robust Performance Against Adversarial Attack

Authors:

C. Lee, D. Seok, D. Shim and R. Park

Abstract: Although CNN-based classifiers have been successfully applied to many pattern classification problems, they suffer from adversarial attacks. Slightly modified images can be classified as completely different classes. It has been reported that CNN-based classifiers tend to construct decision boundaries close to training samples. In order to mitigate this problem, we applied modified kNN classifiers in the output vector space of CNN-based classifiers. Experimental results show that the proposed method noticeably reduced the classification error caused by adversarial attacks.
Download
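
As a rough illustration of classifying in the output vector space, the sketch below runs an ordinary k-nearest-neighbour vote over CNN output vectors. The abstract above does not describe the authors' modification to kNN, so this shows only the unmodified baseline; the function name and the choice of Euclidean distance are assumptions.

```python
import math
from collections import Counter


def knn_predict(query_vec, train_vecs, train_labels, k=3):
    """Plain kNN vote over output vectors (e.g. logits or softmax scores).

    `train_vecs` holds the network's output vectors for the training samples,
    `train_labels` their class labels. This is the standard algorithm, not
    the modified variant proposed in the paper.
    """
    order = sorted(
        range(len(train_vecs)),
        key=lambda i: math.dist(query_vec, train_vecs[i]),
    )
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]
```

The intuition is that a perturbed image may flip the network's arg-max decision while its output vector still lies closer to training outputs of the correct class.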

Paper Nr: 114
Title:

Imposing Functional Priors on Bayesian Neural Networks

Authors:

Bogdan Kozyrskiy, Dimitrios Milios and Maurizio Filippone

Abstract: Specifying sensible priors for Bayesian neural networks (BNNs) is key to obtaining state-of-the-art predictive performance together with sound predictive uncertainties. However, this is generally difficult because of the complex way prior distributions induce distributions over the functions that BNNs can represent. Switching the focus from the prior over the weights to such functional priors allows for reasoning about what meaningful prior information should be incorporated. We propose to enforce such meaningful functional priors through Gaussian processes (GPs), which we view as a form of implicit prior over the weights, and we employ scalable Markov chain Monte Carlo (MCMC) to obtain samples from an approximation to the posterior distribution over BNN weights. Unlike previous approaches, our proposal does not require the modification of the original BNN model, it does not require any expensive preliminary optimization, and it can use any inference techniques and any functional prior that can be expressed in closed form. We illustrate the effectiveness of our approach with an extensive experimental campaign.
Download

Paper Nr: 117
Title:

Leveraging Explainability with K-Fold Feature Selection

Authors:

Artur J. Ferreira and Mário T. Figueiredo

Abstract: Learning with high-dimensional (HD) data poses many challenges, since the large number of features often yields redundancy and irrelevance issues, which may decrease the performance of machine learning (ML) methods. Often, when learning with HD data, one resorts to feature selection (FS) approaches to avoid the curse of dimensionality. The use of FS may improve the results, but its use by itself does not lead to explainability, in the sense of identifying the small subset of core features that most influence the prediction of the ML model, which can still be seen as a black-box. In this paper, we propose k-fold feature selection (KFFS), which is an FS approach to shed some light into that black-box, by resorting to the k-fold data partition procedure and one generic unsupervised or supervised FS filter. KFFS finds small and decisive subsets of features for a classification task, at the expense of increased computation time. On HD data, KFFS finds small subsets of features, with dimensionality small enough to be analyzed by human experts (e.g., a medical doctor in a cancer detection problem). It also provides classification models with lower error rate and fewer features than those provided by the use of the individual supervised FS filter.
Download
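
One plausible reading of combining a k-fold partition with a generic FS filter is to run the filter once per fold and keep only the features it selects repeatedly. The sketch below follows that reading; it is an assumption, not the KFFS algorithm as published. The variance-based filter, the interleaved fold assignment, and the `top_m` and `min_votes` parameters are all hypothetical.

```python
from collections import Counter


def variance_filter(rows, top_m):
    """Toy unsupervised filter: indices of the `top_m` highest-variance features."""
    n = len(rows)

    def variance(j):
        mu = sum(r[j] for r in rows) / n
        return sum((r[j] - mu) ** 2 for r in rows) / n

    return sorted(range(len(rows[0])), key=variance, reverse=True)[:top_m]


def kfold_feature_selection(rows, k, top_m, min_votes, fs_filter=variance_filter):
    """Run `fs_filter` on each fold's training part and count selections.

    A feature is kept when at least `min_votes` of the k runs selected it.
    """
    votes = Counter()
    for fold in range(k):
        # Interleaved split: row i is held out when i % k == fold.
        train = [r for i, r in enumerate(rows) if i % k != fold]
        votes.update(fs_filter(train, top_m))
    return sorted(j for j, count in votes.items() if count >= min_votes)
```

Running the filter k times is what costs the extra computation time the abstract mentions, while the vote threshold is what keeps the final subset small.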

Paper Nr: 119
Title:

Machine Fault Classification Using Hamiltonian Neural Networks

Authors:

Jeremy Shen, Jawad Chowdhury, Sourav Banerjee and Gabriel Terejanu

Abstract: A new approach is introduced to classify faults in rotating machinery based on the total energy signature estimated from sensor measurements. The overall goal is to go beyond using black-box models and incorporate additional physical constraints that govern the behavior of mechanical systems. Observational data is used to train Hamiltonian neural networks that describe the conserved energy of the system for normal and various abnormal regimes. The estimated total energy function, in the form of the weights of the Hamiltonian neural network, serves as the new feature vector to discriminate between the faults using off-the-shelf classification models. The experimental results are obtained using the MaFaulDa database, where the proposed model yields a promising area under the curve (AUC) of 0.78 for the binary classification (normal vs abnormal) and 0.84 for the multi-class problem (normal, and 5 different abnormal regimes).
Download

Paper Nr: 125
Title:

MAGAN: A Meta-Analysis for Generative Adversarial Networks’ Latent Space

Authors:

Frederic Rizk, Rodrigue Rizk, Dominick Rizk and Chee-Hung H. Chu

Abstract: Generative Adversarial Networks (GANs) are an emerging class of deep neural networks that has sparked considerable interest in the field of unsupervised learning because of its exceptional data generation performance. Nevertheless, the GAN’s latent space that represents the core of these generative models has not been studied in depth in terms of its effect on the generated image space. In this paper, we propose and evaluate MAGAN, an algorithm for Meta-Analysis for GANs’ latent space. GAN-derived synthetic images are also evaluated in terms of their efficiency in complementing the data training, where the produced output is employed for data augmentation, mitigating the labeled data scarcity. The results suggest that GANs may be used as a parameter-controlled data generator for data-driven augmentation. The quantitative findings show that MAGAN can correctly trace the relationship between the arithmetic adjustments in the latent space and their effects on the output in the image space. We empirically determine the parameter ε for each class such that the latent space is insensitive to a shift of ε×σ from the mean vector, where σ is the standard deviation of a particular class.
Download

Paper Nr: 129
Title:

Towards Pattern Recognition with Network Science and Natural Language Processing for Information Retrieval

Authors:

Muskan Garg, Mukesh Kumar and Debabrata Samanta

Abstract: A surge in text-based information retrieval such as topic detection and tracking has increasingly shown growth from static to dynamism in the last decade. We posit the need of investigating an interdisciplinary approach of network science and natural language processing for graph-based information extraction. Post-lockdown era, it makes sense to consider Graph of Words (GoW) evolved from user-generated text from social media platforms amid increase in the internet traffic. The idea is to unfold the latent patterns in graph-based text representation with limited resource availability resulting in effective models, in comparison of computationally expensive pre-trained models, limited to a certain type of information extraction. As a solution towards advancing statistical approach for language independent models, we plot three different information retrieval applications: (i) Structural analysis: find unique patterns in domain/ language/ genre-specific GoW for keyword extraction, (ii) Language independence: design objective function for language-independent information retrieval, (iii) Dynamism: mathematical modeling for concept-drift and evolving trends/ events in dynamic GoW evolved from streaming data. We associate recent developments and open challenges with our position as potential research direction.
Download

Paper Nr: 130
Title:

Using a 1D Pose-Descriptor on the Finger-Level to Reduce the Dimensions in Hand Posture Estimation

Authors:

Amin Dadgar and Guido Brunnett

Abstract: We claim there is a simple measure to characterize all postures of every finger in human hands, each with a single and unique value. To that end, we illustrate that the sum of distances of fingers’ (movable) joints/nodes (or of the finger’s tip) to a locally fixed reference point on that hand (e.g., wrist joint) equals a unique value for each finger’s posture. We support our hypothesis by presenting numerical justification based on the kinematic skeleton of a human hand for four fingers and by providing evidence on two virtual hand models (which closely resemble the structure of human hands) for thumbs. The employment of this descriptor reduces the dimensionality of the finger’s space from 16 to 5 (i.e., one degree of freedom for each finger). To demonstrate the advantages of employing this measure for finger pose estimation, we utilize it as a temporal a-priori in the analysis-by-synthesis framework to constrain the posture space in searching and estimating the optimum pose of fingers more efficiently. In a set of experiments, we show the benefits of employing this descriptor in time complexity, latency, and accuracy of the pose estimation of our virtual hand.
Download
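
The descriptor described in the abstract above, the sum of distances from a finger's movable joints to a fixed reference point, is straightforward to compute. The sketch below assumes 3D joint coordinates given as tuples and uses the Euclidean distance; the function name and the coordinate layout are illustrative assumptions.

```python
import math


def finger_descriptor(joints, reference):
    """Sum of Euclidean distances from each joint to the reference point.

    `joints` is a sequence of (x, y, z) positions for one finger's movable
    joints; `reference` is a fixed point on the same hand, e.g. the wrist.
    Returns one scalar per finger posture.
    """
    return sum(math.dist(joint, reference) for joint in joints)
```

Computing this value for each of the five fingers gives five scalars per hand pose, which matches the reduced dimensionality stated in the abstract.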

Paper Nr: 131
Title:

Patterns in Pupillary Diameter Variation While Reading Portuguese Language Texts

Authors:

João M. Romera, Rafael N. Orsi and Carlos E. Thomaz

Abstract: This work investigates patterns of pupil diameter variation during text reading based on the effects of Meares-Irlen Syndrome (MIS) using eye tracking information to estimate the mental workload required. The results show that there is an increase in the mental workload at times when the texts presented had greater intensity of visual distortion and that it is possible to linearly classify the data by multivariate statistical techniques, disclosing experimentally the implicit difficulty in such reading context.
Download

Paper Nr: 139
Title:

Arabic Handwriting off-Line Recognition Using ConvLSTM-CTC

Authors:

Takwa A. Gader, Issam Chibani and Afef K. Echi

Abstract: This work lies in the field of automatic document recognition, specifically offline Arabic handwriting recognition. Arabic writing is cursive and recognized as quite complex compared to handwritten Latin script: dependence on context, difficulties with segmentation, a large number of words, variations in the style of the writing, inter- and intra-word overlap, etc. Few works exist concerning the recognition of unconstrained Arabic manuscripts, which motivates us to address this type of document with an approach based on deep learning. It is one of the machine-learning approaches reputed to be effective for classification problems. The aim is to design and implement an end-to-end system: a convolutional long short-term memory (ConvLSTM). It consists of a recurrent neural network for spatiotemporal prediction with convolutional structures that allow feature extraction. A connectionist temporal classification output layer processes the returned result. Our model is trained and tested using the IFN/ENIT database. We were able to achieve a recognition rate of 99.01%.
Download

Paper Nr: 146
Title:

Invertible Neural Network-Based Video Compression

Authors:

Zahra Montajabi, Vahid K. Ghassab and Nizar Bouguila

Abstract: Due to the recent advent of high-resolution mobile and camera devices, it is necessary to develop an optimal solution for saving the new video content instead of traditional compression methods. Recently, video compression received enormous attention among computer vision problems in media technologies. Using state-of-the-art video compression methods, videos can be transmitted in better quality while requiring less bandwidth and memory. The advent of neural network-based video compression methods remarkably promoted video coding performance. In this paper, an Invertible Neural Network (INN) is utilized to reduce the information loss problem. Unlike classic auto-encoders, which lose some information during encoding, an INN can preserve more information and therefore reconstruct videos with clearer details. Moreover, it does not increase the complexity of the network compared to traditional auto-encoders. The proposed method is evaluated on a public dataset and the experimental results show that the proposed method outperforms existing standard video encoding schemes such as H.264 and H.265 in terms of peak signal-to-noise ratio (PSNR), video multimethod assessment fusion (VMAF), and structural similarity index measure (SSIM).
Download
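
Of the three quality metrics named in the abstract above, PSNR has the simplest closed form: it is the log-ratio of the squared peak signal value to the mean squared error. The sketch below computes it for two flat sequences of pixel values; the 8-bit peak of 255 is an assumed default and the function name is illustrative.

```python
import math


def psnr(reference, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    PSNR = 10 * log10(max_value^2 / MSE). Returns infinity when the two
    sequences are identical (zero error).
    """
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher values indicate a reconstruction closer to the original frame; for video, the per-frame values are typically averaged.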

Paper Nr: 150
Title:

SynFine: Boosting Image Segmentation Accuracy Through Synthetic Data Generation and Surgical Fine-Tuning

Authors:

Mehdi Mounsif, Yassine Motie, Mohamed Benabdelkrim and Florent Brondolo

Abstract: Carbon Capture and Storage (CCS) has increasingly been suggested as one of the many ways to reduce CO2 concentration in the atmosphere, hence tackling climate change and its consequences. As CCS involves robust modelling of physico-chemical mechanisms in geological formations, it benefits from CT-scans and accurate segmentation of rock core samples. Nevertheless, identifying precisely the components of a rock formation can prove challenging and could benefit from modern segmentation approaches, such as U-Net. In this context, this work introduces SynFine, a framework that relies on synthetic data generation and surgical fine-tuning to boost the performance of a model on a target data distribution with a limited number of examples. Specifically, after a pre-training phase on a source dataset, the SynFine approach identifies and fine-tunes the most responsive layers regarding the distribution shift. Our experiments show that, beyond an advantageous final performance, SynFine enables a strong reduction of the number of real-world labelled pairs for a given level of performance.
Download

Paper Nr: 154
Title:

TeTIm-Eval: A Novel Curated Evaluation Data Set for Comparing Text-to-Image Models

Authors:

Federico A. Galatolo, Mario A. Cimino and Edoardo Cogotti

Abstract: Evaluating and comparing text-to-image models is a challenging problem. Significant advances in the field have recently been made, piquing the interest of various industrial sectors. As a consequence, a gold standard in the field should cover a variety of tasks and application contexts. In this paper, a novel evaluation approach is tested, based on: (i) a curated data set of high-quality royalty-free image-text pairs, divided into ten categories; (ii) a quantitative metric, the CLIP-score; (iii) a human evaluation task to distinguish, for a given text, the real and the generated images. The proposed method has been applied to the most recent models, i.e., DALLE2, Latent Diffusion, Stable Diffusion, GLIDE and Craiyon. Early experimental results show that the accuracy of the human judgement is fully coherent with the CLIP-score. The dataset has been made available to the public.
Download

Paper Nr: 155
Title:

Agnostic eXplainable Artificial Intelligence (XAI) Method Based on Volterra Series

Authors:

Jhonatan Contreras and Thomas Bocklitz

Abstract: Convolutional Neural Networks (CNN) have shown remarkable results in several fields in recent years. Traditional performance metrics assess model performance but fail to detect biases in datasets and models. Explainable artificial intelligence (XAI) methods aim to evaluate models, identify biases, and clarify model decisions. We propose an agnostic XAI method based on the Volterra series that approximates models. Our model architecture is composed of three second-order Volterra layers. Relevant information can be extracted from the model to be approximated and used to generate relevance maps that explain the contribution of the input elements to the prediction. Our Volterra-XAI learns its Volterra kernels comprehensively and is trained using a target model outcome. Therefore, no labels are required, and even when training data is unavailable, it is still possible to generate an approximation utilizing similar data. The trustworthiness of our method can be measured by considering the reliability of the Volterra approximation in comparison with the original model. We evaluate our XAI method for the classification task on 1D Raman spectra and 2D images using two common CNN architectures without hyperparameter tuning. We present relevance maps indicating higher and lower contributions to the approximation prediction (logit).
Download

Paper Nr: 9
Title:

OCR Text Sorting Based on Multimodal and Graph Neural Network for Rich Text Image

Authors:

Zhiyuan Zhao, Shan Huang and Yong Liu

Abstract: With the growth of data, more and more image information flows into the audit and supervision system. When facing an image with rich text information, the supervision system first uses an OCR detection and recognition algorithm to extract the text information in the image. The extracted text information is generally sorted according to simple top, bottom, left and right positions. However, when facing multi-layout or discrete rich text images, this sorting method based on the simple logical order of position often leads to confusion of the reading order between texts, resulting in the loss of information. In order to solve this problem, we propose an image text ranking model based on visual features, text features and spatial features. The three features are extracted, fused, and processed by a graph neural network, and then the relationship between text regions is judged. The experimental results show that our model performs better than other baselines and also performs better in the supervision and audit system.

Paper Nr: 16
Title:

Random Quasi Intersections with Applications to Instant Machine Learning

Authors:

Alexei Mikhailov and Mikhail Karavay

Abstract: The random quasi intersections method is introduced. The number of such intersections grows exponentially with the increasing number of pattern features, so that a non-polynomial problem emerges in some machine learning applications. However, the paper experimentally shows that randomness allows finding solutions to some visual machine learning tasks using a fast procedure based on random quasi intersections that delivers 100% accuracy. The paper also discusses the implementation of instant learning, which is, unlike deep learning, a non-iterative procedure. The inspiration comes from search methods and neuroscience. After decades of computing, only one method was found able to deal efficiently with big data: indexing, which is at the heart of both Google search and large-scale DNA processing. On the other hand, it is known from neuroscience that the brain memorizes combinations of sensory inputs and interprets them as patterns. The paper discusses how to best index the combinations of pattern features, so that both encoding and decoding of patterns are robust and efficient.

Paper Nr: 22
Title:

A Comparative Analysis of Classifier Performance for Epileptic Seizure Detection Using EEG Signals

Authors:

Nida Alyas, David P. Hastings and Abbas Mehrabidavoodabadi

Abstract: In middle- and low-income countries, epilepsy remains undiagnosed in many instances because of an insufficient number of medical specialists and the expense of EEG recording devices. In previous studies, many machine learning (ML) based methods were proposed to investigate and classify EEG signals. However, little work has been performed with EEG data recorded with consumer-grade devices. Extracting the most discriminating set of features and reducing the high misclassification rate are further challenges. To address these problems, this study empirically investigates several data segment sizes and chooses the optimal window size to segment the Guinea-Bissau dataset. Several statistical and spectral feature extraction methods were investigated to obtain useful sets of features from segmented epochs in combination with conventional ML algorithms and ensemble methods. The proposed framework is then implemented on a comparable dataset collected from Nigeria to validate the reliability of the framework. A comparative analysis is performed with conventional ML models and with existing techniques to prove the effectiveness of the proposed methodology. The obtained results demonstrate that XGBoost and LightGBM achieved the highest levels of performance in terms of F1 score and AUC.
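The windowing step mentioned in this abstract can be illustrated with a minimal sketch. It assumes a plain list of samples and a hypothetical window size of four; the function names `segment` and `epoch_features`, the chosen features, and the sample values are illustrative and are not taken from the paper.

```python
import statistics

def segment(signal, window_size, step=None):
    """Split a 1-D signal into fixed-size windows (epochs); a trailing remainder is dropped."""
    step = step or window_size
    return [signal[i:i + window_size]
            for i in range(0, len(signal) - window_size + 1, step)]

def epoch_features(epoch):
    """A few simple statistical features for one epoch (illustrative choice)."""
    return {
        "mean": statistics.fmean(epoch),
        "variance": statistics.pvariance(epoch),
        "peak_to_peak": max(epoch) - min(epoch),
    }

# Synthetic samples, not real EEG data.
signal = [0.0, 1.0, 0.0, -1.0, 2.0, 4.0, 2.0, 0.0, 5.0]
epochs = segment(signal, window_size=4)
features = [epoch_features(e) for e in epochs]
```

Each feature dictionary could then be turned into one row of the input to a conventional classifier; comparing several values of `window_size` corresponds to the segment-size investigation the abstract describes.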

Paper Nr: 49
Title:

Automatic Subjective Answer Evaluation

Authors:

Vijay Kumari, Prachi Godbole and Yashvardhan Sharma

Abstract: The evaluation of answer scripts is vital for assessing a student’s performance. The manual evaluation of the answers can sometimes be biased. The assessment depends on various factors, including the evaluator’s mental state, their relationship with the student, and their level of expertise in the subject matter. These factors make evaluating descriptive answers a very tedious and time-consuming task. Automatic scoring approaches can be utilized to simplify the evaluation process. This paper presents an automated answer script evaluation model that intends to reduce the need for human intervention, minimize bias brought on by evaluator psychological changes, save time, keep track of evaluations, and simplify extraction. The proposed method can automatically weigh the assessing element and produce results nearly identical to an instructor’s. To evaluate the developed model, we compared its grades to the grades of the teacher, as well as to the results of several keyword matching and similarity check techniques.

Paper Nr: 57
Title:

Recent Advances in Statistical Mixture Models: Challenges and Applications

Authors:

Sami Bourouis

Abstract: This paper discusses current advances in mixture models, as well as modern approaches and tools that make use of mixture models. In particular, the contribution of mixture-based modeling in various areas of research is discussed. It exposes many challenging issues, especially how to select the optimal model and how to estimate the parameters of each component. Some newly emerging mixture model-based methods that can be applied successfully are also cited. Moreover, an overview of the latest developments, open problems and potential research directions is given. This study aims to demonstrate that mixture models may be consistently proposed as a powerful tool for carrying out a variety of difficult real-life tasks. This survey can be a starting point for beginners, as it allows them to better understand the current state of knowledge and helps them to develop and evaluate their own frameworks.

Paper Nr: 60
Title:

A Comparative Study of Deep Learning Methods for the Detection and Classification of Natural Disasters from Social Media

Authors:

Spyros Fontalis, Alexandros Zamichos, Maria Tsourma, Anastasis Drosou and Dimitrios Tzovaras

Abstract: Disaster Management, defined as a coordinated social effort to successfully prepare for and respond to disasters, can benefit greatly as an industrial process from modern Deep Learning methods. Disaster prevention organizations can benefit greatly from the processing of disaster response data. In an attempt to detect and subsequently categorise disaster-related information from tweets via tweet text analysis, a Feedforward Neural Network (FNN), a Convolutional Neural Network, a Bi-directional Long Short-Term Memory (BLSTM), as well as several Transformer-based network architectures, namely BERT, DistilBERT, ALBERT, RoBERTa and DeBERTa, are employed. The two main tasks of the work presented in this paper are: (1) separating tweets into disaster-related and non-relevant ones, and (2) categorising already labeled disaster tweets into eight predefined natural disaster categories. The supported types of natural disasters are earthquakes, floods, hurricanes, wildfires, tornadoes, explosions, volcano eruptions and general disasters. To achieve this goal, several accessible related datasets are collected and combined to suit the two tasks. In addition, the combination of preprocessing tasks that is most beneficial for inference is investigated. Finally, experiments have been conducted using bias mitigation techniques.

Paper Nr: 70
Title:

Correlated Mutations of Positions Among Structural Proteins in Delta and Omicron Variants for SARS-CoV-2 Amino Acid Sequences

Authors:

Yuichi Shimaya and Kouich Hirata

Abstract: In this paper, we find the correlated mutations of positions among the structural proteins (spike, envelope, membrane and nucleocapsid) in amino acid sequences of SARS-CoV-2. Here, we adopt the algorithm designed by Shimada et al. (2012) for finding correlated mutations formulated by joint entropy. In particular, we discuss whether or not the correlated mutations found contain spike protein substitutions in the SARS-CoV-2 Delta and Omicron variants.
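As a hedged illustration of the joint entropy quantity mentioned in this abstract, the sketch below computes H(X, Y) = -Σ p(x, y) log2 p(x, y) for two alignment columns. It shows only the standard definition, not the algorithm of Shimada et al.; the residue columns `pos_i`, `pos_j` and `pos_k` are made-up toy data.

```python
import math
from collections import Counter

def joint_entropy(col_a, col_b):
    """H(X, Y) = -sum p(x, y) * log2 p(x, y) over the observed residue pairs."""
    pairs = Counter(zip(col_a, col_b))
    n = sum(pairs.values())
    return -sum((c / n) * math.log2(c / n) for c in pairs.values())

# Hypothetical residues at three positions across four aligned sequences.
pos_i = ["D", "D", "G", "G"]
pos_j = ["K", "K", "R", "R"]   # changes together with pos_i
pos_k = ["K", "R", "K", "R"]   # varies independently of pos_i

h_linked = joint_entropy(pos_i, pos_j)   # 1 bit: only two distinct pairs occur
h_indep = joint_entropy(pos_i, pos_k)    # 2 bits: all four pairs occur equally
```

In this toy data, the pair of positions that mutate together has a lower joint entropy than the pair that varies independently, which is the kind of signal a joint-entropy formulation can exploit.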

Paper Nr: 77
Title:

Speech Recognition for Minority Languages Using HuBERT and Model Adaptation

Authors:

Tomohiro Hattori and Satoshi Tamura

Abstract: In the field of speech recognition, models and datasets are becoming larger and larger. However, it is difficult to create large datasets for minority languages, which is an obstacle to improving the accuracy of speech recognition. In this study, we attempt to improve the recognition accuracy for minority languages by utilizing models trained on large datasets of a major language and then adapting the language model part to the target language. It is believed that deep-learning speech recognition models learn both acoustic and language processing parts. The acoustic part may be common to all languages and differs less between languages than the language part. Therefore, we investigate whether it is possible to build a recognizer by keeping the acoustic processing learned from other languages and adapting the language processing to the minority language.

Paper Nr: 78
Title:

Cardiac Arrhythmia Classification in Electrocardiogram Signals with Convolutional Neural Networks

Authors:

Igor L. Souza and Daniel O. Dantas

Abstract: Electrocardiography is a frequently used examination technique for heart disease diagnosis and is essential in the clinical evaluation of patients who have heart disease. Through the electrocardiogram (ECG), medical doctors can identify whether the cardiac muscle dysfunctions presented by the patient have an inflammatory origin and make an early diagnosis of serious diseases that primarily affect the blood vessels and the brain. The basis of arrhythmia diagnosis is the identification of normal and abnormal heartbeats and their classification into different diagnoses based on ECG morphology. Heartbeats can be divided into five categories: non-ectopic, supraventricular ectopic, ventricular ectopic, fusion, and unknown beats. It is difficult to tell these heartbeats apart on the ECG, as the signals are typically corrupted by outside noise. The objective of this study is to develop a classifier capable of classifying a patient’s ECG signals for the detection of arrhythmia in clinical patients. We developed a convolutional neural network (CNN) to identify five categories of heartbeats in ECG signals. Our experiment was conducted with ECG signals obtained from the publicly available MIT-BIH database. The number of instances was evened out across the five classes of heartbeats. The proposed model achieved an accuracy of 99.33% and an F1-score of 99.44% in the classification of ventricular ectopic beats (VEB).

Paper Nr: 86
Title:

Image-Based Fire Detection in Industrial Environments with YOLOv4

Authors:

Otto Zell, Joel Pålsson, Kevin Hernandez-Diaz, Fernando Alonso-Fernandez and Felix Nilsson

Abstract: Fires have destructive power when they break out and affect their surroundings on a devastatingly large scale. The best way to minimize their damage is to detect the fire as quickly as possible before it has a chance to grow. Accordingly, this work looks into the potential of AI to detect and recognize fires and reduce detection time using object detection on an image stream. Object detection has made giant leaps in speed and accuracy over the last six years, making real-time detection feasible. To this end, we collected and labeled appropriate data from several public sources, which have been used to train and evaluate several models based on the popular YOLOv4 object detector. Our focus, driven by a collaborating industrial partner, is to implement our system in an industrial warehouse setting, which is characterized by high ceilings. A drawback of traditional smoke detectors in this setup is that the smoke has to rise to a sufficient height. The AI models brought forward in this research managed to outperform these detectors by a significant amount of time, providing precious anticipation that could help to minimize the effects of fires further.

Paper Nr: 96
Title:

Deep Analysis and Detection of Firewall Anomalies Using Knowledge Graph

Authors:

Abdelrahman O. Elfaki and Amer Aljaedi

Abstract: Implementing a firewall policy by defining firewall rules is a cumulative process that may take place over different periods and depends on the network conditions, which makes it prone to errors and difficult to validate without effective tools. Such tools should be carefully designed to capture and spot firewall configuration errors and anomalies. The solution in this paper consists of four steps: formalizing the firewall rules using first-order logic (FOL), defining the general form of the anomaly, collecting all active destination IP addresses and port numbers in updated lists, and applying the proposed FOL rules to detect firewall anomalies. The general form is represented using a knowledge graph to support visualization, with the aim of detecting firewall anomalies by extracting knowledge from the knowledge graph and its formalization rules. The proposed method is efficient and capable of discovering all types of firewall anomalies.

Paper Nr: 107
Title:

Metric-Based Few-Shot Learning for Pollen Grain Image Classification

Authors:

Philipp Viertel, Matthias König and Jan Rexilius

Abstract: Pollen is an important substance produced by seed plants. It contains the male gametes, which are necessary for fertilization and the reproduction of flowering plants. The scientific study of pollen, palynology, plays a crucial role in a number of disciplines, such as allergology, ecology, forensics, and food production. Current trends in climate research indicate an increasing importance of palynology, partly due to a projected rise in allergies. Pollen detection and classification in microscopic images via deep neural networks has been studied; however, pollen data is often sparse or imbalanced, especially when compared to the number of plant species, which is estimated to be between 330,000 and 450,000, of which only a small percentage has been investigated. In this work, we present a solution that does not require a large number of data samples by employing Few-Shot Learning. Our work shows that by utilizing Prototypical Networks, an average classification accuracy of 90% can be achieved on state-of-the-art pollen data sets. The results can be further improved by fine-tuning the network, achieving up to 98% accuracy on novel classes. To the best of our knowledge, this is the first attempt at applying Few-Shot Learning in the field of pollen analysis.
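The classification rule behind Prototypical Networks can be sketched in a few lines: each class is represented by the mean of its support embeddings, and a query is assigned to the nearest prototype. The example below is a minimal sketch that assumes the embeddings have already been produced by some trained network; the 2-D vectors and the labels `species_a` and `species_b` are hypothetical.

```python
import math

def prototype(embeddings):
    """Class prototype = element-wise mean of the support embeddings."""
    n = len(embeddings)
    return [sum(dim) / n for dim in zip(*embeddings)]

def classify(query, prototypes):
    """Assign the query to the class whose prototype is nearest (Euclidean distance)."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))

# Hypothetical 2-D embeddings: three support samples ("shots") per class.
support = {
    "species_a": [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
    "species_b": [[4.0, 4.0], [5.0, 4.0], [4.0, 5.0]],
}
prototypes = {label: prototype(embs) for label, embs in support.items()}
prediction = classify([0.5, 0.5], prototypes)
```

Because a new class only needs a handful of support embeddings to form its prototype, this rule suits settings where labeled samples per class are scarce.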

Paper Nr: 108
Title:

Deep Learning Based Text Translation and Summarization Tool for Hearing Impaired Using Indian Sign Language

Authors:

Anurag K. Jha, Kabita Choudhary and Sujala D. Shetty

Abstract: Many text conversion systems have emerged over time, but there has hardly been any work in the field of sign language. Even in the field of sign language, multiple methods have been proposed to convert sign language into text via image detection, but due to the rarity of sign language corpora, not much work has been put into converting text or speech to sign language. The proposed project intends to create a translation model to convert text or audio into sign language with its designated grammar. The process includes translation of any language to English, followed by summarization of a long article or text, removal of stopwords, reordering of the grammatical form, and stemming of words into their root form. Translation is performed by the mBART model, summarization by the BART model, and conversion into animation by mapping words to a dictionary and spelling out unknown words letter by letter. The paper uses HamNoSys (Regina et al., 1989), SiGML, BART, mBART and NLP to form the translation system. The paper aims to establish better means of communication with deaf and mute people and people with hearing issues.

Paper Nr: 109
Title:

A Semantic Frame Graph for Information Extraction

Authors:

Michał Gałusza

Abstract: This paper describes a graphical representation of a short text based on semantic frames (the Semantic Frame Graph) using Semantic Role Labeling (SRL). It can be the foundation of an alternative approach to open information extraction (OIE). The approach postprocesses the output of a pretrained SRL classifier and does not use complex rules, training sets, or a significant corpus to decompose sentences. The proposed decomposition and representation reduce the number of paths between entities by dropping those that are linguistically unmotivated, and generate sequences of frames as paths that can be controlled using a dialog coherence approach, which further increases the plausibility of the semantic relationships between entities.

Paper Nr: 118
Title:

On the Future of Training Spiking Neural Networks

Authors:

Katharina Bendig, René Schuster and Didier Stricker

Abstract: Spiking Neural Networks have obtained a lot of attention in recent years due to their close depiction of brain functionality as well as their energy efficiency. However, the training of Spiking Neural Networks in order to reach state-of-the-art accuracy in complex tasks remains a challenge. This is caused by the inherent nonlinearity and sparsity of spikes. The most promising approaches either train Spiking Neural Networks directly or convert existing artificial neural networks into a spike setting. In this work, we will express our view on the future of Spiking Neural Networks and on which training method is the most promising for recent deep architectures.

Paper Nr: 121
Title:

Learning Independently from Causality in Multi-Agent Environments

Authors:

Rafael Pina, Varuna De Silva and Corentin Artaud

Abstract: Multi-Agent Reinforcement Learning (MARL) is an area of growing interest in the field of machine learning. Despite notable advances, there are still problems that require investigation. The lazy agent pathology is a well-known problem in MARL in which some of the agents in a MARL team do not contribute to the common goal, letting their teammates do all the work. In this work, we investigate this problem from a causality-based perspective. We intend to bridge the fields of MARL and causality and argue for the usefulness of this link. We study a fully decentralised MARL setup where agents need to learn cooperation strategies and show that there is a causal relation between individual observations and the team reward. The experiments carried out show how this relation can be used to improve independent agents in MARL, resulting not only in better performance as a team but also in the rise of more intelligent behaviours in individual agents.

Paper Nr: 135
Title:

New Centre/Surround Retinex-like Method for Low-Count Image Reconstruction

Authors:

V. E. Antsiperov

Abstract: This work is devoted to synthesizing a new method for low-count image reconstruction based on a realistic distortion model associated with quantum (Poisson) noise. The proposed approach to the synthesis of reconstruction methods is based on the principles and concepts of statistical learning, understood as input learning (cf. adaptive smoothing). The synthesis is focused on a special representation of images using a sample of counts of controlled size (sampling representation). Based on the specifics of this representation, a generative model of an ideal image is formulated, which is then concretized to a probabilistic parametric model in the form of a system of receptive fields. This model allows for a very simple procedure for estimating the count probability density, which in turn is an estimate of the normalized intensity of the registered radiation. With the help of the latter, similarly to the scheme of wavelet thresholding algorithms, a procedure for extracting contrast in the image is built. From the perception point of view, the contrast carries the main information about the reconstructed image, so such a procedure would provide a high image perception quality. The contrast extraction is carried out by comparing the number of counts in the centre and in the concentric surround of ON/OFF receptive fields and turns out to be very similar to wavelet thresholding.

Paper Nr: 140
Title:

Prostate Cancer Detection, Segmentation, and Classification using Deep Neural Networks

Authors:

Yahia Bouslimi, Takwa A. Gader and Afef K. Echi

Abstract: This paper provides a fully automated computer-aided medical diagnostic system that assists radiologists in segmenting Prostate Cancer (PCa) lesions from multi-parametric Magnetic Resonance Imaging (mp-MRI) scans and predicting whether those lesions are benign or malignant. To this end, our proposed approach uses deep neural network models such as residual networks (ResNet) and inception networks to classify clinically relevant cancer. It also uses U-Net and MultiResU-Net to automatically segment the prostate lesion from mp-MRIs. We used two publicly available benchmark datasets: Radboudumc and ProstateX. We tested our fully automatic system and obtained positive findings, with the AUROC of the PCa lesion classification model exceeding 98.4%. In addition, the MultiResU-Net model achieved an accuracy of 98.34% for PCa lesion segmentation.

Paper Nr: 141
Title:

On the Convergence of Stochastic Gradient Descent in Low-Precision Number Formats

Authors:

Matteo Cacciola, Antonio Frangioni, Masoud Asgharian, Alireza Ghaffari and Vahid P. Nia

Abstract: Deep learning models are dominating almost all artificial intelligence tasks such as vision, text, and speech processing. Stochastic Gradient Descent (SGD) is the main tool for training such models, where the computations are usually performed in single-precision floating-point number format. The convergence of single-precision SGD is normally aligned with the theoretical results of real numbers since they exhibit negligible error. However, the numerical error increases when the computations are performed in low-precision number formats. This provides compelling reasons to study the SGD convergence adapted for low-precision computations. We present both deterministic and stochastic analysis of the SGD algorithm, obtaining bounds that show the effect of number format. Such bounds can provide guidelines as to how SGD convergence is affected when constraints render the possibility of performing high-precision computations remote.
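A toy example can make the effect of the number format concrete. The sketch below is not the paper's analysis: it runs plain (deterministic) gradient descent on f(w) = 0.5 (w - 1)^2 and, as an assumption for illustration, models low precision by rounding every iterate to a hypothetical fixed-point grid with spacing 0.25.

```python
import math

def quantize(x, step):
    """Round x to the nearest multiple of `step` (ties round up)."""
    return math.floor(x / step + 0.5) * step

def gradient_descent(w, lr, n_steps, grid=None):
    """Minimize f(w) = 0.5 * (w - 1)^2; optionally round each iterate to a grid."""
    for _ in range(n_steps):
        grad = w - 1.0
        w = w - lr * grad
        if grid is not None:
            w = quantize(w, grid)   # simulated low-precision storage of w
    return w

w_full = gradient_descent(0.0, lr=0.25, n_steps=50)            # approaches 1.0
w_low = gradient_descent(0.0, lr=0.25, n_steps=50, grid=0.25)  # stalls at 0.75
```

With the grid, the iterate stops moving once the update `lr * grad` falls below half the grid spacing and is rounded away, so the low-precision run stalls before reaching the minimizer at 1.0.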

Paper Nr: 143
Title:

Integration of Statistical Methods and Artificial Neural Networks for the Detection of Oil Stains in the Aquatic Environment

Authors:

Monik S. Sousa and João F. Neto

Abstract: The growth in oil exploration and transport increases the risk of accidents in the aquatic environment. Early detection of oil slicks in the aquatic environment is essential to minimize the risk of accidents and to support effective decision-making. Thus, a method for detecting oil stains is needed to reduce the damage caused by industrial activities to the environment. This article presents statistical classification and machine learning methods to detect oil slicks on the ocean surface. For this, images from a Synthetic Aperture Radar (SAR) were used. The proposed model for detecting oil slicks uses Linear Discriminant Analysis (LDA) to generate an estimate of the class to which each database image belongs (image without oil slick or image with oil slick), and an Artificial Neural Network (ANN) to classify the data, where the ANN input combines each image with its LDA result. From the results obtained, it is concluded that the proposed method can detect oil slicks on the ocean surface with good accuracy.

Paper Nr: 151
Title:

Exploiting Context in Handwriting Recognition Using Trainable Relaxation Labeling

Authors:

Sara Ferro, Alessandro Torcinovich, Arianna Traviglia and Marcello Pelillo

Abstract: Handwriting Text Recognition (HTR) is a fast-moving research topic in computer vision and machine learning domains. Many models have been introduced over the years, one of the most well-established ones being the Convolutional Recurrent Neural Network (CRNN), which combines convolutional feature extraction with recurrent processing of the visual embeddings. Such a model, however, presents some limitations such as a limited capability to account for contextual information. To counter this problem, we propose a new learning module built on top of the convolutional part of a classical CRNN model, derived from the relaxation labeling processes, which is able to exploit the global context reducing the local ambiguities and increasing the global consistency of the prediction. Experiments performed on three well-known handwritten recognition datasets demonstrate that the relaxation labeling procedures improve the overall transcription accuracy at both character and word levels.

Paper Nr: 152
Title:

Automatic Coronary Angiogram Keyframe Extraction

Authors:

Hounaida Moalla, Aiman Ghrab, Bassem Ben Hamed, Amine Bahloul, Rania Hammami and Leila Abid

Abstract: Coronary artery disease is one of the most feared atherosclerosis complications. Doctors use coronary angiography as a diagnostic tool to diagnose a patient with obstructive coronary artery disease and treat it efficiently. The effectiveness of the doctor’s intervention strongly depends on the quality of the diagnosis. Therefore, good extraction of keyframes from coronary angiography will certainly improve the accuracy of the decision, hence the importance given to this step. To determine the best way to extract keyframes from coronary angiograms, we tested several methods for keyframe extraction. The keyframe extraction method we propose is based on the use of filters and the calculation of the frame intensities of a given coronary angiogram. The pilot frame is the brightest one, and the keyframes are its six neighboring frames. Our method, the Contrast Enhanced Sato filter, succeeded in extracting the right keyframes with an accuracy of around 85.74%.
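The selection rule stated in this abstract (the brightest frame as pilot, its six neighbors as keyframes) can be sketched as follows. This is a simplified illustration, not the authors' pipeline: the filtering stage is omitted, brightness is taken as mean pixel intensity, and the six neighbors are assumed to be three frames on each side of the pilot. The function names and the synthetic frames are hypothetical.

```python
def mean_intensity(frame):
    """Average pixel value of a frame given as a list of pixel rows."""
    total = sum(sum(row) for row in frame)
    count = sum(len(row) for row in frame)
    return total / count

def select_keyframes(frames, n_neighbors=6):
    """Return (pilot_index, keyframe_indices) for a list of frames."""
    intensities = [mean_intensity(f) for f in frames]
    pilot = max(range(len(frames)), key=intensities.__getitem__)
    half = n_neighbors // 2
    # Assumption: neighbors are taken symmetrically and clipped at the sequence ends.
    neighbors = [i for i in range(pilot - half, pilot + half + 1)
                 if i != pilot and 0 <= i < len(frames)]
    return pilot, neighbors

# Ten tiny synthetic 2x2 "frames"; the frame at index 5 is the brightest.
levels = [10, 20, 30, 40, 50, 90, 60, 40, 30, 20]
frames = [[[v, v], [v, v]] for v in levels]
pilot, keyframes = select_keyframes(frames)
```

If the pilot lies near the start or end of the sequence, this sketch returns fewer than six neighbors; how the original method handles that case is not stated in the abstract.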

Paper Nr: 161
Title:

Improving the Accuracy of Tracker by Linearized Transformer

Authors:

Thang H. Dinh, Kien T. Trung, Thanh N. Chi and Long T. Quoc

Abstract: Visual object tracking seeks to correctly estimate the target’s bounding box, which is difficult due to occlusion, illumination variation, background clutter, and camera motion. Recently, Siamese-based approaches have demonstrated promising visual tracking capability. However, most modern Siamese-based methods compute target and search image features independently, then use correlation to acquire correlation information from the two feature maps. The correlation operation is a straightforward fusion technique that considers the similarity between the template and the search region. This may be the limiting factor in the development of high-precision tracking algorithms. This research offers a Siamese refinement network for visual tracking that enhances and fuses template and search patch information directly without needing a correlation operation. This approach can boost the performance of any tracker and produces boxes without any postprocessing. Extensive experiments on visual tracking benchmarks such as VOT2018, UAV123, OTB100, and LaSOT with the DiMP50 base tracker demonstrate that our method achieves state-of-the-art results. For example, on the VOT2018, LaSOT, and UAV123 test sets, our method obtains significant improvements of 5.3% (EAO), 3.5% (AUC), and 2.9% (AUC) over the base tracker. Our network runs at approximately 30 FPS on an RTX 3070 GPU.

Area 2 - Applications

Full Papers
Paper Nr: 7
Title:

Multi-Scale Feature Aggregation Based Multiple Instance Learning for Pathological Image Classification

Authors:

Takeshi Yoshida, Kazuki Uehara, Hidenori Sakanashi, Hirokazu Nosato and Masahiro Murakawa

Abstract: This study proposes a multi-scale attention assembler network (MSAA-Net) for multi-scale pathological image classification. The proposed method discovers crucial features by observing each scale and finding the essential scales used for classification. To realize this characteristic, we introduce a two-stage feature aggregation mechanism, which first assigns attention weights to useful local regions for each scale and then assigns attention weights to the scales. The mechanism observes a pathological image from the perspective of each scale and adaptively determines the essential scale for classification from the observation results. To train the MSAA-Net, we adopt multiple instance learning (MIL), a learning approach for predicting a label corresponding to multiple images. The labeling effort is reduced because MIL trains the classification model using diagnoses for whole-slide images obtained from the daily diagnoses of pathologists instead of detailed annotations of the images. We conducted classification using two pathological image datasets to evaluate the proposed method. The results indicate that the proposed method outperforms state-of-the-art multi-scale-based methods.

Paper Nr: 40
Title:

Keyframe and GAN-Based Data Augmentation for Face Anti-Spoofing

Authors:

Jarred Orfao and Dustin van der Haar

Abstract: As technology improves, criminals find new ways to gain unauthorised access. Accordingly, face spoofing has become more prevalent in face recognition systems, requiring adequate presentation attack detection. Traditional face anti-spoofing methods used human-engineered features, and due to their limited representation capacity, these features created a gap which deep learning has filled in recent years. However, these deep learning methods still need further improvements, especially in in-the-wild settings. In this work, we use generative models as a data augmentation strategy to improve the face anti-spoofing performance of a vision transformer. Moreover, we propose an unsupervised keyframe selection process to generate better candidate samples for more efficient training. Experiments show that our augmentation approaches improve the baseline performance on CASIA-FASD and achieve state-of-the-art performance on the Spoof in the Wild database for protocols 2 and 3.

Paper Nr: 45
Title:

Evaluating the Impact of Low-Light Image Enhancement Methods on Runner Re-Identification in the Wild

Authors:

Oliverio J. Santana, Javier Lorenzo-Navarro, David Freire-Obregón, Daniel Hernández-Sosa and Modesto Castrillón-Santana

Abstract: Person re-identification (ReID) is a trending topic in computer vision. Significant developments have been achieved, but most rely on datasets with subjects captured statically within a short period of time in rather good lighting conditions. In-the-wild scenarios, such as long-distance races that involve widely varying lighting conditions, from full daylight to night, present a considerable challenge. This issue cannot be addressed by increasing the exposure time on the capture device, as the runners’ motion will lead to blurred images, hampering any ReID attempts. In this paper, we survey some low-light image enhancement methods. Our results show that including an image processing step in a ReID pipeline before extracting the distinctive body appearance features from the subjects can provide significant performance improvements.

Paper Nr: 58
Title:

Outlier Detection Method for Equipment Onboard Merchant Vessels

Authors:

Iori Oki, Seiji Yamada and Takashi Onoda

Abstract: The equipment onboard merchant vessels is essential for safe navigation. If an equipment malfunction occurs during a voyage, it is difficult to repair it with the same speed and accuracy as on land. Therefore, it is important to be able to repair and replace the equipment with a margin of time by detecting the signs of anomalies. In this paper, we present the results of detecting signs of anomalies from various sensor data collected using One-Class SVM. We also show the results of interpreting the signs of anomalies and the detected locations using SHAP. The results show that the proposed method can detect signs of anomalies about one month earlier than the conventional method. Therefore, the proposed method is shown to be potentially useful for the maintenance of equipment on merchant vessels.

Paper Nr: 95
Title:

Strokes Trajectory Recovery for Unconstrained Handwritten Documents with Automatic Evaluation

Authors:

Sidra Hanif and Longin J. Latecki

Abstract: The focus of this paper is offline handwriting Stroke Trajectory Recovery (STR), which facilitates tasks such as handwriting recognition and synthesis. The input is an image of handwritten text, and the output is a stroke trajectory, where each stroke is a sequence of 2D point coordinates. Usually, a Dynamic Time Warping (DTW) or Euclidean distance-based loss function is employed to train the STR network. In the DTW loss calculation, the predicted and ground-truth stroke sequences are aligned, and their differences are accumulated. The DTW loss penalizes the alignment of far-off points in proportion to their distance. As a result, the DTW loss incurs only a small penalty if the predicted stroke sequence is aligned to the ground-truth stroke sequence but includes stray points or artifacts away from the ground-truth points. To address this issue, we propose to compute a marginal Chamfer distance between the predicted and the ground-truth point sets to penalize the stray points more heavily. Our experiments show that the loss penalty incurred by complementing DTW with the marginal Chamfer distance gives better results for learning STR. We also propose an evaluation method for STR cases where ground-truth stroke points are unavailable. We digitalize the predicted stroke points by rendering the stroke trajectory as an image and measuring the image similarity between the input handwriting image and the rendered digital image. We further evaluate the readability of the recovered strokes: by employing an OCR system, we determine whether the input image and the recovered strokes represent the same words.
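To illustrate why a Chamfer-style term reacts to stray points, the sketch below computes the ordinary symmetric Chamfer distance between two 2D point sets. The abstract does not define its "marginal" variant, so this is the standard textbook form rather than the paper's loss; the point sets are made up.

```python
import math

def chamfer_distance(pred, truth):
    """Symmetric Chamfer distance: mean nearest-neighbor distance in both directions."""
    def one_way(src, dst):
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)
    return one_way(pred, truth) + one_way(truth, pred)

truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
clean = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
stray = clean + [(1.0, 5.0)]   # same trajectory plus one stray point

d_clean = chamfer_distance(clean, truth)   # 0.0: every point has an exact match
d_stray = chamfer_distance(stray, truth)   # 1.25: the stray point is 5 units away
```

Every predicted point contributes its distance to the nearest ground-truth point, so a single far-off artifact raises the value even when the rest of the trajectory matches exactly.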
Download

Paper Nr: 98
Title:

Continuous Sign-Language Recognition using Transformers and Augmented Pose Estimation

Authors:

Reemt Hinrichs, Angelo Y. Sitcheu and Jörn Ostermann

Abstract: Sign language is used by deaf people to communicate with other people. It consists not only of hand signs or gestures but also encompasses facial expressions and further body movements. To make machine-human interaction accessible for deaf people, automatic sign language recognition has to be implemented, which allows a machine to understand their signs and gestures. For this purpose, continuous sign-language recognition, which is the mapping of a (visual) sequence of signs forming a (sign) sentence to a sequence of (text) words, has to be developed. In this work, continuous sign-language recognition using transformers is proposed. Using additional pose estimation, body markers are extracted and augmented through data imputation and velocity-like features, and then used together with a transformer network for continuous sign-language recognition. Using the proposed method, better than state-of-the-art results were obtained on the RWTH-PHOENIX-Weather 2014 dataset, achieving 19.2%/19.5% dev/test word error rate (WER) on the signer-independent subset and 16.9%/17.4% dev/test WER on the simpler multi-signer subset. The feature augmentation was found to improve the baseline word error rate by about 2.7%/2.9% dev/test.
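
For readers unfamiliar with the metric reported above, word error rate is commonly computed as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The sketch below is a minimal version of that common definition; the function name and the example sentences are assumptions for illustration and do not come from the paper or its dataset.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # match or substitution
            ))
        prev = cur
    return prev[-1] / len(ref)

# Hypothetical example: one deleted word and one substituted word
# against a five-word reference.
print(word_error_rate("rain in the north tomorrow", "rain in north today"))
```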
Download

Paper Nr: 100
Title:

A Hierarchical Approach for Multilingual Speech Emotion Recognition

Authors:

Marco Nicolini and Stavros Ntalampiras

Abstract: This article approaches the Speech Emotion Recognition (SER) problem with a focus on multilingual settings. The proposed solution consists of a hierarchical scheme, the first level of which identifies the speaker’s gender and the second level of which predicts the speaker’s emotional state. We experiment with three classifiers of increasing complexity, i.e. k-NN, transfer learning based on YAMNet, and Bidirectional Long Short-Term Memory neural networks. Importantly, model learning, validation and testing consider the full range of the big-six emotions, while the dataset has been assembled using well-known SER datasets representing six different languages. The obtained results show differences between classifying all data and classifying only female or male data for all classifiers. Interestingly, a-priori gender recognition can boost the overall classification performance.
Download

Paper Nr: 111
Title:

Data Streams: Investigating Data Structures for Multivariate Asynchronous Time Series Prediction Problems

Authors:

Christopher Vox, David Broneske, Istiaque M. Shaikat and Gunter Saake

Abstract: Time series data are used in many practical applications, such as weather forecasting or, in the automotive industry, predicting the aging of a vehicle component. In these practical applications, multivariate time series are present, which are either synchronous or asynchronous. Asynchronicity can be caused by different recording frequencies of sensors and often poses challenges to the efficient processing of data in data analytics tasks. In the area of deep learning, several methods are used to preprocess the data appropriately for the respective models. Sometimes these data preprocessing methods result in a change of the data distribution and thus in the introduction of data-based bias. Therefore, we review different data structures for deep learning with multivariate, asynchronous time series and we introduce a lightweight data structure which utilizes the idea of stacking asynchronous data for deep learning problems. As the data structure, we create the Triplet-Stream with a decreased memory footprint, which we evaluate for one classification problem and one regression problem. The Triplet-Stream enables excellent performance on all datasets compared to current approaches.
Download

Paper Nr: 115
Title:

Image-Based Material Analysis of Ancient Historical Documents

Authors:

Thomas Reynolds, Maruf A. Dhali and Lambert Schomaker

Abstract: Researchers continually perform corroborative tests to classify ancient historical documents based on the physical materials of their writing surfaces. However, these tests, often performed on-site, require actual access to the manuscript objects. The procedures involve a considerable amount of time and cost, and can damage the manuscripts. Developing a technique to classify such documents using only digital images can be very useful and efficient. In order to tackle this problem, this study uses images from a famous historical collection, the Dead Sea Scrolls, to propose a novel method to classify the materials of the manuscripts. The proposed classifier uses the two-dimensional Fourier Transform to identify patterns within the manuscript surfaces. Combining a binary classification system employing the transform with a majority voting process is shown to be effective for this classification task. This pilot study shows a successful classification percentage of up to 97% for a limited number of manuscripts produced from either parchment or papyrus material. Feature vectors based on Fourier-space grid representation outperformed a concentric Fourier-space format.
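
The majority voting step mentioned above can be sketched as follows, assuming (this is not stated in the abstract) that a binary classifier labels several image patches per manuscript and the most frequent label wins. The function name, the patch-level framing, and the tie handling are hypothetical choices for this example.

```python
from collections import Counter

def majority_vote(patch_labels):
    """Return the most frequent label, or None when the top two labels tie."""
    if not patch_labels:
        raise ValueError("at least one patch label is required")
    ranked = Counter(patch_labels).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie: left undecided in this sketch
    return ranked[0][0]

# Hypothetical patch-level predictions for one manuscript image.
print(majority_vote(["parchment", "papyrus", "parchment"]))
```

Returning None on a tie keeps the sketch from silently favouring one material; a real system would need an explicit tie-breaking rule.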
Download

Paper Nr: 138
Title:

A Comparative Study of GAN Methods for Physiological Signal Generation

Authors:

Nour Neifar, Achraf Ben-Hamadou, Afef Mdhaffar, Mohamed Jmaiel and Bernd Freisleben

Abstract: Due to medical data scarcity and complex dynamics of physiological signals, different solutions based on generative adversarial networks (GANs) have been proposed to generate physiological signals, such as electrocardiograms (ECG) and photoplethysmograms (PPG). In this paper, we present a comparative study of existing methods for ECG and PPG signal generation. The competing methods are evaluated on the MIT-BIH arrhythmia and the PPG-BP datasets. Experimental results demonstrate the benefits of incorporating prior knowledge in the generation process and the robustness of these methods for the synthesis of realistic ECG and PPG signals.
Download

Short Papers
Paper Nr: 5
Title:

Deep Learning Semi-Supervised Strategy for Gamma/Hadron Classification of Imaging Atmospheric Cherenkov Telescope Events

Authors:

Diego Riquelme, Mauricio Araya, Sebastian Borquez, Boris Panes and Edson Carquin

Abstract: The new Cherenkov Telescope Array (CTA) will record astrophysical gamma-ray events with an energy coverage range, angular resolution, and flux sensitivity never achieved before. The Earth’s atmosphere produces Cherenkov light when a shower of particles is induced by a high-energy particle of astrophysical origin (gammas, hadrons, electrons, etc.). The energy and direction of these gamma air shower events can be reconstructed stereoscopically using imaging atmospheric Cherenkov detectors. Since most of CTA’s scientific goals focus on identifying and studying gamma-ray sources, it is imperative to distinguish this specific type of event from the hadronic cosmic ray background with the highest possible efficiency. Following this objective, we designed a competitive deep-learning-based approach for gamma/background classification. First, we train the model with simulated images in a standard supervised fashion. Then, we explore a novel self-supervised approach that allows the use of new unlabeled images, as a step towards a method for refining the classifier using real images captured by the telescopes. Our results show that one can use unlabeled observed data to increase the accuracy and general performance of current simulation-based classifiers, which suggests that continuous improvement of the learning model could be possible under real data conditions.
Download

Paper Nr: 10
Title:

EvLiDAR-Flow: Attention-Guided Fusion Between Point Clouds and Events for Scene Flow Estimation

Authors:

Ankit Sonthalia, Ramy Battrawy, René Schuster and Didier Stricker

Abstract: In this paper, we propose the fusion of event streams and point clouds for scene flow estimation. Bio-inspired event cameras offer significantly lower latency and higher dynamic ranges than regular RGB cameras, and are therefore appropriate for recording high-speed motions. However, events do not provide depth information, which makes them unsuitable for scene flow (3D) estimation. On the other hand, LiDAR-based approaches are well suited to scene flow estimation due to the high precision of LiDAR measurements for outdoor scenes (e.g. autonomous vehicle applications) but they fail in the presence of unstructured regions (e.g. ground surface, grass, walls, etc.). We propose our EvLiDAR-Flow, a neural network architecture equipped with an attention module for bi-directional feature fusion between an event (2D) branch and a point cloud (3D) branch. This kind of fusion helps to overcome the lack of depth information in events while enabling the LiDAR-based scene flow branch to benefit from the rich motion information encoded by events. We validate the proposed EvLiDAR-Flow by showing that it performs significantly better and is robust to the presence of ground points, in comparison to a state-of-the-art LiDAR-only scene flow estimation method.
Download

Paper Nr: 14
Title:

Fake It, Mix It, Segment It: Bridging the Domain Gap Between Lidar Sensors

Authors:

Frederik Hasecke, Pascal Colling and Anton Kummert

Abstract: Lidar segmentation provides detailed information about the environment surrounding robots or autonomous vehicles. Current state-of-the-art neural networks for lidar segmentation are tailored to specific datasets. Changing the lidar sensor without retraining on a large annotated dataset from the new sensor results in a significant decrease in performance due to a “domain shift.” In this paper, we propose a new method for adapting lidar data to different domains by recreating annotated panoptic lidar datasets in the structure of a different lidar sensor. We minimize the domain gap by generating panoptic data from one domain in another and combining it with partially labeled data from the target domain. Our method improves the SemanticKITTI (Behley et al., 2019) to nuScenes (Caesar et al., 2020) domain adaptation performance by up to +51.5 mIoU points, and the SemanticKITTI to nuScenes domain adaptation by up to +48.3 mIoU. We compare two state-of-the-art methods for domain adaptation of lidar semantic segmentation to ours and demonstrate a significant improvement of up to +21.2 mIoU over the previous best method. Furthermore, we successfully train well-performing semantic segmentation networks for two entirely unlabeled datasets of the state-of-the-art lidar sensors Velodyne Alpha Prime and InnovizTwo.
Download

Paper Nr: 17
Title:

Adaptive Adversarial Samples Based Active Learning for Medical Image Classification

Authors:

Siteng Ma, Yu An, Jing Wang, Aonghus Lawlor and Ruihai Dong

Abstract: Active learning (AL) is a subset of machine learning, which attempts to minimize the number of required training labels while maximizing the performance of the model. Most current research directions regarding AL focus on the improvement of query strategies. However, efficiently utilizing data may lead to more performance improvements than are thought to be achievable by changing the selection strategy. Thus, we present an adaptive adversarial sample-based approach to query unlabeled samples close to the decision boundary through the adversarial attack. Notably, based on that, we investigate the importance of using existing data effectively in AL by integrating generated adversarial samples according to consistency regularization and leveraging large numbers of unlabeled images via pseudo-labeling with the oracle-annotated instances during training. In addition, we explore an adaptive way to request labels dynamically as the model changes state. The experimental results verify our framework’s effectiveness with a significant improvement over various state-of-the-art methods for multiple medical applications. Our method achieves 3% above the supervised learning accuracy on the Messidor Dataset (the task of Diabetic Retinopathy detection) using only 34% of the whole dataset. We also conducted an extensive study on a histological Breast Cancer Diagnosis Dataset. Our code is available at https://github.com/HelenMa9998/adversarial active learning.
Download

Paper Nr: 29
Title:

Neonatal Video Database and Annotations for Vital Sign Extraction and Monitoring

Authors:

Hussein Sharafeddin, Lama Charafeddine, Jamila Khalaf, Ibrahim Kanj and Fadi A. Zaraket

Abstract: Background: The end goal of this project is to detect early signs of physiological disorders in term and preterm babies at the Neonatal Intensive Care Unit using real-time, camera-based, non-contact vital signs monitoring technology. The contact sensor technology currently in use might cause stress, pain, and damage to the fragile skin of extremely preterm infants. Realization of the proposed camera-based method might complement and eventually replace current technology. Non-invasive early detection of heart rate variability might allow earlier intervention, improve outcomes, and decrease hospital stay. This study constructed a curated set of videos annotated with accurate and reliable measurements of the monitored vital parameters, such as heart and respiratory rates, so that further analysis of the curated data set can lead towards the end goal. Body: The data collection process included 56 total hours of recording in 127 videos of 27 enlisted neonates. The video annotations include (1) vital signs acquired from bedside patient monitors at second-based intervals, (2) the neonate state of health entered and manually reviewed by a healthcare provider, (3) region of interest in video frames for heart rate detection extracted semi-automatically, and (4) the anonymized and clipped region of interest videos. Conclusion: The paper presents a curated data set of 127 video recordings of deidentified neonate foreheads annotated with vital signs and health state in XML format. The paper also presents a utility study that shows accurate results in estimating the heart rate of term and preterm neonates. We hypothesize that the data set we collected is beneficial for improving state-of-the-art monitoring techniques. Its timely dissemination may help lead to techniques that detect anomalies earlier, hence leading to earlier treatment and improved outcomes.
Download

Paper Nr: 59
Title:

SVM Based Maximum Power Consumption Excess Forecast Alert for Large-Scale Power Consumers

Authors:

Seigo Haruta, Ken-ichi Tokoro and Takashi Onoda

Abstract: Large-scale power consumers, such as buildings and factories, make high-voltage power contracts with the Japanese electric power companies. The basic fee for high-voltage power contracts is based on the maximum power consumption in the past year. If the power consumption in the present month does not exceed the maximum power consumption in the past year, large-scale power consumers can keep the basic fee down. Large-scale power consumers therefore need an alert to prevent the maximum power consumption in the present month from exceeding the maximum power consumption in the past year. In this study, excess forecasting was performed considering the characteristics of power consumption in each industry. In addition, we proposed SVM improvements for imbalanced data. We applied this method to power consumption data, which is imbalanced, to perform excess forecasting. As a result, we have improved the accuracy of the excess forecast and contributed to effective alerts for many large-scale power consumers.
Download

Paper Nr: 67
Title:

State of Health Estimation of Lithium-ion Batteries Using Convolutional Neural Network with Impedance Nyquist Plots

Authors:

Yichun Li, Mina Maleki, Shadi Banitaan and Mingzuoyang Chen

Abstract: In order to maintain the Li-ion batteries in a safe operating state and to optimize their performance, a precise estimation of the state of health (SOH), which indicates the degradation level of the Li-ion batteries, has to be taken into consideration urgently. In this paper, we present a regression machine learning framework that combines a convolutional neural network (CNN) with the Nyquist plot of Electrochemical Impedance Spectroscopy (EIS) as features to estimate the SOH of Li-ion batteries with a considerable improvement in the accuracy of SOH estimation. The results indicate that the Nyquist plot based on EIS features provides more detailed information regarding battery aging than simple impedance values due to its ability to reflect impedance change over time. Furthermore, convolutional layers in the CNN model were more effective in extracting different levels of features and characterizing the degradation patterns of Li-ion batteries from EIS measurement data than using simple impedance values with a DNN model, as well as other traditional machine learning methods, such as Gaussian process regression (GPR) and support vector machine (SVM).
Download

Paper Nr: 71
Title:

Step Towards Generalization: Fault Classification in Multivariate High-Frequency Data from Different Operating Regimes of Hydraulic Rock Drill System

Authors:

Nagi Reddy, Ashit Gupta, Gauri Dhande and Vijaykumar Pasupureddy

Abstract: Hydraulic rock drills operate under harsh environments of excessive humidity and vibrations. In operation, the fundamental machine frequency is hampered by various loading disturbances created by the pressure waves generated during the rock drill application, which initiates faults at different times during a complete cycle of rock drilling. These faults include failure of internal parts, excessive channel openings and damaged parts, causing considerable non-linearity in the pressure data generated. A fault in such machinery can escalate quite rapidly, leading to accidents like complete failure of the equipment and loss of life. Therefore, it is crucial to classify the fault and inform the operator of it. The fault classification challenge escalates further when the rock drill operates under previously unknown operating conditions. In the present work, we compare the performance of deep learning models like Long Short-Term Memory, Convolutional Neural Network, and Residual Network to classify faults, whose signature is recorded in data generated at a frequency of 50 kHz when a rock drill is in operation. We also demonstrate how the accuracy of the models varies when they are tested on unseen operating conditions. An overall analysis is provided to generalize a model for fault classification in industrial applications over contrasting operating conditions.
Download

Paper Nr: 73
Title:

Addressing Privacy and Security Concerns in Online Game Account Sharing: Detecting Players Using Mouse Dynamics

Authors:

Yimiao Wang and Tasmina Islam

Abstract: As the internet has taken up a huge part of people’s lives, the personal information an online account can hold has increased as well, resulting in many concerns related to cybersecurity and privacy. Children, as a vulnerable group, could unknowingly take risky actions that cause privacy leakage, like sharing a game account. This paper discusses the possible security and privacy risks caused by game account sharing and proposes a countermeasure based on user authentication to detect the true owner of the game account using their mouse dynamics. Support Vector Machine and Random Forest have been used for classification of the true owner and the intruder using players’ mouse dynamics data captured from the game “Minecraft”. This paper also investigates the effect of different feature sets in detecting the players using feature ranking algorithms.
Download

Paper Nr: 105
Title:

Fixed Tasks for Continuous Authentication via Smartphone

Authors:

Vincenzo Gattulli, Donato Impedovo, Tonino Palmisano and Lucia Sarcinella

Abstract: Mobile devices feature a variety of knowledge-based authentications such as PINs, passwords, and lock sequences. The weakness of these approaches is that once leaked and/or intercepted, the control over the device is lost and no more authentication steps are required. In this paper, the efficiency of a set of ML algorithms in authenticating users is evaluated with the aim of understanding which are the best tasks to use by submitting Fixed Tasks, which simulate the use of a device in daily life, through Touch Behaviour and motion sensors installed in the device itself. Next, a social problem is posed, in which an attempt is made to understand whether a group of subjects at a trial performed the assigned tasks correctly without permitting other people to do them instead.
Download

Paper Nr: 113
Title:

Evaluation of Factors-of-Interest in Bone Mimicking Models Based on DFT Analysis of Ultrasonic Signals

Authors:

Aleksandrs Sisojevs, Alexey Tatarinov and Anastasija Chaplinska

Abstract: Bone fragility in osteoporosis is associated with a decrease in the thickness of the cortical layer CTh in long bones and the development of internal porosity P in it. In the present work, an attempt was made to predict the factors-of-interest CTh and P based on the pattern recognition approach, where DFT analysis was applied to ultrasonic signals in surface transmission through a soft tissue layer. Compact bone was modeled with PMMA plates with gradual changes in CTh from 2 to 6 mm, and internal porosity P was created by drilling where the thickness of the porous layer P varied from 0 to 100% of CTh. The estimation method was based on a statistical analysis of the magnitude of the DFT spectrum of the ultrasonic signals. Decision rules were mathematical criteria calculated as ratios between the envelope functions of the magnitudes. Each of the objects was chosen in turn as a test object, while other specimens composed the training set. The results of the experiments showed the potential effectiveness of the CTh and P prediction, while additional physical parameters may be used as decision rules to improve the reliability of the diagnosis.
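
Two ingredients named above, the magnitude of the DFT spectrum and the leave-one-out evaluation in which each specimen serves once as the test object, can be sketched as below. This is a generic illustration with hypothetical function names and toy inputs; the paper's decision rules based on envelope-function ratios are not reproduced here.

```python
import cmath

def dft_magnitude(signal):
    """Magnitude of the discrete Fourier transform, computed directly (O(n^2))."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(signal)))
        for k in range(n)
    ]

def leave_one_out(specimens):
    """Yield (test_specimen, training_specimens) pairs, one per specimen."""
    for i, test in enumerate(specimens):
        yield test, specimens[:i] + specimens[i + 1:]

# Hypothetical toy signal: one cycle of a sampled cosine.
print(dft_magnitude([1.0, 0.0, -1.0, 0.0]))

# Hypothetical specimen identifiers.
for test, train in leave_one_out(["plate_a", "plate_b", "plate_c"]):
    print(test, train)
```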
Download

Paper Nr: 120
Title:

Bayesian Iterative Closest Point for Shape Analysis of Brain Structures

Authors:

Mauricio Castaño-Aguirre, Hernan F. Garcia, Álvaro A. Orozco, Gloria L. Porras-Hurtado and David A. Cárdenas-Peña

Abstract: Machine learning in medical image analysis has proved to be a strategy that solves many problems emerging from the variability in the physician’s outlines and the amount of time each physician spends analyzing each image. One of the most critical medical image analysis approaches is Medical Image Registration, which has been a topic of active research for the last few years. In this paper, we propose a Bayesian Optimization framework for Point Cloud Registration for shape analysis of brain structures. Here, we rely on a modified version of the Iterative Closest Point (ICP) algorithm. This approach builds a black-box function that receives input parameters for performing a point cloud transformation. Then, we use a similarity metric that shows the performance of the transformation. With this similarity metric, we build a function to define a Bayesian strategy that allows us to find the global optimum of the similarity metric-based function. To this end, we use Bayesian Optimization, which performs global optimization of unknown functions by making observations and performing probabilistic calculations. This model considers all the previous observations, which prevents the strategy from falling into a local optimum, as often happens in strategies based on classical optimization approaches such as Gradient Descent. Finally, we evaluate the model by performing a point cloud registration process corresponding to brain structures at different time instances. The experimental results show a faster convergence towards the global optimum and building. Besides, the proposed model evidenced robust optimization results for registration strategies in point clouds.
Download

Paper Nr: 4
Title:

Classifying Intelligence Tests Patterns Using Machine Learning Methods

Authors:

Georgios Liapis, Loukritira Stefanou and Ioannis Vlahavas

Abstract: Intelligence testing assesses a variety of cognitive abilities and is frequently used in the evaluation of people for jobs, army recruitment, scholarships, and the educational system in general. Licensed psychologists and researchers create and analyze intelligence tests, setting the difficulty layer, grading them, and weighing the results on a global scale. However, developing new model tests is a time-consuming and challenging process. In this study, we lay the groundwork for developing a model that classifies the IQ patterns, in order to generate new IQ Raven tests. More specifically, we analyze Raven’s Progressive Matrices Tests, a nonverbal multiple-choice intelligence test, and their patterns using a variety of Machine Learning (ML) techniques. In such intelligence tests, the question’s data includes mostly abstract images aligned in a grid system, with one missing element and a pattern that connects them by threes in horizontal and vertical order. These tests have been labeled based on several factors, such as the number of images, the type of pattern (e.g. counting, adding, or rotating), or their complexity and in order to classify them, various ML methods are used. Results of the current study act as a defining basis for the use of advanced Neural Network models, not only for classification but also for the generation of new IQ patterns.
Download

Paper Nr: 21
Title:

Light U-Net with a New Morphological Attention Gate Model Application to Analyse Wood Sections

Authors:

Rémi Decelle, Phuc Ngo, Isabelle Debled-Rennesson, Frédéric Mothe and Fleur Longuetaud

Abstract: This article focuses on heartwood segmentation from cross-section RGB images (see Fig. 1). In this context, we propose a novel attention gate (AG) model that both improves performance and enables light convolutional neural networks (CNNs). Our proposed AG is based on mathematical morphology operators. Our light CNN is based on the U-Net architecture and is called Light U-Net (LU-Net). Experimental results show that AGs consistently improve the prediction performance of LU-Net across different wood cross-section datasets. Our proposed morphological AG achieves better performance than the original U-Net with 10 times fewer parameters.
Download

Paper Nr: 34
Title:

Video-Based Sign Language Digit Recognition for the Thai Language: A New Dataset and Method Comparisons

Authors:

Wuttichai Vijitkunsawat, Teeradaj Racharak, Chau Nguyen and Nguyen L. Minh

Abstract: Video-based sign language recognition aims to support deaf people, so they can communicate with others, by helping to recognise signs from video input. Unfortunately, most existing sign language datasets are limited to a small vocabulary, especially in low-resource languages such as Thai. Recent research in the Thai community has mostly paid attention to building recognisers from static input with limited datasets, making it difficult to train machine learning models for practical applications. To overcome this limitation, this paper introduces a new video database for automatic sign language recognition for Thai sign language digits. Our dataset has about 63 videos for each of the nine digits and is performed by 21 signers. Preliminary baseline results for this new dataset are presented under extensive experiments. Specifically, we implement four deep-learning-based architectures: CNN-Mode, CNN-LSTM, VGG-Mode, and VGG-LSTM, and compare their performances under two scenarios: (1) the whole body pose with backgrounds, and (2) hand-cropped images only as pre-processing. The results show that VGG-LSTM with pre-processing has the best accuracy for our in-sample and out-of-sample test datasets.
Download

Paper Nr: 39
Title:

Lightweight Audio-Based Human Activity Classification Using Transfer Learning

Authors:

Marco Nicolini, Federico Simonetta and Stavros Ntalampiras

Abstract: This paper employs the acoustic modality to address the human activity recognition (HAR) problem. The cornerstone of the proposed solution is the YAMNet deep neural network, the embeddings of which comprise the input to a fully-connected linear layer trained for HAR. Importantly, the dataset is publicly available and includes the following human activities: preparing coffee, frying egg, no activity, showering, using microwave, washing dishes, washing hands, and washing teeth. The specific set of activities is representative of a standard home environment facilitating a wide range of applications. The performance offered by the proposed transfer learning-based framework surpasses the state of the art, while being able to be executed on mobile devices, such as smartphones, tablets, etc. In fact, the obtained model has been exported and thoroughly tested for real-time HAR on a smartphone device with the input being the audio captured from its microphone.
Download

Paper Nr: 41
Title:

Using a Genetic Algorithm to Update Convolutional Neural Networks for Abnormality Classification in Mammography

Authors:

Steven Wessels and Dustin van der Haar

Abstract: The processing of medical imaging studies is a costly and error-prone task. The use of deep learning algorithms for the automated classification of abnormalities can aid radiologists in interpreting medical images. This paper presents a genetic algorithm that is used to fine-tune the internal parameters of convolutional neural networks trained for abnormality classification in mammographic imaging. We used our genetic algorithm to search for the neural network weights representing the global minimum solution for ResNet50 and Xception architectures. The Xception architecture outperformed the ResNet baseline for both tasks, with the Xception baseline model achieving an AUC score of 72%. The genetic algorithm demonstrated a slight proclivity for improving the general metric evaluations of the network that it fine-tuned, but in some cases, it was still prone to miss good regions in the search space.
Download

Paper Nr: 44
Title:

Ensemble Learning for Cough-Based Subject-Independent COVID-19 Detection

Authors:

Vincenzo Conversano and Stavros Ntalampiras

Abstract: This paper belongs to the medical acoustics field and presents a solution for COVID-19 detection based on cough sound events. Unfortunately, the use of RT-PCR molecular swab tests for the diagnosis of COVID-19 is associated with considerable cost, depends on the availability of suitable equipment, requires a specific time period to produce the result, and is subject to potential errors in the execution of the tests. Interestingly, in addition to swab tests, cough sound events could facilitate the detection of COVID-19. Currently, there is a great deal of research in this direction, which has led to the development of publicly available datasets which have been processed, segmented, and labeled by medical experts. This work proposes an ensemble composed of a variety of classifiers suitably adapted to the present problem. Such classifiers are based on a standardized feature extraction front-end representing the involved audio signals, limiting the necessity to design handcrafted features. In addition, we elaborate on a prearranged publicly available dataset and introduce an experimental protocol taking into account model bias originating from subject dependency. After thorough experiments, the proposed model was able to outperform the state of the art in both patient-dependent and patient-independent settings.
Download

Paper Nr: 61
Title:

PVT based Blood Vessel Segmentation and Polyp Size Estimation in Colonoscopy Images

Authors:

Insaf Setitra, Yuji Iwahori, Yacine Elhamer, Anais Mezrag, Shinji Fukui and Kunio Kasugai

Abstract: The size of colorectal polyps is one of the factors conditioning the risk of synchronous and metachronous colorectal cancer (CRC). In this work, we are interested in the automatic measurement of polyp sizes in colonoscopy videos. The study is performed in two steps: (1) the detection and segmentation of the polyp by the neural network Polyp-PVT, and then (2) the classification of the polyp into different classes (type of disease, size of the polyp). This is done by extracting information from blood vessels, a parameter that has low variability and is present in the majority of colonoscopic videos. This method has been validated by two local Hepato-Gastro-Enterology specialists. Once the size of the polyp is extracted, polyps are classified as potentially malignant (polyp size ≥ 6 mm) or potentially benign (polyp size < 6 mm). Our approach reaches an accuracy of 85.61% for the first category and 73.92% for the second one, and is comparable to human classification, which is estimated at 52% for beginners and 71% for expert endoscopists.

Paper Nr: 62
Title:

Clustering LiDAR Data with K-means and DBSCAN

Authors:

Mafalda I. Oliveira and André S. Marcal

Abstract: Multi-object detection is an essential aspect of autonomous driving systems to guarantee the safety of self-driving vehicles. In this paper, two clustering methods, DBSCAN and K-means, are used to segment LiDAR data and recognize the objects detected by the sensors. The datasets used were the Honda 3D LiDAR Dataset (H3D) and BOSCH data acquired within the THEIA project. The clustering methods were evaluated in several traffic scenarios with different characteristics, extracted from both datasets. To validate the clustering results, five internal indexes were computed for each scenario tested. The available ground truth data for the H3D dataset also enabled the computation of three basic external indexes and a newly proposed composite external index. A method to compute reference bounding boxes is presented using the available labels from H3D. The overall results indicate that K-means outperformed DBSCAN in the internal validation indexes Silhouette, C-index, and Calinski-Harabasz, while DBSCAN performed better than K-means in the Dunn and Davies-Bouldin indexes. The external validation indexes indicated that DBSCAN produces the best results, supporting the view that density clustering is well-suited for LiDAR segmentation.
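As an illustration of what an internal validation index measures, the following is a minimal sketch of the Silhouette index in plain Python. It is not the authors' evaluation code. The function name `silhouette_score`, the toy points, and the choice to ignore points labelled -1 (the label scikit-learn's DBSCAN assigns to noise) are assumptions made for this example.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all clustered points.

    Assumption for this sketch: points labelled -1 are treated as
    noise and left out of the computation.
    """
    # Group the points by cluster label, skipping noise.
    clusters = {}
    for point, label in zip(points, labels):
        if label != -1:
            clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    scores = []
    for label, members in clusters.items():
        for point in members:
            if len(members) == 1:
                # By convention a singleton cluster scores 0.
                scores.append(0.0)
                continue
            # a: mean distance to the other members of the same cluster
            # (the distance from the point to itself contributes 0).
            a = sum(math.dist(point, q) for q in members) / (len(members) - 1)
            # b: smallest mean distance to the members of another cluster.
            b = min(
                sum(math.dist(point, q) for q in other) / len(other)
                for other_label, other in clusters.items()
                if other_label != label
            )
            denom = max(a, b)
            scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Two well-separated pairs of 2D points (toy data, not LiDAR).
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    print(silhouette_score(pts, [0, 0, 1, 1]))  # compact clusters: close to 1
    print(silhouette_score(pts, [0, 1, 0, 1]))  # mixed clusters: negative
```

Values near 1 indicate compact, well-separated clusters, so two labelings of the same point cloud (for example one from K-means and one from DBSCAN) can be compared by their scores.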

Paper Nr: 64
Title:

Towards Explainability in Using Deep Learning for Face Detection in Paintings

Authors:

Siwar Bengamra, Olfa Mzoughi, André Bigand and Ezzeddine Zagrouba

Abstract: Explainable Artificial Intelligence (XAI) is an active research area that aims to interpret a neural network’s decisions, ensuring transparency and trust in task-specific learned models. Despite the great success of deep learning networks in many fields, their adoption by practitioners faces some limits, a significant one being the complex nature of these networks, which prevents human comprehension of the decision-making process. This is especially the case in artwork analysis. To address this issue, we explore Detector Randomized Input Sampling for Explanation (DRISE), a visualization method for explainable artificial intelligence, to comprehend and improve a CNN-based face detector on Tenebrism painting images. The results obtained show local explanations for the model’s predictions and consequently offer insights into the model’s decision-making. This paper will be of great help to researchers as future support for the explainability of object detection in other application domains.

Paper Nr: 68
Title:

Point to Segment Distance DTW for Online Handwriting Signals Matching

Authors:

Elmokhtar Mohamed Moussa, Thibault Lelore and Harold Mouchère

Abstract: In this paper, we propose DTWseg, a modified DTW algorithm based on a point-to-segment distance instead of the Euclidean point-to-point distance. Applying DTWseg to online handwriting matching proves to be advantageous compared to other algorithms, as it is less sensitive to differences between signal sampling rates caused by acquisition frequencies or handwriting speed. It eliminates the need for the commonly practiced resampling, which omits an important dynamic part of the ductus. Experiments on the IRONOFF French words dataset and the FLOWCHARTS dataset show DTWseg to be the least impacted by sampling rate alterations. We also propose a new benchmark of state-of-the-art methods for offline-to-online handwriting conversion based on our newly proposed metric.
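The idea of replacing the point-to-point cost in DTW with a point-to-segment cost can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' DTWseg. The abstract does not specify the local cost, so this sketch assumes that the cost of aligning two samples is the smaller of the distances from each sample to the segments touching the other sample. All function names are hypothetical.

```python
import math


def point_to_segment(p, a, b):
    """Euclidean distance from 2D point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: a and b coincide.
        return math.hypot(px - ax, py - ay)
    # Project p onto the supporting line and clamp to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def local_cost(x, i, y, j):
    """Distance from x[i] to the segments of y that touch y[j]."""
    candidates = []
    if j > 0:
        candidates.append(point_to_segment(x[i], y[j - 1], y[j]))
    if j < len(y) - 1:
        candidates.append(point_to_segment(x[i], y[j], y[j + 1]))
    if not candidates:
        # y consists of a single point: fall back to point-to-point.
        candidates.append(math.dist(x[i], y[j]))
    return min(candidates)


def dtw_point_to_segment(x, y):
    """Classic DTW recursion with a symmetric point-to-segment local cost."""
    n, m = len(x), len(y)
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Assumed symmetrisation: take the smaller of both directions.
            cost = min(
                local_cost(x, i - 1, y, j - 1),
                local_cost(y, j - 1, x, i - 1),
            )
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


if __name__ == "__main__":
    # The same straight stroke sampled at two different rates.
    dense = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
    sparse = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0)]
    print(dtw_point_to_segment(dense, sparse))  # 0.0
```

In the toy example, every sample of the densely sampled stroke lies on a segment of the sparsely sampled one, so the accumulated cost is zero. A point-to-point cost would penalise the intermediate samples even though both sequences trace the same line.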

Paper Nr: 76
Title:

Rethinking Image-Based Table Recognition Using Weakly Supervised Methods

Authors:

Nam T. Ly, Atsuhiro Takasu, Phuc Nguyen and Hideaki Takeda

Abstract: Most previous methods for table recognition rely on training datasets containing many richly annotated table images. Detailed table image annotation, e.g., cell or text bounding box annotation, however, is costly and often subjective. In this paper, we propose a weakly supervised model named WSTabNet for table recognition that relies only on HTML (or LaTeX) code-level annotations of table images. The proposed model consists of three main parts: an encoder for feature extraction, a structure decoder for generating table structure, and a cell decoder for predicting the content of each cell in the table. Our system is trained end-to-end by stochastic gradient descent algorithms, requiring only table images and their ground-truth HTML (or LaTeX) representations. To facilitate table recognition with deep learning, we create and release WikiTableSet, the largest publicly available image-based table recognition dataset built from Wikipedia. WikiTableSet contains nearly 4 million English table images, 590K Japanese table images, and 640K French table images with corresponding HTML representations and cell bounding boxes. Extensive experiments on WikiTableSet and two large-scale datasets, FinTabNet and PubTabNet, demonstrate that the proposed weakly supervised model achieves better or similar accuracy compared to the state-of-the-art models on all benchmark datasets.

Paper Nr: 79
Title:

Facial Paralysis Recognition Using Face Mesh-Based Learning

Authors:

Zeerak M. Baig and Dustin van der Haar

Abstract: Facial paralysis is a medical disorder caused by a compressed or enlarged seventh cranial nerve. The facial muscles become weak or paralysed because of the compression. Many medical experts believe that viral infection is the most common cause of facial paralysis; however, the origin of nerve injury is unknown. Facial paralysis hampers a patient’s ability to blink, swallow, or communicate. This article proposes deep learning-based and traditional machine learning-based approaches for facial paralysis recognition in facial images, which can aid in developing standardised medical evaluation tools. The proposed method first detects the face or faces in each image, then extracts a face mesh from the given image using Google’s Mediapipe. The face mesh descriptors are then transformed into a novel face mesh image, which is fed into the final component, comprised of a convolutional neural network (CNN), to perform overall predictions. The study uses YouTube facial paralysis datasets (YouTube and Stroke face) and control datasets (CK+ and TUFTS face) to train and test the model for unhealthy patients. The best approach achieved an accuracy of 98.93% with a MobilenetV2 backbone using the YouTube facial paralysis dataset and the Stroke face dataset for palsy images, thereby showing that mesh learning can be accomplished using a CNN.

Paper Nr: 80
Title:

Impact of Transformer-Based Models and User Clustering in Early Fake News Detection in Social Media

Authors:

Sakshi Kalra, Yashvardhan Sharma, Mehul Agrawal, Sai K. Mantri and Gajendra S. Chauhan

Abstract: People are now consuming news on social media platforms rather than through traditional sources as a result of easy access to the internet. This has allowed for the recent rise in the online dissemination of false information. The spread of false information seriously damages people’s reputations and the public’s trust in them. The research community has recently given fake news identification a great deal of attention, and prior studies have mainly concentrated on finding hints in news content or diffusion graphs. The older models, on the other hand, didn’t have the key features needed to spot fake news quickly. We focus on finding fake news by using features that are available when it is just starting to spread. The current work suggests a new framework made up of content-based features taken from news articles and social-context features taken from user characteristics and responses at the sentence level. In addition, we extend our approach to Transformer-based models and leverage user clustering to demonstrate a considerable performance gain over the original model.

Paper Nr: 99
Title:

Sewer-AI: Sustainable Automated Analysis of Real-World Sewer Videos Using DNNs

Authors:

Rajarshi Biswas, Marcel Mutz, Piyush Pimplikar, Noor Ahmed and Dirk Werth

Abstract: Automated maintenance of sewer networks using computer vision techniques has gained prominence in the vision-research community. In this work, we handle sewer inspection videos with severe challenges. These obstacles hinder direct application of state-of-the-art neural networks in finding a solution. Thus, we perform an exhaustive study on the performance of highly successful neural architectures on our challenging sewer video dataset. For a complete understanding, we analyze their performance in different modes. We propose training strategies for effectively handling the different challenges and obtain balanced accuracy, F1 and F2 scores of more than 90% for 17 out of the 25 defect categories. Furthermore, to develop resource-efficient, sustainable versions of the models, we study the trade-off between performance and parameter pruning. We show that the drop in average performance of the networks is within 1% with more than 90% weight pruning. We test our models on the state-of-the-art Sewer-ML dataset and obtain a 100% true positive rate for 8 out of its 18 defect categories.
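For readers unfamiliar with the reported metrics, the standard binary definitions of balanced accuracy and the F-beta score (F1 for beta = 1, F2 for beta = 2) can be sketched as below. The confusion-matrix counts are invented for illustration and are not taken from the paper. The function names are hypothetical.

```python
def f_beta(tp, fp, fn, beta=1.0):
    """F-beta score from confusion-matrix counts.

    beta = 1 gives F1; beta = 2 gives F2, which weights recall
    more heavily than precision.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


def balanced_accuracy(tp, fp, fn, tn):
    """Mean of the true positive rate and the true negative rate."""
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return (tpr + tnr) / 2.0


if __name__ == "__main__":
    # Invented counts for one rare defect category (illustration only).
    tp, fp, fn, tn = 80, 10, 20, 890
    print(f"F1 = {f_beta(tp, fp, fn, beta=1.0):.4f}")
    print(f"F2 = {f_beta(tp, fp, fn, beta=2.0):.4f}")
    print(f"balanced accuracy = {balanced_accuracy(tp, fp, fn, tn):.4f}")
```

Balanced accuracy and F2 are common choices when defect classes are rare, since plain accuracy can look high even when most defects are missed.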

Paper Nr: 123
Title:

Instance Segmentation Based Graph Extraction for Handwritten Circuit Diagram Images

Authors:

Johannes Bayer, Amit K. Roy and Andreas Dengel

Abstract: Handwritten circuit diagrams from educational scenarios or historic sources usually exist on analogue media. To derive their functional principles or flaws automatically, they need to be digitized by extracting their electrical graph. Recently, the base technologies for automated pipelines facilitating this process shifted from computer vision to machine learning. This paper describes an approach for extracting both the electrical components (including their terminals and describing texts) and their interconnections (including junctions and wire hops) by means of instance segmentation and keypoint extraction. Consequently, the resulting graph extraction process consists of a simple two-step process of model inference and trivial geometric keypoint matching. The dataset itself, its preparation, model training and post-processing are described and publicly available.

Paper Nr: 127
Title:

Towards a Neuro-Symbolic Framework for Multimodal Human-AI Interaction

Authors:

Anthony J. Scotte and Varuna De Silva

Abstract: Humans are not defined by a single means of communication: language, style, expression, body posture, emotion and attitude all contribute to the mix that makes communication challenging to understand. As humans seek to strengthen partnerships with computers, these communication complexities need to be both understood and overcome. This paper seeks to explore a framework that combines neural networks with decision theory and probabilistic logic to tackle the complexities inherent within conversational communication. By combining the strengths of each of these capabilities in a proof of concept, this paper demonstrates a potential framework for how different AI models may deepen their understanding of human conversation.

Paper Nr: 132
Title:

Classification of Respiratory Diseases Using the NAO Robot

Authors:

Rafael Andrade Rodriguez, Jireh Ferroa-Guzman and Willy Ugarte

Abstract: This work proposes an interface that connects the NAO robot with a development environment in Azure Machine Learning Classic for the prediction of respiratory diseases. The developed code uses Machine Learning algorithms trained for the prediction of diseases and fatal symptoms in order to give users an overview of their health status and the possible conditions associated with their age, sex, symptoms and severity. During this process, a brief screening to rule out COVID-19 is performed on the symptoms obtained, which indicates whether they correspond to those of this disease. Additionally, we offer a friendly interaction with the NAO robot to facilitate the exchange of information, and at the end of the algorithm flow, users are always advised to consult a medical professional for more details about their current status based on the overall results obtained. The tests carried out in this work show that it is possible to speed up the time of care in medical care centers in Peru through the NAO robot. Additionally, it has been possible to predict respiratory diseases, which also helps the doctor to form a notion of the patient's prognosis.

Paper Nr: 134
Title:

Speech to Text Recognition for Videogame Controlling with Convolutional Neural Networks

Authors:

Joaquin Aguirre-Peralta, Marek Rivas-Zavala and Willy Ugarte

Abstract: Disability is a reality that has always been present throughout human history, and all nations of the planet are immersed in this reality. With communication and interaction through technology now more important than ever, people with disabilities are the most affected by the resulting physical gap. There are still few tools that these people can use to interact more easily with different types of hardware; therefore, we want to provide them with a playful and medical tool that can adapt to their needs and allow them to interact a little more with the people around them. In this context, we have decided to focus on people with motor disabilities of the upper limbs, and on this basis we propose the use of gamification in the NLP (Natural Language Processing) area, developing a videogame consisting of three voice-operated minigames. This work has 4 stages: analysis (benchmarking), design, development and validation. In the first stage, we carried out a benchmarking of the models. In the second stage, we describe the implementation of CNNs, together with methods such as gamification and NLP for problem solving. In the third stage, the corresponding minigames that compose the videogame and their characteristics are described. Finally, in the last stage, the application of the videogame was validated with experts in physiotherapy. Our results show that with the training performed, the prediction of words with noise was improved from 43.49% to 74.50% and of words without noise from 63.87% to 96.36%.

Paper Nr: 156
Title:

FakeRevealer: A Multimodal Framework for Revealing the Falsity of Online Tweets Using Transformer-Based Architectures

Authors:

Sakshi Kalra, Yashvardhan Sharma, Priyansh Vyas and Gajendra S. Chauhan

Abstract: As the Internet has evolved, the exposure and widespread adoption of social media have altered the way news is formed and published. With the help of social media, getting news is cheaper, faster, and easier. However, this has also led to an increase in the number of fake news articles, created either by manipulating the text or by morphing the images. The spread of fake news has become a serious issue all over the world. In one case, at least 20 people were killed just because of false information that was circulated over a social media platform. This makes it clear that social media sites need a system that uses more than one method to spot fake news stories. To solve this problem, we’ve come up with FakeRevealer, a single-configuration fake news detection system that works on transfer learning based techniques. Our multi-modal architecture understands the textual features using a language transformer model called DistilRoBERTa, and image features are extracted using the Vision Transformer (ViT) that is pre-trained on ImageNet 21K. After feature extraction, a cosine similarity measure is used to fuse both the features. The evaluation of our proposed framework is done over a publicly available Twitter dataset, and the results show that it outperforms the current state of the art on this dataset with an accuracy of 80.00%, which is 2.23% more than the current state of the art.
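The cosine similarity measure mentioned for fusing text and image features can be illustrated with a small sketch. The abstract does not say how the similarity enters the fused representation, so the `fuse` function below (concatenating both vectors and appending their similarity) is a hypothetical choice for illustration only. The sketch also assumes that both embeddings have already been projected to the same dimension, and the toy vectors are invented.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Similarity is undefined for a zero vector; return 0 by convention.
        return 0.0
    return dot / (norm_u * norm_v)


def fuse(text_vec, image_vec):
    """Hypothetical fusion: concatenate both vectors and append their similarity."""
    similarity = cosine_similarity(text_vec, image_vec)
    return list(text_vec) + list(image_vec) + [similarity]


if __name__ == "__main__":
    # Invented 4-dimensional embeddings standing in for text and image features.
    text_features = [0.2, 0.1, 0.7, 0.0]
    image_features = [0.1, 0.0, 0.9, 0.1]
    print(cosine_similarity(text_features, image_features))
    print(len(fuse(text_features, image_features)))  # 9
```

A low similarity between the text and image embeddings of a tweet is one signal that the image may not match the claim made in the text, which is why such a score is a natural input to a downstream classifier.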

Paper Nr: 162
Title:

A Comparative Study of BRISK, ORB and DAISY Features for Breast Cancer Classification

Authors:

Ghada Ouddai, Ines Hamdi and Henda Ben Ghezala

Abstract: Medical data analysis has been one of the fastest-emerging fields over the past decades. In digital histopathology, images are analysed mainly to detect diseases or tumors and to identify their types and grades. One of the most used practices in this field is feature extraction. In this paper, we propose the application of BRISK, ORB and BRISK/DAISY on RGB histological images. The purpose of this work is to recognise the breast tumor type (benign or malignant). These feature extractors are combined with BoF by k-means and SVM. A limited number of images is used during the training of the system. Out of the three methods, the Color-BRISK/BoF/SVM solution gave the best accuracy value (72.5%), while Color-ORB/BoF/SVM was the fastest.