ICPRAM 2022 Abstracts


Area 1 - Theory and Methods

Full Papers
Paper Nr: 11
Title:

Improving Graph Classification by Means of Linear Combinations of Reduced Graphs

Authors:

Anthony Gillioz and Kaspar Riesen

Abstract: The development and research of graph-based matching techniques that are both computationally efficient and accurate is a pivotal task due to the rapid growth of data acquisition and the omnipresence of structural data. In the present paper, we propose a novel framework using information gained from diversely reduced graph spaces to improve the classification accuracy of a structural classifier. The basic idea consists of three subsequent steps. First, the original graphs are reduced to different size levels with the aid of node centrality measures. Second, we compute the distances between the reduced graphs in the corresponding graph subspaces. Finally, the distances are linearly combined and fed into a distance-based classifier to produce the final classification. On six graph datasets we empirically demonstrate that classifiers clearly benefit from the combined distances obtained in the graph subspaces.
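The final step described above (linearly combining the distances and feeding them to a distance-based classifier) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the test-to-train distance matrices for each reduction level have already been computed, and the function names, the weights and the choice of a 1-nearest-neighbour classifier are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def combine_distances(matrices: Sequence[Matrix], weights: Sequence[float]) -> Matrix:
    """Return the weighted sum of equally shaped distance matrices."""
    if len(matrices) != len(weights):
        raise ValueError("need exactly one weight per distance matrix")
    rows, cols = len(matrices[0]), len(matrices[0][0])
    combined = [[0.0] * cols for _ in range(rows)]
    for matrix, weight in zip(matrices, weights):
        for i in range(rows):
            for j in range(cols):
                combined[i][j] += weight * matrix[i][j]
    return combined


def nearest_neighbour_predict(test_to_train: Matrix, train_labels: Sequence[str]) -> List[str]:
    """Label each test item with the label of its closest training item."""
    predictions = []
    for row in test_to_train:
        closest = min(range(len(row)), key=row.__getitem__)
        predictions.append(train_labels[closest])
    return predictions


# Toy distances (rows: test graphs, columns: training graphs).
d_original = [[1.0, 3.0], [4.0, 2.0]]  # assumed: distances between the unreduced graphs
d_reduced = [[2.0, 1.0], [1.0, 3.0]]   # assumed: distances between the reduced graphs
combined = combine_distances([d_original, d_reduced], [0.75, 0.25])
labels = nearest_neighbour_predict(combined, ["a", "b"])
```

In practice the weights would be tuned on validation data; the abstract does not state how they are chosen.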

Paper Nr: 18
Title:

Neural Network-based Human Motion Smoother

Authors:

Mathias Bastholm, Stella Graßhof and Sami S. Brandt

Abstract: Recording real life human motion as a skinned mesh animation with an acceptable quality is usually difficult. Even though recent advances in pose estimation have enabled motion capture from off-the-shelf webcams, the low quality makes it infeasible for use in production quality animation. This work proposes to use recent advances in the prediction of human motion through neural networks to augment low quality human motion, in an effort to bridge the gap between cheap recording methods and high quality recording. First, a model, competitive with prior work in short-term human motion prediction, is constructed. Then, the model is trained to clean up motion from two low quality input sources, mimicking a real world scenario of recording human motion through two webcams. Experiments on simulated data show that the model is capable of significantly reducing noise, and it opens the way for future work to test the model on annotated data.

Paper Nr: 35
Title:

Reduction of Variance-related Error through Ensembling: Deep Double Descent and Out-of-Distribution Generalization

Authors:

Pavlos Rath-Manakidis, Hlynur D. Hlynsson and Laurenz Wiskott

Abstract: Prediction variance on unseen data harms the generalization performance of deep neural network classifiers. We assess the utility of forming ensembles of deep neural networks in the context of double descent (DD) on image classification tasks to mitigate the effects of model variance. To that end, we propose a method for using geometric-mean based ensembling as an approximate bias-variance decomposition of a training procedure’s test error. In ensembling equivalent models we observe that ensemble formation is more beneficial the more the models are correlated with each other. Our results show that small models afford ensembles that outperform single large models while requiring considerably fewer parameters and computational steps. We offer an explanation for this phenomenon in terms of model-internal correlations. We also find that deep DD that depends on the existence of label noise can be mitigated by using ensembles of models subject to identical label noise almost as thoroughly as by ensembles of networks each trained subject to i.i.d. noise. In the context of data drift, we find that out-of-distribution performance of ensembles can be assessed by their in-distribution performance. This aids in ascertaining the utility of ensembling for generalization.
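As an illustration of geometric-mean ensembling of classifier outputs, the sketch below averages the members' log-probabilities and renormalizes. It only shows the combination rule in its common textbook form; how the paper turns this into an approximate bias-variance decomposition is not given in the abstract, and the function name and the clipping constant are assumptions.

```python
import math
from typing import List, Sequence


def geometric_mean_ensemble(member_probs: Sequence[Sequence[float]]) -> List[float]:
    """Combine per-model class probabilities by a normalized geometric mean."""
    n_models = len(member_probs)
    n_classes = len(member_probs[0])
    eps = 1e-12  # assumed clipping value to avoid log(0)
    log_means = [
        sum(math.log(max(probs[c], eps)) for probs in member_probs) / n_models
        for c in range(n_classes)
    ]
    shift = max(log_means)  # subtract the maximum for numerical stability
    unnormalized = [math.exp(value - shift) for value in log_means]
    total = sum(unnormalized)
    return [value / total for value in unnormalized]


disagreeing = geometric_mean_ensemble([[0.9, 0.1], [0.1, 0.9]])
agreeing = geometric_mean_ensemble([[0.7, 0.3], [0.7, 0.3]])
```

When two members disagree symmetrically the ensemble is undecided, and when all members agree the ensemble returns their common prediction.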

Paper Nr: 49
Title:

Image Coding by Samples of Counts as an Imitation of the Light Detection by the Retina

Authors:

V. A. Kershner and V. E. Antsiperov

Abstract: The results of the study of a new method of image coding based on samples of counts are presented. The method is based on the concept of an ideal image, motivated by the mechanisms of light perception by the retina. In this regard, the article discusses general statistical issues of the interaction of radiation with matter and based on a semiclassical approach, formalizes the concepts of an ideal imaging device and an ideal image as a point Poisson 2D-process. At the centre of the discussion is the problem of reducing the dimension of an ideal image to a fixed (controlled) size representation by a sample of counts. The results of illustrative computational experiments on counting representation/coding of digital raster images are also presented.
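One way to picture a fixed-size sample of counts is the sketch below. It relies on a standard property of Poisson point processes: conditioned on the total number of points, the points are independent draws from the normalized intensity. Treating pixel values of a raster image as that intensity, and the function name itself, are assumptions made for illustration; the paper's actual coding scheme may differ.

```python
import random
from typing import List, Sequence, Tuple


def sample_counts(image: Sequence[Sequence[float]], n_counts: int, seed: int = 0) -> List[Tuple[int, int]]:
    """Draw n_counts pixel coordinates with probability proportional to intensity."""
    rng = random.Random(seed)
    coords = [(r, c) for r, row in enumerate(image) for c in range(len(row))]
    weights = [image[r][c] for r, c in coords]
    # random.choices samples with replacement, so a bright pixel can receive many counts.
    return rng.choices(coords, weights=weights, k=n_counts)


toy_image = [[0, 0], [0, 5]]  # only the bottom-right pixel emits light
counts = sample_counts(toy_image, 20)
```

The representation size is controlled by `n_counts` alone, independent of the image resolution.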

Paper Nr: 50
Title:

An Ensemble Learning Approach using Decision Fusion for the Recognition of Arabic Handwritten Characters

Authors:

Rihab Dhief, Rabaa Youssef and Amel Benazza

Abstract: The Arabic handwritten character recognition is a research challenge due to the complexity and variability of forms and writing styles of the Arabic alphabet. The current work focuses not only on reducing the complexity of the feature extraction step but also on improving the Arabic characters’ classification rate. First, we lighten the preprocessing step by using a grayscale skeletonization technique easily adjustable to image noise and contrast. It is then used to extract structural features such as Freeman chain code and Heutte descriptors. Second, a new model using the fusion of results from machine learning algorithms is built and tested on two grayscale images’ datasets: IFHCDB and AIA9K. The proposed approach is compared to state-of-the-art methods based on deep learning architecture and highlights a promising performance by achieving an accuracy of 97.97% and 92.91% respectively on IFHCDB and AIA9K datasets, which outperforms the classic machine learning algorithms and the deep neural network chosen architectures.
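For readers unfamiliar with the Freeman chain code mentioned above, the sketch below encodes an 8-connected pixel path as a sequence of direction codes. The direction numbering (0 = east, increasing counter-clockwise, image rows growing downwards) is one common convention and is an assumption here; the abstract does not say which convention or which skeleton tracing the authors use.

```python
from typing import List, Sequence, Tuple

# Assumed convention: code 0 points east and codes increase counter-clockwise.
# Image rows grow downwards, so a step "north" has dy = -1.
FREEMAN_CODES = {
    (1, 0): 0, (1, -1): 1, (0, -1): 2, (-1, -1): 3,
    (-1, 0): 4, (-1, 1): 5, (0, 1): 6, (1, 1): 7,
}


def freeman_chain_code(path: Sequence[Tuple[int, int]]) -> List[int]:
    """Encode consecutive (x, y) pixels of an 8-connected path as direction codes."""
    codes = []
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        step = (x1 - x0, y1 - y0)
        if step not in FREEMAN_CODES:
            raise ValueError(f"pixels {(x0, y0)} and {(x1, y1)} are not 8-neighbours")
        codes.append(FREEMAN_CODES[step])
    return codes


stroke = [(0, 0), (1, 0), (2, -1), (2, -2), (1, -2)]
codes = freeman_chain_code(stroke)
```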

Paper Nr: 52
Title:

A Step Towards Learning Contraction Kernels for Irregular Image Pyramid

Authors:

Darshan Batavia, Rocio Gonzalez-Diaz and Walter G. Kropatsch

Abstract: A structure preserving irregular image pyramid can be computed by applying basic graph operations (contraction and removal of edges) on the 4-adjacent neighbourhood graph of an image. In this paper, we derive an objective function that classifies the edges as contractible or removable for building an irregular graph pyramid. The objective function is based on the cost of the edges in the contraction kernel (sub-graph selected for contraction) together with the size of the contraction kernel. Based on the objective function, we also provide an algorithm that decomposes a 2D image into monotonically connected regions of the image surface, called slope regions. We proved that the proposed algorithm results in a graph-based irregular image pyramid that preserves the structure and the topology of the critical points (the local maxima, the local minima, and the saddles). Later we introduce the concept of the dictionary for the connected components of the contraction kernel, consisting of sub-graphs that can be combined together to form a set of contraction kernels. A favorable contraction kernel can be selected that best satisfies the objective function. Lastly, we show the experimental verification for the claims related to the objective function and the cost of the contraction kernel. The outcome of this paper can be envisioned as a step towards learning the contraction kernel for the construction of an irregular image pyramid.

Paper Nr: 60
Title:

Generative Adversarial Examples for Sequential Text Recognition Models with Artistic Text Style

Authors:

Yanhong Liu, Fengming Cao and Yuqi Zhang

Abstract: Sequential text recognition (STR) based on deep neural networks (DNNs) has made great progress in recent years. Although STR is highly relevant to security, its weaknesses and robustness have received little attention. Most existing studies have generated adversarial examples for DNN models conducting non-sequential prediction tasks such as classification, segmentation, and object detection. Recently, research efforts have shifted beyond the Lp norm-bounded attack and generated realistic adversarial examples with semantic meanings. We follow this trend and propose a general framework for generating novel adversarial text images for STR models, based on the technique of artistic text style transfer. Experimental results show that our crafted adversarial examples are highly stealthy and that the attack success rates for fooling state-of-the-art STR models reach up to 100%. Our framework can flexibly create natural adversarial artistic text images with a controllable stylistic degree to evaluate the robustness of STR models.

Paper Nr: 61
Title:

The U-Net based GLOW for Optical-Flow-Free Video Interframe Generation

Authors:

Saem Park, Donghoon Han and Nojun Kwak

Abstract: Video frame interpolation is the task of creating an interframe between two adjacent frames along the time axis. Instead of simply averaging two adjacent frames to create an intermediate image, this operation should maintain semantic continuity with the adjacent frames. Most conventional methods use optical flow, and various tools such as occlusion handling and object smoothing are indispensable. Since the use of these various tools leads to complex problems, we tackle the video interframe generation problem without using problematic optical flow. To enable this, we use a deep neural network with an invertible structure and develop a U-Net based Generative Flow, which is a modified normalizing flow. In addition, we propose a learning method with a new consistency loss in the latent space to maintain semantic temporal consistency between frames. The resolution of the generated image is guaranteed to be identical to that of the original images by using an invertible network. Furthermore, as the output is not a random image like those produced by generative models, our network guarantees stable outputs without flicker. Through experiments, we confirm the feasibility of the proposed algorithm and suggest the U-Net based Generative Flow as a possible new baseline for video frame interpolation. This paper is meaningful in that it is a new attempt to use invertible networks instead of optical flow for video interpolation.

Paper Nr: 65
Title:

An Effective Deep Network for Head Pose Estimation without Keypoints

Authors:

Chien Thai, Viet Tran, Minh Bui, Huong Ninh and Hai Tran

Abstract: Human head pose estimation has become an essential problem in facial analysis in recent years, with many computer vision applications such as gaze estimation, virtual reality, and driver assistance. Because of the importance of the head pose estimation problem, it is necessary to design a compact model that reduces the computational cost, while maintaining accuracy, when deployed in facial analysis-based applications such as large camera surveillance systems and AI cameras. In this work, we propose a lightweight model that effectively addresses the head pose estimation problem. Our approach has two main steps. 1) We first train many teacher models on the synthesis dataset 300W-LPA to get the head pose pseudo labels. 2) We design an architecture with the ResNet18 backbone and train our proposed model with the ensemble of these pseudo labels via the knowledge distillation process. To evaluate the effectiveness of our model, we use AFLW-2000 and BIWI, two real-world head pose datasets. Experimental results show that our proposed model significantly improves the accuracy in comparison with the state-of-the-art head pose estimation methods. Furthermore, our model runs in real time at ∼300 FPS when inferring on a Tesla V100.

Paper Nr: 71
Title:

Transformation-Equivariant Representation Learning with Barber-Agakov and InfoNCE Mutual Information Estimation

Authors:

Marshal A. Sinaga, T. Basarrudin and Adila A. Krisnadhi

Abstract: The success of deep learning on computer vision tasks is due to the convolution layer being equivariant to translation. Several works attempt to extend the notion of equivariance to more general transformations. Autoencoding variational transformation (AVT) achieves state-of-the-art results by approaching the problem from the information theory perspective. The model involves the computation of mutual information, which leads to a more general transformation-equivariant representation model. In this research, we investigate an alternative to AVT called variational transformation-equivariant (VTE). We utilize the Barber-Agakov and information noise-contrastive (InfoNCE) mutual information estimation to optimize VTE. Furthermore, we also propose a sequential mechanism that involves a self-supervised learning model called predictive-transformation to train our VTE. Results of experiments demonstrate that VTE outperforms AVT on image classification tasks.
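The InfoNCE estimator named in the title can be illustrated with the generic contrastive loss below. It is a sketch of the standard formulation (a softmax cross-entropy in which each anchor's positive sits on the diagonal of a score matrix), not the VTE training objective itself; the score matrix is assumed to be given, and the function name is hypothetical.

```python
import math
from typing import Sequence


def info_nce_loss(scores: Sequence[Sequence[float]]) -> float:
    """Mean cross-entropy of picking the positive (diagonal) entry in each row.

    scores[i][j] is an assumed similarity between anchor i and candidate j.
    """
    total = 0.0
    for i, row in enumerate(scores):
        shift = max(row)  # log-sum-exp with a shift for numerical stability
        log_partition = shift + math.log(sum(math.exp(s - shift) for s in row))
        total += log_partition - row[i]
    return total / len(scores)


uninformative = info_nce_loss([[0.0, 0.0], [0.0, 0.0]])
well_separated = info_nce_loss([[10.0, -10.0], [-10.0, 10.0]])
```

With N candidates per anchor, log N minus this loss gives the usual InfoNCE lower bound on mutual information, so a lower loss corresponds to a larger estimated bound.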

Paper Nr: 99
Title:

iRNN: Integer-only Recurrent Neural Network

Authors:

Eyyüb Sari, Vanessa Courville and Vahid Partovi Nia

Abstract: Recurrent neural networks (RNN) are used in many real-world text and speech applications. They include complex modules such as recurrence, exponential-based activation, gate interaction, unfoldable normalization, bi-directional dependence, and attention. The interaction between these elements prevents running them on integer-only operations without a significant performance drop. Deploying RNNs that include layer normalization and attention on integer-only arithmetic is still an open problem. We present a quantization-aware training method for obtaining a highly accurate integer-only recurrent neural network (iRNN). Our approach supports layer normalization, attention, and an adaptive piecewise linear approximation of activations (PWL), to serve a wide range of RNNs on various applications. The proposed method is proven to work on RNN-based language models and challenging automatic speech recognition, enabling AI applications on the edge. Our iRNN maintains performance similar to that of its full-precision counterpart; deploying it on smartphones improves runtime performance by 2× and reduces the model size by 4×.
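To make the idea of a piecewise linear (PWL) approximation of an activation concrete, the sketch below interpolates the sigmoid between fixed knots. It deliberately simplifies: the knots are uniform rather than adaptive, the arithmetic is floating-point rather than integer-only, and the function names are hypothetical, so it should be read as an illustration of PWL approximation in general and not of the iRNN method.

```python
import math
from typing import Callable, List, Sequence, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def build_pwl_table(fn: Callable[[float], float], knots: Sequence[float]) -> List[Tuple[float, float]]:
    """Tabulate fn at sorted knot positions."""
    return [(x, fn(x)) for x in sorted(knots)]


def pwl_eval(table: Sequence[Tuple[float, float]], x: float) -> float:
    """Linearly interpolate between knots and clamp outside the knot range."""
    if x <= table[0][0]:
        return table[0][1]
    if x >= table[-1][0]:
        return table[-1][1]
    for (x0, y0), (x1, y1) in zip(table, table[1:]):
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    raise ValueError("table must contain at least two distinct knots")


# Assumed setup: 17 uniform knots on [-8, 8].
table = build_pwl_table(sigmoid, [float(k) for k in range(-8, 9)])
max_error = max(abs(pwl_eval(table, i / 100) - sigmoid(i / 100)) for i in range(-1000, 1001))
```

An adaptive scheme would place more knots where the curvature of the activation is largest, which reduces the error for the same table size.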

Paper Nr: 101
Title:

Gesture Recognition on a New Multi-Modal Hand Gesture Dataset

Authors:

Monika Schak and Alexander Gepperth

Abstract: We present a new large-scale multi-modal dataset for free-hand gesture recognition. The freely available dataset consists of 79,881 sequences, grouped into six classes representing typical hand gestures in human-machine interaction. Each sample contains four independent modalities (arriving at different frequencies) recorded from two independent sensors: a fixed 3D camera for video, audio and 3D, and a wearable acceleration sensor attached to the wrist. The gesture classes are specifically chosen with investigations on multi-modal fusion in mind. For example, two gesture classes can be distinguished mainly by audio, while the four others are not exhibiting audio signals – besides white noise. An important point concerning this dataset is that it is recorded from a single person. While this reduces variability somewhat, it virtually eliminates the risk of incorrectly performed gestures, thus enhancing the quality of the data. By implementing a simple LSTM-based gesture classifier in a live system, we can demonstrate that generalization to other persons is nevertheless high. In addition, we show the validity and internal consistency of the data by training LSTM and DNN classifiers relying on a single modality to high precision.

Paper Nr: 105
Title:

Survival Analysis Algorithms based on Decision Trees with Weighted Log-rank Criteria

Authors:

Iulii Vasilev, Mikhail Petrovskiy and Igor Mashechkin

Abstract: Survival Analysis is an important tool to predict time-to-event in many applications, including but not limited to medicine, insurance, manufacturing and others. The state-of-the-art statistical approach is based on Cox proportional hazards. Though, from a practical point of view, it has several important disadvantages, such as strong assumptions on proportional over time hazard functions and linear relationship between time independent covariates and the log hazard. Another technical issue is an inability to deal with missing data directly. To overcome these disadvantages machine learning survival models based on recursive partitioning approach have been developed recently. In this paper, we propose a new survival decision tree model that uses weighted log-rank split criteria. Unlike traditional log-rank criteria the weighted ones allow to give different priority to events with different time stamps. It works with missing data directly while searching the best splitting point, its size is controlled by p-value threshold with Bonferroni adjustment and quantile based discretization is used to decrease the number of potential candidates for splitting points. Also, we investigate how to improve the accuracy of the model with bagging ensemble of the proposed decision tree models. We introduce an experimental comparison of the proposed methods against Cox proportional risk regression and existing tree-based survival models and their ensembles. According to the obtained experimental results, the proposed methods show better performance on several benchmark public medical datasets in terms of Concordance index and Integrated Brier Score metrics.

Paper Nr: 107
Title:

EuclidNets: Combining Hardware and Architecture Design for Efficient Training and Inference

Authors:

Mariana O. Prazeres, Xinlin Li, Adam Oberman and Vahid P. Nia

Abstract: In order to deploy deep neural networks on edge devices, compressed (resource efficient) networks need to be developed. While established compression methods, such as quantization, pruning, and architecture search are designed for conventional hardware, further gains are possible if compressed architectures are coupled with novel hardware designs. In this work, we propose EuclidNet, a compressed network designed to be implemented on hardware which replaces multiplication, wx, with squared difference (x − w)². EuclidNet allows for a low precision hardware implementation which is about twice as efficient (in terms of logic gate counts) as the comparable conventional hardware, with acceptably small loss of accuracy. Moreover, the network can be trained and quantized using standard methods, without requiring additional training time. Codes and pre-trained models are available.
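The relation between a product and a squared difference can be checked with a few lines of code. The sketch below compares a dot-product similarity with a similarity built from squared differences; the sign and the factor of one half are assumptions chosen so that the algebraic identity is easy to read, and the abstract does not specify how EuclidNet actually scales or combines the terms.

```python
from typing import Sequence


def dot_similarity(x: Sequence[float], w: Sequence[float]) -> float:
    """Conventional multiply-accumulate: sum of w_i * x_i."""
    return sum(xi * wi for xi, wi in zip(x, w))


def euclid_similarity(x: Sequence[float], w: Sequence[float]) -> float:
    """Assumed alternative: negative half of the summed squared differences."""
    return -0.5 * sum((xi - wi) ** 2 for xi, wi in zip(x, w))


def squared_norm(v: Sequence[float]) -> float:
    return sum(vi * vi for vi in v)


x, w = [1.0, 2.0], [3.0, -1.0]
lhs = euclid_similarity(x, w)
# Identity: -(x - w)^2 / 2 = x*w - x^2/2 - w^2/2, applied element-wise and summed.
rhs = dot_similarity(x, w) - 0.5 * (squared_norm(x) + squared_norm(w))
```

The two similarities therefore differ only by terms that depend on the norms of the input and of the weights.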

Short Papers
Paper Nr: 16
Title:

Channel Selection for Motor Imagery Task Classification using Non-linear Separability Measurement

Authors:

Stuti Chug and Vandana Agarwal

Abstract: The EEG based motor imagery task classification requires only those channels which contribute to the maximum separability of the training data of different classes. The irrelevant channels are therefore not considered in the formation of feature vectors used in classification. In this paper, we propose a novel algorithm for efficient channel selection (NLMCS). The algorithm computes the proposed metric λ for non-linearity measurement (NLM) and uses this for channel selection. The algorithm is validated on the benchmarked BCI competition IV datasets IIa and IIb. The selected channels are then used for extracting Haar wavelet features and subjected for classification using Support vector Machine. The minimum value of λ corresponds to the optimal channel selection resulting in the best accuracy of motor imagery task classification. The mean Kappa coefficient computed for BCI competition IV IIa dataset using the proposed algorithm is 0.65 and it outperforms some existing approaches.
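The Kappa coefficient reported above is commonly computed as Cohen's kappa, which corrects the observed accuracy for chance agreement. The sketch below shows that computation on toy labels; it assumes the standard definition and does not reproduce the paper's evaluation protocol.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement from the product of the marginal class frequencies.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


kappa = cohen_kappa([0, 0, 1, 1], [0, 0, 1, 0])
```

Here the accuracy is 0.75 and the chance agreement is 0.5, so kappa is 0.5.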

Paper Nr: 17
Title:

Single-step Adversarial Training for Semantic Segmentation

Authors:

Daniel Wiens and Barbara Hammer

Abstract: Even though deep neural networks succeed on many different tasks including semantic segmentation, they lack robustness against adversarial examples. To counteract this exploit, adversarial training is often used. However, it is known that adversarial training with weak adversarial attacks (e.g. using the Fast Gradient Sign Method) does not improve the robustness against stronger attacks. Recent research shows that it is possible to increase the robustness of such single-step methods by choosing an appropriate step size during the training. Finding such a step size, without increasing the computational effort of single-step adversarial training, is still an open challenge. In this work we address the computationally particularly demanding task of semantic segmentation and propose a new step size control algorithm that increases the robustness of single-step adversarial training. The proposed algorithm does not increase the computational effort of single-step adversarial training considerably and also simplifies training, because it is free of meta-parameters. We show that the robustness of our approach can compete with multi-step adversarial training on two popular benchmarks for semantic segmentation.

Paper Nr: 20
Title:

A k-Means Algorithm for Clustering with Soft Must-link and Cannot-link Constraints

Authors:

Philipp Baumann and Dorit S. Hochbaum

Abstract: The k-means algorithm is one of the most widely-used algorithms in clustering. It is known to be effective when the clusters are homogeneous and well separated in the feature space. When this is not the case, incorporating pairwise must-link and cannot-link constraints can improve the quality of the resulting clusters. Various extensions of the k-means algorithm have been proposed that incorporate the must-link and cannot-link constraints using heuristics. We introduce a different approach that uses a new mixed-integer programming formulation. In our approach, the pairwise constraints are incorporated as soft-constraints that can be violated subject to a penalty. In a computational study based on 25 data sets, we compare the proposed algorithm to a state-of-the-art algorithm that was previously shown to dominate the other algorithms in this area. The results demonstrate that the proposed algorithm provides better clusterings and requires considerably less running time than the state-of-the-art algorithm. Moreover, we found that the ability to vary the penalty is beneficial in situations where the pairwise constraints are noisy due to corrupt ground truth.
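The notion of soft constraints that can be violated subject to a penalty can be illustrated by the cost function below, which adds a fixed penalty per violated pairwise constraint to the usual k-means sum of squared distances. This is only a sketch of the general idea: the paper's mixed-integer programming formulation is not given in the abstract, and the uniform penalty and the function name are assumptions.

```python
from typing import Sequence, Tuple

Point = Sequence[float]
Pair = Tuple[int, int]


def soft_constrained_cost(
    points: Sequence[Point],
    centers: Sequence[Point],
    labels: Sequence[int],
    must_link: Sequence[Pair],
    cannot_link: Sequence[Pair],
    penalty: float,
) -> float:
    """k-means distortion plus a penalty for each violated pairwise constraint."""
    distortion = sum(
        sum((p - c) ** 2 for p, c in zip(points[i], centers[labels[i]]))
        for i in range(len(points))
    )
    violated = sum(labels[i] != labels[j] for i, j in must_link)
    violated += sum(labels[i] == labels[j] for i, j in cannot_link)
    return distortion + penalty * violated


points = [(0, 0), (0, 2), (10, 0)]
centers = [(0, 1), (10, 0)]
labels = [0, 0, 1]
cost = soft_constrained_cost(points, centers, labels, [(0, 2)], [(0, 1)], penalty=3)
```

Raising the penalty makes violations more expensive, while a small penalty lets the clustering ignore constraints that conflict with the geometry, which is useful when the constraints are noisy.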

Paper Nr: 21
Title:

On Learning-free Detection and Representation of Textline Texture in Digitized Documents

Authors:

Dominik Hauser, Christoffer Kassens and H. Siegfried Stiehl

Abstract: Textline detection and extraction is an integral part of any document analysis and recognition (DAR) system bridging the signal2symbol gap in order to relate a raw digital document of whatever sort to the computational analysis up to understanding of its semantic content. Key is the computational recovery of a rich representation of the salient visual structure which we conceive texture composed of periodic and differently scaled textlines in blocks with varying local spatial frequency and orientation. Our novel learning-free approach capitalizes on i) a texture model based upon linear system theory and ii) the complex Gabor transform utilizing both real even and imaginary odd kernels for the purpose of imposing a quadrilinear representation of textline characteristics as in typography. The resulting representation of textlines, be they either linear, curvilinear or even circular, then serves as input to subsequent computational processes. Via an experimental methodology allowing for controlled experiments with a broad range of digital data of increasing complexity (e.g. from synthetic 1D data to historical newspapers up to medieval manuscripts), we demonstrate the validity of our approach, discuss success and failure, and propose ensuing research.

Paper Nr: 26
Title:

A Deep Convolutional and Recurrent Approach for Large Vocabulary Arabic Word Recognition

Authors:

Faten Ziadi, Imen Ben Cheikh and Mohamed Jemni

Abstract: In this paper, we propose a convolutional recurrent approach for Arabic word recognition. We handle a large vocabulary of Arabic decomposable words, which are factored according to their roots and schemes. Exploiting derivational morphology, we have conceived as the first step a convolutional neural network, which classifies Arabic roots extracted from a set of word samples in the APTI database. In order to further exploit linguistic knowledge, we have accomplished the word recognition process through a recurrent network, specifically an LSTM. Thanks to its recurrence and memory capability, the LSTM model focuses not only on prefixes, infixes and suffixes listed in chronological order, but also on the relation between them in order to recognize word patterns and some flexional details such as gender, number, tense, etc.

Paper Nr: 28
Title:

Dynamic Latent Scale for GAN Inversion

Authors:

Jeongik Cho and Adam Krzyzak

Abstract: When the latent random variable of GAN is an i.i.d. random variable, the encoder trained with mean squared error loss to invert the generator does not converge because the generator loses the information of the latent random variable. In this paper, we introduce a dynamic latent scale GAN, a method for training a generator that does not lose the information of the latent random variable, and an encoder that inverts the generator. Dynamic latent scale GAN dynamically scales each element of the latent random variable during GAN training to adjust the entropy of the latent random variable. As training progresses, the entropy of the latent random variable decreases until the generator does not lose the information of the latent random variable, which enables the encoder trained with squared error loss to converge. The scale of the latent random variable is approximated by tracing the element-wise variance of the predicted latent random variable from previous training steps. Since the scale of latent random variable changes dynamically, the encoder should be trained with the generator during GAN training. The encoder can be integrated with the discriminator, and the loss for the encoder is added to the generator loss for fast training.

Paper Nr: 32
Title:

1-Attempt 4-Cycle Parallel Thinning Algorithms

Authors:

Kálmán Palágyi and Gábor Németh

Abstract: Thinning is a frequently applied skeletonization technique. It is an iterative object-reduction in a topology-preserving way: the outermost layer of an object is deleted, and the entire process is repeated until stability is reached. In the case of a 1-attempt thinning algorithm, if a border pixel is not deleted at the very first attempt, it cannot be deleted in the remaining phases of the thinning process. This paper shows that two 4-cycle parallel 2D thinning algorithms (i.e., one subiteration-based and one subfield-based) are 1-attempt. In addition, we illustrate that both algorithms are considerably faster if we know that they fulfill the 1-attempt property.

Paper Nr: 36
Title:

DenseHMM: Learning Hidden Markov Models by Learning Dense Representations

Authors:

Joachim Sicking, Maximilian Pintz, Maram Akila and Tim Wirtz

Abstract: We propose DenseHMM – a modification of Hidden Markov Models (HMMs) that allows to learn dense representations of both the hidden states and the (discrete) observables. Compared to the standard HMM, transition probabilities are not atomic but composed of these representations via kernelization. Our approach enables constraint-free and gradient-based optimization. We propose two optimization schemes that make use of this: a modification of the Baum-Welch algorithm and a direct co-occurrence optimization. The latter one is highly scalable and comes empirically without loss of performance compared to standard HMMs. We show that the non-linearity of the kernelization is crucial for the expressiveness of the representations. The properties of the DenseHMM like learned co-occurrences and log-likelihoods are studied empirically on synthetic and biomedical datasets.

Paper Nr: 37
Title:

GAMS: Graph Augmentation with Module Swapping

Authors:

Alessandro Bicciato and Andrea Torsello

Abstract: Data augmentation is a widely adopted approach to solve the large-data requirements of modern deep learning techniques by generating new data instances from an existing dataset. While there is a huge literature and experience on augmentation for vectorial or image-based data, there is relatively little work on graph-based representations. This is largely due to complex, non-Euclidean structure of graphs, which limits our abilities to determine operations that do not modify the original semantic grouping. In this paper, we propose an alternative method for enlarging the graph set of graph neural network datasets by creating new graphs and keeping the properties of the originals. The proposal starts from the assumptions that the graphs compose a set of smaller motifs into larger structures. To this end, we extract modules by grouping nodes in an unsupervised way, and then swap similar modules between different graphs reconstructing the missing connectivity based on the original edge statistics and node similarity. We then test the performance of the proposed augmentation approach against state-of-the-art approaches, showing that on datasets, where the information is dominated by structure rather than node labels, we obtain a significant improvement with respect to alternatives.

Paper Nr: 39
Title:

Relative Position φ-Descriptor Computation for Complex Polygonal Objects

Authors:

Tyler Laforet and Pascal Matsakis

Abstract: In regular conversation, one often refers to the spatial relationships between objects via their positions relative to each other. Relative Position Descriptors (RPDs) are a type of image descriptor tuned to extract these spatial relationships from pairs of objects. Of the existing RPDs, the φ-descriptor covers the widest variety of spatial relationships. Currently, algorithms exist for its computation in the case of both 2D raster and vector objects. However, the algorithm for 2D vector calculation can only handle pairs of simple polygons and lacks some key features, including support for objects with disjoint parts/holes, shared polygon vertices/edges, and various spatial relationships. This paper presents an approach for complex polygonal object φ-descriptor computation, built upon the previous. The new algorithm utilizes the analysis of object boundaries, polygon edges that represent changes in spatial relationships, and brings it more in-line with the 2D raster approach.

Paper Nr: 46
Title:

Subfield-based Parallel Kernel-thinning Algorithms on the BCC Grid

Authors:

Gábor Karai, Péter Kardos and Kálmán Palágyi

Abstract: Kernel-thinning is a widely used technique for extracting the topological kernel from a digital object (i.e., producing a minimal structure that is topologically equivalent to the original elongated object). In this paper, two subfield-based parallel kernel-thinning algorithms acting on the non-standard body centered cubic (BCC) grid are presented. Our algorithms combine a sufficient condition for topology preservation with two types of partitionings of the BCC grid, thus both algorithms are topology-preserving. According to our best knowledge, the reported algorithms are the very first parallel thinning algorithms on the BCC grid.

Paper Nr: 48
Title:

Image-set based Classification using Multiple Pseudo-whitened Mutual Subspace Method

Authors:

Osamu Yamaguchi and Kazuhiro Fukui

Abstract: This paper proposes a new image-set-based classification method, called Multiple Pseudo-Whitened Mutual Subspace Method (MPWMSM), constructed under multiple pseudo-whitening. Further, it proposes to combine this method with Convolutional Neural Network (CNN) features to perform higher discriminative performance. MPWMSM is a type of subspace representation-based method like the mutual subspace method (MSM). In these methods, an image set is compactly represented by a subspace in high dimensional vector space, and the similarity between two image sets is calculated by using the canonical angles between two corresponding class subspaces. The key idea of MPWMSM is twofold. The first is to conduct multiple different whitening transformations of class subspaces in parallel as a natural extension of the whitened mutual subspace method (WMSM). The second is to discard a part of a sum space of class subspaces in forming the whitening transformation to increase the classification ability and the robustness against noise. We demonstrate the effectiveness of our method on tasks of 3D object classification using multi-view images and hand-gesture recognition and further verify the validity of the combination with CNN features through the Youtube Face dataset (YTF) recognition experiment.
Download

Paper Nr: 69
Title:

An Approach for Parameters Evaluation in Layered Structural Materials based on DFT Analysis of Ultrasonic Signals

Authors:

Aleksandrs Sisojevs, Alexey Tatarinov, Mihails Kovalovs, Olga Krutikova and Anastasija Chaplinska

Abstract: An adequate assessment of the condition of versatile structural materials of different origin, from hard biological tissues (cortical bone) to objects of engineering infrastructure facilities (concrete), may encounter difficulties due to their complex and multilayer structure. Traditional ultrasonic testing based on the measurement of single parameters does not allow separating the complex influences of acting factors. Thus, the diagnosis of osteoporosis is complicated by the adverse influence of the thickness of the layer of soft tissue covering bone when assessing the porosity of the bone. In the evaluation of deterioration processes in concrete, it is important to discriminate the depth of the deteriorated surface layer of concrete and the degree of the material degradation in this layer. An evaluation approach implementing pattern recognition methods has been proposed. The initial data set comprised ultrasonic signals obtained at different frequencies in specimens with different values of the parameters according to a planned grid of the parameters of interest. The signals were obtained by surface profiling of the specimens by a pair of emitting and receiving transducers. In this study, an approach to evaluate parameters of interest using pattern recognition methods applied to ultrasonic signals processed by the Discrete Fourier Transform was verified. The estimation model was based on the statistical analysis of the magnitude of the spectrum of the original ultrasonic signals. Decision rules were created based on the testing of a number of specimens forming the training set and calculation of the statistical criteria. Comparative testing of examination specimens demonstrated the adequacy of the proposed method as a potentially universal approach for evaluation of different kinds of objects.
Download
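
The abstract of Paper 69 relies on the magnitude of the spectrum of each ultrasonic signal. As a minimal sketch of that one step (not the authors' implementation), the following computes a naive discrete Fourier transform and its magnitudes using only the Python standard library. The function names `dft_magnitudes` and `dominant_bin` are hypothetical, and the statistical decision rules described in the abstract are not modelled here.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Return |X[k]| for k = 0..N-1 using the direct O(N^2) DFT definition."""
    n = len(signal)
    magnitudes = []
    for k in range(n):
        # X[k] = sum over t of x[t] * exp(-2*pi*i*k*t/N)
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        magnitudes.append(abs(acc))
    return magnitudes


def dominant_bin(signal):
    """Index of the strongest non-DC bin in the first half of the spectrum."""
    magnitudes = dft_magnitudes(signal)
    half = magnitudes[1 : len(magnitudes) // 2 + 1]
    return 1 + max(range(len(half)), key=half.__getitem__)


# Illustrative input (an assumption): a pure cosine with 3 cycles in 32 samples.
example = [math.cos(2 * math.pi * 3 * t / 32) for t in range(32)]
print(dominant_bin(example))
```

For real measurements an FFT routine would normally replace the quadratic loop; the direct form is used here only because it mirrors the definition.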

Paper Nr: 84
Title:

Instance Selection on CNNs for Alzheimer’s Disease Classification from MRI

Authors:

J. A. Castro-Silva, M. N. Moreno-García, Lorena Guachi-Guachi and D. H. Peluffo-Ordóñez

Abstract: The selection of more informative instances from a dataset is an important preprocessing step that can be applied in many classification tasks. Since databases are becoming increasingly large, instance selection techniques have been used to reduce the data to a manageable size. Besides, the use of test data in any part of the training process, called data leakage, can produce a biased evaluation of classification algorithms. In this context, this work introduces an instance selection methodology to avoid data leakage using an early subject, volume, and slice dataset split, and a novel percentile-position-analysis method to identify the regions with the most informative instances. The proposed methodology includes four stages. First, 3D magnetic resonance images are prepared to extract 2D slices of all subjects and only one volume per subject. Second, the extracted 2D slices are evaluated in a percentile distribution fashion in order to select the most insightful 2D instances. Third, image preprocessing techniques are used to suppress noisy data, preserving semantic information in the image. Finally, the selected instances are used to generate the training, validation and test datasets. Preliminary tests are carried out on the OASIS-3 dataset to demonstrate the impact of the number of slices per subject, the preprocessing techniques, and the instance selection method on the overall performance of CNN-based classification models such as DenseNet121 and EfficientNetB0. The proposed methodology achieved a competitive overall accuracy at a slice level of about 77.01% in comparison to 76.94% reported by benchmark and recent works conducting experiments on the same dataset and focusing on instance selection approaches.
Download

Paper Nr: 93
Title:

Learning Cross-modal Representations with Multi-relations for Image Captioning

Authors:

Peng Cheng, Tung Le, Teeradaj Racharak, Cao Yiming, Kong Weikun and Minh L. Nguyen

Abstract: Image captioning is a cross-domain study that generates image description sentences based on a given image. Recently, Li et al. (2020b) showed that concatenating sentences, object tags, and region features as a unified representation makes it possible to outperform state-of-the-art works in different vision-and-language-related tasks. Such results have inspired us to investigate and propose two new learning methods that exploit the relation representation in the model and improve the model’s generation results. To the best of our knowledge, we are the first to exploit both relations extracted from text and images for image captioning. Our idea is motivated by the phenomenon that humans can correct other people’s descriptions by knowing the relationship between objects in an image while observing the same image. We conduct experiments based on the MS COCO dataset (Lin et al., 2014) and show that our method can yield a higher SPICE score than the baseline.
Download

Paper Nr: 94
Title:

Generative Model for Autoencoders Learning by Image Sampling Representations

Authors:

V. E. Antsiperov

Abstract: The article substantiates a generative model for autoencoders, learning by the input image representation based on a sample of random counts. This representation is used instead of the ideal image model, which usually involves too cumbersome descriptions of the source data. So, the reduction of the ideal image concept to sampling representations of fixed (controlled) size is one of the main goals of the article. It is shown that the corresponding statistical description of the sampling representation can be factorized into the product of the distributions of individual counts, which fits well into the naive Bayesian approach and some other machine learning procedures. Guided by that association, the analogue of the well-known EM algorithm, the iterative partition–maximization procedure for generative autoencoders, is synthesized. So, the second main goal of the article is to substantiate the partition–maximization procedure based on the relation between autoencoder image restoration criteria and statistical maximum likelihood parameter estimation. We achieve this by modelling the input count probability distribution by parameterized mixtures, considering the hidden mixture variables as the autoencoder’s internal (coding) data.
Download

Paper Nr: 10
Title:

Automatic Characteristic Line Drawing Generation using Pix2pix

Authors:

Kazuki Yanagida, Keiji Gyohten, Hidehiro Ohki and Toshiya Takami

Abstract: A technology known as pix2pix has made it possible to automatically color line drawings. However, its accuracy is based on the quality of the characteristic lines, which emphasize the characteristics of the subject drawn in the line drawing. In this study, we propose a method for automatically generating characteristic lines in line drawings. The proposed method uses pix2pix to learn the relationship between the contour line drawing and line drawing with characteristic lines. The obtained model can automatically generate a line drawing with the characteristic lines from the contour line drawing. In addition, the quality of the characteristic lines could be adjusted by adding various degrees of blurring to the training images. In our experiments, we qualitatively evaluated the line drawings of shoes generated using the proposed method. We also applied an existing automatic coloring method using pix2pix to line drawings generated using the proposed method and confirmed that the desired colored line drawing could be obtained.
Download

Paper Nr: 14
Title:

Noise in Datasets: What Are the Impacts on Classification Performance?

Authors:

Rashida Hasan and Cheehung H. Chu

Abstract: Classification is one of the fundamental tasks in machine learning. The quality of data is important in constructing any machine learning model with good prediction performance. Real-world data often suffer from noise, which usually refers to errors, irregularities, and corruptions in a dataset. However, we have no control over the quality of data used in classification tasks. The presence of noise in a dataset has three major negative consequences, viz. (i) a decrease in the classification accuracy, (ii) an increase in the complexity of the induced classifier, and (iii) an increase in the training time. Therefore, it is important to systematically explore the effects of noise on classification performance. Even though there have been published studies on the effect of noise either for some particular learner or for some particular noise type, there is a lack of studies in which the impact of different noise types on different learners has been investigated. In this work, we focus on both scenarios, various learners and various noise types, and provide a detailed analysis of their effects on the prediction performance. We use five different classifiers (J48, Naive Bayes, Support Vector Machine, k-Nearest Neighbor, Random Forest), 10 benchmark datasets from the UCI machine learning repository, and three publicly available image datasets. Our results can be used to guide the development of noise handling mechanisms.
Download

Paper Nr: 19
Title:

Taking Advantage of Typical Testor Algorithms for Computing Non-reducible Descriptors

Authors:

Manuel S. Lazo-Cortés, José F. Martínez-Trinidad, J. A. Carrasco-Ochoa, Ventzeslav Valev, Mohammad A. Shamshiri and Adam Krzyżak

Abstract: The concepts of non-reducible descriptor (NRD) and typical testor (TT) have been used for solving quite different pattern recognition problems, the former related to feature selection problems and the latter related to supervised classification. Both TT and NRD concepts are based on the idea of discriminating objects belonging to different classes. In this paper, we theoretically examine the connection between these two concepts. Then, as an example of the usefulness of our study, we present how the algorithms for computing typical testors can be used for computing non-reducible descriptors. We also discuss several future research directions motivated by this work.
Download

Paper Nr: 38
Title:

Automatic Identification of Non-biting Midges (Chironomidae) using Object Detection and Deep Learning Techniques

Authors:

Jack Hollister, Rodrigo Vega and M. A. Hannan Bin Azhar

Abstract: This paper introduces an automated method for the identification of chironomid larvae mounted on microscope slides in the form of a computer-based identification tool using deep learning techniques. Using images of chironomid head capsules, a series of object detection models were created to classify three genera. These models were then used to show how pre-training preparation could improve the final performance. The model comparisons included two object detection frameworks (Faster-RCNN and SSD frameworks), three balanced image sets (with and without augmentation) and variations of two hyperparameter values (Learning Rate and Intersection Over Union). All models were reported using mean average precision or mAP. Multiple runs of each model configuration were carried out to assess statistical significance of the results. The highest mAP value achieved was 0.751 by Faster-RCNN. Statistical analysis revealed significant differences in mAP values between the two frameworks. When experimenting with hyperparameter values, the combination of learning rates and model architectures showed significant relationships. Although all models produced similar accuracy results (94.4% - 97.8%), the confidence scores varied widely.
Download
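
The abstract of Paper 38 varies Intersection over Union (IoU) as a hyperparameter. As a minimal sketch of the standard IoU definition for two axis-aligned boxes, assuming boxes are given as `(x_min, y_min, x_max, y_max)` tuples (a convention chosen here, not one stated in the abstract):

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes.

    Each box is (x_min, y_min, x_max, y_max); this layout is an assumption.
    """
    # Corners of the overlapping rectangle, if any.
    left = max(box_a[0], box_b[0])
    top = max(box_a[1], box_b[1])
    right = min(box_a[2], box_b[2])
    bottom = min(box_a[3], box_b[3])

    # Width and height are clamped at zero for non-overlapping boxes.
    inter = max(0.0, right - left) * max(0.0, bottom - top)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1).
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

In detection pipelines, a predicted box is typically counted as a match when its IoU with a ground-truth box exceeds a chosen threshold, which is why the threshold value affects reported mAP.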

Paper Nr: 42
Title:

Computing the Variations of Edit Distance for Rooted Labeled Caterpillars

Authors:

Manami Hagihara, Takuya Yoshino and Kouichi Hirata

Abstract: In this paper, we focus on the top-down distance, the LCA-preserving distance and the bottom-up distance for rooted labeled caterpillars (caterpillars, for short) as variations of the edit distance. Here, the top-down distance is the edit distance in which deletions and insertions are allowed only at leaves, the LCA-preserving distance allows them only at leaves or vertices with one child, and the bottom-up distance allows them only at the root. We show that the top-down and bottom-up distances for caterpillars can be computed in O(n) time and the LCA-preserving distance for caterpillars in O(n^2) time. Furthermore, we give experimental results of computing these variations for caterpillars in real data.
Download

Paper Nr: 43
Title:

Caterpillar Inclusion: Inclusion Problem for Rooted Labeled Caterpillars

Authors:

Tomoya Miyazaki, Manami Hagihara and Kouichi Hirata

Abstract: In this paper, we investigate an inclusion problem for rooted labeled caterpillars (caterpillars, for short), which we call caterpillar inclusion. Caterpillar inclusion is the problem of determining whether a pattern caterpillar P can be obtained from a text caterpillar T by deleting vertices in T. We design an algorithm for caterpillar inclusion for P and T that runs in O((h + H)σ) time, where h is the height of P, H is the height of T and σ is the number of labels occurring in P and T. We also give experimental results for the algorithm using real data for caterpillars.
Download

Paper Nr: 77
Title:

Improving Usual Naive Bayes Classifier Performances with Neural Naive Bayes based Models

Authors:

Elie Azeraf, Emmanuel Monfrini and Wojciech Pieczynski

Abstract: Naive Bayes is a popular probabilistic model appreciated for its simplicity and interpretability. However, the usual form of the related classifier suffers from two significant problems. First, because it models the distribution of the observations, it cannot consider complex features. Second, it assumes the conditional independence of the observations given the hidden variable. This paper introduces the original Neural Naive Bayes, modeling the classifier’s parameters induced from the Naive Bayes with neural network functions. This method addresses the first problem. We also introduce new Neural Pooled Markov Chain models, alleviating the conditional independence assumption. We empirically study the benefits of these models for Sentiment Analysis, dividing the error rate of the usual classifier by 4.5 on the IMDB dataset with the FastText embedding, and achieving an F1 score equivalent to that of RoBERTa on the TweetEval emotion dataset, while being more than a thousand times faster for inference.
Download

Paper Nr: 83
Title:

Towards an Ensemble Approach for Sensor Data Sensemaking

Authors:

Athanasios Tsitsipas

Abstract: In a world of uncertainty and incompleteness, one must “make sense” of found observations. Cyber-physical systems output large quantities of data, opening massive opportunities and challenges for scalable techniques to gain exciting insights. One intriguing challenge is the process of Sensor Data Sensemaking. The research presents an approach to handle this process by bringing together the strands of data and knowledge in a single architecture in an interpretable and expressive way. Unlike other works, the use of interpretable patterns from streaming data is in the spotlight. In addition, background knowledge over these patterns grasps the intention to give meaning to these patterns with several possible explanations. A hybrid implementation realises the approach following big data processing models.
Download

Paper Nr: 89
Title:

The Influence of Labeling Techniques in Classifying Human Manipulation Movement of Different Speed

Authors:

Sadique A. Siddiqui, Lisa Gutzeit and Frank Kirchner

Abstract: Human action recognition aims to understand and identify different human behaviors and designate appropriate labels for each movement’s action. In this work, we investigate the influence of labeling methods on the classification of human movements on data recorded using a marker-based motion capture system. The dataset is labeled using two different approaches, one based on video data of the movements, the other based on the movement trajectories recorded using the motion capture system. The data was recorded from one participant performing a stacking scenario comprising simple arm movements at three different speeds (slow, normal, fast). Machine learning algorithms that include k-Nearest Neighbor, Random Forest, Extreme Gradient Boosting classifier, Convolutional Neural Networks (CNN), Long Short-Term Memory networks (LSTM), and a combination of CNN-LSTM networks are compared on their performance in recognizing these arm movements. The models were trained on actions performed in slow and normal speed movement segments and generalized to actions consisting of fast-paced human movement. It was observed that all the models trained on normal-paced data labeled using trajectories show an improvement of almost 20% in accuracy on test data in comparison to the models trained on data labeled using videos of the performed experiments.
Download

Paper Nr: 100
Title:

A Step Towards the Explainability of Microarray Data for Cancer Diagnosis with Machine Learning Techniques

Authors:

Adara R. Nogueira, Artur J. Ferreira and Mário T. Figueiredo

Abstract: Detecting diseases, such as cancer, from gene expression data has assumed great importance and is a very active area of research. Today, many gene expression datasets are publicly available, which consist of microarray data with information on the activation (or not) of thousands of genes, in sets of patients that have (or not) a certain disease. These datasets consist of high-dimensional feature vectors (very large numbers of genes), which raises difficulties for human analysis and interpretation with the goal of identifying the most relevant genes for detecting the presence of a particular disease. In this paper, we propose to take a step towards the explainability of these disease detection methods, by applying feature discretization and feature selection techniques. We accurately classify microarray data, while substantially reducing and identifying subsets of relevant genes. These small subsets of genes are easier to interpret by human experts, thus potentially providing valuable information about which genes are involved in a given disease.
Download

Paper Nr: 102
Title:

Boosting the Performance of Deep Approaches through Fusion with Handcrafted Features

Authors:

Dimitrios Koutrintzes, Eirini Mathe and Evaggelos Spyrou

Abstract: Contemporary human activity recognition approaches are heavily based on deep neural network architectures, since the latter require neither significant domain knowledge nor complex algorithms for feature extraction, while they are able to demonstrate strong performance. Therefore, handcrafted features are nowadays rarely used. In this paper we demonstrate that these features are able to learn complementary representations of input data and are able to boost the performance of deep approaches, i.e., when both deep and handcrafted features are fused. To this end, we choose an existing set of handcrafted features, extracted from 3D skeletal joints. We compare its performance with two approaches. The first one is based on a visual representation of skeletal data, while the second is a rank pooling approach on raw RGB data. We show that when fusing both types of features, the overall performance is significantly increased. We evaluate our approach using a publicly available, challenging dataset of human activities.
Download

Area 2 - Applications

Full Papers
Paper Nr: 24
Title:

Batch Constrained Bayesian Optimization for Ultrasonic Wire Bonding Feed-forward Control Design

Authors:

Michael Hesse, Matthias Hunstig, Julia Timmermann and Ansgar Trächtler

Abstract: Ultrasonic wire bonding is a solid-state joining process used to form electrical interconnections in micro and power electronics and batteries. A high frequency oscillation causes a metallurgical bond deformation in the contact area. Due to the numerous physical influencing factors, it is very difficult to accurately capture this process in a model. Therefore, our goal is to determine a suitable feed-forward control strategy for the bonding process even without detailed model knowledge. We propose the use of batch constrained Bayesian optimization for the control design. Hence, Bayesian optimization is precisely adapted to the application of bonding: the constraint is used to check one quality feature of the process and the use of batches leads to more efficient experiments. Our approach is suitable to determine a feed-forward control for the bonding process that provides very high quality bonds without using a physical model. We also show that the quality of the Bayesian optimization based control outperforms random search as well as manual search by a user. Using a simple prior knowledge model derived from data further improves the quality of the connection. The Bayesian optimization approach offers the possibility to perform a sensitivity analysis of the control parameters, which allows evaluating the influence of each control parameter on the bond quality. In summary, Bayesian optimization applied to the bonding process provides an excellent opportunity to develop a feed-forward control without full modeling of the underlying physical processes.
Download

Paper Nr: 27
Title:

TaylorMade Visual Burr Detection for High-mix Low-volume Production of Non-convex Cylindrical Metal Objects

Authors:

Tashiro Kyosuke, Takeda Koji, Aoki Shogo, Ye Haoming, Hiroki Tomoe and Tanaka Kanji

Abstract: Visual defect detection (VDD) for high-mix low-volume production of non-convex metal objects, such as high-pressure cylindrical piping joint parts (VDD-HPPPs), is challenging because subtle differences in domain (e.g., metal objects, imaging device, viewpoints, lighting) significantly affect the specular reflection characteristics of individual metal object types. In this paper, we address this issue by introducing a tailor-made VDD framework that can be automatically adapted to a new domain. Specifically, we formulate this adaptation task as the problem of network architecture search (NAS) on a deep object-detection network, in which the network architecture is searched via reinforcement learning. We demonstrate the effectiveness of the proposed framework using the VDD-HPPPs task as a factory case study. Experimental results show that the proposed method achieved higher burr detection accuracy compared with the baseline method for data with different training/test domains for the non-convex HPPPs, which are particularly affected by domain shifts.
Download

Paper Nr: 29
Title:

Nearest-neighbor Search from Large Datasets using Narrow Sketches

Authors:

Naoya Higuchi, Yasunobu Imamura, Vladimir Mic, Takeshi Shinohara, Kouichi Hirata and Tetsuji Kuboyama

Abstract: We consider the nearest-neighbor search on large-scale high-dimensional datasets that cannot fit in the main memory. Sketches are bit strings that compactly express data points. Although it is usually thought that wide sketches are needed for high-precision searches, we use relatively narrow sketches, such as 22-bit or 24-bit, to select a small set of candidates for the search. We use an asymmetric distance between data points and sketches as the criterion for candidate selection, instead of the traditionally used Hamming distance. It can be considered a distance partially restoring quantization error. We utilize an efficient one-by-one sketch enumeration in the order of the partially restored distance to realize a fast candidate selection. We use two datasets to demonstrate the effectiveness of the method: YFCC100M-HNfc6 consisting of about 100 million 4,096-dimensional image descriptors and DEEP1B consisting of 1 billion 96-dimensional vectors. Using a standard desktop computer, we conducted a nearest-neighbor search for a query on datasets stored on SSD, where vectors are represented by 8-bit integers. The proposed method executes the search in 5.8 seconds for the 400GB dataset YFCC100M, and 0.24 seconds for the 100GB dataset DEEP1B, while maintaining a recall of 90%.
Download
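
The abstract of Paper 29 contrasts its asymmetric distance with the traditionally used Hamming distance between sketches. As a minimal sketch of that traditional baseline only (not the authors' partially restored distance), the following treats sketches as Python integers and ranks stored sketches by Hamming distance to a query sketch. The function names and the tiny 4-bit example values are hypothetical.

```python
def hamming_distance(sketch_a, sketch_b):
    """Number of differing bits between two sketches stored as integers."""
    return bin(sketch_a ^ sketch_b).count("1")


def select_candidates(query_sketch, stored_sketches, k):
    """Return indices of the k stored sketches closest to the query.

    Ties are broken by index order because sorted() is stable.
    """
    order = sorted(
        range(len(stored_sketches)),
        key=lambda i: hamming_distance(query_sketch, stored_sketches[i]),
    )
    return order[:k]


# Illustrative 4-bit sketches (an assumption; the paper uses 22- or 24-bit ones).
stored = [0b1011, 0b0010, 0b1111, 0b0000]
print(select_candidates(0b1010, stored, k=2))
```

Only the selected candidates would then be compared with the query using the exact distance on the original vectors.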

Paper Nr: 33
Title:

PRiDAN: Person Re-identification from Drones with Adaptive Weights and Expanded Neighbourhood

Authors:

Chatchanan Varojpipath and Krystian Mikolajczyk

Abstract: There has been a growing interest in drone applications, and many computer vision tasks have been specifically adapted to drone scenarios, such as SLAM, object detection and depth estimation. Person re-identification is one of the tasks that can be effectively performed from drones, and new datasets specifically geared towards aerial person imagery are emerging. Beyond the common problems found in almost every person re-ID dataset, the most significant differences from static CCTV re-ID are the very different human poses across top-down views and the similar appearance of different people, along with motion blur, lighting conditions, low resolution and occlusions. To address these problems, we propose to combine a Part-based Convolutional Baseline (PCB), which exploits local features, with an adaptive weight distribution strategy, which assigns different weights to similar and dissimilar samples. The results show that our method outperforms the state of the art by a large margin. In addition, we propose a re-ranking method which aggregates Expanded Cross Neighborhood (ECN) distance and Jaccard distance to compute the final ranking. Compared to the existing methods, our re-ranking achieves 3.30% and 3.03% improvement on mAP and rank-1 accuracy, respectively.
Download
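
The re-ranking step in the abstract of Paper 33 aggregates an ECN distance with a Jaccard distance. As a minimal sketch of the Jaccard part only, assuming each image is described by the set of identifiers of its nearest neighbours (a common setup, not confirmed by the abstract), the distance can be written as below. The `aggregate` helper and its weight `lam` are hypothetical placeholders for whatever combination rule the paper actually uses.

```python
def jaccard_distance(neighbours_a, neighbours_b):
    """1 - |A intersect B| / |A union B| for two neighbour sets."""
    a, b = set(neighbours_a), set(neighbours_b)
    union = a | b
    if not union:
        # Two empty sets are treated as identical (a convention chosen here).
        return 0.0
    return 1.0 - len(a & b) / len(union)


def aggregate(ecn_distance, jaccard, lam=0.5):
    """Hypothetical weighted sum of the two distances; lam is an assumption."""
    return lam * ecn_distance + (1.0 - lam) * jaccard


# Neighbour sets sharing 2 of 4 distinct identifiers.
d = jaccard_distance({1, 2, 3}, {2, 3, 4})
print(d, aggregate(0.2, d))
```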

Paper Nr: 40
Title:

Towards Cargo Wagons Brake Health Scoring through Image Processing

Authors:

Andres F. Posada-Moreno, Thomas Otte, Damir Pehar, Marc Haßler, Holger Bartels, Anas Abdelrazeq and Frank Hees

Abstract: The increase of integrated logistics is generating the progressive integration of rail transport systems on a global scale. This raises the challenge of the safe and compliant operation of an increasing number of assets. Within this context, inspection of in-service cargo wagons becomes increasingly important. Among the wagon components, the brake pads are essential and must be constantly inspected and timely changed before any failure. This publication presents a novel system for the automated scoring of cargo wagon brakes through image processing and deep learning algorithms. The main goal of this system is to provide insightful information which can improve the observability of assets, as well as enable augmented decision-making in maintenance inspection processes. Through this work, a four-step novel approach is described. First, an image acquisition system was developed. Then, an object detection model is used to extract the important cargo wagon components. Next, images containing the extracted brakes are analyzed to extract the most relevant keypoints of the brakes. Finally, the ratio between the distances of multiple keypoints is used to score each brake and provide insightful information regarding their health. After implementation, the proposed approach is tested and the resulting scores are explored.
Download

Paper Nr: 41
Title:

Condition Monitoring of Rail Infrastructure and Rolling Stock using Acceleration Sensor Data of on-Rail Freight Wagons

Authors:

Thomas Otte, Andres F. Posada-Moreno, Fabian Hübenthal, Marc Haßler, Holger Bartels, Anas Abdelrazeq and Frank Hees

Abstract: In various industry sectors all over the world, the ongoing digital transformation helps to unlock benefits for individual components, involved processes, stakeholders as well as the overarching system (e.g., the national economy). In this context, the rail transport sector can particularly benefit from the increased prevalence of sensor systems and the thereby increased availability of related data. As rail transport, by nature, is an integrated transport mode that contains both freight and passenger transport within the same transport network, benefits achieved for the service quality of freight transport also lead to improvements for passenger transport (e.g., punctuality or uptime of rolling stock). This technical paper presents a method to monitor the condition of the existing rail infrastructure as well as the rolling stock by obtaining insights from raw sensor data (e.g., locations and acceleration data). The data is collected with telemetry-units (i.e. multiple sensors integrated with a telematics device to enable data transmission) mounted on a fleet of on-rail freight wagons. In addition, the proposed method is applied to an exemplary set of extracted real-world data.
Download

Paper Nr: 53
Title:

Calibration of a Telecentric Structured-light Device for Micrometric 3D Reconstruction

Authors:

Mara Pistellato, Andrea Albarelli and Filippo Bergamasco

Abstract: Structured-light 3D reconstruction techniques are employed in a wide range of applications for industrial inspection. In particular, some tasks require micrometric precision for the identification of microscopic surface irregularities. We propose a novel calibration technique for structured-light systems adopting telecentric lenses for both camera and projector. The device exploits a fixed light pattern (stripe-based) to perform accurate microscopic surface reconstruction and measurements. Our method employs a sphere with a known radius as calibration target and takes advantage of the orthographic projection model of the telecentric lenses to recover the bundle of planes originated by the projector. Once the sheaf of parallel planes is properly described in the camera reference frame, the triangulation of the object’s surface hit by the light stripes is immediate. Moreover, we tested our technique in a real-world scenario for industrial surface inspection by implementing a complete pipeline to recover the intersections between the projected planes and the surface. Experimental analysis shows the robustness of the proposed approach against synthetic and real-world test data.
Download

Paper Nr: 55
Title:

Efficient Multi-angle Audio-visual Speech Recognition using Parallel WaveGAN based Scene Classifier

Authors:

Shinnosuke Isobe, Satoshi Tamura, Yuuto Gotoh and Masaki Nose

Abstract: Recently, Audio-Visual Speech Recognition (AVSR), one of the Automatic Speech Recognition (ASR) methods that are robust against acoustic noise, has been widely researched. AVSR combines ASR and Visual Speech Recognition (VSR). Considering real applications, we need to develop VSR that can accept frontal and non-frontal face images, and reduce computational time for image processing. In this paper, we propose an efficient multi-angle AVSR method using a Parallel-WaveGAN-based scene classifier. The classifier estimates whether given speech data were recorded in clean or noisy environments. Multi-angle AVSR is conducted if our scene classifier detects a noisy environment, to enhance the recognition accuracy, whereas only ASR is performed if the classifier predicts clean speech data, to avoid an increase in processing time. We evaluated our framework using two multi-angle audio-visual databases: an English corpus OuluVS2 having 5 views and a Japanese phrase corpus GAMVA consisting of 12 views. Experimental results show that the scene classifier worked well, and using multi-angle AVSR achieved higher recognition accuracy than ASR. In addition, our approach could save processing time by switching recognizers according to the noise condition.
Download

Paper Nr: 87
Title:

Boosting Re-identification in the Ultra-running Scenario

Authors:

Miguel Á. Medina, Javier Lorenzo-Navarro, David Freire-Obregón, Oliverio J. Santana, Daniel Hernández-Sosa and Modesto C. Santana

Abstract: In the context of ultra-running (longer than a marathon distance), whole-body based re-identification (ReId) state-of-the-art approaches have reported moderate success due to the challenging unrestricted characteristics of the long-term scenario, as very different illuminations, accessories (backpacks, caps, sunglasses), and/or changes of clothes are present. In this paper, we explore the integration of two elements in the ReId process: 1) an additional biometric cue such as the face, and 2) the particular spatio-temporal context information present in these competitions. Preliminary results confirm the limited relevance of the facial cue in the (not high resolution) ReId scenario and the great benefits of the contextual information to reduce the gallery size and consequently improve the overall ReId performance.
Download

Paper Nr: 95
Title:

Sieving Camera Trap Sequences in the Wild

Authors:

Anoushka Banerjee, Dileep A. Dinesh and Arnav Bhavsar

Abstract: Camera trap sequences are a treasure trove of wildlife data. Camera traps are susceptible to false triggers caused by ground heat flux and wind, leading to empty frames. Empty frames are also generated if the animal moves out of the camera field of view in between the firing of a shot. The time lost in manually sieving the surfeit of empty frames restrains the usage of camera trap data. Camouflage, occlusion, motion blur, poor illumination, and a small region of interest not only make wildlife subject detection a difficult task for human experts but also add to the challenge of sifting empty frames from animal-containing frames. Thus, in this work, we attempt to automate empty frame removal and animal detection in camera trap sequences using deep learning algorithms such as vision transformer (ViT), faster region based convolution networks (Faster R-CNN), and DEtection TRansformer (DETR). Each biodiversity hotspot has its characteristic seasonal variations and flora and fauna distribution, which call for domain generalization and adaptation in the leveraged deep learning algorithms. Therefore, we address the challenge of adapting our models to a few locations and generalising to an unseen location where training data is scarce.
Download

Paper Nr: 106
Title:

On the Statistical Independence of Parametric Representations in Biometric Cryptosystems: Evaluation and Improvement

Authors:

Riccardo Musto, Emanuele Maiorana, Ridvan S. Kuzu, Gabriel E. Hine and Patrizio Campisi

Abstract: Biometric recognition is nowadays employed in several real-world applications to automatically authenticate legitimate users. Nonetheless, using biometric traits as personal identifiers raises many privacy and security issues that do not affect traditional approaches to automatic people recognition. In order to cope with such concerns, and to guarantee the required level of security to the employed biometric templates, several protection schemes have been designed and proposed. The robustness of such approaches against possible attacks has typically been investigated under the assumption that the employed biometric representations comprise mutually independent coefficients. Unfortunately, the parametric representations adopted in most biometric recognition systems commonly consist of strongly correlated features, which may therefore be unsuitable for use in biometric cryptosystems since they would lower the achievable security. In this paper we propose a framework for evaluating the statistical independence of features employed in biometric recognition systems. Furthermore, we investigate the feasibility of improving the mutual independence of representations defined through deep learning approaches by resorting to architectures involving autoencoders, and evaluate the characteristics of the novel templates through the introduced metrics. Tests using templates derived from finger-vein patterns are performed to evaluate the introduced framework for statistical independence and the proposed template generation strategies.
Download

Paper Nr: 108
Title:

A Light Source Calibration Technique for Multi-camera Inspection Devices

Authors:

Mara Pistellato, Mauro Noris, Andrea Albarelli and Filippo Bergamasco

Abstract: Industrial manufacturing processes often involve a visual control system to detect possible product defects during production. Such inspection devices usually include one or more cameras and several light sources designed to highlight surface imperfections under different illumination conditions (e.g. bumps, scratches, holes). In such scenarios, a preliminary calibration procedure of each component is a mandatory step to recover the system’s geometrical configuration and thus ensure a good process accuracy. In this paper we propose a procedure to estimate the position of each light source with respect to a camera network using an inexpensive Lambertian spherical target. For each light source, the target is acquired at different positions from different cameras, and an initial guess of the corresponding light vector is recovered from the analysis of the collected intensity isocurves. Then, an energy minimization process based on the Lambertian shading model refines the result for a precise 3D localization. We tested our approach in an industrial setup, performing extensive experiments on synthetic and real-world data to demonstrate the accuracy of the proposed approach.
Download

Short Papers
Paper Nr: 1
Title:

Real-time Weapon Detection in Videos

Authors:

Ahmed Nazeem, Xinzhu Bei, Ruobing Chen and Shreyas Shrivastava

Abstract: Real-time weapon detection in video is a challenging object detection task due to the small size of weapons relative to the image size. Thus, we try to solve the common problem that the performance of object detectors deteriorates dramatically as the object becomes smaller. In this manuscript, we aim to detect small-scale non-concealed rifles and handguns. Our contributions in this paper are (i) proposing a scale-invariant object detection framework that is particularly effective for small-object classification, (ii) designing anchor scales based on the effective receptive fields to extend the Single Shot Detection (SSD) model to take an input image of resolution 900×900, and (iii) proposing a customized focal loss with hard-mining. Our proposed model achieved a recall rate of 86% (94% on rifles and 74% on handguns) with a false positive rate of 0.07% on a self-collected test set of 33K non-weapon images and 5K weapon images.
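
As background for contribution (iii), the sketch below shows the standard binary focal loss, which down-weights easy examples so that hard ones dominate training. It is not the customized loss of the paper, whose details the abstract does not give; the defaults `gamma=2.0` and `alpha=0.25` are common choices assumed here for illustration.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Standard binary focal loss for one prediction.

    p: predicted probability of the positive class, y: true label (0 or 1).
    """
    p = min(max(p, eps), 1.0 - eps)            # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p             # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # The factor (1 - p_t) ** gamma shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = focal_loss(0.9, 1)   # confident and correct: small loss
hard = focal_loss(0.1, 1)   # confident and wrong: large loss
```

With `gamma=0` the modulating factor disappears and the expression reduces to class-weighted cross-entropy.
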
Download

Paper Nr: 5
Title:

An Integrated Recurrent Neural Network and Regression Model with Spatial and Climatic Couplings for Vector-borne Disease Dynamics

Authors:

Zhijian Li, Jack Xin and Guofa Zhou

Abstract: We developed an integrated recurrent neural network and nonlinear regression spatio-temporal model for vector-borne disease evolution. We take into account climate data and seasonality as external factors that correlate with disease-transmitting insects (e.g. flies), as well as spill-over infections from neighboring regions surrounding a region of interest. The climate data is encoded into the model through a quadratic embedding scheme motivated by recommendation systems. The neighboring regions’ influence is modeled by a long short-term memory neural network. The integrated model is trained by stochastic gradient descent and tested on leishmaniasis data in Sri Lanka from 2013-2018 where infection outbreaks occurred. Our model outperformed ARIMA models across a number of regions with high infections, and an associated ablation study lends support to our modeling hypothesis and ideas.
Download

Paper Nr: 15
Title:

Towards a Low-cost Vision System for Real-time Pavement Condition Assessment

Authors:

Kehinde Olufowobi and Nic Herndon

Abstract: Although advances in camera and sensing technology in the last decade helped propel the automation of pavement distress detection and characterization, increased equipment acquisition and running costs limit access to the most effective solutions. Furthermore, some of these advanced techniques require substantial human involvement to process and analyze data correctly. We propose a cost-effective, end-to-end automated approach to pavement condition assessment that employs a neural object detector to identify and measure instances of pavement distress in real time from oblique two-dimensional imagery acquired using an unmanned aerial vehicle. A state-of-the-art object detector architecture is applied to identify and localize pavement distress instances in these images. Camera data, information about Street View image acquisition conditions, and the principles of photogrammetry and planar homography are exploited to construct a mapping for translating pixel distances to real-world distances. This capability is integrated into the neural network inference process to derive an end-to-end system for real-time distress identification and measurement.
Download

Paper Nr: 31
Title:

Optimization of Sensor Placement for Birds Acoustic Detection in Complex Fields

Authors:

Damien Goetschi, Valère Martin, Richard Baltensperger, Marc Vonlanthen, Donatien D. Roziers and Francesco Carrino

Abstract: Birds nest in multifunctional semi-natural environments. Intensification of agriculture and forestry prevents their successful breeding, threatening their survival globally. Early bird detection allows for targeted conservation actions, such as local (temporary) habitat protection. The conservationist thus aims to detect priority bird species as soon as a territory is occupied, for instance using an acoustic surveillance network. We present a comprehensive method to optimize acoustic coverage with a minimum number of sensors in the network. Our method includes a sound propagation model and algorithms for optimized sensor placement. Relevant parameters (e.g., topography, soil type, height of vegetation, weather) for the sound propagation model are automatically extracted from an area of interest. We implemented and compared Particle Swarm Optimization and Genetic Algorithms-based approaches to solve the optimisation problem.
Download

Paper Nr: 34
Title:

Evaluation of Generative Adversarial Network Generated Super Resolution Images for Micro Expression Recognition

Authors:

Pratikshya Sharma, Sonya Coleman, Pratheepan Yogarajah, Laurence Taggart and Pradeepa Samarasinghe

Abstract: Advancements in micro expression recognition techniques have accelerated at an exceptional rate in recent years. Envisaging a real environment, the recordings captured in our everyday life are prime sources for many studies, but these data often suffer from poor quality. Consequently, this has opened up a new research direction involving low resolution micro expression images. Identifying a particular class of micro expression among several classes is extremely challenging due to less distinct inter-class discriminative features. Low resolution of such images further diminishes the discriminative power of micro facial features. Undoubtedly, this makes the recognition challenge twice as hard. To address the issue of low resolution for facial micro expressions, this work proposes a novel approach that employs a super resolution technique using a Generative Adversarial Network and its variant. Additionally, Local Binary Pattern and Local Phase Quantization on three orthogonal planes are used for extracting facial micro features. The overall performance is evaluated based on recognition accuracy obtained using a support vector machine. Also, image quality metrics are used for evaluating reconstruction performance. Low resolution images simulated from the SMIC-HS dataset are used for testing the proposed approach, and experimental results demonstrate its usefulness.
Download

Paper Nr: 44
Title:

Visual-only Voice Activity Detection using Human Motion in Conference Video

Authors:

Keisuke Yamazaki, Satoshi Tamura, Yuuto Gotoh and Masaki Nose

Abstract: In this paper, we propose a visual-only Voice Activity Detection (VAD) method using human movements. Although audio VAD is commonly used in many applications, it has the problem that it is not robust in noisy environments. In such cases, multi-modal VAD using speech and mouth information is effective. However, due to the current pandemic situation, people wear masks, so that mouths cannot be observed. On the other hand, utilizing a video capturing the whole of a speaker is useful for visual VAD, because gestures and motions may help to identify speech segments. In our scheme, we first obtain dynamic images which represent the motion of a person. Second, we fuse dynamic and original images using the Multi-Modal Transfer Module (MMTM). To evaluate the effectiveness of our scheme, we conducted experiments using conference videos. The results show that the proposed model performs better than the baseline. Furthermore, through model visualization we confirmed that the proposed model focused much more on speakers.
Download

Paper Nr: 56
Title:

Community Detection based on Node Relationship Classification

Authors:

Shunjie Yuan, Hefeng Zeng and Chao Wang

Abstract: Community detection is a salient task in network analysis to understand the intrinsic structure of networks. In this paper, we propose a novel community detection algorithm based on node relationship classification. The node relationship between two neighboring nodes is defined as whether they belong to the same community. A trained binary classifier is deployed to classify the node relationship, which considers both the local influence from the two nodes themselves and the global influence from the whole network. According to the classified node relationships, the community structure can be detected naturally. The experimental results on both real-world and synthetic networks demonstrate that our algorithm performs better than other representative algorithms.
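
One plausible reading of how communities follow from classified node relationships is to merge the endpoints of every edge labeled "same community", which yields connected components. The sketch below does this with a union-find structure. It is an assumption made for illustration, not the authors' published procedure, and the function name and toy data are hypothetical.

```python
def communities_from_edge_labels(nodes, labeled_edges):
    """Group nodes by merging endpoints of edges predicted as intra-community.

    labeled_edges: iterable of (u, v, same_community) with a boolean label.
    """
    parent = {node: node for node in nodes}

    def find(node):
        # Follow parent links to the root, shortening the path on the way.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for u, v, same_community in labeled_edges:
        if same_community:
            root_u, root_v = find(u), find(v)
            if root_u != root_v:
                parent[root_v] = root_u

    groups = {}
    for node in nodes:
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


# Toy example: the classifier marks edge (c, d) as crossing communities.
nodes = ["a", "b", "c", "d", "e"]
labeled_edges = [("a", "b", True), ("b", "c", True), ("c", "d", False), ("d", "e", True)]
communities = communities_from_edge_labels(nodes, labeled_edges)
```

In this scheme a single misclassified edge can merge two communities, so a practical pipeline would likely need some tolerance for classifier errors.
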
Download

Paper Nr: 63
Title:

Deep Neural Network for Estimating Value of Quality of Life in Driving Scenes

Authors:

Shinji Fukui, Naoki Watanabe, Yuji Iwahori, Pittipol Kantavat, Boonserm Kijsirikul, Hiroyuki Takeshita, Yoshitsugu Hayashi and Akihiko Okazaki

Abstract: The purpose of this research is to estimate the Quality of Life (QoL) value of a driving-scene image from the image alone. A system that suggests optimal transportation methods and routes from a current place to a destination has been developed, and it uses the QoL value. A method to estimate the QoL value easily is therefore needed. This paper proposes a method for estimating the QoL value of an image. The image is segmented by a semantic segmentation method based on a Deep Neural Network (DNN). For each object class, the ratio of the total area of that class's regions to the whole image area is calculated. These ratios are used as indicators for estimating the QoL value. A MultiLayer Perceptron (MLP) learns the relationship between the QoL value and the ratios. The DNN for estimating the QoL value from only the input image is constructed by connecting the DNN-based semantic segmentation model and the MLP. The effectiveness of the proposed method is demonstrated by experiments.
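
The per-class area indicators described in the abstract above can be computed directly from a segmentation label map. The following is a minimal sketch; the function name, the class ids and the toy label map are assumptions for illustration, and the segmentation model and MLP are not shown.

```python
from collections import Counter


def class_area_ratios(label_map, num_classes):
    """Return, for each class id, the fraction of image pixels assigned to it.

    label_map: list of rows, each a list of integer class ids (one per pixel).
    """
    total = sum(len(row) for row in label_map)
    if total == 0:
        raise ValueError("label_map contains no pixels")
    counts = Counter(label for row in label_map for label in row)
    return [counts.get(c, 0) / total for c in range(num_classes)]


# Hypothetical 2x4 segmentation output with classes 0=road, 1=sky, 2=building.
label_map = [
    [0, 0, 1, 1],
    [0, 2, 2, 1],
]
ratios = class_area_ratios(label_map, 3)   # feature vector that would feed the MLP
```

The resulting vector has one entry per class and sums to 1, which gives the MLP a fixed-size input regardless of image resolution.
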
Download

Paper Nr: 66
Title:

Semi-Supervised Cloud Detection with Weakly Labeled RGB Aerial Images using Generative Adversarial Networks

Authors:

Toon Stuyck, Axel-Jan Rousseau, Mattia Vallerio and Eric Demeester

Abstract: Despite extensive efforts, it is still very challenging to correctly detect clouds automatically from RGB images. In this paper, an automated and effective cloud detection method is proposed based on a semi-supervised generative adversarial network that was originally designed for anomaly detection, in combination with structural similarity. By only training the networks on cloudless RGB images, the generator network is able to learn the distribution of normal input images and is able to generate realistic and contextually similar images. If an image with clouds is introduced, the network will fail to recreate a realistic and contextually similar image. Using this information combined with the structural similarity index, we are able to automatically and effectively segment anomalies, which in this case are clouds. The proposed method compares favourably to other commonly used cloud detection methods on RGB images.
Download

Paper Nr: 74
Title:

Effect of Data Augmentation Methods on Face Image Classification Results

Authors:

Ingrid Hrga and Marina Ivasic-Kos

Abstract: Data augmentation encompasses a set of techniques to increase the size of a dataset artificially. Insufficient training data means that the network will be susceptible to the problem of overfitting, leading to a poor generalization capability of the network. Therefore, research efforts are focused on developing various augmentation strategies. Simple affine transformations are commonly used to expand a set. However, more advanced methods, such as information dropping or random mixing, are becoming increasingly popular. We analyze different data augmentation techniques suitable for the image classification task in this paper. We investigate how the choice of a particular approach affects the classification results depending on the size of the training dataset, the type of transfer learning applied, and the task's difficulty, which we determine based on the objectivity or subjectivity of the target attribute. Our results show that the choice of augmentation method becomes crucial in the case of more challenging tasks, especially when using a pre-trained model as a feature extractor. Moreover, the methods that showed above-average results on smaller sets may not be the optimal choice on a larger set and vice versa.
Download

Paper Nr: 75
Title:

Refined co-SVD Recommender Algorithm: Data Processing and Performance Metrics

Authors:

Jia Ming Low, Ian T. Tan and Chern Hong Lim

Abstract: A resurgence of research interest in recommender systems can be attributed to the widely publicized Netflix competition with its grand prize of USD 1 million. The competition enabled promising collaborative filtering algorithms to come to prominence due to the availability of a large dataset and, from it, the growth in the use of matrix factorization. There have been many recommender system projects centered around the use of matrix factorization, with the co-SVD approach being one of the most promising. However, the field is chaotic, with different benchmarks and evaluation metrics in use. Not only are the reported performance metrics inconsistent, but it is also difficult to reproduce existing research when details of the data processing and hyper-parameters lack clarity. This paper addresses these shortcomings and provides researchers in this field with a current baseline through a detailed implementation of the co-SVD approach. To facilitate progress for future researchers, it also provides results from an up-to-date dataset using pertinent evaluation metrics such as the top-N recommendations and the normalized discounted cumulative gain measures.
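
For readers unfamiliar with the normalized discounted cumulative gain (NDCG) mentioned above, the sketch below computes one common variant with linear gain and a log2 rank discount. Which variant the paper uses is not stated in the abstract, so this is an illustrative assumption; it also assumes the ideal ranking is obtained by sorting the same list of relevance scores.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k items (linear gain, log2 discount)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG of the given ranking divided by the DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Relevance scores listed in the order the recommender ranked the items.
perfect = ndcg_at_k([3, 2, 1], 3)   # already in ideal order
swapped = ndcg_at_k([0, 1], 2)      # the only relevant item is ranked second
```

Because the score is normalized by the ideal ordering, values are comparable across users with different numbers of relevant items.
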
Download

Paper Nr: 76
Title:

On the Choice of General Purpose Classifiers in Learned Bloom Filters: An Initial Analysis Within Basic Filters

Authors:

Giacomo Fumagalli, Davide Raimondi, Raffaele Giancarlo, Dario Malchiodi and Marco Frasca

Abstract: Bloom Filters are a fundamental and pervasive data structure. Within the growing area of Learned Data Structures, several Learned versions of Bloom Filters have been considered, yielding advantages over classic Filters. Each of them uses a classifier, which is the Learned part of the data structure. Although it has a central role in those new filters, and its space footprint as well as classification time may affect the performance of the Learned Filter, no systematic study of which specific classifier to use in which circumstances is available. We report progress in this area here, providing also initial guidelines on which classifier to choose among five classic classification paradigms.
Download

Paper Nr: 78
Title:

Three-step Approach for Localization, Instance Segmentation and Multi-facet Classification of Individual Logs in Wooden Piles

Authors:

Christoph Praschl and Gerald A. Zwettler

Abstract: The inspection of products and the assessment of quality are connected with high costs and time effort in many industrial domains. This also applies to the forestry industry. Utilizing state-of-the-art deep learning models allows the automated, vision-based analysis of wooden piles. In this work, a three-step approach is presented for the localization, segmentation and multi-facet classification of individual logs based on a client/server architecture, making it possible to determine the quality, the volume and thus the value of a wooden pile with a smartphone application. Using multiple YOLOv4 and U-NET models leads to a client-side log localization accuracy of 82.9% with low storage requirements of 23 MB and a server-side log detection accuracy of 94.1%, together with a log type classification accuracy of 95% and 96% according to the quality assessment of spruce logs. In addition, the trained segmentation model reaches an accuracy of 89%.
Download

Paper Nr: 80
Title:

LSU-DS: An Uruguayan Sign Language Public Dataset for Automatic Recognition

Authors:

Ariel E. Stassi, Marcela Tancredi, Roberto Aguirre, Alvaro Gómez, Bruno Carballido, Andrés Méndez, Sergio Beheregaray, Alejandro Fojo, Víctor Koleszar and Gregory Randall

Abstract: The first Uruguayan Sign Language public dataset for automatic recognition (LSU-DS) is presented. The dataset can be used both for linguistic studies and for automatic recognition at different levels: alphabet, isolated signs, and sentences. LSU-DS consists of several repetitions of three linguistic tasks by 10 signers. The recordings were made indoors under controlled lighting. The signers were freely dressed without gloves or specific markers for recognition. The recordings were acquired by 3 simultaneous cameras calibrated for stereo vision. The dataset is openly available to the community and includes gloss information as well as both the videos and the 3D models generated by OpenPose and MediaPipe for all acquired sequences.
Download

Paper Nr: 90
Title:

LSTM Network based on Prosodic Features for the Classification of Injunction in French Oral Utterances

Authors:

Asma Bougrine, Philippe Ravier, Abdenour Hacine-Gharbi and Hanane Ouachour

Abstract: The classification of injunction in French oral speech is a difficult task since no standard linguistic structure is known in the French language. Thus, prosodic features of the speech could be pertinent indicators for this task, especially the logarithmic energy. Our first aim is to validate the predominance of the log energy prosodic feature by using conventional classifiers such as SVM or K-NN. Second, we intend to improve the classification rates by using a deep LSTM recurrent network. When applied to the RAVIOLI database, the log energy feature indeed showed the best classification rates (CR) for all classifiers, with CR = 82% for SVM and CR = 71.42% for K-NN. When applying the LSTM network to our data, the CR reached 79.49% using the log energy feature alone, which is no better. More surprisingly, the CR significantly increased to 96.15% when using the 6 prosodic features. We conclude that deep learning methods need as much data as possible to reach high performance, even less informative data, especially when the dataset is small. The drawback of deep learning methods remains the difficulty of tuning parameters optimally.
Download

Paper Nr: 91
Title:

Does Melania Trump Have a Body Double from the Perspective of Automatic Face Verification?

Authors:

Khawla Mallat, Fabiola Becerra-Riera, Annette Morales-González, Heydi Méndez-Vázquez and Jean-Luc Dugelay

Abstract: With the growing number of users getting updated about current events through social media, the spread of misinformation is increasing and thus reinforcing conspiracy beliefs. During the last presidential election campaign in the USA, the conspiracy theory claiming the existence of a body double that stands in for the former first lady Melania Trump made international news headlines. Fighting the spread of misinformation is crucial as it threatens society by manipulating public opinion. In this paper, we explore whether automatic face verification can help in verifying widespread misinformation on social media, dealing particularly with the conspiracy theory about the alleged replacement of Melania Trump. We employed four different state-of-the-art descriptors for face recognition to verify the integrity of the claim of the studied conspiracy theory. In addition, we assessed the impact of different image quality metrics on the variation of the face verification scores. Two sets of image quality metrics were considered: acquisition-related metrics and subject-related metrics.
Download

Paper Nr: 92
Title:

Towards a More Reliable and Reproducible Protocol of Source Camera Recognition

Authors:

Alexandre Berthet, Chiara Galdi and Jean-Luc Dugelay

Abstract: Source digital camera recognition is an important branch of digital image forensics, which aims at authenticating cameras from the captured images. By analysing the noise artifacts left on the images, it is possible to recognize the label: brand, model and device of the camera (e.g. Nikon - NikonD70 - NikonD70 of Alice). Camera recognition becomes increasingly difficult as the label becomes more precise. In the specific case of source camera recognition based on deep learning, the literature has widely addressed recognition of the camera model, while the recognition of the instance of the camera (i.e. device) is currently under-studied. Moreover, we have identified a lack of protocols for performance assessment: state-of-the-art methods are usually assessed on databases that have specific compositions, such as the Dresden Image Database (74 cameras of 27 models). However, using only one database for evaluation does not reflect reality, where it may be necessary to analyse different sets of devices that are more or less difficult to classify. Also, for some scenarios, verification (1-to-1) is better suited to camera recognition than identification (1-to-N). Based on these elements, we propose a more reliable and reproducible protocol for verification of the source camera made of three different levels (basic, intermediate and advanced) of increasing difficulty, based on camera labels (brand, model and device). State-of-the-art methods are tested with the proposed protocol on the Dresden Image Database and on SOCRatES. The obtained results confirm our assumptions, with a relative drop in performance of up to 49.08% between the basic and advanced difficulty levels. Our protocol is able to assess the robustness of methods for source camera recognition, as it tests whether they are really able to correctly classify cameras in realistic contexts.
Download

Paper Nr: 104
Title:

Predicting Depression with Text, Image, and Profile Data from Social Media

Authors:

N. Ignatiev, I. Smirnov and M. Stankevich

Abstract: In this study, we focused on the task of identifying depressed users based on their digital media on a social network. We processed over 60,000 images, 95,000 posts, and 9,000 subscription items related to 619 user profiles on the VKontakte social media network. Beck Depression Inventory screenings were used to assess the presence of depression among these users and divide them into depression and control groups. We retrieved 6 different text based feature sets, images, and general profile data. The experimental evaluation was designed around using all available data from user profiles and creating a prediction pipeline that can process data samples regardless of the availability of text or image data in the user profile. The best result achieved a 69% F1-score with a stacking classifier approach.
Download

Paper Nr: 111
Title:

Combining Deep Learning Model and Evolutionary Optimization for Parameters Identification of NMR Signal

Authors:

Ivan Ryzhikov, Ekaterina Nikolskaya and Yrjö Hiltunen

Abstract: In this study we combine deep learning predictive models and an evolutionary optimization algorithm to solve a parameter identification problem. We consider the parameter identification problem arising from nuclear magnetic resonance signals. We use observation data from sludges and solve the water content analysis problem. The content of the liquid flow is the basis of production control of sludge dewatering in various industries, and increasing control performance brings a significant economic effect. Since we know the mathematical model of the signal, we reduce the content analysis problem to an optimization problem and a parameter estimation problem. We investigate these approaches and propose a combined approach, which involves predictive models in generating the initial set of alternatives for the optimization. In numerical experiments we show that the proposed approach outperforms both the separate optimization-based approach and the predictive models. In the examination part, we test the approach on signals that were not involved in predictive model learning or in tuning the optimization algorithm parameters. In this study we used a standard differential evolution algorithm and a multi-layer perceptron.
Download

Paper Nr: 7
Title:

Data Collection and Analysis of Print and Fan Fiction Classification

Authors:

Channing Donaldson and James Pope

Abstract: Fan fiction has provided opportunities for genre enthusiasts to produce their own story lines from existing print fiction. It has also introduced concerns including intellectual property issues for traditional print publishers. An interesting and difficult problem is determining whether a given segment of text is fan fiction or print fiction. Classifying unstructured text remains a critical step for many intelligent systems. In this paper we detail how a significant volume of print and fan fiction was obtained. The data is processed using a proposed pipeline and then analysed using various supervised machine learning classifiers. Given 5 to 10 sentences, our results show an accuracy of 80-90% can be achieved using traditional approaches. To our knowledge this is the first study that explores this type of fiction classification problem.
Download

Paper Nr: 13
Title:

A 3D Matching Method to Compare a Scan to Its Reference using 3D Registration and Monte Carlo Metropolis Hastings Optimization for Industrial Inspection Applications

Authors:

Clément Dubosq and Andréa Guerrero

Abstract: Currently in industry, inspection tasks are essential to ensure a product's efficacy and reliability. Some automated tools for inspection, i.e. for detecting defects, exist, but they are not adapted to industrial inspection applications. Most industrial inspection is performed by humans. In this article, we propose a new algorithm to match a 3D point-cloud to its 3D reference in order to track visual defects. First, we reconstruct a 3D model of an object using the Iterative Closest Point (ICP) algorithm. Then, we propose an ICP initialization based on a Monte Carlo Metropolis-Hastings optimization to match a partial point-cloud to its model. We applied our algorithm to data measured by a Time-of-Flight sensor and an RGB camera. We present the results and performance of this approach for objects of different complexities and sizes. The proposed methodology shows good results and adaptability compared to a state-of-the-art method called Go-ICP.
Download

Paper Nr: 25
Title:

Object Detection as Campylobacter Bacteria and Phagocytotic Activity of Leukocytes in Gram Stained Smears Images

Authors:

Kyohei Yoshihara and Kouich Hirata

Abstract: In this paper, we apply object detection to Gram stained smear images, where the objects are Campylobacter bacteria and phagocytotic activity of leukocytes. We adopt three CNN-based object detectors: Faster R-CNN, RetinaNet and YOLOv5. The detection procedure is first to annotate the regions of objects, i.e. Campylobacter bacteria and phagocytotic activity of leukocytes, in the training images, and then to detect the regions of objects in the remaining test images by using the detectors. Finally, we give experimental results of detecting Campylobacter bacteria and phagocytotic activity of leukocytes in Gram stained smear images by using the detectors.
Download

Paper Nr: 30
Title:

Table-structure Recognition Method Consisting of Plural Neural Network Modules

Authors:

Hiroyuki Aoyagi, Teruhito Kanazawa, Atsuhiro Takasu, Fumito Uwano and Manabu Ohta

Abstract: In academic papers, tables are often used to summarize experimental results. However, graphs are more suitable than tables for grasping many experimental results at a glance because of the high visibility. Therefore, automatic graph generation from a table has been studied. Because the structure and style of a table vary depending on the authors, this paper proposes a table-structure recognition method using plural neural network (NN) modules. The proposed method consists of four NN modules: two of them merge detected tokens in a table, one estimates implicit ruled lines that are necessary to separate cells but undrawn, and the last estimates cells by merging the tokens. We demonstrated the effectiveness of the proposed method by experiments using the ICDAR 2013 table competition dataset. Consequently, the proposed method achieved an F-measure of 0.972, outperforming those of our earlier work (Ohta et al., 2021) by 1.7 percentage points and of the top-ranked participant in that competition by 2.6 percentage points.

Paper Nr: 51
Title:

Exploiting Ontology to Build Bayesian Network

Authors:

Ahmed Mabrouk, Sarra Ben Abbes, Lynda Temal, Ledia Isaj and Philippe Calvez

Abstract: Exploiting experts’ domain knowledge represented in an ontology can significantly enhance the quality of Bayesian network (BN) structure learning. In practice, however, using such information is not a trivial task, because the knowledge encompassed in ontologies does not share the same semantics as that represented in a BN. To tackle this issue, a large effort has been devoted to creating a bridge between both models. However, as far as we know, most state-of-the-art approaches require a Bayesian network-specific ontology from which the BN structure can be easily derived. In this paper, we propose a generic method for deriving knowledge from an ontology to enhance the learning process of a BN. We provide several steps to infer dependencies as well as the orientations of some edges between variables. The proposal is implemented and applied to the wind energy domain.

Paper Nr: 54
Title:

Explainable Clustering Applied to the Definition of Terrestrial Biomes

Authors:

Mohamed R. Sidoumou, Alisa Kim, Jeremy Walton, Douglas I. Kelley, Robert J. Parker and Ranjini Swaminathan

Abstract: We present an explainable clustering approach for use with 3D tensor data and use it to define terrestrial biomes from observations in an automatic, data-driven fashion. Our approach allows us to use a larger number of features than is feasible for current empirical methods for defining biomes, which typically rely on expert knowledge and are inherently more subjective than our approach. The data consists of 2D maps of geophysical observation variables, which are rescaled and stacked to form a 3D tensor. We adapt an image segmentation algorithm to divide the tensor into homogeneous regions before partitioning the data using the k-means algorithm. We add explainability to the classification by approximating the clusters with a compact decision tree whose size is limited. Preliminary results show that, with a few exceptions, each cluster represents a biome which can be defined with a single decision rule.
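The last two steps described above (k-means partitioning, then approximating each cluster with a size-limited decision tree) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the two-feature toy data, the fixed initial centroids, the function names `kmeans` and `best_single_rule`, and the use of a depth-1 rule as the simplest possible size-limited tree. The image segmentation step is omitted, and this is not the authors' implementation.

```python
import math

def kmeans(points, initial_centroids, iterations=20):
    """Plain k-means (Lloyd's algorithm) starting from the given centroids."""
    centroids = list(initial_centroids)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels

def best_single_rule(points, labels, cluster):
    """Find the one-feature threshold rule that best reproduces one cluster."""
    best = None
    for feature in range(len(points[0])):
        for threshold in sorted({p[feature] for p in points}):
            for side in ("<=", ">"):
                predicted = [(p[feature] <= threshold) == (side == "<=") for p in points]
                correct = sum(pred == (label == cluster)
                              for pred, label in zip(predicted, labels))
                if best is None or correct > best[0]:
                    best = (correct, feature, threshold, side)
    correct, feature, threshold, side = best
    return feature, threshold, side, correct / len(points)

# Made-up observations with two features each (hypothetical, not real data).
points = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (8.0, 9.0), (9.0, 8.0), (9.0, 9.0)]
labels = kmeans(points, [points[0], points[3]])
rule = best_single_rule(points, labels, cluster=0)
print(labels, rule)
```

The returned tuple reads as "feature index, threshold, comparison, agreement with the cluster", so a cluster that is well described by one rule shows an agreement close to 1.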

Paper Nr: 58
Title:

Hybrid Model-based Defect Analysis of Thermoelectric Cooler Components

Authors:

Yu Lu, Weifang Xie and Jianlong Huang

Abstract: The surface inspection of thermoelectric cooler (TEC) components is an important step in ensuring the quality of semiconductor refrigeration devices. Based on the presence of surface defects, we divided the TEC components into two types: with and without defects. A hybrid model combining the visual geometry group 16 (VGG16) network and squeeze-excitation network (SENet) is proposed to classify TEC component images. The convolutional layer in the VGG16-SENet model is first used to extract features from the TEC component images, the SENet layer is then used to optimize the features extracted by the convolutional layer, and the softmax function is finally used for classification. Compared with existing mainstream models, the hybrid classification model proposed herein can extract more representative and optimal feature maps from the TEC component images. The model parameters are adjusted using the training set, and the test set is used to evaluate its accuracy. Experiments show that the model can effectively improve the accuracy and applicability of image classification. The accuracy, recall, precision, and F1 score were used as evaluation indicators for the model, with the values reaching 88.87%, 89.63%, 91.24%, and 90.42%, respectively, which are considered excellent. Among the convolutional neural network models compared, the hybrid model is significantly more computationally efficient than the original VGG16 model. We expect that this algorithm can be continuously optimized to achieve better recognition capabilities and also be applied to industrial production.
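As a quick consistency check on the reported figures, the F1 score is the harmonic mean of precision and recall. The sketch below recomputes it from the precision and recall quoted in the abstract. The function name `f1_score` is only illustrative and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall as reported in the abstract, in percent.
reported_precision = 91.24
reported_recall = 89.63

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.2f}%")
```

The recomputed value agrees with the reported 90.42% to within rounding of the inputs.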

Paper Nr: 62
Title:

An Open-source Library for Processing of 3D Data from Indoor Scenes

Authors:

José María Martínez-Otzeta, Iñigo Mendialdua, Itsaso Rodríguez-Moreno, Igor Rodriguez and Basilio Sierra

Abstract: In recent years affordable 3D data acquisition devices have appeared on the market. Researchers and developers have been able to use them on a much larger scale than ever in a wide range of applications, from robotics to autonomous driving. One of these applications is the processing of 3D indoor scenes, usually in the context of autonomous navigation of mobile robots, but also in building mapping for map reconstruction or assessment of the location of structural elements. In this paper we report on the development of an open source Python package (indoor3d) for processing 3D data obtained indoors. This package is built on top of the Open3D package, with the aim of making it easier to perform common tasks that arise in indoor data processing. It has already been helpful in two different projects: in one of them it was used to search for structural elements in a point cloud obtained by a HoloLens device, and in the other to locate the handle of a door for a mobile robot navigation application.

Paper Nr: 64
Title:

Towards an Interpretable Spanish Sign Language Recognizer

Authors:

Itsaso Rodríguez-Moreno, José María Martínez-Otzeta, Izaro Goienetxea and Basilio Sierra

Abstract: A significant part of the global population lives with hearing impairments, and the number of affected people is expected to increase in the coming decades. People with hearing problems experience daily difficulties in their interaction with non-deaf people, due to the lack of widespread knowledge of sign languages among the general public. In this paper we present a blueprint for a sign language recognizer that takes advantage of the internal structure of the signs of the Spanish Sign Language (SSL). While the currently dominant approaches are based on deep learning and training with a large number of recorded examples, we propose a system in which the signs are decomposed into constituents, which are in turn recognized by a classical classifier; the system then checks whether their combination is congruent with a regular expression associated with a whole sign. While the deep learning approach with many examples works for every possible collection of signs, our suggestion is that we could leverage the known structure of the sign language in order to create simpler and more interpretable classifiers that could offer a good trade-off between accuracy and interpretability. This characteristic makes the approach suitable for using the system as part of a tutor or for gaining insight into the inner workings of the recognizer.
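The idea of checking a sequence of recognized constituents against a regular expression associated with a whole sign can be sketched as follows. The constituent labels, the one-letter encoding and the two sign patterns are all hypothetical, since the abstract does not specify them, and the classifier that would produce the constituent labels is left out.

```python
import re

# Hypothetical encoding: each recognized constituent (for example a hand
# shape or a movement) maps to one letter, so a sequence becomes a string.
CONSTITUENT_CODES = {
    "fist": "F",
    "open_palm": "P",
    "move_up": "U",
    "move_down": "D",
}

# Hypothetical sign definitions: one regular expression per whole sign.
SIGN_PATTERNS = {
    "sign_a": re.compile(r"F+U+P+"),  # fist, then upward movement, then open palm
    "sign_b": re.compile(r"P+D+F+"),  # open palm, then downward movement, then fist
}

def encode(constituents):
    """Turn a list of constituent labels into a string of one-letter codes."""
    return "".join(CONSTITUENT_CODES[c] for c in constituents)

def recognize(constituents):
    """Return the names of all signs whose pattern matches the whole sequence."""
    encoded = encode(constituents)
    return [name for name, pattern in SIGN_PATTERNS.items()
            if pattern.fullmatch(encoded)]

print(recognize(["fist", "fist", "move_up", "open_palm"]))
print(recognize(["fist", "open_palm"]))
```

Because a rejected sequence can be compared directly with the pattern it failed to match, a design of this kind makes it easy to tell a learner which constituent was missing or out of order.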

Paper Nr: 67
Title:

HistShot: A Shot Type Dataset based on Historical Documentation during WWII

Authors:

Daniel Helm, Florian Kleber and Martin Kampel

Abstract: Automated shot type classification plays a significant role in film preservation and indexing of film datasets. In this paper a historical shot type dataset (HistShot) is presented, where the frames have been extracted from original historical documentary films. The center frame of each shot has been chosen for the dataset and is annotated according to the following shot types: Close-Up (CU), Medium-Shot (MS), Long-Shot (LS), Extreme-Long-Shot (ELS), Intertitle (I), and Not Available/None (NA). The validity of choosing the center frame is shown by a user study. Additionally, standard CNN-based methods (ResNet50, VGG16) have been applied to provide a baseline for the HistShot dataset.

Paper Nr: 68
Title:

Cyber Aggression and Cyberbullying Identification on Social Networks

Authors:

Vincenzo Gattulli, Donato Impedovo, Giuseppe Pirlo and Lucia Sarcinella

Abstract: Bullying includes aggression, harassment, and discrimination. The phenomenon has become widespread with the broad diffusion of social networks. Repeated cyber aggression turns into a more serious problem called cyberbullying. In this work we present an automatic identification system built on the best-performing set of techniques available in the literature. Textual comments on various Italian Twitter posts have been processed to identify the aggressive phenomenon. A further challenge has been identifying aggressive profiles that repeat their malicious behaviour on social networks. Two different experiments have been performed, aimed at detecting cyber aggression and cyberbullying. The best results were obtained by the Random Forest classifier, trained on an ad hoc dataset consisting of comments extracted from Twitter and tagged manually. The system presented is an excellent tool to counter the phenomenon of cyberbullying, although many improvements can still be made to its performance.

Paper Nr: 72
Title:

Compact, Accurate and Low-cost Hand Tracking System based on LEAP Motion Controllers and Raspberry Pi

Authors:

Giuseppe Placidi, Alessandro Di Matteo, Filippo Mignosi, Matteo Polsinelli and Matteo Spezialetti

Abstract: The wide diffusion of low-cost computer vision (CV) hand tracking sensors used for hand gesture recognition has allowed the development of precise and low-cost touchless tracking systems. The main problem with CV solutions is how to cope with occlusions, which are very frequent when the hand has to grasp a tool, and with self-occlusions, which occur when one joint obscures another. In most cases occlusions are resolved by using multiple synchronized stereo sensors. Virtual Glove (VG) is one of the CV-based systems that uses two orthogonal LEAP sensors integrated into a single system. The VG system is driven by a personal computer on which both a master operating system (OS) and a virtual machine have to be installed in order to drive the two sensors (a single OS instance can drive just one sensor at a time). This is a strong limitation because VG has to run on a powerful PC, which makes the solution neither truly low-cost nor portable. We propose a VG architecture based on three Raspberry Pi (RP) units, each a cheap single-board computer running a Linux OS. The proposed architecture assigns an RP to each LEAP and a third RP to collect data from the other two. The third RP merges the data, in real time, into a single hand model and makes it available, through an API, to be rendered in a web application or inside a Virtual Reality (VR) interface. The detailed design is proposed, the architecture is implemented, and experimental benchmark measurements, demonstrating the real-time behaviour of the RP-based VG while containing costs and power consumption, are presented and discussed. The proposed architecture could open the way to developing modular hand tracking systems based on more than two LEAPs, each associated with one RP, in order to further improve robustness.

Paper Nr: 79
Title:

A New Neural Network Model for Prediction Next Stage of Alzheimer’s Disease

Authors:

Nour Zawawi, Heba G. Saber, Mohamed Hashem and Tarek F. Gharib

Abstract: Alzheimer’s disease (AD) is a brain-related illness; the risk of development is minimized when it is diagnosed early. The early detection and treatment of Alzheimer’s disease are crucial since they can slow disease progression, improve symptom management, allow patients to receive timely guidance and support, and save money on healthcare. Regrettably, much current research focuses on characterizing illness states in their current phases rather than forecasting disease development. Because Alzheimer’s disease generally progresses in phases over time, we believe that analyzing time-sequential data can help with disease prediction. Long short-term memory (LSTM) is a recurrent neural network that links previous input to the current task. A new Alzheimer’s Disease Random Forest (RF) LSTM Prediction Model (RFLSTM-PM) is proposed to capture the conditions between characteristics and the next stage of Alzheimer’s disease, after noticing that a patient’s data could be beneficial in predicting disease progression. Experiments reveal that our approach outperforms most existing models and can help with early-onset AD prediction. Furthermore, tests show that it can recognize disease-related brain regions across multiple data modalities (magnetic resonance imaging (MRI) and neurological tests). It also achieved lower Mean Absolute Error and Root Mean Square Error values when forecasting the progression of the disease.
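For readers unfamiliar with the two error measures named above, the following sketch shows their standard definitions. The sample values are made up for illustration and are not results from the paper; the function names are likewise only illustrative.

```python
import math

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def root_mean_square_error(y_true, y_pred):
    """Square root of the average squared difference."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))

# Made-up values for illustration only.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.5, 2.0, 4.5]

mae = mean_absolute_error(actual, predicted)
rmse = root_mean_square_error(actual, predicted)
print(mae, rmse)
```

RMSE squares each error before averaging, so it penalizes large individual errors more heavily than MAE does, which is why the two are often reported together.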

Paper Nr: 81
Title:

View-invariant 3D Skeleton-based Human Activity Recognition based on Transformer and Spatio-temporal Features

Authors:

Ahmed Snoun, Tahani Bouchrika and Olfa Jemai

Abstract: With the emergence of depth sensors, real-time 3D human skeleton estimation has become easier to accomplish. Thus, methods for human activity recognition (HAR) based on 3D skeletons have become increasingly accessible. In this paper, we introduce a new approach for human activity recognition using 3D skeletal data. Our approach generates a set of spatio-temporal and view-invariant features from the skeleton joints. Then, the extracted features are analyzed using a typical Transformer encoder in order to recognize the activity. In fact, Transformers, which are based on the self-attention mechanism, have been successful in many domains in the last few years, which makes them suitable for HAR. The proposed approach shows promising performance on different well-known datasets that provide 3D skeleton data, namely, KARD, Florence 3D, UTKinect Action 3D and MSR Action 3D.
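The abstract does not say which view-invariant features are used. As one simple illustration of the concept, the distances between pairs of joints do not change when the camera viewpoint rotates around the subject, so they are one possible view-invariant descriptor. The sketch below demonstrates this property on a made-up three-joint skeleton; the feature choice, the function names and the data are assumptions, not the authors' method.

```python
import math
from itertools import combinations

def pairwise_joint_distances(skeleton):
    """Euclidean distance between every pair of 3D joints, in a fixed order."""
    return [math.dist(a, b) for a, b in combinations(skeleton, 2)]

def rotate_about_y(skeleton, angle_rad):
    """Rotate all joints about the vertical (y) axis, simulating a new viewpoint."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in skeleton]

# Made-up three-joint skeleton (x, y, z), for illustration only.
skeleton = [(0.0, 1.0, 0.0), (0.2, 0.5, 0.1), (-0.3, 0.0, 0.4)]
rotated = rotate_about_y(skeleton, math.radians(40))

original_features = pairwise_joint_distances(skeleton)
rotated_features = pairwise_joint_distances(rotated)
print(original_features)
print(rotated_features)
```

The two printed lists agree up to floating-point error, even though the joint coordinates themselves differ between the two views.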

Paper Nr: 85
Title:

Exploring Enterprise Operating Indicator Data by Hierarchical Forecasting and Root Cause Analysis

Authors:

Yue Pang, Jing Pan, Xiaogang Li, Jianbin Zheng, Tan Sun and Qinxin Li

Abstract: Analysis of enterprise operating indicators is essential for decision makers to grasp the state of enterprise operations. In this work, time series prediction and root cause analysis algorithms are combined to form a multi-dimensional analysis method, which is used to accurately and rapidly locate anomalies in enterprise operations. The method is applied to real operating indicator data from a financial technology company, and the experimental results validate the effectiveness of the multi-dimensional analysis method.

Paper Nr: 88
Title:

Domain Generalization for Activity Recognition: Learn from Visible, Infer with Thermal

Authors:

Yannick Zoetgnande and Jean L. Dillenseger

Abstract: We propose a solution based on I3D and optical flow to learn common characteristics between thermal and visible videos. For this purpose we also propose a new database to evaluate our solution. The new model comprises an optical flow extractor, a feature extractor based on I3D, a domain classifier, and an activity recognition classifier. We learn invariant characteristics computed from the optical flow. We have simulated several source domains, and we have shown that it is possible to obtain excellent results on a modality that was not used during the training. Such techniques can be used when there is only one source and one target domain.