ICPRAM 2021 Abstracts


Area 1 - Theory and Methods

Full Papers
Paper Nr: 11
Title:

Self-adaptive Norm Update for Faster Gradient-based L2 Adversarial Attacks and Defenses

Authors:

Yanhong Liu and Fengming Cao

Abstract: Adversarial training has been shown to be one of the most effective defense techniques against adversarial attacks. However, it relies on generating strong adversarial examples with an attack in each iteration of the training process. Considerable research effort has therefore gone into reducing the time overhead of attacks without impacting their efficiency. The recent Decoupled Direction and Norm (DDN) attack advanced gradient-based, low-norm L2 attacks by adjusting the norm of the noise in each iteration based on whether the last perturbed image is adversarial or not. In this paper, we propose a self-adaptive way of adjusting the L2 norm, by considering whether the perturbed images in the last two iterations are both adversarial or not. Experiments conducted on the MNIST, CIFAR-10 and ImageNet datasets show that our proposed attack achieves comparable or even better performance than DDN with up to 30% fewer iterations. Models trained with our attack achieve robustness comparable to those trained with the DDN attack on the MNIST and CIFAR-10 datasets while taking around 20% less training time, when the attacks are limited to a maximum norm.
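
Illustrative sketch (not taken from the paper): the norm schedule described above can be pictured as a multiplicative update of the L2 budget. The first function follows the DDN-style rule as summarized in the abstract. The second is a purely hypothetical two-iteration variant, since the abstract does not state the authors' exact rule. The names `gamma` and `boost` and their default values are assumptions.

```python
def ddn_norm_update(eps, is_adv, gamma=0.05):
    """DDN-style rule: shrink the norm budget if the last perturbed image
    was adversarial, otherwise grow it."""
    return eps * (1.0 - gamma) if is_adv else eps * (1.0 + gamma)


def two_step_norm_update(eps, adv_prev, adv_last, gamma=0.05, boost=2.0):
    """Hypothetical variant (NOT the authors' rule): take a larger step when
    the last two iterations agree, a normal step when they disagree."""
    step = gamma * boost if adv_prev == adv_last else gamma
    return eps * (1.0 - step) if adv_last else eps * (1.0 + step)
```

The intent of the second function is only to show how a two-iteration history could modulate the step size; the actual update in the paper may differ.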

Paper Nr: 18
Title:

FLIC: Fast Lidar Image Clustering

Authors:

Frederik Hasecke, Lukas Hahn and Anton Kummert

Abstract: In this work, we propose an algorithmic approach for real-time instance segmentation of Lidar sensor data. We show how our method uses the underlying way of data acquisition to retain three-dimensional measurement information, while being narrowed down to a two-dimensional binary representation for fast computation. In doing so, we reframe the three-dimensional clustering problem as a two-dimensional connected-component labelling task. We further introduce what we call Map Connections, to make our approach robust against over-segmenting instances and to improve assignment in cases of partial occlusion. Through detailed evaluation on public data and comparison with established methods, we show that these aspects improve the segmentation quality beyond the results offered by other three-dimensional cluster mechanisms. Our algorithm can run at up to 165 Hz on a 64-channel Velodyne Lidar dataset on a single CPU core.
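
Illustrative sketch (not the authors' implementation): the idea of treating Lidar clustering as two-dimensional connected-component labelling can be shown on a small range image. Here two neighbouring pixels are joined when their range difference is below a threshold. This criterion, the 4-neighbourhood, and the names `label_range_image` and `max_gap` are simplifying assumptions, and Map Connections are not modelled.

```python
from collections import deque


def label_range_image(ranges, max_gap=0.5):
    """Label connected components in a 2D range image.

    ranges: list of rows; each entry is a range in metres or None (no return).
    Returns a grid of the same shape with 0 for empty pixels and 1..K for clusters.
    """
    rows, cols = len(ranges), len(ranges[0])
    labels = [[0] * cols for _ in range(rows)]
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if ranges[r][c] is None or labels[r][c]:
                continue
            # Start a new cluster and flood-fill it breadth-first.
            next_label += 1
            labels[r][c] = next_label
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if not (0 <= ny < rows and 0 <= nx < cols):
                        continue
                    if ranges[ny][nx] is None or labels[ny][nx]:
                        continue
                    # Assumed connectivity test: similar range => same object.
                    if abs(ranges[ny][nx] - ranges[y][x]) < max_gap:
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
    return labels
```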

Paper Nr: 22
Title:

Multi-level Feature Selection for Oriented Object Detection

Authors:

Chen Jiang, Yefan Jiang, Zhangxing Bian, Fan Yang and Siyu Xia

Abstract: Horizontal object detection has made significant progress, but the horizontal bounding box representation still has limitations when applied to oriented objects. In this paper, we propose an end-to-end rotation detector to localize and classify oriented targets precisely. First, we introduce a path aggregation module to shorten the path of feature propagation. To distribute region proposals to the most suitable feature map, we propose a feature selection module instead of using a selection mechanism based on the size of region proposals. Moreover, for rotation detection, we adopt an eight-parameter representation to parametrize the oriented bounding box and add a novel loss to handle the boundary problems that result from this representation. Our method is evaluated on the DOTA and HRSC2016 datasets.

Paper Nr: 24
Title:

Demonstrating the Vulnerability of RGB-D based Face Recognition to GAN-generated Depth-map Injection

Authors:

Valeria Chiesa, Chiara Galdi and Jean-Luc Dugelay

Abstract: RGB-D cameras are devices able to collect additional information, compared to classical RGB devices, about the observed scene: its depth (D). This has made RGB-D very suitable for many image processing tasks, including presentation attack detection (PAD) in face recognition systems. This work aims at demonstrating that thanks to novel techniques developed in recent years, such as generative adversarial networks (GANs), face PAD systems based on RGB-D are now vulnerable to logical access attack. In this work, a GAN is trained to generate a depth map from an input 2D RGB face image. The attacker can then fool the system by injecting a photo of the authorized user along with the generated depth map. Among all RGB-D devices, this work focuses on light-field cameras but the proposed framework can be easily adapted for other RGB-D devices. The GAN is trained on the IST-EURECOM light-field face database (LFFD). The attack is simulated thanks to the IST lenslet light field face spoofing database (LLFFSD). A third dataset is used to show that the proposed approach generalizes well on a different face database.

Paper Nr: 27
Title:

MetaBox+: A New Region based Active Learning Method for Semantic Segmentation using Priority Maps

Authors:

Pascal Colling, Lutz Roese-Koerner, Hanno Gottschalk and Matthias Rottmann

Abstract: We present a novel region based active learning method for semantic image segmentation, called MetaBox+. For acquisition, we train a meta regression model to estimate the segment-wise Intersection over Union (IoU) of each predicted segment of unlabeled images. This can be understood as an estimation of segment-wise prediction quality. Queried regions are supposed to minimize two competing targets, i.e., low predicted IoU values / segmentation quality and low estimated annotation costs. For estimating the latter we propose a simple but practical method for annotation cost estimation. We compare our method to entropy based methods, where we consider the entropy as uncertainty of the prediction. The comparison and analysis of the results provide insights into annotation costs as well as robustness and variance of the methods. Numerical experiments conducted with two different networks on the Cityscapes dataset clearly demonstrate a reduction of annotation effort compared to random acquisition. Notably, using MetaBox+ we achieve 95% of the mean Intersection over Union (mIoU) obtained when training with the full dataset, with only 10.47% / 32.01% annotation effort for the two networks, respectively.

Paper Nr: 57
Title:

Speech Recognition using Deep Canonical Correlation Analysis in Noisy Environments

Authors:

Shinnosuke Isobe, Satoshi Tamura and Satoru Hayamizu

Abstract: In this paper, we propose a method to improve the accuracy of speech recognition in noisy environments by utilizing Deep Canonical Correlation Analysis (DCCA). DCCA generates projections from two modalities into one common space, so that the correlation of the projected vectors is maximized. Our idea is to employ DCCA techniques with audio and visual modalities to enhance the robustness of Automatic Speech Recognition (ASR): A) noisy audio features can be recovered by clean visual features, and B) an ASR model can be trained using audio and visual features, as data augmentation. We evaluated our method using the audio-visual corpus CENSREC-1-AV and the noise database DEMAND. Compared to conventional ASR and feature-fusion-based audio-visual speech recognition, our DCCA-based recognizers achieved better performance. In addition, experimental results show that utilizing DCCA enables us to get better results in various noisy environments, thanks to the visual modality. Furthermore, we find that DCCA can be used as a data augmentation scheme when only a small amount of training data is available, by incorporating visual DCCA features, in addition to audio DCCA features, to build an audio-only ASR model.

Paper Nr: 67
Title:

Capsule Networks with Intersection over Union Loss for Binary Image Segmentation

Authors:

Floris Van Beers

Abstract: With the development of Capsule Networks and their adaptation to the task of semantic segmentation, it has become important to determine which hyperparameters perform best for this new type of image processing model. One such parameter is the loss function, for which the baseline is usually cross entropy loss. In recent work on other models, Intersection over Union (IoU) loss has been shown to be effective. This work explores the application of IoU loss to segmentation capsule networks. For this purpose, experiments are performed on two datasets: a medical dataset, LUNA16, and a dataset of faces (LFW). Results show marginal to significant improvements when using the IoU loss function as compared to the baseline Binary Cross-Entropy. From this it can be concluded that the search for optimal loss functions is not finished and that new loss functions may further improve the performance of existing models.
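
Illustrative sketch (not taken from the paper): one common differentiable form of an IoU loss for binary segmentation replaces set intersection and union with products and sums over per-pixel probabilities. The exact formulation used by the authors is not given in the abstract, so the function name, the smoothing term `eps`, and this particular "soft" definition are assumptions.

```python
def soft_iou_loss(probs, targets, eps=1e-7):
    """Soft IoU (Jaccard) loss over flattened pixels.

    probs:   predicted foreground probabilities in [0, 1]
    targets: ground-truth labels, 0 or 1
    """
    inter = sum(p * t for p, t in zip(probs, targets))
    union = sum(p + t - p * t for p, t in zip(probs, targets))
    # eps keeps the ratio defined when both masks are empty.
    return 1.0 - (inter + eps) / (union + eps)
```

A perfect prediction gives a loss of 0, and a prediction with no overlap gives a loss close to 1.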

Paper Nr: 70
Title:

Generalized Dilation Structures in Convolutional Neural Networks

Authors:

Gavneet S. Chadha, Jan N. Reimann and Andreas Schwung

Abstract: Convolutional neural networks are known to provide superior performance in various application fields such as image recognition, natural language processing and time series analysis owing to their strong ability to learn spatial and temporal features in the input domain. One of the most profound types of convolution kernels presented in literature is the dilated convolution kernel used primarily for aggregating information from a larger perspective or receptive field. However, the dilation rate and thereby the structure of the kernel has to be fixed a priori, which limits the flexibility of these convolution kernels. In this study, we propose a generalized dilation network where arbitrary dilation structures within a specific dilation rate can be learned. To this end, we derive an end-to-end learnable architecture for dilation layers using the constrained log-barrier method. We test the proposed architecture on various image recognition tasks by investigating and comparing with the SimpleNet architecture. The results illustrate the applicability of the generalized dilation layers and their superior performance.
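
Illustrative sketch (not taken from the paper): a standard dilated convolution with a fixed dilation rate, shown in one dimension. It illustrates the a priori fixed kernel structure that the abstract contrasts with learnable dilation structures; the learnable variant itself is not reproduced here. The function name and the "valid" (no padding) boundary handling are assumptions.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """1D dilated cross-correlation without padding.

    Kernel taps are spaced `dilation` samples apart, so a kernel of length K
    covers (K - 1) * dilation + 1 input samples.
    """
    span = (len(kernel) - 1) * dilation
    return [
        sum(w * signal[i + k * dilation] for k, w in enumerate(kernel))
        for i in range(len(signal) - span)
    ]
```

With the same three-tap kernel, `dilation=2` covers five input samples instead of three, which is how dilation enlarges the receptive field without adding weights.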

Paper Nr: 74
Title:

Exploring Motion Boundaries in an End-to-End Network for Vision-based Parkinson’s Severity Assessment

Authors:

Amirhossein Dadashzadeh, Alan Whone, Michal Rolinski and Majid Mirmehdi

Abstract: Evaluating neurological disorders such as Parkinson’s disease (PD) is a challenging task that requires the assessment of several motor and non-motor functions. In this paper, we present an end-to-end deep learning framework to measure PD severity in two important components, hand movement and gait, of the Unified Parkinson’s Disease Rating Scale (UPDRS). Our method leverages an Inflated 3D CNN trained by a temporal segment framework to learn spatial and long temporal structure in video data. We also deploy a temporal attention mechanism to boost the performance of our model. Further, motion boundaries are explored as an extra input modality to assist in obfuscating the effects of camera motion for better movement assessment. We ablate the effects of different data modalities on the accuracy of the proposed network and compare with other popular architectures. We evaluate our proposed method on a dataset of 25 PD patients, obtaining 72.3% and 77.1% top-1 accuracy on hand movement and gait tasks respectively.

Paper Nr: 76
Title:

Improved HTM Spatial Pooler with Homeostatic Plasticity Control

Authors:

Damir Dobric, Andreas Pech, Bogdan Ghita and Thomas Wennekers

Abstract: The Hierarchical Temporal Memory (HTM) Spatial Pooler (SP) is a learning algorithm for spatial patterns inspired by the neocortex. It is designed to learn a pattern in a few iteration steps and to generate the Sparse Distributed Representation (SDR) of the input. It encodes spatially similar inputs into the same or similar SDRs, memorized as a population of active neurons organized in groups called micro-columns. Findings in this research show that produced SDRs can be forgotten during training, which causes the SP to learn the same pattern again and convert it into a new SDR. This work shows that this unstable learning behaviour of the SP is caused by the internal boosting algorithm inspired by the homeostatic plasticity mechanism. Previous findings in neuroscience show that this mechanism is only active during the development of new-born mammals and is later deactivated or shifted from cortical layer L4, where the SP is supposed to be active. The same mechanism was used in this work. The SP algorithm was extended with a new homeostatic plasticity component that controls the boosting and deactivates it after entering the stable state. Results show that learned SDRs remain stable during the lifetime of the Spatial Pooler.

Paper Nr: 95
Title:

Converting Image Labels to Meaningful and Information-rich Embeddings

Authors:

Savvas Karatsiolis and Andreas Kamilaris

Abstract: A challenge of the computer vision community is to understand the semantics of an image that will allow for higher quality image generation based on existing high-level features and better analysis of (semi-) labeled datasets. Categorical labels aggregate a huge amount of information into a binary value which conceals valuable high-level concepts from the Machine Learning models. Towards addressing this challenge, this paper introduces a method, called Occlusion-based Latent Representations (OLR), for converting image labels to meaningful representations that capture a significant amount of data semantics. Besides being information-rich, these representations compose a disentangled low-dimensional latent space where each image label is encoded into a separate vector. We evaluate the quality of these representations in a series of experiments whose results suggest that the proposed model can capture data concepts and discover data interrelations.

Paper Nr: 98
Title:

Exploring Slow Feature Analysis for Extracting Generative Latent Factors

Authors:

Max Menne, Merlin Schüler and Laurenz Wiskott

Abstract: In this work, we explore generative models based on temporally coherent representations. For this, we incorporate Slow Feature Analysis (SFA) into the encoder of a typical autoencoder architecture. We show that the latent factors extracted by SFA, while allowing for meaningful reconstruction, also result in a well-structured, continuous and complete latent space – favorable properties for generative tasks. To complete the generative model for single samples, we demonstrate the construction of suitable prior distributions based on inherent characteristics of slow features. The efficacy of this method is illustrated on a variant of the Moving MNIST dataset with increased number of generation parameters. By the use of a forecasting model in latent space, we find that the learned representations are also suitable for the generation of image sequences.

Short Papers
Paper Nr: 2
Title:

State Tracking in the Presence of Heavy-tailed Observations

Authors:

Yaman Kindap

Abstract: In this paper, we define a state-space model with discrete latent states and a multivariate heavy-tailed observation density for applications in tracking the state of a system with observations including extreme deviations from the median. We use a Gaussian distribution with an unknown variance parameter which has a Gamma distribution prior depending on the state of the system to model the observation density. The key contribution of the paper is the theoretical formulation of such a state-space model which makes use of scale mixtures of Gaussians to yield an exact inference method. We derive the framework for estimation of the states and how to estimate the parameters of the model. We demonstrate the performance of the model on synthetically generated data sets.

Paper Nr: 8
Title:

Active Output Selection Strategies for Multiple Learning Regression Models

Authors:

Adrian Prochaska, Julien Pillas and Bernard Bäker

Abstract: Active learning shows promise to decrease test bench time for model-based drivability calibration. This paper presents a new strategy for active output selection, which suits the needs of calibration tasks. The strategy actively learns multiple outputs in the same input space. It chooses the output model with the highest cross-validation error as the leading model. The presented method is applied to three different toy examples with noise in a real-world range and to a benchmark dataset. The results are analyzed and compared to other existing strategies. In a best-case scenario, the presented strategy is able to decrease the number of points by up to 30 % compared to a sequential space-filling design while outperforming other existing active learning strategies. The results are promising but also show that the algorithm has to be improved to increase robustness in noisy environments. Further research will focus on improving the algorithm and applying it to a real-world example.

Paper Nr: 9
Title:

Continuous Driver Activity Recognition from Short Isolated Action Sequences

Authors:

Patrick Weyers and Anton Kummert

Abstract: Advanced driver monitoring systems significantly increase safety by detecting driver drowsiness or distraction. Knowing the driver’s current state or actions allows for adaptive warning strategies or prediction of the driver’s response time to take back the control of a semi-autonomous vehicle. We present an online driver monitoring system for detecting characteristic actions and states inside a car interior by analysing the full driver seat region. With the proposed training method, a recurrent neural network for online sequence analysis is capable of learning from isolated action sequences only. The proposed method allows training of a recurrent neural network from snippets of actions, while this network can be applied to continuous video streams at runtime. With a mean average precision of 0.77, we reach better classification results on our test data than commonly used methods.

Paper Nr: 12
Title:

Task Specific Image Enhancement for Improving the Accuracy of CNNs

Authors:

Norbert Mitschke, Yunou Ji and Michael Heizmann

Abstract: Choosing an appropriate pre-processing and image enhancement step for CNNs can have a positive effect on performance. In contrast to augmentation, pre-processing and image enhancement are applied deterministically to every image of a data set and can be interpreted as a normalizing way to construct invariant features. In this paper we present a method that determines the optimal composition and strength of various image enhancement methods by means of a neural network with a new type of layer that learns the parameters of optimal image enhancement. We apply this procedure to different image classification data sets, which leads to an improvement of the information content of the images with respect to the specific task and thus also to an improvement of the resulting test accuracy. For example, we can clearly reduce the classification error for our benchmark data sets.

Paper Nr: 15
Title:

Improving the Grid-based Clustering by Identifying Cluster Center Nodes and Boundary Nodes Adaptively

Authors:

Yaru Li, Yue Xi and Yonggang Lu

Abstract: Clustering analysis is a data analysis technique that divides data objects into different clusters according to the similarity between them. Density-based clustering methods can identify clusters with arbitrary shapes, but their time complexity can become very high as the number and the dimension of the data points increase. Grid-based clustering methods are usually used to deal with this problem. However, the performance of these grid-based methods is often affected by the identification of the cluster center and boundary based on global thresholds. Therefore, in this paper, an adaptive grid-based clustering method is proposed, in which the definition of cluster center nodes and boundary nodes is based on relative density values between data points, without using a global threshold. First, the new definitions of the cluster center nodes and boundary nodes are given, and then the clustering results are obtained by an initial clustering process and a merging process of the ordered grid nodes according to the density values. Experiments on several synthetic and real-world datasets show the superiority of the proposed method.

Paper Nr: 20
Title:

Revisiting the Deformable Convolution by Visualization

Authors:

Yuqi Zhang, Yuyang Xie, Linfeng Luo and Fengming Cao

Abstract: The deformable convolution improves performance by a large margin across various tasks in computer vision. However, the detailed analysis of the deformable convolution has attracted less attention than its application. To strengthen the understanding of the deformable convolution, the offset fields of the deformable convolution in object detectors are visualized with the proposed visualization methods. After projecting the offset fields to the feature map coordinates, we find that the displacement condenses the features of each object to the object center and that it learns to segment objects even without segmentation annotations. Meanwhile, projecting the offset fields to the kernel coordinates demonstrates that the displacement inside each kernel is able to predict the size of the object on it. The two findings indicate that the offset field learns to predict the location and the size of the object, which are crucial in understanding the image. The visualization in this work explicitly shows the power of the deformable convolution by decoding the information in the offset fields. The ablation studies of the two projections of the offset fields reveal that the projection in the kernel viewpoint contributes the most in current object detectors.

Paper Nr: 21
Title:

Interpreting Convolutional Networks Trained on Textual Data

Authors:

Reza Marzban and Christopher Crick

Abstract: There have been many advances in the artificial intelligence field due to the emergence of deep learning. In almost all sub-fields, artificial neural networks have reached or exceeded human-level performance. However, most of the models are not interpretable. As a result, it is hard to trust their decisions, especially in life and death scenarios. In recent years, there has been a movement toward creating explainable artificial intelligence, but most work to date has concentrated on image processing models, as it is easier for humans to perceive visual patterns. There has been little work in other fields like natural language processing. In this paper, we train a convolutional model on textual data and analyze the global logic of the model by studying its filter values. In the end, we find the most important words in our corpus to our model’s logic and remove the rest (95%). New models trained on just the 5% most important words can achieve the same performance as the original model while reducing training time by more than half. Approaches such as this will help us to understand NLP models, explain their decisions according to their word choices, and improve them by finding blind spots and biases.

Paper Nr: 29
Title:

Synergy Conformal Prediction for Regression

Authors:

Niharika Gauraha and Ola Spjuth

Abstract: Large and distributed data sets pose many challenges for machine learning, including requirements on computational resources and training time. One approach is to train multiple models in parallel on subsets of data and aggregate the resulting predictions. Large data sets can then be partitioned into smaller chunks, and for distributed data sets the need for pooling can be avoided. Combining results from conformal predictors using synergy rules has been shown to have advantageous properties for classification problems. In this paper we extend the methodology to regression problems, and we show that it produces valid and efficient predictors compared to inductive conformal predictors and cross-conformal predictors for 10 different data sets from the UCI machine learning repository using three different machine learning methods. The approach offers a straightforward and compelling alternative to pooling data, such as when working in distributed environments.
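
Illustrative sketch (not taken from the paper): the inductive conformal predictor used as a baseline above can be written in a few lines for regression with absolute-residual nonconformity scores. The synergy aggregation rule itself is not shown, since the abstract does not specify it. The function name and the choice of absolute residuals are assumptions.

```python
import math


def icp_interval(cal_true, cal_pred, new_pred, alpha=0.1):
    """Split (inductive) conformal prediction interval for one new prediction.

    cal_true / cal_pred: targets and model predictions on a held-out
    calibration set. alpha: target miscoverage rate.
    """
    scores = sorted(abs(y - p) for y, p in zip(cal_true, cal_pred))
    n = len(scores)
    # Finite-sample corrected rank of the calibration score to use (1-based).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: only the trivial interval is valid.
        return (-math.inf, math.inf)
    q = scores[rank - 1]
    return (new_pred - q, new_pred + q)
```

Under exchangeability of calibration and test data, intervals built this way cover the true value with probability at least 1 - alpha, which is the validity property the abstract refers to.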

Paper Nr: 37
Title:

Lifting Sequence Length Limitations of NLP Models using Autoencoders

Authors:

Reza Marzban and Christopher Crick

Abstract: Natural Language Processing (NLP) is an important subfield within Machine Learning, and various deep learning architectures and preprocessing techniques have led to many improvements. Long short-term memory (LSTM) is the most well-known architecture for time series and textual data. Recently, models like Bidirectional Encoder Representations from Transformers (BERT), which rely on pre-training with unsupervised data and using transfer learning, have made a huge impact on NLP. All of these models work well on short to average-length texts, but they are all limited in the sequence lengths they can accept. In this paper, we propose inserting an encoder in front of each model to overcome this limitation. If the data contains long texts, doing so substantially improves classification accuracy (by around 15% in our experiments). Otherwise, if the corpus consists of short texts which existing models can handle, the presence of the encoder does not hurt performance. Our encoder can be applied to any type of model that deals with textual data, and it will empower the model to overcome length limitations.

Paper Nr: 43
Title:

Reduced Precision Strategies for Deep Learning: A High Energy Physics Generative Adversarial Network Use Case

Authors:

Florian Rehm, Sofia Vallecorsa, Vikram Saletore, Hans Pabst, Adel Chaibi, Valeriu Codreanu, Kerstin Borras and Dirk Krücker

Abstract: Deep learning is finding its way into high energy physics by replacing traditional Monte Carlo simulations. However, deep learning still requires an excessive amount of computational resources. A promising approach to make deep learning more efficient is to quantize the parameters of the neural networks to reduced precision. Reduced precision computing is extensively used in modern deep learning and results in lower inference time, a smaller memory footprint and lower memory bandwidth requirements. In this paper we analyse the effects of low precision inference on a complex deep generative adversarial network model. The use case which we are addressing is calorimeter detector simulations of subatomic particle interactions in accelerator based high energy physics. We employ the novel Intel low precision optimization tool (iLoT) for quantization and compare the results to the quantized model from TensorFlow Lite. In the performance benchmark we gain a speed-up of 1.73x on Intel hardware for the quantized iLoT model compared to the initial, not quantized, model. With different physics-inspired self-developed metrics, we validate that the quantized iLoT model shows a lower loss of physical accuracy in comparison to the TensorFlow Lite model.

Paper Nr: 45
Title:

Few-Shot Class Incremental Learning with Generative Feature Replay

Authors:

Abhilash R. Shankarampeta and Koichiro Yamauchi

Abstract: Humans can effortlessly learn novel concepts from only a few examples and learn additional tasks without forgetting previous ones. Making machines learn incrementally from only a few instances is very challenging due to catastrophic forgetting between new and previously learned tasks; this can be solved by generative image replay. However, image generation with only a few examples is a challenging task. In this work, we propose a feature replay approach instead of image replay for few-shot learning scenarios. A feature extractor with feature distillation is combined with feature replay at the classifier level to tackle catastrophic forgetting.

Paper Nr: 47
Title:

Seam Carving for Image Classification Privacy

Authors:

James Pope and Mark Terwilliger

Abstract: The advent of storing images on cloud platforms has introduced serious privacy concerns. The images are routinely scanned by machine learning algorithms to determine the contents. Usually the scanning is for marketing purposes but more malevolent purposes include criminal activity and government surveillance. The images are automatically analysed by machine learning algorithms. Notably, deep convolutional neural networks perform very well at identifying image classes. Obviously, the images could be encrypted before storing to cloud platforms and then decrypted after downloading. This would certainly obfuscate the images. However, many users prefer to be able to peruse the images on the cloud platform. This creates a difficult problem in which users prefer images stored in a way so that a human can understand them but machine learning algorithms cannot. This paper proposes a novel technique, termed seam doppelganger, for formatting images using seam carving to identify seams for replacement. The approach degrades typical image classification performance in order to provide privacy while leaving the image human-understandable. Furthermore, the technique can be largely reversed providing a reasonable facsimile of the original image. Using the ImageNet database for birds, we show how the approach degrades a state-of-the-art residual network (ResNet50) for various amounts of seam replacements.
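
Illustrative sketch (not the authors' seam doppelganger technique): the seam-identification step of classic seam carving can be written as a dynamic program over a per-pixel energy map. How the paper replaces the seams it finds is not described in the abstract and is not shown here. The function name and the use of a precomputed energy grid are assumptions.

```python
def min_vertical_seam(energy):
    """Return the column index, per row, of the minimum-energy vertical seam.

    energy: list of rows of non-negative numbers (e.g. gradient magnitudes).
    A seam moves at most one column left or right between consecutive rows.
    """
    rows, cols = len(energy), len(energy[0])
    cost = [row[:] for row in energy]
    # Forward pass: cheapest cumulative cost of reaching each pixel from the top.
    for r in range(1, rows):
        for c in range(cols):
            lo, hi = max(c - 1, 0), min(c + 1, cols - 1)
            cost[r][c] += min(cost[r - 1][lo:hi + 1])
    # Backward pass: start at the cheapest bottom pixel and walk upwards.
    c = min(range(cols), key=lambda j: cost[-1][j])
    seam = [c]
    for r in range(rows - 1, 0, -1):
        lo, hi = max(c - 1, 0), min(c + 1, cols - 1)
        c = min(range(lo, hi + 1), key=lambda j: cost[r - 1][j])
        seam.append(c)
    seam.reverse()
    return seam
```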

Paper Nr: 68
Title:

Locating Datacenter Link Faults with a Directed Graph Convolutional Neural Network

Authors:

Michael P. Kenning, Jingjing Deng, Michael Edwards and Xianghua Xie

Abstract: Datacenters, alongside many other domains, are well represented by directed graphs, and there are many datacenter problems where deeply learned graph models may prove advantageous. Yet few applications of graph-based convolutional neural networks (GCNNs) to datacenters exist. Few of the GCNNs in the literature are explicitly designed for directed graphs, partly owing to the relative dearth of GCNNs designed specifically for directed graphs. We therefore present a convolutional operation for directed graphs, which we apply to learning to locate the faulty links in datacenters. Moreover, since the detection problem would be phrased as link-wise classification, we propose constructing a directed linegraph, where the problem is instead phrased as a vertex-wise classification. We find that our model detects more link faults than the comparison models, as measured by McNemar’s test, and outperforms the comparison models in respect of the F1-score, precision and recall.

Paper Nr: 80
Title:

Symbolic Translation of Time Series using Piecewise N-gram Similarity Voting

Authors:

Siegfried Delannoy, Émilie P. Caillault, André Bigand and Kevin Rousseeuw

Abstract: This paper studies a way to discriminate user behaviour based on the pages viewed in a web application. The technique is based on similarity measure selection and time sequence splitting. Using temporal splitting techniques, the proposed similarity measures greatly improve the accuracy of the results. We applied them to several datasets from the well-known UCR Archive, and our research focuses on a private dataset (ORIENTOI) and a public one called UCR-CBF. Some of the proposed temporal tricks appear to make the similarity measures effective in the presence of noise. They also make it possible to deal with repeating terms, which is a drawback of most similarity measures. The similarity measures are thus shown to reach the state of the art on UCR datasets. We also successfully evaluated the proposed technique on our private (ORIENTOI) dataset. Finally, we discuss the weaknesses of our method and ways to improve it.

Paper Nr: 87
Title:

Discrete Wavelet based Features for PCG Signal Classification using Hidden Markov Models

Authors:

Rima Touahria, Abdenour Hacine-Gharbi and Philippe Ravier

Abstract: This paper proposes the use of several features based on the Discrete Wavelet Transform as novel descriptors for classifying normal or abnormal phonocardiogram (PCG) signals using Hidden Markov Models (HMM). The feature extraction of the first descriptor, called “DWE”, consists in converting each PCG signal into a sequence of feature vectors. Each vector is composed of the energy of the wavelet coefficients computed at each decomposition level from an analysis window. The second descriptor, “LWE”, consists in applying the logarithm to the DWE features, while the third descriptor, “WCC”, applies the DCT to the LWE feature vector. This work aims to find the relevant descriptor using the PCG classification rate as criterion. This is achieved by implementing a standard classification system using the HMM classifier combined with the MFCC feature descriptor. Each class is modeled by an HMM associated with a GMM. Several experiments are carried out to find the best configuration of the HMM models and to select the optimal mother wavelet with its optimal decomposition level. The results obtained from a comparative study show that the LWE descriptor using Daubechies wavelets of order 2 at level 7 gives the highest classification rate, with a more compact feature representation than the MFCC descriptor.
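
Illustrative sketch (not the authors' implementation): the DWE and LWE descriptors described above can be pictured as per-level wavelet energies of one analysis window, followed by a logarithm. For brevity this sketch substitutes the Haar wavelet for the Daubechies wavelets used in the paper, takes the energy as the sum of squared detail coefficients, and omits the DCT step of WCC. These choices, the function names, and the small constant inside the logarithm are all assumptions.

```python
import math


def haar_step(x):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return approx, detail


def wavelet_energy_features(window, levels):
    """Return (DWE, LWE) for one analysis window.

    DWE[k]: energy of the detail coefficients at decomposition level k + 1.
    LWE[k]: logarithm of DWE[k].
    The window length should be at least 2 ** levels.
    """
    approx, dwe = list(window), []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        dwe.append(sum(d * d for d in detail))
    # Small constant (assumed) avoids log(0) for levels with zero energy.
    lwe = [math.log(e + 1e-12) for e in dwe]
    return dwe, lwe
```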
Download
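
The DWE, LWE and WCC descriptors described in the abstract above can be sketched as follows. This is a minimal illustration only, not the authors' implementation: it uses the Haar wavelet instead of the Daubechies wavelet of order 2 to stay dependency-free, it assumes the window length is divisible by 2^levels, and it assumes the final approximation energy is appended to the detail energies. The function names are hypothetical.

```python
import math


def haar_step(x):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return approx, detail


def dwe(window, levels):
    """Energy of the detail coefficients per level, then the final approximation energy.

    Appending the approximation energy is an assumption of this sketch.
    """
    energies, approx = [], list(window)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        energies.append(sum(d * d for d in detail))
    energies.append(sum(a * a for a in approx))
    return energies


def lwe(window, levels, eps=1e-12):
    """Logarithm of the DWE features (eps avoids log(0) on silent windows)."""
    return [math.log(e + eps) for e in dwe(window, levels)]


def wcc(window, levels):
    """Unnormalized DCT-II of the LWE feature vector."""
    v = lwe(window, levels)
    n = len(v)
    return [
        sum(v[j] * math.cos(math.pi * k * (2 * j + 1) / (2 * n)) for j in range(n))
        for k in range(n)
    ]
```

Each analysis window of a PCG signal would yield one such vector, and the sequence of vectors would then be passed to the classifier.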

Paper Nr: 90
Title:

New Maximum Similarity Method for Object Identification in Photon Counting Imaging

Authors:

V. E. Antsiperov

Abstract: The paper discusses a new approach to the recognition / identification of test objects according to their intensity shape in images registered by photon counting detectors. The main problem analyzed within the framework of the proposed approach is the identification decision (inference), based on a registered set of discrete photocounts (photons), regarding the similarity of the shape of the object's intensity in the image to the shape of previously observed objects (precedents). It is shown that when the intensity shape is approximated by a mixture of Gaussian components within this approach, a recurrent identification algorithm can be synthesized, similar to the well-known K-means clustering algorithm in machine (statistical) learning.
Download

Paper Nr: 91
Title:

SIGRNN: Synthetic Minority Instances Generation in Imbalanced Datasets using a Recurrent Neural Network

Authors:

Reda Al-Bahrani, Dipendra Jha, Qiao Kang, Sunwoo Lee, Zijiang Yang, Wei-Keng Liao, Ankit Agrawal and Alok Choudhary

Abstract: Machine learning models trained on imbalanced datasets tend to produce sub-optimal results. This happens because the learning of the minority classes is dominated by the learning of the majority class. Recommendations to overcome this obstacle include oversampling the minority class by synthesizing new instances and using different performance measures. We propose a novel approach to handle the imbalance in datasets by using a sequence-to-sequence recurrent neural network to synthesize minority class instances. The generative neural network is trained on the minority class instances to learn their data distribution; it is then used to synthesize minority class instances, which are used to augment the original dataset and balance the minority class. We evaluate our proposed approach on several imbalanced datasets. We train Decision Tree models on the original and augmented datasets and compare their results against the Synthetic Minority Over-sampling TEchnique (SMOTE), Adaptive Synthetic sampling (ADASYN) and Synthetic Minority Over-sampling TEchnique-Nominal Continuous (SMOTE-NC). All results are averages of multiple runs and are compared across four different performance metrics. SIGRNN performs well compared to SMOTE and ADASYN, specifically for lower percentage increments to the minority class. Also, SIGRNN outperforms SMOTE-NC on datasets having nominal features.
Download
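
For context, the SMOTE baseline named in the abstract above synthesizes minority instances by interpolating between a minority point and one of its nearest minority neighbours. The sketch below illustrates that baseline idea only; it is not the SIGRNN method, and the function name, parameters and defaults are hypothetical.

```python
import random


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by SMOTE-style interpolation.

    minority: list of numeric feature vectors (at least two points).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to the base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic
```

Because every synthetic point lies on a segment between two existing minority points, this kind of interpolation only suits continuous features, which is why a separate variant (SMOTE-NC) exists for nominal ones.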

Paper Nr: 99
Title:

On the Improvement of Feature Selection Techniques: The Fitness Filter

Authors:

Artur J. Ferreira and Mário T. Figueiredo

Abstract: The need for feature selection (FS) techniques is central in many machine learning and pattern recognition problems. FS is a vast research field, and many FS techniques have therefore been proposed in the literature and applied in the context of quite different problems. Some of these FS techniques follow the relevance-redundancy (RR) framework to select the best subset of features. In this paper, we propose a supervised filter FS technique, named the fitness filter, that follows the RR framework and uses data discretization. This technique can be used directly on low- or medium-dimensional data, or it can be applied as a post-processing technique to other FS techniques. Specifically, when used as a post-processing technique, it further reduces the dimensionality of the feature space found by common FS techniques and often improves the classification accuracy.
Download

Paper Nr: 7
Title:

Indextron

Authors:

Alexei Mikhailov and Mikhail Karavay

Abstract: How can pattern recognition be done without artificial neural networks, Bayesian classifiers, support vector machines and other mechanisms that are widely used for machine learning? The problem with pattern recognition machines is their time- and energy-demanding training, because many coefficients need to be worked out. The paper introduces an indexing model that performs training by memorizing inverse patterns, mostly avoiding any calculations. The computational experiments indicate the potential of the indexing model for artificial intelligence applications and, possibly, its relevance to neurobiological studies as well.
Download

Paper Nr: 10
Title:

Empirical Evaluation on Utilizing CNN-features for Seismic Patch Classification

Authors:

Chun-Xia Zhang, Xiao-Li Wei and Sang-Woon Kim

Abstract: This paper empirically evaluates two kinds of features, extracted with neural networks and with traditional statistical methods respectively, to improve the performance of seismic patch image classification. Convolutional neural networks (CNNs) are now the state-of-the-art approach for many applications in various fields, including computer vision and pattern recognition. In relation to feature extraction, it turns out that generic feature descriptors extracted from CNNs, named CNN-features, are very powerful. It is also well known that combining CNN-features with traditional (non)linear classifiers improves classification performance. In this paper, this classification scheme was applied to seismic patch classification. CNN-features were acquired first and then used to train SVMs. Experiments using synthetic and real-world seismic patch data demonstrated some improvement in classification performance, as expected. To find out why the classification performance improved when using CNN-features, the data complexities of traditional feature extraction techniques such as PCA and of the CNN-features were measured and compared. From this comparison, we confirmed that the discriminative power of the CNN-features is the strongest. In particular, the use of transfer learning techniques to obtain the CNN architectures used to extract the CNN-features greatly reduced the extraction time without sacrificing the discriminative power of the extracted features.
Download

Paper Nr: 28
Title:

An ALPR System-based Deep Networks for the Detection and Recognition

Authors:

Mouad Bensouilah, Mohamed N. Zennir and Mokhtar Taffar

Abstract: Automatic license plate reading (ALPR), from images or videos, is a research topic that is still relevant in the field of computer vision. In this article, we propose a new dataset and a robust ALPR system based on the YOLO object detector from the literature. The trained Convolutional Neural Networks (CNN) allow us to extract features from license plates and label them through Recurrent Neural Networks (RNN) specialized in character recognition. The RNNs use GRU units instead of the LSTM units that are generally used in the literature. The experimental results were conclusive, reaching a recognition rate of 92%.
Download

Paper Nr: 31
Title:

Automated Machine Learning for Wind Farms Location

Authors:

Olivier Parisot and Thomas Tamisier

Abstract: Automated Machine Learning aims at preparing effective Machine Learning models with little or no data science expertise. Tedious tasks like preprocessing, algorithm selection and hyper-parameter optimization are automated: end-users just have to apply and deploy the model that best suits the real-world problem. In this paper, we experiment with Automated Machine Learning to leverage open data sources for predicting potential future wind farm locations in Luxembourg, France, Belgium and Germany.
Download

Paper Nr: 40
Title:

Optimization of Image Embeddings for Few Shot Learning

Authors:

Arvind Srinivasan, Aprameya Bharadwaj, Manasa Sathyan and S. Natarajan

Abstract: In this paper, we improve the image embeddings generated in the graph neural network solution for few-shot learning. We propose alternative architectures for existing networks such as Inception-Net, U-Net, Attention U-Net, and Squeeze-Net to generate embeddings and increase the accuracy of the models. We improve the quality of the embeddings at the cost of the time taken to generate them. The proposed implementations outperform the existing state-of-the-art methods for 1-shot and 5-shot learning on the Omniglot dataset. The experiments involved a testing set and a training set which had no classes in common. The results for 5-way and 10-way/20-way tests have been tabulated.
Download

Paper Nr: 42
Title:

A Comparison of Few-shot Classification of Human Movement Trajectories

Authors:

Lisa Gutzeit

Abstract: In the active research area of human action recognition, many different approaches to classifying behavior have been proposed and evaluated. However, evaluations of movement recognition with a limited number of training examples, also known as few-shot classification, are rare. In many applications, the generation of labeled training data is expensive. Manual effort can be reduced if algorithms are used which give reliable results on small datasets. In this paper, three recognition methods are compared on gesture and stick-throwing movements of different complexity, performed individually without detailed instructions, in experiments in which the number of examples used for training is limited. Movements were recorded with marker-based motion capture systems. Three classification algorithms, the Hidden Markov Model, the Long Short-Term Memory network and k-Nearest Neighbor, are compared on their performance in recognizing these arm movements. The methods are evaluated regarding accuracy with limited training data, computation time and generalization to different subjects. The best results regarding training with a small number of examples and generalization are achieved with LSTM classification. The shortest calculation times are observed with k-NN classification, which also shows very good classification accuracy on data of low complexity.
Download

Paper Nr: 49
Title:

Domain Shift in Capsule Networks

Authors:

Rajath S., Sumukh A. K. and S. Natarajan

Abstract: Capsule Networks are an exciting deep learning architecture which overcomes some of the shortcomings of Convolutional Neural Networks (CNNs). Capsule networks aim to capture spatial relationships between parts of an object and exhibit viewpoint invariance. In practical computer vision, the training data distribution is different from the test distribution, and this covariate shift affects the performance of the model. This problem is called Domain Shift. In this paper, we analyze how well capsule networks adapt to new domains by experimenting with multiple routing algorithms and comparing them with CNNs.
Download

Paper Nr: 50
Title:

Web based Object Annotation Tool using a Triplet-ReID Sorting Approach

Authors:

Afonso Costa, André L. Ferreira and João M. Fernandes

Abstract: The robustness of object detection methods has seen increasing attention, which leads to a desire for more control over the training and testing phases. In practice, labelling the unique objects present in a dataset can help. However, manually labelling datasets of considerable size can be impractical. This paper describes an approach to improve the labelling information of a dataset by supporting an object re-identification task. The primary objective is to find repeated objects in the dataset. The proposed solution relies on a web-based application that allows the user to choose which of the similar objects returned by the Triplet-ReID method are in fact the same as the query object. The effectiveness of the method was tested on a dataset with considerable object variability. Experimental results show a viable sorting performance associated with a considerable speed improvement when compared to a traditional labelling approach. In fact, a dataset with 55 unique objects in a total of 1098 images would take 18 hours with a traditional tool and 12 hours with the proposed one. Moreover, given the generic architecture of the developed framework, it can be applied to a wide range of use cases.
Download

Paper Nr: 51
Title:

Surface EMG Signal Classification for Parkinson’s Disease using WCC Descriptor and ANN Classifier

Authors:

Hichem Bengacemi, Abdenour Hacine-Gharbi, Philippe Ravier, Karim Abed-Meraim and Olivier Buttelli

Abstract: To increase diagnostic accuracy, artificial intelligence techniques can be used as a medical support. Electromyography (EMG) signals are used in the evaluation of neuromuscular dysfunction. The aim of this paper is to construct an automatic system for identifying neuromuscular dysfunction in the case of Parkinson's disease based on surface EMG (sEMG) signals. Our proposed system uses an artificial neural network (ANN) to discriminate healthy EMG signals (normal) from abnormal EMG signals (Parkinson). After detecting the EMG activity regions using the Fine Modified Adaptive Linear Energy Detector (FM-ALED) method, the Discrete Wavelet Transform (DWT) is used for feature extraction. An experimental analysis is carried out on the ECOTECH project dataset, using accuracy (Acc) as the principal metric. Moreover, a multi-class neural network classification system combined with the voting rule and the Wavelet Cepstral Coefficient (WCC) has been developed for identifying healthy and Parkinsonian subjects. The diagnosis accuracy is assessed by conducting various experiments on surface EMG signals. The proposed methodology leads to a classification accuracy of 100%.
Download

Paper Nr: 55
Title:

Predicting Malware Attacks using Machine Learning and AutoAI

Authors:

Mark Sokolov and Nic Herndon

Abstract: Machine learning is one of the fastest-growing fields and its application to cybersecurity is increasing. In order to protect people from malicious attacks, several machine learning algorithms have been used to predict them. In addition, with the increase of malware threats in our world, a lot of companies use AutoAI to help protect their systems. However, when a dataset is large and sparse, conventional machine learning algorithms and AutoAI don’t generate the best results. In this paper, we propose an Ensemble of Light Gradient Boosted Machines to predict malware attacks on computing systems. We use a dataset provided by Microsoft to show that this proposed method achieves an increase in accuracy over AutoAI.
Download

Paper Nr: 58
Title:

Knowledge Acquisition on Team Management Aimed at Automation with Use of the System of Organizational Terms

Authors:

Olaf Flak

Abstract: The aim of the paper is to present a new approach to knowledge acquisition on team management based on an original methodological concept called the system of organizational terms. The topic of knowledge acquisition on team management is important because of a lack of development in managerial work automation in recent years. The scientific problem is how to acquire knowledge on team management in a holistic, coherent and formalized way, and how to represent team management in order to automate it. Both aspects of this scientific problem are described in this paper. The paper presents, on the one hand, the common perspective found in management studies and, on the other hand, the original perspective of the system of organizational terms. The paper also contains a short description of a solution to this scientific problem and examples of previous research verifying the system of organizational terms as a method of knowledge acquisition on team management and of team management representation aimed at automating this area of human life.
Download

Paper Nr: 78
Title:

Opinion Mining using TRC Techniques

Authors:

Nirach Romyen, Sureeporn Nualnim, Maleerat Maliyaem, Pudsadee Boonrawd, Kanchana Viriyapant and Tongpool Heeptaisong

Abstract: Sentiment analysis is a recent research field in Natural Language Processing (NLP). Text mining and computational techniques determine the sentiment discovered from text. This paper proposes a sentiment analysis method using the Text-Representing Centroid (TRC). TRC is a method that determines the minimum average distance to all words of the respective document; it also uses a co-occurrence graph to represent existing relationships among terms in customers' reviews of particular products and services. A corpus of 800 randomly selected hotel reviews from the TripAdvisor website is used to evaluate performance by comparing the TRC method with experts' judgment of the reviews. The results show 75% accuracy on Thai customers' reviews.
Download
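
The centroid idea in the abstract above can be illustrated with a small co-occurrence graph. The sketch below is a simplified reading, not the authors' implementation: it assumes unweighted edges (one hop per co-occurrence link) and picks the term with the minimum average hop distance to a document's words. All function names are hypothetical.

```python
from collections import deque


def build_cooccurrence_graph(sentences):
    """Link every pair of distinct words that appear in the same sentence."""
    graph = {}
    for words in sentences:
        unique = set(words)
        for w in unique:
            graph.setdefault(w, set()).update(unique - {w})
    return graph


def hop_distances(graph, start):
    """Breadth-first search: number of hops from start to every reachable term."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def centroid_term(graph, document_words):
    """Term with the minimum average hop distance to all document words."""
    best, best_avg = None, float("inf")
    for candidate in sorted(graph):
        dist = hop_distances(graph, candidate)
        if all(w in dist for w in document_words):
            avg = sum(dist[w] for w in document_words) / len(document_words)
            if avg < best_avg:
                best, best_avg = candidate, avg
    return best, best_avg
```

For example, with reviews tokenized as `[["hotel", "clean"], ["hotel", "staff"], ["hotel", "pool"]]`, the term closest on average to `clean`, `staff` and `pool` is `hotel`.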

Paper Nr: 92
Title:

A Spatial-temporal Graph based Hybrid Infectious Disease Model with Application to COVID-19

Authors:

Yunling Zheng, Zhijian Li, Jack Xin and Guofa Zhou

Abstract: As the COVID-19 pandemic evolves, reliable prediction plays an important role in policymaking. The classical infectious disease model SEIR (susceptible-exposed-infectious-recovered) is a compact yet simplistic temporal model. Data-driven machine learning models such as RNNs (recurrent neural networks) can suffer when time series data are limited, as is the case for COVID-19. In this paper, we combine SEIR and RNN on a graph structure to develop a hybrid spatio-temporal model that achieves both accuracy and efficiency in training and forecasting. We introduce two features on the graph structure: a node feature (local temporal infection trend) and an edge feature (geographic neighbor effect). For the node feature, we derive a discrete recursion (called the I-equation) from SEIR so that the gradient descent method applies readily to its optimization. For the edge feature, we design an RNN model to capture the neighboring effect and regularize the landscape of the loss function so that local minima are effective and robust for prediction. The resulting hybrid model (called IeRNN) improves the prediction accuracy on state-level COVID-19 new case data from the US, outperforming standard temporal models (RNN, SEIR, and ARIMA) in 1-day and 7-day ahead forecasting. Our model accommodates various degrees of reopening and provides potential outcomes for policymakers.
Download
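
For readers unfamiliar with the SEIR model named in the abstract above, a forward-Euler discretization of the classical equations can be sketched as follows. This is the textbook model only, not the paper's I-equation or the IeRNN hybrid; the parameter names `beta`, `sigma` and `gamma` follow common usage and are assumptions here.

```python
def seir_step(s, e, i, r, beta, sigma, gamma, dt=1.0):
    """One forward-Euler step of the classical SEIR model.

    beta: transmission rate, sigma: incubation rate, gamma: recovery rate.
    """
    n = s + e + i + r
    new_exposed = beta * s * i / n * dt
    new_infectious = sigma * e * dt
    new_recovered = gamma * i * dt
    return (
        s - new_exposed,
        e + new_exposed - new_infectious,
        i + new_infectious - new_recovered,
        r + new_recovered,
    )


def simulate(s, e, i, r, beta, sigma, gamma, days):
    """Return the list of (S, E, I, R) states for day 0 .. days."""
    history = [(s, e, i, r)]
    for _ in range(days):
        s, e, i, r = seir_step(s, e, i, r, beta, sigma, gamma)
        history.append((s, e, i, r))
    return history
```

The four compartments only exchange population with each other, so their sum stays constant at every step, which is a quick sanity check for any implementation.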

Area 2 - Applications

Full Papers
Paper Nr: 34
Title:

Movement Control with Vehicle-to-Vehicle Communication by using End-to-End Deep Learning for Autonomous Driving

Authors:

Zelin Zhang and Jun Ohya

Abstract: In recent years, autonomous driving through deep learning has gained more and more attention. This paper proposes a novel Vehicle-to-Vehicle (V2V) communication based autonomous vehicle driving system that takes advantage of both spatial and temporal information. The proposed system consists of a novel combination of CNN layers and LSTM layers for controlling steering angle and speed by taking advantage of the information from both the autonomous vehicle and the cooperative vehicle. The CNN layers process the input sequential image frames, and the LSTM layers process historical data to predict the steering angle and speed of the autonomous vehicle. To confirm the validity of the proposed system, we conducted experiments evaluating the MSE of the steering angle and vehicle speed using the Udacity dataset. The experimental results are summarized as follows. (1) Driving “with a cooperative car” works significantly better than “without”. (2) Among all the networks, Res-Net performs best. (3) Using the LSTM with Res-Net, which processes the historical motion data, performs better than “no LSTM”. (4) For the number of input sequential frames, eight frames work best. (5) For the distance between the autonomous host vehicle and the cooperative vehicle, ten to forty meters gives robust results for autonomous driving movement control.
Download

Paper Nr: 35
Title:

Flexcoder: Practical Program Synthesis with Flexible Input Lengths and Expressive Lambda Functions

Authors:

Bálint Gyarmathy, Bálint Mucsányi, Ádám Czapp, Dávid Szilágyi and Balázs Pintér

Abstract: We introduce a flexible program synthesis model to predict function compositions that transform given inputs to given outputs. We process input lists in a sequential manner, allowing our system to generalize to a wide range of input lengths. We separate the operator and the operand in the lambda functions to achieve significantly wider parameter ranges compared to previous works. The evaluations show that this approach is competitive with state-of-the-art systems while it’s much more flexible in terms of the input length, the lambda functions, and the integer range of the inputs and outputs. We believe that this flexibility is an important step towards solving real-world problems with example-based program synthesis.
Download

Paper Nr: 38
Title:

Single Stage Class Agnostic Common Object Detection: A Simple Baseline

Authors:

Chuong H. Nguyen, Thuy C. Nguyen, Anh H. Vo and Yamazaki Masayuki

Abstract: This paper addresses the problem of common object detection, which aims to detect objects of similar categories from a set of images. Although it shares some similarities with standard object detection and co-segmentation, common object detection, recently promoted by (Jiang et al., 2019), has some unique advantages and challenges. First, it is designed to work in both closed-set and open-set conditions, i.e., with known and unknown objects. Second, it must be able to match objects of the same category without being restricted to the same instance, texture, or posture. Third, it can distinguish multiple objects. In this work, we introduce the Single Stage Common Object Detection (SSCOD) to detect class-agnostic common objects from an image set. The proposed method is built upon the standard single-stage object detector. Furthermore, an embedded branch is introduced to generate each object’s representation feature, and the similarity between features is measured by cosine distance. Experiments are conducted on the PASCAL VOC 2007 and COCO 2014 datasets. While being simple and flexible, our proposed SSCOD built upon ATSSNet performs significantly better than the baseline of standard object detection, while still being able to match objects of unknown categories. Our source code can be found at (URL).
Download
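
The cosine-based matching of object embeddings mentioned in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the SSCOD code: the embeddings are plain lists of floats, and the similarity threshold of 0.8 and the function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_common_objects(emb_a, emb_b, threshold=0.8):
    """Index pairs (i, j) of detections in two images whose embeddings are similar."""
    return [
        (i, j)
        for i, u in enumerate(emb_a)
        for j, v in enumerate(emb_b)
        if cosine_similarity(u, v) >= threshold
    ]
```

Because the comparison uses only the embeddings and never a class label, the same matching step applies to objects of categories that were not seen during training.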

Paper Nr: 41
Title:

TrajNet: An Efficient and Effective Neural Network for Vehicle Trajectory Classification

Authors:

Jiyong Oh, Kil-Taek Lim and Yun-Su Chung

Abstract: Vehicle trajectory classification plays an important role in intelligent transportation systems because it can be used for traffic flow estimation at intersections and for detecting anomalies such as traffic accidents and violations of traffic regulations. In this paper, we propose a new neural network architecture for vehicle trajectory classification by modifying the PointNet architecture, which was proposed for point cloud classification and semantic segmentation. The modifications are derived from an analysis of the differences between the properties of vehicle trajectories and point clouds. We call the modified network TrajNet. Experiments on three public datasets demonstrate that TrajNet classifies vehicle trajectories faster and slightly more accurately than the conventional networks used in previous studies.
Download

Paper Nr: 46
Title:

Video Camera Identification from Sensor Pattern Noise with a Constrained ConvNet

Authors:

Derrick Timmerman, Guru S. Bennabhaktula, Enrique Alegre and George Azzopardi

Abstract: The identification of source cameras from videos, though it is a highly relevant forensic analysis topic, has been studied much less than its counterpart that uses images. In this work we propose a method to identify the source camera of a video based on camera specific noise patterns that we extract from video frames. For the extraction of noise pattern features, we propose an extended version of a constrained convolutional layer capable of processing color inputs. Our system is designed to classify individual video frames which are in turn combined by a majority vote to identify the source camera. We evaluated this approach on the benchmark VISION data set consisting of 1539 videos from 28 different cameras. To the best of our knowledge, this is the first work that addresses the challenge of video camera identification on a device level. The experiments show that our approach is very promising, achieving up to 93.1% accuracy while being robust to the WhatsApp and YouTube compression techniques. This work is part of the EU-funded project 4NSEEK focused on forensics against child sexual abuse.
Download
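
The frame-level voting step described in the abstract above can be sketched as follows. It assumes that a per-frame classifier has already produced one camera label per video frame; the function name and the example labels are hypothetical.

```python
from collections import Counter


def identify_camera(frame_predictions):
    """Return the camera label predicted for the largest number of frames.

    On a tie, the label that reached the top count first in the input wins.
    """
    if not frame_predictions:
        raise ValueError("at least one frame prediction is required")
    counts = Counter(frame_predictions)
    return counts.most_common(1)[0][0]
```

Voting over many frames makes the video-level decision tolerant of individual misclassified frames, provided that most frames are classified correctly.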

Paper Nr: 54
Title:

Weakly Supervised Gleason Grading of Prostate Cancer Slides using Graph Neural Network

Authors:

Nan Jiang, Yaqing Hou, Dongsheng Zhou, Pengfei Wang, Jianxin Zhang and Qiang Zhang

Abstract: Gleason grading of histopathology slides has been the “gold standard” for diagnosis, treatment and prognosis of prostate cancer. For the heterogeneous Gleason score 7, patients with Gleason score 3+4 and 4+3 show a statistically significant difference in cancer recurrence and survival outcomes. Considering that patients with Gleason score 7 account for up to 40% of all prostate cancers diagnosed, the question of choosing an appropriate treatment and management strategy for these patients is of utmost importance. In this paper, we present a Graph Neural Network (GNN) based weakly supervised framework for the classification of Gleason score 7. First, we construct the slides as graphs to capture both local relations among patches and global topological information of the whole slides. Then GNN based models are trained for the classification of heterogeneous Gleason score 7. According to the results, our approach obtains the best performance among existing works, with an accuracy of 79.5% on the TCGA dataset. The experimental results thus demonstrate the significance of our proposed method in performing the Gleason grading task.
Download

Paper Nr: 60
Title:

A Blended Attention-CTC Network Architecture for Amharic Text-image Recognition

Authors:

Birhanu H. Belay, Tewodros Habtegebrial, Marcus Liwicki, Gebeyehu Belay and Didier Stricker

Abstract: In this paper, we propose a blended Attention-Connectionist Temporal Classification (CTC) network architecture for text-image recognition of a unique script, Amharic. Amharic is an indigenous Ethiopic script that uses 34 consonant characters, each with 7 vowel variants, and 50 labialized characters which are derived, with a small change, from the 34 consonant characters. The change involves modifying the structure of these characters by adding a straight line, or shortening and/or elongating one of its main legs, including the addition of small diacritics to the right, left, top or bottom of the character. Such a small change affects the orthographic identity of a character and results in shape similarity among characters, which makes recognition an interesting but challenging task for OCR research. Motivated by the recent success of the attention mechanism in neural machine translation tasks, we propose an attention-based CTC approach which is designed by blending the attention mechanism directly within the CTC network. The proposed model consists of an encoder module, an attention module and a transcription module in a unified framework. The efficacy of the proposed model on the Amharic language shows that the attention mechanism allows learning powerful representations by integrating information from different time steps. Our method outperforms state-of-the-art methods and achieves character error rates of 1.04% and 0.93% on the ADOCR test datasets.
Download

Paper Nr: 72
Title:

Multimodal Sentiment Analysis on Video Streams using Lightweight Deep Neural Networks

Authors:

Atitaya Yakaew, Matthew N. Dailey and Teeradaj Racharak

Abstract: Real-time sentiment analysis on video streams involves classifying a subject’s emotional expressions over time based on visual and/or audio information in the data stream. Sentiment can be analyzed using various modalities such as speech, mouth motion, and facial expression. This paper proposes a deep learning approach based on multiple modalities in which extracted features of an audiovisual data stream are fused in real time for sentiment classification. The proposed system comprises four small deep neural network models that analyze visual features and audio features concurrently. We fuse the visual and audio sentiment features into a single stream and accumulate evidence over time using an exponentially-weighted moving average to make a final prediction. Our work provides a promising solution to the problem of building real-time sentiment analysis systems that have constrained software or hardware capabilities. Experiments on the Ryerson audio-video database of emotional speech (RAVDESS) show that deep audiovisual feature fusion yields substantial improvements over analysis of either single modality. We obtain an accuracy of 90.74%, which is better than baselines of 11.11% – 31.48% on a challenging test dataset.
Download
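
The evidence accumulation described in the abstract above can be sketched with an exponentially-weighted moving average over fused per-frame scores. This is a minimal sketch, not the authors' system: it assumes each modality emits a list of class probabilities per time step, fuses them by a weighted average, and uses hypothetical parameter names `w_visual` and `alpha`.

```python
def fuse(visual_probs, audio_probs, w_visual=0.5):
    """Weighted average of the two modality scores (fusion rule is an assumption)."""
    return [w_visual * v + (1.0 - w_visual) * a for v, a in zip(visual_probs, audio_probs)]


def ewma_predict(stream, alpha=0.3):
    """Accumulate fused scores over time and return (predicted class index, final state).

    stream: non-empty iterable of (visual_probs, audio_probs) pairs, one per time step.
    """
    state = None
    for visual, audio in stream:
        fused = fuse(visual, audio)
        if state is None:
            state = fused
        else:
            state = [alpha * f + (1.0 - alpha) * s for f, s in zip(fused, state)]
    if state is None:
        raise ValueError("stream must contain at least one time step")
    label = max(range(len(state)), key=state.__getitem__)
    return label, state
```

A smaller `alpha` gives more weight to past evidence, so a single noisy frame changes the running prediction less.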

Paper Nr: 75
Title:

Active Region Detection in Multi-spectral Solar Images

Authors:

Majedaldein Almahasneh, Adeline Paiement, Xianghua Xie and Jean Aboudarham

Abstract: Precisely detecting solar Active Regions (AR) from multi-spectral images is a challenging yet important task for understanding solar activity and its influence on space weather. A main challenge comes from each modality capturing a different location of these 3D objects, as opposed to more traditional multi-spectral imaging scenarios where all image bands observe the same scene. We present a multi-task deep learning framework that exploits the dependencies between image bands to produce 3D AR detection where different image bands (and physical locations) each have their own set of results. We compare our detection method against baseline approaches for solar image analysis (multi-channel coronal hole detection, SPOCA for ARs (Verbeeck et al., 2013)) and a state-of-the-art deep learning method (Faster RCNN) and show enhanced performance in detecting ARs jointly from multiple bands.
Download

Short Papers
Paper Nr: 5
Title:

Estimating the Probability Density Function of New Fabrics for Fabric Anomaly Detection

Authors:

Oliver Rippel, Maximilian Müller, Andreas Münkel, Thomas Gries and Dorit Merhof

Abstract: Image-based quality control aims at detecting anomalies (i.e. defects) in products. Supervised, data driven approaches have greatly improved Anomaly Detection (AD) performance, but suffer from a major drawback: they require large amounts of annotated training data, limiting their economic viability. In this work, we challenge and overcome this limitation for complex patterned fabrics. Investigating the structure of deep feature representations learned on a large-scale fabric dataset, we find that fabrics form clusters according to their fabric type, whereas anomalies form a cluster on their own. We leverage this clustering behavior to estimate the Probability Density Function (PDF) of new, previously unseen fabrics, in the deep feature representations directly. Using this approach, we outperform supervised and semi-supervised AD approaches trained on new fabrics, requiring only defect-free data for PDF-estimation.
Download

Paper Nr: 26
Title:

Data Fusion of Histological and Immunohistochemical Image Data for Breast Cancer Diagnostics using Transfer Learning

Authors:

Pranita Pradhan, Katharina Köhler, Shuxia Guo, Olga Rosin, Jürgen Popp, Axel Niendorf and Thomas W. Bocklitz

Abstract: A combination of histological and immunohistochemical tissue features can offer better breast cancer diagnosis than histological tissue features alone. However, manual identification of histological and immunohistochemical tissue features for cancerous and healthy tissue requires an enormous human effort, which delays the breast cancer diagnosis. In this paper, breast cancer detection was performed using the fusion of histological (H&E) and immunohistochemical (PR, ER, Her2 and Ki-67) imaging data based on deep convolutional neural networks (DCNN). DCNNs, including the VGG network, the residual network and the inception network, were comparatively studied. The three DCNNs were trained using two transfer learning strategies. In transfer learning strategy 1, a pre-trained DCNN was used to extract features from the images of the five stain types. In transfer learning strategy 2, the images of the five stain types were used as inputs to a pre-trained multi-input DCNN, and the last layer of the multi-input DCNN was optimized. The results showed that data fusion of H&E and IHC imaging data could increase the mean sensitivity by at least 2%, depending on the DCNN model and the transfer learning strategy. Specifically, the pre-trained inception and residual networks with transfer learning strategy 1 achieved the best breast cancer detection.
Download

Paper Nr: 30
Title:

Enhancing Phase Mapping for High-throughput X-ray Diffraction Experiments using Fuzzy Clustering

Authors:

Dipendra Jha, K. V. Narayanachari, Ruifeng Zhang, Denis T. Keane, Wei-keng Liao, Alok Choudhary, Yip-Wah Chung, Michael J. Bedzyk and Ankit Agrawal

Abstract: X-ray diffraction (XRD) is a widely used experiment in materials science to understand the composition-structure-property relationships of materials for designing and discovering new materials. A key aspect of XRD analysis is that the composition-phase diagram is composed of not only pure phases but also their mixed phases. Hard clustering approaches treat the mixed phases as independent clusters, separate from their constituent pure phases, resulting in incorrect phase diagrams that complicate the next steps. Here, we present a novel clustering approach for XRD patterns that leverages a fuzzy clustering technique, which can significantly enhance phase mapping and reduce the manual effort involved in XRD analysis. The proposed approach first generates an initial composition-phase diagram and initial pure phase representations by applying the fuzzy c-means clustering algorithm, followed by hierarchical clustering that makes it easy to manually merge similar initial pure phases into the final composition-phase diagram. The proposed method is evaluated on the XRD samples from two high-throughput composition-spread experiments of Co-Ni-Ta and Co-Ti-Ta ternary alloy systems. Our results demonstrate significant improvement compared to hard clustering and almost completely eliminate manual effort.
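
As an illustration of why fuzzy clustering suits mixed phases, the following is a minimal sketch of the standard fuzzy c-means update rules in plain Python. It is not the authors' implementation: the function names, the toy 2-D points, the initial centers and the fuzzifier m = 2 are all assumptions made for this example, and the hierarchical merging step mentioned in the abstract is not shown.

```python
import math


def _memberships(points, centers, exponent):
    """Soft membership of every point in every cluster (each row sums to 1)."""
    rows = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # A point sitting exactly on a center belongs fully to that center.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            rows.append([h / sum(hits) for h in hits])
        else:
            rows.append(
                [1.0 / sum((dj / dk) ** exponent for dk in dists) for dj in dists]
            )
    return rows


def fuzzy_c_means(points, initial_centers, m=2.0, iterations=50):
    """Standard fuzzy c-means with fuzzifier m > 1 (hypothetical helper)."""
    centers = [list(c) for c in initial_centers]
    exponent = 2.0 / (m - 1.0)
    dims = len(points[0])
    for _ in range(iterations):
        u = _memberships(points, centers, exponent)
        for j in range(len(centers)):
            weights = [row[j] ** m for row in u]
            total = sum(weights)
            if total > 0:
                # New center: membership-weighted mean of all points.
                centers[j] = [
                    sum(w * p[d] for w, p in zip(weights, points)) / total
                    for d in range(dims)
                ]
    return centers, _memberships(points, centers, exponent)


# Toy data (assumed): two "pure" groups and one "mixed" sample halfway between.
points = [(0.0, 0.0), (0.1, 0.0), (0.9, 0.0), (1.0, 0.0), (0.5, 0.0)]
centers, u = fuzzy_c_means(points, initial_centers=[(0.0, 0.0), (1.0, 0.0)])
```

In this toy setting the halfway sample receives roughly equal membership in both clusters instead of being forced into a cluster of its own, which is the behaviour the abstract contrasts with hard clustering.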

Paper Nr: 44
Title:

Deep Learning based Model Identification System Exploits the Modular Structure of a Bio-inspired Posture Control Model for Humans and Humanoids

Authors:

Vittorio Lippi

Abstract: This work presents a system identification procedure based on Convolutional Neural Networks (CNN) for human posture control using the DEC (Disturbance Estimation and Compensation) parametric model. The modular structure of the proposed control model inspired the design of a modular identification procedure, in the sense that the same neural network is used to identify the parameters of the modules controlling different degrees of freedom. In this way the presented examples of body sway induced by external stimuli provide several training samples at once.

Paper Nr: 61
Title:

Automated Detection of COVID-19 from CT Scans using Convolutional Neural Networks

Authors:

Rohit Lokwani, Ashrika Gaikwad, Viraj Kulkarni, Anirudha Pant and Amit Kharat

Abstract: COVID-19 is an infectious disease that causes respiratory problems similar to those caused by SARS-CoV (2003). In this paper, we propose a prospective screening tool wherein we use chest CT scans to diagnose patients with COVID-19 pneumonia. We use a set of open-source images, available as individual CT slices, and full CT scans from a private Indian hospital to train our model. We build a 2D segmentation model using the U-Net architecture, which gives the output by marking out the region of infection. Our model achieves a sensitivity of 0.96 (95% CI: 0.88-1.00) and a specificity of 0.88 (95% CI: 0.82-0.94). Additionally, we derive a logic for converting our slice-level predictions to scan-level, which helps us reduce the false positives.

Paper Nr: 63
Title:

The Importance of Models in Data Analysis with Small Human Movement Datasets: Inspirations from Neurorobotics Applied to Posture Control of Humanoids and Humans

Authors:

Vittorio Lippi, Christoph Maurer and Thomas Mergner

Abstract: Machine learning has shown impressive improvements recently, thanks especially to the results shown in deep learning applications. Besides important advancements in the theory, such improvements have been associated with an increase in the complexity of the models used (i.e. the numbers of neurons and connections in neural networks). Bigger models are possible provided that the amount of data used in the training process is increased accordingly. In medical applications, however, the size of datasets is often limited by the availability of human subjects and the effort required to perform human experiments. This position paper proposes the integration of bioinspired models with machine learning.

Paper Nr: 69
Title:

Applying Automated Machine Learning to Improve Budget Estimates for a Naval Fleet Maintenance Facility

Authors:

Cheryl Eisler and Mikayla Holmes

Abstract: A study was undertaken to improve the accuracy of staffing overtime budget predictions for a naval fleet maintenance facility and identify primary factors associated with overtime accrual. A series of models based on facility work orders were developed using the R statistical suite and the open source package H2O.ai for automated machine learning. Along with the model's predictive capabilities for budgetary planning, primary work order attributes associated with overtime hours were also determined based on the variables of importance. These gave insight into the type of maintenance and personnel assigned to the maintenance task which contributed to the highest accrual of overtime hours. Additionally, the monthly best curve fit for past budget predictions revealed a sigmoidal relationship, which was used to assist in the prediction of fiscal year 2019/2020 budget. The budget estimate from the model was found to be within 5% of the total budget expended hours over the fiscal year. As new annual data are provided or additional facilities examined, the models can be retrained or rebuilt to include new information and allow decision makers to prepare more accurate funding estimates – potentially reserving funds for upcoming critical maintenance tasks or saving funds through alternative approaches to task management.

Paper Nr: 71
Title:

Individual Action and Group Activity Recognition in Soccer Videos from a Static Panoramic Camera

Authors:

Beerend Gerats, Henri Bouma, Wouter Uijens, Gwenn Englebienne and Luuk Spreeuwers

Abstract: Data and statistics are key to soccer analytics and have important roles in player evaluation and fan engagement. Automatic recognition of soccer events - such as passes and corners - would ease the data gathering process, potentially opening up the market for soccer analytics at non-professional clubs. Existing approaches extract events on group level only and rely on television broadcasts or recordings from multiple camera viewpoints. We propose a novel method for the recognition of individual actions and group activities in panoramic videos from a single viewpoint. Three key contributions in the proposed method are (1) player snippets as model input, (2) independent extraction of spatio-temporal features per player, and (3) feature contextualisation using zero-padding and feature suppression in graph attention networks. Our method classifies video samples into eight action and eleven activity types, and reaches accuracies above 75% for ten of these classes.

Paper Nr: 73
Title:

Time-First Tracking: An Efficient Multiple-Object Tracking Architecture for Dynamic Surveillance Environments

Authors:

Joachim Lohn-Jaramillo, Khari-Elijah Jarrett, Laura Ray, Richard Granger and Elijah Bowen

Abstract: Given the countless hours of video that are generated in surveillance environments, real-time multi-object tracking (MOT) is vastly insufficient. Current MOT methods prioritize tracking accuracy in crowded environments, with little concern for total computational expense, which has led to a reliance on expensive object detectors to perform tracking. Indiscriminate use of object detectors is not scalable for surveillance problems and ignores the inherent spatio-temporal variation in scene complexity in many real-world environments. A novel MOT method is proposed, termed “Time-First Tracking”, which relies on “shallowly” processed motion with a new tracking method, leaving the use of expensive object detection methods to an “as-needed” basis. The resulting vast reduction in pixels processed may yield orders of magnitude in cost savings, making MOT more tractable. Time-First Tracking is adaptable to spatio-temporal changes in tracking difficulty; videos are divided into spatio-temporal sub-volumes, rated with different tracking difficulties, that are subsequently processed with different object localization methods. New MOT metrics are proposed to account for cost, along with code to create a synthetic MOT dataset for motion-based tracking.

Paper Nr: 89
Title:

A High Accuracy Text Detection Model of Newly Constructing and Training Strategies

Authors:

Kha C. Nguyen and Ryosuke Odate

Abstract: Normally, text recognition systems include two main parts: text detection and text recognition. Text detection is a prerequisite and has a big impact on the performance of text recognition. In this paper, we propose a high-accuracy model for detecting text-lines on a receipt dataset. We focus on the three most important points to improve the performance of the model: anchor boxes for locating text regions, backbone networks to extract features, and a suppression method to select the best fitting bounding box for each text region. Specifically, we propose a clustering method to determine anchor boxes and apply novel convolutional neural networks for feature extraction. These two points are the new construction strategies of the model. In addition, we propose a training strategy that makes the model output the angles of text-lines, then revise bounding boxes with the angles before applying the suppression method. This strategy serves to detect skewed and downward/upward curved text-lines. Our model outperforms the best models submitted to the ICDAR 2019 competition with a detection rate of 98.87% (F1 score), so that we can trust the model for detecting text-lines automatically. These strategies are also flexible enough to apply to other datasets of various domains.

Paper Nr: 16
Title:

Garment Detection in Catwalk Videos

Authors:

Qi Dang, Heydar M. Afkham and Oskar Juhlin

Abstract: Most computer vision applications in the commercial scene lack a large-scale and properly annotated dataset. The solution to these applications relies on already published code and knowledge transfer from existing computer vision datasets. In most cases, these applications sacrifice proper benchmarking of the solution and rely on the performance reported for the methods used in their respective papers. In this paper, we focus on how we can use the existing code base and datasets in computer vision to address a hypothetical application of detecting garments in catwalk videos. We propose a combination of methods that allows us to localize garments in complex scenery by only training models on public datasets. To understand which method performs best for our application, we have designed a relative-benchmark framework that requires very little manual annotation to work.

Paper Nr: 19
Title:

Object Tracking using Correction Filter Method with Adaptive Feature Selection

Authors:

Xiang Zhang, Yonggang Lu and Jiani Liu

Abstract: Correlation filter based tracking algorithms have shown favourable performance in recent years. Nonetheless, the fixed feature selection and potential model drift limit their effectiveness. In this paper, we propose a novel adaptive feature selection based tracking method which keeps the strong discriminating ability of the correlation filter. The proposed method can automatically select either the HOG feature or the color feature for tracking based on the confidence scores of the features in each frame. Firstly, the response maps of the color features and the HOG features are extracted separately using the correlation filter. The Lab color space, which separates the luminance from the color, is used to extract the color features. Secondly, the confidence region and the possible location of the target are estimated using the average peak-to-correlation energy. Thirdly, three criteria are used to select the proper feature for the current frame to perform tracking adaptively. The experimental results demonstrate that the proposed tracker outperforms several state-of-the-art algorithms on the OTB benchmark datasets.
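
The abstract does not spell out the confidence measure, but the average peak-to-correlation energy (APCE) is commonly defined in the correlation filter tracking literature as |F_max - F_min|^2 divided by the mean of (F - F_min)^2 over the response map. A minimal sketch under that assumption follows; the function name and the toy response maps are hypothetical, and the paper's three selection criteria are not reproduced here.

```python
def apce(response):
    """Average peak-to-correlation energy of a 2-D response map (list of rows).

    Uses the commonly cited definition
        APCE = |F_max - F_min|^2 / mean((F - F_min)^2),
    which is assumed here rather than taken from the paper itself.
    """
    values = [v for row in response for v in row]
    f_max, f_min = max(values), min(values)
    energy = sum((v - f_min) ** 2 for v in values) / len(values)
    # A completely flat map carries no localisation information.
    return (f_max - f_min) ** 2 / energy if energy > 0 else 0.0


# Toy response maps (assumed): one sharp peak versus several competing peaks.
sharp = [[0.0, 0.0], [0.0, 1.0]]
ambiguous = [[0.0, 0.8], [0.9, 1.0]]
sharp_score = apce(sharp)
ambiguous_score = apce(ambiguous)
```

A single sharp peak yields a higher score than a map with several competing peaks, which is why the measure can serve as a per-frame confidence for each feature type.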

Paper Nr: 23
Title:

Upgraded W-Net with Attention Gates and Its Application in Unsupervised 3D Liver Segmentation

Authors:

Dhanunjaya Mitta, Soumick Chatterjee, Oliver Speck and Andreas Nürnberger

Abstract: Segmentation of biomedical images can help radiologists make a better diagnosis and take decisions faster by helping in the detection of abnormalities, such as tumors. Manual or semi-automated segmentation, however, can be a time-consuming task. Most deep learning based automated segmentation methods are supervised and rely on manually segmented ground-truth. A possible solution for the problem would be an unsupervised deep learning based approach for automated segmentation, which this research work tries to address. We used a W-Net architecture and modified it so that it can be applied to 3D volumes. In addition, to suppress noise in the segmentation, we added attention gates to the skip connections. The loss for the segmentation output was calculated using soft N-Cuts and for the reconstruction output using SSIM. Conditional Random Fields were used as a post-processing step to fine-tune the results. The proposed method has shown promising results, with a dice coefficient of 0.88 for the liver segmentation compared against manual segmentation.
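
For readers unfamiliar with the reported metric, the Dice coefficient between a predicted and a reference binary mask is 2|A ∩ B| / (|A| + |B|). The sketch below computes it on flattened 0/1 label sequences; the function name, the toy masks and the convention for two empty masks are assumptions for illustration, not the authors' evaluation code.

```python
def dice_coefficient(pred, truth):
    """Dice overlap between two equally long sequences of 0/1 voxel labels."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Convention assumed here: two empty masks count as a perfect match.
    return 2.0 * intersection / total if total else 1.0


# Toy flattened masks (assumed): two foreground voxels each, one shared.
predicted = [1, 1, 0, 0]
reference = [1, 0, 1, 0]
score = dice_coefficient(predicted, reference)
```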

Paper Nr: 33
Title:

An Empirical Study on Machine Learning Models for Potato Leaf Disease Classification using RGB Images

Authors:

Soma Ghosh, Renu Rameshan and Dileep A. D.

Abstract: In this work, an empirical study is conducted on classification models built using RGB images of potato leaves. A series of experiments is carried out by training a convolutional neural network (CNN) and a support vector machine (SVM) using images captured in laboratory and field conditions, as well as processed samples of images captured in the field. A salient region based segmentation algorithm is devised to generate a processed version of the images captured in the field, and it performs well with respect to the manually segmented ground truth of the dataset. Severe inconsistencies are observed in the experimental results, particularly when the train and test samples of the models are similar images captured under different environmental conditions. Following the analysis of the obtained results, we come up with a set of clear directions for creating an image dataset that can lead to reliable classification accuracy.

Paper Nr: 36
Title:

Outlier Detection in Network Traffic Monitoring

Authors:

Marcin Michalak, Łukasz Wawrowski, Marek Sikora, Rafał Kurianowicz, Artur Kozłowski and Andrzej Białas

Abstract: Network traffic monitoring becomes, year by year, an increasingly important branch of network infrastructure maintenance. There exist many dedicated tools for on-line network traffic monitoring that can defend against the typical (and known) types of attacks by blocking some parts of the traffic immediately. However, there may occur some yet unknown risks in network traffic whose statistical description should be reflected as slow-in-time changing characteristics. Such non-rapidly changing variable values are probably not detectable by on-line tools. Still, it is possible to detect these changes with data mining methods. In the paper, popular anomaly detection methods combined with the moving window procedure are presented as one of the approaches for anomaly (outlier) detection in network traffic monitoring. The paper presents results obtained on real outer traffic data collected in the Institute.
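
The abstract does not name the anomaly detection methods it combines with the moving window. As a stand-in, the sketch below applies a simple z-score rule to a trailing window; the function name, the window length, the threshold and the toy traffic counts are all assumptions chosen only to show the moving window mechanics.

```python
from statistics import mean, stdev


def moving_window_outliers(series, window, threshold=3.0):
    """Flag indices whose value deviates strongly from the preceding window.

    The z-score rule is an assumed stand-in detector, not the paper's method.
    """
    flagged = []
    for i in range(window, len(series)):
        reference = series[i - window:i]  # trailing window, excludes series[i]
        mu, sigma = mean(reference), stdev(reference)
        if sigma > 0 and abs(series[i] - mu) > threshold * sigma:
            flagged.append(i)
    return flagged


# Toy traffic counts per interval (assumed), with one burst at index 8.
traffic = [100, 102, 98, 101, 99, 103, 97, 100, 250, 101, 99]
outliers = moving_window_outliers(traffic, window=5)
```

Because the reference statistics are recomputed for every position, the notion of "normal" follows slow drifts in the traffic, while a burst inside the window temporarily inflates the spread and masks nearby points.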

Paper Nr: 39
Title:

Quantitative Method for Evaluating the Coordination between Sprinting Motions using Joint Coordinates Obtained from the Videos and Cross-correlations

Authors:

Masato Sabanai, Chanjin Seo, Hiroyuki Ogata and Jun Ohya

Abstract: This paper proposes a method for quantitatively evaluating sprinting motions using the videos of runners. Specifically, this paper explores the coordination between physical motions, which has been recognized as very important in sprinting. After detecting and normalizing the joint coordinates from sprinting videos, the cross-correlations of two windowed time-series data are calculated using the windowing cross-correlation function, and the coordination between the motions of the two joints is quantified. Experiments that use 20 subjects are conducted. As a result of classifying the cross-correlation obtained from the subjects’ data into two clusters using k-means clustering, conditions in which the obtained cluster includes a high percentage of inexperienced sprinters are found. To verify whether the motions corresponding to these conditions are valid as the evaluation criterion of sprinting, Spearman’s rank correlation coefficients between cross-correlations and 30-m time records are calculated. The results show a weak correlation with respect to the coordination between the elbow and knee motions. Therefore, it can be said that the cross-correlation corresponding to the coordination can be used as a quantitative criterion in sprinting.
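
A windowed cross-correlation can be sketched as a Pearson correlation computed between a window of one joint's time series and a lagged window of another's. The code below is a plain Python illustration under that assumption; the function names, window length, lag range and the synthetic sine signals are hypothetical and do not come from the paper.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equally long sequences (0.0 if one is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def windowed_cross_correlation(x, y, window, max_lag, step=1):
    """Return (start, lag, r) for each window of x against the lagged window of y."""
    n = min(len(x), len(y))
    results = []
    for start in range(0, n - window + 1, step):
        x_win = x[start:start + window]
        for lag in range(-max_lag, max_lag + 1):
            y_start = start + lag
            if y_start < 0 or y_start + window > n:
                continue  # lagged window would fall outside the recording
            y_win = y[y_start:y_start + window]
            results.append((start, lag, pearson(x_win, y_win)))
    return results


# Synthetic joint trajectories (assumed): y repeats x with a delay of 3 frames.
x = [math.sin(0.3 * i) for i in range(60)]
y = [math.sin(0.3 * (i - 3)) for i in range(60)]
results = windowed_cross_correlation(x, y, window=20, max_lag=5)
best = max((r for r in results if r[0] == 10), key=lambda r: r[2])
```

For the window starting at frame 10, the strongest correlation appears at a lag of 3 frames, matching the delay built into the synthetic signals.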

Paper Nr: 48
Title:

Common Topic Identification in Online Maltese News Portal Comments

Authors:

Samuel Zammit, Fiona Sammut and David Suda

Abstract: This paper aims to identify common topics in a dataset of online news portal comments made between April 2008 and January 2017 on the Times of Malta website. By making use of the FastText algorithm, Word2Vec is used to obtain word embeddings for each unique word in the dataset. Furthermore, document vectors are also obtained for each comment, where again similar comments are assigned similar representations. The resulting word and document embeddings are also clustered using k-means clustering to identify common topic clusters. The results obtained indicate that the majority of comments follow a political theme related either to party politics, foreign politics, corruption, issues of an ideological nature, or other issues. Comments related to themes such as sports, arts and culture were not common, except around years with major events. Additionally, a number of topics were identified as being more prevalent during some time periods rather than others. These include the Maltese divorce referendum in 2011, the Maltese citizenship scheme in 2013, Russia’s annexation of Crimea in 2014, Brexit in 2015 and corruption/Panama Papers in 2016.

Paper Nr: 56
Title:

Data Scarcity: Methods to Improve the Quality of Text Classification

Authors:

Ingo Glaser, Shabnam Sadegharmaki, Basil Komboz and Florian Matthes

Abstract: Legal document analysis is an important research area. The classification of clauses or sentences enables valuable insights such as the extraction of rights and obligations. However, datasets consisting of contracts or other legal documents are quite rare, particularly regarding the German language. The exorbitant cost of manually labeled data, especially in regard to text classification, is the motivation of many studies that suggest alternative methods to overcome the lack of labeled data. This paper investigates experimentally the effects of text data augmentation on the quality of classification tasks. While a large number of techniques exist, this work examines a selected subset including semi-supervised learning methods and thesaurus-based data augmentation. We show not only that thesaurus-based data augmentation as well as text augmentation with synonyms and hypernyms can improve the classification results, but also that the effect of such methods depends on the underlying data structure.

Paper Nr: 62
Title:

Automatic Diagnosis of COPD in Lung CT Images based on Multi-View DCNN

Authors:

Yin Bao, Yasseen Al Makady and Sasan Mahmoodi

Abstract: Chronic obstructive pulmonary disease (COPD) has long been one of the leading causes of morbidity and mortality worldwide. Numerous studies have shown that CT image analysis is an effective way to diagnose patients with COPD. Automatic diagnosis of CT images using computer vision will shorten the time needed to confirm COPD in a patient, which enables patients to receive timely treatment. CT images are three-dimensional data. The extraction of 3D texture features is the core of the classification problem. However, the classification accuracy of current computer vision models is still not high when extracting these features. Therefore, computer vision assisted diagnosis has not been widely used. In this paper, we propose MV-DCNN, a multi-view deep neural network based on 15 directions. The experimental results show that, compared with state-of-the-art methods, this method significantly improves the accuracy of COPD classification, reaching an accuracy of 97.7%. The model proposed here can be used in medical institutions for the diagnosis of COPD.

Paper Nr: 77
Title:

Object Detection and Text Recognition in Large-scale Technical Drawings

Authors:

Trang M. Nguyen, Long Van Pham, Chien C. Nguyen and Vinh Van Nguyen

Abstract: In this digital transformation era, the demand for automatic pattern extraction from printed materials has never been higher, making it one of the most prominent problems nowadays. In this paper, we propose a new method for pattern recognition in highly complex technical drawings. Our method is a pipeline system that includes two phases: (1) detecting the objects that contain the patterns of interest, with improvements to the processing of large-scale images, and (2) performing character recognition on the objects if they are text patterns, with improvements to the post-processing task. Our experiments on nearly five thousand real technical drawings show promising results and the capability to reduce manual labeling effort to a great extent.

Paper Nr: 81
Title:

Graph Convolution Networks for Cell Segmentation

Authors:

Sachin Bahade, Michael Edwards and Xianghua Xie

Abstract: Graph signal processing is an emerging field in deep learning, aiming to solve various non-Euclidean domain problems. Pathologists have difficulty detecting diseases at an early stage due to the limitations of clinical methods and image analysis. For more accurate diagnosis of disease and early detection, automated segmentation can play a vital role. However, the efficiency and accuracy of the system depend on how the model learns. We have found that traditional machine-learning methods, such as clustering and thresholding, are unsuited for precise cell segmentation. Furthermore, the recent development of deep-learning techniques has demonstrated promising results, especially for medical images. In this paper, we propose two graph-based convolution methods for cell segmentation to improve the analysis of immunostained slides. Our proposed methods use advanced deep-learning, spectral-, and spatial-based graph signal processing approaches to learn features. We have compared our results with the state-of-the-art fully convolutional network (FCN) method and found a significant improvement in pixel-based accuracy of 2.2% for the spectral-based approach and 3.94% for the spatial-based approach.

Paper Nr: 84
Title:

Experimental Application of a Japanese Historical Document Image Synthesis Method to Text Line Segmentation

Authors:

Naoto Inuzuka and Tetsuya Suzuki

Abstract: We plan to use a text line segmentation method based on machine learning in our transcription support system for handwritten Japanese historical documents in Kana, and are searching for a method to synthesize annotated document images, because manually annotating a large set of document images as training data for machine learning is time consuming. In this paper, we report our synthesis method for annotated document images, designed for a Japanese historical document. To compare manually annotated Japanese historical document images with annotated document images synthesized by the method as training data for the object detection algorithm YOLOv3, we conducted text line segmentation experiments using the object detection algorithm. The experimental results show that a model trained on the synthetic annotated document images is competitive with one trained on the manually annotated document images in terms of the intersection-over-union metric.
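
The intersection-over-union (IoU) metric used for the comparison is the area shared by a predicted and a reference box divided by the area of their union. A minimal sketch for axis-aligned boxes follows; the function name, the (x_min, y_min, x_max, y_max) box layout and the toy boxes are assumptions for illustration and are not taken from the paper's evaluation code.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    # Overlap extent along each axis, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    intersection = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - intersection
    return intersection / union if union > 0 else 0.0


# Toy text-line boxes (assumed): the prediction is shifted by one unit in x and y.
predicted_line = (1, 1, 3, 3)
annotated_line = (0, 0, 2, 2)
overlap = iou(predicted_line, annotated_line)
```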