ICPRAM 2020 Abstracts


Area 1 - Theory and Methods

Full Papers
Paper Nr: 7
Title:

A Hierarchical Convolution Neural Network Scheme for Radar Pulse Detection

Authors:

Van L. Do, Ha K. Nguyen, Dat T. Ngo and Ha Q. Nguyen

Abstract: The detection of radar pulses plays a critical role in passive radar systems since it provides inputs for other algorithms to localize and identify emitting targets. In this paper, we propose a hierarchical convolution neural network (CNN) to detect narrowband radar pulses of various waveforms and pulse widths at different noise levels. The scheme, named DeepIQ, takes a fixed-length segment of raw IQ samples as inputs and estimates the time of arrival (TOA) and the time of departure (TOD) of the radar pulse, if any, appearing in the segment. The estimated TOAs and TODs are then combined across segments to form a sequential detection mechanism. The DeepIQ scheme consists of sub-networks performing three different tasks: segment classification, denoising and edge detection. The proposed scheme is a full deep learning-based solution and thus, does not require any noise floor estimation process, as opposed to the commonly used Threshold-based Edge Detection (TED) methods. Simulation results show that the proposed solution significantly outperforms other schemes, especially under severe noise levels.

Paper Nr: 30
Title:

Self-Training using Selection Network for Semi-supervised Learning

Authors:

Jisoo Jeong, Seungeui Lee and Nojun Kwak

Abstract: Semi-supervised learning (SSL) exploits a large amount of unlabeled data to improve performance when labeled data are limited. Most conventional SSL methods assume that the classes of the unlabeled data are included in the set of classes of the labeled data. In addition, these methods do not sort out useless unlabeled samples and use all the unlabeled data for learning, which is not suitable for realistic situations. In this paper, we propose an SSL method called selective self-training (SST), which selectively decides whether to include each unlabeled sample in the training process. It is designed to be applied to a more realistic situation where the classes of the unlabeled data differ from those of the labeled data. For the conventional SSL problems, in which the labeled and unlabeled samples share the same class categories, the proposed method not only performs comparably to other conventional SSL algorithms but can also be combined with other SSL algorithms. While the conventional methods cannot be applied to the new SSL problems, our method does not show any performance degradation even if the classes of the unlabeled data are different from those of the labeled data.

Paper Nr: 32
Title:

Variational Inference of Dirichlet Process Mixture using Stochastic Gradient Ascent

Authors:

Kart-Leong Lim

Abstract: The variational inference of Bayesian mixture models such as the Dirichlet process mixture is not scalable to very large datasets, since learning requires a pass over the entire dataset at each iteration. Recently, scalable versions, notably stochastic variational inference, have addressed this issue by performing local learning on a minibatch randomly sampled from the full dataset at each iteration. The main problem with stochastic variational inference is that it still relies on the closed-form update in variational inference to work. Stochastic gradient ascent is a modern approach to machine learning and is widely deployed in the training of deep neural networks. It has two interesting properties: first, it runs on minibatches, and second, it does not rely on a closed-form update to work. In this work, we explore using stochastic gradient ascent as a baseline for learning Bayesian mixture models such as the Dirichlet process mixture. However, stochastic gradient ascent alone is not optimal for learning in terms of convergence. Instead, we turn our focus to stochastic gradient ascent techniques that use a decaying step-size to optimize the convergence. We consider two methods here: the commonly known momentum approach, and the natural gradient approach, which uses an adaptive step-size obtained by computing the Fisher information. We also show that our new stochastic gradient ascent approach for training the Dirichlet process mixture is compatible with deep ConvNet features and applicable to large-scale datasets such as Caltech256 and SUN397. Lastly, we justify our claims by comparing our method to an existing closed-form learner for the Dirichlet process mixture on these datasets.

Paper Nr: 39
Title:

Structure Preserving Encoding of Non-euclidean Similarity Data

Authors:

Maximilian Münch, Christoph Raab, Michael Biehl and Frank-Michael Schleif

Abstract: Domain-specific proximity measures, like divergence measures in signal processing or alignment scores in bioinformatics, often lead to non-metric, indefinite similarities or dissimilarities. However, many classical learning algorithms like kernel machines assume metric properties and struggle with such metric violations. For example, the classical support vector machine is no longer able to converge to an optimum. One possible direction to solve the indefiniteness problem is to transform the non-metric (dis-)similarity data into positive (semi-)definite matrices. For this purpose, many approaches have been proposed that adapt the eigenspectrum of the given data such that positive definiteness is ensured. Unfortunately, most of these approaches modify the eigenspectrum in such a strong manner that valuable information is removed or noise is added to the data. In particular, the shift operation has attracted a lot of interest in the past few years despite its frequently reoccurring disadvantages. In this work, we propose a modified advanced shift correction method that enables the preservation of the eigenspectrum structure of the data by means of a low-rank approximated nullspace correction. We compare our advanced shift to classical eigenvalue corrections like eigenvalue clipping, flipping, squaring, and shifting on several benchmark data. The impact of a low-rank approximation on the data’s eigenspectrum is analyzed.
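The classical eigenvalue corrections named in this abstract (clipping, flipping, squaring, and shifting) can be illustrated with a minimal sketch. It assumes a symmetric similarity matrix and uses NumPy; the function name `correct_eigenspectrum` is hypothetical, and the authors' advanced shift with low-rank nullspace correction is not reproduced here.

```python
import numpy as np

def correct_eigenspectrum(S, method="clip"):
    """Return a positive semi-definite version of a symmetric similarity matrix.

    Illustrative only: covers the classical corrections, not the paper's
    advanced shift.
    """
    S = 0.5 * (S + S.T)                    # enforce exact symmetry
    eigvals, eigvecs = np.linalg.eigh(S)   # real spectrum of a symmetric matrix
    if method == "clip":                   # set negative eigenvalues to zero
        new_vals = np.maximum(eigvals, 0.0)
    elif method == "flip":                 # take absolute values
        new_vals = np.abs(eigvals)
    elif method == "square":               # square every eigenvalue
        new_vals = eigvals ** 2
    elif method == "shift":                # subtract the most negative eigenvalue
        new_vals = eigvals - min(eigvals.min(), 0.0)
    else:
        raise ValueError(f"unknown method: {method}")
    # Rebuild V diag(new_vals) V^T
    return (eigvecs * new_vals) @ eigvecs.T

# Indefinite example with eigenvalues -1 and 3
S = np.array([[1.0, 2.0], [2.0, 1.0]])
K = correct_eigenspectrum(S, "shift")
```

Note that shifting moves every eigenvalue, including those that were already non-negative, which is one way such a correction can alter the structure of the spectrum.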

Paper Nr: 42
Title:

On the Similarity between Hidden Layers of Pruned and Unpruned Convolutional Neural Networks

Authors:

Alessio Ansuini, Eric Medvet, Felice A. Pellegrino and Marco Zullich

Abstract: During the last few decades, artificial neural networks (ANN) have achieved an enormous success in regression and classification tasks. The empirical success has not been matched with an equally strong theoretical understanding of such models, as some of their working principles (training dynamics, generalization properties, and the structure of inner representations) still remain largely unknown. It is, for example, particularly difficult to reconcile the well known fact that ANNs achieve remarkable levels of generalization also in conditions of severe over-parametrization. In our work, we explore a recent network compression technique, called Iterative Magnitude Pruning (IMP), and apply it to convolutional neural networks (CNN). The pruned and unpruned models are compared layer-wise with Canonical Correlation Analysis (CCA). Our results show a high similarity between layers of pruned and unpruned CNNs in the first convolutional layers and in the fully-connected layer, while for the intermediate convolutional layers the similarity is significantly lower. This suggests that, although in intermediate layers representation in pruned and unpruned networks is markedly different, in the last part the fully-connected layers act as pivots, producing not only similar performances but also similar representations of the data, despite the large difference in the number of parameters involved.

Paper Nr: 55
Title:

Computation of the φ-Descriptor in the Case of 2D Vector Objects

Authors:

Jason Kemp, Tyler Laforet and Pascal Matsakis

Abstract: The spatial relations between objects, a part of everyday speech, can be described within an image via a Relative Position Descriptor (RPD). The φ-descriptor, a recently introduced RPD, encapsulates more spatial information than other popular descriptors. However, only algorithms for determining the φ-descriptor of raster objects exist currently. In this paper, the first algorithm for the computation of the φ-descriptor in the case of 2D vector objects is introduced. The approach used is based on the concept of Points of Interest (which are points on the boundaries of the objects where elementary spatial relations change) and dividing the objects into regions according to their corresponding relationships. The capabilities of the algorithm have been tested and verified against an existing φ-descriptor algorithm for raster objects. The new algorithm is intended to show the versatility of the φ-descriptor.

Paper Nr: 56
Title:

Fast Fourier Transform based Force Histogram Computation for 3D Raster Data

Authors:

Jaspinder Kaur, Tyler Laforet and Pascal Matsakis

Abstract: The force histogram is a quantitative representation of the relative position between two objects. Two practical algorithms have been previously introduced to compute the force histogram between objects: the line-based algorithm (which works well with 2D data, but is computationally unstable in the case of 3D data), and the Fast Fourier Transform (FFT)-based algorithm (which is inefficient in the case of 2D data, but has not been implemented for 3D data). In this paper, an efficient FFT-based algorithm for force histogram computation in the case of 3D raster data is introduced. Its computation time is compared against that of the 3D line-based algorithm; except in a few cases, the computation time for the new FFT-based algorithm is less than that of the 3D line-based algorithm. The experiments validate that the FFT-based algorithm is computationally efficient regardless of the number of directions, type of forces, and shape of the objects (convex, concave, disjoint or overlapping).

Paper Nr: 80
Title:

Comparison of Algorithms for Tree-top Detection in Drone Image Mosaics of Japanese Mixed Forests

Authors:

Yago Diez, Sarah Kentsch, Maximo L. Caceres, Ha T. Nguyen, Daniel Serrano and Ferran Roure

Abstract: Counting trees is a common problem in forest applications, often solved by performing field studies that are exceedingly cost-intensive in time and manpower. Consequently, many researchers have used computer vision techniques to automatically detect trees by finding tree tops. The success of these algorithms is highly dependent on the data that they are used on. We present a study using data that we acquired in a natural mixed forest using an Unmanned Aerial Vehicle (UAV). Given the particularly challenging nature of our data, we developed a pre-processing step aimed at preparing the data so that it could be used with six common clustering algorithms to detect tree tops. Extensive experiments using data covering over 40 ha are presented, and tree detection accuracy, tree counting metrics, and computation and use time considerations are taken into account. Our algorithms detect over 80% of tree tops with high location accuracy and up to 90% with lower accuracy. Tree counting errors range from 8% to 14% for most methods. Data acquisition and runtime considerations show that these techniques are ready to have an immediate impact on the processing of real forest data.

Paper Nr: 81
Title:

A New Diversity Maintenance Strategy based on the Double Granularity Grid for Multiobjective Optimization

Authors:

Junzhong Ji, Yannan Weng and Cuicui Yang

Abstract: The diversity maintenance of nondominated solutions is crucial for solving multiobjective optimization problems. The grid strategy is a very effective way to maintain the diversity of nondominated solutions, but the existing grid strategies all adopt a single-layer grid structure, which has a weak ability to judge the distribution of nondominated solutions in hyperboxes with the same crowding degree. To further explore the ability of the grid strategy to maintain the diversity of nondominated solutions, this paper presents a new diversity maintenance strategy based on the double granularity grid. The double granularity grid strategy first partitions the hyperboxes with the same largest crowding degree into fine granularity hyperboxes. Then, it selects nondominated individual solutions according to the solution distribution in both coarse and fine granularity hyperboxes, which avoids randomness in selecting individual solutions in the single grid structure. To validate the performance of the double granularity grid strategy, we first integrated it with two famous algorithms, then tested the two integrated algorithms by comparing them with the original algorithms and four other state-of-the-art algorithms. The experimental results validate the powerful advantages of the proposed double granularity grid strategy.

Paper Nr: 86
Title:

Automatic Segmentation of Necrosis Zones after Radiofrequency Ablation of Spinal Metastases

Authors:

Johannes Steffen, Georg Hille, Mathias Becker, Sylvia Saalfeld and Klaus Tönnies

Abstract: In this work, we propose an automatic deep learning-based approach to segment necrotizing tissue (necrosis zones) after radiofrequency ablations (RFA) of spinal metastases in follow-up Magnetic Resonance (MR) images. While the manual segmentation of those necrosis zones is challenging and time consuming, it is a crucial step to assess whether a preceding therapy using RFA was successful and to what extent, i.e., to quantitatively evaluate how much of the metastasis was necrotized throughout the therapy. Therefore, we trained a U-Net-like deep neural network on 26 clinical cases (and various augmentations of those), where each case had an associated contrast-enhanced T1-weighted as well as a T2-weighted MR sequence. We evaluated the proposed approach on both sequences separately as well as in a combined setting and report Dice coefficients, sensitivity, and specificity rates for the automatic segmentations. A Dice coefficient of up to 77.2% indicates promising segmentation quality when compared to related work and similar segmentation tasks. To the best of our knowledge, this is the first work to tackle the problem of automatic segmentation of necrosis zones in MR images and therefore lacks comparability with related works. However, our best results are somewhat superior to semi-automatic approaches of liver metastases segmentation, which might be considered a problem of similar complexity.
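For readers unfamiliar with the reported metrics, the following minimal sketch computes the Dice coefficient, sensitivity, and specificity for two binary masks. It uses the standard definitions; the flat 0/1 list representation and the function name `segmentation_metrics` are assumptions made for illustration and are not taken from the paper.

```python
def segmentation_metrics(pred, truth):
    """Dice, sensitivity and specificity for binary masks given as flat 0/1 sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    tp = sum(1 for p, t in zip(pred, truth) if p and t)          # true positives
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)      # false positives
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)      # false negatives
    tn = sum(1 for p, t in zip(pred, truth) if not p and not t)  # true negatives
    # Dice = 2|P ∩ T| / (|P| + |T|); two empty masks count as a perfect match
    dice = 1.0 if (2 * tp + fp + fn) == 0 else 2.0 * tp / (2 * tp + fp + fn)
    sensitivity = tp / (tp + fn) if (tp + fn) else 1.0
    specificity = tn / (tn + fp) if (tn + fp) else 1.0
    return dice, sensitivity, specificity

dice, sens, spec = segmentation_metrics([1, 1, 0, 0], [1, 0, 1, 0])
```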

Paper Nr: 100
Title:

JumpReLU: A Retrofit Defense Strategy for Adversarial Attacks

Authors:

N. B. Erichson, Zhewei Yao and Michael W. Mahoney

Abstract: It has been demonstrated that very simple attacks can fool highly-sophisticated neural network architectures. In particular, so-called adversarial examples, constructed from perturbations of input data that are small or imperceptible to humans but lead to different predictions, may lead to an enormous risk in certain critical applications. In light of this, there has been a great deal of work on developing adversarial training strategies to improve model robustness. These training strategies are very expensive, in both human and computational time. To complement these approaches, we propose a very simple and inexpensive strategy which can be used to “retrofit” a previously-trained network to improve its resilience to adversarial attacks. More concretely, we propose a new activation function—the JumpReLU—which, when used in place of a ReLU in an already-trained model, leads to a trade-off between predictive accuracy and robustness. This trade-off is controlled by the jump size, a hyper-parameter which can be tuned during the validation stage. Our empirical results demonstrate that this increases model robustness, protecting against adversarial attacks with substantially increased levels of perturbations. This is accomplished simply by retrofitting existing networks with our JumpReLU activation function, without the need for retraining the model. Additionally, we demonstrate that adversarially trained (robust) models can greatly benefit from retrofitting.
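A minimal sketch of how such an activation might look, assuming that the JumpReLU passes an input through unchanged only when it exceeds the jump size and outputs zero otherwise. The abstract does not give the formula, so this definition and the names `jump_relu` and `jump` are assumptions made for illustration.

```python
def jump_relu(x, jump=0.0):
    """Assumed JumpReLU: identity above the jump size, zero at or below it.

    With jump=0.0 this reduces to the ordinary ReLU.
    """
    return x if x > jump else 0.0

def retrofit(pre_activations, jump):
    """Apply the assumed JumpReLU element-wise to one layer's pre-activations."""
    return [jump_relu(z, jump) for z in pre_activations]

# Small activations are suppressed; larger ones pass through unchanged
outputs = retrofit([-0.5, 0.2, 0.9, 1.5], jump=0.5)
```

Under this reading, a larger jump size discards more small activations, which is one way the trade-off between accuracy and robustness described above could arise without retraining.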

Paper Nr: 102
Title:

Learned and Hand-crafted Feature Fusion in Unit Ball for 3D Object Classification

Authors:

Sameera Ramasinghe, Salman Khan and Nick Barnes

Abstract: Convolution is an effective technique that can be used to obtain abstract feature representations using hierarchical layers in deep networks. However, performing convolution in non-Euclidean topological spaces such as the unit ball (B3) is still an under-explored problem. In this paper, we propose a light-weight experimental architecture for 3D object classification that operates in B3. The proposed network utilizes both hand-crafted and learned features, and uses capsules in the penultimate layer to disentangle 3D shape features through pose and view equivariance. It simultaneously maintains an intrinsic co-ordinate frame, where mutual relationships between object parts are preserved. Furthermore, we show that the optimal view angles for extracting patterns from 3D objects depend on their shape and achieve compelling results with a relatively shallow network, compared to the state-of-the-art.

Short Papers
Paper Nr: 9
Title:

Multiple Ellipse Detection by using RANSAC and DBSCAN Method

Authors:

Kristian Sabo and Rudolf Scitovski

Abstract: In this paper we consider the problem of detecting one or multiple ellipses (each represented as a Mahalanobis circle) on the basis of data points coming from one or several ellipses not known in advance. For solving the one-ellipse detection problem, two methods are mentioned. These methods are then used in solving the multiple ellipse detection problem. The method proposed in this paper is based on the well-known RANSAC method using the parameters MinPts and ε from the DBSCAN method. In this way the efficiency of choosing the best ellipse among N ellipses given by the RANSAC method is improved. The local density $\hat{\rho} = |\hat{\pi}| / |\hat{E}|$ is determined for each obtained ellipse $\hat{E}$ with circumference $|\hat{E}|$ and corresponding cluster $\hat{\pi}$ with $|\hat{\pi}|$ elements. If the local density $\hat{\rho}$ is smaller than the lower bound $\mathrm{MinPts}/(2\varepsilon)$ of the local density of the whole set $A$, the ellipse $\hat{E}$ is dropped. In order to obtain the final solution, an Adaptive Mahalanobis k-means algorithm is applied on the remaining ellipses. The method is illustrated on several examples with artificial data point sets and also on a few real images.
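The density-based filtering step can be sketched as follows. This assumes the local density is the number of cluster points divided by the ellipse circumference and that the lower bound is MinPts/(2ε), which is how the abstract is read here. The circumference is estimated with Ramanujan's approximation, which is a choice made for this sketch rather than something stated in the paper; the function names and the semi-axis parameters `a` and `b` are hypothetical.

```python
import math

def ellipse_circumference(a, b):
    """Ramanujan's approximation for an ellipse with semi-axes a and b."""
    return math.pi * (3.0 * (a + b) - math.sqrt((3.0 * a + b) * (a + 3.0 * b)))

def keep_ellipse(n_cluster_points, a, b, min_pts, eps):
    """Keep a candidate ellipse only if its local density reaches MinPts / (2 * eps)."""
    density = n_cluster_points / ellipse_circumference(a, b)
    lower_bound = min_pts / (2.0 * eps)
    return density >= lower_bound

# A unit circle supported by 100 points is kept; one supported by 10 points is dropped
dense_ok = keep_ellipse(100, 1.0, 1.0, min_pts=4, eps=0.5)
sparse_ok = keep_ellipse(10, 1.0, 1.0, min_pts=4, eps=0.5)
```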

Paper Nr: 16
Title:

Aerial Radar Target Classification using Artificial Neural Networks

Authors:

Guy Ardon, Or Simko and Akiva Novoselsky

Abstract: In this paper, we propose a new algorithm for classification of aerial radar targets by using Radar Cross Section (RCS) time-series corresponding to target detections of a given track. RCS values are obtained directly from SNR values, according to the radar equation. The classification is based on analysing the behaviour of the RCS time-series, which is the unique “fingerprint” of an aerial radar target. The classification process proposed in this paper is based on training a fully-connected neural network on features extracted from the RCS time-series and its corresponding Intrinsic Mode Functions (IMFs). The training is based on a database containing RCS signatures of various aerial targets. The algorithm has been tested on a large and diverse set of simulated flight trajectories, and its performance has been compared with that of several different methods. We have found that the proposed neural network-based classifier performed better on our database.

Paper Nr: 17
Title:

Subject-independent Pain Recognition using Physiological Signals and Para-linguistic Vocalizations

Authors:

Nadeen Shoukry, Omar Elkilany, Patrick Thiam, Viktor Kessler and Friedhelm Schwenker

Abstract: Pain is the result of a complex interaction among the various parts of the human nervous system. It plays an important role in the diagnosis and treatment of patients. The standard method for pain recognition is self-report; however, not all patients can communicate pain effectively. In this work, the task of automated pain recognition is addressed using para-linguistic and physiological data. Hand-crafted and automatically generated features are extracted and evaluated independently. Several state-of-the-art machine learning algorithms are applied to perform subject-independent binary classification. The SenseEmotion dataset is used for evaluation and comparison. Random forests trained on hand-crafted features from the physiological modalities achieved an accuracy of 82.61%, while support vector machines trained on hand-crafted features from the para-linguistic data achieved an accuracy of 63.86%. Hand-crafted features outperformed automatically generated features.

Paper Nr: 34
Title:

Mediastinal Lymph Node Detection using Deep Learning

Authors:

Jayant P. Singh, Yuji Iwahori, M. K. Bhuyan, Hiroyasu Usami, Taihei Oshiro and Yasuhiro Shimizu

Abstract: Accurate lymph node detection plays a significant role in tumour staging, choice of therapy, and in predicting the outcome of malignant diseases. Clinical examination to detect lymph node metastases alone is tedious and error-prone due to the low contrast of surrounding structures in Computed Tomography (CT) and to their varying shapes, poses, sizes, and sparsely distributed locations. Oda et al. (2017) report 84.2% sensitivity at 9.1 false-positives per volume (FP/vol.) by local intensity structure analysis based on an Intensity Targeted Radial Structure Tensor (ITRST). In this paper, we first operate a candidate generation stage using U-Net (a modified fully convolutional network for segmentation of biomedical images), aiming at 100% sensitivity at the cost of high FP levels, to generate volumes of interest (VOI). Thereafter, we present an exhaustive analysis of approaches using different representations (ways to decompose a 3D VOI) as input to train Convolutional Neural Network (CNN) and 3D CNN (convolutional neural network using 3D convolutions) classifiers. We also evaluate SVMs trained on features extracted by the aforementioned CNN and 3D CNN. Candidate generation followed by false positive reduction to detect lymph nodes provides an alternative to compute- and memory-intensive methods using 3D fully convolutional networks. We validate the approaches on a dataset of 90 CT volumes with 388 mediastinal lymph nodes published by Roth et al. (2014). Our best approach achieves 84% sensitivity at 2.88 FP/vol. in the mediastinum of chest CT volumes.

Paper Nr: 36
Title:

Interdependent Multi-task Learning for Simultaneous Segmentation and Detection

Authors:

Mahesh Reginthala, Yuji Iwahori, M. K. Bhuyan, Yoshitsugu Hayashi, Witsarut Achariyaviriya and Boonserm Kijsirikul

Abstract: Lightweight, fast, and accurate deep-learning algorithms are essential for practical deployment in real-world use-cases. Semantic segmentation and object detection are the principal tasks of visual perception. A multi-task network significantly reduces the number of parameters compared to two independent networks running simultaneously for each task. Generally, multi-task networks have shared encoders and multiple independent task-specific decoders. Instead, we modeled our network to exploit the features from both encoder and decoder. We propose a multi-task network that performs both segmentation and detection with only 37.9 million parameters and an inference time of 74 milliseconds on a consumer-grade GPU. This network performs the two tasks with far fewer parameters and in much less inference time compared to each single-task network.

Paper Nr: 50
Title:

Goal-based Evaluation of Text Mining Results in an Industrial Use Case

Authors:

Jens Drawehn, Matthias Blohm, Maximilien Kintz and Monika Kochanowski

Abstract: Artificial intelligence has boosted the interest in text mining solutions in the last few years. Especially in non-English-speaking countries, where there might not be clear market leaders, a variety of solutions for different text mining scenarios has become available. Most of them support special use cases and have strengths and weaknesses in others. In text or page classification, standard measures like precision, recall, sensitivity or F1-score are prevalent. However, evaluation of feature extraction results requires more tailored approaches. We experienced many issues on the way to benchmarking feature extraction results from text, like whether a result is correct, partly correct, helpful or useless. The main contribution of this work is a method for designing a tailored evaluation procedure in an individual text extraction benchmark for one specific use case. In this context, we propose a general way of mapping the common CRISP-DM process to particularities of text mining projects. Furthermore, we describe possible goals of information extraction, the features to be extracted, suitable evaluation criteria and a corresponding customized scoring system. This is applied in detail in an industrial use case.

Paper Nr: 61
Title:

A Manifold Learning Framework for the Detection of Cardiac Disorders in Acoustic Signals

Authors:

Keren Hochman, Amir Averbuch, Alon Schclar and Raid Saabni

Abstract: Cardiac disorders are clinical situations in which the heart does not function properly. These disorders may be fatal to patients if they are not detected. Detecting such disorders often involves special and in some cases very expensive medical devices such as Computed Tomography (CT), Magnetic Resonance Imaging (MRI), Ultrasound imaging or Electrocardiograms. Acoustic detection of these disorders by simply listening to the heart using a stethoscope, although the cheapest detection method, requires a highly skilled doctor. We propose a method that detects cardiac disorders from simple acoustic recordings of the heart. Acquiring such recordings is in most cases cheaper than using the above-mentioned devices. The proposed algorithm is composed of two steps: an offline training step which constructs a classifier based on labeled recordings; and an online classification step which detects cardiac disorders given a recording of the heart. Given the online nature of the algorithm, the proposed algorithm can be implemented as a smartphone application. One of the key elements of both the training and detection steps is the concise and informative representation of the acoustic signal. This representation is obtained using the application of the spline wavelet packet transform followed by the application of the Diffusion Maps (DM) dimensionality reduction algorithm. The proposed approach is generic and can be applied to various signal types for solving different classification problems.

Paper Nr: 62
Title:

Heavy Caterpillar Distances for Rooted Labeled Unordered Trees

Authors:

Nozomi Abe, Takuya Yoshino and Kouich Hirata

Abstract: In this paper, we introduce two heavy caterpillar distances between rooted labeled unordered trees (trees, for short) based on the edit distance between the heavy caterpillars obtained from the heavy paths in trees. Then, we show that the heavy caterpillar distances provide the upper bound of the edit distance for trees, can be computed in quadratic time under the unit cost function and are incomparable with other variations of the edit distance.
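As background for the heavy paths mentioned above, here is a minimal sketch of extracting the heavy path that starts at the root of a rooted tree: at each node, it descends to the child with the largest subtree. The adjacency-dictionary representation, the tie-breaking rule (first child wins), and the function names are assumptions made for illustration; the construction of heavy caterpillars and the distances themselves are not reproduced.

```python
def subtree_sizes(children, root):
    """Number of nodes in the subtree of every node (iterative post-order)."""
    sizes = {}
    stack = [(root, False)]
    while stack:
        node, expanded = stack.pop()
        if expanded:
            # All children have been sized by the time the node is revisited
            sizes[node] = 1 + sum(sizes[c] for c in children.get(node, []))
        else:
            stack.append((node, True))
            stack.extend((c, False) for c in children.get(node, []))
    return sizes

def heavy_path(children, root):
    """Follow the child with the largest subtree from the root down to a leaf."""
    sizes = subtree_sizes(children, root)
    path = [root]
    node = root
    while children.get(node):
        node = max(children[node], key=lambda c: sizes[c])
        path.append(node)
    return path

# r has children a and b; a has children c and d; c has child e
tree = {"r": ["a", "b"], "a": ["c", "d"], "c": ["e"]}
path = heavy_path(tree, "r")
```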

Paper Nr: 64
Title:

Private Body Part Detection using Deep Learning

Authors:

André Tabone, Alexandra Bonnici, Stefania Cristina, Reuben Farrugia and Kenneth Camilleri

Abstract: Fast and accurate detection of sexually exploitative imagery is necessary for law enforcement agencies to allow for prosecution of suspect individuals. In the literature, techniques which can be used to assist law enforcement agencies only determine whether the image content is pornographic or benign. In this paper, we provide a review of classical handcrafted-feature based and deep-learning based pornography detection in images and describe a framework which goes beyond this, to identify the location of genitalia in the image. Despite this being a computationally complex task, we show that by learning multiple features, a MobileNet framework can achieve an accuracy of 76.29% in the correct labelling of female and male sexual organs.

Paper Nr: 67
Title:

DNNFG: DNN based on Fourier Transform Followed by Gabor Filtering for the Modular FER

Authors:

Sujata and Suman K. Mitra

Abstract: The modular approach mimics the capability of the human brain to identify a person from a limited facial part. In this article, we experimentally show that some facial parts like eyes, nose, lips, and forehead contribute more to the expression recognition task. A deep neural network, VGG16 ft, is proposed to automatically extract features from the given facial images. Fine-tuning pre-trained models is very fruitful for Facial Expression Recognition (FER) when sufficient facial images are not available. Two preprocessing approaches, Fourier transform followed by Gabor filters and Data Augmentation (DA), are implemented to restrict the regions used for FER. The features from four facial regions are concatenated and classification is done using SVM and KNN (with different distance measures). The experimental results show that the proposed framework can recognize facial expressions such as happiness, anger, sadness, surprise, disgust and fear with high accuracy on benchmark datasets like “JAFFE”, “VIDEO”, “CK+” and “Oulu-Casia”.

Paper Nr: 73
Title:

File Name Classification Approach to Identify Child Sexual Abuse

Authors:

Mhd Wesam Al-Nabki, Eduardo Fidalgo, Enrique Alegre and Rocío Aláiz-Rodríguez

Abstract: When Law Enforcement Agencies seize a computer from a potential producer or consumer of Child Sexual Exploitation Material (CSEM), they need accurate and time-efficient tools to analyze its files. However, classifying and detecting CSEM by manual inspection is a highly time-consuming task, and most of the time it is unfeasible within the amount of time available to the Spanish police under a search warrant. An option for identifying CSEM is to analyze the names of the files stored on the hard disk of the suspect, looking in the text for patterns related to CSEM. However, due to the particularities of these file names, mainly their length and the use of obfuscated words, current file name classification methods suffer from a low recall rate, a measure that is essential in the context of this problem. This paper presents our ongoing research to identify CSEM through their file names. We evaluate two approaches to short text classification: a proposal based on machine learning classifiers exploring the use of Logistic Regression and Support Vector Machine, and an approach using deep learning by adapting two popular Convolutional Neural Network (CNN) models that work at the character level. The presented CNN achieved an average class recall of 0.86 and a recall rate of 0.78 for the CSEM class. The CNN-based classifier could be integrated into forensic tools and services that might support Law Enforcement Agencies in identifying CSEM without the need to systematically access the visual content of every file.

Paper Nr: 83
Title:

Hybrid Fuzzy Binning for Near-duplicate Image Retrieval: Combining Fuzzy Histograms and SIFT Keypoints

Authors:

Afra’a A. Alyosef and Andreas Nürnberger

Abstract: Near-duplicate image retrieval is still a challenging task, especially due to issues with matching quality and performance. Most existing approaches use high-dimensional vectors based on local features such as SIFT keypoints to represent images. The extraction and matching of these vectors to detect near-duplicates are time and memory consuming. Global features such as color histograms can strongly reduce the dimensionality of image vectors and significantly accelerate the matching process. On the other hand, they strongly decrease the quality of the retrieval process. In this work, we propose a hybrid approach to improve the quality of retrieval and reduce the computation time by applying a robust filtering process using global features optimized for recall followed by a ranking process optimized for precision. For efficient filtering we propose a fuzzy partition hue saturation (HS) histogram to retrieve a subset of near-duplicate candidate images. After that, we re-rank the top retrieved results by extracting the SIFT features. In order to evaluate the performance and quality of this hybrid approach, we provide results of a comparative performance analysis using the original SIFT-128D, the HS color histogram, the fuzzy HS model (F-HS), the proposed fuzzy partition HS model (FP-HS) and the combination of the proposed fuzzy partition HS histogram with the SIFT features using large-scale image benchmark databases. The results of the experiments show that applying the fuzzy partition HS histogram and re-ranking the top results (only 6% of the retrieved images) using the SIFT algorithm significantly outperforms the use of the individual state-of-the-art methods with respect to computational efficiency and effectiveness.
Download

Paper Nr: 88
Title:

Activation Adaptation in Neural Networks

Authors:

Farnoush Farhadi, Vahid P. Nia and Andrea Lodi

Abstract: Many neural network architectures rely on the choice of the activation function for each hidden layer. Given the activation function, the neural network is trained over the bias and the weight parameters. The bias catches the center of the activation, and the weights capture the scale. Here we propose to train the network over a shape parameter as well. This view allows each neuron to tune its own activation function and adapt the neuron curvature towards a better prediction. This modification only adds one further equation to the back-propagation for each neuron. Re-formalizing activation functions as a cumulative distribution function (cdf) extensively generalizes the class of activation functions. We propose generalizing towards an extensive class of activation functions and study: i) skewness and ii) smoothness of activation functions. Here we introduce the adaptive Gumbel activation function as a bridge between the asymmetric Gumbel and the symmetric sigmoid. A similar approach is used to invent a smooth version of ReLU. Our comparison with common activation functions suggests different data representation, especially in early neural network layers. This adaptation also provides prediction improvement.
Download
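
The abstract above does not state the formula of the adaptive Gumbel activation, so the following is only a minimal sketch under an assumed parameterization, 1 - (1 + alpha * exp(x)) ** (-1 / alpha), chosen because it has the bridging behaviour the abstract describes: alpha = 1 gives the symmetric sigmoid, and alpha close to 0 approaches the asymmetric Gumbel-type curve 1 - exp(-exp(x)). The function names and the symbol alpha are illustrative, not taken from the paper.

```python
import math


def adaptive_gumbel(x: float, alpha: float) -> float:
    """Assumed shape-parameterized activation: 1 - (1 + alpha*e^x)^(-1/alpha)."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return 1.0 - (1.0 + alpha * math.exp(x)) ** (-1.0 / alpha)


def sigmoid(x: float) -> float:
    """Standard logistic sigmoid, the alpha = 1 special case."""
    return 1.0 / (1.0 + math.exp(-x))


def gumbel_activation(x: float) -> float:
    """Asymmetric Gumbel-type curve, the limit as alpha tends to 0."""
    return 1.0 - math.exp(-math.exp(x))


if __name__ == "__main__":
    for alpha in (1e-6, 0.5, 1.0):
        print(alpha, round(adaptive_gumbel(0.3, alpha), 6))
```

In a trainable setting, alpha would be learned per neuron alongside the weights and bias, which is the "one further equation" in back-propagation that the abstract mentions.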

Paper Nr: 19
Title:

Improved Subspace Method for Supervised Anomaly Detection with Minimal Anomalous Data

Authors:

Fumito Ebuchi, Aiga Suzuki and Masahiro Murakawa

Abstract: In conventional anomaly detection methods, the classifier is usually trained only with normal data. However, real-world problems may present a very small amount of anomalous data. In this paper, we propose an improved subspace method for anomaly detection that has the ability to utilize a very small amount of anomalous data. Our method introduces an objective function that minimizes the average projection length of anomalous data into the conventional objective function for the subspace method. This formulation enables a normal subspace that considers the distribution of anomalous data to be learned, thereby improving the anomaly detection performance. Furthermore, because the information about anomalous data is provided in the form of the average projection length, stable detection can be expected even when an extremely small amount of anomalous data is used. We used the MNIST and CIFAR-10 datasets to evaluate the effectiveness of the proposed method, which yielded a higher anomaly detection performance than the conventional normal model or classifier model under conditions in which very little anomalous data is obtainable. The performance of our method on CIFAR-10 was assessed by imposing the constraint that only four or five anomalous data samples could be used. In this test, our method achieved an average AUC 0.263 points higher than that of the state-of-the-art method using only normal data.
Download

Paper Nr: 43
Title:

An Efficient Moth Flame Optimization Algorithm using Chaotic Maps for Feature Selection in the Medical Applications

Authors:

Ruba A. Khurma, Ibrahim Aljarah and Ahmad Sharieh

Abstract: In this paper, multiple variants of the Binary Moth Flame Optimization Algorithm (BMFO) based on chaotic maps are introduced and compared as search strategies in a wrapper feature selection framework. The main purpose of using chaotic maps is to enhance the initialization process of solutions in order to help the optimizer avoid local minima and converge globally towards the optimal solution. The proposed approaches are applied for the first time to FS problems. Dimensionality is a major problem that adversely impacts the learning process due to overfitting and long learning time. Feature selection (FS) is a preprocessing stage in a data mining process that reduces the dimensionality of the dataset by eliminating redundant and irrelevant noisy features. FS is formulated as an optimization problem. Thus, metaheuristic algorithms have been proposed to find promising near-optimal solutions for this complex problem. MFO is one of the recent metaheuristic algorithms and has been efficiently used to solve various optimization problems in a wide range of applications. The proposed approaches have been tested on 23 medical datasets. The comparative results revealed that the chaotic BMFO (CBMFO) significantly increased the performance of the MFO algorithm and achieved competitive results when compared with other state-of-the-art metaheuristic algorithms.
Download
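
The chaotic initialization idea in the abstract above can be illustrated with a small sketch. It assumes the logistic map with r = 4 as the chaotic map and a 0.5 threshold to binarize values into feature masks; the abstract does not say which maps or which binarization rule the authors use, and all names here are hypothetical.

```python
def logistic_map_sequence(x0: float, length: int, r: float = 4.0) -> list:
    """Iterate x <- r * x * (1 - x); with r = 4 the values stay in [0, 1]."""
    values = []
    x = x0
    for _ in range(length):
        x = r * x * (1.0 - x)
        values.append(x)
    return values


def chaotic_binary_population(n_solutions: int, n_features: int, x0: float = 0.7) -> list:
    """Build binary feature masks (1 = feature selected) from one chaotic sequence."""
    seq = logistic_map_sequence(x0, n_solutions * n_features)
    population = []
    for i in range(n_solutions):
        row = seq[i * n_features:(i + 1) * n_features]
        # Assumed binarization rule: select the feature when the value exceeds 0.5.
        population.append([1 if v > 0.5 else 0 for v in row])
    return population


if __name__ == "__main__":
    for mask in chaotic_binary_population(n_solutions=4, n_features=10):
        print(mask)
```

Unlike a pseudo-random draw, the sequence is fully determined by the seed value x0, so the same initial population is reproduced on every run.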

Paper Nr: 69
Title:

Use of Language Models for Document Stream Segmentation

Authors:

Chems E. Neche, Yolande Belaíd and Abdel Belaíd

Abstract: Page stream segmentation into single documents is a very common task in companies and administrations when processing their incoming mail. It is not a straightforward task because the limits of the documents are not always obvious, and it is not always easy to find common features between the pages of the same document. In this paper, we compare existing segmentation models and propose a new one based on GRUs (Gated Recurrent Units) and an attention mechanism, named AGRU. This model uses the text content of the previous page and the current page to determine whether both pages belong to the same document. Thanks to its attention mechanism, the model is capable of recognizing words that define the first page of a document. Training and evaluation are carried out on two datasets: Tobacco-800 and READ-Corpus. The former is a public dataset on which our model reaches an F1 score of 90%, and the latter is a private dataset on which our model reaches an F1 score of 96%.
Download

Paper Nr: 82
Title:

Predicting Depression with Social Media Images

Authors:

Stankevich Maxim, Nikolay Ignatiev and Ivan Smirnov

Abstract: The study focuses on the task of depression detection by analyzing images related to social media users. We formed a dataset that consists of 485,121 images from the profiles of 398 volunteers who provided access to their data on the popular Russian-language social network Vkontakte. The results of a depression questionnaire were used to distinguish depression and control groups and to set up the binary classification task. We considered 3 types of user images: profile photos, images from posts, and albums. We applied object detection methods to retrieve object features that indicate the presence of 80 different object classes in users’ images. To address the task, different machine learning algorithms were trained on the object and color features. Our models achieved up to 65.5% F1-score for the task of revealing depressed users.
Download

Area 2 - Applications

Full Papers
Paper Nr: 3
Title:

What Reviews in Local Online Labour Markets Reveal about the Performance of Multi-service Providers

Authors:

Joschka Kersting and Michaela Geierhos

Abstract: This paper deals with online customer reviews of local multi-service providers. While many studies investigate product reviews and online labour markets with service providers delivering intangible products “over the wire”, we focus on websites where providers offer multiple distinct services that can be booked, paid and reviewed online but are performed locally offline. This type of service provider has so far been neglected in the literature. This paper analyses reviews and applies sentiment analysis. It aims to gain new insights into local multi-service providers’ performance. A broad range of literature on the topics addressed is presented. The results show, among other things, that providers with good ratings continue to perform well over time. We find that many positive reviews seem to encourage sales. On average, quantitative star ratings and qualitative ratings in the form of review texts match. Further results are also achieved in this study.
Download

Paper Nr: 8
Title:

Using Unsupervised Machine Learning for Plasma Etching Endpoint Detection

Authors:

Imen Chakroun, Thomas J. Ashby, Sayantan Das, Sandip Halder, Roel Wuyts and Wilfried Verachtert

Abstract: Much has been discussed around the advent of Industry 4.0 tools to improve yield across front-end and back-end semiconductor manufacturers. One of these tools is the etch endpoint detection (EPD) system. It is essential to optimize the etch process by precisely landing on the underlying layers, because over-etching can damage them. In this work, we explore unsupervised machine learning for automatically identifying the endpoint during plasma etching of low open-area wafers using optical emission spectroscopy.
Download

Paper Nr: 11
Title:

Radially Distorted Planar Motion Compatible Homographies

Authors:

Marcus V. Örnhag

Abstract: Fast and accurate homography estimation is essential to many computer vision applications, including scene degenerate cases and planarity detection. Such cases arise naturally in man-made environments, and failure to handle them will result in poor positioning estimates. Most modern day consumer cameras are affected by some level of radial distortion, which must be compensated for in order to get accurate estimates. This often demands calibration procedures, with specific scene requirements, and off-line processing. In this paper a novel polynomial solver for radially distorted planar motion compatible homographies is presented. The proposed algorithm is fast and numerically stable, and is proven on both synthetic and real data to work well inside a RANSAC loop.
Download

Paper Nr: 12
Title:

Multimodal Deep Denoising Convolutional Autoencoders for Pain Intensity Classification based on Physiological Signals

Authors:

Patrick Thiam, Hans A. Kestler and Friedhelm Schwenker

Abstract: The performance of a conventional information fusion architecture is greatly affected by its ability to detect and combine useful and complementary information from heterogeneous representations stemming from a set of distinctive modalities. Moreover, manually designing a set of relevant and complementary features for a specific pattern recognition task is a complex and tedious endeavour. Therefore, enabling pattern recognition architectures to autonomously generate and select relevant descriptors directly from the set of preprocessed raw data is a favourable alternative to the more conventional manual feature engineering. In the following work, multimodal information fusion approaches based on Deep Denoising Convolutional Autoencoders (DDCAEs) are proposed for the classification of pain intensities based on physiological signals (electrodermal activity (EDA), electromyogram (EMG) and electrocardiogram (ECG)). The approaches are characterized by the simultaneous optimization of both the joint representation of the input channels generated by the multimodal DDCAE and the feed-forward neural network performing the classification of the pain intensities. The assessment performed on the BioVid Heat Pain Database (Part A) points at the relevance of the proposed approaches. In particular, the introduction of trainable weighting parameters for the generation of an aggregated latent representation outperforms most of the previously proposed methods in related works, each based on a set of carefully selected hand-crafted features.
Download

Paper Nr: 13
Title:

Simultaneous Flexible Keyword Detection and Text-dependent Speaker Recognition for Low-resource Devices

Authors:

Hiroshi Fujimura, Ning Ding, Daichi Hayakawa and Takehiko Kagoshima

Abstract: This paper proposes a new method for simultaneous flexible keyword detection and text-dependent speaker identification using a recognized keyword. The purpose is to identify a speaker from among a set of pre-registered speakers on the basis of a short-command utterance in an office or home on low-resource chip devices. The first contribution is to construct a process that includes a neural network (NN) and a customized Viterbi-based algorithm for flexible keyword detection, and Gaussian mixture models (GMMs) for speaker identification. Outputs of a middle layer in the NN and alignment information for keyword detection are also used for creating feature vectors for speaker GMMs. The second contribution is to apply DropConnect in speaker-modeling uncertainties of the Bayesian NN that is used for speaker recognition. It results in robust speaker models when enrollment utterances are few. Evaluation was conducted using 39 Japanese keywords spoken by 100 speakers. Recognition performance was measured on the basis of false acceptances and false rejections using keyword utterances. Speaker identification for 100 pre-registered speakers for recognized keywords was simultaneously evaluated. The identification rate when using a conventional i-vector method was 71.22%. By contrast, the identification rate of the proposed method was 89.29% while using low-cost resources.
Download

Paper Nr: 23
Title:

Hierarchical Traffic Sign Recognition for Autonomous Driving

Authors:

Vartika Sengar, Renu M. Rameshan and Senthil Ponkumar

Abstract: Traffic Sign Recognition is crucial for self-driving cars and Advanced Driver Assistance Systems. As the vehicle moves within a region or across regions, it encounters a variety of signs which need to be recognized with very high accuracy. It is generally observed that traffic signs have large intra-class variability and small inter-class variability. This makes visual distinguishability between distinct classes extremely irregular. In this paper we propose a hierarchical classifier in which the number of coarse classes is automatically determined. This gives the advantage of dedicated classifiers trained for classes which are more difficult to distinguish. This is an application-oriented work which involves a systematic and intelligent combination of machine learning and computer vision based algorithms, with the required modifications, to design a fully automated hierarchical classification framework for traffic sign recognition. The proposed solution is a real-time, scalable, machine learning based approach which can efficiently handle wide intra-class variations without extracting handcrafted features beforehand. It eliminates the need for manually observing and grouping relevant features, thereby reducing human time and effort. The classifier’s accuracy surpasses that achieved by humans on the publicly available GTSRB traffic sign dataset, with fewer parameters than existing solutions.
Download

Paper Nr: 28
Title:

Segmentation of Moving Objects in Traffic Video Datasets

Authors:

Anusha Aswath, Renu Rameshan, Biju Krishnan and Senthil Ponkumar

Abstract: In this paper, we aim to automate the segmentation of multiple moving objects in video datasets specific to the traffic use case. This automation is achieved in two steps. First, we generate bounding boxes using our proposed multi-object tracking algorithm, based on a convolutional neural network (CNN) model which is capable of re-identification. Second, we convert the various tracked objects into pixel masks using an instance segmentation algorithm. The proposed tracking method has shown promising results, with high precision and success rate on traffic video datasets, specifically when severe object occlusion and frequent camera motion are present in the video. Generating instance-aware pixel masks for multiple object instances of a video dataset for ground truth is a tedious task. The proposed method offers interactive corrections with a human in the loop to improve the bounding boxes and the pixel masks as the video sequence proceeds. It exhibits powerful generalization capabilities, and hence the proposed tracker and segmentation network were applied as part of an annotation tool to reduce human effort and time.
Download

Paper Nr: 52
Title:

A Low Cost Electronic Nose with a GMM-UBM Approach for Wood Species Verification

Authors:

Naren Mantilla-Ramirez, Homero Ortega-Boada, Milton Paja-Sarria and Alexander Sepúlveda-Sepúlveda

Abstract: Deforestation endangers some vulnerable wood species. Although there are effective timber species identification methods, they are typically expensive and time-consuming, they must be carried out by experts, and they are not applicable in places far from main cities. In contrast, we propose to use electronic noses to identify timber species, e.g. during their transportation, from the volatile compounds that timbers emanate. In the present work, we propose a method for timber species detection from their aromas. The measurements of the volatile compounds are made by an array of 16 chemical sensors, whose curves are the inputs to a pattern recognition system. Detection is performed using Gaussian mixture modeling with a Universal Background Model. In contrast to previous works, we apply a new approach to the problem of timber species detection; furthermore, the sample collection conditions are closer to those found in real situations, and the number of samples used is larger and more varied. We found an EER (equal error rate) of 24.18% for cedar verification and an EER of 33.62% for 4-timber species verification.
Download

Paper Nr: 54
Title:

Learning Question Similarity in CQA from References and Query-logs

Authors:

Alex Zhicharevich, Moni Shahar and Oren S. Shalom

Abstract: Community question answering (CQA) sites are quickly becoming an invaluable source of information in many domains. Since CQA forums are based on the contributions of many authors, the problem of finding similar or even duplicate questions is essential. In the absence of supervised data for this problem, we propose a novel approach to generate weak labels based on easily obtainable data that exist in most CQAs, e.g., query logs and references in the answers. These labels accommodate training of auxiliary supervised text classification models. The internal states of these models serve as meaningful question representations and are used for semantic similarity. We demonstrate that these methods are superior to state of the art text embedding methods for the question similarity task.
Download

Paper Nr: 57
Title:

Exploring the Dependencies between Behavioral and Neuro-physiological Time-series Extracted from Conversations between Humans and Artificial Agents

Authors:

Hmamouche Youssef, Ochs Magalie, Prévot Laurent and Chaminade Thierry

Abstract: Whole-brain neuroimaging using functional Magnetic Resonance Imaging (fMRI) provides valuable data to localize brain activity in space and time. Here, we use a unique corpus including fMRI and behavior recorded while participants conversed with a human or a conversational robot. Temporal dynamics are crucial when studying conversation, yet identifying the relationship between the participants’ behavior and their brain activity is technically challenging given the time resolution of fMRI. We propose here an approach developed to extract neurophysiological and behavioral time-series from the corpus and analyse their causal relationships. Preprocessing entails the construction of discrete neurophysiological time-series from functionally well-defined brain areas, as well as behavioral time-series describing higher-order behaviors extracted from synchronized raw audio, video and eye-tracking recordings. The second step consists in applying machine learning models to predict brain activity on the basis of various aspects of behavior, given knowledge about the functional role of the areas under scrutiny. Results demonstrate the specificity of the behaviors that allow the activity in functional brain areas to be predicted.
Download

Paper Nr: 87
Title:

Sentiment Analysis from Sound Spectrograms via Soft BoVW and Temporal Structure Modelling

Authors:

George Pikramenos, Georgios Smyrnis, Ioannis Vernikos, Thomas Konidaris, Evaggelos Spyrou and Stavros Perantonis

Abstract: Monitoring and analysis of human sentiments is currently one of the hottest research topics in the field of human-computer interaction, with many applications. However, in order to become practical in daily life, sentiment recognition techniques should analyze data collected in an unobtrusive way. For this reason, analyzing audio signals of human speech (as opposed to, say, biometrics) is considered key to potential emotion recognition systems. In this work, we expand upon previous efforts to analyze speech signals using computer vision techniques on their spectrograms. In particular, we utilize ORB descriptors on keypoints distributed on a regular grid over the spectrogram to obtain an intermediate representation. Firstly, a technique similar to Bag-of-Visual-Words (BoVW) is used, where a visual vocabulary is created by clustering keypoint descriptors, but a soft candidacy score is used instead to construct the histogram descriptors of the signal. Furthermore, a technique which takes into account the temporal structure of the spectrograms is examined, allowing for effective model regularization. Both of these techniques are evaluated on several popular emotion recognition datasets, with results indicating an improvement over the simple BoVW method.
Download
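
As a rough illustration of the soft BoVW idea in the abstract above, the sketch below spreads each descriptor's vote over all visual words instead of assigning it to the nearest one. The exact candidacy score used by the authors is not given in the abstract; the exponential weighting, the Euclidean distance on real-valued vectors (ORB descriptors are binary and would normally use Hamming distance), and the `temperature` parameter are all assumptions made for this example.

```python
import math


def soft_assignment(descriptor, vocabulary, temperature=1.0):
    """Weights over visual words, proportional to exp(-distance / temperature)."""
    dists = [math.dist(descriptor, word) for word in vocabulary]
    nearest = min(dists)  # subtracted for numerical stability
    weights = [math.exp(-(d - nearest) / temperature) for d in dists]
    total = sum(weights)
    return [w / total for w in weights]


def soft_bovw_histogram(descriptors, vocabulary, temperature=1.0):
    """Average the soft assignments of all descriptors into one histogram."""
    if not descriptors:
        raise ValueError("at least one descriptor is required")
    hist = [0.0] * len(vocabulary)
    for desc in descriptors:
        for k, w in enumerate(soft_assignment(desc, vocabulary, temperature)):
            hist[k] += w
    return [h / len(descriptors) for h in hist]


if __name__ == "__main__":
    vocab = [(0.0, 0.0), (10.0, 0.0)]          # toy visual vocabulary
    descs = [(0.0, 0.0), (1.0, 0.0), (9.0, 0.0)]  # toy keypoint descriptors
    print(soft_bovw_histogram(descs, vocab))
```

A small `temperature` approaches the hard assignment of plain BoVW, while a large one flattens the histogram towards uniform.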

Short Papers
Paper Nr: 14
Title:

The Necessity and Pitfall of Augmentation in Deep Learning: Observations During a Case Study in Triplet Learning for Coin Images

Authors:

Daniel Soukup

Abstract: We conducted a case study on a subset of the MUSCLE CIS image benchmark of modern coins with the goal to assess the potential of deep embedding learning for generating representative CNN feature vectors of coin images, which are clustered class by class. In the course of training our models (CNN), we applied algorithmic rotational augmentation to the coin images to enforce rotational invariance. While augmentation is a usual procedure for regularizing deep learning models towards more geometric invariance, exactly that procedure revealed an interesting yet precarious pitfall in deep embedding learning: its susceptibility to interpolation errors. That interpolation bias results in distorted and ambiguous representation clusters of coin classes in the feature space, jeopardizing classification capabilities.
Download

Paper Nr: 15
Title:

Pitch-synchronous Discrete Cosine Transform Features for Speaker Identification and Verification

Authors:

Amit Meghanani and A. G. Ramakrishnan

Abstract: We propose a feature called pitch-synchronous discrete cosine transform (PS-DCT), derived from the voiced part of the speech for speaker identification (SID) and verification (SV) tasks. PS-DCT features are derived from the ‘time-domain, quasi-stationary waveform shape’ of the voiced sounds. We test our PS-DCT feature on TIMIT, Mandarin and YOHO datasets. On TIMIT with 168 and Mandarin with 855 speakers, we obtain the SID accuracies of 99.4% and 96.1%, respectively, using a Gaussian mixture model-based classifier. In the i-vector-based SV framework, fusing the ‘PS-DCT based system’ with the ‘MFCC-based system’ at the score level reduces the equal error rate (EER) for both YOHO and Mandarin datasets. In the case of limited test data and session variabilities, we obtain a significant reduction in EER, up to 5.8% (for test data of duration < 3 sec).
Download
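
Since the abstract above reports results as equal error rate (EER), a short sketch of how an EER can be estimated from verification scores may help. It assumes higher scores mean "same speaker" and uses a simple threshold sweep over the observed scores; the authors' evaluation tooling is not described in the abstract, and the score lists below are made up.

```python
def equal_error_rate(genuine, impostor):
    """Estimate the EER: the operating point where FAR and FRR are closest.

    genuine  -- scores of trials where the claimed identity is correct
    impostor -- scores of trials where the claimed identity is wrong
    """
    thresholds = sorted(set(genuine) | set(impostor))
    best_gap, best_eer = None, None
    for t in thresholds:
        # A trial is accepted when its score is at or above the threshold.
        far = sum(s >= t for s in impostor) / len(impostor)  # false acceptance rate
        frr = sum(s < t for s in genuine) / len(genuine)     # false rejection rate
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2.0
    return best_eer


if __name__ == "__main__":
    genuine_scores = [0.9, 0.8, 0.7, 0.4]
    impostor_scores = [0.6, 0.3, 0.2, 0.1]
    print(equal_error_rate(genuine_scores, impostor_scores))
```

With few trials the two error curves may never cross exactly, which is why the sketch averages FAR and FRR at the closest point rather than interpolating.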

Paper Nr: 20
Title:

Real-time 3D Object Detection from Point Clouds using an RGB-D Camera

Authors:

Ya Wang, Shu Xu and Andreas Zell

Abstract: This paper aims at real-time, high-accuracy 3D object detection from point clouds for both indoor and outdoor scenes using only a single RGB-D camera. We propose a new network system that combines 2D and 3D object detection algorithms to achieve better real-time object detection results, and we simplify our networks to run faster on real robots. YOLOv3 is one of the state-of-the-art object detection methods based on 2D images. Frustum PointNets is a real-time method using frustum constraints to predict a 3D bounding box of an object. Combining these two approaches can be efficient for real-time 2D-3D object detection, both indoor and outdoor. We not only obtain improved training and evaluation accuracy and lower mean loss on the KITTI object detection benchmark, but also achieve better average precision (AP) on 3D detection of all classes at three different levels of difficulty. In addition, we implement our system for on-board real-time 2D and 3D object detection using only an RGB-D camera on three different hardware devices.
Download

Paper Nr: 24
Title:

Supervised Machine Learning and Feature Selection for a Document Analysis Application

Authors:

James Pope, Daniel Powers, J. A. (Jim) Connell, Milad Jasemi, David Taylor and Xenofon Fafoutis

Abstract: Over the past three decades large amounts of information have been converted to image formats from paper documents. Though in digital form, extracting the information, usually textual, from these documents requires complex image processing and optical character recognition techniques. The processing pipeline from the image to information typically includes an orientation correction task, a document identification task, and a text analysis task. When there are many document variants, the tasks become difficult, requiring complex subanalysis for each variant, and quickly exceed human capability. In this work, we demonstrate a document analysis application for a large, international airline, with the orientation correction and document identification tasks carried out by supervised machine learning techniques. The documents have been amassed over forty years with numerous variants; they are mostly black and white, typically consist of text and lines, and some have extensive noise. Low-level symbols are extracted from the raw images and separated into partitions. The partitions are used to generate statistical features which are then used to train the classifiers. We compare the classifiers for each task (e.g. decision tree, support vector machine, and random forest) to choose the most appropriate. We also perform feature selection to reduce the complexity of the document type classifiers. These parsimonious models result in comparable accuracy with 80% or fewer features.
Download

Paper Nr: 35
Title:

A Method to Identify the Cause of Misrecognition for Offline Handwritten Japanese Character Recognition using Deep Learning

Authors:

Keiji Gyohten, Hidehiro Ohki and Toshiya Takami

Abstract: In this research, we propose a method to identify the cause of misrecognition in offline handwritten character recognition using a convolutional neural network (CNN). In our method, the CNN learns not only character images augmented by applying an image processing method, but also those generated from character models with stroke structures. Using these character models, the proposed method can generate character images which lack one stroke. By learning the augmented character images lacking a stroke, the CNN can identify the presence of each stroke in the characters to be recognized. Subsequently, by adding dense layers to the final layer and learning the character images, obtaining the CNN for the offline handwritten character recognition becomes possible. The obtained CNN has nodes that can represent the presence of the strokes and can identify which strokes are the cause of misrecognition. The effectiveness of the proposed method is confirmed from character recognition experiments targeting 440 types of Japanese characters.
Download

Paper Nr: 45
Title:

Detection System of Gram Types for Bacteria from Gram Stained Smears Images

Authors:

Ryosuke Iida, Kazuki Hashimoto, Kouich Hirata, Kimiko Matsuoka and Shigeki Yokoyama

Abstract: In this paper, we develop a detection system for the Gram types of bacteria, determined by stained colors and stained shapes, from Gram stained smear images. Here, we call four types of bacteria, that is, Gram positive cocci (GPC), Gram positive bacilli (GPB), Gram negative cocci (GNC) and Gram negative bacilli (GNB), Gram types, and then add two further types, Gram positive unknown (GPU) and Gram negative unknown (GNU). The system first infers the candidate regions of bacteria by using image processing. Next, it constructs a classifier dividing the candidate regions into Gram types by using SVM (support vector machine) and DNN (deep neural network). Finally, it detects the occurrences of Gram types in a newly input image and retrieves Gram stained smear images similar to the input image, in the sense that the occurrence ratio of the Gram types is similar.
Download

Paper Nr: 48
Title:

Deep Learning Approach to Diabetic Retinopathy Detection

Authors:

Borys Tymchenko, Philip Marchenko and Dmitry Spodarets

Abstract: Diabetic retinopathy is one of the most threatening complications of diabetes and leads to permanent blindness if left untreated. One of the essential challenges is early detection, which is very important for treatment success. Unfortunately, the exact identification of the diabetic retinopathy stage is notoriously tricky and requires expert human interpretation of fundus images. Simplification of the detection step is crucial and can help millions of people. Convolutional neural networks (CNNs) have been successfully applied in many adjacent subjects, and for the diagnosis of diabetic retinopathy itself. However, the high cost of big labeled datasets, as well as inconsistency between different doctors, impedes the performance of these methods. In this paper, we propose an automatic deep-learning-based method for stage detection of diabetic retinopathy from a single photograph of the human fundus. Additionally, we propose a multistage approach to transfer learning, which makes use of similar datasets with different labeling. The presented method can be used as a screening method for early detection of diabetic retinopathy with sensitivity and specificity of 0.99 and is ranked 54 of 2943 competing methods (quadratic weighted kappa score of 0.925466) on the APTOS 2019 Blindness Detection Dataset (13000 images).
Download
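
The quadratic weighted kappa score quoted in the abstract above measures agreement between predicted and reference ordinal grades, penalizing disagreements by the squared distance between the grades. A minimal sketch of the standard formula follows; the function name and the toy ratings are illustrative, and labels are assumed to be integers from 0 to n_classes - 1.

```python
def quadratic_weighted_kappa(rater_a, rater_b, n_classes):
    """1 - sum(w * O) / sum(w * E) with w[i][j] = (i - j)^2 / (n_classes - 1)^2."""
    if n_classes < 2 or len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equally long, non-empty rating lists and >= 2 classes")
    n = len(rater_a)
    # Observed agreement matrix O[i][j]: count of items rated i by A and j by B.
    observed = [[0] * n_classes for _ in range(n_classes)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)]
    numerator = 0.0
    denominator = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            weight = (i - j) ** 2 / (n_classes - 1) ** 2
            expected = hist_a[i] * hist_b[j] / n  # agreement expected by chance
            numerator += weight * observed[i][j]
            denominator += weight * expected
    if denominator == 0.0:
        # Both raters used a single identical grade; treated here as full agreement.
        return 1.0
    return 1.0 - numerator / denominator


if __name__ == "__main__":
    print(quadratic_weighted_kappa([0, 1, 2, 3, 4], [0, 1, 2, 3, 3], n_classes=5))
```

A value of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean systematic disagreement.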

Paper Nr: 49
Title:

Using DICOM Tags for Clustering Medical Radiology Images into Visually Similar Groups

Authors:

Teo Manojlović, Dino Ilić, Damir Miletić and Ivan Štajduhar

Abstract: The data stored in a Picture Archiving and Communication System (PACS) of a clinical centre normally consists of medical images recorded from patients using select imaging techniques, and stored metadata information concerning the details on the conducted diagnostic procedures - the latter being commonly stored using the Digital Imaging and Communications in Medicine (DICOM) standard. In this work, we explore the possibility of utilising DICOM tags for automatic annotation of PACS databases, using K-medoids clustering. We gather and analyse DICOM data of medical radiology images available as a part of the RadiologyNet database, which was built in 2017, and originates from the Clinical Hospital Centre Rijeka, Croatia. Following data preprocessing, we used K-medoids clustering for multiple values of K, and we chose the most appropriate number of clusters based on the silhouette score. Next, for evaluating the clustering performance with regard to the visual similarity of images, we trained an autoencoder from a non-overlapping set of images. That way, we estimated the visual similarity of pixel data clustered by DICOM tags. Paired t-test (p < 0.001) suggests a significant difference between the mean distance from cluster centres of images clustered by DICOM tags, and randomly-permuted cluster labels.
Download
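
The model-selection step in the abstract above, choosing the number of clusters by silhouette score, can be sketched as follows. The sketch only computes and compares silhouette scores for candidate labelings that are assumed to come from K-medoids runs; the tag tuples, the mismatch-count distance, and the candidate labelings are hypothetical toy data, not taken from RadiologyNet.

```python
def mismatch_distance(tags_a, tags_b):
    """Number of tag positions whose values differ (a simple categorical distance)."""
    return sum(1 for a, b in zip(tags_a, tags_b) if a != b)


def mean_silhouette(points, labels, distance):
    """Mean of (b - a) / max(a, b) over all points; singleton clusters contribute 0."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(distance(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(distance(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Toy records standing in for (modality, body part, orientation) tags.
records = [
    ("CT", "HEAD", "AXIAL"), ("CT", "HEAD", "AXIAL"), ("CT", "HEAD", "CORONAL"),
    ("MR", "KNEE", "SAGITTAL"), ("MR", "KNEE", "SAGITTAL"), ("MR", "KNEE", "AXIAL"),
]
# Candidate labelings per K, assumed to be outputs of separate K-medoids runs.
candidates = {2: [0, 0, 0, 1, 1, 1], 3: [0, 0, 2, 1, 1, 1]}
scores = {k: mean_silhouette(records, labs, mismatch_distance) for k, labs in candidates.items()}
best_k = max(scores, key=scores.get)

if __name__ == "__main__":
    print(scores, best_k)
```

Because the silhouette only needs pairwise distances, it pairs naturally with K-medoids, which likewise works on an arbitrary distance rather than on coordinate means.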

Paper Nr: 51
Title:

A Triplet-learnt Coarse-to-Fine Reranking for Vehicle Re-identification

Authors:

Efklidis Katsaros, Henri Bouma, Arthur van Rooijen and Elise Dusseldorp

Abstract: Vehicle re-identification refers to the task of matching the same query vehicle across non-overlapping cameras and diverse viewpoints. Research interest in the field emerged with intelligent transportation systems and the necessity for public security maintenance. Compared to person re-identification, vehicle re-identification is more intricate, facing the challenges of lower intra-class and higher inter-class similarities. Motivated by deep metric learning advances, we propose a novel, triplet-learnt coarse-to-fine reranking scheme (C2F-TriRe) to address vehicle re-identification. Coarse vehicle features provide the baseline ranking. Thereafter, a fully connected network maps features to viewpoints. Simultaneously, windshields are detected and the respective fine features are extracted to capture custom vehicle characteristics. Conditional on the viewpoint, coarse and fine features are combined to yield a robust reranking. The proposed scheme achieves state-of-the-art performance on the VehicleID dataset and outperforms our baselines by a large margin.

Paper Nr: 53
Title:

New Commercial Representation for Cattle Information Gathering

Authors:

Jorge Navarro, Isaac Martín de Diego, Karen Príncipe-Aguirre and María J. Algar

Abstract: As the development of Wireless Sensor Networks improves, new applications of Internet of Things are emerging in sectors as diverse as military, environmental, health or food. In many of these applications, the autonomy of the devices is an essential element in order to make reasonable use of them. For the cattle domain, there is a need for an efficient use of energy by sending few messages that accumulate as much information as possible. This paper proposes a new strategy for sending summarized information from devices that are commercially used in cattle to analyze animal behavior. Experiments using 120 different daily time series related to animal behavior have been performed. The obtained results show that the proposed strategy highly improves the current operation mode of the equipment.

Paper Nr: 66
Title:

Simultaneous Object Detection and Semantic Segmentation

Authors:

Niels O. Salscheider

Abstract: Both object detection in and semantic segmentation of camera images are important tasks for automated vehicles. Object detection is necessary so that the planning and behavior modules can reason about other road users. Semantic segmentation provides, for example, free space information and information about static and dynamic parts of the environment. There has been a lot of research on solving both tasks using Convolutional Neural Networks. These approaches give good results but are computationally demanding. In practice, a compromise has to be found between detection performance, detection quality and the number of tasks; otherwise, it is not possible to meet the real-time requirements of automated vehicles. In this work, we propose a neural network architecture that solves both tasks simultaneously. This architecture was designed to run at around 10 Hz on 1 MP images on current hardware. Our approach achieves a mean IoU of 61.2% for the semantic segmentation task on the challenging Cityscapes benchmark. It also achieves an average precision of 69.3% for cars and 67.7% for pedestrians on the moderate difficulty level of the KITTI benchmark.
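For readers unfamiliar with the segmentation metric quoted in this abstract, mean IoU is the per-class intersection over union averaged over classes. The sketch below is a minimal illustration on toy label arrays; the function name `mean_iou` is hypothetical, and official benchmark scripts typically accumulate counts over the whole test set and ignore certain labels, which this sketch does not attempt to reproduce.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Class-averaged intersection over union for integer label maps."""
    pred = np.asarray(pred, dtype=np.int64).ravel()
    target = np.asarray(target, dtype=np.int64).ravel()
    # confusion[i, j] = number of pixels with true class i predicted as j
    confusion = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)
    intersection = np.diag(confusion)
    union = confusion.sum(axis=0) + confusion.sum(axis=1) - intersection
    valid = union > 0  # skip classes absent from both prediction and target
    return float((intersection[valid] / union[valid]).mean())

# Toy example: four pixels, two classes.
print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

In the toy example, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.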

Paper Nr: 72
Title:

Guidelines for Effective Automatic Multiple Sclerosis Lesion Segmentation by Magnetic Resonance Imaging

Authors:

Giuseppe Placidi, Luigi Cinque and Matteo Polsinelli

Abstract: General constraints for automatic identification/segmentation of multiple sclerosis (MS) lesions by Magnetic Resonance Imaging (MRI) are discussed and guidelines for effective training of a supervised technique are presented. In particular, system generalizability to different imaging sequences and scanners from different manufacturers, misalignment between images from different modalities, and subjectivity in generating labelled images are indicated as the main limitations to high-accuracy automatic MS lesion identification/segmentation. A convolutional neural network (CNN) based method is applied following the suggested guidelines, and preliminary results demonstrate the improvements. The method has been trained, validated and tested on publicly available labelled MRI datasets. Future developments and perspectives are also presented.

Paper Nr: 74
Title:

Device-based Image Matching with Similarity Learning by Convolutional Neural Networks that Exploit the Underlying Camera Sensor Pattern Noise

Authors:

Guru S. Bennabhaktula, Enrique Alegre, Dimka Karastoyanova and George Azzopardi

Abstract: One of the challenging problems in digital image forensics is identifying images that were captured by the same camera device. This knowledge can help forensic experts gather intelligence about suspects by analyzing digital images. In this paper, we propose a two-part network to quantify the likelihood that a given pair of images has the same source camera, and we evaluate it on the benchmark Dresden data set containing 1851 images from 31 different cameras. To the best of our knowledge, we are the first to address the challenge of device-based image matching. Though the proposed approach is not yet forensics-ready, our experiments show that this direction is worth pursuing, currently achieving 85 percent accuracy. This ongoing work is part of the EU-funded project 4NSEEK concerned with forensics against child sexual abuse.

Paper Nr: 77
Title:

Twitter Topic Progress Visualization using Micro-clustering

Authors:

Takako Hashimoto, Akira Kusaba, Dave Shepard, Tetsuji Kuboyama, Kilho Shin and Takeaki Uno

Abstract: This paper proposes a method for visualizing the progress of a bursty topic on Twitter using a previously-proposed micro-clustering technique, which reveals the cause and the progress of a burst. Micro-clustering can efficiently represent sub-topics of a bursty topic, which allows visualizing transitions between these subtopics over time. This process allows for a Twitter user to see the origin of a bursty topic more easily. To show the method’s effectiveness, we conducted an experiment on a real bursty topic, a controversy over childcare leave in Japan. When we extract sub-topics using micro-clustering, and analyze micro-clusters over time, we can understand the progress of the target topic and discover the micro-clusters that caused the burst.

Paper Nr: 94
Title:

Improving Dialogue Smoothing with A-priori State Pruning

Authors:

Manex Serras, María I. Torres and Arantza D. Pozo

Abstract: When Dialogue Systems (DS) face real usage, a challenge to solve is managing unforeseen situations without breaking the coherence of the dialogue. One way to achieve this is by redirecting the interaction to known dialogue states in a transparent way. This work proposes a simple a-priori pruning method to rule out invalid candidates when searching for similar dialogue states in unexpected scenarios. The proposed method is evaluated on a User Model (UM) based on Attributed Probabilistic Finite State Bi-Automata (A-PFSBA), trained on the Dialogue State Tracking Challenge 2 (DSTC2) corpus. Results show that the proposed technique improves response times and achieves higher F1 scores than previous A-PFSBA implementations and deep learning models.

Paper Nr: 101
Title:

Network of Steel: Neural Font Style Transfer from Heavy Metal to Corporate Logos

Authors:

Aram Ter-Sarkisov

Abstract: We introduce a method for transferring style from the logos of heavy metal bands onto corporate logos using a VGG16 network. We establish the contribution of different layers and loss coefficients to the learning of style, minimization of artefacts and maintenance of readability of corporate logos. We find layers and loss coefficients that produce a good tradeoff between heavy metal style and corporate logo readability. This is the first step both towards sparse font style transfer and corporate logo decoration using generative networks. Heavy metal and corporate logos are very different artistically, in the way they emphasize emotions and readability, therefore training a model to fuse the two is an interesting problem.

Paper Nr: 103
Title:

Detection of Privacy Disclosure in the Medical Domain: A Survey

Authors:

Bianca Buff, Joschka Kersting and Michaela Geierhos

Abstract: With increasing digitization in the health care domain, privacy has become a relevant topic. This relates, for instance, to patient data, electronic health records or physician reviews published online. There exist different approaches to the protection of individuals’ privacy, which focus on the anonymization and masking of personal information subsequent to its mining. In the medical domain in particular, measures to protect the privacy of patients are of high importance due to the amount of sensitive data that is involved (e.g. age, gender, illnesses, medication). While privacy breaches in structured data can be detected more easily, disclosure in written texts is more difficult to find automatically due to the unstructured nature of natural language. Therefore, we take a detailed look at existing research in areas related to privacy protection. Likewise, we review approaches to the automatic detection of privacy disclosure in different types of medical data. We provide a survey of several studies concerned with privacy breaches in the medical domain, with a focus on Physician Review Websites (PRWs). Finally, we briefly develop implications and directions for further research.

Paper Nr: 107
Title:

Democratization of Artificial Intelligence (AI) to Small Scale Farmers: A Framework to Deploy AI Models to Tiny IoT Edges That Operate in Constrained Environments

Authors:

Chandrasekar Vuppalapati, Anitha Ilapakurti, Sharat Kedari, Jaya Vuppalapati, Santosh Kedari and Raja Vuppalapati

Abstract: Big Data surrounds us. Every minute, our smartphones collect huge amounts of data, from geolocations to the next clickable item on an e-commerce site. Data has become one of the most important commodities for individuals and companies. Nevertheless, this data revolution has not touched every economic sector, especially rural economies; for example, small farmers in developing countries have been largely passed over by the data revolution due to infrastructure- and compute-constrained environments. Not only is this a huge missed opportunity for big data companies, it is also one of the significant obstacles on the path towards sustainable food and a huge inhibitor to closing economic disparities. The purpose of this paper is to develop a framework for deploying artificial intelligence models in constrained compute environments, enabling remote rural areas and small farmers to join the data revolution, start contributing to the digital economy, and empower the world through data to create sustainable food for our collective future.

Paper Nr: 5
Title:

Person Identification based on Physiological Signals: Conditions and Risks

Authors:

Peter Bellmann, Patrick Thiam and Friedhelm Schwenker

Abstract: Person identification is usually based on video signals, DNA samples or fingerprints. In this study, we want to show the effectiveness of other physiological signals for person identification. For this purpose, we evaluate different settings with the SenseEmotion Database. The data set was initially collected for research purposes in the fields of emotion and pain intensity recognition. However, we use the multi-modality of this database to evaluate the effectiveness of different physiological signals, such as heart activity or skin conductance, for person identification purposes. It is almost impossible for human beings to identify persons by evaluating a set of different fingerprints. Machine learning methods usually outperform humans in those tasks in both operation time and accuracy. In our study, we show that basic pattern recognition models can be used to identify human beings based on physiological signals. However, our outcomes show that person identification based on physiological signals must be treated with caution. Specifically, our results indicate that it is essential to include physiological signals from different recording sessions to ensure the generalisation ability of the classification model for the person identification task.

Paper Nr: 10
Title:

Reinforcement Learning of Robot Behavior based on a Digital Twin

Authors:

Tobias Hassel and Oliver Hofmann

Abstract: A reinforcement learning approach using a physical robot is cumbersome and expensive. Repetitive execution of actions in order to learn from success and failure requires time and money. In addition, misbehaviour of the robot may also damage or destroy the test bed. Therefore, a digital twin of a physical robot has been used in our research work to train a model within the simulation environment Unity. Later on, the trained model has been transferred to a real-world scenario and used to control a physical agent.

Paper Nr: 18
Title:

Japanese Cursive Character Recognition for Efficient Transcription

Authors:

Kazuya Ueki and Tomoka Kojima

Abstract: We conducted detailed experiments on Japanese cursive character recognition to promote the transcription and digitization of Japanese historical documents, using a publicly available kuzushiji dataset released by the Center for Open Data in the Humanities (CODH). Using deep learning, we analyzed the causes of recognition difficulties through a recognition experiment on over 1,500 classes of kuzushiji characters. Furthermore, assuming actual transcription conditions, we introduced a method to automatically determine which characters should be held for judgment by identifying difficult-to-recognize characters or characters that were not used during training. As a result, we confirmed that a classification rate of more than 90% could be achieved by narrowing down the characters to be classified, even when a recognition model with a classification rate of 73.10% was used. This function could improve transcribers’ ability to judge correctness in post-processing from context, namely the previous and subsequent characters.

Paper Nr: 25
Title:

Loads Estimation using Deep Learning Techniques in Consumer Washing Machines

Authors:

Alexander Babichev, Vittorio Casagrande, Luca Della Schiava, Gianfranco Fenu, Imola Fodor, Enrico Marson, Felice Andrea Pellegrino, Gilberto Pin, Erica Salvato, Michele Toppano and Davide Zorzenon

Abstract: Home appliances are nowadays present in every house. In order to ensure a suitable level of maintenance, manufacturers strive to design methods to estimate the wear of the individual electrical parts composing an appliance without equipping it with a large number of expensive sensors. With this in mind, our goal is to infer the status of the electrical actuators of a washing machine given measurements of the electrical signals at the plug, which carry aggregate information. The approach is end-to-end, i.e. it does not require any feature extraction, and thus it can be easily generalized to other appliances. Two different techniques have been investigated: Convolutional Neural Networks and Long Short-Term Memory networks. These models are trained and tested on data collected from four different washing machines.

Paper Nr: 26
Title:

Sclera Segmentation using Spatial Kernel Fuzzy Clustering Methods

Authors:

M. S. Maheshan, B. S. Harish and S. A. Kumar

Abstract: Biometrics is a domain that is gaining a lot of importance in the present digital industry. Biometrics are being integrated into different devices and are reaching end users at a very affordable cost. Among the various biometric traits, the sclera is becoming popular in the research community for its distinctiveness in authenticating and identifying individuals. A recognition system using the sclera trait depends on efficient segmentation of the sclera image. Segmentation is considered a significant step in an image processing system because it enables better visualization. Segmentation can be done using region-based, edge-based, threshold-based and clustering-based techniques. This paper concentrates on clustering-based techniques by proposing a variant of the conventional Fuzzy C Means (FCM) algorithm. Although Fuzzy C Means presents outstanding results in many applications, it is sensitive to noise and ignores neighbourhood information. To alleviate these limitations, this paper presents Generalized Spatial Kernel Fuzzy C Means (GSK-FCM) clustering algorithms for sclera segmentation. To evaluate the proposed methods, experiments are conducted on the Sclera Segmentation and Recognition Benchmarking Competition (SSRBC 2015) dataset. The results of the experiments reveal that the proposed methods outperform the other variants of FCM.
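As background for the baseline this abstract builds on, the conventional FCM algorithm alternates between updating cluster centres and updating soft memberships. The sketch below shows only that standard baseline, not the GSK-FCM variant proposed in the paper (the abstract does not give its update rules). The function name `fcm`, the fuzzifier `m = 2`, and the toy intensity values are assumptions chosen for illustration.

```python
import numpy as np

def fcm(X, n_clusters, m=2.0, n_iter=100, seed=0):
    """Conventional Fuzzy C Means on the rows of X (shape: n_samples x dim)."""
    rng = np.random.default_rng(seed)
    # Random initial memberships; each row sums to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Centres are membership-weighted means of the data.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Distances to each centre; the small constant avoids division by zero.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        # Membership update: inversely proportional to dist^(2 / (m - 1)).
        inv = dist ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return centers, U

# Toy 1-D "intensities" forming two groups, standing in for pixel values.
X = np.array([0.0, 0.1, -0.1, 0.2, -0.2, 10.0, 10.1, 9.9, 10.2, 9.8]).reshape(-1, 1)
centers, U = fcm(X, n_clusters=2)
print(np.sort(centers.ravel()))
```

Because each pixel is treated independently here, a single noisy pixel can receive a membership that disagrees with all of its neighbours, which is the limitation the abstract says the spatial variant is meant to address.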

Paper Nr: 29
Title:

Using Automatic Features for Text-image Classification in Amharic Documents

Authors:

Birhanu Belay, Tewodros Habtegebrial, Gebeyehu Belay and Didier Stricker

Abstract: In many documents, ranging from historical to modern archived documents, handwritten and machine-printed texts may coexist in the same document image, raising significant issues within the recognition process and affecting the performance of OCR applications. It is therefore necessary to discriminate between the two types of text so that the appropriate recognition technique can be applied to each. Inspired by the recent successes of CNN-based features in pattern recognition, in this paper we propose a method that can discriminate handwritten from machine-printed text lines in Amharic document images. In addition, we demonstrate the effect of replacing the last fully connected layer with a binary support vector machine, which minimizes a margin-based loss instead of the cross-entropy loss. Based on the results observed during experimentation, using the binary SVM gives significantly better discrimination performance than the fully connected layers.

Paper Nr: 38
Title:

Hate Speech Detection using Word Embedding and Deep Learning in the Arabic Language Context

Authors:

Hossam Faris, Ibrahim Aljarah, Maria Habib and Pedro A. Castillo

Abstract: Hate speech on online social networks is a worldwide problem that diminishes the cohesion of civil societies. The rapid spread of social media websites has been accompanied by an increasing number of social media users and, with it, a higher rate of hate speech. The objective of this paper is to propose a smart deep learning approach for the automatic detection of cyber hate speech, in particular hate speech on Twitter in the Arabic region. To this end, a dataset that captures hate expressions on different topics in the Arabic region is collected from Twitter. A set of features is extracted from the dataset using a word embedding mechanism, and the word embeddings are fed into a deep learning framework. The implemented deep learning approach is a hybrid of a convolutional neural network (CNN) and a long short-term memory (LSTM) network. The proposed approach achieved good results in classifying tweets as Hate or Normal in terms of accuracy, precision, recall, and F1 measure.

Paper Nr: 41
Title:

FotonNet: A Hardware-efficient Object Detection System using 3D-depth Segmentation and 2D-deep Neural Network Classifier

Authors:

Gurjeet Singh, Sunmiao, Shi Shi and Patrick Chiang

Abstract: Object detection and classification is one of the most crucial computer vision problems. Ever since the introduction of deep learning, we have witnessed a dramatic increase in the accuracy of object detection. However, most of these improvements have occurred using conventional 2D image processing. Recently, low-cost 3D image sensors, such as the Microsoft Kinect (Time-of-Flight) or the Apple FaceID (Structured-Light), can provide 3D-depth or point cloud data that can be added to a convolutional neural network, acting as an extra set of dimensions. We propose a hardware-based approach for object detection that moves region-of-interest identification closer to the sensor node in hardware. With this approach, we do not need a large dataset of depth images to retrain the network. Our 2D + 3D system uses the 3D data to determine the object region, followed by any conventional 2D-DNN, such as AlexNet. In this way, our approach can readily separate the information collected from the point cloud and the 2D image data and combine both operations later. Hence, our system can use any existing 2D network trained on a large image dataset and does not require a large 3D-depth dataset for new training. Experimental object detection results across 30 images show an accuracy of 0.67, compared with 0.54 and 0.51 for FasterRCNN and YOLO, respectively.

Paper Nr: 44
Title:

Detecting Geckler Classification from Gram Stained Smears Images for Sputum

Authors:

Kazuki Hashimoto, Ryosuke Iida, Kouich Hirata, Kimiko Matsuoka and Shigeki Yokoyama

Abstract: The Geckler classification is a criterion for assessing the quality of a sputum smear image based on the number of buccal squamous epithelial (BSE) cells and leukocytes in Gram-stained smear images per 100× field. The Geckler classification then determines which images are worth microscopic examination of the Gram-stained sputum smear images per 1,000× field. In this paper, we develop a system to detect the Geckler classification from Gram-stained sputum smear images per 100× field. In this system, we first detect the regions of BSE cells and leukocytes and then construct a classifier of the BSE cells and leukocytes using SVM and DNN. We then determine the Geckler class of every test image by detecting the candidate regions and applying the classifier.

Paper Nr: 46
Title:

Mosaic Images Segmentation using U-net

Authors:

Gianfranco Fenu, Eric Medvet, Daniele Panfilo and Felice A. Pellegrino

Abstract: We consider the task of segmenting images of mosaics, where the goal is to segment the image in such a way that each region corresponds to exactly one tile of the mosaic. We propose to use a recent deep learning technique based on a kind of convolutional neural network, called U-net, that has proved to be effective in segmentation tasks. Our method includes a preprocessing phase that makes it possible to train a U-net despite the scarcity of labeled data, which reflects the peculiarity of the task, in which manual annotation is, in general, costly. We experimentally evaluate our method and compare it against the few other methods for mosaic image segmentation using a set of performance indexes, previously proposed for this task, computed on 11 images of real mosaics. In our results, U-net compares favorably with previous methods. Interestingly, the considered methods make errors of different kinds, consistent with the fact that they are based on different assumptions and techniques. This finding suggests that combining different approaches might lead to an even more effective segmentation.

Paper Nr: 47
Title:

2D Orientation and Grasp Point Computation for Bin Picking in Overhaul Processes

Authors:

Sajjad Taheritanjani, Juan Haladjian, Thomas Neumaier, Zardosht Hodaie and Bernd Bruegge

Abstract: During industrial overhauling processes, several small parts and fasteners must be sorted and packed into different containers for reuse. Most industrial bin picking solutions use either a CAD model of the objects for comparison with the obtained 3D point clouds or complementary approaches, such as stereo cameras and laser sensors. However, obtaining CAD models may not be feasible for all types of small parts. In addition, industrial small parts have characteristics (e.g., light reflections in ambient light) that make the picking task more challenging, even when using laser sensors and stereo cameras. In this paper, we propose an approach that addresses these problems by automatically segmenting small parts, classifying their orientation and obtaining a grasp point using 2D images. The proposed approach obtained a segmentation accuracy of 80% by applying a Mask R-CNN model trained on 10 annotated images. Moreover, it computes the orientation and grasp point of the pickable objects using Mask R-CNN or a combination of PCA and image moments. The proposed approach is a first step towards an automated bin picking system in overhaul processes that reduces costs and time by segmenting pickable small parts to be picked by a robot.
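The PCA and image-moment route mentioned in this abstract can be illustrated on a binary segmentation mask: the centroid comes from the first-order moments and the in-plane orientation from the second-order central moments, which is equivalent to taking the principal axis of the pixel coordinates. The sketch below is an assumption-laden illustration, not the authors' pipeline: the function name is hypothetical, and using the centroid as the grasp point is a simplification chosen here.

```python
import numpy as np

def orientation_and_centroid(mask):
    """Centroid (x, y) and principal-axis angle (radians) of a binary mask.

    The angle is measured from the image x-axis in image coordinates,
    where y grows downwards.
    """
    ys, xs = np.nonzero(mask)
    cx, cy = xs.mean(), ys.mean()
    dx, dy = xs - cx, ys - cy
    # Second-order central moments (entries of the coordinate covariance matrix).
    mu20, mu02, mu11 = (dx ** 2).mean(), (dy ** 2).mean(), (dx * dy).mean()
    theta = 0.5 * np.arctan2(2.0 * mu11, mu20 - mu02)
    return (float(cx), float(cy)), float(theta)

# Toy masks: a horizontal bar and a diagonal line.
bar = np.zeros((11, 11), dtype=bool)
bar[5, 2:9] = True
diag = np.eye(10, dtype=bool)

print(orientation_and_centroid(bar))   # centroid (5, 5), angle 0
print(orientation_and_centroid(diag))  # centroid (4.5, 4.5), angle pi/4
```

For parts whose centroid falls outside the object (for example washers or hooks), the centroid would not be a usable grasp point and a different rule would be needed.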

Paper Nr: 58
Title:

Identification of Sustainable Locations in Pigeon Flights using Flow Simulation Method

Authors:

Margarita Zaleshina and Alexander Zaleshin

Abstract: Navigation behaviour in nature is based on data obtained from perception of the terrain where movement occurs. The aim of this work is to study the influence of visual factors on the flight of birds over medium distances (about 10 km). In this study, we propose a method for probabilistic analysis of pigeon flights over combined countryside and urban terrain, based on surface flow simulation. Z-value – an altitude analogue that describes the characteristic gradient of the flow – is calculated as a function of "landscape complexity" based on the density of significant landscape objects. The calculated probabilistic model is compared with data on GPS-tracks of untrained and trained pigeons. As a result, significant features of terrain that determine sustainable locations in pigeon flights are identified. In the study, visual characteristics of the territories over which pigeons flew are calculated using remote sensing data from open sources, and spatial data are processed using the geographical information system QGIS.

Paper Nr: 60
Title:

Activity Mining in a Smart Home from Sequential and Temporal Databases

Authors:

Josky Aízan, Cina Motamed and Eugene C. Ezin

Abstract: In this paper, we apply sequential pattern mining from temporal databases to learn activities in a smart home. Pre-processing is first conducted on the sensor data, taking into account the timestamps of sensor events. We then extract typical activities using a sequential pattern mining algorithm. In order to perform activity recognition, features are extracted and activities are modeled. Experiments are carried out on the Massachusetts Institute of Technology (MIT) smart home data set. The results show the effectiveness of the proposed approach, with a recognition rate of 99%.

Paper Nr: 65
Title:

Stairway to Elders: Bridging Space, Time and Emotions in Their Social Environment for Wellbeing

Authors:

Giuseppe Boccignone, Claudio de’Sperati, Marco Granato, Giuliano Grossi, Raffaella Lanzarotti, Nicoletta Noceti and Francesca Odone

Abstract: The physical and mental health of the elderly population is an emerging issue which in recent years has become an urgent socio-economic phenomenon. Computer scientists, together with physicians and caregivers, have devoted a great research effort to conceiving and devising assistive technologies aimed at safeguarding elders’ health, while only marginal consideration has been given to their emotional domain. In this manuscript we outline the research plan and the objectives of a current project called “Stairway to elders: bridging space, time and emotions in their social environment for wellbeing”. Through a set of sensors, which include cameras and physiological sensors, we aim at developing computational methods for understanding the affective state and socialization attitude of older people in ecological conditions. A valuable by-product of the project will be the collection of a multi-modal dataset to be used for model design, which will be made available to the research community. The outcomes of the project should support the design of an environment which automatically (or semi-automatically) adapts its conditions to the affective state of older people, with a consequent improvement of their quality of life.

Paper Nr: 71
Title:

Deep Learning Techniques for Dragonfly Action Recognition

Authors:

Martina Monaci, Niccolò Pancino, Paolo Andreini, Simone Bonechi, Pietro Bongini, Alberto Rossi, Giorgio Ciano, Giorgia Giacomini, Franco Scarselli and Monica Bianchini

Abstract: Anisoptera are a suborder of insects belonging to the order Odonata, commonly identified with the generic term dragonflies. They are characterized by a long and thin abdomen, two large eyes, and two pairs of transparent wings. Their ability to move the four wings independently allows dragonflies to fly forwards and backwards, to stop suddenly and to hover in mid-air, as well as to achieve high flight performance, with speeds of up to 50 km per hour. Thanks to these particular skills, many studies have been conducted on dragonflies, also using machine learning techniques. Some analyze the muscular movements of flight to simulate dragonflies as accurately as possible, while others try to reproduce the neuronal mechanisms of hunting dragonflies. The lack of a consistent database and the difficulties in creating valid tools for such complex tasks have significantly limited progress in the study of dragonflies. We provide two valuable results in this context: first, a dataset of carefully selected, pre-processed and labeled images, extracted from videos, has been released; second, some deep neural network models, namely CNNs and LSTMs, have been trained to accurately distinguish the different phases of dragonfly flight, with very promising results.

Paper Nr: 89
Title:

Tracking Handball Players with the DeepSORT Algorithm

Authors:

Kristina Host, Marina Ivašić-Kos and Miran Pobar

Abstract: In team sports scenes, such as in handball, it is common to have many players on the field performing different actions according to the rules of the game. During practice, each player has their own ball and sequentially repeats a particular technique in order to adopt it and use it. In this paper, the focus is on detecting and tracking all players on the handball court, so that the performance of a particular athlete and the adoption of a particular technique can be analyzed. This is a very demanding multiple object tracking task because players move fast, often change direction, and are often occluded or out of the camera’s field of view. We propose using the DeepSORT algorithm for player tracking after the players have been detected with the YOLOv3 object detector. The effectiveness of the proposed methods is evaluated on a custom set of handball scenes using standard multiple object tracking metrics. Common detection problems that have been observed are also discussed.

Paper Nr: 90
Title:

Frame Detection and Text Line Segmentation for Early Japanese Books Understanding

Authors:

Lyu Bing, Hiroyuki Tomiyama and Lin Meng

Abstract: Early Japanese books record a lot of information, and deciphering this ancient literature is very useful for research on history, politics, and culture. However, many early Japanese books have not yet been deciphered. In recent years, with the rapid development of artificial intelligence technology, researchers have aimed to recognize the characters in early Japanese books through deep learning in order to decipher the information recorded in them. However, this ancient literature is written in Kuzushi characters, which are difficult to recognize automatically because of their large number of variations and joined-up style. Furthermore, the frames of the articles and the tilt of the text lines make recognition more difficult. This paper introduces a deep learning method for recognizing the characters and proposes frame deletion and text line segmentation to aid the understanding of early Japanese books.

Paper Nr: 96
Title:

Analysing Risk of Coronary Heart Disease through Discriminative Neural Networks

Authors:

Ayush Khaneja, Siddharth Srivastava, Astha Rai, A. S. Cheema and P. K. Srivastava

Abstract: The application of data mining, machine learning and artificial intelligence techniques in the field of diagnostics is not a new concept, and these techniques have been very successfully applied in a variety of applications, especially in dermatology and cancer research. However, in medical problems that involve tests with a true or false result (binary classification), the data generally has a class imbalance, with samples mostly belonging to one class (e.g., a patient undergoes a regular test and the results are false). Such disparity in the data causes problems when trying to build predictive models on it. In critical applications like diagnostics, this class imbalance cannot be overlooked and must be given extra attention. In our research, we show how this class imbalance can be handled through neural networks, using a discriminative model with contrastive loss in a Siamese neural network structure. Such a model does not use a probability-based approach to classify samples into labels. Instead, it uses a distance-based approach to differentiate between samples classified under different labels.
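The distance-based idea in this abstract can be made concrete with the commonly used margin-based contrastive loss for Siamese networks: pairs with the same label are pulled together, and pairs with different labels are pushed apart until they are at least a margin away. The abstract does not state the exact formulation used, so the version below, the function name, the label convention (1 for same class) and the margin value are all assumptions.

```python
import numpy as np

def contrastive_loss(emb_a, emb_b, same_label, margin=1.0):
    """Margin-based contrastive loss over a batch of embedding pairs.

    emb_a, emb_b: arrays of shape (batch, dim) from the two Siamese branches.
    same_label:   array of shape (batch,), 1 if the pair shares a class, else 0.
    """
    emb_a = np.asarray(emb_a, dtype=float)
    emb_b = np.asarray(emb_b, dtype=float)
    same_label = np.asarray(same_label, dtype=float)
    d = np.linalg.norm(emb_a - emb_b, axis=1)  # Euclidean distance per pair
    pull = same_label * d ** 2                  # similar pairs: shrink distance
    push = (1.0 - same_label) * np.maximum(margin - d, 0.0) ** 2  # dissimilar: enforce margin
    return float(np.mean(pull + push) / 2.0)

# Two toy pairs: a "same class" pair at distance 5 and a
# "different class" pair at distance 1, with margin 2.
a = [[0.0, 0.0], [0.0, 0.0]]
b = [[3.0, 4.0], [0.0, 1.0]]
print(contrastive_loss(a, b, same_label=[1, 0], margin=2.0))
```

Here the per-pair terms are 25 and 1, so the loss is (25 + 1) / 2 / 2 = 6.5. At inference time, a sample could then be labelled by comparing its embedding distance to reference samples of each class, rather than by a softmax probability.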

Paper Nr: 105
Title:

Automatic Classification of French Spontaneous Oral Speech into Injunction and No-injunction Classes

Authors:

Abdenour Hacine-Gharbi and Philippe Ravier

Abstract: Injunctive values are of particular interest for many studies dealing with oral speech interactions, e.g. in automatic meaning processing or in the field of language pathology understanding and therapy. In this paper, we propose an automatic classification system using a subset of the RAVIOLI database in order to evaluate the role of prosody in the definition of injunctive values. RAVIOLI consists of more than 100 hours of massive, in-the-wild spontaneous oral speech. This work is a preliminary study that exploits a subset of 197 injunction values that have been labelled as exploitable utterances by two linguistic experts, augmented by 198 non-injunctive utterances. Many feature types were considered for this study: some classical features employed in the speech community for automatic speech recognition tasks (LPCC, MFCC and PLP with their associated dynamic features) and some prosodic features (pitch and energy, with their associated dynamic features). The results clearly show the importance of prosodic features for the classification of the utterances into injunction or no-injunction classes, and particularly the predominance of the log energy feature.
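
As a rough illustration of the log energy feature and its dynamic (delta) counterpart mentioned in this abstract, the following sketch computes both from a list of audio samples. The frame length, hop size, the small constant `eps` and the central-difference delta are assumptions chosen for simplicity; the paper's actual feature extraction settings are not stated in the abstract.

```python
import math


def frame_log_energy(samples, frame_len, hop, eps=1e-10):
    """Log energy of each full frame of `samples`.

    `eps` avoids log(0) on silent frames (an assumed convention).
    """
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energies.append(math.log(sum(x * x for x in frame) + eps))
    return energies


def delta(values):
    """Central-difference delta feature, repeating the edge values.

    This is a simplified stand-in for the regression-based deltas
    commonly used in speech front ends.
    """
    if not values:
        return []
    padded = [values[0]] + list(values) + [values[-1]]
    return [(padded[i + 2] - padded[i]) / 2.0 for i in range(len(values))]
```

A per-utterance feature vector could then be built from statistics of these two sequences, although the abstract does not say how the features are fed to the classifier.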

Paper Nr: 106
Title:

A Neural Information Retrieval Approach for Résumé Searching in a Recruitment Agency

Authors:

Brandon Grech and David Suda

Abstract: Finding résumés that match a job description can be a daunting task for a recruitment agency, because these agencies deal with hundreds of job descriptions and tens of thousands of résumés simultaneously. In this paper, we explain a search method devised for a recruitment agency that measures the similarity between résumé documents and job description documents. Document vectors are obtained by applying TF-IDF weights to word embeddings arising from a neural language model with a skip-gram loss function. We show that, with this approach, successful searches can be achieved, and that the number of skips assumed in the skip-gram loss function determines how successful the search can be for different job descriptions.
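
The search idea in this abstract can be sketched as follows, under stated assumptions. The abstract does not specify how the TF-IDF weights and word embeddings are combined, so this example assumes a TF-IDF-weighted average of word vectors followed by cosine similarity. All function names are hypothetical, the IDF smoothing is one of several common variants, and the embeddings are expected to be supplied by the caller (in the paper they come from a skip-gram model, which is not trained here).

```python
import math
from collections import Counter


def idf_weights(docs):
    """Inverse document frequency per token (log(N / df) + 1, an assumed variant)."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log(n / df[tok]) + 1.0 for tok in df}


def document_vector(tokens, embeddings, idf):
    """TF-IDF-weighted average of the embeddings of known tokens."""
    dim = len(next(iter(embeddings.values())))
    vec = [0.0] * dim
    total = 0.0
    for tok, count in Counter(t for t in tokens if t in embeddings).items():
        weight = count * idf.get(tok, 1.0)
        total += weight
        for i, x in enumerate(embeddings[tok]):
            vec[i] += weight * x
    return [v / total for v in vec] if total else vec


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_resumes(job_tokens, resumes, embeddings):
    """Return résumé names sorted from most to least similar to the job."""
    idf = idf_weights([job_tokens] + list(resumes.values()))
    job_vec = document_vector(job_tokens, embeddings, idf)
    scores = {
        name: cosine(job_vec, document_vector(tokens, embeddings, idf))
        for name, tokens in resumes.items()
    }
    return sorted(scores, key=scores.get, reverse=True)
```

With toy two-dimensional embeddings in which "python" and "java" point in one direction and "nurse" and "clinic" in another, a job description containing only "python" would rank a résumé listing "java" and "python" above one listing "nurse" and "clinic".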