ICPRAM 2018 Abstracts


Area 1 - Theory and Methods

Full Papers
Paper Nr: 5
Title:

Accurate and Fast Computation of Approximate Graph Edit Distance based on Graph Relabeling

Authors:

Sousuke Takami and Akihiro Inokuchi

Abstract: The graph edit distance, a well-known metric for determining the similarity between two graphs, is commonly used for analyzing large sets of structured data, such as those used in chemoinformatics, document analysis, and malware detection. As computing the exact graph edit distance is computationally expensive, and may be intractable for large-scale datasets, various approximation techniques have been developed. In this paper, we present a method based on graph relabeling that is both faster and more accurate than the conventional approach. We use unfolded subtrees to denote the potential relabeling of local structures around a given vertex. These subtree representations are concatenated as a vector, and the distance between different vectors is used to characterize the distance between the corresponding graphs. This avoids the need for multiple calculations of the exact graph edit distance between local structures. Simulation experiments on two real-world chemical datasets are reported. Compared with the conventional technique, the proposed method gives a more accurate approximation of the graph edit distance and is significantly faster on both datasets. This suggests the proposed method could be applicable in the analysis of larger and more complex graph-like datasets.
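The following is an illustrative Python sketch of the general relabeling idea described above: each vertex's local structure is summarized by iteratively relabeling it with its neighbors' labels, and two graphs are compared through the resulting label histograms. It is not the authors' unfolded-subtree construction; the node attribute name ("label"), the number of iterations, and the L1 histogram distance are assumptions.

```python
# Illustrative sketch (not the authors' exact algorithm): approximate graph
# dissimilarity by relabeling each vertex with its neighborhood labels and
# comparing the resulting label histograms, so no exact edit distance
# between local structures has to be computed.
from collections import Counter
import networkx as nx

def relabel_histogram(g, iterations=2):
    labels = {v: str(g.nodes[v]["label"]) for v in g}   # assumes a "label" node attribute
    hist = Counter(labels.values())
    for _ in range(iterations):
        labels = {
            v: labels[v] + "|" + "".join(sorted(labels[u] for u in g.neighbors(v)))
            for v in g
        }
        hist.update(labels.values())
    return hist

def approx_graph_distance(g1, g2):
    h1, h2 = relabel_histogram(g1), relabel_histogram(g2)
    keys = set(h1) | set(h2)
    return sum(abs(h1[k] - h2[k]) for k in keys)          # L1 distance between histograms
```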
Download

Paper Nr: 6
Title:

A Deep Convolutional Neural Network for Location Recognition and Geometry based Information

Authors:

Francesco Bidoia, Matthia Sabatelli, Amirhossein Shantia, Marco A. Wiering and Lambert Schomaker

Abstract: In this paper we propose a new approach to Deep Neural Networks (DNNs) based on the particular needs of navigation tasks. To investigate these needs we created a labeled image dataset of a test environment and we compare classical computer vision approaches with the state of the art in image classification. Based on these results we have developed a new DNN architecture that outperforms previous architectures in recognizing locations by relying on the geometrical features of the images. In particular, we show the negative effects that the scale, rotation, and position invariance properties of current state-of-the-art DNNs have on this task. We finally show the results of our proposed architecture, which preserves the geometrical properties. Our experiments show that our method outperforms state-of-the-art image classification networks in recognizing locations.

Download

Paper Nr: 7
Title:

Studio2Shop: From Studio Photo Shoots to Fashion Articles

Authors:

Julia Lasserre, Katharina Rasch and Roland Vollgraf

Abstract: Fashion is an increasingly important topic in computer vision, in particular the so-called street-to-shop task of matching street images with shop images containing similar fashion items. Solving this problem promises new means of making fashion searchable and helping shoppers find the articles they are looking for. This paper focuses on finding pieces of clothing worn by a person in full-body or half-body images with neutral backgrounds. Such images are ubiquitous on the web and in fashion blogs, and are typically studio photos; we refer to this setting as studio-to-shop. Recent advances in computational fashion include the development of domain-specific numerical representations. Our model Studio2Shop builds on top of such representations and uses a deep convolutional network trained to match a query image to the numerical feature vectors of all the articles annotated in this image. Top-k retrieval evaluation on test query images shows that the correct items are most often found within a range that is sufficiently small for building realistic visual search engines for the studio-to-shop setting.
Download

Paper Nr: 9
Title:

Constant-time Extraction of Statistical Moments for Object Detection Procedures

Authors:

Przemysław Klęsk and Aneta Bera

Abstract: We propose a computational technique, backed with special integral images, allowing for constant-time extraction of statistical moments within detection procedures. The moments under study are formulated in their normalized central version. The set of proposed integral images needs to be prepared prior to the detection procedure. Its size grows quadratically with the imposed maximum order of moments, but the time invested in the preparation is amortized sufficiently well at the scanning stage. We give exact counts of the number of operations involved in extraction according to the proposed algorithm. The main idea is coupled with an auxiliary technique for detection window partitioning. In the experimental part, we demonstrate two examples of detection tasks. Detectors have been trained on the proposed features by the RealBoost learning algorithm and achieve both satisfactory time performance and accuracy.
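As a hedged illustration of the integral-image trick the abstract refers to (not the authors' exact feature set), the sketch below builds one cumulative-sum table per monomial x^p y^q so that the raw moment of any detection window can be read off in constant time; normalized central moments can then be derived from these raw moments.

```python
# Sketch of constant-time raw-moment extraction with integral images.
# One cumulative-sum table per monomial x^p y^q lets any window's moment
# sum_{(x,y) in window} x^p * y^q * I(x,y) be read with four lookups.
import numpy as np

def build_moment_integrals(image, max_order=2):
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    tables = {}
    for p in range(max_order + 1):
        for q in range(max_order + 1 - p):
            m = (xs ** p) * (ys ** q) * image.astype(np.float64)
            tables[(p, q)] = np.pad(m.cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    return tables

def window_moment(tables, p, q, y0, x0, y1, x1):
    # raw moment over rows y0..y1-1 and columns x0..x1-1
    t = tables[(p, q)]
    return t[y1, x1] - t[y0, x1] - t[y1, x0] + t[y0, x0]
```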
Download

Paper Nr: 14
Title:

Transfer Learning to Adapt One Class SVM Detection to Additional Features

Authors:

Yongjian Xue and Pierre Beauseroy

Abstract: In this paper, we use the multi-task learning idea to solve a detection problem with a one-class SVM when new sensors are added to the system. The main idea is to adapt the detection system to the upgraded sensor system. To solve that problem, the kernel matrix of the multi-task learning model can be divided into two parts: one part is based on the former features and the other part is based on the new features. Typical estimation methods can be used to fill in the corresponding new features in the old detection system, and a variable kernel is used for the new features in order to balance their importance against the number of observed samples. Experimental results show that the method keeps the false alarm rate relatively stable and decreases the missed alarm rate rapidly as the number of samples in the target task increases.
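A minimal sketch of the kernel-splitting idea described above, assuming an RBF kernel on each feature block and a toy weighting schedule for the new-sensor part; the feature split, gamma values, and weight schedule are illustrative, not the paper's.

```python
# Sketch of a two-part kernel over old and new sensor features, with the
# weight of the new-feature part growing with the number of target samples.
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.metrics.pairwise import rbf_kernel

def make_split_kernel(n_old, weight_new):
    def kernel(A, B):
        k_old = rbf_kernel(A[:, :n_old], B[:, :n_old], gamma=0.1)
        k_new = rbf_kernel(A[:, n_old:], B[:, n_old:], gamma=0.1)
        return k_old + weight_new * k_new
    return kernel

X = np.random.randn(200, 8)            # 5 old features + 3 new features (toy data)
w = min(1.0, len(X) / 500.0)           # toy schedule: trust new sensors more with more data
clf = OneClassSVM(kernel=make_split_kernel(n_old=5, weight_new=w), nu=0.05).fit(X)
scores = clf.decision_function(X)      # anomaly scores under the combined kernel
```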
Download

Paper Nr: 20
Title:

Occlusion-robust Detector Trained with Occluded Pedestrians

Authors:

Zhixin Guo, Wenzhi Liao, Peter Veelaert and Wilfried Philips

Abstract: Pedestrian detection has achieved remarkable progress in recent years, but challenges remain, especially when occlusion happens. Intuitively, occluded pedestrian samples contain some characteristic occlusion appearance features that can help to improve detection. However, we have observed that most existing approaches intentionally avoid using samples of occluded pedestrians during the training stage. This is because such samples introduce unreliable information, which affects the learning of model parameters and thus results in a dramatic performance decline. In this paper, we propose a new framework for pedestrian detection. The proposed method exploits occluded pedestrian samples to learn more robust features for discriminating pedestrians, and enables better performance on pedestrian detection, especially for occluded pedestrians (which frequently occur in real applications). Compared to some recent detectors on the Caltech Pedestrian dataset, our proposed method significantly reduces the detection miss rate for occluded pedestrians.
Download

Paper Nr: 21
Title:

Density-based Clustering using Automatic Density Peak Detection

Authors:

Huanqian Yan, Yonggang Lu and Heng Ma

Abstract: Clustering is an important unsupervised machine learning method which has played an important role in various fields. Density-based clustering methods are capable of dealing with clusters of different sizes and shapes. As suggested by Alex Rodriguez et al. in a paper published in Science in 2014, the 2D decision graph of the estimated density value versus the minimum distance from the points with higher density values for all the data points can be used to identify the cluster centroids. However, automatic methods for determining the cluster centroids from the decision graph are lacking. In this work, a novel statistic-based method is designed to identify the cluster centroids automatically from the decision graph, so the number of clusters is also determined automatically. Experiments on several synthetic and real-world datasets show the superiority of the proposed method in centroid identification on datasets with various distributions and dimensionalities. Furthermore, it is also shown that the proposed method can be effectively applied to image segmentation.
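For reference, a small sketch of the decision-graph quantities from the Rodriguez and Laio (2014) approach that this paper builds on: the local density rho and the distance delta to the nearest point of higher density. The naive threshold used to pick centroids below is exactly the manual step the paper replaces with an automatic, statistic-based test; the cutoff distance and percentile are illustrative.

```python
# Sketch of the decision graph: rho (local density) and delta (distance to
# the nearest point of higher density) for every data point.
import numpy as np
from scipy.spatial.distance import cdist

def decision_graph(X, dc=1.0):
    d = cdist(X, X)
    rho = np.exp(-(d / dc) ** 2).sum(axis=1) - 1.0       # Gaussian kernel density
    order = np.argsort(-rho)                             # points by decreasing density
    delta = np.zeros(len(X))
    delta[order[0]] = d[order[0]].max()
    for i, idx in enumerate(order[1:], start=1):
        delta[idx] = d[idx, order[:i]].min()             # nearest higher-density point
    return rho, delta

X = np.random.rand(300, 2)
rho, delta = decision_graph(X, dc=0.1)
centers = np.where(rho * delta > np.percentile(rho * delta, 99))[0]   # naive manual pick
```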
Download

Paper Nr: 30
Title:

Notes on Expected Computational Cost of Classifiers Cascade: A Geometric View

Authors:

Dariusz Sychel, Przemysław Klęsk and Aneta Bera

Abstract: A cascade of classifiers, working within a detection procedure, extracts and uses a different number of features depending on the window under analysis. Windows with background regions can typically be recognized as negative with just a few features, whereas windows with target objects (or resembling them) might require thousands of features. The central point of attention for this paper is a quantity that describes the average computational cost of an operating cascade, namely the expected value of the number of features the cascade uses. This quantity can be calculated explicitly knowing the probability distribution underlying the data and the properties of a particular cascade (detection and false alarm rates of its stages), or it can be accurately estimated knowing just the latter. We show three purely geometric examples that demonstrate how training a cascade with sensitivity / FAR constraints imposed per each stage can lead to non-optimality in terms of the computational cost. We do not propose a particular algorithm to overcome the pitfalls of stage-wise training; instead, we sketch an intuition showing that non-greedy approaches can improve the resulting cascades.
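The expected cost the paper studies can be illustrated with a short sketch, assuming per-stage feature counts and per-stage pass rates are known; the numbers in the example are made up.

```python
# Sketch of the expected number of features used by a cascade: stage i costs
# n_i features and is reached only if all earlier stages accepted the window.
def expected_cost(stage_feature_counts, stage_pass_rates):
    """stage_pass_rates[i] = probability that a window passes stage i."""
    expected, reach_prob = 0.0, 1.0
    for n_i, p_i in zip(stage_feature_counts, stage_pass_rates):
        expected += reach_prob * n_i
        reach_prob *= p_i
    return expected

# Example: 5 stages on background-dominated windows, cost dominated by early rejections.
print(expected_cost([2, 10, 25, 50, 200], [0.5, 0.3, 0.2, 0.1, 0.05]))
```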
Download

Paper Nr: 33
Title:

Interactive LSTM-Based Design Support in a Sketching Tool for the Architectural Domain - Floor Plan Generation and Auto Completion based on Recurrent Neural Networks

Authors:

Johannes Bayer, Syed Saqib Bukhari and Andreas Dengel

Abstract: While computerized tools for late design phases are well-established in the architectural domain, early design phases still lack widespread, automated solutions. During these phases, the actual concept of a building is developed in a creative process which is nowadays conducted manually. In this paper, we present a novel strategy that tackles the problem in a semi-automated way, where long short-term memories (LSTMs) make suggestions for each design step based on the user’s existing concept. A design step could be, for example, the creation of connections between rooms given a list of rooms, or the creation of room layouts given a graph of connected rooms. This results in a tightly interleaved interaction between the user and the LSTMs. We propose two approaches for creating LSTMs with this behavior. In the first approach, one LSTM is trained for each design step. In the other approach, suggestions for all design steps are made by a single LSTM. We evaluate these approaches against each other by testing their performance on a set of floor plans. Finally, we present the integration of the best performing approach in an existing sketching software, resulting in an auto-completion for floor plans, similar to text auto-completion in modern office software.
Download

Paper Nr: 43
Title:

Deep Spatial Pyramid Match Kernel for Scene Classification

Authors:

Shikha Gupta, Deepak Kumar Pradhan, Dileep Aroor Dinesh and Veena Thenkanidiyoor

Abstract: Several works have shown that Convolutional Neural Networks (CNNs) can be easily adapted to different datasets and tasks. However, for extracting the deep features from these pre-trained deep CNNs, a fixed-size (e.g., 227×227) input image is mandatory. State-of-the-art datasets like MIT-67 and SUN-397, however, come with images of different sizes. Using CNNs on these datasets forces the user to bring different-sized images to a fixed size, either by reducing or enlarging them. The obvious question is: isn't the conversion to a fixed-size image lossy? In this work, we provide a mechanism that avoids these lossy fixed-size images and processes each image in its original form to get a set of varying-size deep feature maps, hence being lossless. We also propose a deep spatial pyramid match kernel (DSPMK) which amalgamates sets of varying-size deep feature maps and computes a matching score between samples. The proposed DSPMK acts as a dynamic kernel in a support vector machine framework for scene classification. We demonstrate the effectiveness of combining varying-size CNN-based sets of deep feature maps with a dynamic kernel by achieving state-of-the-art results for high-level visual recognition tasks such as scene classification on standard datasets like MIT-67 and SUN-397.
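A hedged sketch of a spatial-pyramid match score between two feature maps of possibly different spatial sizes, using sum-pooled channel histograms and histogram intersection; this illustrates the general DSPMK idea, not the authors' exact formulation, and the pyramid levels are illustrative. Scores of this kind can be supplied to an SVM as a precomputed kernel, as in the paper's classification framework.

```python
# Sketch of a spatial-pyramid match score between two (H, W, C) feature maps:
# pool each map into 1x1, 2x2 and 4x4 grids of channel histograms and sum
# histogram-intersection scores across all cells and levels.
import numpy as np

def pyramid_histograms(fmap, levels=(1, 2, 4)):
    h, w, c = fmap.shape
    hists = []
    for g in levels:
        for i in range(g):
            for j in range(g):
                cell = fmap[i * h // g:(i + 1) * h // g, j * w // g:(j + 1) * w // g]
                v = cell.sum(axis=(0, 1))
                hists.append(v / (v.sum() + 1e-12))       # normalised channel histogram
    return hists

def dspmk_like_score(fmap_a, fmap_b):
    score = 0.0
    for ha, hb in zip(pyramid_histograms(fmap_a), pyramid_histograms(fmap_b)):
        score += np.minimum(ha, hb).sum()                 # histogram intersection
    return score
```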
Download

Paper Nr: 49
Title:

Approximate Graph Edit Distance by Several Local Searches in Parallel

Authors:

Évariste Daller, Sébastien Bougleux, Benoit Gaüzère and Luc Brun

Abstract: Solving or approximating the linear sum assignment problem (LSAP) is an important step of several constructive and local search strategies developed to approximate the graph edit distance (GED) of two attributed graphs, or more generally the solution to quadratic assignment problems. Constructive strategies find a first estimation of the GED by solving an LSAP. This estimation is then refined by a local search strategy. While these search strategies depend strongly on the initial assignment, several solutions to the linear problem usually exist, but they are not taken into account to get better estimations: estimations of the GED based on an LSAP select one solution at random. This paper explores the insights provided by the use of several solutions to an LSAP, refined in parallel by a local search strategy based on the relaxation of the search space and conditional gradient descent. Other generators of initial assignments are also considered, namely approximate solutions to an LSAP and random assignments. Experimental evaluations on several datasets show that the proposed estimation is comparable to more global search strategies in a reduced computational time.
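For readers unfamiliar with the LSAP step, the sketch below builds the standard (n+m)×(n+m) node substitution/insertion/deletion cost matrix and solves it with the Hungarian algorithm; the simple label-based costs are illustrative, and the edge costs that real GED approximations fold into the matrix are omitted.

```python
# Sketch of the LSAP step used by many GED approximations.
import numpy as np
from scipy.optimize import linear_sum_assignment

def lsap_assignment(labels1, labels2, sub_cost=1.0, indel_cost=1.0):
    n, m = len(labels1), len(labels2)
    BIG = 1e9                                   # forbids off-diagonal insert/delete pairings
    C = np.zeros((n + m, n + m))
    for i in range(n):
        for j in range(m):
            C[i, j] = 0.0 if labels1[i] == labels2[j] else sub_cost
    C[:n, m:] = BIG
    C[n:, :m] = BIG
    C[:n, m:][np.arange(n), np.arange(n)] = indel_cost    # delete node i of graph 1
    C[n:, :m][np.arange(m), np.arange(m)] = indel_cost    # insert node j of graph 2
    rows, cols = linear_sum_assignment(C)
    return list(zip(rows, cols)), C[rows, cols].sum()

mapping, cost = lsap_assignment(["C", "C", "O"], ["C", "N", "O"])
```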
Download

Paper Nr: 53
Title:

Earth Mover’s Distances for Rooted Labeled Unordered Trees based on Tai Mapping Hierarchy

Authors:

Taiga Kawaguchi and Kouichi Hirata

Abstract: In this paper, we introduce earth mover’s distances (EMDs, for short) for rooted labeled trees based on Tai mapping hierarchy. First, by focusing on the restricted mappings in the Tai mapping hierarchy providing the tractable variations of the tree edit distance, we formulate the EMDs whose signatures are all of the pairs of a complete subtree and its frequency and whose ground distances are the tractable variations. Then, we compare the EMDs with their ground distances, which are tractable variations.
Download

Paper Nr: 67
Title:

Convolutional Neural Networks for Phoneme Recognition

Authors:

Cornelius Glackin, Julie Wall, Gérard Chollet, Nazim Dugan and Nigel Cannings

Abstract: This paper presents a novel application of convolutional neural networks to phoneme recognition. The phonetic transcription of the TIMIT speech corpus is used to label spectrogram segments for training the convolutional neural network. A window of fixed size slides over the spectrogram of the TIMIT utterances, and the resulting spectrogram patches are assigned to the appropriate phone class by parsing TIMIT’s phone transcription. The convolutional neural network is the standard GoogLeNet implementation trained with stochastic gradient descent with mini-batches. After training, phonetic rescoring is performed in the usual way to map the TIMIT phone set to the smaller standard set. Benchmark results are presented for comparison to other state-of-the-art approaches. Finally, conclusions and future directions with regard to extending the approach are discussed.
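A minimal sketch of the sliding-window labelling step described above, assuming the spectrogram is a (frequency × frames) array and the phone transcription is given as frame-aligned segments; the window width, hop, and centre-frame labelling rule are assumptions, not the paper's exact settings.

```python
# Sketch of slicing a spectrogram into fixed-size patches and labelling each
# patch with the phone covering its centre frame.
import numpy as np

def spectrogram_patches(spec, phone_segments, width=32, hop=8):
    # spec: (n_freq, n_frames); phone_segments: list of (start_frame, end_frame, phone)
    patches, labels = [], []
    for start in range(0, spec.shape[1] - width + 1, hop):
        centre = start + width // 2
        for s, e, phone in phone_segments:
            if s <= centre < e:
                patches.append(spec[:, start:start + width])
                labels.append(phone)
                break
    return np.stack(patches), labels
```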
Download

Paper Nr: 75
Title:

Performance Evaluation and Enhancement of Biclustering Algorithms

Authors:

Jeffrey Dale, America Nishimoto and Tayo Obafemi-Ajayi

Abstract: In gene expression data analysis, biclustering has proven to be an effective method of finding local patterns among subsets of genes and conditions. The task of evaluating the quality of a bicluster when ground truth is not known is challenging. In this analysis, we empirically evaluate and compare the performance of eight popular biclustering algorithms across 119 synthetic datasets that span a wide range of possible bicluster structures and patterns. We also present a method of enhancing performance (relevance score) of the biclustering algorithms to increase confidence in the significance of the biclusters returned based on four internal validation measures. The experimental results demonstrate that the Average Spearman’s Rho evaluation measure is the most effective criteria to improve bicluster relevance with the proposed performance enhancement method, while maintaining a relatively low loss in recovery scores.
Download

Paper Nr: 79
Title:

CRVM: Circular Random Variable-based Matcher - A Novel Hashing Method for Fast NN Search in High-dimensional Spaces

Authors:

Faraj Alhwarin, Alexander Ferrein and Ingrid Scholl

Abstract: Nearest Neighbour (NN) search is an essential and important problem in many areas, including multimedia databases, data mining and computer vision. For low-dimensional spaces, a variety of tree-based NN search algorithms efficiently cope with finding the NN; for high-dimensional spaces, however, these methods are inefficient. Even for Locality Sensitive Hashing (LSH) methods, which solve the task approximately by grouping sample points that are nearby in the search space into buckets, it is difficult to find the right parameters. In this paper, we propose a novel hashing method that ensures a high probability of NNs being located in the same hash buckets and a balanced distribution of data across all the buckets. The proposed method is based on computing a selected number of pairwise uncorrelated and uniformly-distributed Circular Random Variables (CRVs) from the sample points. The method has been tested on a large dataset of SIFT features and was compared to LSH and the Fast Library for Approximate NN search (FLANN) matcher, with linear search as the baseline. The experimental results show that our method significantly reduces the search query time while preserving the search quality, in particular for dynamic databases and small databases whose size does not exceed 200k points.
Download

Paper Nr: 83
Title:

Joint Monocular 3D Car Shape Estimation and Landmark Localization via Cascaded Regression

Authors:

Yanan Miao, Huan Ma, Jia Cui and Xiaoming Tao

Abstract: Previous works on the reconstruction of a three-dimensional (3D) point shape model commonly use a two-step framework. In the first step, a series of feature points is precisely localized in an image. The second step then attempts to fit the 3D data to these observations to obtain the real 3D shape. Such an approach is time-consuming and easily gets stuck in a local minimum. To address this problem, we propose a method to jointly estimate the global 3D geometric structure of a car and localize 2D landmarks from a single viewpoint image. First, we parametrize the 3D shape by the coefficients of a linear combination of a set of predefined shape bases. Second, we adopt a cascaded regression framework to regress the global shape encoded by the prior bases, by jointly minimizing the appearance and shape fitting differences under a weak projection camera model. The position fitting term helps cope with the description ambiguity of local appearance and provides more information for 3D reconstruction. Experimental results on a multi-view car dataset demonstrate favourable improvements on pose estimation and shape prediction compared with previous methods.
Download

Paper Nr: 94
Title:

On the Taut String Interpretation of the One-dimensional Rudin–Osher–Fatemi Model

Authors:

Niels Chr Overgaard

Abstract: A new proof of the equivalence of the Taut String Algorithm and the one-dimensional Rudin–Osher–Fatemi model is presented. Based on duality and the projection theorem in Hilbert space, the proof is strictly elementary. Existence and uniqueness of solutions (in the continuous case) to both denoising models follow as by-products. The standard convergence properties of the denoised signal, as the regularizing parameter tends to zero, are recalled and efficient proofs provided. Moreover, a new and fundamental estimate on the denoised signal is derived. It implies, among other things, the strong convergence (in the space of functions of bounded variation) of the denoised signal to the in-signal as the regularization parameter vanishes.
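For reference, the one-dimensional Rudin-Osher-Fatemi model discussed here is the standard variational problem below (standard textbook formulation, not quoted from the paper):

```latex
\min_{u \in BV(0,1)} \; \frac{1}{2}\int_0^1 \bigl(u(t) - f(t)\bigr)^2 \, dt \;+\; \lambda\, \mathrm{TV}(u),
\qquad \lambda > 0,
```

where f is the in-signal, TV(u) denotes the total variation of u, and λ is the regularization parameter; the taut string algorithm computes the same minimizer.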
Download

Paper Nr: 95
Title:

Estimating Uncertainty in Time-difference and Doppler Estimates

Authors:

Gabrielle Flood, Anders Heyden and Kalle Åström

Abstract: Sound and radio can be used to estimate the distance between a transmitter and a receiver by correlating the emitted and received signals. Alternatively, by correlating two received signals it is possible to estimate the distance difference. Such methods can be divided into methods that are robust to noise and reverberation but give limited precision, and sub-sample refinements that are sensitive to noise but give higher precision when initialized close to the real translation. In this paper we develop stochastic models that can explain the limits in the precision of such sub-sample time-difference estimates. Using such models we provide new methods for precise estimates of time-differences as well as Doppler effects. The method is verified on both synthetic and real data.
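A small sketch of the two-stage estimate whose precision the paper models: a robust integer-sample peak from cross-correlation followed by a sub-sample parabolic refinement. The signals, sampling rate, and interpolation rule are illustrative.

```python
# Sketch of time-difference estimation: integer-sample peak of the
# cross-correlation, then sub-sample refinement by parabolic interpolation.
import numpy as np

def estimate_delay(x, y):
    corr = np.correlate(y, x, mode="full")
    k = int(np.argmax(corr))
    lag = k - (len(x) - 1)                        # integer-sample delay of y w.r.t. x
    if 0 < k < len(corr) - 1:                     # quadratic fit around the peak
        c_m, c_0, c_p = corr[k - 1], corr[k], corr[k + 1]
        lag += 0.5 * (c_m - c_p) / (c_m - 2 * c_0 + c_p)
    return lag

fs = 8000
t = np.arange(0, 0.1, 1 / fs)
x = np.sin(2 * np.pi * 440 * t) + 0.05 * np.random.randn(t.size)
y = np.concatenate([np.zeros(7), x[:-7]])         # x delayed by 7 samples
print(estimate_delay(x, y))                       # approximately 7
```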
Download

Short Papers
Paper Nr: 4
Title:

Learning to Evaluate Chess Positions with Deep Neural Networks and Limited Lookahead

Authors:

Matthia Sabatelli, Francesco Bidoia, Valeriu Codreanu and Marco Wiering

Abstract: In this paper we propose a novel supervised learning approach for training Artificial Neural Networks (ANNs) to evaluate chess positions. The method that we present aims to train different ANN architectures to understand chess positions similarly to how highly rated human players do. We investigate the capabilities that ANNs have when it comes to pattern recognition, an ability that distinguishes chess grandmasters from more amateur players. We collect around 3,000,000 different chess positions played by highly skilled chess players and label them with the evaluation function of Stockfish, one of the strongest existing chess engines. We create 4 different datasets from scratch that are used for different classification and regression experiments. The results show how relatively simple Multilayer Perceptrons (MLPs) outperform Convolutional Neural Networks (CNNs) in all the experiments that we have performed. We also investigate two different board representations, the first representing whether a piece is present on the board or not, and the second assigning a numerical value to each piece according to its strength. Our results show that the latter input representation negatively influences the performance of the ANNs in almost all experiments.
Download

Paper Nr: 13
Title:

Abnormal Events Detection for Infrastructure Security using Key Metrics

Authors:

Van-Khoa Le, Edith Grall-Maes and Pierre Beauseroy

Abstract: This paper presents a detection process which utilizes various sensors (camera, card readers, movement detector) for automatically detecting abnormal events. The detection process strengthens current security systems to identify attackers in the context of buildings and offices. Key metrics are proposed to describe people’s behavior in critical zones of the building. They are built using measures from the sensors, which provide information about the person, the position, and the time. These metrics are used to distinguish abnormal behaviors from regular ones, based on a statistical classifier. This technique is tested on both simulated data and real data, in which an attacking scenario was prepared by security experts. Results show that abnormal events from the scenario have been successfully detected. The experiments demonstrate that the proposed key metrics are relevant and the proposed detection scheme is appropriate for infrastructure surveillance.
Download

Paper Nr: 24
Title:

Two-step Transfer Learning for Semantic Plant Segmentation

Authors:

Shunsuke Sakurai, Hideaki Uchiyama, Atsushi Shimada, Daisaku Arita and Rin-ichiro Taniguchi

Abstract: We discuss the applicability of a fully convolutional network (FCN), which provides promising performance in semantic segmentation tasks, to plant segmentation tasks. The challenge lies in training the network with a small dataset, because plant image datasets contain far fewer samples than object image datasets such as the ImageNet and PASCAL VOC datasets. The proposed method is inspired by transfer learning, but involves a two-step adaptation. In the first step, we apply transfer learning from a source domain that contains many objects with a large amount of labeled data to a major category in the plant domain. Then, in the second step, category adaptation is performed from the major category to a minor category with only a few samples within the plant domain. On the leaf segmentation challenge (LSC) dataset, the experimental results confirm the effectiveness of the proposed method: the F-measure was, for instance, 0.953 for the A2 dataset, which was 0.355 higher than that of direct adaptation and 0.527 higher than that of non-adaptation.
Download

Paper Nr: 29
Title:

Nearest Neighbor Search using Sketches as Quantized Images of Dimension Reduction

Authors:

Naoya Higuchi, Yasunobu Imamura, Tetsuji Kuboyama, Kouichi Hirata and Takeshi Shinohara

Abstract: In this paper, we discuss sketches based on ball partitioning (BP), which are compact bit sequences representing multidimensional data. The conventional nearest neighbor search using sketches consists of two stages. The first stage selects candidates depending on the Hamming distances between sketches. Then, the second stage selects the nearest neighbor from the candidates. Since the Hamming distance cannot completely reflect the original distance, more candidates are needed to achieve higher accuracy. On the other hand, we can regard BP sketches as quantized images of a dimension reduction. Although the quantization error is very large if we use only sketches to compute distances, we can partly recover distance information using the query. That is, we can compute a lower bound on the distance between a query and a data point using only the query and the sketch of the data point. We propose candidate selection methods for the first stage using these lower bounds. Using the proposed method, a higher level of accuracy for nearest neighbor search is shown through experiments on multidimensional data such as images, music and colors.
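An illustrative sketch of BP sketches and the conventional two-stage search described above (Hamming filtering followed by exact distances); the pivots, radii, and candidate budget are arbitrary choices, and the paper's lower-bound-based candidate selection is not reproduced here.

```python
# Sketch of ball-partitioning (BP) sketches: bit i of a sketch says whether
# the point lies inside ball i. Search filters by Hamming distance first.
import numpy as np

def bp_sketch(X, pivots, radii):
    # X: (n, d); pivots: (b, d); radii: (b,) -> (n, b) boolean sketch
    d = np.linalg.norm(X[:, None, :] - pivots[None, :, :], axis=2)
    return d <= radii

def search(query, X, sketches, pivots, radii, n_candidates=50):
    q_sketch = bp_sketch(query[None, :], pivots, radii)[0]
    hamming = (sketches != q_sketch).sum(axis=1)          # first stage: filter
    cand = np.argsort(hamming)[:n_candidates]
    dists = np.linalg.norm(X[cand] - query, axis=1)       # second stage: exact distances
    return cand[np.argmin(dists)]

rng = np.random.default_rng(0)
X = rng.random((10000, 16))
pivots = X[rng.choice(len(X), 32, replace=False)]
radii = np.full(32, 0.8)
sketches = bp_sketch(X, pivots, radii)
print(search(rng.random(16), X, sketches, pivots, radii))
```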
Download

Paper Nr: 47
Title:

Feature Engineering for Depression Detection in Social Media

Authors:

Maxim Stankevich, Vadim Isakov, Dmitry Devyatkin and Ivan Smirnov

Abstract: This research is based on the CLEF/eRisk 2017 pilot task, which is focused on early risk detection of depression. The CLEF/eRisk 2017 dataset consists of text examples collected from messages of 887 Reddit users. The main idea of the task is to classify users into two groups: risk case of depression and non-risk case. This paper considers different feature sets for the depression detection task among Reddit users based on the processing of their text messages. We examine our bag-of-words, embedding and bigram models on the CLEF/eRisk 2017 dataset and evaluate the applicability of stylometric and morphological features. We also compare our results with the CLEF/eRisk 2017 task report.
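As a minimal, hedged illustration of the bag-of-words setting (not the authors' exact feature sets or classifier), a per-user text classification pipeline could look like this; the variables are placeholders.

```python
# Toy bag-of-words pipeline: each training example is the concatenation of a
# user's messages, labelled as risk (1) or non-risk (0).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

user_texts = ["all messages of user 1 ...", "all messages of user 2 ..."]   # placeholders
labels = [1, 0]                                                              # 1 = risk case
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2), min_df=1),
                    LogisticRegression(max_iter=1000))
clf.fit(user_texts, labels)
```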
Download

Paper Nr: 56
Title:

The Wrong Tool for Inference - A Critical View of Gaussian Graphical Models

Authors:

Kevin R. Keane and Jason J. Corso

Abstract: Myopic reliance on a misleading first sentence in the abstract of “Covariance Selection” (Dempster, 1972) spawned the computationally and mathematically dysfunctional Gaussian graphical model (GGM). In stark contrast to the GGM approach, the actual algorithm of (Dempster, 1972, § 3) facilitated elegant and powerful applications, including a “texture model” developed two decades ago involving arbitrary distributions of 1000+ dimensions (Zhu, 1996). The “Covariance Selection” algorithm proposes a greedy sequence of increasingly constrained maximum entropy hypotheses (Good, 1963), terminating when the observed data “fails to reject” the last proposed probability distribution. We are mathematically critical of GGM methods that address a continuous convex domain with a discrete domain “golden hammer”. Computationally, selection of the wrong tool morphs polynomial-time algorithms into exponential-time algorithms. GGM concepts are at odds with the fundamental concept of the invariant spherical multivariate Gaussian distribution. We are critical of the Bayesian GGM approach because the model selection process derails at the start, when virtually all prior mass is attributed to comically precise multi-dimensional geometric “configurations” (Dempster, 1969, Ch. 13). We propose two Bayesian alternatives. The first alternative is based upon (Dempster, 1969, Ch. 15.3) and (Hoff, 2009, Ch. 7). The second alternative is based upon Bretthorst (2012), a recent paper placing maximum entropy methods such as the “Covariance Selection” algorithm in a Bayesian framework.
Download

Paper Nr: 87
Title:

Semantic Segmentation in Red Relief Image Map by UX-Net

Authors:

Tomoya Komiyama, Kazuhiro Hotta, Kazuo Oda, Satomi Kakuta and Mikako Sano

Abstract: This paper proposes a semantic segmentation method for the Red Relief Image Map, which is a kind of aerial laser image. We modify the U-Net by adding paths between convolutional and deconvolutional layers of different resolutions. By using the feature maps obtained at different layers, the segmentation accuracy is improved. We compare the segmentation accuracy of the proposed UX-Net with the original U-Net. Our proposed method improved the class-average accuracy in comparison with the U-Net.
Download

Paper Nr: 90
Title:

Segmentation of Lidar Intensity using Weighted Fusion based on Appropriate Region Size

Authors:

Masaki Umemura, Kazuhiro Hotta, Hideki Nonaka and Kazuo Oda

Abstract: We propose a semantic segmentation method for LiDAR intensity images obtained by a Mobile Mapping System (MMS). Conventional segmentation methods can give high pixel-wise accuracy, but their accuracy on small objects is quite low. We address this issue with a weighted fusion of multi-scale inputs, because each class has its most effective scale: small object classes achieve higher accuracy with a small input size than with a large one. In the experiments, we use 36 LiDAR intensity images with ground truth labels, divided into 28 training images and 8 test images. Our proposed method attains 87.41% class-average accuracy, which is 5% higher than the conventional method. This demonstrates that the weighted fusion of multi-scale inputs is effective in improving the segmentation accuracy of small objects.
Download

Paper Nr: 3
Title:

The Method to Measure Si Thickness for Bond Line Thickness

Authors:

YangSub Park, KilBum Kang, Sangyun Yun and SeongSoo Kim

Abstract: Today, many semiconductor products are manufactured through the TSV process. It is important to manage the bond line thickness (BLT) because a single contact failure means that all of the stacked dies must be discarded. If we can measure the thickness of the silicon, the BLT in the wafer-level package process can be estimated. In this paper, we propose a method to measure the thickness of silicon using infrared light. We designed the infrared light source to select the path of the incident light to the objective lens. This optical system has the characteristic of moving in the opposite direction according to a change in height. By using this optical system, it is possible to calculate the correct in-focus position. In this way, we present a method to measure the BLT by measuring the distance between the top and bottom Si surfaces.
Download

Paper Nr: 38
Title:

Impact of Training LSTM-RNN with Fuzzy Ground Truth

Authors:

Martin Jenckel, Sourabh Sarvotham Parkala, Syed Saqib Bukhari and Andreas Dengel

Abstract: Most machine learning algorithms follow the supervised learning approach and therefore require annotated training data. The large amount of training data required to train state-of-the-art deep neural networks has changed the methods of acquiring the required annotations. User annotations or completely synthetic annotations are becoming more and more prevalent, replacing careful manual annotations by experts. In the field of OCR, recent work has shown that synthetic ground truth acquired through clustering with minimal manual annotation yields good results when combined with bidirectional LSTM-RNNs. Similarly, we propose a change to standard LSTM training to handle imperfect manual annotation. When annotating historical documents or low-quality scans, deciding on the correct annotation is difficult, especially for non-experts. Providing all possible annotations in such cases, instead of just one, is what we call fuzzy ground truth. Finally, we show that training an LSTM-RNN on fuzzy ground truth achieves similar performance.
Download

Paper Nr: 39
Title:

Automatic Detection of a Phases for CAP Classification

Authors:

Fabio Mendonça, Ana Fred, Sheikh Shanawaz Mostafa, Fernando Morgado-Dias and Antonio G. Ravelo-García

Abstract: The aim of this study is to develop an automatic detector of the cyclic alternating pattern by first detecting the activation phases (A phases) of this pattern, analysing the electroencephalogram during sleep, and then applying a finite state machine to implement the final classification. A public database was used to test the algorithms and a total of eleven features were analysed. Sequential feature selection was employed to select the most relevant features, and a post-processing procedure was used for further improvement of the classification. The classification of the A phases was produced using linear discriminant analysis, and the average accuracy, sensitivity and specificity were, respectively, 75%, 78% and 74%. The cyclic alternating pattern detection accuracy was 75%. When comparing with the state of the art, the proposed method achieved the highest sensitivity but a lower accuracy, since the approach followed here was to keep the REM periods, contrary to the method used in the majority of state-of-the-art publications, which leads to an increase in their overall performance. However, the approach of this work is more suitable for automatic system implementation since no alteration of the EEG data is needed.
Download

Paper Nr: 45
Title:

3D Orientation Estimation of Industrial Parts from 2D Images using Neural Networks

Authors:

Julien Langlois, Harold Mouchère, Nicolas Normand and Christian Viard-Gaudin

Abstract: In this paper we propose a pose regression method employing a convolutional neural network (CNN) fed with single 2D images to estimate the 3D orientation of a specific industrial part. The network training dataset is generated by rendering pose-views from a textured CAD model to compensate for the lack of real images and their associated position labels. Using several lighting conditions and material reflectances increases the robustness of the prediction and allows challenging industrial situations to be anticipated. We show that using a geodesic loss function, the network is able to estimate a rendered view pose with a 5° accuracy, while inferring from real images gives visually convincing results suitable for any pose refinement process.
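For reference, the geodesic distance between two rotations, which a geodesic loss penalises, is the angle of the relative rotation; the sketch below illustrates it, with a 5-degree example mirroring the reported accuracy (the exact loss used in the paper may differ).

```python
# Sketch of the geodesic angle between two rotation matrices R1 and R2.
import numpy as np

def geodesic_angle(R1, R2):
    cos_theta = (np.trace(R1.T @ R2) - 1.0) / 2.0
    return np.arccos(np.clip(cos_theta, -1.0, 1.0))       # radians, in [0, pi]

Rz = lambda a: np.array([[np.cos(a), -np.sin(a), 0],
                         [np.sin(a),  np.cos(a), 0],
                         [0,          0,         1]])
print(np.degrees(geodesic_angle(Rz(0.0), Rz(np.radians(5)))))   # 5.0
```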
Download

Paper Nr: 51
Title:

Minimum Modal Regression

Authors:

Koichiro Yamauchi and Vanamala Narasimha Bhargav

Abstract: The recent development of microcomputers enables the execution of complex software in small embedded systems. Artificial intelligence is one form of software to be embedded into such devices. However, almost all embedded systems still have restricted storage space. One of the authors has already proposed an incremental learning method for regression, which works under a fixed storage space; however, this method cannot support the multivalued functions that usually appear in real-world problems. One way to support multivalued functions is to use the modal regression method with a kernel density estimator. However, this method assumes that all sample points are recorded as kernel centroids, which is not suitable for small embedded systems. In this paper, we propose a minimum modal regression method that reduces the number of kernels using a projection method. The conditions required to maintain accuracy are derived through theoretical analysis. The experimental results show that our method reduces the number of kernels while maintaining a specified level of accuracy.
Download

Paper Nr: 57
Title:

Dimensionality Reduction with Evolutionary Shephard-Kruskal Embeddings

Authors:

Oliver Kramer

Abstract: This paper introduces an evolutionary iterative approximation of Shephard-Kruskal based dimensionality reduction with linear runtime. The method, which we call evolutionary Shephard-Kruskal embedding (EvoSK), iteratively constructs a low-dimensional representation with Gaussian sampling in the environment of the latent positions of the closest embedded patterns. The approach explicitly optimizes distance preservation in the low-dimensional space, similar to the objective solved by multi-dimensional scaling. Experiments on a small benchmark data set show that EvoSK can perform better than its famous counterparts, multi-dimensional scaling and isometric mapping, and outperforms stochastic neighbor embedding.
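A hedged sketch of the iterative construction described above: each pattern is embedded by sampling candidate latent positions around the embedding of its closest already-embedded pattern and keeping the candidate with the lowest distance-preservation error. The sample count and sigma are arbitrary, and the bookkeeping that gives the paper its linear runtime is omitted.

```python
# Not the authors' exact EvoSK: a naive iterative embedding with Gaussian
# candidate sampling and a stress (distance-preservation) criterion.
import numpy as np

def evosk_like_embed(X, dim=2, n_samples=30, sigma=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    Y = np.zeros((n, dim))
    for i in range(1, n):
        d_high = np.linalg.norm(X[:i] - X[i], axis=1)
        nearest = int(np.argmin(d_high))                   # closest already embedded pattern
        candidates = Y[nearest] + sigma * rng.standard_normal((n_samples, dim))
        d_low = np.linalg.norm(candidates[:, None, :] - Y[None, :i, :], axis=2)
        stress = ((d_low - d_high[None, :]) ** 2).sum(axis=1)
        Y[i] = candidates[np.argmin(stress)]               # keep the best candidate
    return Y

Y = evosk_like_embed(np.random.rand(200, 10))
```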
Download

Paper Nr: 62
Title:

The Scarcity of Universal Colour Names

Authors:

Gunilla Borgefors

Abstract: There is a trend in Computer Vision to use over twenty colour names for image annotation and retrieval, and to train deep learning networks to name unknown colours for human use. This paper will show that there is little consistency of colour naming between languages, and even between individuals speaking the same language. Experiments will be cited that show that your mother tongue influences how your brain processes colour. It will also be pointed out that the eleven so-called basic colours in English are not universal and cannot be applied to other languages. The conclusion is that the six Hering primary colours, possibly with simple qualifications, are the only ones you should use if you aim for universal usage of your systems. That is: black, white, red, green, blue, and yellow.
Download

Paper Nr: 71
Title:

Multi-modal Medical Image Registration by Local Affine Transformations

Authors:

Liliana Lo Presti and Marco La Cascia

Abstract: Image registration is the process of finding the geometric transformation that, applied to the floating image, gives the registered image with the highest similarity to the reference image. Registering a pair of images involves the definition of a similarity function in terms of the parameters of the geometric transformation that allows the registration. This paper proposes to register a pair of images by iteratively maximizing the empirical mutual information through coordinate gradient descent. Hence, the registered image is obtained by applying a sequence of local affine transformations. Rather than adopting a uniformly spaced grid to select image blocks to locally register, as done by state-of-the-art techniques, this paper proposes a method which is similar in spirit to boosting strategies used in classification. In this work, a probability distribution over the pixels of the registered image is maintained. At each pixel, this distribution represents the probability that a local affine transformation of a block centered on this pixel should be computed to improve the similarity between the registered and the reference images. The distribution is updated iteratively during the registration process to move probability mass towards pixels unaffected by the estimated local transformation. The paper presents preliminary results by a qualitative evaluation on several pairs of medical images acquired by different sources.
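For reference, a small sketch of the empirical mutual information between two images, the similarity that the registration iteratively maximises; the histogram bin count is an arbitrary choice.

```python
# Sketch of empirical mutual information between a reference image and a
# (transformed) floating image, estimated from their joint histogram.
import numpy as np

def mutual_information(a, b, bins=32):
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p_ab = joint / joint.sum()
    p_a = p_ab.sum(axis=1, keepdims=True)
    p_b = p_ab.sum(axis=0, keepdims=True)
    nz = p_ab > 0
    return float((p_ab[nz] * np.log(p_ab[nz] / (p_a @ p_b)[nz])).sum())
```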
Download

Paper Nr: 77
Title:

Novel Clustering based on Discrete Morse Technique

Authors:

Jian Ping Zhang, Xi Yu Liu and Yong Li

Abstract: A new clustering algorithm based on discrete Morse theory is proposed for cluster analysis in this paper. Firstly, an energy surface is defined on the data set by Gaussian kernel functions. Secondly, a simplicial complex is obtained by hull triangulation on the energy surface. Finally, an optimization model based on discrete Morse theory is adopted to find cluster centers and clusters on the simplicial complex. The experimental results on some synthetic and UCI data sets demonstrate that the new algorithm can discover clusters with arbitrary shapes and densities at different levels; moreover, it can successfully divide overlapping data points into meaningful clusters. The results show the feasibility and effectiveness of the new clustering algorithm.
Download

Paper Nr: 78
Title:

Sparse Least Squares Twin Support Vector Machines with Manifold-preserving Graph Reduction

Authors:

Xijiong Xie

Abstract: Least squares twin support vector machines are a new non-parallel hyperplane classifier, in which the primal optimization problems of twin support vector machines are modified in the least squares sense and the inequality constraints are replaced by equality constraints. In classification problems, it is very important both to enhance the robustness of least squares twin support vector machines and to reduce the time complexity of kernel function evaluation when inferring the label of a new example. In this paper, we propose new sparse least squares twin support vector machines based on manifold-preserving graph reduction, an efficient graph reduction algorithm under the manifold assumption. The method first selects informative examples from the positive and negative classes, respectively, and then applies them for classification. Experimental results confirm the feasibility and effectiveness of our proposed method.
Download

Paper Nr: 89
Title:

Road Detection from Satellite Images by Improving U-Net with Difference of Features

Authors:

Ryosuke Kamiya, Kazuhiro Hotta, Kazuo Oda and Satomi Kakuta

Abstract: In this paper, we propose a road detection method from satellite images that improves the U-Net using the difference of feature maps. The U-Net has connections between convolutional and deconvolutional layers and concatenates the feature maps of a convolutional layer with those of the corresponding deconvolutional layer. Here we introduce the difference of feature maps instead of their concatenation. We evaluate our proposed method on a road detection problem. Our proposed method obtained significant improvements in comparison with the U-Net.
Download

Paper Nr: 96
Title:

Combining Keypoint Clustering and Neural Background Subtraction for Real-time Moving Object Detection by PTZ Cameras

Authors:

Danilo Avola, Marco Bernardi, Luigi Cinque, Gian Luca Foresti and Cristiano Massaroni

Abstract: Detection of moving objects is a topic of great interest in computer vision. This task represents a prerequisite for more complex duties, such as classification and re-identification. One of the main challenges regards the management of dynamic factors, with particular reference to bootstrapping and illumination change issues. The recent widespread use of PTZ cameras has made these issues even more complex in terms of performance due to their composite movements (i.e., pan, tilt, and zoom). This paper proposes a combined keypoint clustering and neural background subtraction method for real-time moving object detection in video sequences acquired by PTZ cameras. Initially, the method performs a spatio-temporal tracking of the sets of moving keypoints to recognize the foreground areas and to establish the background. Subsequently, it adopts a neural background subtraction to accomplish foreground detection in these areas, able to manage bootstrapping and gradual illumination changes. Experimental results on two well-known public datasets and comparisons with key works of the current state-of-the-art demonstrate the remarkable results of the proposed method.
Download

Area 2 - Applications

Full Papers
Paper Nr: 10
Title:

Automated Diagnosis of Breast Cancer and Pre-invasive Lesions on Digital Whole Slide Images

Authors:

Ezgi Mercan, Sachin Mehta, Jamen Bartlett, Donald L. Weaver, Joann G. Elmore and Linda G. Shapiro

Abstract: Digital whole slide imaging has the potential to change diagnostic pathology by enabling the use of computer-aided diagnosis systems. To this end, we used a dataset of 240 digital slides that were interpreted and diagnosed by an expert panel to develop and evaluate image features for diagnostic classification of breast biopsy whole slides into four categories: benign, atypia, ductal carcinoma in-situ and invasive carcinoma. Starting with a tissue labeling step, we developed features that describe the tissue composition of the image and the structural changes. In this paper, we first introduce two models for the semantic segmentation of the regions of interest into tissue labels: an SVM-based model and a CNN-based model. Then, we define an image feature that consists of superpixel tissue label frequency and co-occurrence histograms based on the tissue label segmentations. Finally, we use our features in two diagnostic classification schemes: a four-class classification, and an alternative one-diagnosis-at-a-time classification starting with invasive versus benign and ending with atypia versus ductal carcinoma in-situ (DCIS). We show that our features achieve competitive results compared to human performance on the same dataset. Especially at the critical atypia vs. DCIS threshold, our system outperforms pathologists by achieving an 83% accuracy.
Download

Paper Nr: 11
Title:

Efficient and Accurate Mitosis Detection - A Lightweight RCNN Approach

Authors:

Yuguang Li, Ezgi Mercan, Stevan Knezevitch, Joann G. Elmore and Linda G. Shapiro

Abstract: The analysis of breast cancer images includes the detection of mitotic figures whose counting is important in the grading of invasive breast cancer. Mitotic figures are difficult to find in the very large whole slide images, as they may look only slightly different from normal nuclei. In the last few years, several convolutional neural network (CNN) systems have been developed for mitosis detection that are able to beat conventional, feature-based approaches. However, these networks contain many layers and many neurons per layer, so both training and actual classification require powerful computers with GPUs. In this paper, we describe a new lightweight region-based CNN methodology we have developed that is able to run on standard machines with only a CPU and can achieve accuracy measures that are almost as good as the best CNN-based system so far in a fraction of the time, when both are run on CPUs. Our system, which includes a feature-based region extractor plus two CNN stages, is tested on the ICPR 2012 and ICPR 2014 datasets, and results are given for accuracy and timing.
Download

Paper Nr: 37
Title:

Fully Automatic Faulty Weft Thread Detection using a Camera System and Feature-based Pattern Recognition

Authors:

Marcin Kopaczka, Marco Saggiomo, Moritz Güttler, Thomas Gries and Dorit Merhof

Abstract: In this paper, we present a novel approach for the fully automated detection of faulty weft threads on air-jet weaving machines using computer vision. The proposed system consists of a camera array for image acquisition and a classification pipeline in which we use different image processing and machine learning methods to allow precise localization and reliable classification of defects. The camera system is introduced and its advantages over other approaches are discussed. Subsequently, the processing steps are motivated and described in detail, followed by an in-depth analysis of the impact of different system parameters to allow choosing optimal algorithm combinations for the problem of faulty weft yarn detection. To analyze the capabilities of our solution, system performance is thoroughly evaluated under realistic production settings, showing excellent detection rates.
Download

Paper Nr: 41
Title:

Modified Time Flexible Kernel for Video Activity Recognition using Support Vector Machines

Authors:

Ankit Sharma, Apurv Kumar, Sony Allappa, Veena Thenkanidiyoor, Dileep Aroor Dinesh and Shikha Gupta

Abstract: Video activity recognition involves automatically assigning an activity label to a video. This is a challenging task due to the complex nature of video data: there exist many sub-activities whose temporal order is important. For building an SVM-based activity recognizer it is necessary to use a suitable kernel that handles the varying-length temporal data corresponding to videos. In (Mario Rodriguez and Makris, 2016), a time flexible kernel (TFK) is proposed for matching a pair of videos by encoding each video as a sequence of bag-of-visual-words (BOVW) vectors. The TFK involves matching every pair of BOVW vectors from a pair of videos using a linear kernel. In this paper we propose a modified TFK (MTFK) in which better approaches to matching a pair of BOVW vectors are explored. We propose to explore the use of frequency-based kernels for matching a pair of BOVW vectors. We also propose an approach for encoding the videos using a Gaussian mixture model based soft clustering technique. The effectiveness of the proposed approaches is studied using benchmark datasets.
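A hedged sketch of a TFK-style video kernel in which histogram intersection stands in for the frequency-based kernels explored in the paper; each video is assumed to be a temporal sequence of BOVW vectors, and the temporal weighting and normalisation are illustrative, not the exact TFK/MTFK definition.

```python
# Sketch of a TFK-style kernel between two videos represented as sequences of
# BOVW vectors: sum a base kernel over all pairs, down-weighted by how far
# apart the pair is in normalised time.
import numpy as np

def hist_intersection(u, v):
    return np.minimum(u, v).sum()

def tfk_like(seq_a, seq_b, base=hist_intersection, tau=0.2):
    na, nb = len(seq_a), len(seq_b)
    k = 0.0
    for i, u in enumerate(seq_a):
        for j, v in enumerate(seq_b):
            dt = abs(i / max(na - 1, 1) - j / max(nb - 1, 1))
            k += np.exp(-dt / tau) * base(u, v)
    return k / (na * nb)
```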
Download

Paper Nr: 61
Title:

Perception Enhancement for Bionic Vision - Preliminary Study on Object Classification with Subretinal Implants

Authors:

Johannes Steffen, Jonathan Napp, Stefan Pollmann and Klaus Tönnies

Abstract: The vision restored by subretinal implants in patients suffering from a loss of photoreceptors, e.g., in retinitis pigmentosa and age-related macular degeneration, is very limited compared to that of healthy subjects. Therefore, we investigated whether it is possible to enhance the perception of such patients by transforming the input images in a systematic manner. To this end, we propose a new image transformation network that is capable of learning plausible image transformations in an end-to-end fashion in order to enhance the perception of (virtual) patients with simulated subretinal implants. As a proof of concept, we test our method on an object classification task with three classes. Our results are promising: compared to a baseline model, the overall object classification accuracy increased significantly from 67.4% to 81.1%. Furthermore, we discuss implications and limitations of our proof of concept and outline aspects of our work that can be improved and need to be the subject of further research.
Download

Paper Nr: 65
Title:

Automatic Recognition of the Hepatocellular Carcinoma from Ultrasound Images using Complex Textural Microstructure Co-Occurrence Matrices (CTMCM)

Authors:

Delia Mitrea, Sergiu Nedevschi and Radu Badea

Abstract: The hepatocellular carcinoma (HCC) is one of the most frequent malignant liver tumours. The gold standard for HCC detection is the needle biopsy, but this is a dangerous technique. We aim to perform the non-invasive recognition of this tumour using computerized methods on ultrasound images. For this purpose, we defined the textural model of HCC, consisting of the relevant textural features that separate this tumour from other visually similar tissues and of the specific values that correspond to these relevant features: arithmetic mean, standard deviation, probability distribution. In this paper, we demonstrate the role that the Complex Textural Microstructure Co-occurrence Matrices have in the improvement of the textural model of HCC and in the increase of the recognition performance. During the experiments, we considered the following classes: cirrhosis, HCC, cirrhotic parenchyma on which HCC evolved, and hemangioma, a frequent benign liver tumour. The resulting recognition accuracy for HCC approached 90%.
Download

Paper Nr: 70
Title:

Supervised Deep Polylingual Topic Modeling for Scholarly Information Recommendations

Authors:

Pannawit Samatthiyadikun and Atsuhiro Takasu

Abstract: Polylingual text processing is important for content-based and hybrid recommender systems. It helps recommender systems extract content information from broader sources. It also enables systems to recommend items in a user’s native language. We propose a cross-lingual keyword recommendation method based on a polylingual topic model. The model is further extended with a popular deep learning architecture, the CNN–RNN model. With this model, keywords can be recommended from text written in different languages; model parameters are very meaningful, and we can interpret them. We evaluate the proposed method using crosslingual bibliographic databases that contain both English and Japanese abstracts and keywords.
Download

Paper Nr: 97
Title:

A Rover-based System for Searching Encrypted Targets in Unknown Environments

Authors:

Danilo Avola, Luigi Cinque, Gian Luca Foresti, Marco Raoul Marini and Daniele Pannone

Abstract: In the last decade, there has been a widespread use of autonomous robots in several application fields, such as border controls, precision agriculture, and military operations. Usually, in the latter, there is the need to encrypt the acquired data, or to mark as relevant some positions or areas. In this paper, we present a client-server rover-based system able to search encrypted targets within an unknown environment. The system uses a rover to explore an unknown environment through a Simultaneous Localization And Mapping (SLAM) algorithm and acquires the scene with a standard RGB camera. Then, by using visual cryptography, it is possible to encrypt the acquired RGB data and to send it to a server, which decrypts the data and checks if it contains a target object. The experiments performed on several objects show the effectiveness of the proposed system.
Download

Short Papers
Paper Nr: 2
Title:

Local and Global Feature Descriptors Combination from RGB-Depth Videos for Human Action Recognition

Authors:

Rawya Al-Akam and Dietrich Paulus

Abstract: This paper presents human action recognition through the combination of local and global feature descriptor values extracted from RGB and depth videos. A video sequence is represented as a collection of spatial and spatio-temporal features. However, challenging problems exist for both local and global descriptors when classifying human actions. We propose a novel combination of two descriptor methods: 3D trajectory and motion boundary histograms for the local features, and the global Gist descriptor for the global features (3DTrMBGG). The combination of local and global descriptors is used to overcome the lack of structural information among the local descriptors and the cluttered background and occlusion problems of the global descriptor. There are three steps in combining the video descriptors: first, combining motion and 3D trajectory shape descriptors; second, extracting structural information using the global Gist descriptor; third, combining these two steps to obtain the 3DTrMBGG feature vector over the spatio-temporal domain. The resulting 3DTrMBGG features are used with K-means clustering and a multi-class support vector machine classifier. Our new method improves performance even on actions with low movement rates and outperforms the competing state-of-the-art spatio-temporal feature-based human action recognition methods on several action video datasets.
Download

Paper Nr: 15
Title:

Experiences with Publicly Open Human Activity Data Sets - Studying the Generalizability of the Recognition Models

Authors:

Pekka Siirtola, Heli Koskimäki and Juha Röning

Abstract: In this article, we study how well inertial sensor-based human activity recognition models work when the training and testing data sets are collected in different environments. The comparison is done using publicly open human activity data sets. This article has four objectives. Firstly, a survey of publicly available data sets is presented. Secondly, a previously unshared human activity data set used in our earlier work is opened for public use. Thirdly, the generalizability of recognition models trained using publicly open data sets is examined by testing them with data from another publicly open data set, to learn how the models work when they are used in a different environment, with different study subjects and hardware. Finally, the challenges encountered when using publicly open data sets are discussed. The results show that the data gathering protocol can have a statistically significant effect on the recognition rates. In addition, it was noted that publicly open human activity data sets are often not as easy to apply as they should be.
Download
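A hedged sketch of the kind of cross-dataset experiment described above: train a recognizer on one publicly open data set and test it on every other one, so that the drop from within-dataset to cross-dataset accuracy exposes the generalizability gap. The classifier and the pre-computed feature matrices are placeholders, not the authors' setup.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

def cross_dataset_scores(datasets):
    """datasets: dict name -> (X, y) with features extracted the same way everywhere."""
    scores = {}
    for train_name, (X_tr, y_tr) in datasets.items():
        clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
        for test_name, (X_te, y_te) in datasets.items():
            if test_name != train_name:
                # accuracy when the model meets a new environment, subjects and hardware
                scores[(train_name, test_name)] = accuracy_score(y_te, clf.predict(X_te))
    return scores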

Paper Nr: 19
Title:

Face Anti-spoofing based on Deep Stack Generalization Networks

Authors:

Xin Ning, Weijun Li, Meili Wei, Linjun Sun and Xiaoli Dong

Abstract: Thanks to the recent development of Convolutional Neural Networks (CNNs), the performance of face anti-spoofing methods has improved, as CNNs extract features that distinguish genuine from fake faces better than hand-crafted texture features. However, fraud can take many forms, so the fake class has large intra-class variations, and treating the task as a single binary classification problem makes it hard to learn distinguishing features. In this work, our contribution is a novel model fusion approach for face anti-spoofing that reduces these intra-class variations. According to the type of fraud, we first train a separate CNN model for each attack type, so that the intra-class variation of the fake class is reduced while training each model and distinguishing features can be learned more easily. Stacked generalization is then used to combine the lower-level models and achieve better predictive accuracy: it adjusts the weight of each model's prediction so that the fused model can precisely predict whether a face image is fake or genuine. The experimental results indicate that our method obtains excellent results compared to state-of-the-art methods.
Download
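As a hedged illustration of the per-attack-type models plus stacked generalization described above, the sketch below trains one binary expert per spoof type and a logistic-regression meta-learner over their probability outputs. Generic scikit-learn base learners stand in for the paper's CNNs, and a faithful implementation would train the meta-learner on out-of-fold predictions.

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression

def train_stacked(X, y_genuine, spoof_type):
    """X: features; y_genuine: 1 = genuine, 0 = fake; spoof_type: attack label of fake samples."""
    experts = []
    for t in np.unique(spoof_type[y_genuine == 0]):
        # One expert per attack type: genuine vs. this particular kind of fake,
        # which keeps the intra-class variation of the fake class small.
        mask = (y_genuine == 1) | (spoof_type == t)
        experts.append(GradientBoostingClassifier().fit(X[mask], y_genuine[mask]))
    # Level-1 features: each expert's probability of "genuine".
    Z = np.column_stack([e.predict_proba(X)[:, 1] for e in experts])
    meta = LogisticRegression().fit(Z, y_genuine)   # stacked generalization layer
    return experts, meta

def predict(experts, meta, X):
    Z = np.column_stack([e.predict_proba(X)[:, 1] for e in experts])
    return meta.predict(Z)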

Paper Nr: 22
Title:

High Performance Layout Analysis of Medieval European Document Images

Authors:

Syed Saqib Bukhari, Ashutosh Gupta, Anil Kumar Tiwari and Andreas Dengel

Abstract: Layout analysis, mainly comprising binarization and page segmentation, is one of the most important performance-determining steps of an OCR system for complex medieval document images, which contain noise, distortions and irregular layouts. In this paper, we present high-performance page segmentation techniques for medieval European document images, including a novel main-body and side-notes segregation and an improved version of OCRopus-based text line extraction. In order to complete the high-performance layout analysis pipeline, we also present the application of percentile-based binarization (Afzal et al., 2014) and multiresolution morphology-based text and non-text segmentation (Bukhari et al., 2011) to historical document images. The presented layout analysis techniques are applied to a collection of 15th-century Latin document images, on which each of the segmentation techniques achieves more than 90% accuracy.
Download

Paper Nr: 26
Title:

A Fast Multiresolution Approach Useful for Retinal Image Segmentation

Authors:

Dario Lo Castro, Domenico Tegolo and Cesare Valenti

Abstract: Retinal diseases such as retinopathy of prematurity (ROP) and diabetic and hypertensive retinopathy produce several deformities of the fundus oculi which can be analyzed during both screening and monitoring, such as increased tortuosity, tissue lesions, exudates and hemorrhages. In particular, one of the first morphological changes of the vessel structures is an increase in tortuosity. The aim of this work is the enhancement and detection of the principal characteristics of retinal images by means of an unsupervised and automated methodology. With respect to the well-known image analysis based on Gabor or Gaussian filters, our approach uses a filter bank that resembles the “à trous” wavelet algorithm. In this contribution we show a particular approach to speed up the computing time. The methodology rotates the kernels and is fast enough to extract information useful for assessing vessel tortuosity and for segmenting retinal images (segmentation is not considered explicitly in this paper). Furthermore, on the public databases DRIVE and DIARETDB0, we compare our output images against the SCIRD-TS algorithm, which is considered one of the most effective supervised methods for the detection of thin retinal structures.
Download

Paper Nr: 27
Title:

Automatic Counting of Wheat Spikes from Wheat Growth Images

Authors:

Najmah Alharbi, Ji Zhou and Wenija Wang

Abstract: This study aims to develop an automated screening system that can estimate the number of wheat spikes (i.e. ears) from a given wheat plant image acquired after the flowering stage. The platform can be used to assist the dynamic estimation of wheat yield potential as well as grain yield based on wheat images captured by the CropQuant platform. Our proposed system framework comprises three main stages. Firstly, it transforms the raw wheat plant image data using the colour index of vegetation extraction (CIVE) and then segments wheat ear regions from the image to reduce the influence of background signals. Secondly, it detects wheat ears using Gabor filter banks and the K-means clustering algorithm. Finally, it estimates the number of wheat spikes within the extracted wheat spike regions through a regression method. The framework is tested on a real-world dataset of wheat growth images distributed evenly from the flowering to the ripening stages. The estimates of wheat ears were benchmarked against ground truth produced in this study by manual counting. Our automatic counting system achieved an average accuracy of 90.7% with a standard deviation of 0.055, at a much faster speed than human experts, and hence has the potential to be developed further for agricultural applications in wheat growth studies.
Download
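For the first, colour-index stage described above, a hedged sketch is given below. The CIVE coefficients are the commonly cited values from the literature, not taken from this abstract, and Otsu thresholding is only one plausible way to turn the index into a plant/background mask.

import numpy as np
from skimage.filters import threshold_otsu

def cive(rgb):
    """Colour Index of Vegetation Extraction for an RGB image with values in [0, 255]."""
    r = rgb[..., 0].astype(float)
    g = rgb[..., 1].astype(float)
    b = rgb[..., 2].astype(float)
    return 0.441 * r - 0.811 * g + 0.385 * b + 18.78745

def plant_mask(rgb):
    # Greener (plant) pixels give lower CIVE values, so threshold below Otsu's level.
    index = cive(rgb)
    return index < threshold_otsu(index)

The Gabor-filter ear detection and the final regression step would then operate only inside this mask.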

Paper Nr: 34
Title:

Road Boundary Detection using In-vehicle Monocular Camera

Authors:

Kazuki Goro and Kazunori Onoguchi

Abstract: When a lane marker such as a white line is not drawn on the road or is hidden by snow, detecting the boundary line between the road and roadside objects such as curbs, grass and side walls is important for the lateral motion control of the vehicle. In particular, when the road is covered with snow, it is necessary to detect the boundary between the snow side wall and the road, because other roadside objects are occluded by snow. In this paper, we propose a novel method to detect the shoulder line of a road, including the boundary with a snow side wall, from the image of an in-vehicle monocular camera. Vertical lines on objects whose height differs from the road surface are projected onto slanted lines when the input image is mapped to the road surface by inverse perspective mapping. The proposed method detects a road boundary using this characteristic. In order to cope with snow surfaces on which various textures appear, we introduce a degree of road boundary that responds strongly at the boundary with areas where slanted edges are dense. Since the shape of a snow wall is complicated, the boundary line is extracted by the Snakes algorithm, using the degree of road boundary as the image force. Experimental results on the KITTI dataset and on our own dataset including snowy roads show the effectiveness of the proposed method.
Download

Paper Nr: 42
Title:

Application of Machine Learning for Automatic MRD Assessment in Paediatric Acute Myeloid Leukaemia

Authors:

Roxane Licandro, Michael Reiter, Markus Diem, Michael Dworzak, Angela Schumich and Martin Kampel

Abstract: Acute Myeloid Leukaemia (AML) is a rare type of blood cancer in children. This disease originates from genetic alterations of hematopoietic progenitor cells, which are involved in the hematopoiesis process, and leads to the proliferation of undifferentiated (leukaemic) cells. Flow CytoMetry (FCM) measurements enable the assessment of the Minimal Residual Disease (MRD), a value which clinicians use as a powerful predictor of treatment response and as a diagnostic tool for planning a patient's individual therapy. In this work we propose machine learning applications for automatic MRD assessment in AML. Recent approaches focus on childhood Acute Lymphoblastic Leukaemia (ALL), which is more common in this population. We perform experiments on the performance of state-of-the-art algorithms and provide a novel GMM formulation that estimates leukaemic cell populations by learning background (non-cancer) populations only. Additionally, combinations of backgrounds of different leukaemia types are evaluated with respect to their ability to predict MRD in AML. The results suggest that background populations, and combinations of these, are suitable for assessing MRD in AML.
Download
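A hedged sketch of the background-only modelling idea described above: fit a Gaussian mixture to FCM events from non-cancer (background) populations and flag events that are unlikely under it as potentially leukaemic. The number of components and the log-density threshold are placeholders; the paper's actual GMM formulation may differ.

import numpy as np
from sklearn.mixture import GaussianMixture

def fit_background(background_events, n_components=16):
    """background_events: (n_events, n_markers) FCM matrix from non-cancer populations."""
    return GaussianMixture(n_components=n_components, covariance_type="full",
                           random_state=0).fit(background_events)

def mrd_estimate(background_gmm, sample_events, log_density_threshold):
    """Fraction of events in a patient sample that the background model cannot explain."""
    log_likelihood = background_gmm.score_samples(sample_events)  # per-event score
    is_suspicious = log_likelihood < log_density_threshold
    return is_suspicious.mean()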

Paper Nr: 46
Title:

Transfer Learning for Structures Spotting in Unlabeled Handwritten Documents using Randomly Generated Documents

Authors:

Geoffrey Roman-Jimenez, Christian Viard-Gaudin, Adeline Granet and Harold Mouchère

Abstract: Despite recent achievements in handwritten text recognition due to major advances in deep neural networks, the analysis of historical handwritten documents is still a challenging problem because large annotated training databases are required. In this context, transferring knowledge from neural networks pre-trained on already available labeled data could allow us to process new collections of documents. In this study, we focus on localizing structures at the word level, distinguishing words from numbers, in unlabeled handwritten documents. We base our approach on a transductive transfer learning paradigm using a deep convolutional neural network pre-trained on artificial labeled images randomly generated from strokes, word patches and number patches. We designed our model to predict a mask of the structure positions at the pixel level, directly from the pixel values. The model was trained on 100,000 generated images. The classification performance of our model was assessed on randomly generated images built from a different set of word and digit images. At the pixel level, the average accuracy of the proposed structure detection system reaches 96.1%. We evaluated the transfer capability of our model on two datasets of real handwritten documents unseen during training. The results show that our model is able to distinguish most "digit" structures from "word" structures while ignoring the various other structures present in the documents, demonstrating the good transferability of the system to real documents.
Download

Paper Nr: 50
Title:

Anomaly Detection in Industrial Software Systems - Using Variational Autoencoders

Authors:

Tharindu Kumarage, Nadun De Silva, Malsha Ranawaka, Chamal Kuruppu and Surangika Ranathunga

Abstract: Industrial software systems are known to be used for performing critical tasks in numerous fields. Faulty conditions in such systems can cause system outages that could lead to losses. In order to prevent potential system faults, it is important that the anomalous conditions that lead to these faults are detected effectively. Nevertheless, the high complexity of the system components makes anomaly detection a high-dimensional machine learning problem. This paper presents the application of a deep learning neural network known as the Variational Autoencoder (VAE) as the solution to this problem. We show that, when used in an unsupervised manner, the VAE outperforms the well-known clustering technique DBSCAN. Moreover, this paper shows that higher recall can be achieved using the semi-supervised one-class learning of the VAE, which uses only normal data to train the model. Additionally, we show that one-class learning with the VAE outperforms a semi-supervised one-class SVM when the training data contain only a very small amount of anomalous samples. When a tree-based ensemble technique is adopted for feature selection, the obtained results clearly demonstrate that the performance of the VAE is highly positively correlated with the selected feature set.
Download

Paper Nr: 55
Title:

Enhancement of Emotion Recognition using Feature Fusion and the Neighborhood Components Analysis

Authors:

Hany Ferdinando and Esko Alasaarela

Abstract: Feature fusion is a common approach to improving the accuracy of a system. Several attempts have been made with this approach on the Mahnob-HCI database for affective recognition, with 76% and 68% for valence and arousal respectively as the highest achievements. This study aimed to improve the baselines for both valence and arousal by fusing HRV-based features, obtained from standard Heart Rate Variability analysis and either standardized to zero mean/unit standard deviation or normalized to [-1,1], with cvxEDA-based features, calculated using a convex optimization approach, in order to set new baselines for this database. The features selected by sequential forward floating search (SFFS) were enhanced by Neighborhood Component Analysis (NCA) and fed to a kNN classifier to solve a 3-class classification problem, validated with leave-one-out (LOO), leave-one-subject-out (LOSO), and 10-fold cross-validation. The standardized HRV-based features were never selected by SFFS, leaving a fusion of the normalized HRV-based and cvxEDA-based features only. The results were compared to previous single- and multi-modality studies. Applying NCA enhanced the features such that the valence performance sets new baselines: 82.4% (LOO validation), 79.6% (10-fold cross-validation), and 81.9% (LOSO validation), improving on the best previous achievements in both single- and multi-modality settings. For arousal, the performance was 78.3%, 78.7%, and 77.7% for LOO, LOSO, and 10-fold cross-validation respectively, outperforming the best previous feature-fusion results but not the single-modality study using cvxEDA-based features. Future work includes utilizing other feature extraction methods and using a more sophisticated classifier than the simple kNN.
Download
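A hedged sketch of the classification stage described above, assuming the fused and SFFS-selected features are already available: a Neighborhood Components Analysis transform followed by a kNN classifier, evaluated here with leave-one-out cross-validation in scikit-learn. The scaling step and the value of k are assumptions.

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import NeighborhoodComponentsAnalysis, KNeighborsClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_score

def evaluate(X, y, k=5):
    """X: fused HRV/cvxEDA feature matrix; y: 3-class valence or arousal labels."""
    pipe = Pipeline([
        ("scale", StandardScaler()),
        ("nca", NeighborhoodComponentsAnalysis(random_state=0)),  # learns a metric for kNN
        ("knn", KNeighborsClassifier(n_neighbors=k)),
    ])
    return cross_val_score(pipe, X, y, cv=LeaveOneOut()).mean()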

Paper Nr: 60
Title:

Coarse Clustering and Classification of Images with CNN Features for Participatory Sensing in Agriculture

Authors:

Prakruti Bhatt, Sanat Sarangi and Srinivasu Pappula

Abstract: A solution is proposed for unsupervised image classification and tagging by leveraging high-level features extracted from a pre-trained Convolutional Neural Network (CNN). It is validated on images collected through a mobile application used by farmers to report image-based events such as pest and disease incidents and the application of agri-inputs, towards self-certification of farm operations. These images need to be classified into their respective event classes in order to help farmers tag images properly and to support experts in issuing appropriate advisories. Using features extracted from CNNs trained on the ImageNet database, images are coarsely clustered into classes for efficient image tagging. We evaluate the performance of different clustering methods on the feature vectors extracted from the global average pooling layer of state-of-the-art deep CNN models. The clustered images represent broad categories which are further divided into classes. CNN features of the tea-leaf category of images were used to train an SVM classifier, with which we achieve 93.75% classification accuracy in the automated state diagnosis of tea leaves captured under uncontrolled conditions. This method creates a model to auto-tag images at the source and can be deployed at scale through mobile applications.
Download
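A hedged sketch of the pipeline described above: global-average-pooled features from an ImageNet-pretrained CNN, coarse K-means clustering for tagging, and an SVM for the finer within-category classification. The choice of MobileNetV2 as the backbone, the input size and the cluster count are placeholders; the paper evaluates several CNN models and clustering methods.

import numpy as np
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
from sklearn.cluster import KMeans
from sklearn.svm import SVC

# ImageNet-pretrained backbone, truncated at the global average pooling layer.
backbone = MobileNetV2(weights="imagenet", include_top=False, pooling="avg")

def cnn_features(images):
    """images: (n, 224, 224, 3) float array with pixel values in [0, 255]."""
    return backbone.predict(preprocess_input(images.copy()), verbose=0)

def coarse_clusters(features, n_groups=10):
    # Unsupervised grouping used to propose tags for incoming farmer reports.
    return KMeans(n_clusters=n_groups, random_state=0).fit_predict(features)

def fine_classifier(features, labels):
    # e.g. state diagnosis within the tea-leaf category.
    return SVC(kernel="linear", C=1.0).fit(features, labels)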

Paper Nr: 74
Title:

Wavelet Cepstral Coefficients for Electrical Appliances Identification using Hidden Markov Models

Authors:

Abdenour Hacine-Gharbi and Philippe Ravier

Abstract: In previous work, an electrical appliance identification system was constructed using Hidden Markov Models combined with STFS (Short-Time Fourier Series) feature extraction. This paper proposes several extensions: (i) a larger spectral band, up to the maximum frequency value, is investigated for the analysis of the data, which however requires a higher-dimensional STFS feature vector; (ii) a more compact representation than the STFS vector is investigated with wavelet-based approaches; (iii) the relevance of the wavelet-based features is investigated using a feature selection procedure. The results show that increasing the number of harmonics in the STFS from 50 to 249 does not necessarily improve the classification rate (CR), because of the peaking phenomenon observed with high dimensionality. The wavelet cepstral coefficients (WCC) descriptor with 8-cycle time analysis windows performs better than the STFS, discrete wavelet energy (DWE) and log wavelet energy (LWE) descriptors. Recommendations are also given for selecting the wavelet family, the mother wavelet order within the family and the decomposition depth. It turns out that the Daubechies wavelet of order 4 with decomposition depth 6 (or the Coiflet wavelet of order 2 with depth 7) is recommended in order to achieve the best CR values.
Download
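Definitions of wavelet cepstral coefficients vary across the literature; the hedged sketch below shows one plausible variant (DWT sub-band log energies decorrelated by a DCT), using the Daubechies wavelet of order 4 and decomposition depth 6 recommended above. It should not be read as the authors' exact formulation.

import numpy as np
import pywt
from scipy.fftpack import dct

def wavelet_cepstral_coefficients(frame, wavelet="db4", level=6):
    """frame: 1-D current or power signal covering the analysis window (e.g. 8 cycles)."""
    coeffs = pywt.wavedec(frame, wavelet, level=level)        # approximation + details
    energies = np.array([np.sum(c ** 2) / len(c) for c in coeffs])
    log_energies = np.log(energies + 1e-12)                   # avoid log(0)
    return dct(log_energies, type=2, norm="ortho")            # cepstrum-like decorrelation

One such vector per analysis window would then feed the per-appliance Hidden Markov Models.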

Paper Nr: 76
Title:

Generic Fourier Descriptors for Autonomous UAV Detection

Authors:

Eren Unlu, Emmanuel Zenou and Nicolas Riviere

Abstract: With the increasing number of Unmanned Aerial Vehicles (UAVs) - also known as drones - in our lives, safety and privacy concerns have arisen. In particular, strategic locations such as governmental buildings and nuclear power stations are under direct threat from these publicly available and easily accessible gadgets. Various counter-measures have been proposed, such as acoustics-based detection, RF signal interception and micro-Doppler RADAR. A computer vision based approach to detecting these threats seems a viable solution due to its various advantages. We envision an autonomous drone detection and tracking system for the protection of strategic locations. In this work, 2-dimensional scale, rotation and translation invariant Generic Fourier Descriptor (GFD) features, analyzed with a neural network, are used to classify aerial targets as drones or birds. For the training of this system, a large dataset composed of birds and drones was gathered from open sources. We achieved an overall correct classification rate of up to 85.3%.
Download

Paper Nr: 99
Title:

Automatic Counting and Classification of Microplastic Particles

Authors:

Javier Lorenzo-Navarro, Modesto Castrillón-Santana, May Gómez, Alicia Herrera and Pedro A. Marín-Reyes

Abstract: Microplastic particles have become an important ecological problem due to the huge amount of plastic debris that ends up in the sea. An additional impact is the ingestion of microplastics by marine species, through which microplastics enter the food chain with unpredictable effects on humans. In addition to exploring their presence in fish, researchers are studying the presence of microplastics in coastal areas. The workload is time consuming, due to the need to carry out regular campaigns to quantify their presence in the samples. In this work, a method for automatically counting and classifying microplastic particles is therefore presented. To the best of our knowledge, this is the first proposal to address this challenging problem. The method makes use of computer vision techniques for analyzing the acquired images of the samples, and machine learning techniques to develop accurate classifiers for the different types of microplastic particles considered. The obtained results show that using color-based and shape-based features along with a Random Forest classifier, an accuracy of 96.6% is achieved in recognizing four types of particles: pellets, fragments, tar and line.
Download
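The exact feature set is not given in the abstract; as a hedged sketch, the code below extracts a few generic shape descriptors (area, eccentricity, solidity, aspect ratio) and HSV color statistics per segmented particle and feeds them to a Random Forest, assuming a binary segmentation mask of the sample image is already available.

import numpy as np
from skimage import color, measure
from sklearn.ensemble import RandomForestClassifier

def particle_features(rgb_image, binary_mask):
    """One feature row per connected component (candidate particle)."""
    labels = measure.label(binary_mask)
    hsv = color.rgb2hsv(rgb_image)
    rows = []
    for region in measure.regionprops(labels):
        rr, cc = region.coords[:, 0], region.coords[:, 1]
        shape = [region.area, region.eccentricity, region.solidity,
                 region.major_axis_length / max(region.minor_axis_length, 1e-6)]
        colour = hsv[rr, cc].mean(axis=0).tolist() + hsv[rr, cc].std(axis=0).tolist()
        rows.append(shape + colour)
    return np.array(rows)

# X = particle_features(...); y = labels in {pellet, fragment, tar, line}
classifier = RandomForestClassifier(n_estimators=300, random_state=0)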

Paper Nr: 16
Title:

Image Features in Space - Evaluation of Feature Algorithms for Motion Estimation in Space Scenarios

Authors:

Marc Steven Krämer, Simon Hardt and Klaus-Dieter Kuhnert

Abstract: Image features are used in many computer vision applications. One important field of use is visual navigation: the localization of robots can be performed with the help of visual odometry. To perceive its surroundings, a robot is typically equipped with different environment sensors such as cameras or lidar. For such a multi-sensor system, the exact pose of each sensor is very important. To test, monitor and correct these calibration parameters, the ego-motion can be calculated separately by each sensor and compared. In this study we evaluate the SIFT, SURF, ORB, AKAZE, BRISK, BRIEF and KAZE operators for visual odometry in a space scenario. Since no suitable space test data were available, we have generated our own.
Download

Paper Nr: 17
Title:

Deep Learning Approaches towards Book Covers Classification

Authors:

Przemyslaw Buczkowski, Antoni Sobkowicz and Marek Kozlowski

Abstract: Machine learning methods allow computers to use data in less and less structured forms; until now, such data formats were accessible only to humans. This in turn opens up opportunities to automate new areas. Such systems can be used to support the administration of large e-commerce platforms, e.g. searching for products with inadequate descriptions. In this paper, we continue our work on extracting information about books, but we change the domain of our predictions: we now make predictions about a book based on the actual cover image instead of a short textual description. We compare how much information about a book can be extracted from these sources and describe our model and methodology in detail. Promising results were achieved.
Download

Paper Nr: 31
Title:

People Counting based on Kinect Depth Data

Authors:

Rabah Iguernaissi, Djamal Merad and Pierre Drap

Abstract: People counting is one of the most important parts in the design of any system for behavioral analysis. It is used to measure and manage the flow of people within zones with restricted attendance. In this work, we propose a strategy for counting the number of people entering and leaving a given closed area. Our counting method is based on the depth map obtained from a Kinect sensor installed in a zenithal position with respect to the motion direction, and counts the number of people crossing a virtual line of interest (LOI). The proposed method relies on two main modules: a people detection module that detects individuals crossing the LOI, and a tracking module that tracks the detected individuals to determine the direction of their motion. People detection is based on a smart sensor that uses both the grayscale image representing depth changes and the binary image representing foreground objects within the depth map to detect and localize individuals. These individuals are then tracked by the second module to determine the direction of their motion.
Download
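The detection and tracking modules cannot be reconstructed from the abstract; the hedged sketch below only illustrates the final counting logic over the virtual line of interest, assuming head centroids have already been detected in the depth map and associated into tracks.

def count_crossings(tracks, line_y):
    """tracks: dict track_id -> list of (frame, y) centroid positions; line_y: LOI position."""
    entering, leaving = 0, 0
    for positions in tracks.values():
        ys = [y for _, y in sorted(positions)]        # centroid trajectory in frame order
        for prev, curr in zip(ys, ys[1:]):
            if prev < line_y <= curr:                 # crossed the virtual line one way
                entering += 1
            elif prev >= line_y > curr:               # crossed it the other way
                leaving += 1
    return entering, leaving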

Paper Nr: 32
Title:

Predicting Hospital Safety Measures using Patient Experience of Care Responses

Authors:

Michael A. Pratt and Henry Chu

Abstract: To make healthcare more cost effective, the current trend in the U.S. is towards a hospital value-based purchasing program. In this program, a hospital's performance is measured in the safety, patient experience of care, clinical care, and efficiency and cost reduction domains. We investigate the efficacy of predicting the safety measures using the patient experience of care measures. We compare four classifiers on the prediction tasks and conclude that random forest and support vector machine provide the best performance.
Download

Paper Nr: 48
Title:

Transfer Learning for Handwriting Recognition on Historical Documents

Authors:

Adeline Granet, Emmanuel Morin, Harold Mouchère, Solen Quiniou and Christian Viard-Gaudin

Abstract: In this work, we investigate handwriting recognition on new historical handwritten documents using transfer learning. Establishing a manual ground truth for a new collection of handwritten documents is time-consuming but needed to train and test recognition systems. We want to implement a recognition system without performing this annotation step. Our research deals with transfer learning from heterogeneous datasets that have a ground truth and share common properties with a new dataset that has no ground truth. The main difficulties of transfer learning lie in changes in the writing style, the vocabulary, and the named entities over centuries and across datasets. In our experiments, we show how a CNN-BLSTM-CTC neural network behaves, for the task of transcribing handwritten titles of plays of the Italian Comedy, when trained on combinations of various datasets such as RIMES, George Washington, and Los Esposalles. We show that the choice of the training datasets and the merging methods is decisive for the results of the transfer learning task.
Download

Paper Nr: 54
Title:

Building Robust Classifiers with Generative Adversarial Networks for Detecting Cavitation in Hydraulic Turbines

Authors:

Andreas Look, Oliver Kirschner and Stefan Riedelbauch

Abstract: In this paper, a convolutional neural network (CNN) with a high ability to generalize is built. The task of the network is to predict the occurrence of cavitation in hydraulic turbines independently of sensor position and turbine type. The CNN is trained directly on acoustic spectrograms obtained from acoustic emission sensors operating in the ultrasonic range. Since gathering training data is expensive, owing to limited access to hydraulic turbines, generative adversarial networks (GANs) are utilized in order to create artificial training data. GANs basically consist of two parts. The first part, the generator, has the task of creating fake input data which are ideally indistinguishable from real data. The second part, the discriminator, has the task of distinguishing between real and fake data. In this work, an Auxiliary Classifier GAN (AC-GAN) is built. The discriminator of an AC-GAN has the additional task of predicting the class label. After successful training, a robust classifier can be obtained from the discriminator. The performance of the classifier is evaluated on separate validation data.
Download

Paper Nr: 58
Title:

Localizing People in Crosswalks using Visual Odometry: Preliminary Results

Authors:

Marc Lalonde, Pierre-Luc St-Charles, Délia Loupias, Claude Chapdelaine and Samuel Foucher

Abstract: This paper describes a prototype for the localization of pedestrians carrying a video camera. The application envisioned here is to analyze the trajectories of blind people going across long crosswalks when following an accessible pedestrian signal (APS), in the context of signal optimization. Instead of relying on an observer for manually logging the subjects’ position at regular time intervals with respect to the crosswalk, we propose to equip the subjects with a wearable camera: a visual odometry algorithm then recovers the trajectory and spatial analysis can then determine to which extent the subject remained within reasonable boundaries while performing the crossing. Preliminary tests in conditions similar to a street crossing show that our results qualitatively agree with the physical behavior of the subject.
Download

Paper Nr: 63
Title:

Background-Invariant Robust Hand Detection based on Probabilistic One-Class Color Segmentation and Skeleton Matching

Authors:

Andrey Kopylov, Oleg Seredin, Olesia Kushnir, Inessa Gracheva and Aleksandr Larin

Abstract: In this paper we present a new method for hand detection against cluttered backgrounds for video stream processing. First, skin segmentation is performed by a one-class color pixel classifier that is trained using only a face image fragment, without any background training samples. A modified version of the one-class classifier is proposed: for each pixel it returns the grade (probability) of its belonging to the skin category instead of a binary decision. To adjust the output of the one-class classifier, a structure-transferring filter built on a probabilistic gamma-normal model is applied. It utilizes additional information about the structure of the image and coordinates local decisions in order to achieve more robust segmentation results. To make a final decision on whether an image fragment depicts a human hand, a binary image matching method based on skeletonization is employed. An experimental study of the segmentation and detection quality of the proposed method shows promising results.
Download

Paper Nr: 64
Title:

Self-Learning 3D Object Classification

Authors:

Jens Garstka and Gabriele Peters

Abstract: We present a self-learning approach to object classification from 3D point clouds. Existing 3D feature descriptors have been used successfully for 3D point cloud classification, but no single descriptor is best in every situation. We extend a well-tried 3D object classification pipeline based on local 3D feature descriptors with a reinforcement learning approach that learns strategies for selecting point cloud descriptors depending on qualities of the point cloud to be classified. The reinforcement learning framework autonomously learns a strategy to select feature descriptors from a provided set and to apply them successively for an optimal classification result. Extensive experiments on more than 200,000 3D point clouds yielded higher classification rates, with partly more reliable results, than any single-descriptor setting. Furthermore, our approach proved able to preserve classification strategies learned so far while integrating additional descriptors into an ongoing classification process.
Download

Paper Nr: 66
Title:

Extracting Biomarkers from Dynamic Images - Approaches and Challenges

Authors:

Jakub Nalepa, Michael P. Hayball, Stephen J. Brown, Michal Kawulok and Janusz Szymanek

Abstract: Imaging technologies have developed rapidly over the past decade proving to be valuable and effective tools for diagnosis, evaluation and treatment of many conditions, especially cancer. Dynamic contrast enhanced imaging using computed tomography or magnetic resonance has been shown particularly effective and has been intensively studied to allow for assessing the vascular support of various tumours and other tissues. In this paper, we discuss current approaches and most important challenges in extracting markers from such dynamic images. These difficulties have to be resolved in order to ultimately improve patient care.
Download

Paper Nr: 68
Title:

Transferring Information Across Medical Images of Different Modalities

Authors:

Jakub Nalepa, Piotr Mokry, Janusz Szymanek and Michael P. Hayball

Abstract: Multimodal analysis plays a pivotal role in medical imaging and has been recognized as an established tool in clinical diagnosis. This joint investigation allows various bits of information to be extracted from images of different modalities, which complement each other to provide a comprehensive view of the patient case. Since those images may be acquired using different protocols, synchronizing them and transferring information, e.g. regions of interest (ROIs), between them is not trivial. In this paper, we derive the formulas for mapping ROIs between different modalities and show a real-life PET/CT example of such image processing.
Download

Paper Nr: 80
Title:

Relative Pose Estimation in Binocular Vision for a Planar Scene using Inter-Image Homographies

Authors:

Marcus Valtonen Örnhag and Anders Heyden

Abstract: In this paper we consider a mobile platform with two cameras directed towards the floor, mounted at the same distance from the ground, assuming planar motion and constant internal parameters. Earlier work related to this specific problem geometry has been carried out for monocular systems, and the main contribution of this paper is the generalization to a binocular system and the recovery of the relative translation and orientation between the cameras. The method is based on previous work on monocular systems, using sequences of inter-image homographies. Experiments are conducted using synthetic data, and the results demonstrate a robust method for determining the relative parameters.
Download
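The full recovery of the relative translation and orientation from sequences of inter-image homographies is beyond what the abstract specifies. As a hedged building block only, the sketch below estimates a single floor-induced homography between the two camera views using OpenCV feature matching and RANSAC.

import cv2
import numpy as np

def inter_image_homography(img_left, img_right):
    """Homography induced by the floor plane between the two downward-facing views."""
    orb = cv2.ORB_create(2000)
    kp1, des1 = orb.detectAndCompute(img_left, None)
    kp2, des2 = orb.detectAndCompute(img_right, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:500]
    src = np.float32([kp1[m.queryIdx].pt for m in matches])
    dst = np.float32([kp2[m.trainIdx].pt for m in matches])
    H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)  # robust to mismatches
    return H

A sequence of such homographies, one per image pair over time, would then be fed to the relative-pose estimation.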

Paper Nr: 82
Title:

Hierarchical Electricity Demand Forecasting by Exploring the Electricity Consumption Patterns

Authors:

Yue Pang, Chaoyi Jin, Xiangdong Zhou, Naiwang Guo and Yong Zhang

Abstract: Accurate electricity demand forecasting is necessary to develop an efficient and sustainable power system. The total demand of a whole region can be disaggregated at different levels, producing a hierarchical structure. In hierarchical demand forecasting, prediction accuracy and aggregate consistency between levels are two important issues; however, in previous work prediction accuracy is often degraded by enforcing aggregate consistency. In this work, we propose a novel pattern-based hierarchical time series forecasting (PHF) method consisting of two aggregation stages. In the first stage, the bottom-level electricity demand forecast is improved by exploring electricity consumption patterns with a clustering method; in the second stage, region-level aggregation is performed to obtain the top-level forecast. The experiments are conducted on the Energy Demand Research Project (EDRP) datasets, and the experimental results show that, compared with previous state-of-the-art methods, our method improves prediction accuracy at all hierarchical levels while keeping aggregate consistency.
Download

Paper Nr: 84
Title:

Exploring BIM Data by Graph-based Unsupervised Learning

Authors:

Chaoyi Jin, Minyang Xu, Lan Lin and Xiangdong Zhou

Abstract: This paper presents an unsupervised learning method for mining Industry Foundation Classes (IFC) based Building Information Modelling (BIM) data by exploring the inter-relational, graph-like structure of building spaces. In our method, the affinity propagation clustering algorithm is combined with our proposed feature extraction algorithm to obtain exemplars of spaces with similar usage functions. The experiments are conducted on a real-world BIM dataset. The experimental results show that building spaces with typical usage functions can be discovered by our unsupervised learning algorithm.
Download

Paper Nr: 85
Title:

Electricity Consumption Model Analysis based on Sparse Principal Components

Authors:

Bo Yao, Yiming Xu, Yue Pang, Chaoyi Jin, Zijing Tan, Xiangdong Zhou and Yun Su

Abstract: The well-being of people, industry and the economy depends on reliable, sustainable and affordable energy. The analysis of energy consumption models, especially electricity consumption models, plays an important role in providing guidance that makes the energy system stable and economical. In this paper, clustering based on electricity consumption models is applied to categorize consumers, and Sparse Principal Component Analysis (SPCA) is employed to analyse the electricity consumption model of each cluster. Experimental results show that our method can automatically divide a day into peak and off-peak times, and so reveal in detail the electricity consumption models of different types of consumers. Additionally, we study the relationships between the social background of consumers and their electricity consumption models. Our experimental results show that the social background of consumers has an impact on their consumption model, as expected, but cannot fully determine it.
Download
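A hedged sketch of the pipeline described above: cluster consumers by their daily load profiles, then run Sparse PCA within each cluster so that the non-zero loadings point to the time slots that characterise that group (e.g. peak versus off-peak times). The half-hourly profile layout, cluster count and sparsity setting are assumptions.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import SparsePCA

def consumption_models(profiles, n_groups=5, n_components=3):
    """profiles: (n_consumers, 48) matrix of average half-hourly consumption per day."""
    groups = KMeans(n_clusters=n_groups, random_state=0).fit_predict(profiles)
    components = {}
    for g in range(n_groups):
        spca = SparsePCA(n_components=n_components, alpha=1.0, random_state=0)
        spca.fit(profiles[groups == g])
        # Non-zero loadings indicate the time slots that drive this group's consumption.
        components[g] = spca.components_
    return groups, components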

Paper Nr: 91
Title:

Comparative Study of the Behavior of Feature Reduction Methods in Person Re-identification Task

Authors:

Bahram Lavi, Mehdi Fatan Serj and Domenec Puig Valls

Abstract: One of the goals of person re-identification systems is to support video-surveillance operators and forensic investigators in finding an individual of interest in videos acquired by a network of non-overlapping cameras. This is achieved by sorting images of previously observed individuals by decreasing similarity to a given probe individual. Existing appearance descriptors, together with their similarity measures, are mostly aimed at improving ranking quality, and many of them generate a high-dimensional feature vector used as an image signature. To tackle person re-identification in real-world scenarios, processing time is crucial: an individual of interest must be found quickly within the camera network. We therefore study feature reduction methods to achieve a meaningful trade-off between processing time and ranking quality. Since redundancy in the patterns generated by a given descriptor is undeniable, we suggest applying a feature reduction method before using the descriptor in real-world scenarios. In particular, we test three reduction methods: PCA, KPCA, and Isomap. We evaluate our study on two benchmark data sets (VIPeR and i-LIDS), using two state-of-the-art descriptors for the person re-identification task. The results presented in this paper, after applying the feature reduction step, are very promising in terms of recognition rate.
Download
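A hedged sketch of the reduction study described above: project gallery and probe signatures with PCA, kernel PCA and Isomap, recording the time each method needs, so that ranking quality can then be measured on the reduced vectors. The target dimensionality and kernel are assumptions.

import time
import numpy as np
from sklearn.decomposition import PCA, KernelPCA
from sklearn.manifold import Isomap

def reduce_and_time(X_gallery, X_probe, n_components=64):
    """X_gallery, X_probe: descriptor matrices; returns per-method runtime and projections."""
    results = {}
    for name, reducer in [("PCA", PCA(n_components)),
                          ("KPCA", KernelPCA(n_components, kernel="rbf")),
                          ("Isomap", Isomap(n_components=n_components))]:
        start = time.time()
        gallery = reducer.fit_transform(X_gallery)   # learn the projection on the gallery
        probe = reducer.transform(X_probe)           # apply it to the probes
        results[name] = (time.time() - start, gallery, probe)
    return results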

Paper Nr: 92
Title:

Traffic Signs Recognition and Classification based on Deep Feature Learning

Authors:

Yan Lai, Nanxin Wang, Yusi Yang and Lan Lin

Abstract: Traffic sign recognition and classification play an important role in automated driving. Various methods have been proposed over the past years to deal with this problem, yet the performance of these algorithms still needs to be improved to meet the requirements of real applications. In this paper, a novel traffic sign recognition and classification method is presented based on a Convolutional Neural Network and a Support Vector Machine (CNN-SVM). In this method, the YCbCr color space is introduced in the CNN to separate the color channels for feature extraction, and an SVM classifier performs classification based on the extracted features. The experiments are conducted on a real-world data set with images and videos captured from ordinary car driving. The experimental results show that, compared with state-of-the-art methods, our method achieves the best performance on traffic sign recognition and classification, with the highest accuracy rate of 98.6%.
Download

Paper Nr: 93
Title:

Feature-based Analysis of Gait Signals for Biometric Recognition - Automatic Extraction and Selection of Features from Accelerometer Signals

Authors:

Maria De Marsico, Eduard Gabriel Fartade and Alessio Mecca

Abstract: Gait recognition has traditionally been tackled by computer vision techniques, and it is still a very active research field. More recently, the spread of smart mobile devices with embedded sensors has also spurred the research community's interest in alternative methods based on the gait dynamics captured by those sensors. In particular, signals from the accelerometer seem the best suited for recognizing the identity of the subject carrying the mobile device. Different approaches have been investigated to achieve sufficient recognition ability. This paper proposes the automatic extraction of the most relevant features computed from the three raw accelerometer signals (one for each axis). It also presents the results of comparing this approach with plain Dynamic Time Warping (DTW) matching. The latter is computationally more demanding, which must be taken into account given the limited resources of a mobile device. Moreover, though it is a rather basic approach, it is still used in the literature because it can easily be implemented even directly on mobile platforms, which are the new frontier of biometric recognition.
Download