Abstract: |
Sensor-based human activity recognition (HAR) must address several challenges, including the inherent complexity of activity recognition, sensor modality selection, and multi-subject analysis. Machine learning techniques are essential for extracting and learning knowledge from raw sensor signals. Researchers have applied traditional machine learning algorithms such as Support Vector Machines, Decision Trees, and Random Forests [3]. These algorithms perform well when only a limited amount of labeled data is available and domain-specific knowledge exists. However, they rely heavily on hand-crafted feature engineering and are constrained by domain knowledge, which limits their generalization accuracy. In contrast, deep learning methods overcome these limitations through their ability to automatically extract features from raw sensor data. Deep neural networks (DNNs) can extract both low-level and high-level features, which enables them to address complex activity recognition tasks and multi-user scenarios. We propose a deep learning architecture designed to improve the efficiency and accuracy of sensor-based HAR systems. It comprises two key modules: the Multilevel Convolutional Network (MultiConvNet) and the Transformer Encoder (TransEncoder). The MultiConvNet employs a multilevel convolutional architecture to extract and fuse deep features from multimodal sensor signals; hence, it can capture both low-level and high-level information from the various sensor data. The embedded feature sequences are fed to the TransEncoder, which captures global features and models long-term dependencies. We compare the performance of the proposed model with state-of-the-art methods on six public datasets, i.e., UCI-HAR, MotionSense, HAPT, KU-HAR, SHL2018, and PAMAP2, and show that the proposed model outperforms the baselines across a wide range of data. 
We conduct ablation experiments and show that (1) combining MultiConvNet and TransEncoder improves performance compared with either module alone, and (2) the proposed model effectively utilizes multiple sensor types. We also conduct a preliminary evaluation of processing efficiency and find that the proposed model achieves processing efficiency comparable to state-of-the-art methods. |