Polygon mesh
Computer science
Computer graphics (images)
Computer vision
Artificial intelligence
Speech recognition
Authors
Ariel ALVAREZ-MARTíNEZ, José Javier López Monfort
Abstract
The head-related transfer function (HRTF) describes how a human receives sound from different directions in space, and it is unique to each listener. The most accurate way to obtain an individualized HRTF is through direct measurement, but this requires sophisticated laboratory equipment and long measurement times. This work proposes an approach to HRTF individualization that uses ear images rendered from 3D meshes, combining a deep neural network with the spherical harmonics transform (SHT). The method relies on the HUTUBS dataset, which includes both 3D head meshes and measured HRTFs. The model uses ear images to predict a low-dimensional representation of the HRTF. First, ten images of the right ear of each 3D mesh are captured from different viewpoints. The model consists of two main parts: a convolutional neural network (CNN) that extracts features from the ear images, and a second stage that learns from the resulting feature map and predicts the spherical harmonic coefficients. Finally, the individualized HRTF is obtained through the inverse SHT. The performance of the method is evaluated by computing the log-spectral distortion (LSD) between the measured HRTF and the predicted one. The results show favorable LSD values compared to other models addressing the same problem.
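The LSD metric mentioned in the abstract is a standard measure of spectral error between two transfer functions. As a minimal illustration (not the authors' code; array names, shapes, and the averaging over frequency bins are assumptions), it can be computed as the root-mean-square of the log-magnitude ratio in decibels:

```python
import numpy as np

def log_spectral_distortion(h_measured, h_predicted):
    """Log-spectral distortion (dB) between two magnitude spectra.

    Inputs are arrays of HRTF magnitudes over frequency bins
    (illustrative; per-direction averaging is omitted here).
    """
    ratio_db = 20.0 * np.log10(np.abs(h_measured) / np.abs(h_predicted))
    return float(np.sqrt(np.mean(ratio_db ** 2)))

# Identical spectra give zero distortion; a constant 10x magnitude
# error corresponds to a 20 dB distortion.
print(log_spectral_distortion(np.ones(128), np.ones(128)))        # 0.0
print(log_spectral_distortion(10.0 * np.ones(128), np.ones(128)))  # 20.0
```

In practice the LSD is usually averaged over frequency bins and measurement directions on the sphere; this sketch shows only the per-spectrum computation.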