Brain-computer interfaces (BCIs) enable people to communicate with others or to control devices, and improving BCI performance is essential for real-life applications. In this study, a steady-state visual evoked potential (SSVEP)-based BCI that combines multi-domain features with multi-task learning is developed. To represent the characteristics of an SSVEP signal accurately, its time-domain and frequency-domain representations are used as multi-domain features. Separate convolutional neural networks extract embedding features from the time- and frequency-domain inputs, and the two feature sets are fused by element-wise addition followed by batch normalization. A sequence of convolutional neural networks is then used to learn discriminative embedding features for classification. Finally, multi-task learning-based neural networks detect the corresponding stimulus. Experimental results show that the proposed approach outperforms EEGNet, multi-task learning-based neural networks, canonical correlation analysis (CCA), and filter bank CCA (FBCCA). Moreover, the proposed approach is better suited to real-time BCIs than systems that require a 4-second input window. In future work, using multi-task learning to model the characteristics of embedding features extracted by FBCCA may further improve the performance of the proposed approach.
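The multi-domain pipeline described above (time- and frequency-domain inputs, two feature-extraction branches, element-wise addition, batch normalization) can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: the sampling rate, stimulus frequency, batch size, and embedding dimension are arbitrary assumptions, and random linear projections stand in for the two CNN branches, whose architectures are not specified here.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 250           # sampling rate in Hz (illustrative assumption)
stim_freq = 12.0   # hypothetical SSVEP stimulus flicker frequency (Hz)
t = np.arange(0, 1.0, 1 / fs)

# Time-domain inputs: a batch of 1 s epochs, each a sinusoid at the
# stimulus frequency plus noise, standing in for single-channel EEG.
batch = 8
X_time = np.sin(2 * np.pi * stim_freq * t) + 0.5 * rng.standard_normal((batch, t.size))

# Frequency-domain inputs: magnitude spectrum of each epoch.
freqs = np.fft.rfftfreq(t.size, 1 / fs)
X_freq = np.abs(np.fft.rfft(X_time, axis=1))

# Random linear projections stand in for the two CNN branches,
# each mapping its domain to a shared embedding size.
emb_dim = 16
W_t = rng.standard_normal((emb_dim, X_time.shape[1])) / np.sqrt(X_time.shape[1])
W_f = rng.standard_normal((emb_dim, X_freq.shape[1])) / np.sqrt(X_freq.shape[1])
Z_time = X_time @ W_t.T            # (batch, emb_dim)
Z_freq = X_freq @ W_f.T            # (batch, emb_dim)

# Fusion: element-wise addition, then batch normalization
# (each feature standardized over the batch dimension).
Z = Z_time + Z_freq
Z_fused = (Z - Z.mean(axis=0)) / (Z.std(axis=0) + 1e-5)

# Sanity check: the mean spectrum peaks at the stimulus frequency
# (skipping the DC bin), which is what the frequency branch exploits.
mean_spec = X_freq.mean(axis=0)
peak = freqs[np.argmax(mean_spec[1:]) + 1]
print(peak, Z_fused.shape)
```

In a full model, the fused `(batch, emb_dim)` tensor would feed the subsequent convolutional layers and the multi-task classification heads; the spectral-peak check simply shows why the frequency-domain view carries a strong, stimulus-specific cue.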