To date, most approaches for separating vital signs such as heartbeat and respiration have been built on linear mixture models. However, the literature reports that non-linear mixtures do occur in related applications, e.g., heart rate (HR) estimation with Doppler radar, where a simple linear demixing architecture may limit separation performance. In addition, human motion during HR measurement further complicates the mixing process. This motivates us to seek a separation approach better suited to contact-free HR estimation under non-linear mixtures that include motion. We propose a semi-supervised deep clustering (DC) method that separates the three mixed sources (heartbeat, respiration, and motion) by segmenting the spectrogram of the Doppler signal. First, a deep recurrent neural network (RNN) with long short-term memory (LSTM) is trained on heartbeat-only and respiration-only data to produce an embedding for each frame-sample of the spectrogram, which enables feature optimization in a lower-dimensional space. Then, in the test phase, K-means clusters the embeddings associated with each source to infer the masks used for spectrogram segmentation. The proposed deep clustering has three main strengths: (i) it places no restriction on the class of mixture, relying instead on data mining; (ii) it can handle three-source mixtures while training on only two kinds of source-independent samples; and (iii) it requires only single-channel mixtures. HR measurement experiments on subjects sitting still and typing validate the improvements in accuracy and robustness of our proposal over several prevailing signal decomposition and separation approaches.
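The test-phase step described above (K-means over embeddings, then masks for spectrogram segmentation) can be illustrated with a minimal sketch. It is not the authors' implementation. The embeddings are synthetic stand-ins for the LSTM output, with one embedding assumed per spectrogram bin. The masks are assumed to be binary, and K-means is a plain Lloyd's iteration with a deterministic farthest-point initialisation. The names `kmeans` and `masks_from_embeddings` and all array sizes are hypothetical.

```python
import numpy as np


def kmeans(x, k, n_iter=50):
    """Plain Lloyd's K-means on the rows of x; returns one integer label per row."""
    # Farthest-point initialisation (an assumption made here for determinism).
    centroids = [x[0]]
    for _ in range(1, k):
        d = np.min([np.sum((x - c) ** 2, axis=1) for c in centroids], axis=0)
        centroids.append(x[np.argmax(d)])
    centroids = np.stack(centroids).astype(float)

    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        dist = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = x[labels == j].mean(axis=0)
    return labels


def masks_from_embeddings(embeddings, n_freq, n_frames, n_sources=3):
    """Map (n_freq * n_frames, dim) embeddings to (n_sources, n_freq, n_frames) binary masks."""
    labels = kmeans(embeddings, n_sources).reshape(n_freq, n_frames)
    return np.stack([(labels == j).astype(float) for j in range(n_sources)])


# Synthetic stand-in data: three well-separated embedding clusters, playing the
# role of heartbeat, respiration, and motion (cluster order is arbitrary).
rng = np.random.default_rng(0)
n_freq, n_frames, dim = 8, 10, 4
true_labels = rng.integers(0, 3, size=n_freq * n_frames)
centres = 5.0 * np.eye(3, dim)
embeddings = centres[true_labels] + 0.1 * rng.standard_normal((n_freq * n_frames, dim))

spectrogram = np.abs(rng.standard_normal((n_freq, n_frames)))
masks = masks_from_embeddings(embeddings, n_freq, n_frames)
separated = masks * spectrogram  # one masked spectrogram per cluster
```

Because K-means labels are arbitrary, a real pipeline would still need a rule for deciding which cluster corresponds to the heartbeat before estimating HR. That rule is not specified in the abstract and is left out of the sketch.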