We present AS-pVAD, a lightweight neural network with an attentive score loss for frame-wise personalized voice activity detection (pVAD). Instead of relying on an external speaker embedding extractor with a large number of parameters, AS-pVAD employs a lightweight internal model to extract the target speaker embedding. We propose a novel attentive score loss constraint that exploits these embedding clues for pVAD more effectively than conventional embedding concatenation. Joint training with a regular VAD further improves AS-pVAD's ability to identify the target speaker when an enrollment utterance is available, while allowing it to function as a regular VAD when no enrollment is provided. Experimental results show that AS-pVAD achieves an average AUC-ROC above 0.9 in two-speaker scenarios under various noisy and reverberant conditions. We also publicly release our test set to facilitate research in this area.
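For readers unfamiliar with the reported metric, the following is a minimal sketch of how a frame-wise AUC-ROC could be computed from per-frame scores. It is not the evaluation code behind the results above. It assumes a binary labeling (1 for target-speaker speech, 0 for every other frame) and a simple unweighted mean over utterances; both are illustrative assumptions, and the names `auc_roc` and `mean_auc` are hypothetical.

```python
def auc_roc(labels, scores):
    """AUC-ROC via the rank-sum (Mann-Whitney U) identity.

    labels: per-frame 0/1 targets (assumption: 1 = target-speaker speech).
    scores: per-frame detector outputs, higher = more likely target speech.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC-ROC needs at least one frame of each class")

    # Assign 1-based ranks by ascending score; tied scores share the average rank.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U statistic of the positive class, normalized by the number of pos/neg pairs.
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = pos_rank_sum - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


def mean_auc(utterances):
    """Unweighted mean AUC-ROC over (labels, scores) pairs (an assumed averaging scheme)."""
    values = [auc_roc(labels, scores) for labels, scores in utterances]
    return sum(values) / len(values)


if __name__ == "__main__":
    # Toy frames, not data from the paper.
    frame_labels = [0, 0, 1, 1]
    frame_scores = [0.1, 0.4, 0.35, 0.8]
    print(auc_roc(frame_labels, frame_scores))
```

Because AUC-ROC depends only on the ranking of the scores, it is independent of any particular decision threshold, which makes it a convenient summary when comparing detectors across noise and reverberation conditions.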