Abstract
Music is frequently used to establish atmosphere and to enhance or alter emotion in dramas and films. During music listening, visual imagery is a common mechanism underlying emotion induction. The present functional magnetic resonance imaging (fMRI) study examined the neural substrates of the emotional processing of music and imagined scenes. A factorial design was used with the factors emotional valence (positive; negative) and music (withoutMUSIC: script-driven imagery of emotional scenes; withMUSIC: script-driven imagery of emotional scenes while simultaneously listening to affectively congruent music). The baseline condition was imagery of neutral scenes in the absence of music. Eleven females and five males participated in this fMRI study. Behavioural data revealed that during scene imagery, participants' subjective emotions were significantly intensified by music. The contrasts of the positive and negative withoutMUSIC conditions minus the baseline (imagery of neutral scenes) showed no significant activation. When the withMUSIC conditions were compared to the withoutMUSIC conditions, activity was observed in a number of emotion-related regions, including the temporal pole (TP), amygdala, hippocampus, hypothalamus, anterior ventral tegmental area (VTA), locus coeruleus, and anterior cerebellum. We hypothesized that the TP may integrate music and the imagined scene to extract socioemotional significance, prompting the subcortical structures to generate subjective feelings and bodily responses. For the withMUSIC conditions, negative emotions were associated with enhanced activation in the posterior VTA compared to positive emotions. Our findings replicate and extend previous research suggesting that different subregions of the VTA are sensitive to rewarding and aversive stimuli.
Taken together, this study suggests that emotional music embedded in an imagined scenario is a salient social signal that prompts preparation of approach/avoidance behaviours and emotional responses in listeners.