Emotion Recognition in Conversations (ERC) has attracted increasing interest in the natural language processing community. Many neural-network-based approaches attempt to model the emotional dynamics of conversations and achieve promising results. However, these works are limited in capturing deep emotional clues in the conversational context because they ignore the emotion cause, which can be viewed as the stimulus of the target emotion. In this work, we propose the Causal Aware Interaction Network (CauAIN), which understands the conversational context more thoroughly with the help of emotion cause detection. Specifically, we retrieve causal clues from commonsense knowledge to guide the traceback of causal utterances. Both the retrieval and traceback steps are performed simultaneously from the perspectives of intra- and inter-speaker interaction. Experimental results on three benchmark datasets show that our model outperforms most baseline models.
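To make the idea of intra- and inter-speaker causal traceback more concrete, the following is a minimal sketch, not the CauAIN implementation. It assumes that each utterance is already encoded as a vector and that two clue vectors (one intra-speaker, one inter-speaker) have already been derived from commonsense knowledge. The function names `split_context`, `traceback`, and `causal_aware_repr` are hypothetical. Candidate causes are the target utterance and all earlier utterances. These are split by whether they share the target's speaker, scored against the matching clue with plain dot-product attention, and the two attended summaries are concatenated with the target vector.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def split_context(speakers: List[str], t: int) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: indices of utterances 0..t, split into
    same-speaker (intra) and other-speaker (inter) candidate causes."""
    intra = [j for j in range(t + 1) if speakers[j] == speakers[t]]
    inter = [j for j in range(t + 1) if speakers[j] != speakers[t]]
    return intra, inter


def traceback(clue: Vector, utterances: List[Vector],
              indices: List[int]) -> Tuple[Vector, List[float]]:
    """Hypothetical traceback: attend over candidate cause utterances,
    scoring each one by its dot product with the (assumed) clue vector."""
    if not indices:
        return [0.0] * len(clue), []
    weights = softmax([dot(clue, utterances[j]) for j in indices])
    dim = len(utterances[indices[0]])
    summary = [sum(w * utterances[j][d] for w, j in zip(weights, indices))
               for d in range(dim)]
    return summary, weights


def causal_aware_repr(speakers: List[str], utterances: List[Vector], t: int,
                      intra_clue: Vector, inter_clue: Vector) -> Vector:
    """Concatenate the target utterance vector with its intra- and
    inter-speaker cause summaries (an assumed fusion, for illustration)."""
    intra_idx, inter_idx = split_context(speakers, t)
    intra_summary, _ = traceback(intra_clue, utterances, intra_idx)
    inter_summary, _ = traceback(inter_clue, utterances, inter_idx)
    return list(utterances[t]) + intra_summary + inter_summary


if __name__ == "__main__":
    speakers = ["A", "B", "A"]
    utterances = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(causal_aware_repr(speakers, utterances, 2, [1.0, 0.0], [0.0, 1.0]))
```

In the toy example, speaker A's target utterance (index 2) attends equally to A's own turns 0 and 2. It attends fully to B's single turn 1 on the inter-speaker side. A real model would learn the encoders, the clue representations, and the fusion step rather than use fixed dot products and concatenation.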