Semantic fusion, which combines attributes or text with a knowledge graph, has recently shown great potential for the task of Zero-shot Learning (ZSL). However, an unsolved problem remains: semantic features are insufficiently expressed in the embedding. This is especially acute on large-scale, general-purpose benchmarks such as ImageNet, where unseen classes must be recognized without corresponding annotations. To make ZSL more applicable in the real world, we observe that, in human recognition, efficiently understanding and using image content is often the key to distinguishing objects. We therefore propose a method that transforms visual information into semantics to alleviate the semantic gap between images and their semantic descriptions. This method allows us to exploit, concurrently, the Common Sense Knowledge Graph based on the hierarchical structure and a visual graph based on visual correlation. Compared with several state-of-the-art methods, the proposed method achieves good performance on ImageNet.
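To make the dual-graph idea concrete, below is a minimal sketch of one way class embeddings could be propagated over both a knowledge-graph hierarchy and a visual-correlation graph and then fused. It is an illustrative assumption, not the paper's actual architecture: the class `DualGraphFusion`, the single shared layer, the averaging fusion, and all dimensions are hypothetical choices made for the example.

```python
import torch
import torch.nn as nn

class DualGraphFusion(nn.Module):
    """Toy sketch: propagate class embeddings over two graphs (a knowledge-graph
    hierarchy and a visual-correlation graph) and average the two results."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    @staticmethod
    def normalize(adj):
        # Add self-loops and symmetrically normalize: D^{-1/2} (A + I) D^{-1/2}
        adj = adj + torch.eye(adj.size(0))
        d_inv_sqrt = adj.sum(dim=1).pow(-0.5)
        return d_inv_sqrt.unsqueeze(1) * adj * d_inv_sqrt.unsqueeze(0)

    def forward(self, x, adj_kg, adj_vis):
        h_kg = torch.relu(self.linear(self.normalize(adj_kg) @ x))    # hierarchy pass
        h_vis = torch.relu(self.linear(self.normalize(adj_vis) @ x))  # visual-graph pass
        return 0.5 * (h_kg + h_vis)  # fused per-class representations

# Hypothetical usage: 5 classes with 300-d word embeddings and two 5x5 adjacencies.
x = torch.randn(5, 300)
adj_kg = torch.randint(0, 2, (5, 5)).float()
adj_kg = ((adj_kg + adj_kg.t()) > 0).float()   # symmetric hierarchy edges
adj_vis = torch.rand(5, 5)
adj_vis = 0.5 * (adj_vis + adj_vis.t())        # symmetric visual correlations
model = DualGraphFusion(300, 2048)
class_reps = model(x, adj_kg, adj_vis)         # shape (5, 2048)
print(class_reps.shape)
```

In practice, the fused outputs in such a setup are typically used as classifier weights or semantic prototypes for unseen classes; the simple averaging here only stands in for whatever fusion the full method actually uses.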