Abstract
Language has been widely regarded as the benchmark of intelligence. However, evidence from cognitive science shows that intelligent behaviors in robust social interactions precede the mastery of language. This review approaches human‐unique intelligence, specifically cooperation and communication, from an agency‐based theory of mind (ToM) account, which emphasizes the ability to understand others' behaviors in terms of their underlying mental states. It supports this viewpoint by first reviewing a series of empirical studies on the socio‐cognitive development of young children and non‐human primates with respect to their capacities for communication and cooperation. These studies strongly suggest that such capacities constitute the origin of human‐unique intelligence. The review then shows how ToM can be formalized as Bayesian inference of mental states given observed actions, and how Bayesian ToM can be extended to model the interaction of minds in cooperation and communication. The advantage of this approach is that non‐linguistic knowledge, such as the visual environment, can serve as a contextual constraint that allows multiple agents to coordinate using sparse and limited signals, thereby shedding light on the cognitive architecture underlying human communication.
This article is categorized under:
Applications of Computational Statistics > Psychometrics
Statistical Models > Bayesian Models
Statistical Models > Agent‐Based Models
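One way to picture the formalization of ToM as Bayesian inference of mental states given observed actions is the Python sketch below. It infers which of two candidate goals an agent is pursuing from its moves in a small grid world, using P(goal | actions) ∝ P(actions | goal) P(goal). The grid, the goal locations, the Manhattan-distance utility, and the Boltzmann-rational action model with parameter `BETA` are all illustrative assumptions. They are not the model of any specific work reviewed here.

```python
import math

# Illustrative assumptions: a 2-D grid world with two hypothetical goal locations.
GOALS = {"A": (0, 4), "B": (4, 4)}
MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
BETA = 2.0  # assumed rationality parameter of the Boltzmann (softmax) policy


def manhattan(p, q):
    """Grid distance between two cells."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def action_likelihood(pos, move, goal, beta=BETA):
    """P(move | pos, goal): softmax over the negative distance to the goal after each move."""
    def utility(m):
        dx, dy = MOVES[m]
        return -manhattan((pos[0] + dx, pos[1] + dy), goal)

    z = sum(math.exp(beta * utility(m)) for m in MOVES)
    return math.exp(beta * utility(move)) / z


def infer_goal(start, moves, prior=None):
    """Posterior P(goal | observed moves) via Bayes' rule, assuming a fixed goal."""
    prior = prior or {g: 1 / len(GOALS) for g in GOALS}
    post = dict(prior)
    pos = start
    for move in moves:
        # Multiply in the likelihood of this observed move under each candidate goal.
        for g, loc in GOALS.items():
            post[g] *= action_likelihood(pos, move, loc)
        dx, dy = MOVES[move]
        pos = (pos[0] + dx, pos[1] + dy)
    total = sum(post.values())
    return {g: p / total for g, p in post.items()}


posterior = infer_goal(start=(2, 0), moves=["up", "right", "up"])
print(posterior)
```

In this example the two "up" moves are equally consistent with both goals. Only the "right" step discriminates between them, so the posterior concentrates on goal B, at roughly 0.98 with `BETA = 2.0`. The observer recovers an unobserved mental state (the goal) purely from the actions and the spatial context.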