In many everyday situations, our senses are bombarded by many different unisensory signals at any given time. To gain the most veridical, and least variable, estimate of environmental stimuli/properties, we need to combine the individual noisy unisensory perceptual estimates that refer to the same object, while keeping those estimates belonging to different objects or events separate. How, though, does the brain "know" which stimuli to combine? Traditionally, researchers interested in the crossmodal binding problem have focused on the roles that spatial and temporal factors play in modulating multisensory integration. However, crossmodal correspondences between various unisensory features (such as between auditory pitch and visual size) may provide yet another important means of constraining the crossmodal binding problem. A large body of research now shows that people exhibit consistent crossmodal correspondences between many stimulus features in different sensory modalities. For example, people consistently match high-pitched sounds with small, bright objects that are located high up in space. The literature reviewed here supports the view that crossmodal correspondences need to be considered alongside semantic and spatiotemporal congruency, among the key constraints that help our brains solve the crossmodal binding problem.