Smart devices equipped with microphones have become increasingly common in daily life. However, when a user types on a keyboard near such a device, the acoustic signals produced by individual keystrokes can leak private information. This paper proposes a robust side-channel attack that uses the microphones of a smart device to infer keystrokes on a nearby keyboard. To address the challenge of a non-cooperative attack environment, we propose an efficient scheme to estimate the relative position between the microphones and the keyboard, and we extract two robust features from the acoustic signals to reduce the impact of differences across victims and keyboards. As a result, the attack can be carried out through acoustic signals regardless of the exact location of the microphones, the identity of the victim, and the type of keyboard. We implement the proposed scheme on a commercial smartphone and conduct extensive experiments to evaluate its performance. Experimental results show that the scheme predicts keyboard input accurately under a variety of conditions. Overall, it correctly identifies 91.2% of keystrokes under 10-fold cross-validation. When predicting keystrokes from unknown victims, the attack achieves a Top-5 accuracy of 91.52%. When both the victims and the keyboards are unknown, the Top-5 accuracy reaches 72.25%. When predicting meaningful content, we obtain a Top-5 accuracy of 96.67% for the words entered by the victim.