Directed evolution, a strategy for protein engineering, optimizes protein properties (that is, fitness) by expensive and time-consuming screening or selection of a large mutational sequence space. Machine learning-assisted directed evolution (MLDE), which screens sequence properties in silico, can accelerate the optimization and reduce the experimental burden. This work introduces an MLDE framework, cluster learning-assisted directed evolution (CLADE), which combines hierarchical unsupervised clustering sampling and supervised learning to guide protein engineering. The clustering sampling selectively picks and screens variants in targeted subspaces, which guides the subsequent generation of diverse training sets. In the last stage, accurate predictions via supervised learning models improve the final outcomes. By sequentially screening 480 sequences out of 160,000 in a four-site combinatorial library with five equal experimental batches, CLADE achieves global maximal fitness hit rates of up to 91.0% and 34.0% for the GB1 and PhoQ datasets, respectively, improved from the values of 18.6% and 7.2% obtained by random sampling-based MLDE. A machine learning-assisted directed evolution method is developed, combining hierarchical unsupervised clustering and supervised learning, to guide protein engineering by iteratively exploring the large mutational sequence space.