In hierarchical multi-label classification, performance measures play a critical role in model selection and evaluation, yet the rich hierarchical structure of the label space complicates their design. Several evaluation measures have been proposed, but they do not take both the semantic similarity between categories and the hierarchical structure into account. This study therefore reviews the hierarchical measures proposed in the literature and introduces a new metric that incorporates both hierarchical structure features and category semantic features. We demonstrate the efficacy of this metric in terms of consistency and discriminancy.