Industry 4.0 is transforming how products are manufactured, improved, and distributed, and it places a premium on the precise recognition of human intention to ensure operational reliability, safety, and efficiency. Central to this recognition, especially in equipment manufacturing, is the accurate identification of the tools handled by human operators. In this study, a novel object detection model, referred to as 'Industry-RetinaNet', is proposed for tool detection. It improves upon the conventional RetinaNet through optimized anchor box shapes derived from advanced anchor generation techniques, an increased number of detection boxes, and a reinforced alternative backbone architecture. When validated against a test dataset, the model achieved an F1-score of 0.904, an mAP of 0.903, and a recall of 0.809 while preserving real-time processing capabilities. This methodology is expected to support improved interpretation of worker intentions, potentially enhancing overall efficiency in intelligent factories.