Deep-learning-based semantic segmentation networks typically use object classification networks as their backbone. This leads to a loss of context because classification networks have a smaller field of view. Prior architectures recover context with additional downsampling feature maps, a parallel context branch, or pyramid pooling modules after the backbone. However, these extensions increase multiply–accumulate operations and memory requirements, making them unsuitable for resource-constrained devices. To overcome this limitation, we propose a novel convolutional building block with attention-based context guidance. The block is repeated to build an efficient encoder–decoder network. Our network runs in real time, has a lightweight design with only 0.72 million parameters, and achieves mean intersection-over-union scores of 70.1% and 66.3% on the highly competitive Cityscapes and CamVid datasets, respectively. We also design an efficient decoder that can replace the decoders of other semantic segmentation networks with minimal performance loss. Performance measurements on mobile platforms show that our network suits resource-constrained devices. Further, experimental results show that the proposed method achieves a favorable trade-off between model size, inference speed, and segmentation accuracy.
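For reference, the mean intersection-over-union (mIoU) metric reported above can be sketched as follows. This is a minimal illustration, not the evaluation code used in this work: the function name `mean_iou`, the flattened label lists, and the ignore label 255 are assumptions made for the example. Benchmark protocols usually accumulate the per-class counts over the whole test set before averaging, which this sketch also supports if all pixels are passed in at once.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over the classes that occur in pred or target.

    pred, target: flat sequences of integer class labels, same length.
    ignore_index: target label excluded from scoring (assumed value).
    """
    intersection = [0] * num_classes  # true positives per class
    union = [0] * num_classes         # TP + FP + FN per class
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        if p == t:
            intersection[t] += 1
            union[t] += 1
        else:
            union[p] += 1  # false positive for the predicted class
            union[t] += 1  # false negative for the true class
    # Classes absent from both prediction and ground truth are skipped.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


# Toy example: six pixels, three classes, last pixel ignored.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {score:.4f}")
```

In the toy example the per-class IoUs are 1/2, 2/3, and 1, so the mean is their average.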