Neural Representation for Videos (NeRV) encodes each video into a network, providing a promising solution to video compression. However, existing NeRV methods are limited to representing single-quality videos with fixed-size models. To accommodate varying quality requirements, NeRV methods need multiple separate networks with different sizes, resulting in additional training and storage costs. To address this, we propose a Quality Scalable Video Coding method based on Neural Representation, in which a hierarchical network consisting of a base layer (BL) and several enhancement layers (ELs) represents the same video with coarse-to-fine qualities. As the smallest subnetwork, the BL represents basic content. The larger subnetworks can be formed by gradually adding the ELs which capture residuals between the lower-quality reconstructed frames and original ones. Since the larger subnetworks share the parameters of the smaller ones, our method saves 40% of storage space. In addition, our structural design and training strategy enable each subnetwork to outperform the baseline on average +0.29 PSNR.