Daily observations of climatic variables such as precipitation and maximum and minimum temperature over a large area exhibit characteristic spatio-temporal properties, and any simulation of these variables must capture them. However, dynamical climate models (global or regional) and their surrogate models often miss some of these properties, such as spatial and temporal autocorrelation. In this paper we consider a deep generative approach that combines concepts from probabilistic graphical models, convolutional neural networks (CNNs) and variational autoencoders (VAEs). One component of this hybrid model is a graphical model based on a Markov random field, which creates a binary representation of the data and also identifies homogeneous subregions within the spatial field. Max-pooling over these subregions, as in a CNN, yields a coarsened, binarized encoding of the original data. We train a dense multi-layer network to reconstruct the original spatial data field from this code. For simulating daily spatial patterns, the temporal dynamics are modelled on the coarse binary code, analogous to a VAE. We show experimentally the merits of this hybrid approach both for generating new data and for downscaling data from a climate model.
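As a minimal illustration of the coarsening step described above, the following sketch applies max-pooling over irregular subregions of a binary field. It is not the implementation used in this work: the function name `region_max_pool`, the toy 4x4 field and the hand-specified region labels are all hypothetical, and in the proposed model both the binary representation and the subregions would come from the Markov random field rather than being fixed by hand.

```python
import numpy as np


def region_max_pool(binary_field, region_labels):
    """Max-pool a binary field over irregular regions.

    binary_field  : integer array of 0/1 values, one per grid cell.
    region_labels : integer array of the same shape, giving each grid
                    cell a region index in 0..K-1 (assumed given here).
    Returns a length-K binary code, one entry per region.
    """
    binary_field = np.asarray(binary_field)
    region_labels = np.asarray(region_labels)
    n_regions = int(region_labels.max()) + 1
    code = np.zeros(n_regions, dtype=binary_field.dtype)
    # Unbuffered in-place maximum: each region keeps the largest value
    # found among its cells, i.e. a logical OR for 0/1 data.
    np.maximum.at(code, region_labels.ravel(), binary_field.ravel())
    return code


# Hypothetical 4x4 binary field and a partition into four 2x2 regions.
field = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 0],
])
labels = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 3, 3],
    [2, 2, 3, 3],
])

code = region_max_pool(field, labels)
print(code)  # [0 1 1 0]
```

Because the pooling is driven by a label array rather than a fixed window, the same function applies unchanged to regions of arbitrary shape and size, which is the situation when the subregions are inferred from data.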