The paper describes the descriptive statistical analysis of a training set of educational data extracted from distributed sources in blended learning environments. The variability found in the values indicated that histogram discretization should be applied during the preprocessing phase. The evaluation of classification models built on the discretized educational data set confirmed the importance of descriptive statistical analysis when a training set is assembled from distributed sources.
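The pipeline summarized above (describe each attribute, inspect its spread, then discretize) might look roughly like the following minimal sketch. It assumes that "histogram discretization" means equal-width binning, where the range of an attribute is split into `k` intervals of the same width. This is one common reading of the term, not necessarily the exact variant used in the paper. The function names `describe` and `equal_width_bins`, the parameter `k`, and the sample values are hypothetical and serve only as illustration.

```python
import statistics
from typing import Dict, List, Sequence


def describe(values: Sequence[float]) -> Dict[str, float]:
    """Basic descriptive statistics for one numeric attribute (hypothetical helper)."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "median": statistics.median(values),
        "stdev": stdev,
        # coefficient of variation: spread relative to the mean; undefined when mean == 0
        "cv": stdev / mean if mean != 0 else float("inf"),
    }


def equal_width_bins(values: Sequence[float], k: int) -> List[int]:
    """Assign each value to one of k equal-width histogram bins, numbered 0..k-1.

    Assumption: histogram discretization is treated here as equal-width binning.
    """
    if k < 1:
        raise ValueError("k must be >= 1")
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # constant attribute: everything falls into one bin
    width = (hi - lo) / k
    # min(..., k - 1) puts the maximum value into the last bin instead of a bin k
    return [min(int((v - lo) / width), k - 1) for v in values]


if __name__ == "__main__":
    scores = [12, 45, 47, 50, 88, 91, 95, 100]  # hypothetical attribute values
    print(describe(scores))
    print(equal_width_bins(scores, 4))  # bin width is (100 - 12) / 4 = 22
```

With the sample values, the bins are `[0, 1, 1, 1, 3, 3, 3, 3]`, so bin 2 stays empty. An empty or sparse bin like this is the kind of uneven distribution that descriptive statistics can reveal before a classifier is trained on the discretized data.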