A study on predictive analytics for telecommunication network data forecasting

Publisher

BRAC University

Abstract

The telecommunications industry is developing rapidly, driven by complex changes in network infrastructure that mirror evolving customer expectations. Despite a substantial body of research, with more than 6.3 million publications, telecommunications forecasting lacks standardized integration between theoretical models and practical applications. This research presents a systematic, policy-driven framework that provides a structured solution to the challenges of telecommunications forecasting throughout the entire chain of data selection, processing, and evaluation. We analyze a large, real-world dataset spanning twelve months, from September 2022 onward, collected from multiple adjacent geographical regions with distinct operational profiles. Area 1 operates 129 base stations with 643 cells, while Area 2 operates 99 base stations with 613 cells. Notably, despite its smaller footprint, Area 2 delivers 25% greater downlink throughput and supports traffic volume 30% above its local capacity requirements. The complete dataset, which contains no missing values for key performance indicators, provides detailed insight into network performance under diverse operational scenarios. Our study uses rigorous experimentation and analysis to produce findings on both data pre-processing techniques and model performance evaluation. The experimental results show that the Gated Recurrent Unit (GRU) achieved the best user prediction results, with an R-squared of 0.929 and low error ranges (MSE: 0.000174–0.012039, MAPE: 1.23%–14.00%). A Long Short-Term Memory (LSTM) network performed well in traffic forecasting, with an R-squared of 0.879 and a MAPE of 10.83%, outperforming traditional ARIMA models, which consistently yielded negative R-squared values.
Among advanced fusion architectures, our Hierarchical-LSTM performed best, achieving an R-squared of 0.9677 and low error rates (MSE: 0.0008, MAPE: 5.05%). This performance surpassed other sophisticated models, including the self-attention-based (R-squared: 0.9602) and Hybrid 2-Layer (R-squared: 0.956) approaches. These fusion models outperformed those that integrate ARIMA, which yielded negative R-squared values. The analysis of data granularity yielded key findings for developing real-world implementation strategies. Area-level forecasting consistently outperformed cell-level forecasting in all metrics. Quantitative analysis revealed significantly lower MSE values for area-level predictions than for cell-level predictions: for user predictions (ARIMA: 0.009 versus 0.075, LSTM: 0.004 versus 0.030), throughput predictions (ARIMA: 0.062 versus 0.122, LSTM: 0.028 versus 0.046), and traffic predictions (ARIMA: 0.008 versus 0.059, LSTM: 0.004 versus 0.015). Support Vector Machines also predicted well at the area level, with MSE values as low as 0.001 for user and traffic prediction. Our research evaluated the impact of combinations of feature variants on prediction outcomes using novel correlation analysis methods. User count and traffic volume exhibited the strongest relationship (correlation: 0.899, R-squared: 0.879 with the DL TRAFFIC MB variant). Although throughput maintained robust performance across variants, the analysis revealed that using only throughput and user count features resulted in poor forecast accuracy (R-squared: -5.630 and -3.748, respectively), while incorporating additional features improved accuracy (R-squared: 0.889 baseline, 0.822 with the traffic variant).
Data preprocessing analysis identified optimal configuration parameters, with Min-Max scaling achieving the best normalization performance (R-squared: 0.931) compared to other techniques. Optimal model performance was achieved with a 42% training data split. Prediction accuracy decreased substantially with 20% and 90% split ratios (R-squared: 0.033 and 0.323, respectively). Investigating temporal patterns demonstrated that weekly seasonality significantly improved prediction accuracy across network elements (throughput R-squared: 0.923, user R-squared: 0.887). Systematic hyperparameter tuning revealed that the RMSprop optimizer with a batch size of 32 outperformed the Adam optimizer with a batch size of 64 (R-squared: 0.767). We introduce a framework that advances telecommunications forecasting by combining theoretical methods with practical implementation knowledge. Our framework integrates rigorous model selection processes with domain expertise for dataset and feature selection, while implementing privacy-preserving techniques to bridge the gap between abstract methodologies and real-world applications. Through comprehensive data analysis, we revealed crucial relationships between user behavior patterns and network traffic characteristics, establishing that multi-feature approaches with area-level predictions achieve the best results. This framework represents a significant advancement in telecommunications forecasting, establishing theoretical foundations that support both academic research and industry practice.
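
The abstract reports results as R-squared, MSE, and MAPE on Min-Max-scaled data, and notes that some models produced negative R-squared values. The sketch below is not the thesis code; it is a minimal, standard-library illustration of the textbook definitions of these quantities, using made-up sample numbers (`actual`, `good_forecast`, `bad_forecast` are hypothetical). It shows how R-squared becomes negative whenever a forecast does worse than simply predicting the mean of the observed series.

```python
# Illustrative only: textbook metric definitions with hypothetical sample data.
# None of these values come from the thesis dataset.

def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def min_max_scale(values):
    """Rescale values linearly to the range [0, 1] (requires max > min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical observations and two hypothetical forecasts.
actual = [10.0, 20.0, 30.0, 40.0]
good_forecast = [12.0, 18.0, 33.0, 39.0]
bad_forecast = [40.0, 10.0, 20.0, 5.0]

good_r2 = r_squared(actual, good_forecast)  # close to 1: tracks the series
bad_r2 = r_squared(actual, bad_forecast)    # below 0: worse than predicting the mean
scaled = min_max_scale(actual)              # smallest value maps to 0, largest to 1
```

Because R-squared compares the residual error against the variance of the observations, it has no lower bound, which is consistent with the strongly negative values the abstract reports for ARIMA and for the restricted feature sets.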

Description

Cataloged from PDF version of thesis.
Includes bibliographical references (pages 99-103).
This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2025.

Type

Thesis