A study on predictive analytics for telecommunication network data forecasting
Publisher
BRAC University
Abstract
The telecommunications industry is growing rapidly, driven by complex changes in
network infrastructure that reflect evolving customer expectations. Despite a
substantial body of research, comprising more than 6.3 million publications,
telecommunications forecasting lacks a standardized integration of theoretical
models and practical applications. This research presents a systematic, policy-driven
framework that addresses the challenges of telecommunications forecasting across
the entire chain of data selection, processing, and evaluation.
We analyze a large, real-world dataset spanning twelve months from September
2022, collected from multiple adjacent geographical regions with distinct
operational profiles. Area 1 operates 129 base stations with 643 cells, while Area 2
operates 99 base stations with 613 cells. Notably, despite its smaller footprint,
Area 2 delivers 25% higher downlink throughput and carries a traffic volume 30%
above its local capacity requirements. The complete dataset, which has no
missing values for key performance indicators, provides detailed insight into
network performance under diverse operational scenarios.
Our study uses systematic experimentation and analysis to evaluate both data
preprocessing techniques and model performance. The experimental results show
that the Gated Recurrent Unit (GRU) achieved the strongest user prediction results,
with an R-squared of 0.929 and low error ranges (MSE: 0.000174–0.012039, MAPE:
1.23%–14.00%). A Long Short-Term Memory (LSTM) network performed strongly in
traffic forecasting, with an R-squared of 0.879 and a MAPE of 10.83%, outperforming
traditional ARIMA models, which consistently yielded negative R-squared values.
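The results above are reported as MSE, MAPE, and R-squared. The sketch below uses the standard textbook definitions in plain Python (our own illustration, not the thesis code) to show why a model such as ARIMA can score below zero: R-squared compares a forecast against a constant forecast equal to the mean of the observed series, so any model whose squared error exceeds that baseline receives a negative value. The function names are illustrative.

```python
def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent (assumes no zero targets)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    SS_tot is the squared error of always predicting the mean, so a model
    that does worse than that constant baseline gets a negative score.
    """
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


observed = [1, 2, 3, 4]
print(r_squared(observed, [2.5] * 4))     # mean forecast -> 0.0
print(r_squared(observed, [4, 3, 2, 1]))  # worse than the mean -> -3.0
```

Because MSE depends on the scale of the target, the very small MSE values quoted in the abstract presumably refer to normalized data; R-squared and MAPE are scale-free and easier to compare across KPIs.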
Among the advanced fusion architectures, our proposed Hierarchical-LSTM performed
best, achieving an R-squared of 0.9677 and low error rates (MSE: 0.0008, MAPE:
5.05%). It surpassed the other advanced models, including the self-attention-based
model (R-squared: 0.9602) and the Hybrid 2-Layer model (R-squared: 0.956). These
fusion models in turn outperformed the fusion variants that integrate ARIMA, which
yielded negative R-squared values.
The analysis of data granularity yielded key findings for developing real-world
implementation strategies. Area-level forecasting consistently outperformed
lower-level (cell-level) forecasting on all metrics. Area-level MSE values were
substantially lower than cell-level values (area versus cell): for user predictions
(ARIMA: 0.009 versus 0.075; LSTM: 0.004 versus 0.030), throughput predictions
(ARIMA: 0.062 versus 0.122; LSTM: 0.028 versus 0.046), and traffic predictions
(ARIMA: 0.008 versus 0.059; LSTM: 0.004 versus 0.015). Support Vector Machines
(SVMs) also performed strongly at the area level, reaching MSE values of 0.001
for user and traffic predictions.
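The abstract does not state how cell-level records are rolled up into area-level series. A minimal sketch, assuming an additive KPI (such as user count or traffic volume) is summed across all cells of an area at each timestamp, might look like the following; the record layout and the `cell_to_area` mapping are hypothetical. One plausible reason area-level series are easier to forecast is that partly independent cell-level fluctuations tend to cancel when summed, giving a smoother target.

```python
from collections import defaultdict


def aggregate_to_area(records, cell_to_area):
    """Sum an additive cell-level KPI into one time series per area.

    records: iterable of (timestamp, cell_id, value) tuples (hypothetical schema).
    cell_to_area: dict mapping cell_id -> area name (hypothetical).
    Returns {area: [(timestamp, total), ...]} ordered by timestamp.
    """
    totals = defaultdict(float)
    for ts, cell, value in records:
        totals[(cell_to_area[cell], ts)] += value

    series = defaultdict(list)
    # Sorting by the (area, timestamp) key yields each area's points in time order.
    for (area, ts), total in sorted(totals.items()):
        series[area].append((ts, total))
    return dict(series)
```

Summing is only appropriate for additive KPIs; a rate such as throughput would more likely be averaged or traffic-weighted, which is a separate design decision not specified in the abstract.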
Our research evaluated how combinations of feature variants affect prediction
outcomes, using novel correlation analysis methods. User count and traffic volume
exhibited the strongest synergistic relationship (correlation: 0.899; R-squared: 0.879
with the DL TRAFFIC MB variant). Although throughput performed robustly across
variants, the analysis revealed that using only the throughput and user count
features resulted in poor forecast accuracy (R-squared: -5.630 and -3.748,
respectively), whereas incorporating additional features improved accuracy
(R-squared: 0.889 for the baseline, 0.822 with the traffic variant).
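The correlation analysis methods are not detailed in the abstract. As a common first screening step, and explicitly a stand-in rather than the thesis method, pairwise Pearson correlations between KPI series can be computed as below; the feature names in the usage line are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(features):
    """Pairwise Pearson correlations for a dict of named KPI series."""
    names = list(features)
    return {(a, b): pearson(features[a], features[b]) for a in names for b in names}


# Hypothetical usage with illustrative (not thesis) values:
matrix = correlation_matrix({"users": [1, 2, 3, 4], "dl_traffic_mb": [2, 3, 5, 8]})
print(round(matrix[("users", "dl_traffic_mb")], 3))
```

A high correlation between two inputs only indicates that they move together; as the results above show, whether a feature combination actually improves forecasts still has to be checked by training and scoring the model on each candidate set.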
The data preprocessing analysis identified optimal configuration parameters: Min-Max
scaling achieved the best normalization performance (R-squared: 0.931) among the
techniques compared. Model performance peaked with a 42% training data split;
prediction accuracy decreased substantially with 20% and 90% split ratios
(R-squared: 0.033 and 0.323, respectively). An investigation of temporal patterns
showed that modeling weekly seasonality substantially improved prediction accuracy
across network elements (throughput R-squared: 0.923; user R-squared: 0.887).
Systematic hyperparameter tuning showed that the RMSprop optimizer with a batch
size of 32 outperformed the Adam optimizer with a batch size of 64 (R-squared: 0.767).
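The preprocessing choices above (Min-Max scaling, a chronological training split, and weekly seasonality) can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the thesis pipeline: the split keeps time order without shuffling, the scaler is fitted on the training portion only to avoid leaking test statistics, and `period=7` assumes daily samples (the abstract does not state the sampling interval; hourly data would use 168). All names are hypothetical.

```python
def chronological_split(series, train_fraction):
    """Split a time-ordered series without shuffling; the earlier part trains."""
    cut = round(len(series) * train_fraction)
    return series[:cut], series[cut:]


class MinMaxScaler:
    """Rescale values to [0, 1] using the min and max seen during fit."""

    def fit(self, values):
        self.lo, self.hi = min(values), max(values)
        return self

    def transform(self, values):
        span = (self.hi - self.lo) or 1.0  # guard against a constant series
        return [(v - self.lo) / span for v in values]

    def inverse_transform(self, values):
        span = (self.hi - self.lo) or 1.0
        return [v * span + self.lo for v in values]


def weekly_lag_pairs(series, period=7):
    """Pair each value with the value one season earlier (assumes daily data)."""
    return [(series[i - period], series[i]) for i in range(period, len(series))]


# Hypothetical usage: a 42% training split, with the scaler fitted on training data only.
history = [float(v % 7) + v / 10 for v in range(60)]
train, test = chronological_split(history, 0.42)
scaler = MinMaxScaler().fit(train)
train_scaled, test_scaled = scaler.transform(train), scaler.transform(test)
```

Test values that exceed the training range map outside [0, 1]; this is expected when the scaler never sees future data, and predictions can be mapped back with `inverse_transform`.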
We introduce a novel framework that advances telecommunications forecasting by
combining theoretical methods with practical implementation knowledge. The
framework integrates rigorous model selection with domain expertise for dataset
and feature selection, and applies privacy-preserving techniques, to bridge the gap
between abstract methodologies and real-world applications. Our data analysis
revealed key relationships between user behavior patterns and network traffic
characteristics, establishing that multi-feature approaches with area-level
predictions achieve the best results.
This framework represents a significant advance in telecommunications
forecasting, establishing theoretical foundations that benefit both academic
research and industry practice.
Description
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 99-103).
This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2025.
Type
Thesis