Digging Into IPv6 Traffic to Google: Is 28% Deployment Really the Limit?


By Christofer Flinta


After some years of accelerating IPv6 deployment, we are now in a period of slower growth, and it’s not clear where we are heading. It is therefore interesting to try to predict the future of IPv6 over the coming years. At Ericsson Research, we have been working on this topic since 2013, but only recently created a forecast model that seems to be quite accurate. However, it delivers a disappointing message: a very low final level of IPv6 deployment, at less than 30%!

The model is based on the commonly used data set, “percentage of users that access Google over IPv6”, provided by Google, from which we use the time series for native IPv6 traffic. We assume this data set can be used as an approximate indicator of global IPv6 deployment, even though some countries, like China, are not properly represented due to national regulations. Figure 1 shows a recent snapshot of the Google data set from 2008 to 2019.

Figure 1. Percentage of users that access Google over native IPv6.

The data set is quite noisy, and we also want to avoid impact from the well-known intra-week periodicity and from variations at the start and end of the months. Therefore, we sample the data monthly, using an average of two weeks in the middle of each month.
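The sampling step could be sketched roughly as follows. This is a minimal illustration, not the actual code behind the article: the function name, the input layout (a dict of daily percentages keyed by date) and the exact window (days 8 to 21 of each month) are assumptions. A 14-day window covers every weekday exactly twice, which is what cancels the intra-week pattern.

```python
from statistics import mean

def monthly_mid_samples(daily, start_day=8, window_days=14):
    """Average the daily values in a two-week window in the middle of each month.

    `daily` maps datetime.date -> percentage of users on native IPv6.
    The window start (day 8) is an assumed value, not taken from the article.
    Returns a dict mapping (year, month) -> mean percentage in the window.
    """
    buckets = {}
    for day, value in daily.items():
        # Keep only days inside the mid-month window; skip month start and end.
        if start_day <= day.day < start_day + window_days:
            buckets.setdefault((day.year, day.month), []).append(value)
    return {month: mean(values) for month, values in sorted(buckets.items())}
```

Months with no data inside the window are simply absent from the result, so a caller would need to decide how to handle gaps.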

We then create a growth model for the sampled data based on logistic growth. This type of model is common when describing the evolution of new technology, where there is an accelerating phase in the beginning and a decelerating phase at the end, forming an S-shaped curve that approaches a maximum level over time. The results from the model are shown in figure 2, where the predicted values are shown in red and the real data is shown in blue. The last data point, from May 2019, is at 24.5%.
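To make the idea of a logistic growth model concrete, here is a small sketch. The article does not say how its model was fitted, so the fitting method below is an assumption chosen for simplicity: for each candidate end level, the logit transform turns the S-curve into a straight line, which can be fitted with ordinary least squares, and the candidate with the smallest squared error on the original data is kept. The parameter names `level`, `rate` and `midpoint` are illustrative.

```python
import math

def logistic(t, level, rate, midpoint):
    """S-curve that approaches `level` over time and equals level/2 at t == midpoint."""
    return level / (1.0 + math.exp(-rate * (t - midpoint)))

def fit_logistic(ts, ys, candidate_levels):
    """Fit (level, rate, midpoint) by scanning candidate end levels.

    For a fixed level, ln(y / (level - y)) = rate * (t - midpoint) is linear
    in t, so rate and midpoint follow from a least-squares line.
    Requires every y to be strictly positive.
    """
    n = len(ts)
    t_mean = sum(ts) / n
    t_var = sum((t - t_mean) ** 2 for t in ts)
    best = None
    for level in candidate_levels:
        if level <= max(ys):
            continue  # the curve never reaches data at or above its asymptote
        zs = [math.log(y / (level - y)) for y in ys]
        z_mean = sum(zs) / n
        rate = sum((t - t_mean) * (z - z_mean) for t, z in zip(ts, zs)) / t_var
        if rate == 0.0:
            continue  # a flat line has no midpoint
        midpoint = t_mean - z_mean / rate
        sse = sum((logistic(t, level, rate, midpoint) - y) ** 2
                  for t, y in zip(ts, ys))
        if best is None or sse < best[0]:
            best = (sse, level, rate, midpoint)
    if best is None:
        raise ValueError("no candidate level lies above the data")
    return best[1:]
```

Fitting the line in logit space is not identical to nonlinear least squares on the raw percentages, so a production fit would likely refine these values with a proper optimizer.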

Figure 2. Forecast of the percentage of users that access Google over native IPv6. The red curve indicates the predicted values, while the blue curve shows monthly sampled data.

We can see in the figure that the predicted S-curve fits the data set quite well. Currently, the model predicts a surprisingly low final level of IPv6 deployment, at only around 28%. According to the forecast, the IPv6 share will grow more slowly over the coming years and be close to this estimated end level in late 2022.

The predicted curve can be interpreted as a single step of growth, going from zero to 28% over a 15-year period. This is a bit unexpected since there has been a lot of hope that IPv6 would replace IPv4 quite soon and then 100% would be the obvious asymptotic end level. If our model is correct, IPv6 will not replace IPv4 – or even be the dominant network protocol – in the foreseeable future!

Model history

Considering the strange forecast, how much can we trust this model? We don’t know, but the model has evolved in our lab, and each time the fit of the predictions to real data has improved. Let’s look at the model history.

The first model was created back in 2013, when IPv6 deployment had been growing at an accelerating rate for some years. At that time it was natural to create a model based on simple exponential growth. For a year, the predictions were quite accurate, but then the predicted and the real data started to deviate, so this model had to be abandoned. Also, from a theoretical point of view, exponential growth is not sustainable in the long run.

The next model was based on logistic growth, which is a commonly used model for all types of growth where there is an upper limit. In our first attempt, we expected the limit to be 100% and used that value as a fixed parameter in our prediction model.

However, the predictions from this model didn’t make a perfect fit either — the real data tended to oscillate around the predicted curve. As a fix, we added a sine-wave oscillator around the logistic-growth curve, estimated from the sinusoidal difference between the growth model and the data. The idea was that if there is some feedback mechanism in the market that creates an oscillation, the model should be able to catch it. Both models are shown in figure 3, where the pure logistic growth model is shown in green and the model with an added oscillator is shown in yellow.
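One way such a sine-wave correction could be estimated is sketched below. This is only an illustration: the article does not describe the estimation procedure, and the sketch assumes the oscillation period is already known, which keeps the problem linear. It fits `a*sin(w*t) + b*cos(w*t)` to the residuals (data minus growth curve) by least squares; the function names are hypothetical.

```python
import math

def fit_sine_residual(ts, residuals, period):
    """Least-squares fit of a*sin(w*t) + b*cos(w*t) to residuals, w = 2*pi/period.

    `period` is assumed known here; estimating it as well would make the
    problem nonlinear.
    """
    w = 2.0 * math.pi / period
    s = [math.sin(w * t) for t in ts]
    c = [math.cos(w * t) for t in ts]
    ss = sum(x * x for x in s)
    cc = sum(x * x for x in c)
    sc = sum(x * y for x, y in zip(s, c))
    sr = sum(x * r for x, r in zip(s, residuals))
    cr = sum(x * r for x, r in zip(c, residuals))
    # Solve the 2x2 normal equations [[ss, sc], [sc, cc]] @ [a, b] = [sr, cr].
    det = ss * cc - sc * sc
    a = (sr * cc - cr * sc) / det
    b = (cr * ss - sr * sc) / det
    return a, b

def oscillation(t, a, b, period):
    """Correction term to add on top of the growth curve's prediction at time t."""
    w = 2.0 * math.pi / period
    return a * math.sin(w * t) + b * math.cos(w * t)
```

The corrected forecast at time `t` would then be the growth-curve value plus `oscillation(t, a, b, period)`.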

Figure 3. Different growth models for the percentage of users that access Google over native IPv6. Green and yellow curves show a logistic model with a final level of 100%, without and with an added oscillation. Red curve shows a logistic model with an estimated end level of around 28%. Blue curve shows monthly data. 

This oscillating model seemed to work for some years, but at the beginning of 2018, the forecasts again started to deviate too much from real data, so this model also had to be abandoned. We had assumed growth in one big step from zero to 100%, but apparently this is not a correct assumption. Furthermore, the sine-wave correction was just an ad-hoc fix, not based on any specific market mechanism. From figure 3 it is obvious that a model based on a single logistic step up to 100%, with or without sine-wave corrections, is not compatible with the current growth trajectory of real IPv6 data, shown as a blue curve.

In 2018 we therefore decided to drop the idea of a fixed terminal level of 100% and instead tried to fit the data to a pure logistic growth curve with a terminal level not known in advance. The end level is thus estimated from the data set. As we can see from the red curve in figure 3, the new model gives quite a good fit to historical data without any need for corrections. In fact, the previous oscillations can be fully captured by this logistic model with the end level at around 28% instead of 100%.

Model stability

Our new model seems to be quite stable over time. Experiments were performed to see how long a training period is needed to get good predictions. For all experiments, the time series is split into one training period and one test period, both of varying length. The training period always starts in September 2008 but ends in different months, while the test period is the remaining part of the data set.

It turns out that, for all experiments having the training period ending in any of the last 13 months (March 2018 or later), the predicted end levels are confined within a very small interval between 27.5% and 28.5%. Even shorter training periods give similar results, but with a larger spread — for training periods ending in any of the last 24 months (May 2017 and later), we get predicted end levels in the interval between 24% and 32%. Our conclusion is that during the last two years, the model is not very sensitive to the length of the training period, indicating a stable logistic growth, with a final level of around 28%.
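The stability experiment can be sketched as an expanding-window loop. The code below is an illustrative outline under stated assumptions: `fit_end_level` is a hypothetical callable that takes the training samples and returns the predicted end level, and the samples are assumed to be ordered in time with the first one at the fixed start of the training period.

```python
def end_level_spread(ts, ys, fit_end_level, last_n):
    """Refit on training periods ending at each of the last `last_n` samples.

    Every training period starts at the first sample; whatever follows the
    training period is the test period (empty when training uses everything).
    `fit_end_level(train_ts, train_ys)` is a hypothetical fitting callable.
    Returns (lowest, highest) predicted end level across the experiments.
    """
    if not 1 <= last_n <= len(ts):
        raise ValueError("last_n must be between 1 and the number of samples")
    levels = []
    for end in range(len(ts) - last_n + 1, len(ts) + 1):
        levels.append(fit_end_level(ts[:end], ys[:end]))
    return min(levels), max(levels)
```

A narrow interval, such as the 27.5% to 28.5% reported for the last 13 months, indicates that the estimate barely moves as more data is added.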

The statistical metric R² is consistently very high for all experiments with training periods ending in the last 13 months: 0.99 for the training sets and around 0.85 for the test sets. R² can be interpreted as the share of the variance in the data set that the model explains, where a higher value is better. The high values in our experiments indicate a good fit of the model to the data.
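For readers unfamiliar with the metric, R² in its standard definition is one minus the ratio of the residual sum of squares to the total sum of squares. A minimal sketch (the function name is illustrative, and the article does not state which exact variant was used):

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    1.0 means a perfect fit; 0.0 means no better than always predicting the
    mean of `actual`. Undefined if all actual values are identical.
    """
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot
```

On a held-out test set the value can even be negative, which is why a test-set score of around 0.85 is a meaningful result rather than a given.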

We also tried to see if the sampling method affected the predicted final level, but it seems to be quite independent of data points being sampled daily, monthly or quarterly.

The future

So, is this it? Will the great hope for the future of the Internet stall at a mere 28% of global deployment? Perhaps this is not the end of the story: there is always a possibility that the growth of IPv6 takes place in steps, like the evolution of many other technologies. One scenario is that, after a couple of years with an IPv6 deployment level of around 28%, a new period of accelerating growth begins, leveling out at a higher percentage. So far, there is no sign of any such next step, but even if a new boost of IPv6 rollout came soon, we would probably have to wait a long time before IPv6 deployment gets close to 100%.

By Christofer Flinta, Senior Researcher at Ericsson Research

