Data gravity: what does it mean for your data center choices?

With datasets growing fast and data increasingly being processed in real time, companies are facing new challenges. How do you combine data that currently sits in different clouds? Which data do you process in real time, and therefore keep close to the location where you collect and use it? And which data is best stored centrally? These are all matters to consider when deciding on your IT strategy and, following on from that, what your organisation’s data center landscape should look like.


Analysis at the location where data is processed

NorthC COO Jarno Bloem notes that more and more companies are asking for advice in this area. “In order to properly understand what your IT strategy should look like, you first need to understand the impact of ‘data gravity’ on the type of data your organisation processes.” Data gravity is the capacity of data to exert a pull on applications and services, a pull that grows as the data accumulates. This has everything to do with the mass of data versus that of applications or services: in many cases, the datasets are much bigger than the applications or services that process that data, so it is more efficient to move the applications and services to the data rather than the other way around.
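
The "mass" argument can be made concrete with a rough transfer-time comparison. The sketch below is only an illustration: the dataset size, application size and link speed are assumed values chosen for the example, not figures from NorthC, and the calculation ignores protocol overhead.

```python
def transfer_seconds(size_bytes: float, link_bits_per_second: float) -> float:
    """Idealised time to push size_bytes over a link, ignoring protocol overhead."""
    return size_bytes * 8 / link_bits_per_second


# Illustrative assumptions, not figures from the article.
DATASET_BYTES = 50e12      # a 50 TB dataset
APPLICATION_BYTES = 2e9    # a 2 GB application image
LINK_BPS = 10e9            # a 10 Gbit/s network link

move_data = transfer_seconds(DATASET_BYTES, LINK_BPS)
move_app = transfer_seconds(APPLICATION_BYTES, LINK_BPS)

print(f"Moving the dataset:     {move_data / 3600:.1f} hours")   # 11.1 hours
print(f"Moving the application: {move_app:.1f} seconds")         # 1.6 seconds
```

Under these assumptions, shipping the application to the data takes seconds while shipping the data to the application takes most of a working day, which is the asymmetry that data gravity describes.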

“The quantity of data will only increase in the years to come. At the same time, companies are looking to extract more and more value from that data. That means data gravity will become a significantly bigger factor. The location where you want to process data determines where you need processing power and analytical tools. In view of the issue of latency, you want to process real-time data ‘at the edge’, whereas you may prefer to process other data centrally, for example in a data lake.”


Latency, network costs and laws and regulations are key

The choice of which data you store and process where depends on several factors. The first is latency. “Take diagnostic images from a hospital, such as MRI scans and digital pathology images. If an oncologist uses an AI algorithm to compare the images of their patient with images from thousands of similar patients when making a diagnosis, latency does not play a role. It’s a different story with real-time applications. For instance, a camera monitoring the transition from a conveyor belt to a machine. The camera checks that the products enter the machine properly. If one falls over, the conveyor belt needs to be halted immediately. There is no time to first send the images from the camera to a data center hundreds of kilometres away, interpret the data there and feed the results back to the factory for further action. The latency that would entail would already have caused chaos by then”, says Bloem.
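
Bloem's conveyor belt example can be sketched as a simple latency budget. Everything numeric here is an assumption made for illustration: the 300 km distance, the network overhead, the processing time and the belt speed are hypothetical, and only the rule of thumb that light covers roughly 200 km per millisecond in optical fibre is a general physical approximation. Propagation delay is a lower bound; in practice most of the round trip tends to come from routing, queuing and processing.

```python
FIBRE_KM_PER_MS = 200.0  # light travels roughly 200 km per millisecond in optical fibre


def round_trip_ms(distance_km: float, overhead_ms: float, processing_ms: float) -> float:
    """Estimate the time from camera frame to stop signal for one processing location."""
    propagation_ms = 2 * distance_km / FIBRE_KM_PER_MS  # there and back
    return propagation_ms + overhead_ms + processing_ms


def belt_travel_mm(latency_ms: float, belt_speed_m_per_s: float) -> float:
    """Distance the belt moves while waiting; 1 m/s equals 1 mm/ms."""
    return latency_ms * belt_speed_m_per_s


# Hypothetical scenarios: a distant data center versus a server next to the machine.
remote_ms = round_trip_ms(distance_km=300, overhead_ms=40, processing_ms=20)
edge_ms = round_trip_ms(distance_km=0.1, overhead_ms=1, processing_ms=20)

BELT_SPEED = 2.0  # metres per second, assumed
print(f"Remote: {remote_ms:.0f} ms, belt moves {belt_travel_mm(remote_ms, BELT_SPEED):.0f} mm")
print(f"Edge:   {edge_ms:.0f} ms, belt moves {belt_travel_mm(edge_ms, BELT_SPEED):.0f} mm")
```

Whether the remote figure is acceptable depends entirely on the process being monitored; the point of the sketch is that the budget should be worked out per application rather than assumed.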

Network costs also play a role in the choice of data center location. If you are constantly sending all your data over long distances, network costs can quickly escalate – particularly now that more and more data-intensive IoT devices are being used, for example cameras that monitor production processes. For this reason, many organisations are opting for regional colocation data centers.

If the processing of data takes place in a public cloud, laws and regulations also come into the picture – because it is not always desirable or even permissible to store data in a public cloud if it is not clear which physical data center that data is stored in. In many cases, legislation requires that the data remain in the Netherlands or at least within the EU.

Data center landscape with four types of data centers

All of this means that infrastructure specialists will have to consider their data center strategies much more carefully: at which physical locations do I need data center capacity? The days when a single data center plus a backup data center were sufficient are truly behind us. Companies will have to consider per application which data they want to store and process where, depending on latency, network costs and laws and regulations. In practice, a landscape will often arise consisting of four types of data centers:

  • on-site: a server in a factory, in an oil field, on a farm or in a self-driving car;
  • micro data centers: in the coming years, networks of micro data centers will emerge with a distance of no more than ten kilometres between them. They will perform an ultra-local colocation role;
  • regional data centers: these are situated no more than 50 km from the location where the data is needed. These data centers offer greater scalability and set themselves apart in terms of security and high availability;
  • metro data centers: these are data centers that serve a very large region, for example the whole of the Netherlands or a conurbation in the US or China. They often house the local hubs of large cloud and content providers.

In the Netherlands, there is little difference between regional data centers and metro data centers because the Netherlands is comparable to a conurbation in terms of size.
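
The per-application decision described above could be sketched as a small placement function. This is a hypothetical illustration only: the `Workload` fields, the `choose_tier` function and every threshold in it are assumptions invented for the example, not rules from NorthC. It models latency and data volume (as a stand-in for network costs) and deliberately leaves laws and regulations out.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    ON_SITE = "on-site"
    MICRO = "micro data center"
    REGIONAL = "regional data center"
    METRO = "metro data center"


@dataclass
class Workload:
    name: str
    max_latency_ms: float   # how long the application can wait for a result
    daily_volume_gb: float  # data produced per day, a proxy for network cost


def choose_tier(w: Workload) -> Tier:
    """Pick a tier using hypothetical thresholds; tune these per organisation."""
    if w.max_latency_ms < 10:
        return Tier.ON_SITE
    if w.max_latency_ms < 25:
        return Tier.MICRO
    # Moderate latency needs or heavy data volumes favour staying in the region.
    if w.max_latency_ms < 100 or w.daily_volume_gb > 1000:
        return Tier.REGIONAL
    return Tier.METRO


# Example workloads loosely based on the cases in the article; the numbers are invented.
workloads = [
    Workload("conveyor belt camera", max_latency_ms=5, daily_volume_gb=500),
    Workload("surveillance monitoring", max_latency_ms=80, daily_volume_gb=2000),
    Workload("MRI image comparison", max_latency_ms=60_000, daily_volume_gb=50),
]

for w in workloads:
    print(f"{w.name}: {choose_tier(w).value}")
```

A real assessment would add a data residency check and actual network tariffs; the sketch only shows that the choice is made per workload, not once for the whole organisation.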

Hub role for regional data centers

More and more is expected of regional data centers. They no longer function only as a nearby alternative to an on-premises data center; they are also becoming nodes between different data center locations. On the one hand, companies need ultra-local storage and processing of data at the edge, in an on-site data center or micro data center. On the other hand, part of their data is located in metro data centers, for example data in a public cloud. The regional data center can link these two environments together. In other words, it has become a cloud and connectivity hub.

Organisations that have edge applications on a national scale will opt for a reliable network of regional data centers. This applies, for example, to companies with multiple factories at different sites across the country, or to service providers that offer IoT services to organisations throughout the Netherlands, such as security companies monitoring surveillance cameras for multiple customers.