Cloud first and data first: how do you connect the data silos?

More and more companies want to become ‘data driven’. The number of initiatives to collect and analyse data is growing fast: IoT (Internet of Things), AI (artificial intelligence), ML (machine learning), etc. Moreover, there is a great need to analyse data from different sources and in different clouds in a coherent manner. But how do you connect the data in those different silos? And what is the appropriate data center strategy?

Data in lots of different clouds

Due to the popularity of the cloud, companies face a significant additional challenge: how to connect the data in the different clouds? A lot of data is locked up in SaaS applications – thanks to the popularity of Office 365 and Salesforce, for example. Plus there are more and more workloads running on IaaS infrastructures, such as Microsoft Azure and Amazon Web Services (AWS). And of course much of the data is still on-premises or in a private cloud in a colocation data center.

Challenges caused by data fragmentation

This fragmentation leads to a number of problems, says Alexandra Schless, CEO of NorthC. “Clients are not always aware what data is located where. That can include privacy-sensitive data belonging to customers or employees. Those companies therefore find it hard to comply with the GDPR.”

Moreover, IT departments struggle to guarantee that all the cloud and other databases meet the same high standards in terms of quality, continuity and security. For instance, how many organisations make a back-up of Office 365?

On top of that, it is often hard to extract data from the different silos due to data gravity: the growing phenomenon of data pulling applications and services towards itself, simply because it is simpler to move that application or service to where the data is than the other way around.

Data lake in the cloud

Schless: “These three aspects mean that analytics works fine within an application, for example analyses of customer data in Salesforce. But it becomes a challenge to combine the data from your CRM system with, for example, the ERP data in a different cloud. You can combine the results of the separate analyses, but not the underlying datasets.”

For this reason, more and more firms are opting for a ‘data lake’: a central place where all the data is stored in an unstructured form. This makes it possible to analyse data from different sources together. In view of the enormous quantities of data in the data lake, it might seem obvious to put that data lake itself in the cloud and, because you can’t easily move those enormous quantities of data, to put the analysis tools in the same cloud. Schless: “What many companies don’t realise is the network costs a strategy like that entails. After all, it involves moving very large datasets to the data lake. The question is whether there is always a good business case to back that up.”

Edge computing provides the answer

So it is no surprise that edge computing is on the rise. Why would you transport data that is collected somewhere far away from the central data center (for example, by IoT sensors) to a central location for processing if you can do it locally, for example in a regional data center? There are three reasons to opt for local analysis in a regional data center. Firstly, for real-time applications, latency is an important criterion: the closer the data is processed to the source, the lower the latency. Secondly, network costs are kept much lower when data is stored and processed close to the source. This argument is becoming ever more important because the IoT is rapidly expanding with the addition of data-intensive devices such as surveillance cameras. And finally, laws and regulations mean it may not be desirable or even possible to store data in a central cloud. In many cases, legislation requires that the data remain in the Netherlands or at least within the EU. This is very much a factor in healthcare, for example, where organisations handle highly privacy-sensitive data that may not be sent abroad.

Some data still analysed centrally

At the same time, the desire to combine data from different sources remains. For this reason, CIOs (Chief Information Officers) and CDOs (Chief Digital Officers) are well advised to consider which data is and is not suitable for central processing. Which data do you want to analyse ‘at the edge’ so you can act on it there immediately? And which data do you first want to move to the lake because you can use it to optimise business processes or develop new functionalities and services? Schless: “If you are reading out data from a machine with IoT sensors, you don’t need to collect all that data centrally. Most of it you will want to analyse ‘at the edge’, in the immediate vicinity. Only the results of that analysis are stored – the rest of the information you no longer need. But if a machine goes wrong, you want to be able to use the data to establish the cause of the fault. That specific data does need to be stored and analysed centrally.”
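To make that split concrete, here is a minimal Python sketch of how an edge node might decide what to send on. It is an illustration only, not a description of any particular product: the function name `process_at_edge`, the `EdgeResult` type and the simple threshold rule for suspecting a fault are all assumptions made for this example, and it expects a non-empty list of numeric readings.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class EdgeResult:
    summary: dict          # small aggregate, always forwarded to the central lake
    raw_for_central: list  # raw readings, only filled when a fault is suspected


def process_at_edge(readings, fault_threshold):
    """Analyse one window of sensor readings locally (hypothetical helper).

    Assumption: a fault is suspected when any reading exceeds fault_threshold.
    """
    summary = {
        "count": len(readings),
        "mean": mean(readings),
        "max": max(readings),
    }
    fault_suspected = summary["max"] > fault_threshold
    summary["fault_suspected"] = fault_suspected

    # Normal operation: keep only the results, discard the raw readings.
    # Suspected fault: keep the raw readings so the cause can be analysed centrally.
    raw = list(readings) if fault_suspected else []
    return EdgeResult(summary=summary, raw_for_central=raw)


normal = process_at_edge([20.0, 21.0, 22.0], fault_threshold=80.0)
faulty = process_at_edge([20.0, 95.0], fault_threshold=80.0)
print(normal.summary, len(normal.raw_for_central))
print(faulty.summary, len(faulty.raw_for_central))
```

In this sketch only the compact summary crosses the network during normal operation, which is where the savings in latency and network costs come from. The raw readings travel to the central environment only in the exceptional case.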

Think about the data center landscape

The combination of cloud first and data first means companies need to think about their data center landscapes. On the one hand they are dealing with data in lots of different SaaS, PaaS and IaaS silos, and on the other with data that ideally needs to be stored and processed as close as possible to the source in view of latency and network costs. This calls for a different data center strategy. NorthC’s response to this challenge is a regional, cloud- and carrier-neutral data center that brings together both types of environments quickly and securely to provide a connectivity hub in the ecosystem that is now emerging.
