The future is in data capture, not machine learning

The adoption of artificial intelligence (AI) has accelerated since the pandemic hit, as the whole world moved towards digitalization. A study from Oxford University and Yale University predicts that AI will surpass humans in many ways and automate all human tasks within the next 120 years. According to the study, AI will be better than humans at translation by 2024, write bestselling books by 2049, and perform surgery by 2053. Machine learning (ML), the ability of a machine to imitate the human capacity to accumulate knowledge and use it to generate ideas, is generally considered the basis of AI.

Data is the engine of AI

Although AI may depend on its machine learning capabilities, we need to take a step back and recognize that ML does not happen in a vacuum. ML is driven by big data, without which it cannot take place. AI is therefore entirely dependent on the amount of data we can capture and the methods we use to process and manage it. For this reason, we need to pay more attention to capturing, transporting, processing and storing data if we are to realize the promise of AI in the future.

Data capture is key

Data capture is essential, whether for software-based AI applications, intelligent AI-based robots or machine learning. When AI products were initially designed, developers devoted substantial research and development resources to collecting data on human behavior from both the industry and the consumer sides.

In the field of health, many intelligent applications offer predictive analysis of prognoses and treatments. As these programs become progressively smarter, they could be made even more accurate by applying the additional insight gathered from human data.

User data is essential for developing smarter technologies, whether software systems, hardware devices, IoT devices or home automation equipment. However, one of the most difficult aspects of capturing data in edge environments is transmitting it securely to a data center or the cloud, given the threat of ransomware or virus attacks.

With data, more is more

Statista’s projections indicate that by the end of 2025, the world could generate 181 zettabytes of data, a 129% increase from 79 zettabytes in 2021. This growth is particularly evident in the medical sciences, where various organizations collect massive amounts of data.
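
The growth figure quoted above can be checked with a few lines of arithmetic. The sketch below is only illustrative: the function names are hypothetical, and the implied yearly rate is a derived figure that assumes smooth compound growth over the four years from 2021 to 2025, which the Statista projection itself does not claim.

```python
def percent_increase(old: float, new: float) -> float:
    """Return the percentage change from old to new."""
    if old == 0:
        raise ValueError("old value must be non-zero")
    return (new - old) / old * 100.0


def annual_growth_rate(old: float, new: float, years: int) -> float:
    """Return the compound annual growth rate, in percent.

    Assumes growth compounds evenly each year (an illustrative
    assumption, not something stated in the source).
    """
    if old <= 0 or years <= 0:
        raise ValueError("old and years must be positive")
    return ((new / old) ** (1.0 / years) - 1.0) * 100.0


# Figures quoted in the text, in zettabytes.
data_2021_zb = 79
data_2025_zb = 181

growth = percent_increase(data_2021_zb, data_2025_zb)
yearly = annual_growth_rate(data_2021_zb, data_2025_zb, years=4)

print(f"Total increase 2021-2025: {growth:.0f}%")
print(f"Implied compound yearly growth: {yearly:.1f}%")
```

The total increase rounds to the 129% cited above. The yearly figure is just one way of reading the same two data points.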

For example, data from the first Covid-19 vaccines administered helped determine the accuracy of doses for all age groups.

Similarly, we need more data to achieve greater accuracy and more efficient devices, whether in software, robotics or any other field. We also need more data from real edge environments, whether static or moving, and regardless of location, to be able to run AI and ML applications in a timely manner.

The future of AI will depend on capturing more data through real-time applications from edge environments such as gas pipelines, submarines in the ocean, defense fronts, healthcare facilities, IoT devices, and satellites or rockets in space.

Data management challenges

To optimize AI for the future, we also need high-performance systems. These can be storage or cloud-based systems whose data is processed by modern, data-intensive applications. The more data you feed these applications, the faster they can run their algorithms and deliver insights, whether for micro-strategy tools or business intelligence tools. This is commonly referred to as data mining, and in the past we did it by putting the data in a warehouse and then running applications to process it.

However, these methods are full of challenges. Data-generating devices now continuously produce ever-increasing amounts of information. Whether the source is self-driving vehicles or healthcare, and whether the platform is a drone or an advanced device, all are capable of generating greater amounts of data than ever before. So far, the data management industry has been unable to capture these quantities, whether through networks, 5G, the cloud, or any storage method.

These circumstances have led to the loss of 90% of the data collected, due to insufficient storage capacity and the inability to process it quickly and transmit it to a data center. The same applies to critical data captured at remote sites that have neither cloud connectivity nor applications running at the edge.

Forward to the future

The more data we have, the better the performance of the AI. The more real-time information we can gather from real users in the field, the smarter we can make our AI devices. The more we can make AI applicable to use cases, the more we can make the human connection and the better we can solve user problems.

To date, much of the big data we generate goes unused, primarily because organizations cannot capture, transport, and analyze it fast enough to create real-time insights. It is essential that we develop ways to solve these challenges so that we can reap the benefits of putting AI at the service of humanity.



The opinions expressed above are those of the author.