The objective of this article is to understand the role that the digital world and the science of big data analysis are playing in addressing the current COVID-19 emergency. Before starting this review, however, it is necessary to clarify some general definitions that may not be clear to all our readers.
Big Data is the name given to “a very large set of data that is produced by people using the Internet and can only be stored, understood and used with the help of special tools and methods”. Examples of Big Data include posts on Facebook and the data we generate every time we make an online purchase, but also, more broadly, anything stored in a virtual space. Electronic medical records and epidemiological databases, for instance, are important sources of Big Data.
Data Science, or Human Data Science, as it is now called, is “the use of scientific methods to obtain useful information from computer data, in particular from large amounts of data”. Data scientists design the algorithms that analyze the enormous amount of data we generate every day. They use their expertise in the field to interpret the results, and they communicate these findings to decision-makers. In healthcare, data scientists work in teams of people with different backgrounds, including statisticians, mathematicians, programmers, biologists and doctors.
Artificial Intelligence (or AI), on the other hand, is “the study of how to produce machines that have some of the qualities that the human mind has, such as the ability to understand language, to recognize images, to solve problems, and to learn to use them.” The term is used above all to describe processes in which a computer learns from the data fed into the system each time it runs an analysis. The more the system is used, the closer its answers get to what is expected. AI is now widely used in voice recognition tools, search engines and translation services, and increasingly in medical diagnostics.
Now that the concepts behind this digital world are clearer, let’s return to the question of whether, and how, the world is using these tools to manage the COVID-19 emergency.
An article published online by The Lancet on February 20 examines these concepts and their applications to the emergency, and many more references on the topic can be found on the web. To bring some clarity, I will try to summarize the main areas of application here, with some examples.
Data Science and AI in the epidemiological field
One of the most frequently cited applications of digital tools in the management of COVID-19 is epidemiology. The Canadian company BlueDot was able to raise the alarm about this virus as early as December 2019. It did so by analyzing the language people used on social networks with a technique called Natural Language Processing (NLP), combined with analysis of users’ geolocation. Other research institutes and the Chinese authorities also apply this technique to identify cases of infection or to monitor the movements of residents and sick people in the “red zones”. This type of monitoring is possible because of our use of social networks, smartphones and e-tracking tools. In China, however, this data is controlled by the government. In Western countries, by contrast, it is fragmented across the databases of different providers, who sell the aggregated data to companies capable of analyzing it. In addition, not everyone has agreed to share their data, and not everyone has access to these technologies, which leaves grey areas that are difficult to interpret.
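The sketch below is deliberately simplified and only illustrates the general principle. It scans posts for symptom-related words and counts the matches by location, so that an unusual cluster stands out. The keyword list, the threshold, the function names and the sample posts are all hypothetical. BlueDot has not published its method in this form, and real NLP systems rely on trained language models rather than a fixed word list.

```python
import re
from collections import Counter

# Hypothetical symptom vocabulary; real systems use trained language models instead.
SYMPTOM_TERMS = {"fever", "cough", "pneumonia", "breathless"}

# Invented (text, location) pairs standing in for geolocated social media posts.
SAMPLE_POSTS = [
    ("Strange pneumonia going around, my neighbour has a fever", "Region A"),
    ("Bad cough all week", "Region A"),
    ("Great dinner tonight", "Region B"),
    ("Fever again, staying home", "Region A"),
    ("Mild cough today", "Region B"),
]

def mentions_symptoms(text: str) -> bool:
    """Return True if the post contains at least one symptom term (crude keyword match)."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return bool(words & SYMPTOM_TERMS)

def flag_locations(posts, threshold: int) -> dict:
    """Count symptom-related posts per location; keep locations at or above the threshold."""
    counts = Counter(location for text, location in posts if mentions_symptoms(text))
    return {location: n for location, n in counts.items() if n >= threshold}

if __name__ == "__main__":
    print(flag_locations(SAMPLE_POSTS, threshold=2))  # prints {'Region A': 3}
```

Even this toy version shows the grey areas mentioned above. A location where few people post, or where posts are not shared, simply never reaches the threshold.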
Another use in this area is the daily monitoring of new cases worldwide, along with the number of deaths and hospitalizations. Platforms such as HealthMap or the dashboard created by Johns Hopkins University make it possible to follow infections and their spread country by country. These tools help both to understand the epidemiology of the virus and to gain time to implement containment measures. They show, for example, that the strategies adopted by China have stabilized the number of new infections, while in other countries growth is still exponential.
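The difference between exponential growth and stabilization can be read directly from the cumulative counts that these dashboards publish. The minimal sketch below assumes a plain list of daily cumulative case counts. The two series are invented for illustration and are not real data, and the function names are hypothetical. The sketch computes the day-over-day ratio of cumulative cases and the doubling time that ratio would imply if it stayed constant.

```python
import math

def growth_ratios(cumulative):
    """Day-over-day ratio of cumulative case counts; 1.0 means no new cases that day."""
    return [today / yesterday for yesterday, today in zip(cumulative, cumulative[1:])]

def doubling_time(ratio):
    """Days for cumulative cases to double if the daily ratio stayed constant.

    Returns math.inf when the ratio is 1.0 or less (no growth).
    """
    if ratio <= 1.0:
        return math.inf
    return math.log(2) / math.log(ratio)

# Invented series for illustration only (not real surveillance data).
GROWING = [1024, 1280, 1600, 2000, 2500]       # +25% per day: exponential growth
PLATEAU = [80000, 80100, 80150, 80170, 80180]  # few new cases per day: stabilization

if __name__ == "__main__":
    for name, series in (("growing", GROWING), ("plateau", PLATEAU)):
        latest = growth_ratios(series)[-1]
        print(f"{name}: daily ratio {latest:.4f}, doubling time {doubling_time(latest):.1f} days")
```

A ratio that holds steady above 1 is the signature of exponential growth. In the first series, cases would double roughly every three days. A ratio falling toward 1 stretches the doubling time to years, which is what stabilization looks like on these dashboards.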
Data Science and AI in the diagnostic field
Diagnostics is one area where these technologies are rapidly gaining ground. Thanks to AI techniques, machines are becoming more and more accurate at reading diagnostic images and identifying anomalies in them. According to Kuan Chen, founder of Infervision, reading a lung CT scan can drop from 15 minutes (the average time a doctor takes) to 10 seconds. Infervision is a Chinese company that has developed a suite of AI-assisted medical imaging solutions covering multiple areas of the body, including, but not limited to, the brain, lungs and skeleton. During this emergency, the company adapted its software to recognize the pulmonary lesions caused by COVID-19, speeding up diagnosis, especially during the acute phases of the emergency.
Data Science and AI in the therapeutic field
Many companies are also using these technologies to develop new therapeutic strategies for COVID-19. Their aim is to speed up the identification of molecular targets for vaccines or new therapies.
Among the most active companies in this area are:
- Insilico Medicine, which used its software to identify six new molecules that could limit the virus’s ability to replicate;
- BenevolentAI, working with Imperial College London, whose algorithms identified a target protein for this disease, AP2-associated protein kinase 1 (AAK1). This protein is the target of baricitinib, a drug already approved for rheumatoid arthritis;
- Mateon Therapeutics, which has identified several antiviral candidates and is particularly active in research, including in partnership with universities and pharmaceutical companies.
These technologies have also been used to identify possible vaccines, but the companies working on this front have so far disclosed little. This research has certainly been made possible by the sharing of coronavirus genomic data on open platforms, which encourage the sharing of research results and international cooperation rather than competition. This contrasts with the standard model of publication in peer-reviewed journals, which may now be heading into a crisis.
In conclusion, digital technology is clearly helping us to monitor the epidemic, diagnose patients and identify new therapies, and the sharing and transparency of information have become key requirements for managing this emergency. These technologies could also support government decision-making. For example, they could predict how containment measures would affect the economy and education, enabling risk-benefit analyses backed by data from even very distant sources. In our country (Italy), however, these technologies are not yet fully accepted. The data is still too fragmented across municipal, provincial and regional databases and servers to be used as it was in China. Suffice it to say that the Fascicolo Sanitario Elettronico (Italy’s electronic health record) has not yet been implemented throughout the country. Even where it is available, few citizens use it, because there is no national strategy to realize the potential of this data repository. We hope, therefore, that this emergency will generate a new shared awareness and accelerate the digitization of health care. That digitization should not mean more bureaucracy, as it is still often perceived, but concrete support both for the doctor-patient relationship and for national health management.