STIAS recently hosted Michal Linial and Samuel Jubé, respectively the directors of the Israel Institute for Advanced Studies and the Nantes Institute for Advanced Study, to discuss the programmes of the three institutes and explore opportunities for collaboration.
During their stay Michal Linial presented a public lecture to explain the notion of “Big Data”, and in particular its role in the future of human health. Michelle Galloway, media officer at STIAS, captured her presentation as follows.
“The future of health care is data driven,” said Michal Linial. “Our management of health and disease will depend to a huge extent on our ability to manage big data.”
Prof. Linial, who presented a public lecture during her recent visit to STIAS, is the Director of the Israel Institute for Advanced Studies and of the Sudarsky Center for Computational Biology at the Department of Biological Chemistry at The Hebrew University of Jerusalem.
In her discussion of the impact of Big Data on the future of human health, Prof. Linial pointed to both the tremendous opportunities and challenges that technological breakthroughs over the past decade in particular have presented to human existence.
Big Data refers to the volume and diversity of data accumulated in different domains: “the collection of datasets that are so large and complex that it is impossible to use hands-on data management or traditional data processing,” she explained.
“Such data are usually stored in large data farms and accessed by cloud computing.”
“Big Data is usually characterised by the three Vs – volume, velocity and variety. When it comes to volume, we are talking about at least petabytes (1 petabyte = 1000 terabytes) – more than anyone could use in a lifetime. Velocity refers to the speed of the data flow, and variety to the fact that the data are accumulated from different sources.”
Datasets like these come with many subtasks, including data capture, curation, storage, searching and extracting, sharing, transfer, analysis, querying, updating and, most importantly, privacy.
“All of these subtasks bring their own challenges. The quality and accuracy of the original data are obviously of huge importance,” said Prof. Linial. “But equally the storage (usually Cloud) must be secure, the speed of processing and infrastructure must be appropriate, extracting must be easy, and the updating must be dynamic.”
“Most importantly, data aren’t worth much if we cannot extract knowledge. We need analysis to extract value from the data, to bring data to knowledge and, especially in the case of health, to better inform decision making.”
“So there is a need for two additional Vs – veracity, which refers to the accuracy of the data, and value, which refers to the knowledge we extract from it.”
“In the last two years the human population has accumulated more data than ever before in human history,” said Prof. Linial.
To give some perspective on the volume of data in circulation, Prof. Linial presented some figures from Google and Facebook. “Google processes an estimated 3.5 billion requests per day and Facebook collects 500 terabytes of data per day. There are an estimated 60 trillion individual pages in Google and by 2020, 40 zettabytes (a zettabyte = 1 billion terabytes) will be created by cloud technology.”
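The units quoted in the lecture can be checked with a few lines of arithmetic. The sketch below is only illustrative: it assumes decimal (SI) units, and the final comparison between Facebook's quoted daily intake and the 40-zettabyte projection is our own back-of-the-envelope illustration, not a figure from the talk. All variable names are hypothetical.

```python
# Decimal (SI) storage units expressed in bytes (assumption: SI, not binary, units).
TERABYTE = 10**12
PETABYTE = 10**15
ZETTABYTE = 10**21

# Figures quoted in the lecture.
facebook_tb_per_day = 500   # terabytes collected per day
projected_zb = 40           # zettabytes projected for 2020


def terabytes(n_bytes: int) -> int:
    """Convert a byte count to whole terabytes."""
    return n_bytes // TERABYTE


tb_per_pb = terabytes(PETABYTE)    # terabytes in one petabyte
tb_per_zb = terabytes(ZETTABYTE)   # terabytes in one zettabyte
projected_tb = terabytes(projected_zb * ZETTABYTE)

# Illustration only: days needed to reach the projection at the quoted daily rate.
days_at_facebook_rate = projected_tb // facebook_tb_per_day

print(f"1 petabyte  = {tb_per_pb:,} terabytes")
print(f"1 zettabyte = {tb_per_zb:,} terabytes")
print(f"{projected_zb} zettabytes = {projected_tb:,} terabytes")
print(f"Days at {facebook_tb_per_day} TB/day: {days_at_facebook_rate:,}")
```

The conversions agree with the figures in the text: a petabyte is 1000 terabytes and a zettabyte is 1 billion terabytes.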
Tip of the iceberg
Of course, the estimates change all the time, but what is clear is that we are seeing only the tip of the iceberg in terms of the potential quantity of data available and the potential uses for such data, which in some cases go beyond the original intention.
Data can be used in surprising ways. Prof. Linial referred to examples where a restaurant-ranking application is used by the New York City Health Department to investigate cases of food poisoning; where Google keyword searching is used to predict regional flu outbreaks before the Centers for Disease Control can do so; where municipal trash-collection planning is based on a driver’s app that shows traffic congestion; and even a case in the United States where a large store predicted a teen pregnancy from changes in an individual’s clothing-size purchases before the family was aware of it.
“The potential for effective use and also misuse is great. Concerns regarding privacy, the ownership of data and the threat of misuse remain unsolved even in the biomedical arena where it has to be on the frontline. However, big data carries tremendous opportunities,” said Prof. Linial.
Turning specifically to the use of big data in human health, Prof. Linial said: “The volume and dimensionality of biomedical data are growing very fast. Sequencing of thousands of genomes and the exponential growth in molecular information make this a leading example of a data-intensive science with a wealth of opportunities for an impact on clinical practice.”
“Sequencing of proteins, genes and entire organisms has proceeded at a tremendous rate, particularly over the last five years, along with the machines and technology to process it,” said Prof. Linial. “In 1953 the structure of DNA was discovered. Today the entire human genome can be depicted on one slide. All sequenced DNA information is now available every day to the entire global community.”
“The information needed to change human health is literally in our hands.”
“But,” she warned, “we need to understand the complexity of the iceberg. We need big data analysis – without analysis we are not developing new understanding. We are just repeating information.”
“There is a need to build a collaborative setting that merges basic research with clinical expertise,” she continued. “An example is ELIXIR, which is a pan-European project investigating the development of information infrastructure in the health sciences. ELIXIR is looking at uniting life-science organisations in Europe to manage and safeguard data generated by publicly funded research.”
What is needed is fit-for-purpose data. “We need to transform data for specific purposes and understand that one size doesn’t fit all. Data allow us to extract correlations and relationships, but we must bear in mind that correlation is not equivalent to causality, particularly in human health – although it may be a good starting point.”
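The caution about correlation and causality can be illustrated with a small simulation. This is a minimal sketch under invented assumptions, not an analysis from the lecture: a hidden factor (`hidden`) drives two hypothetical measurements (`marker_a` and `marker_b`) that have no effect on each other, yet the two still appear strongly correlated until the hidden factor is accounted for.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 5000

# Hypothetical hidden factor (for example, age) that influences both measurements.
hidden = [rng.gauss(0, 1) for _ in range(n)]

# Neither marker influences the other; each is the hidden factor plus its own noise.
marker_a = [2 * h + rng.gauss(0, 1) for h in hidden]
marker_b = [2 * h + rng.gauss(0, 1) for h in hidden]

# Raw correlation is high (about 0.8 in theory for this setup).
raw_corr = pearson(marker_a, marker_b)

# Remove the known contribution of the hidden factor; only independent noise remains.
resid_a = [a - 2 * h for a, h in zip(marker_a, hidden)]
resid_b = [b - 2 * h for b, h in zip(marker_b, hidden)]
adjusted_corr = pearson(resid_a, resid_b)

print(f"raw correlation:      {raw_corr:.2f}")
print(f"adjusted correlation: {adjusted_corr:.2f}")
```

In real data the hidden factor is rarely known in advance, which is why a strong correlation is better treated as a starting point for investigation than as evidence of cause.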
“Information sharing is all about a very delicate balance,” said Prof. Linial. “If we look at the field of so-called orphan diseases as an example, information sharing via social media has sometimes been very beneficial. It has facilitated the linking of people with the same phenotype and allowed for the diagnosis of very rare diseases.”
Thinking in numbers for the future
Prof. Linial concluded her talk by making a specific plea for developing the field of quantitative thinking: “What is critical is awareness. Biology today is quantity. I believe quantitative thinking should be taught at the undergraduate level,” she said. “It should be a national language. We should be training people to think in a quantitative manner. It’s not well adopted in most educational settings. We introduce it at The Hebrew University to our first-year undergraduate students in their first class.”
Michelle Galloway: Part-time media officer at STIAS
Photo: Christoff Pauw