Biobank features to consider

Companies sometimes have only limited access to samples and data from biobanks. Several factors contribute to this limitation, including legal restrictions, limited donor consent, the lack of a commercial setup in many biobanks, and a primary focus on academic research and healthcare. Where direct access is not possible, companies can collaborate with academic partners.

The size of cohorts within biobanks varies significantly, ranging from a few patients to hundreds of thousands. Specific features such as rare disease cohorts, population isolates, consanguinity prevalence, deep endophenotype profiles, and longitudinal electronic health information can guide companies in their search for the most suitable biobank.

Biological samples, ranging from blood and saliva to tissues and DNA, support the search for biomarkers, the estimation of causality, and deeper insights into disease mechanisms. Molecular profiles, including whole genome sequencing, gene expression, and metabolomics, provide richer information for research on disease pathophysiology and drug discovery. The depth and source of phenotype data, such as lifestyle and disease information, vary among biobanks, with some offering more comprehensive data through physical evaluations or longitudinal questionnaires.

Consent limitations often exist for research and commercial use, but collaborating with academic partners can alleviate some of these constraints. Data sharing policies and regulations differ across countries, affecting the accessibility of samples and data. Actively used biobanks tend to have higher data quality and more commercial partnerships, while third-party support from specialized experts enables efficient processing and analysis of biobank data. The scientific track record of data custodians can also influence the attractiveness of a biobank for commercial collaborations.

Access for companies

Some biobanks and health data registries do not offer companies direct access to samples and data. There are several reasons for this limitation, such as:

- Legal restrictions in some countries
- Limited donor consent
- Many biobanks and registries do not have a commercial setup
- Most collections are built up for academic research and healthcare purposes only

When companies cannot access samples and data directly, they can collaborate with an academic partner to gain access, for example the sample or data custodian at the research group that collected the samples or data.

Cohort size

The cohort size varies widely between collections, from only a few patients to more than 500,000. For collections built around a unique patient or participant population, 1,000 individuals are often sufficient for most purposes. For collections based on a general patient population recruited from hospitals or dedicated recruitment centres, a cohort size of more than 100,000 is preferable.
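
As a rough illustration, the rule of thumb above can be written as a small screening function. The two thresholds come from the text; the function name, the "unique" and "general" labels, and the example collections are hypothetical and only serve to show the idea.

```python
# Rule-of-thumb thresholds taken from the paragraph above.
MIN_UNIQUE_POPULATION = 1_000      # unique patient or participant population
MIN_GENERAL_POPULATION = 100_000   # general hospital / recruitment-centre population


def cohort_size_is_sufficient(cohort_size: int, population_type: str) -> bool:
    """Return True if a collection meets the rule-of-thumb size for its type.

    `population_type` is a hypothetical label: "unique" or "general".
    """
    if population_type == "unique":
        return cohort_size >= MIN_UNIQUE_POPULATION
    if population_type == "general":
        # The text asks for "more than 100,000", hence the strict comparison.
        return cohort_size > MIN_GENERAL_POPULATION
    raise ValueError(f"unknown population type: {population_type!r}")


# Made-up example collections, for illustration only.
collections = [
    ("rare-disease registry", 1_200, "unique"),
    ("regional hospital cohort", 40_000, "general"),
    ("national population cohort", 500_000, "general"),
]

shortlist = [
    name
    for name, size, kind in collections
    if cohort_size_is_sufficient(size, kind)
]
print(shortlist)
```

Such a filter is only a first pass: a small collection can still be the right choice when it has one of the distinguishing features listed next.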

The following biobank features can help you in your search for samples and data:

- Rare disease cohorts
- Population isolates
- Prevalence of consanguinity in society
- Deep endophenotype profiles
- Degree of clinically characterized patient populations
- Degree of longitudinal electronic health information

Biological samples

Biological samples make it possible to search for predictive biomarkers, estimate causality, and develop deeper mechanistic insights into disease pathophysiology. Most research biobanks store only blood samples, whereas clinical biobanks retain almost every type of specimen collected from a patient. The more types of biological material a biobank stores, the more relevant it often is for companies. Here are examples of samples that are sought after by industry:

- Whole blood
- Saliva
- Urine
- Stool
- Any tissue other than blood
- DNA extracted from whole blood
- Plasma
- Serum
- White blood cell fraction

Molecular profiles

Most biobanks have genetic profiles of their samples, but only a few offer other molecular profiles. The following molecular profiles might nevertheless be of interest when you are looking for the right biobank:

- Whole genome sequencing data
- Whole exome sequencing data
- Genotyping-array-based genetic data
- Transcriptomes, e.g., gene expression
- Proteomes
- Metabolomes
- Methylomes
- Metagenomes, e.g., microbiomes from human samples
- Lipidomes

Samples that have already been profiled at the molecular level are preferred for work on disease pathophysiology, biomarker development, and drug discovery. Molecular profiles offer direct readouts of biological processes and thus carry substantially more complex and deeper information than questionnaires or health records.

Source and depth of phenotype data

Most biobanks are cross-sectional, meaning they collect data from only a single point in time and all phenotype information is gathered during the recruitment process. In such settings, most lifestyle and disease data is self-reported, making it less valuable for research and development.

However, some biobanks perform regular physical evaluations at recruitment centers or retrieve lifestyle and health status updates from electronic databases. These biobanks have a more comprehensive overview of participants’ life trajectory, enabling insights into more sophisticated phenomena or behavioral patterns.

Other biobanks gather longitudinal data by sending questionnaires regularly to participants. However, this approach has several drawbacks, including low response rates and incomplete answers when participants respond only to selected questions. Nonetheless, where the participant pool is very large, such as 23andMe with more than 8 million clients, electronic surveys are the only feasible model, and even a very low response rate (around 1%) can still yield 80,000 or more unique observations.
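
The arithmetic behind this estimate is simple enough to check directly. The sketch below is only an illustration: the function name is hypothetical, the pool size of 8 million is the lower bound quoted above, and the response rates are example values rather than measured figures.

```python
def expected_responses(participants: int, response_rate: float) -> int:
    """Expected number of respondents for a given pool size and response rate."""
    if not 0.0 <= response_rate <= 1.0:
        raise ValueError("response_rate must be between 0 and 1")
    return round(participants * response_rate)


pool = 8_000_000  # lower bound on the participant pool quoted in the text

# Example response rates (assumed values, for illustration only).
for rate in (0.005, 0.01, 0.0125):
    responses = expected_responses(pool, rate)
    print(f"{rate:.2%} of {pool:,} participants -> {responses:,} responses")
```

With exactly 8 million participants, a 1% response rate gives 80,000 respondents; reaching 100,000 would take a rate of 1.25% or a correspondingly larger pool.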

Consent

For a company to use biological samples and data from biobanks for research and development, patients and participants must have consented to such use. This is often not the case, as consent for most collections is limited to a specific study or to non-commercial use. Nevertheless, if the research and analytical steps are carried out by an academic partner and only the results are shared, some consent limitations may no longer apply.

It is estimated that there are over 50 million biological samples stored in biobanks in Europe alone. However, only a small fraction of these samples have proper consent for research purposes, and an even smaller fraction has longitudinal health data or the option for recall.

Data sharing policies

Data sharing policies vary from country to country, and access to biological samples and data can depend on the location of the requestor. Obtaining biological samples from certain countries, such as China and Russia, can be complicated. Elsewhere, formal authorization is needed: access to the Estonian Biobank, for example, requires authorization from the government or university senate. In some countries, there are no limitations on access if the appropriate institutional review board (IRB) approvals are in place.

Digital data is generally less regulated than biological samples, but in countries such as Denmark, Iceland, and the UK, all data processing must be carried out on local servers, and the use of cloud services is not allowed. Elsewhere in Europe, biobanks generally accept cloud providers, provided that the data centres are physically located in Europe. For some projects, industry partners may prefer to work directly with raw data. In such cases, if the data cannot leave the biobank servers, the biobank must provide secure and scalable IT solutions.

Research activity

It is generally advised to look for biobanks that are actively used by the scientific community, as the data in these biobanks is often more structured and of higher quality. This is due to the more extensive harmonization, validation, and quality control processes that are typically carried out. Additionally, biobanks that are more actively used tend to have more commercial partnerships.

Third-party support

When access to biobank data is limited, it is essential for the biobank to have a competent team of specialists to process and analyse the data on behalf of third parties. These specialists typically have expertise in statistics, bioinformatics, genetic epidemiology, informatics, and medical research. They can help identify relevant biomarkers, analyse complex molecular data, and provide insights into disease mechanisms. Additionally, they can work closely with industry partners to design custom analyses that meet their specific research needs.

Data custodians' scientific track record

When data custodians, the individuals responsible for specific data, samples, or collections, have an outstanding scientific track record, the associated biobank is more likely to attract commercial partnerships. A strong scientific track record can demonstrate the quality and value of the data and samples in the biobank, and can give companies confidence in the potential for research and development using those resources.

For example, scientists at deCODE Genetics have published an unprecedented number of high-impact scientific reports and have attracted more than 1 billion USD over the past decade, which has helped to transform the company into a global leader in human genetics.