Current Perspectives on Data Sharing and Open Science in Pharmacogenomics.

Deanne Nixie R Miao, Feryal Ladha, Sarah M Lyle, Daniel W Olivier, Samah Ahmed, Britt I Drögemöller

Clinical pharmacology and therapeutics (2023)

Abstract
Genome-wide association studies (GWAS) summary statistics provide opportunities to accelerate human genomics research. In pharmacogenomics, assembling large, statistically powerful sample sizes is challenging, emphasizing the vital role of data sharing. To investigate data sharing practices in this field, we reviewed the availability of GWAS summary statistics in 593 pharmacogenomic GWAS articles and found that only 2% contained links to publicly available summary statistics. This highlights the need for improved data sharing initiatives within the pharmacogenomics community.

In the last decade, GWAS have transformed the field of human genomics by identifying novel disease susceptibility genes and providing insights into their biological relevance.1 These studies primarily rely on high-throughput genotyping technologies and bioinformatic pipelines to identify associations between genetic variants and human phenotypes based on individual-level genetic data. However, many GWAS face limitations due to small sample sizes, resulting in reduced statistical power and limited generalizability. To address these challenges, pooling data from multiple studies can increase the power to detect genetic variants associated with traits across diverse ancestral groups.2 However, the pooling of individual-level data is not always feasible due to challenges related to data acquisition and management, including the establishment of strict data sharing plans to allow for the transfer of sensitive human genomics data across institutes and countries. Further, processing of individual-level genomic data is expensive, requires substantial storage space, and is time and computationally intensive.3 One promising alternative to using individual-level genotype data is the use of GWAS summary statistics.
In this approach, statistical measurements (e.g., sample size, effect size, standard error, and P value) for each genetic variant are summarized for the population of interest.4 The use of summary statistics offers several advantages over individual-level data in GWAS. Summary statistics permit the analysis of large volumes of data without imposing a significant computational burden. The use of summary statistics also eliminates the need to disclose sensitive or personally identifiable information, which helps to protect the privacy of study participants. Moreover, summary statistics can be used to analyze data from multiple studies, which is particularly useful for conducting meta-analyses. These benefits lead to decreased resource utilization, and simplified data collection, facilitating analyses with increased sample sizes and enhanced statistical power. These factors lead to more robust, generalizable results, and improved detection of meaningful findings. Large-scale studies, such as the UK Biobank and FinnGen, have made their summary statistics publicly available, allowing researchers worldwide to perform various types of analyses (see Box 1 for examples). The diversity of the analyses that are made possible through access to these data maximizes the potential for the generation of meaningful results, ultimately leading to a more comprehensive understanding of the genetic basis of complex traits and diseases. 
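To make the meta-analysis use case concrete, the sketch below pools per-study summary statistics for a single variant using inverse-variance-weighted fixed-effect meta-analysis. This is one widely used approach, not a method prescribed by the article; the `StudyStat` record, the `fixed_effect_meta` function, and the example numbers are hypothetical, and the P value assumes a normal approximation for the pooled Z score.

```python
import math
from dataclasses import dataclass


@dataclass
class StudyStat:
    """Summary statistics for one variant in one study (hypothetical record)."""
    beta: float  # effect size estimate
    se: float    # standard error of the effect size
    n: int       # sample size


def fixed_effect_meta(studies):
    """Inverse-variance-weighted fixed-effect meta-analysis for one variant."""
    # Each study is weighted by the inverse of its variance (1 / se^2).
    weights = [1.0 / (s.se ** 2) for s in studies]
    total_weight = sum(weights)
    beta = sum(w * s.beta for w, s in zip(weights, studies)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    # Two-sided P value under a standard normal approximation.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return {
        "beta": beta,
        "se": se,
        "z": z,
        "p_value": p_value,
        "n": sum(s.n for s in studies),
    }


# Illustrative (made-up) inputs: the same variant reported by two studies.
example = fixed_effect_meta([
    StudyStat(beta=0.2, se=0.1, n=500),
    StudyStat(beta=0.4, se=0.1, n=450),
])
print(example)
```

Only the per-variant effect size, standard error, and sample size are needed here, which is why pooling summary statistics avoids moving individual-level genotype data between institutes.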
GWAS represent useful tools in pharmacogenomics to identify clinically relevant genetic variants that are associated with treatment outcomes, and results generated from these studies provide important insights into drug action biology.5 Given that findings from pharmacogenomic GWAS can be used to guide actionable changes to therapeutic treatments, these results can be more rapidly transferred to a clinical setting compared to the results obtained from GWAS investigating non-drug-related phenotypes.6 Although the benefits of pharmacogenomic GWAS are evident, most pharmacogenomic GWAS are limited by small sample sizes due to difficulties recruiting patients who have been uniformly treated, which limits the power of these studies to uncover associations with genetic variants.5 The pooling of GWAS summary statistics for the same or similar drugs and corresponding phenotypes is therefore a key cost-effective strategy to address these limitations and will provide increased power to detect variants associated with pharmacogenomic traits.7 Similarly, although the use of polygenic scores (PGS) to stratify individuals into risk groups provides ideal opportunities to guide the selection of optimal treatments, the development of these scores necessitates large sample sizes to ensure accurate weighting of risk effects for individual variants.7, 8 Therefore, sharing of summary statistics will be essential to provide opportunities to create clinically useful PGS. Further, there are many examples of how summary statistics-based analyses (Box 1) have been successfully used in pharmacogenomic research.9 Therefore, the sharing of GWAS summary statistics in the field of pharmacogenomics will be essential to ensure maximal output from pharmacogenomic GWAS. Given the clear advantages of GWAS summary statistics data sharing for the advancement of research in the field of pharmacogenomics, we investigated the availability of these data in pharmacogenomic publications.
To do this, we performed a literature search using the Ovid MEDLINE and EMBASE databases (July 8, 2022) to identify pharmacogenomic GWAS publications that were published between 2007 and 2022. We initiated our search from 2007, as this marked a pivotal point for the use of GWAS.1 Each article was screened by at least two reviewers (authors S.L., D.M., or F.L.) to determine whether it should be included in the study. In cases where reviewers disagreed on whether an article should be included, a third reviewer (authors S.A. or B.D.) screened the article and a consensus call was made. Only articles that included a GWAS of a pharmacogenomic trait were included. This list of articles was compared to the list of articles from McInnes et al.5 to identify articles that were missed by the initial search, resulting in a total of 593 articles for inclusion in this investigation (Table S1). Ethical approval for this study was obtained from the University of Manitoba.

Examination of the 593 publications revealed that 80 (13%) of the articles contained a data availability statement. A further three articles described data availability in the article but did not contain a data availability statement. Out of the 83 articles that described data availability, 35 (42%) provided links for data access (e.g., dbGaP), but only 14 provided a link to publicly available GWAS summary statistics (Table S1). To further explore the willingness of authors to share summary statistics, we emailed the corresponding authors of the remaining 579 articles (98% of the total publications) whose publications did not contain links to publicly available GWAS summary statistics. Out of 579 requests, 67 (12%) of the corresponding authors had email addresses that were no longer in use, and 332 (57%) authors did not respond. Among the 180 (31%) authors who responded, 32 (18%) shared their datasets.
A further 11 (6%) authors were willing to share data; however, they provided incomplete data that were incompatible with downstream analyses. A total of 69 authors (38%) declined access, citing reasons that included inability to share data, data access agreement requirements, and challenges in data retrieval. Some authors expressed interest in sharing the data but requested more information (n = 18, 10%) or directed us to alternative sources for access (n = 15, 8%), whereas others indicated the possibility of formal collaboration (n = 6, 3%) for data sharing. Last, a subset of 29 (16%) authors expressed willingness to share data but failed to do so by the time of this publication. These findings are summarized in Figure 1.

The benefits of sharing GWAS summary statistics in the field of pharmacogenomics are clear. However, our investigations revealed that only 2% of pharmacogenomics GWAS publications included direct links to publicly accessible GWAS summary statistics. This lack of accessibility necessitates requests for data access, significantly hindering timely data analyses that could otherwise be facilitated by seamless access to these datasets. These delays in data analyses are further compounded by additional data-sharing requirements, which take time to establish. Although there remain significant gaps in data sharing practices in the field of pharmacogenomics, we did receive responses from 31% of authors who were contacted and were able to obtain summary statistics from nearly 8% of pharmacogenomic publications. This serves as positive evidence for the willingness of members of the pharmacogenomics community to collaborate. Indeed, many of the reasons that were provided as to why authors were unable to share the data were beyond the control of authors.
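The counts reported above can be cross-checked with a few lines of arithmetic. The sketch below recomputes the rounded percentages from the published counts; the category labels are shortened paraphrases, and treating the "nearly 8%" figure as the 14 publications with public links plus the 32 datasets shared on request is an assumption that happens to match the reported value, not something the text states explicitly.

```python
# Counts taken from the text; labels are shortened paraphrases.
total_articles = 593
public_links = 14
emailed = total_articles - public_links  # 579 corresponding authors contacted

responses = {
    "shared usable data": 32,
    "shared incomplete data": 11,
    "declined": 69,
    "requested more information": 18,
    "pointed to alternative sources": 15,
    "proposed formal collaboration": 6,
    "agreed but had not yet shared": 29,
}
responded = sum(responses.values())  # should equal the 180 reported responders


def pct(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


print(f"public links: {public_links}/{total_articles} ({pct(public_links, total_articles)}%)")
print(f"responded: {responded}/{emailed} ({pct(responded, emailed)}%)")
for label, count in responses.items():
    print(f"  {label}: {count} ({pct(count, responded)}%)")

# Assumption: statistics "obtained" = public links + datasets shared on request.
obtained = public_links + responses["shared usable data"]
print(f"obtained: {obtained}/{total_articles} ({100 * obtained / total_articles:.1f}%)")
```

The response categories sum to the 180 responders, so the breakdown is internally consistent.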
For example, several authors were unable to retrieve data from articles that were published several years ago, emphasizing the urgency of mandating links to publicly available data at the time of publication. In light of these findings, we make the following recommendations. First, authors should ensure that their summary statistics include the mandatory reporting elements described by MacArthur et al.10 For the pharmacogenomics community, the reporting of specific pharmacogenomic metadata is also important. This includes reporting the drug(s) under investigation, the phenotyping method used to assess safety or effectiveness, as well as covariates that were included in the GWAS analyses, for example, drug dosage, age, sex, and ancestry. To ensure ease of downstream analyses, data should be appropriately harmonized as described by the following online resource: https://bit.ly/GWAS_harmonization. These standardized data should be shared through established data sharing platforms such as the GWAS Catalog (https://www.ebi.ac.uk/gwas/), zenodo (https://zenodo.org/), and figshare (https://figshare.com/). Finally, publishers and funding organizations play a crucial role in promoting data sharing and open science initiatives. Therefore, in addition to authors taking the initiative to share their data, it is imperative that these organizations provide specific requirements and instructions for sharing of GWAS summary statistics. In conclusion, GWAS summary statistics provide ideal opportunities to maximize the utility of pharmacogenomics GWAS, while reducing challenges related to acquiring, managing, and analyzing individual-level genetic data. However, the lack of accessibility and standardization of these data in the field of pharmacogenomics limits their wide-scale use. 
Although there are many areas that could be improved, our investigations did reveal that the sharing of these data is becoming increasingly common (Figure 1) and there is a desire to work together to maximize the potential to generate meaningful results. It is essential that this trend in data sharing continues and that future studies include data availability statements with links to GWAS summary statistics to ensure that the field of pharmacogenomics can benefit from the numerous analyses that can be performed using these data. The results that will be enabled through these open science initiatives will maximize the potential for the identification of clinically actionable findings.

D.N.R.M. is supported by an MSc studentship funded by Research Manitoba and the NSERC Visual and Automated Disease Analytics Graduate Training Program. D.W.O. is supported by a National Research Foundation-Mitacs South Africa-Canada Globalink Postdoctoral Exchange Scheme Program. S.A. is supported by an MSc studentship funded by the NSERC Visual and Automated Disease Analytics Graduate Training Program. B.I.D. is supported by a CIHR Tier 2 Canada Research Chair in Pharmacogenomics and Precision Medicine.

The authors declared no competing interests for this work.

S.L.: Data collection, manuscript review. F.L.: Data collection and analyses, manuscript writing and review. D.N.M.: Data collection and analyses, manuscript writing and review. D.W.O.: Data collection, manuscript review. S.A.: Data collection, manuscript review. B.D.: Project conception and supervision, manuscript review.

Table S1. Please note: The publisher is not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing content) should be directed to the corresponding author for the article.