Software Development Skills for Health Data Researchers
BMJ Health & Care Informatics(2022)
Abstract
© Author(s) (or their employer(s)) 2022. Reuse permitted under CC BY. Published by BMJ. INTRODUCTION Health data researchers are increasingly required to develop complex analytic code in order to implement sophisticated analyses on large health datasets. While writing analysis scripts (box 1) for academic projects is distinct from general purpose software development, they share many of the same features. A researcher’s script usually consists of a sequence of commands executed by a computer to extract, reshape, clean, describe and analyse data. If the quality of this analytic code cannot be reasonably assured, then results cannot be trusted: programming errors have resulted in high profile retractions. Similarly, if lengthy scripts for data management cannot be reused, then work is needlessly duplicated. The software engineering community has developed a range of techniques to improve the quality, reusability, efficiency and readability of code. Organisations such as the Software Sustainability Institute support this approach to code development and provide more detailed guidance and education which are well worth reviewing. In this brief guide we explain how researchers can borrow best practices and freely available tools from this community to improve their work. We specifically cover the following three topics: Writing High Quality Code, Working Collaboratively and Sharing your work. Throughout the piece we often refer to examples from Python or R, two popular open source programming languages used by academics, but our advice is universal and there will be analogues to these examples in any commonly used statistical or general purpose programming language.
MoreTranslated text
AI Read Science
Must-Reading Tree
Example
Generate MRT to find the research sequence of this paper
Chat Paper
Summary is being generated by the instructions you defined