Here are my suggestions on coping with big health data challenges:
1. Missing data
Some methods to deal with missing data are the following:
a) complete-case analysis (CCA)
– all persons with missing values on one or more variables are excluded from the analysis. This method has many drawbacks and should generally be avoided because it yields unbiased results only in some situations, for example when the data are missing completely at random.
b) imputation
– replacement of missing data with estimated (imputed) values
– multiple imputation is recommended over single imputation methods (mean imputation, imputation based on linear regression, and last value/observation carried forward) because most single imputation methods artificially decrease the standard deviation of the variables being analyzed, which results in standard errors that are too small
– multiple imputation consists of three phases: imputation, analysis, and pooling
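The three phases above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the blood pressure readings are made up, the function names are hypothetical, and the imputation model (random draws from a normal distribution fitted to the observed values) is deliberately simplistic compared with what dedicated multiple imputation software does. The pooling step uses Rubin's rules, the usual way of combining results across imputed datasets. The last lines also show how mean imputation shrinks the standard deviation.

```python
import random
import statistics

def mean_impute(values):
    """Single imputation: replace every missing value with the observed mean."""
    observed = [v for v in values if v is not None]
    mu = statistics.mean(observed)
    return [v if v is not None else mu for v in values]

def impute_once(values, rng):
    """Imputation phase: fill each None with a random draw from a normal
    distribution fitted to the observed values (a simplified model)."""
    observed = [v for v in values if v is not None]
    mu = statistics.mean(observed)
    sd = statistics.stdev(observed)
    return [v if v is not None else rng.gauss(mu, sd) for v in values]

def analyze(completed):
    """Analysis phase: estimate the mean and its sampling variance (SE squared)."""
    n = len(completed)
    return statistics.mean(completed), statistics.variance(completed) / n

def pool(estimates, variances):
    """Pooling phase (Rubin's rules): combine the m estimates and variances."""
    m = len(estimates)
    pooled_estimate = statistics.mean(estimates)
    within = statistics.mean(variances)       # average within-imputation variance
    between = statistics.variance(estimates)  # variance between imputed datasets
    total_variance = within + (1 + 1 / m) * between
    return pooled_estimate, total_variance

def multiple_imputation(values, m=20, seed=0):
    """Run all three phases: impute m times, analyze each dataset, pool."""
    rng = random.Random(seed)
    results = [analyze(impute_once(values, rng)) for _ in range(m)]
    estimates = [r[0] for r in results]
    variances = [r[1] for r in results]
    return pool(estimates, variances)

# Hypothetical systolic blood pressure readings; None marks a missing value.
sbp = [118, 125, None, 132, 140, None, 121, 136, 129, None]

estimate, total_var = multiple_imputation(sbp)
print(f"pooled mean = {estimate:.1f}, pooled SE = {total_var ** 0.5:.2f}")

# Mean imputation adds values with zero spread, so the SD goes down.
observed_only = [v for v in sbp if v is not None]
print(f"SD of observed values:    {statistics.stdev(observed_only):.2f}")
print(f"SD after mean imputation: {statistics.stdev(mean_impute(sbp)):.2f}")
```

In real analyses an established implementation would normally be used instead of hand-written code; the sketch is only meant to make the imputation, analysis, and pooling steps concrete.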
2. Selection bias
According to Rojas-Saunero et al. (2023), the following strategies can help prevent selection bias in health research:
a) Clearly specify the target population
b) Collect primary data in a way that ensures accessibility for participants who are often marginalized. Ideally, all social groups should be recruited from the same source, rather than creating a distinct recruitment pipeline that draws from different populations to achieve diversity.
c) Design retention strategies to prevent differential loss to follow-up
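As a rough illustration of item c), one way to check whether loss to follow-up is differential is to compare retention rates across groups. The sketch below uses a made-up cohort with hypothetical group labels "A" and "B"; the function name is also an assumption, and what counts as a worrying gap is left to the study team.

```python
from collections import defaultdict

def retention_by_group(records):
    """Return the share of participants in each group who completed follow-up."""
    enrolled = defaultdict(int)
    retained = defaultdict(int)
    for group, completed in records:
        enrolled[group] += 1
        if completed:
            retained[group] += 1
    return {group: retained[group] / enrolled[group] for group in enrolled}

# Hypothetical cohort: (group label, completed follow-up?)
cohort = (
    [("A", True)] * 45 + [("A", False)] * 5
    + [("B", True)] * 30 + [("B", False)] * 20
)

rates = retention_by_group(cohort)
gap = max(rates.values()) - min(rates.values())

for group, rate in sorted(rates.items()):
    print(f"group {group}: {rate:.0%} retained")
print(f"largest gap between groups: {gap:.0%}")
```

A large gap between groups suggests that the people who remain in the study no longer resemble the target population, which is when retention strategies matter most.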
3. Data analysis and training
The gap in data analysis skills can be addressed by training clinicians and researchers in informatics and in tools for big health data analysis.
4. Interpretation and Translational Applicability of Results
Big data visualization tools such as Tableau, Microsoft Power BI, Google Looker Studio, and D3.js can be used to turn big data into information that is easy to understand and interpret.
5. Privacy and Ethical Issues
Regulations on the use of data, such as the General Data Protection Regulation (GDPR) in the European Union and the Data Privacy Act of 2012 in the Philippines, must be followed to ensure data privacy and confidentiality.
References:
Heymans, M. W., & Twisk, J. W. (2022). Handling missing data in clinical research. Journal of Clinical Epidemiology, 151, 185–188. https://doi.org/10.1016/j.jclinepi.2022.08.016
Rojas-Saunero, L. P., Glymour, M. M., & Mayeda, E. R. (2023). Selection Bias in Health Research: Quantifying, Eliminating, or Exacerbating Health Disparities? Current Epidemiology Reports, 11(1), 63–72. https://doi.org/10.1007/s40471-023-00325-z
Coursera Staff. (2025, April 11). Big data visualization tools: Types, benefits, and how to choose. Coursera. https://www.coursera.org/articles/big-data-visualization-tools-types-benefits-and-how-to-choose
