To cope with the challenges, the following strategies can be applied:
1. Addressing Missing Data
Improve source data entry: agree on standardized variables and mandatory data entry fields so that all essential data are filled in.
Capacity building: provide sufficient training for the assigned data entry staff.
Use statistical methods suited to the level of missingness: for example, multiple imputation, mixed-effects regression models or generalized estimating equations (GEE), particularly when missingness is high.
Conduct regular review sessions: carry out data audits regularly to identify operational and systematic gaps, streamline workflows and improve data quality.
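As a rough illustration of how mandatory-field checks and a missingness review might be automated during data audits, the sketch below flags incomplete records and reports the proportion of missing values per variable. The field names, the record layout and the 5% and 40% cut-offs used to suggest a handling strategy are all hypothetical assumptions for illustration; in practice the choice of method also depends on why the data are missing, not only on how much is missing.

```python
from typing import Dict, List, Optional

# Hypothetical set of fields agreed as mandatory at the point of data entry.
MANDATORY_FIELDS = ["patient_id", "age", "sex", "diagnosis_code"]

Record = Dict[str, Optional[object]]


def is_missing(value: Optional[object]) -> bool:
    """Treat None and blank strings as missing values."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def missing_mandatory(record: Record) -> List[str]:
    """List the mandatory fields that are absent or empty in one record."""
    return [f for f in MANDATORY_FIELDS if is_missing(record.get(f))]


def missingness_by_field(records: List[Record], fields: List[str]) -> Dict[str, float]:
    """Proportion of records with a missing value for each field."""
    if not records:
        return {f: 0.0 for f in fields}
    n = len(records)
    return {f: sum(is_missing(r.get(f)) for r in records) / n for f in fields}


def suggest_handling(proportion: float) -> str:
    """Map a missingness proportion to a candidate strategy.

    The 5% and 40% cut-offs are illustrative assumptions, not published standards.
    """
    if proportion < 0.05:
        return "complete-case analysis may be acceptable"
    if proportion < 0.40:
        return "multiple imputation or mixed-effects model"
    return "high missingness: sensitivity analyses needed"


if __name__ == "__main__":
    records = [
        {"patient_id": "p1", "age": 54, "sex": "F", "diagnosis_code": "I10", "bmi": 31.2},
        {"patient_id": "p2", "age": None, "sex": "M", "diagnosis_code": "E11", "bmi": None},
        {"patient_id": "p3", "age": 61, "sex": "", "diagnosis_code": "I10", "bmi": None},
    ]
    # Records that would be sent back for completion at the point of entry.
    for r in records:
        print(r["patient_id"], missing_mandatory(r))
    # Per-variable missingness to guide the choice of analysis.
    for field, p in missingness_by_field(records, ["age", "bmi"]).items():
        print(f"{field}: {p:.0%} missing -> {suggest_handling(p)}")
```

Running checks like these as part of each audit cycle makes gaps visible early, while they can still be corrected at the source rather than handled statistically afterwards.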
2. Reducing Selection Bias
Use advanced data analytic methods: including propensity score analysis, instrumental variable analysis and Mendelian randomization.
Use big data mainly for hypothesis generation: before findings are applied in clinical practice, validate them with randomized controlled trials (RCTs) or triangulate the evidence across multiple studies.
Ensure transparency: clearly report the inclusion/exclusion criteria and participant characteristics.
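To show what propensity score analysis does, the sketch below estimates an inverse-probability-weighted (IPW) risk difference. As a simplifying assumption it uses a single categorical confounder, so the propensity score is just the treated fraction within each stratum; real analyses of EHR data would normally fit a model such as logistic regression over many covariates. The data layout and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Each observation: (confounder stratum, treated flag 0/1, outcome flag 0/1).
Obs = Tuple[str, int, int]


def naive_risk_difference(data: List[Obs]) -> float:
    """Unadjusted risk in treated minus risk in untreated."""
    events = {1: 0, 0: 0}
    totals = {1: 0, 0: 0}
    for _, treated, outcome in data:
        events[treated] += outcome
        totals[treated] += 1
    return events[1] / totals[1] - events[0] / totals[0]


def propensity_by_stratum(data: List[Obs]) -> Dict[str, float]:
    """Estimate P(treated | stratum) as the treated fraction in each stratum."""
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0])  # [treated, total]
    for stratum, treated, _ in data:
        counts[stratum][0] += treated
        counts[stratum][1] += 1
    return {s: t / n for s, (t, n) in counts.items()}


def ipw_risk_difference(data: List[Obs]) -> float:
    """IPW risk difference (treated minus untreated) with normalized weights.

    Assumes positivity: every stratum contains treated and untreated patients.
    """
    ps = propensity_by_stratum(data)
    num = {1: 0.0, 0: 0.0}
    den = {1: 0.0, 0: 0.0}
    for stratum, treated, outcome in data:
        p = ps[stratum]
        # Weight each patient by the inverse probability of the arm they received.
        w = 1.0 / p if treated else 1.0 / (1.0 - p)
        num[treated] += w * outcome
        den[treated] += w
    return num[1] / den[1] - num[0] / den[0]


if __name__ == "__main__":
    # Toy data: older patients are treated more often and have higher risk,
    # but within each age stratum treatment makes no difference.
    data = (
        [("old", 1, 1)] * 40 + [("old", 1, 0)] * 40
        + [("old", 0, 1)] * 10 + [("old", 0, 0)] * 10
        + [("young", 1, 1)] * 2 + [("young", 1, 0)] * 18
        + [("young", 0, 1)] * 8 + [("young", 0, 0)] * 72
    )
    print("naive:", round(naive_risk_difference(data), 3))
    print("IPW:  ", round(ipw_risk_difference(data), 3))
```

In this toy data the naive comparison suggests a risk difference of about 0.24, driven entirely by who was selected for treatment, while the weighted estimate is close to zero. Weighting only corrects for confounders that are measured, which is why transparent reporting and external validation remain necessary.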
3. Strengthening Data Analysis Capacity
Include experts with diverse data-handling skills: very large datasets involve multiplicity and require many analyses to establish whether a hypothesis is supported and to identify correlations.
Build a multidisciplinary team: including clinicians, researchers, health informaticians, data scientists, statisticians and others.
Capacity-building programs: train researchers in data science, health informatics, statistics and machine learning.
Standardize analytical protocols: to reduce inappropriate multiple testing and false-positive findings.
Use validated algorithms and reproducible methods: these ensure accuracy, transparency and the ability to verify analyses independently, and therefore improve the reliability of findings.
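One way a standardized protocol can limit false positives is to pre-specify a multiple-testing correction. The sketch below implements two common choices, the Bonferroni correction (controls the family-wise error rate) and the Benjamini-Hochberg procedure (controls the false discovery rate). Which correction a given protocol should use is an assumption here, not something the recommendations above prescribe.

```python
from typing import List


def bonferroni(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Reject H_i when p_i <= alpha / m (family-wise error rate control)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Step-up procedure controlling the false discovery rate at alpha."""
    m = len(p_values)
    # Indices of the hypotheses sorted by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k such that p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


if __name__ == "__main__":
    p = [0.01, 0.04, 0.03, 0.005]
    print("Bonferroni:", bonferroni(p))
    print("BH:        ", benjamini_hochberg(p))
```

Bonferroni is simple but conservative when many hypotheses are tested; Benjamini-Hochberg tolerates a controlled share of false discoveries in exchange for more power, which is often preferred for exploratory, hypothesis-generating analyses of big data.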
4. Improving Interpretation and Translational Use
Early involvement of relevant stakeholders: for example, involve clinicians from the beginning of the study (from design through to interpretation of results) to ensure clinical relevance and produce actionable results.
Present results in clinically usable and meaningful formats: focus on providing actionable insights rather than complex outputs.
Enhance documentation: standardize essential data variables and their documentation so that results can be interpreted and used effectively.
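As an example of a clinically usable format, relative measures from a model can be accompanied by absolute risks and the number needed to treat (NNT), which many clinicians find easier to act on. The sketch below is a hypothetical helper, not part of any specific reporting standard; it uses exact fractions so that the conventional rounding up of the NNT is not disturbed by floating-point error.

```python
import math
from fractions import Fraction
from typing import Dict


def absolute_effects(control_events: int, control_total: int,
                     treated_events: int, treated_total: int) -> Dict[str, object]:
    """Absolute risk reduction (ARR) and number needed to treat (NNT)."""
    cer = Fraction(control_events, control_total)   # control event rate
    eer = Fraction(treated_events, treated_total)   # event rate with treatment
    arr = cer - eer
    # NNT is conventionally rounded up; it is undefined when there is no reduction.
    nnt = math.ceil(1 / arr) if arr > 0 else None
    return {"cer": cer, "eer": eer, "arr": arr, "nnt": nnt}


def plain_language(effects: Dict[str, object]) -> str:
    """Render the effects as a short sentence for a clinical audience."""
    text = (f"Event risk {float(effects['eer']):.1%} with treatment vs "
            f"{float(effects['cer']):.1%} without "
            f"(absolute reduction {float(effects['arr']):.1%}).")
    if effects["nnt"] is not None:
        text += f" About {effects['nnt']} patients need treatment to prevent one event."
    return text


if __name__ == "__main__":
    print(plain_language(absolute_effects(20, 100, 15, 100)))
```

The counts in the usage line are made-up illustrative numbers; reporting should also carry the uncertainty around these estimates, which this sketch omits.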
5. Managing Privacy and Ethical Issues
Data governance, oversight and data protection: maintain regular audit trails, access control and encryption. Clear laws and policies should be in place to prevent and mitigate breaches of personally identifiable information.
Anonymize data: to reduce the risk of re-identification.
Minimize data: provide researchers with, and let them use, only the information they need, to reduce the risk and impact of data breaches.
Ethics board approval: seek approval from an ethics review board to balance privacy with public health benefits and ensure that research is conducted ethically.
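The sketch below combines data minimization with a simple de-identification step when preparing a research extract: only an allowed list of fields is kept, age is coarsened into bands, and the patient identifier is replaced by a keyed hash (HMAC-SHA256) so records can still be linked without exposing the original ID. The field names and the 10-year band width are hypothetical. Note that this is pseudonymization rather than full anonymization: the remaining quasi-identifiers can still allow re-identification, and the secret key must be held separately from the extract.

```python
import hashlib
import hmac
from typing import Dict, List

# Hypothetical: the only fields this particular study needs (data minimization).
ALLOWED_FIELDS = ["age", "sex", "diagnosis_code"]


def pseudonym(patient_id: str, secret_key: bytes) -> str:
    """Keyed hash of the ID: stable for linkage, not reversible without the key."""
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def age_band(age: int, width: int = 10) -> str:
    """Coarsen an exact age into a band such as '40-49'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def prepare_extract(records: List[Dict[str, object]],
                    secret_key: bytes) -> List[Dict[str, object]]:
    """Keep allowed fields only, band ages and replace IDs with pseudonyms."""
    out = []
    for r in records:
        row = {f: r[f] for f in ALLOWED_FIELDS if f in r}
        if isinstance(row.get("age"), int):
            row["age"] = age_band(row["age"])
        row["pid"] = pseudonym(str(r["patient_id"]), secret_key)
        out.append(row)
    return out


if __name__ == "__main__":
    source = [{"patient_id": "123", "name": "A. Person", "address": "1 Main St",
               "age": 47, "sex": "F", "diagnosis_code": "I10"}]
    print(prepare_extract(source, secret_key=b"keep-this-key-elsewhere"))
```

Names, addresses and the raw identifier never leave the source system, which limits what can be exposed if the research extract itself is breached.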
6. System-Level and Policy Solutions
Data sharing policy: ensure responsible data sharing across organisations and departments with clear regulatory and ethical safeguards.
Data standards: develop national standards for electronic health records (EHRs) to enhance interoperability, data quality and health information exchange.
Digital infrastructure: secure infrastructure to agreed minimum standards to prevent cyberattacks and data breaches.
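A small piece of what shared data standards do in practice is to make every site report the same measurement in the same unit. The sketch below assumes a hypothetical national rule that blood glucose is exchanged in mmol/L and converts site-specific results accordingly (1 mmol/L of glucose corresponds to 18.016 mg/dL, from a molar mass of about 180.16 g/mol). The record layout and function names are illustrative only; real standards cover far more, including coding systems and message formats.

```python
from typing import Dict, List

# Hypothetical national rule: glucose is exchanged in mmol/L.
# 1 mmol/L of glucose = 18.016 mg/dL (molar mass about 180.16 g/mol).
MG_PER_DL_PER_MMOL_PER_L = 18.016


def glucose_to_mmol_per_l(value: float, unit: str) -> float:
    """Convert a glucose result reported in a local unit to the standard unit."""
    u = unit.strip().lower()
    if u == "mmol/l":
        return value
    if u == "mg/dl":
        return value / MG_PER_DL_PER_MMOL_PER_L
    # Reject rather than guess: an unknown unit is a data-quality issue to report.
    raise ValueError(f"unsupported glucose unit: {unit!r}")


def harmonize(results: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Rewrite site-specific glucose results into the shared representation."""
    return [
        {"site": r["site"],
         "glucose_mmol_per_l": round(
             glucose_to_mmol_per_l(float(r["value"]), str(r["unit"])), 1)}
        for r in results
    ]


if __name__ == "__main__":
    incoming = [
        {"site": "A", "value": 90, "unit": "mg/dL"},
        {"site": "B", "value": 5.0, "unit": "mmol/L"},
    ]
    print(harmonize(incoming))
```

Raising an error on unrecognized units, instead of passing values through, keeps non-conforming data visible so it can be fixed at the source.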
In conclusion, complementary and robust procedures should be planned and implemented at each step of data processing to make the best possible use of big data and improve public health.
