Missing data: Standardized data collection tools should be used across the EHR, and entry should be mandatory for essential variables.
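As a rough illustration of mandatory entry, the sketch below rejects a record when an essential variable is absent or blank. The field names in `REQUIRED_FIELDS` and both function names are hypothetical; a real EHR form would define its own list of essential variables.

```python
# Hypothetical list of essential variables (an assumption for illustration).
REQUIRED_FIELDS = ("patient_id", "date_of_birth", "sex", "diagnosis_code")


def missing_required_fields(record):
    """Return the required fields that are absent or blank in one record."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            missing.append(field)
    return missing


def accept_record(record):
    """Return the record unchanged, or raise if an essential variable is missing."""
    missing = missing_required_fields(record)
    if missing:
        raise ValueError("Missing required fields: " + ", ".join(missing))
    return record
```

Checking at the point of entry, rather than during analysis, means the gap can be fixed while the person entering the data still has the source information at hand.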
Selection bias: Representative sampling should be used throughout the study. For comparative studies, randomized controlled trials should also be used where possible, although they are resource intensive for large-scale surveys.
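One common way to approach representative sampling is proportionate stratified sampling, where the same fraction is drawn from each subgroup. The sketch below is a minimal example under that assumption; the `stratified_sample` function, the `region` field, and the 80/20 population are all made up for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(records, stratum_key, fraction, seed=0):
    """Draw the same fraction of records from every stratum."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    strata = defaultdict(list)
    for record in records:
        strata[record[stratum_key]].append(record)

    sample = []
    for members in strata.values():
        # Take at least one record so that small strata are not dropped.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical population: 80 urban and 20 rural patients.
population = [{"id": i, "region": "urban" if i < 80 else "rural"} for i in range(100)]
sample = stratified_sample(population, "region", fraction=0.1)
```

With a 10% fraction, the sample keeps the same urban to rural ratio as the population, which a simple convenience sample would not guarantee.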
Data analysis and training: Researchers and enumerators need domain knowledge to understand what the data mean and to interpret them correctly. Knowledge of statistical methodology is also important for hypothesis testing and statistical modelling.
Privacy and ethical issues: Most researchers anonymize the data. However, patient consent is often not obtained, especially for big data analysis and use. Obtaining consent should be mandatory, both to ensure that the data are used for beneficial purposes and to ensure that patients understand what they are agreeing to.
Additional suggestions: In addition to the areas above, which are mentioned in the literature, I would suggest that data quality be treated as one of the major challenges for big data analysis. Accuracy, consistency, and completeness (the last closely related to missing data) are crucial for data governance; otherwise the result will be garbage in, garbage out.
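The three quality dimensions above can be checked with simple rules before analysis. The following is only a sketch: the field names, the plausible age range of 0 to 120, and the sample records are assumptions chosen for illustration, not rules taken from the literature.

```python
from datetime import date


def completeness(records, field):
    """Share of records in which `field` is present and non-blank."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def quality_issues(record):
    """Return a list of accuracy and consistency problems for one record."""
    issues = []
    age = record.get("age")
    # Accuracy: the plausible age range 0-120 is an assumed rule.
    if age is not None and not 0 <= age <= 120:
        issues.append("age out of range")
    admitted, discharged = record.get("admitted"), record.get("discharged")
    # Consistency: discharge cannot precede admission.
    if admitted and discharged and discharged < admitted:
        issues.append("discharged before admitted")
    return issues


# Hypothetical records used only to exercise the checks.
records = [
    {"age": 34, "admitted": date(2023, 1, 5), "discharged": date(2023, 1, 9)},
    {"age": 210, "admitted": date(2023, 2, 1), "discharged": date(2023, 1, 30)},
    {"age": None, "admitted": date(2023, 3, 2), "discharged": None},
]
```

Here `completeness(records, "age")` reports the share of records with an age recorded, while `quality_issues` flags the second record for both an implausible age and a discharge date earlier than the admission date.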