
Hi All,
Below are my suggestions for each challenge.
Missing data:
I agree with the paper that most of the data is not missing at random. The missingness is caused by standard practice or by patient refusal. If the proportion of missing values is quite low, I would personally start with a complete-case analysis. If the proportion is relatively high, I would try an analytic technique such as imputation. However, I would always keep in mind that imputed values are not real observations, and that imputation can introduce bias into the analysis. If there is an alternative data point that is more complete and could be used instead, I would try that as well.
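As a rough sketch of how I would make that choice in practice, the example below measures the share of missing values in one field and then either keeps only the complete cases or imputes. Everything in it is an assumption for illustration: the records, the field names, and the 10% cut-off are hypothetical, and mean imputation is used only because it is the simplest stand-in. The paper does not prescribe a method, and when data are not missing at random a plain mean fill can itself be biased.

```python
from statistics import mean

# Hypothetical records; None marks a missing value.
records = [
    {"age": 64, "lactate": 1.8},
    {"age": 71, "lactate": None},
    {"age": 58, "lactate": 2.4},
    {"age": 80, "lactate": None},
    {"age": 49, "lactate": 1.2},
]

THRESHOLD = 0.10  # assumed cut-off for "quite low"; not taken from the paper


def missing_proportion(rows, field):
    """Share of rows in which `field` is missing."""
    return sum(r[field] is None for r in rows) / len(rows)


def complete_cases(rows, fields):
    """Keep only rows in which every listed field is observed."""
    return [r for r in rows if all(r[f] is not None for f in fields)]


def mean_impute(rows, field):
    """Fill missing values with the observed mean and flag the filled rows."""
    observed = [r[field] for r in rows if r[field] is not None]
    fill = mean(observed)
    return [
        {
            **r,
            field: fill if r[field] is None else r[field],
            f"{field}_imputed": r[field] is None,
        }
        for r in rows
    ]


share = missing_proportion(records, "lactate")
if share <= THRESHOLD:
    analysis_set = complete_cases(records, ["lactate"])
else:
    analysis_set = mean_impute(records, "lactate")

print(f"missing share: {share:.0%}, rows analysed: {len(analysis_set)}")
```

The `lactate_imputed` flag is there so that the filled values stay distinguishable from the real ones, and so the analysis can be rerun on the complete cases alone as a sensitivity check.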
Selection bias:
I agree with the paper that selection bias arises from the nature of the data collection. It cannot be avoided when we obtain data that we did not create or collect ourselves using a standard method. In my practice, I would run preliminary checks to get an overview of the data first: calculate the basic descriptive statistics and study the distribution of the data. This gives the analyst the big picture of the data obtained, and sometimes an odd pattern caused by selection bias shows up as well.
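The preliminary check I have in mind could look something like the sketch below. The ages are made-up values and the function names are my own; the point is only that a few summary numbers and a crude histogram can already show whether one group dominates the sample.

```python
from collections import Counter
from statistics import mean, median, quantiles, stdev

# Hypothetical patient ages from an obtained dataset.
ages = [34, 67, 71, 69, 72, 68, 70, 75, 66, 73]


def summarize(values):
    """Basic descriptive statistics for one numeric variable."""
    q1, _, q3 = quantiles(values, n=4)
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "sd": stdev(values),
        "min": min(values),
        "max": max(values),
        "q1": q1,
        "q3": q3,
    }


def decade_counts(values):
    """Count values per decade as a rough view of the distribution."""
    return Counter((v // 10) * 10 for v in values)


summary = summarize(ages)
counts = decade_counts(ages)

for name, value in summary.items():
    print(f"{name}: {value}")

# Text histogram: one '#' per patient in each decade.
for decade in sorted(counts):
    print(f"{decade}s: {'#' * counts[decade]}")
```

In this made-up sample almost everyone is in their 60s or 70s. If the target population were all adults, that skew would be the kind of odd pattern worth questioning before any modelling.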
Data analysis:
The limited knowledge of big data analytics, and of the algorithms developed to handle big data, has been discussed for a while. I suggest that researchers be trained in handling big data alongside statistics. Refresher training and updates on newly released algorithms should be provided on a regular basis.
Applicability of the results:
For this issue, I suggest that the analyst present the results and make the data processing as transparent as possible. Complex algorithms are generally not well accepted in the healthcare field, since most healthcare staff have little to no data literacy; to them, a complex algorithm is a “black box”. As readers, all of us should be aware of the big data trend and of the importance of data literacy, which we should actively seek to build.
Privacy and ethical issues:
For this issue, I suggest that the data owner put data privacy and data security above everything else. Data is an asset. The data owner should be aware of this and invest in data security measures. Apart from that, the data owner should do their best to comply with local laws and regulations.