
With the advancement of AI and big data, there are now numerous ways to identify a person using only online traits, e.g.:
1) Computer Model
2) Browser Name
3) MAC address
4) Unique advertisement ID
5) Shopping habits (time, via an app, through browser)
6) IP address
7) Social media activity and behavior
8) Search engine queries
9) Location data
10) Device fingerprints
11) Biometric data (fingerprint, facial recognition, voice recognition)
12) Email and online account activity
13) Connected devices (smartphone, smartwatch, etc.)
14) Online purchase history and credit card information
15) Gaming and app usage history.
That’s why I use a privacy-focused browser and app tracking protection such as DuckDuckGo, but in the age of IT, privacy is increasingly becoming a myth. The Cambridge Analytica exposé was an eye-opener for me.
Bringing myself back to the realm of health informatics, and following Arjan's example:
Sex: Male
Education: MD
Occupation: Senior health financing consultant
Workplace: Community Partners International
Those details would make it very easy to identify me; someone could simply go to my organization's website to look for my other identifiable data.
If someone had access to a survey’s raw data, such as the national NCD survey (which I participate in), then by combining the fields (sex, education, occupation, workplace, and name) they could get access to my disease history and all sorts of other data.
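To make the linking step concrete, here is a minimal sketch in Python of how a public directory could be matched against survey rows that have had names removed. All records, field names, and the `link` function are made up for illustration; they are not taken from any real survey or website.

```python
# Hypothetical public staff directory (e.g. copied from an organization's website).
staff_directory = [
    {"name": "Person A", "sex": "Male", "education": "MD",
     "occupation": "Senior health financing consultant", "workplace": "Org X"},
    {"name": "Person B", "sex": "Female", "education": "MPH",
     "occupation": "Program officer", "workplace": "Org X"},
]

# Hypothetical survey extract: names removed, other fields left intact.
survey_rows = [
    {"sex": "Male", "education": "MD",
     "occupation": "Senior health financing consultant", "workplace": "Org X",
     "diagnosis": "condition-1"},
    {"sex": "Female", "education": "MD",
     "occupation": "Clinician", "workplace": "Org Y",
     "diagnosis": "condition-2"},
]

# Fields that appear in both sources and can act as a join key.
QUASI_IDENTIFIERS = ("sex", "education", "occupation", "workplace")


def key(row):
    """Build the combined key from the shared fields."""
    return tuple(row[field] for field in QUASI_IDENTIFIERS)


def link(directory, survey):
    """Return (name, diagnosis) pairs where exactly one person matches a row."""
    names_by_key = {}
    for person in directory:
        names_by_key.setdefault(key(person), []).append(person["name"])

    matches = []
    for row in survey:
        candidates = names_by_key.get(key(row), [])
        if len(candidates) == 1:  # an unambiguous match means re-identification
            matches.append((candidates[0], row["diagnosis"]))
    return matches


reidentified = link(staff_directory, survey_rows)
```

In this toy data, one survey row matches exactly one directory entry, so the diagnosis can be attached to a name even though the survey itself never stored one.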
Additionally, profiling has advanced to the point where some of the following data, which are not identifying on their own, can be used:
1) Geographic location
2) Income level
3) Marital status
4) Ethnicity
5) Religion
6) Language spoken
7) Education level
8) Employment status
9) Health conditions
10) Hobbies and interests
11) Internet usage patterns
12) Social media activity
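One rough way to see why such attributes matter in combination is to count how many records share the same values. The sketch below uses invented records and hypothetical field names; a group of size 1 means the combination points to a single record.

```python
from collections import Counter

# Invented records; field names and values are placeholders.
records = [
    {"location": "Township 1", "marital_status": "Married", "language": "Lang A"},
    {"location": "Township 1", "marital_status": "Married", "language": "Lang A"},
    {"location": "Township 1", "marital_status": "Single", "language": "Lang A"},
    {"location": "Township 2", "marital_status": "Married", "language": "Lang B"},
]


def group_sizes(rows, fields):
    """Count how many rows share each combination of the given fields."""
    return Counter(tuple(row[f] for f in fields) for row in rows)


def unique_rows(rows, fields):
    """Return the rows whose combination of fields occurs only once."""
    sizes = group_sizes(rows, fields)
    return [row for row in rows if sizes[tuple(row[f] for f in fields)] == 1]


# With one attribute, fewer rows stand alone than with three combined.
unique_by_location = unique_rows(records, ["location"])
unique_by_all = unique_rows(records, ["location", "marital_status", "language"])
```

Adding attributes can only split groups further, so the number of rows that stand alone never goes down as more fields are combined.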
So, from an ethical standpoint, I believe researchers have to code the identifiable data as well as encrypt the raw data, so that even if an unauthorized person gets hold of the files, there is still a barrier between that person and the actual data.
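As a rough illustration of the "coding" part, the sketch below replaces identifying fields with keyed codes using HMAC-SHA256 from the Python standard library. The function names, the record, and the choice of which fields count as identifying are my own assumptions, not a prescribed method. Encrypting the raw files is a separate step that would normally rely on a dedicated encryption tool or library, so it is not shown here.

```python
import hashlib
import hmac
import secrets


def make_key():
    """Generate a random secret key; it must be stored apart from the data."""
    return secrets.token_bytes(32)


def code_identifier(value, key):
    """Turn an identifying string into a stable code that needs the key to reproduce."""
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()[:16]


def code_record(record, key, identifying_fields):
    """Return a copy of the record with the identifying fields replaced by codes."""
    return {
        field: code_identifier(value, key) if field in identifying_fields else value
        for field, value in record.items()
    }


# Hypothetical record; the values are placeholders.
key = make_key()
raw = {"name": "Person A", "workplace": "Org X", "diagnosis": "condition-1"}
coded = code_record(raw, key, {"name", "workplace"})
```

The same person always gets the same code under the same key, so records can still be linked for analysis. Note that coding only the name would not be enough if fields such as occupation and workplace are left readable, which is why the raw data should also be encrypted and access-controlled.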