- This topic has 13 replies, 8 voices, and was last updated 6 months, 1 week ago by
Tanaphum Wichaita.
-
-
-
2024-09-09 at 4:19 pm #45438
Saranath
Keymaster
Can you give an example of data that you think could be considered “Big Data”? Which characteristics of that data fit the 5Vs, 7Vs, or 10Vs of Big Data?
-
2024-09-16 at 8:12 am #45486
Aung Thura Htoo
Participant
An example of big data today is the data generated by social media platforms such as Facebook, Instagram, and Twitter. According to Wiener and Bronson (2014), Facebook generates 4 new petabytes of data daily (4,000,000 GB). It meets the general criteria of big data: it is very complex, hard to process, and growing at a very fast pace.
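The unit conversion behind the 4 petabytes per day figure can be checked with a short sketch. It assumes decimal (SI) units, where 1 PB = 1,000,000 GB, and the function names are only illustrative:

```python
# Assumption: decimal (SI) units, so 1 PB = 1,000,000 GB.
GB_PER_PB = 1_000_000
SECONDS_PER_DAY = 24 * 60 * 60


def petabytes_to_gigabytes(petabytes: float) -> float:
    """Convert petabytes to gigabytes using decimal units."""
    return petabytes * GB_PER_PB


def gigabytes_per_second(gigabytes_per_day: float) -> float:
    """Spread a daily volume evenly over the seconds in one day."""
    return gigabytes_per_day / SECONDS_PER_DAY


daily_gb = petabytes_to_gigabytes(4)
rate = gigabytes_per_second(daily_gb)

print(f"{daily_gb:,.0f} GB per day")      # 4,000,000 GB per day
print(f"about {rate:.1f} GB per second")  # about 46.3 GB per second
```

Averaged over a day, the same figure is roughly 46 GB of new data every second, which is one way to see why volume and velocity are closely linked.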
5Vs Characteristics:
Volume: As mentioned above, Facebook generates nearly 4 million GB of data daily, so the volume is large.
Velocity: 4 million GB per day highlights how fast the data grows.
Variety: The new data generated each day comes in different formats and types, including tweets, hyperlinks, embedded photos, videos, live stream data, and so on.
Veracity: Since most data is uploaded by users, its quality varies. However, some standard Facebook records, such as the number of clicks and views, are highly reliable because they are recorded using standardized methods.
Value: Analyzing this data can be of great value for understanding consumer behavior and market trends, as well as for targeting advertisements at potential buyers.
7Vs Characteristics: (Khan et al., 2018)
Variability: The flow of data is inconsistent because of new and evolving features, such as live streams and reels, and changes in users’ posting behavior.
Viscosity: It refers to the complex nature of the data, which includes processing unstructured data such as the text in users’ posts and comments.
10Vs Characteristics: (Khan et al., 2018)
Volatility: It means the durability of the data (the lifetime of stored data). For example, some posting and clicking data will no longer be relevant in a few years, and retention requirements are another important aspect of big data.
Viability: Not all the collected data will be useful, so determining which variables are useful for future prediction modelling is another important aspect of big data.
Validity: After ensuring the veracity of the data, the validity of data-driven decisions is another important property of big data. For example, filtering and checking bot-generated data to detect misinformation and hate speech helps ensure the validity of Facebook’s community standard restriction warnings.
References
Khan, N., Alsaqer, M., Shah, H., Badsha, G., Abbasi, A., & Salehian, S. (2018). The 10 Vs, issues and challenges of big data. Proceedings of the 2018 International Conference on Big Data and Education, 52-56. https://doi.org/10.1145/3206157.3206166
Wiener, J., & Bronson, N. (2014, October 22). Facebook’s top open data problems. Facebook Research. https://research.facebook.com/blog/2014/10/facebook-s-top-open-data-problems/
-
2024-09-18 at 8:28 am #45559
Cing Sian Dal
Participant
Before explaining Big Data characteristics, I would like to mention the evolution of humans first.
In the Stone Age, humans lived a hunter-gatherer lifestyle. They did not stop there: with the agricultural revolution, humans began developing agriculture and domesticating animals. Next came the Industrial Revolution, in which the invention of machines and electricity led to rapid urbanization and industrialization. Then came the information age, in which the development of computers and the internet transformed communication and global connectivity. Humans have since advanced into globalization and now artificial intelligence. Here, you can see how the characteristics of human life changed from one stage to the next.
Similarly, the characteristics of Big Data started with 3Vs and progressed to 5Vs, 7Vs, and then 10Vs. Initially, the definition of Big Data rested on three characteristics: Volume (the large quantity of data), Velocity (the speed of data processing), and Variety (the various forms of data types). Eventually, these three were no longer sufficient, and two more characteristics were needed: Veracity (the trustworthiness or accuracy of the data) and Value (the usefulness or benefits of the data). Over time, two further characteristics were added: Variability (inconsistencies or changes over time) and Visualization (the ability to present data). Big Data has since grown to 10Vs, with three new characteristics: Validity (the accuracy of data for its intended use), Volatility (the lifespan or relevance of data), and Vulnerability (the security and privacy of data).
As an example, let’s suppose that we own a private hospital. At an early stage, we implemented electronic health records (EHRs), which involve:
Volume: A massive amount of patient data, including past medical history, lab test results, and treatment plans.
Velocity: EHR data is constantly updated as patients receive treatment, tests are performed, and diagnoses are made.
Variety: EHR data contains structured data (vital signs, blood pressure, blood type, etc.) and unstructured data (physician notes, etc.).
Then, our hospital advanced to integrating wearable health devices for specific patients. In this case, the following characteristics can be seen:
Veracity: Data collected through wearable devices must be checked to make sure it is reliable and accurate.
Value: Insights such as early disease detection and personalized health recommendations are given back to patients using the collected data.
As time goes by, the hospital incorporates genomic medicine, which involves:
Variability: Genetic data varies greatly between individuals and populations, which requires a sophisticated analytic system.
Visualization: Without visualization of genetic data, it would be nearly impossible to identify patterns and their relationships.
Over time, the hospital tries to develop an analytic platform to enhance its clinical decision support system, which demands:
Validity: The retrieved data must be verified as correct and accurate for decision-making, analysis, research, personalized medicine recommendations, and so on.
Vulnerability: At the same time, sensitive patient data must be protected from data breaches, unauthorized access, and attacks.
Volatility: The duration of data storage and the management of long-term storage should be decided based on data retention laws or policies.
In conclusion, different characteristics of Big Data appear as it evolves, like a baby growing into an adult and then an elderly person.
-
2024-09-24 at 10:29 am #45640
Aung Thura Htoo
Participant
Hello Cing, comparing human evolution with the evolution of big data characteristics is a great analogy. It is an interesting read indeed.
-
-
2024-09-21 at 10:57 pm #45598
Wannisa Wongkamchan
Participant
Electronic Health Records (EHRs) are a clear example of “Big Data.” These digital records hold extensive information about a patient’s health, including medical diagnoses, treatments, medications, allergies, and lab results.
Characteristics of Big Data in terms of the 5Vs:
1. Volume: Hospitals generate massive amounts of data daily, ranging from patient records and radiology images to genomic data and other types of data.
2. Velocity: Data in healthcare is generated at a rapid rate, particularly from real-time monitoring devices.
3. Variety: EHRs contain a wide range of data formats, including structured data (e.g., patient demographics, billing codes, and lab results) and unstructured data (e.g., physician notes, medical images).
4. Value: The potential value derived from EHR data is enormous, ranging from improving patient outcomes and identifying health trends to lowering healthcare costs and developing new treatments.
5. Veracity: EHR data can be noisy and inconsistent, leading to data quality issues such as errors in documentation, incorrect coding, or discrepancies between different sources.
Extended Characteristics of Big Data (for models with 7Vs or 10Vs):
6. Variability: Healthcare data is known for being highly context-sensitive. Some data can change rapidly and be hard to predict, and its meaning can change over time, e.g., a patient’s vital signs and blood pressure.
7. Visualization: Presenting complex healthcare data in a comprehensible format is crucial, especially for decision-makers who need clear insights.
8. Vulnerability: Healthcare data is sensitive and must be protected. EHR data requires strong encryption and access controls to protect patient privacy.
9. Volatility: The lifespan of data in healthcare can vary. Real-time ICU monitoring data might be discarded after use, but patient medical histories are stored for decades.
10. Validity: The correctness and accuracy of healthcare data are critical. Data integrity must be ensured for the data to be useful in clinical decision-making. Inaccurate patient records can lead to misdiagnosis, affecting the quality of care. The accuracy and quality of EHR data can vary, depending on factors like data entry practices and data validation processes.
In summary, EHR data is an excellent example of Big Data in healthcare that fits the extended characteristics (Vs) of Big Data due to its vastness, speed, and complexity across multiple dimensions.
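The volatility point above (short-lived ICU monitoring data versus medical histories kept for decades) can be sketched as a simple retention check. This is only an illustration: the data class names and retention periods below are hypothetical, and real values would depend on local law and hospital policy.

```python
from datetime import date, timedelta

# Hypothetical retention periods in days; real values depend on
# data retention laws and hospital policy.
RETENTION_DAYS = {
    "icu_monitor_stream": 7,       # short-lived real-time monitoring data
    "medical_history": 365 * 30,   # long-lived patient history
}


def is_expired(data_class: str, recorded_on: date, today: date) -> bool:
    """Return True when a record is older than its class's retention period."""
    return today - recorded_on > timedelta(days=RETENTION_DAYS[data_class])


today = date(2024, 9, 24)
icu_expired = is_expired("icu_monitor_stream", date(2024, 9, 1), today)
history_expired = is_expired("medical_history", date(2010, 5, 3), today)

print(icu_expired)      # True: 23 days old, beyond the 7-day window
print(history_expired)  # False: about 14 years old, within 30 years
```

Keeping the periods in one table makes it easy to see, and to change, how long each kind of data is meant to live.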
-
2024-09-24 at 10:36 am #45642
Aung Thura Htoo
Participant
Hello Wannisa, thank you for sharing your example. I agree with you that EHRs show the characteristics of big data: they keep expanding and are hard to process in a timely manner with conventional tools.
-
2024-09-24 at 9:39 pm #45654
Aye Thinzar Oo
Participant
It is a great example of big data in health information systems. Thank you for sharing and for expanding on the 5Vs, 7Vs, and 10Vs characteristics of the big data model.
-
-
2024-09-22 at 8:42 pm #45622
Aye Thinzar Oo
Participant
Can you give an example of data that you think could be considered “Big Data”? Which characteristics of that data fit the 5Vs, 7Vs, or 10Vs of Big Data?
Big data is a combination of unstructured and structured data collected by organizations or governments. Big data can be used to improve operations, provide better customer service, and support decision-making.
My understanding is that big data includes census population data, electronic health records, social media data, customer databases, and emails. The example I would like to mention is census population data, which involves counting the population for official purposes, such as the number of people living in a country, and obtaining demographic information like age, sex, and race.
Characteristics of Big Data in terms of the 5Vs, 7Vs, and 10Vs:
Volume: The amount of census data increases or decreases daily according to birth registrations and death records.
Velocity: In a census, velocity refers to how quickly data is collected, processed, and published.
Variety: Census data contains demographic, economic, geographic, and household data.
Value: The value of census data is large. It can improve data quality outcomes and identify population and household trends, and its benefits relate directly to what can be done with the collected data.
Veracity: Census data can be inconsistent, and missing data leads to quality issues. Addressing these issues leads to better decision-making and resource allocation based on demographic information.
Variability: Census data can be inconsistent over time. When people move to other locations, their geographical information changes, but their personal data stays the same.
Visualization: Visualizing census data helps the public understand complex data in a more digestible form.
Validity: The data needs to be checked to make sure it is correct and accurate, using internal, confidential data to enhance reliability.
Volatility: Handling this data requires long-term storage plans, data warehouses, or data management systems for data retention.
Viability: The demographic information in census data is used to assess the healthcare sector and the sustainability of populations. Facts such as birth and death rates, age structure, and genetic risks facing the population can be evaluated and analyzed to develop strategies for future management.
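The veracity point above, that missing census data leads to quality issues, can be illustrated with a small completeness check. This is a minimal sketch: the required field names and the sample records are hypothetical and not taken from any real census.

```python
# Hypothetical required fields for one census record.
REQUIRED_FIELDS = ("age", "sex", "township")


def missing_fields(record: dict) -> list:
    """List the required fields that are absent or empty in a record."""
    return [field for field in REQUIRED_FIELDS if record.get(field) in (None, "")]


def completeness_rate(records: list) -> float:
    """Share of records that have every required field filled in."""
    if not records:
        return 0.0
    complete = sum(1 for record in records if not missing_fields(record))
    return complete / len(records)


# Made-up sample records for illustration only.
records = [
    {"age": 34, "sex": "F", "township": "A"},
    {"age": None, "sex": "M", "township": "B"},  # age missing
    {"age": 51, "sex": "", "township": "C"},     # sex left empty
    {"age": 8, "sex": "M", "township": "A"},
]

print(missing_fields(records[1]))  # ['age']
print(completeness_rate(records))  # 0.5
```

A check like this only measures whether values are present, not whether they are true, so it covers one part of veracity rather than all of it.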
To summarize, census population data is an example of big data that can be used by every department at the national level (government, the healthcare sector, schools, …) for their respective purposes.
I am just sharing my experience from working on the 2015 Myanmar election.
-
2024-09-24 at 10:31 am #45641
Aung Thura Htoo
Participant
Hello Aye, thank you for sharing your ideas and experiences from the 2015 election process. Yes, I agree with you that the census is an example of big data: it expands periodically and contains variables that can at times be hard and complex to analyze.
-
-
2024-09-24 at 2:42 pm #45646
Alex Zayar Phyo Aung
Participant
1) Example of big data
To review big data in terms of the V characteristics, let me share the Myanmar Demographic Health Survey (MDHS) 2015-2016 as an example of big data from Myanmar. Over 13,000 households, including more than 17,000 individuals, were interviewed to capture a wide range of sociodemographic characteristics, fertility rates, data for the composite coverage index (for UHC? But I am not sure), nutrition, access to health care, the 3 diseases, and women’s empowerment.
2) MDHS 2015-2016 in 5 V characteristics
Volume
The MDHS was a large-scale data collection with a sample size of more than 17,000. The dataset includes data on numerous variables such as health outcomes, family structures, and socioeconomic indicators. This large-scale collection qualifies as Big Data due to the volume of individual records and responses.
Velocity
The DHS was initially planned to be conducted periodically. Although the data is not real-time, the process has improved with digital data collection tools. Survey responses were collected on tablets, which enabled faster data processing and more rapid analysis compared to traditional paper-based methods. It also reduced human error.
Variety
The MDHS collected structured data (such as demographic statistics and health outcomes) and semi-structured data, including qualitative responses from individual and household interviews. The variety of data includes numerical metrics, categorical health outcomes, and more subjective answers regarding access to healthcare or family planning services.
Veracity
In the MDHS, veracity issues might arise due to self-reporting biases (e.g., underreporting of sensitive topics like domestic violence or HIV status).
Value
The Myanmar DHS data is highly valuable for health actors, including international organizations and UN agencies. It helps inform public health policies, guides resource allocation, and contributes to initiatives like improving maternal and child healthcare, tackling malnutrition, and addressing infectious diseases. Even though it was published 8 years ago, public health actors still use it as baseline information for health program management, since no national-level health survey could be conducted after this nationwide survey.
-
2024-09-24 at 8:35 pm #45649
Wannisa Wongkamchan
Participant
Thank you for sharing about the MDHS. In my opinion, sensitive topics like domestic violence or HIV status should be approached carefully in large-scale surveys. Instead of asking these questions to everyone, using a sampling technique might be a more effective approach. This way, the survey can collect important data from a smaller, more specific group, making people feel more comfortable while still keeping the data accurate.
-
2024-09-24 at 9:55 pm #45655
Aye Thinzar Oo
Participant
Thank you for sharing the Myanmar Demographic Health Survey 2015-2016 as an example of Big Data from Myanmar. I think this data will be used from a UHC perspective.
-
-
2024-09-24 at 9:35 pm #45653
Siriluk Dungdawadueng
Participant
Healthcare Data from Wearable Devices: Wearable health technology (like fitness trackers and smartwatches) collects extensive data about users’ health metrics, such as heart rate, activity levels, sleep patterns, and more.
Big Data Characteristics (5Vs, 7Vs, 10Vs)
1. Volume: Millions of users generate terabytes of data daily from their devices. The sheer scale of data collected from numerous devices across a large population qualifies as Big Data.
2. Velocity: Continuous data streaming from wearables in real-time, such as heart rate monitoring or step counts. Data is generated at high speed, requiring real-time processing to provide immediate insights (like alerting users to abnormal heart rates).
3. Variety: Data includes structured data (like heart rate and steps), unstructured data (like user-generated notes or logs), and semi-structured data. Wearable devices collect a diverse array of data types, making the dataset rich and complex.
4. Veracity: Data may vary in accuracy depending on device calibration, user behavior, and context (e.g., heart rate readings during exercise vs. rest). Ensuring data quality is crucial because inaccurate health data can lead to incorrect health assessments or alerts.
5. Value: Insights derived from the data can help in personalized health recommendations, predicting health risks, and improving overall wellness. The ability to analyze user health data provides significant value to both users and healthcare providers for proactive health management.
6. Variability: Data usage can spike during certain events (like fitness challenges) or vary based on user engagement (e.g., a user may only wear the device sporadically). The variability in data flow based on individual usage patterns makes it essential to adapt analytics accordingly.
7. Visualization: Dashboards that visualize heart rate trends, activity levels, and sleep quality over time. Effective visualization tools help users and healthcare professionals interpret complex data easily and make informed decisions.
8. Volatility: Real-time alerts (like an abnormal heart rate) are only relevant for a short period, while long-term trends in fitness may be analyzed over the years. Some data is transient and needs immediate action, while other data contributes to long-term health assessments.
9. Validity: Ensuring that data collected (like blood pressure readings) is accurate and aligns with clinical standards. Validity is crucial to ensure that health data can be trusted for making health decisions.
10. Vulnerability: Sensitive personal health data collected by wearables must be protected against unauthorized access and breaches. The sensitive nature of health data requires stringent security measures to protect user privacy and comply with regulations like HIPAA.
The data generated by wearable health devices has significant volume, is produced at high velocity, contains diverse variety, faces challenges in veracity, and has high value for health insights. The additional variability, the need for effective visualization, and concerns around volatility, validity, and vulnerability further illustrate its complexity as Big Data.
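The velocity example above, alerting users to abnormal heart rates as readings stream in, can be sketched as a simple filter over a stream. This is only an illustration under stated assumptions: the thresholds and sample readings are hypothetical and are not clinical guidance or any vendor's actual logic.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical alert thresholds in beats per minute, for illustration only.
LOW_BPM = 40
HIGH_BPM = 150

Reading = Tuple[str, int]  # (timestamp, heart rate in bpm)


def abnormal_readings(stream: Iterable[Reading]) -> Iterator[Reading]:
    """Yield readings outside the allowed range as they arrive."""
    for timestamp, bpm in stream:
        if bpm < LOW_BPM or bpm > HIGH_BPM:
            yield timestamp, bpm


# Made-up sample stream; a real device would send readings continuously.
readings = [
    ("08:00:00", 72),
    ("08:00:05", 165),
    ("08:00:10", 38),
    ("08:00:15", 80),
]

alerts = list(abnormal_readings(readings))
print(alerts)  # [('08:00:05', 165), ('08:00:10', 38)]
```

Because the function is a generator, it handles each reading as it arrives instead of waiting for the whole dataset, which is the basic idea behind processing high-velocity data.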
-
2024-09-26 at 3:47 am #45676
Tanaphum Wichaita
Participant
Can you give an example of data that you think could be considered “Big Data”? Which characteristics of that data fit the 5Vs, 7Vs, or 10Vs of Big Data?
3 Vs
Volume – size and amount of big data.
Velocity – speed at which data is generated, received, stored, and managed.
Variety – range of different data types, including unstructured data.
5 Vs
Value – usefulness, importance, and business value of Big Data.
Veracity – trustworthiness and reliability of data and information.
7 Vs
Variability – handling inconsistency and changing data patterns.
Visualization – representing Big Data insights visually for better understanding.
10 Vs
Validity – correctness, accuracy, and relevance of data.
Vulnerability – security and privacy risks associated with Big Data.
Volatility – time relevance and the lifespan of data. (How long does data need to be kept for?)
-
-