
1. Can you give an example of data that you think could be considered “Big Data”?
Real-life examples of Big Data include banking-sector data and social media data. In Public Health Informatics, Electronic Health Records (EHRs) are an example of Big Data.
2. What are the characteristics of the data that fit into the 5 Vs, 7 Vs, or 10 Vs of Big Data?
2.1 Volume: The volume of banking-sector data is too large to be analyzed with traditional methods.
2.2 Velocity: The data is collected in a timely manner and at a rapid pace, like the data streams in a banking application.
2.3 Variety: The data comes in many types and from many sources, including structured and unstructured data such as text, and sensor data such as bar codes or QR codes.
2.4 Veracity: How trustworthy the data source is in terms of reliability, authenticity, and accountability, along with its context and how meaningful the data is to the analysis based on it.
2.5 Value: Characterized by the importance of what the data can bring to the intended process or activity, for example predictive analysis of emerging diseases in EHRs.
2.6 Variability: The inconsistency of the data, even within the same type: dynamic, evolving, spatiotemporal data, time series, seasonal patterns, and any other non-static behavior in data sources, as we can see in EHRs. Variability is different from variety. A coffee shop may offer 6 different blends of coffee, but if you get the same blend every day and it tastes different every day, that is variability. The same is true of data: if the meaning is constantly changing, it can have a huge impact on your data homogenization.
2.7 Visualization: How critical it is to visualize large amounts of complex data.
2.8 Validity: How accurate and correct the data is for the intended analysis.
2.9 Volatility: How old does your data need to be before it is considered irrelevant, historic, or not useful any longer? How long does data need to be kept for?
2.10 Vulnerability: How secure the Big Data needs to be. Security is a concern in all banking sectors, social media, and EHRs.
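The volatility question in 2.9 (how old data can be before it is no longer useful) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the record fields `id` and `date`, and the 365-day retention window are hypothetical and do not come from any banking or health regulation.

```python
from datetime import date, timedelta


def is_still_relevant(record_date: date, today: date, retention_days: int) -> bool:
    """Return True if the record falls inside the retention window (inclusive)."""
    return (today - record_date) <= timedelta(days=retention_days)


def split_by_retention(records, today: date, retention_days: int):
    """Split records into (keep, expired) lists based on their age.

    Each record is assumed to be a dict with a 'date' key (hypothetical schema).
    """
    keep, expired = [], []
    for record in records:
        if is_still_relevant(record["date"], today, retention_days):
            keep.append(record)
        else:
            expired.append(record)
    return keep, expired


# Hypothetical sample records, e.g. transactions or EHR entries.
sample_records = [
    {"id": 1, "date": date(2023, 6, 1)},
    {"id": 2, "date": date(2020, 1, 1)},
    {"id": 3, "date": date(2023, 1, 1)},
]

keep, expired = split_by_retention(sample_records, date(2024, 1, 1), 365)
print("keep:", [r["id"] for r in keep])
print("expired:", [r["id"] for r in expired])
```

In practice the retention window would be set by the organization's policy or by legal requirements, and it may differ for each type of data.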
Some other Vs
Viability: How active and robust the data is.
Viscosity or Vocabulary: Data complexity, schema, data models, semantics, ontologies, taxonomies, and other content- and context-based metadata that describe the data’s structure, syntax, content, and provenance.
Venue: Distributed, heterogeneous data from multiple platforms and from different owners’ systems, with different access and formatting requirements, in private vs. public clouds.
Vagueness: Confusion over the meaning of Big Data. Is it something that we’ve always had? What’s new about it? What are the tools? Which tools should I use?