- This topic has 25 replies, 13 voices, and was last updated 3 years, 4 months ago by
imktd8.
-
2019-09-26 at 11:32 am #13887
Saranath
Keymaster
Can you give an example of data that you think could be considered “Big Data”?
Which characteristics of that data fit the 5 Vs, 7 Vs, or 10 Vs of Big Data?
-
2019-10-23 at 3:45 pm #15059
Chalermphon
Participant
Can you give an example of data that you think could be considered “Big Data”?
I think one example of Big Data is the 43-file database. In Thailand, the 43-file database exists because hospitals use different HIS (Hospital Information Systems) and different standard data sets; it gives them a standard health record structure that they can share. The benefits of the 43 files include report processing services for the relevant departments at the provincial and ministry levels, reporting according to the data set, and letting the Provincial Health Office produce reports for indicators. Data from the 43 files can be analyzed and presented in various dimensions. Because the data is processed together, it also reduces the burden of creating department reports, for example through report applications built on summary data and data visualization such as https://hdcservice.moph.go.th
What are the characteristics of the data that fit the 5 Vs, 7 Vs, or 10 Vs of Big Data?
5 Vs: improve data coordination, avoid data errors, reduce cost and keep quality high. Increasing variety and high velocity are a hindrance.
7 Vs: data presented in a manner that is readable and accessible, and data whose meaning is constantly changing.
10 Vs: data security and data breaches, data accuracy and readiness for analysis, and data considered obsolete or irrelevant.
-
2019-10-24 at 5:29 pm #15099
Pacharapol Withayasakpunt
Participant
Health information held in the various Electronic Health Records across different hospitals can be considered Big Data.
1. Volume: Because the number of patients is high, the volume of data is undoubtedly high. It is also high in other respects, as described below.
2. Velocity: A healthcare facility works like a factory. A lot of data is added continually, every day, with no holidays.
3. Variety: At the very least, there will always be digitizable formats, such as typed text, and undigitizable formats, such as scans and videos. If this variety of data is not accepted, some valuable information will be lost.
4. Variability: Not all data is equally important. Data can be categorized by importance.
5. Value: All data recorded in the system should have value. This may also include interoperability.
6. Validity: Some information is actually lost through digitization.
7. Veracity: Data entry should be accurate, but in reality it isn’t always.
8. Venue: Different venues may or may not use the same application, and even with the same application, the guidelines on how to use it are not well developed.
9. Vocabulary: Data models and data structures are usually the same when controlled by the same application. In reality, though, sites don’t always use the same application, and there are also several supplementary applications.
10. Vagueness: After all, the value of information depends on context. Research is needed on which context to use, and on the possibility of using the data in a new context.
I might also add:
– Volatility: Health information is usually not volatile. A complete natural history helps.
– Virality and viscosity: Because the data is big, emphasis is not placed equally on every part of it, which results in bias.
– Visualization: To prevent bias and to make use of the data.
– Redundancy: There will be repetition of data from different sources in different contexts.
-
2019-10-26 at 2:04 am #15186
Saranath
Keymaster
Great, Pacharapol!
-
2019-10-30 at 10:07 pm #15332
tullaya.sita
Participant
I agree that EMR is Big Data in terms of volume. However, EMR by itself does not really match the Big Data definition (by the xV criteria). We need to add features such as machine learning and data visualization to process the data in the EMR and get a new outcome or process activity.
-
-
2019-10-25 at 11:08 am #15165
supawat.cht
Participant
Can you give an example of data that you think could be considered “Big Data”?
In my opinion, a large amount of complex data could be considered Big Data. For volume, traffic information in the Google Maps app is an example: it can show where the congested areas are. For velocity, I think of registration for the ชิม ช้อป ใช้ (Chim Shop Chai, “Taste, Shop, Spend”) program; I still have not been able to register. And for variety, I definitely think of health informatics, since no two patients are ever the same.
What are the characteristics of the data that fit the 5 Vs, 7 Vs, or 10 Vs of Big Data?
The 5 Vs are these three plus Variability and Validity.
The 7 Vs add Veracity and Vulnerability.
The 10 Vs add Volatility, Visualization and Value.
-
2019-10-26 at 12:08 am #15185
Ameen
Participant
I think Twitter is one of the best-known examples of Big Data, and it has been used in many health informatics projects. It shows all the characteristics of big data, as follows:
1 Volume: Twitter generates more than 500 million “tweets” a day. A tweet can contain up to 280 characters, or roughly 40 words. That means Twitter creates up to 20,000 million words a day!
2 Velocity: You can observe how massively and continuously Twitter generates data in real time here: https://www.internetlivestats.com/twitter-statistics/
3 Variety: Tweets are mostly text and pictures, and they are unstructured.
4 Veracity: Tweets are posted by people of all ages, ethnicities and backgrounds worldwide to express thoughts, views, truths and situations at a specific time and place. In Thailand, you can check the daily talk of the town through the top tweets of the day. A tweet can be a rumor or the truth.
5 Value: Text and pictures in tweets are posted in real time and can help predict the prevalence of outbreaks such as flu, which is valuable for health informatics.
6 Variability: You can see trends and changes of meaning in tweets, because data is generated differently as time passes, affected by different situations on a minute-by-minute, daily or monthly basis.
7 Visualization: A source like Twitter may not sound trustworthy, so all information analyzed from Twitter must be presented very well to support the purpose of the intended project.
8 Validity: For the analyzed data to give accurate results, consistency in selecting qualified tweets and in defining each piece of evidence found in the text or pictures of tweets is vital; otherwise the trends can be biased.
9 Vulnerability: With an amount of data as large as big data, it is essential to protect it by adopting data safety practices. A data breach in big data can affect a vast population.
10 Volatility: Because data comes into the server in such a rapid stream, as shown under Velocity, it is crucial to decide how long to store collected tweet data before deleting it to free storage space for newly arriving tweets.
-
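The Volume estimate in the post above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the two input figures are the ones quoted in the post, the function names are made up for illustration, and the per-second rate is a plain daily average that ignores peaks.

```python
# Figures quoted in the post above; both are rough estimates.
TWEETS_PER_DAY = 500_000_000
WORDS_PER_TWEET = 40  # rough upper bound for a 280-character tweet


def words_per_day(tweets_per_day: int, words_per_tweet: int) -> int:
    """Return the estimated number of words posted per day."""
    return tweets_per_day * words_per_tweet


def tweets_per_second(tweets_per_day: int) -> float:
    """Return the average posting rate in tweets per second."""
    seconds_per_day = 24 * 60 * 60
    return tweets_per_day / seconds_per_day


total = words_per_day(TWEETS_PER_DAY, WORDS_PER_TWEET)
print(f"{total:,} words per day")  # 20,000,000,000, i.e. 20,000 million
print(f"{tweets_per_second(TWEETS_PER_DAY):,.0f} tweets per second on average")
```

The second figure ties Volume to Velocity: the same daily total, spread evenly over a day, is several thousand tweets every second.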
2019-10-26 at 2:09 am #15187
Saranath
Keymaster
Ameen, Twitter is a good example of Big Data. Analyzing the data is a real challenge given the volume, variability, variety, and validity of the tweets.
-
2019-10-27 at 8:50 am #15204
Pyae Phyo Aung
Participant
I rarely use Twitter. It really is Big Data, like any other social media. The Twitter usage statistics are awesome.
-
2019-10-27 at 4:05 pm #15209
Ameen
Participant
Hi Pyae Phyo Aung! Twitter is most popular among teenagers and working-age people in Thailand. There is a saying, kind of a political one, that “the new generation is on Twitter, the older one is on LINE”. There is also a subculture that does “storytelling” on Twitter. From a big data point of view, the sampling methodology for finding qualified tweets to analyze is challenging, because Twitter users do not represent the population as a whole.
-
2019-10-26 at 11:16 pm #15201
THONGCHAI
Participant
Can you give an example of data that you think could be considered “Big Data”?
I think LINE, Facebook and Twitter are Big Data, but they are only social media, not health information. In Thailand, Big Data in public health information includes:
1. The 43-file data set of the Ministry of Public Health.
2. NTIP (National Tuberculosis Information Program), a program covering TB prevention, lab results, and drugs used by TB patients.
3. NAP (National AIDS Program), a program covering HIV prevention, OI, lab results, blood tests, condom use and drugs used by HIV patients.
4. 506, the epidemiological report.
What are the characteristics of the data that fit the 5 Vs, 7 Vs, or 10 Vs of Big Data?
The characteristics of data that fit the 5 Vs, 7 Vs, or 10 Vs of Big Data are:
1. Volume: a large amount of data, larger than ordinary data.
2. Velocity: data that changes constantly and arrives fast.
3. Value: valuable information that can be put to real use.
4. Variety: the many forms data takes, whether text or media such as pictures and videos.
5. Veracity: the accuracy of the information.
6. Variability: data that is not stable.
7. Visualization: using charts and graphs to present complex data conveys its meaning much more effectively.
8. Volatility: how old your data needs to be before it is considered irrelevant or historic.
9. Visualization: another characteristic of big data is how challenging it is to turn a database into a visualization.
10. Value: last, and most important of all, is value. The other characteristics of big data are meaningless if you don’t derive business value from the data.
-
2019-10-26 at 11:35 pm #15202
w.thanachol
Participant
In my opinion, Facebook is the most obvious example of Big Data because it collects information about users’ demographics, interests and the like. Moreover, a number of web-based applications let us sign in with a Facebook account instead of signing up for a new account, so Facebook can access an even wider range of user information. A machine learning algorithm at Facebook can then analyze a user’s preferences and push the most appropriate feed and advertisements to each user. In Thailand, however, there is electronic health information called the 43-file database, which accumulates electronic health records from public hospitals throughout the country; it can be an example of big data in Thailand’s health sector.
The 5 Vs of big data are volume, velocity, variety, veracity, and variability.
The 7 Vs of big data are volume, velocity, variety, veracity, variability, value and visualization.
The 10 Vs of big data are volume, velocity, variety, veracity, variability, value, validity, venue, vocabulary and vagueness.
-
2019-10-27 at 8:45 am #15203
Pyae Phyo Aung
Participant
Health information (from OpenMRS) in the different States and Regions of our country can be considered “Big Data”.
1. Volume – Patient records increase every day, from both new patient records and follow-up records.
2. Velocity – Individual patients’ data is recorded continuously, either in real time or as retrospective records.
3. Variety – Information comes in different formats from different sources: CSV files from patient records, image files from the radiology department, and data from the dispensing unit and laboratory.
4. Veracity – Our EMRs hold mostly structured data with lots of validation, predefined words and suggestions, so there are fewer data entry errors.
5. Value – All patient information is valuable: it helps clinicians make better decisions during patient management and reduces the program workload of generating the necessary reports.
6. Variability – All the concepts, dictionaries and treatment options can change from time to time.
7. Validity – The EMR sets validation rules for users, with standard concepts and a dictionary, but clinical notes may be less valid and need data cleansing because they are free text.
8. Vulnerability – All the information is sensitive and needs good protection. The good news is that all the data is kept on offline regional servers.
9. Volatility – As our country stores only structured data and a few variables, there is still no need to worry about storage space. No history in health data is volatile.
10. Vagueness – As our database stores mostly structured data, there is not much vagueness. All the information is usable for analysis and needs little data cleansing.
-
2019-10-27 at 9:30 pm #15216
Saranath
Keymaster
Great examples from all of you! Data from social media can certainly be called big data, and we have seen companies use their data very well for business. For example, they use machine learning techniques to suggest pages related to your interests. In healthcare, EMRs from HIS can also be considered big data. One of the challenges of using EMRs from HIS is that the data consists of both structured and unstructured data (scanned doctors’ notes, radiology images, etc.), which is difficult to analyze with conventional statistical methods.
-
2019-10-27 at 10:16 pm #15224
Ameen
Participant
Hi, Aj. Saranath. I have a silly question: how big does data have to be to be called “Big”? If the EMRs in a hospital can be big data, can the data stored in a 30-bed hospital be big data too? For me, the 5 Vs, 7 Vs and 10 Vs criteria do not give a clear cut-off.
-
2019-10-30 at 8:59 am #15310
Saranath
Keymaster
Good question! And I don’t have an answer for that. It’s very difficult to determine how big data must be before it is called “Big data”. That’s why people keep adding more V characteristics to the term. In one of my projects, we applied a machine learning technique (supposedly a technique for big data analysis) to only 400 patient records (with many variables), and it turned out well. However, the more data we have, the more precise and accurate the analysis will be.
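As a rough illustration of the last point, the sketch below shows how the standard error of a simple sample mean shrinks as the number of records grows. It is not the method used in the project mentioned above: the standard deviation of 10 and the larger record counts are hypothetical, and it speaks only to precision, since more records do not by themselves remove bias.

```python
import math


def standard_error(sample_sd: float, n: int) -> float:
    """Standard error of a sample mean: sd / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sample_sd / math.sqrt(n)


# Hypothetical measurement with a standard deviation of 10 (illustrative only).
# 400 matches the record count in the post; the other sizes are made up.
for n in (400, 4_000, 40_000):
    print(n, round(standard_error(10.0, n), 3))
```

Because the error falls with the square root of n, it takes 100 times as many records to get an estimate that is 10 times tighter.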
-
2019-10-30 at 10:15 pm #15333
tullaya.sita
Participant
Aj. Saranath, I still wonder about the definition of big data. Does it define the volume of the data, or does it define the application or system that can capture all the data and process it? You mentioned EMR as big data, but EMR itself doesn’t have all the other V characteristics unless we do something with that data.
-
2019-11-02 at 12:14 am #15386
Ameen
Participant
Hi, Dr. Tullaya. I am trying to understand big data too. I read an article explaining that big data is like going to a big pool or an ocean where we don’t know what is under the water. What we know is that there is something deep down and we are looking for it. Or maybe we want to find something, so we go to the big pool to check whether such things are under the water. I’m not sure if this is a good explanation.
-
2019-11-03 at 10:05 pm #15414
tullaya.sita
Participant
Thank you, Ameen! I understand big data better now.
-
-
-
2019-11-02 at 12:05 am #15385
Ameen
Participant
Thank you, Aj. Saranath. So you mean that big data can be defined by a number of Vs that are all independent of each other, and that big data may lack “volume” but have some other Vs instead? After all, is the thing that labels something as big data the use of a big data analysis technique?
-
-
-
-
2019-10-27 at 11:17 pm #15226
weerawan.hat
Participant
1. Can you give an example of data that you think could be considered “Big Data”?
The telecommunication system in Thailand, including the clients of DTAC, AIS, True, TOT and CAT. The data of interest includes customer churn and service expansion.
2. What are the characteristics of the data that fit the 5 Vs, 7 Vs, or 10 Vs of Big Data?
Volume: too large to analyse with traditional methods
Velocity: data collected in real time and at a rapid pace
Variety: many data types, coming from many sources
Veracity: how error-free and credible the data source is, its context, and how meaningful it is to the analysis based on it
Variability: dynamic, evolving, spatiotemporal data, time series, seasonal, and any other type of non-static behavior in data sources
Validity: data accuracy and readiness for analysis
Viability: difficult to build robust models
Volatility: How old does your data need to be before it is considered irrelevant, historic, or not useful any longer? How long does data need to be kept for?
Visualization: how challenging it is to visualize
Value: the business value and potential of big data to transform our organization
-
2019-10-28 at 1:50 am #15228
tullaya.sita
Participant
Can you give an example of data that you think could be considered “Big Data”?
I think one example of data that could be considered big data is the data from Facebook.
What are the characteristics of the data that fit the 5 Vs, 7 Vs, or 10 Vs of Big Data?
Facebook fits as big data in these ways:
1. Volume: a large volume of data is generated on Facebook each day by a large number of users around the world.
2. Variety: many sources and types of data, both structured and unstructured, such as text, pictures, audio and video.
3. Velocity: the speed at which data is created, stored and visualized on Facebook is very fast.
4. Veracity: the data in Facebook’s database carries uncertainty because it is generated by many users with different backgrounds.
5. Value: data recorded on Facebook is collected and fed into process activities or predictive analysis.
-
2019-11-03 at 11:24 am #15407
Penpitcha Thawong
Participant
In my experience, sequencing data could be considered Big Data.
Both whole-genome sequencing (WGS) and even whole-exome sequencing (WES) produce a vast volume of data. WGS data for one person requires around 100 GB of storage (around 3000 million base pairs per person). As for analysis, such large data needs to be analyzed quickly so it can be used at the right time (Velocity). For example, in pharmacogenetics, some drugs are now known to cause adverse responses in some people because of their genetic makeup. If we know that as early as possible, it helps the doctor decide which medicine to order for the patient.
The variety of this type of data may concern the function of each sequence in different locations: intron, exon, or regulatory element, for example. A user therefore has to understand the structure and function before analyzing. There are many sequencing technologies, so data incompatibilities can happen all the time (variability). Veracity is therefore also an important concern: bad quality may lead to unreliable results.
-
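To get a feel for the Volume figure in the post above, here is a small sketch that scales the quoted 100 GB per person up to a cohort. The cohort sizes and the function name are hypothetical, and the per-genome figure is taken from the post as given; real sizes depend on sequencing depth and file formats.

```python
GB_PER_GENOME = 100  # figure quoted in the post; actual size varies by pipeline


def cohort_storage_tb(n_people: int, gb_per_genome: float = GB_PER_GENOME) -> float:
    """Estimated WGS storage for a cohort, in terabytes (1 TB = 1000 GB)."""
    if n_people < 0:
        raise ValueError("n_people must not be negative")
    return n_people * gb_per_genome / 1000


# Hypothetical cohort sizes, for illustration only.
for n in (1, 1_000, 100_000):
    print(f"{n:>7,} people -> {cohort_storage_tb(n):,.1f} TB")
```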
2019-11-20 at 4:57 pm #15729
Dr.Watcharee Arunsodsai
Participant
1. Can you give an example of data that you think could be considered “Big Data”?
Examples of Big Data that I can think of in real life are the Big Data of the banking sector and social media. In Public Health Informatics, though, Big Data could be Electronic Health Records.
2. What are the characteristics of the data that fit the 5 Vs, 7 Vs, or 10 Vs of Big Data?
2.1 Volume: The data volume of the banking sector is too large to be analyzed with traditional methods.
2.2 Velocity: The data is collected in a timely manner and at a rapid pace, like the streams in a banking application.
2.3 Variety: Many data types from many sources, both structured and unstructured, such as text, and sensor data such as bar codes or QR codes.
2.4 Veracity: How trustworthy, authentic and accountable the data source is, its context, and how meaningful it is to the analysis based on it.
2.5 Value: Characterized by the importance the data can bring to the intended process, activity or predictive analysis, for example for emerging diseases in EHRs.
2.6 Variability: The homogenization of data, even of the same type: the dynamic, evolving, spatiotemporal data, time series, seasonal, and any other type of non-static behavior in data sources, as we can see in EHRs. Variability is different from variety. A coffee shop may offer 6 different blends of coffee, but if you get the same blend every day and it tastes different every day, that is variability. The same is true of data: if the meaning is constantly changing, it can have a huge impact on your data homogenization.
2.7 Visualization: How critical it is to visualize the large amount of complex data.
2.8 Validity: How accurate and correct the data is for analysis.
2.9 Volatility: How old does your data need to be before it is considered irrelevant, historic, or not useful any longer? How long does data need to be kept for?
2.10 Vulnerability: How secure Big Data needs to be, as in the banking sector, social media and EHRs.
Some other Vs
Viability: Data activeness and its robustness
Viscosity or Vocabulary: Data complexity, schema, data models, semantics, ontologies, taxonomies, and other content- and context-based metadata that describe the data’s structure, syntax, content, and provenance.
Venue: distributed, heterogeneous data from multiple platforms, from different owners’ systems, with different access and formatting requirements, private vs. public cloud.
Vagueness: confusion over the meaning of big data. Is it something that we’ve always had? What’s new about it? What are the tools? Which tools should I use?
-
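The coffee-shop distinction between variety and variability in 2.6 above can be made concrete with a small sketch. The taste scores are invented for illustration, only two blends are shown instead of six to keep it short, and using the standard deviation as the measure of variability is just one reasonable choice.

```python
from statistics import pstdev

# Hypothetical taste scores (0-10) recorded on five days for each blend.
scores = {
    "blend_a": [7.0, 7.0, 7.0, 7.0, 7.0],  # tastes the same every day
    "blend_b": [4.0, 9.0, 6.0, 8.0, 3.0],  # same blend, different taste each day
}


def variety(data: dict) -> int:
    """Variety: how many distinct kinds of item there are."""
    return len(data)


def variability(values: list) -> float:
    """Variability: how much one kind of item changes over time
    (measured here as the population standard deviation)."""
    return pstdev(values)


print("variety:", variety(scores))
for blend, values in scores.items():
    print(blend, "variability:", round(variability(values), 2))
```

Adding more blends raises variety without touching variability; a blend whose scores drift from day to day raises variability even if the menu never changes.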
2019-11-23 at 8:03 pm #15799
imktd8
Participant
1. Can you give an example of data that you think could be considered “Big Data”?
Firstly, I think I am the last one to reply to Aj. Saranath’s question, and all of my friends have already given good answers, so I have nothing left to answer, haha ^^ (just kidding kha).
Several researchers have defined “Big Data”. For example, Big Data is a great amount of data that traditional data management techniques cannot manage and process because of its complexity and size. Below are some examples of big data:
– Business transactions (oil industry, retail industry, airline ticket booking, etc.)
– Social media (Facebook generates 500+ terabytes every day; Twitter, Instagram, LINE, etc.). This data is mainly generated through photo and video uploads, message exchanges, comments, etc.
– Banking and financial services (the New York Stock Exchange generates about one terabyte of new trade data per day)
– Health services (Electronic Medical Records (EMR))
2. What are the characteristics of the data that fit the 5 Vs, 7 Vs, or 10 Vs of Big Data?
--- 5 Vs
1. Volume: the vast amounts of data generated every second
2. Velocity: the speed at which data becomes accessible
3. Variety: the different types of data we can now use, for example text, pictures, video, etc.
4. Veracity: the messiness or trustworthiness of the data
5. Value: the valuable information
--- 7 Vs
6. Variability: the instability of data (different from variety). Ex. A coffee shop may offer 6 different blends of coffee, but if you get the same blend every day and it tastes different every day, that is variability.
7. Visualization: using charts and graphs to visualize large amounts of complex data
--- 10 Vs
8. Validity: how accurate and correct the data is for its intended use
9. Vulnerability: all information is sensitive, so it needs security to protect it
10. Volatility: the lifetime of data, that is, how long the data is valid and how long it should be stored
-
The forum ‘TMHG 523 Principles and Foundations of Public Health Informatics 2019’ is closed to new topics and replies.