- This topic has 2 replies, 2 voices, and was last updated 5 years, 2 months ago.
Sunil Mammen
Hi,
I think I have hit a roadblock from the get-go. I downloaded the data from the link and tried to compare it with the attribute description. Column 2 is meant to be Diagnosis (B or M), but the data at the link contains only numbers. I checked Kaggle for a similar description, and their data has B and M as the actual values. Am I using the data from the UCI link incorrectly, or should I go with the Kaggle version of the data instead:
https://www.kaggle.com/uciml/breast-cancer-wisconsin-data
Hope someone can shed some light 🙂 Thanks very much!
Pimwadee Chaovalit
I believe the dataset has been processed from its original form. The data with only numbers and no B or M diagnosis is in fact described here: http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.names
An excerpt from the above file is below:
=========================
7. Attribute Information: (class attribute has been moved to last column)
   #  Attribute                     Domain
   -- -----------------------------------------
   1. Sample code number            id number
   2. Clump Thickness               1 - 10
   3. Uniformity of Cell Size       1 - 10
   4. Uniformity of Cell Shape      1 - 10
   5. Marginal Adhesion             1 - 10
   6. Single Epithelial Cell Size   1 - 10
   7. Bare Nuclei                   1 - 10
   8. Bland Chromatin               1 - 10
   9. Normal Nucleoli               1 - 10
  10. Mitoses                       1 - 10
  11. Class:                        (2 for benign, 4 for malignant)
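
If you would rather see B and M than 2 and 4, the class column can be mapped by hand. Below is a minimal sketch of one way to do that, assuming the data file is comma-separated with the 11 columns in the order listed above. The snake_case column names, the `parse_rows` helper, and the sample rows are my own illustrations, not part of the dataset; treating a non-numeric entry such as "?" as a missing value is also an assumption.

```python
import csv
import io

# Column names follow the attribute list above (snake_case spelling is my own).
COLUMNS = [
    "sample_code_number", "clump_thickness", "uniformity_of_cell_size",
    "uniformity_of_cell_shape", "marginal_adhesion",
    "single_epithelial_cell_size", "bare_nuclei", "bland_chromatin",
    "normal_nucleoli", "mitoses", "class",
]

# Class codes per the excerpt: 2 = benign, 4 = malignant.
CLASS_TO_LABEL = {2: "B", 4: "M"}


def parse_rows(text):
    """Parse comma-separated rows into dicts and add a B/M 'diagnosis' field."""
    records = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        # Assumption: a non-numeric entry such as "?" marks a missing value.
        values = [int(v) if v.strip().isdigit() else None for v in row]
        record = dict(zip(COLUMNS, values))
        # Unknown or missing class codes map to None rather than raising.
        record["diagnosis"] = CLASS_TO_LABEL.get(record.get("class"))
        records.append(record)
    return records


# Illustrative rows in the same 11-column layout (made up, not copied from the file).
SAMPLE = """\
1000001,5,1,1,1,2,1,3,1,1,2
1000002,8,10,10,8,7,?,9,7,1,4
"""

records = parse_rows(SAMPLE)
for r in records:
    print(r["sample_code_number"], r["class"], r["diagnosis"])
```

To try it on the real data, read the text of the downloaded data file and pass it to `parse_rows` in place of `SAMPLE`.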
Sunil Mammen
Thanks so much 🙂
