Week 2 Assignment: Data Source
Hi,
I think I have hit a roadblock from the get-go. I downloaded the data from the link and tried to compare it with the attribute description. Column 2 is meant to be Diagnosis (B or M), but the data from the link contains only numbers. I checked Kaggle for a similar description, and their data has B and M as the actual values. I am wondering whether I am using the data from the UCI link wrong, or whether I should go with the Kaggle version of the data:
https://www.kaggle.com/uciml/breast-cancer-wisconsin-data
I hope someone can shed some light 🙂 Thanks very much!
I believe the dataset has been processed from its original form. The data with only numbers and no B or M diagnosis is in fact described here: http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.names
An excerpt from the above file is below:
=========================
7. Attribute Information: (class attribute has been moved to last column)

#    Attribute                      Domain
--   -----------------------------------------
1.   Sample code number             id number
2.   Clump Thickness                1 - 10
3.   Uniformity of Cell Size        1 - 10
4.   Uniformity of Cell Shape       1 - 10
5.   Marginal Adhesion              1 - 10
6.   Single Epithelial Cell Size    1 - 10
7.   Bare Nuclei                    1 - 10
8.   Bland Chromatin                1 - 10
9.   Normal Nucleoli                1 - 10
10.  Mitoses                        1 - 10
11.  Class:                         (2 for benign, 4 for malignant)
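In case it helps, here is a rough sketch of how the numeric file could be read with pandas and the class codes turned into B/M labels. It assumes the data file is comma-separated with no header row and follows the column order listed above. The snake_case column names, the `load_wbc` helper, and the sample rows are my own inventions for illustration (the rows are made up, not taken from the real file), and treating `?` as a missing-value marker is also an assumption you should check against the .names file.

```python
import io

import pandas as pd

# Column names follow the attribute list above; the snake_case spelling is my own choice.
COLUMNS = [
    "sample_code_number",
    "clump_thickness",
    "uniformity_cell_size",
    "uniformity_cell_shape",
    "marginal_adhesion",
    "single_epithelial_cell_size",
    "bare_nuclei",
    "bland_chromatin",
    "normal_nucleoli",
    "mitoses",
    "class",
]

# Made-up rows in the same layout, so the sketch runs without a download.
SAMPLE = io.StringIO(
    "1000001,5,1,1,1,2,1,3,1,1,2\n"
    "1000002,8,10,10,8,7,10,9,7,1,4\n"
    "1000003,1,1,1,1,2,?,3,1,1,2\n"
)


def load_wbc(source):
    """Read the numeric file and add a B/M 'diagnosis' column (hypothetical helper)."""
    # header=None because the file is assumed to have no header row;
    # na_values="?" assumes missing entries are written as a question mark.
    df = pd.read_csv(source, header=None, names=COLUMNS, na_values="?")
    # Class codes per the excerpt: 2 = benign, 4 = malignant.
    df["diagnosis"] = df["class"].map({2: "B", 4: "M"})
    return df


df = load_wbc(SAMPLE)
print(df[["sample_code_number", "class", "diagnosis"]])
```

To use it on the real data, pass the path or URL of the downloaded data file to `load_wbc` instead of `SAMPLE`.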
Thanks so much 🙂