- This topic has 1 reply, 2 voices, and was last updated 1 year, 5 months ago by .
Data mining and machine learning › Archive 2023 › Week 2 Decision tree: sample size & continuous attribute
Hi all. I have a few questions from the decision tree analysis class:
1) What is an appropriate training-set size, and a proper number of attributes, for a good decision tree? Are there rules, such as a minimum amount of data or a maximum number of attributes?
2) Can decision trees be used with continuous attributes, and how are they handled? Do we need to convert them manually before training, and how do we get a cut-off point for each class?
Thank you so much
Hi Tanyawat,
I’m sorry it took a while for me to circle back to answering this question in written form.
1) The answer to this question is highly data-dependent, but the rule of thumb for training data is: the more, the better. Of all the data you have, set aside 70-80% for training and keep the remainder for testing the resulting tree.
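As a rough illustration of the 70-80% split, here is a minimal Python sketch using only the standard library. The helper name `split_train_test`, the 0.75 fraction, the seed and the toy `data` list are all hypothetical choices for the example; in R you would typically do the same with `sample()` before calling `rpart`.

```python
import random

def split_train_test(rows, train_fraction=0.75, seed=42):
    """Hypothetical helper: shuffle a copy of the rows, then cut it into
    a training part (train_fraction of the rows) and a test part (the rest)."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(rows)                  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

# Toy example: 100 labelled records, 75% for training and 25% for testing.
data = [(i, "yes" if i % 3 else "no") for i in range(100)]
train, test = split_train_test(data, train_fraction=0.75)
print(len(train), len(test))  # 75 25
```

Shuffling before cutting matters: if the records are sorted (for example by date or by class), taking the first 75% without shuffling would give a training set that is not representative of the test set.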
If there are clear patterns hidden in your data, then not much data is needed for training (as clearly seen in our sample dataset from the lectures). If even a large amount of data does not produce a satisfactory decision tree, perhaps the attributes needed for prediction are not present in your data, or perhaps they are there but confounding effects are at play, in which case preprocessing the data might lead to better results.
The same idea applies to the appropriate number of attributes. If you have the “right” attributes, you will not need many of them to predict the outcome. But, of course, real-life data are rarely perfect. Also keep in mind that the more attributes you feed into the algorithm, the longer training takes and the more complex the resulting decision tree is likely to be.
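2) Yes. The decision tree implementation we use in class for R (rpart) handles continuous attributes directly: at each node it searches over candidate cut-off points of each numeric attribute and chooses the split that best separates the classes, so you do not need to discretize them manually before training. For other tools, programming languages, or software, check the documentation of that specific tool to see whether this ability is built in.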
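To illustrate how a cut-off point for a continuous attribute can be found, here is a minimal Python sketch. It is a simplified illustration of the idea, not rpart's actual code: it sorts one numeric attribute, tries the midpoint between each pair of neighbouring distinct values as a threshold, and keeps the threshold with the lowest weighted Gini impurity (Gini is rpart's default splitting criterion for classification). The function names and the `ages`/`buys` data are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_cut_point(values, labels):
    """Return (threshold, score) for the split `value < threshold` that
    minimises the size-weighted Gini impurity of the two child nodes."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_c, best_score = None, float("inf")
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # identical values cannot be separated by a threshold
        c = (lo + hi) / 2  # candidate cut-off: midpoint between neighbours
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_c, best_score = c, score
    return best_c, best_score

# Hypothetical data: customer age (continuous) vs. whether they bought.
ages = [22, 25, 31, 35, 42, 47, 52, 60]
buys = ["no", "no", "no", "yes", "yes", "yes", "yes", "yes"]
print(best_cut_point(ages, buys))  # (33.0, 0.0)
```

A tree learner repeats this search for every attribute at every node, picks the best attribute/threshold pair overall, and then recurses on the two child nodes; this is also why training time grows with the number of attributes and the number of distinct values they take.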
Regards,
Pimwadee