
Forum Replies Created

Viewing 2 reply threads
  • Author
    Posts
    • #22052
      Pimphen Charoen
      Keymaster

      Well done, you worked it out! One of the problems you will come across quite often in R is the format of an input. You need to find out which format a function requires and, if your input is not in that format, convert it accordingly. This might involve commands such as as.vector, as.factor, as.numeric, as.character, as.matrix, etc. Perhaps the explanation below might also help. It does take time and practice to get familiar with R, and we hope learning by doing will help you get through this 🙂
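As a small illustration of why checking the format matters, here is a minimal sketch using a made-up vector `x` (an assumption for this example, not data from the course). It shows how to inspect an object's class and a common pitfall when converting a factor back to numbers.

```r
# Hypothetical input: numbers that were read in as text and stored as a factor
x <- as.factor(c("10", "20", "10", "30"))

class(x)    # "factor"
levels(x)   # "10" "20" "30"

# Pitfall: as.numeric() on a factor returns the internal level codes,
# not the original numbers
codes <- as.numeric(x)                  # 1 2 1 3

# Converting to character first recovers the original values
values <- as.numeric(as.character(x))   # 10 20 10 30

print(codes)
print(values)
```

Printing the object, or calling class() or str() on it, before passing it to a function is usually the quickest way to spot this kind of mismatch.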

      To use confusionMatrix(data, reference), both data and reference must be factors. You can check the help in R by typing ?confusionMatrix, or just google the command, e.g. this link will also do:
      https://www.rdocumentation.org/packages/caret/versions/3.45/topics/confusionMatrix

      # assuming you have “predicted2” as a factor
      > predicted2 <- as.factor(c("no","yes","no","no","yes","no","yes","yes","yes","yes"))
      > predicted2
      [1] no yes no no yes no yes yes yes yes
      Levels: no yes

      # assuming you have “actual” as a vector
      > actual <- as.vector( c("no","yes","yes","no","yes","yes","yes","no","yes", "no"))
      > actual
       [1] "no"  "yes" "yes" "no"  "yes" "yes" "yes" "no"  "yes" "no"

      > cbind(predicted2,actual)
            predicted2 actual
       [1,] "1"        "no"
       [2,] "2"        "yes"
       [3,] "1"        "yes"
       [4,] "1"        "no"
       [5,] "2"        "yes"
       [6,] "1"        "yes"
       [7,] "2"        "yes"
       [8,] "2"        "no"
       [9,] "2"        "yes"
      [10,] "2"        "no"

      # to use confusionMatrix(data, reference), you need to have both data and reference as a factor

      # you can use as.factor to do this; however, you should always print the result to double-check. Sometimes you still need to play with it to get the right format. In this case you also want both factors to have the same levels, e.g. Levels: no yes

      > as.factor(actual)
      [1] no yes yes no yes yes yes no yes no
      Levels: no yes
      > predicted2
      [1] no yes no no yes no yes yes yes yes
      Levels: no yes

      > confusionMatrix(predicted2, as.factor(actual))
      Confusion Matrix and Statistics

                Reference
      Prediction no yes
             no   2   2
             yes  2   4

      Accuracy : 0.6
      95% CI : (0.2624, 0.8784)
      No Information Rate : 0.6
      P-Value [Acc > NIR] : 0.6331

      Kappa : 0.1667

      Mcnemar's Test P-Value : 1.0000

      Sensitivity : 0.5000
      Specificity : 0.6667
      Pos Pred Value : 0.5000
      Neg Pred Value : 0.6667
      Prevalence : 0.4000
      Detection Rate : 0.2000
      Detection Prevalence : 0.4000
      Balanced Accuracy : 0.5833

      'Positive' Class : no
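If you want to double-check the main numbers by hand, the sketch below rebuilds the same table with base R only (no caret needed). The name `actual_f` is introduced here just for illustration. It also shows one way to force the reference to share the prediction's levels.

```r
predicted2 <- factor(c("no","yes","no","no","yes","no","yes","yes","yes","yes"))
actual     <- c("no","yes","yes","no","yes","yes","yes","no","yes","no")

# Give the reference the same levels, in the same order, as the prediction
actual_f <- factor(actual, levels = levels(predicted2))

# Cross-tabulate predictions (rows) against the reference (columns)
tab <- table(Prediction = predicted2, Reference = actual_f)
print(tab)

# Proportion of cases on the diagonal (correct predictions)
accuracy <- sum(diag(tab)) / sum(tab)

# With "no" treated as the positive class, as in the output above
sensitivity <- tab["no", "no"]   / sum(tab[, "no"])
specificity <- tab["yes", "yes"] / sum(tab[, "yes"])

print(c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity))
```

These should agree with the Accuracy, Sensitivity and Specificity lines reported by confusionMatrix.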

    • #21856
      Pimphen Charoen
      Keymaster

      Good questions! Let me push you a little further (and your peers are very welcome to join the discussion!).

      You are right. For two dimensions, we can simply visualise the data in a scatter plot. How many dimensions do you think we can visualise directly? And when we have many dimensions, is there still a way to do so?
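One possible direction for the discussion (only a sketch, and not the only answer) is to project the data onto a few derived axes. The example below assumes principal component analysis via prcomp on the four numeric columns of the built-in iris data; the choice of PCA and of scaling is an assumption made here for illustration.

```r
# iris ships with R; columns 1 to 4 are the numeric measurements
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# Proportion of the total variance captured by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))

# Scatter plot of the first two components, coloured by species
plot(pca$x[, 1], pca$x[, 2],
     col = as.integer(iris$Species), pch = 19,
     xlab = "PC1", ylab = "PC2",
     main = "iris: first two principal components")
```

A scatter-plot matrix, e.g. pairs(iris[, 1:4]), is another simple option when the number of dimensions is still small.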

      Based on the dendrograms from our iris example, can you make a good guess at which cut-off point on the tree could be used to infer the number of clusters, and why? Is your guess the same for the dendrograms generated by DIANA and AGNES?
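To experiment with different cut-off points, a minimal sketch is shown below. It uses base R's hclust (agglomerative, so similar in spirit to AGNES) rather than the course's own commands, and both the average linkage and `k <- 3` are assumptions chosen for illustration.

```r
# Euclidean distances between the four numeric iris measurements
d  <- dist(iris[, 1:4])

# Agglomerative hierarchical clustering (average linkage is an assumption)
hc <- hclust(d, method = "average")

# Draw the dendrogram and mark the clusters obtained by cutting it
plot(hc, labels = FALSE, main = "iris dendrogram (average linkage)")
k <- 3                                   # assumed number of clusters; try other values
rect.hclust(hc, k = k, border = "red")

# Cluster membership for each observation at this cut
groups <- cutree(hc, k = k)

# Compare the clusters with the known species
print(table(Cluster = groups, Species = iris$Species))
```

Changing `k` (or cutting by height with cutree(hc, h = ...)) and looking at how the table changes is a quick way to test a guess about the cut-off.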

      For the elbow method, we are going to add the R commands for generating the plot shortly (in section 1.6). Thanks for letting us know!
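In the meantime, here is a rough sketch of how an elbow plot can be produced with base R's kmeans. This is not necessarily the code that will appear in the course material; the seed, the scaling step, nstart = 20 and the range 1 to 10 are all assumptions made for this example.

```r
set.seed(42)                      # kmeans uses random starts; fix the seed for repeatability

x  <- scale(iris[, 1:4])          # standardise the four numeric measurements
ks <- 1:10                        # candidate numbers of clusters

# Total within-cluster sum of squares for each candidate k
wss <- sapply(ks, function(k) kmeans(x, centers = k, nstart = 20)$tot.withinss)

# The "elbow" is where adding more clusters stops reducing wss by much
plot(ks, wss, type = "b", pch = 19,
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares",
     main = "Elbow plot for iris")
```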

    • #21816
      Pimphen Charoen
      Keymaster

      Great! Another simple approach to calculating the values for ‘35’ (the cluster formed by merging ID.3 and ID.5) would be to take the average of each attribute, e.g. suppose the raw data have 3 attributes: ID.3 shows 10 20 30 and ID.5 shows 2 4 6. Then ID.35 would be 6 12 18. You can then use this ID.35 to recalculate the distance matrix for the next clustering step.
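The averaging step can be sketched in R as follows. ID.3 and ID.5 use the numbers from the example above; `ID.1` is a hypothetical extra observation added only so that there is something to recompute distances against.

```r
# Raw data for the two observations being merged (values from the example)
ID.3 <- c(10, 20, 30)
ID.5 <- c(2, 4, 6)

# Average each attribute across the two members of the new cluster
ID.35 <- colMeans(rbind(ID.3, ID.5))
print(ID.35)   # 6 12 18

# Hypothetical remaining observation, made up for illustration
ID.1 <- c(1, 2, 3)

# Recompute the distance matrix with ID.35 replacing ID.3 and ID.5
new_dist <- dist(rbind(ID.1 = ID.1, ID.35 = ID.35))
print(new_dist)
```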
