Appeal 2009-3941
Application 10/334,370

9. The next step in the process is determining a score for each word stem and word stem sequence. This is carried out on a statistical basis. One example of a calculation of the likelihood or probability of occurrence of each of the stem words, doubles, and triples will now be described. It should be noted that, while a mathematical probability is given in the following examples, this need not be the case in practice (¶ 0098).

10. The classification system processes texts in the same way as the training texts to identify word stems and their count, which are determined by a score (¶ 0112).

11. For each axis, the probability of the new text belonging to each group on the axis is calculated (¶ 0125). This relates the probability of the text being allocated to a particular group on each axis on the basis of the training data and the text being classified. This is performed by multiplying (for every word) the probabilities of that word occurring in a document that is allocated to that group (based on the training data) (¶ 0126).

12. Having determined the differences using the split-merge-compare algorithm for the original training data, the classifications and word stem data for texts that were determined to give scores of high confidence are added to the original training data to provide modified training data, which is compared with the differences generated for the original training data (¶ 0157).

13. Brown discloses different methods for comparison between scores. As depicted in Figure 13, the hierarchical structure of a classification tree is illustrated. In this embodiment the qualities or axes have extreme values indicating how much the document is concerned with a
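For illustration only, the per-axis calculation described in finding 11 (multiplying, for every word, the probability of that word occurring in a document allocated to a group) might be sketched as follows. This is not taken from Brown or from the record. The function name `group_scores`, the `floor` value used for words absent from the training data, and the sample probabilities are all hypothetical assumptions made for this sketch.

```python
from math import prod

def group_scores(words, word_probs, floor=1e-6):
    """Return {group: product of per-word probabilities} for one axis.

    word_probs maps each group to {word: probability of that word occurring
    in a training document allocated to the group}. `floor` is an assumed
    stand-in probability for words never seen in that group's training data.
    """
    scores = {}
    for group, probs in word_probs.items():
        # Multiply the per-word probabilities for every word in the text.
        scores[group] = prod(probs.get(w, floor) for w in words)
    return scores

# Hypothetical training-derived probabilities for one axis with two groups.
word_probs = {
    "positive": {"good": 0.5, "bad": 0.1},
    "negative": {"good": 0.1, "bad": 0.5},
}

scores = group_scores(["good", "good", "bad"], word_probs)
# The group with the highest product is the most likely allocation.
best = max(scores, key=scores.get)
```

For long texts a product of many small probabilities can underflow, so a practical variant would likely sum logarithms instead; that substitution is a common numerical choice and is not something the findings above describe.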
Last modified: September 9, 2013