Ex Parte Gosby et al - Page 6

Appeal 2009-3941
Application 10/334,370

9. The next step in the process is determining a score for each
word stem and word stem sequence. This is carried out on a statistical basis.
One example of a calculation of the likelihood or probability of occurrence
of each of the stem words, doubles, and triples will now be described. It
should be noted that, while a mathematical probability is given in the
following examples, this need not be the case in practice (¶ 0098).
10. The classification system processes texts in the same way as the
training texts to identify word stems and their count, which are determined
by a score (¶ 0112).
11. For each axis, the probability of the new text belonging to each
group on the axis is calculated (¶ 0125). This relates the probability of the
text being allocated to a particular group on each axis on the basis of the
training data and the text being classified. This is performed by multiplying
(for every word) the probabilities of that word occurring in a document that
is allocated to that group (based on the training data) (¶ 0126).
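
For illustration only, the per-group calculation described in finding 11 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the method of the cited reference: the function names, the add-one smoothing, the skipping of words not seen in training, and the normalisation across groups are all hypothetical choices made to keep the example runnable.

```python
from collections import Counter
from math import prod


def word_probabilities(training_texts, vocabulary):
    """Estimate P(word | group) from the training texts of one group.

    Add-one smoothing is an assumption made here so that no word
    receives a probability of zero; the source does not specify it.
    """
    counts = Counter(word for text in training_texts for word in text)
    total = sum(counts[w] for w in vocabulary) + len(vocabulary)
    return {w: (counts[w] + 1) / total for w in vocabulary}


def group_scores(new_text, training_by_group):
    """Score a new text against every group on one axis.

    For each group, multiply (for every word of the new text) the
    probability of that word occurring in a document allocated to
    that group, based on the training data.
    """
    vocabulary = {
        w for texts in training_by_group.values() for t in texts for w in t
    }
    scores = {}
    for group, texts in training_by_group.items():
        probs = word_probabilities(texts, vocabulary)
        # Words never seen in training are skipped (illustrative choice).
        scores[group] = prod(probs[w] for w in new_text if w in vocabulary)
    # Normalise so that the scores over the groups on this axis sum to 1.
    total = sum(scores.values())
    return {g: s / total for g, s in scores.items()}


# Hypothetical training data: each text is a list of word stems.
training = {
    "sport": [["goal", "match"], ["match", "team"]],
    "finance": [["bank", "rate"], ["rate", "market"]],
}
scores = group_scores(["match", "team"], training)
```

With long texts the product of many small probabilities can underflow, so a practical variant would typically sum logarithms instead; that substitution is likewise an assumption and is not drawn from the findings.
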
12. Having determined the differences using the split-merge-
compare algorithm for the original training data, the classifications and word
stem data for texts that were determined to give scores of high confidence
are added to the original training data to provide modified training data,
which is compared with the differences generated for the original training
data (¶ 0157).
13. Brown discloses different methods for comparison between
scores. As depicted in Figure 13, the hierarchical structure of a
classification tree is illustrated. In this embodiment the qualities or axes
have extreme values indicating how much the document is concerned with a


Last modified: September 9, 2013