


[Plot: accuracy versus threshold, with one curve each for AV-ENG, AV-CA, and TASA]
Figure 1. Accuracy of SO-PMI with the HM lexicon and the three corpora.
Table 5 shows the accuracy of SO-PMI with the GI lexicon, which includes adverbs, 
nouns, and verbs, in addition to adjectives. Figure 2 gives more detail. Compared with 
Table 4 and Figure 1, there is a slight drop in accuracy, but the general trends are the 
same.
Table 5. The accuracy of SO-PMI with the GI lexicon and the three corpora.

Percent of full test set | Size of test set | Accuracy with AV-ENG | Accuracy with AV-CA | Accuracy with TASA
100% | 3596 | 82.84% | 76.06% | 61.26%
75% | 2697 | 90.66% | 81.76% | 63.92%
50% | 1798 | 95.49% | 87.26% | 47.33%
25% | 899 | 97.11% | 89.88% | 68.74%
Approx. number of words in corpus | | 1 × 10^11 | 2 × 10^9 | 1 × 10^7
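The thresholds in Table 5 restrict classification to the words whose semantic orientation is most confident. The following is a minimal sketch of that kind of evaluation, not the authors' code: it assumes that words are ranked by the magnitude |SO| of their score and that the top p percent are classified by the sign of SO. The function name `accuracy_at_threshold` and the toy scores are hypothetical. The last lines recompute the test-set sizes in Table 5 from the full set of 3596 words.

```python
def accuracy_at_threshold(scores, labels, percent):
    """Percent correct among the `percent`% of words with the largest |SO|.

    scores  -- semantic-orientation values; a positive score predicts praise
    labels  -- gold labels, +1 for positive words and -1 for negative words
    percent -- share of the full test set to classify (e.g. 100, 75, 50, 25)
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    # Rank words from most to least confident, using |SO| as the confidence.
    ranked = sorted(zip(scores, labels), key=lambda pair: abs(pair[0]), reverse=True)
    kept = ranked[: round(len(ranked) * percent / 100)]
    if not kept:
        return 0.0
    # A word is correct when the sign of its score agrees with its gold label
    # (a score of exactly zero is treated as a negative prediction here).
    correct = sum(1 for score, label in kept if (score > 0) == (label > 0))
    return 100.0 * correct / len(kept)


# Hypothetical toy data: six words with made-up SO scores and gold labels.
scores = [3.2, -2.5, 0.4, -0.1, 1.7, -0.3]
labels = [1, -1, -1, 1, 1, -1]
for pct in (100, 50):
    print(pct, round(accuracy_at_threshold(scores, labels, pct), 2))
# prints "100 66.67" then "50 100.0"

# Test-set sizes for each threshold, from the full set of 3596 words.
sizes = [round(3596 * p / 100) for p in (100, 75, 50, 25)]
print(sizes)  # [3596, 2697, 1798, 899]
```

In the toy data the two misclassified words have the smallest |SO| among the words they are ranked with, so discarding the less confident half raises accuracy, the same direction as the trend in the table.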


[Plot: accuracy versus threshold, with one curve each for AV-ENG, AV-CA, and TASA]
Figure 2. Accuracy of SO-PMI with the GI lexicon and the three corpora.
5.3. Varying the Laplace Smoothing Factor 
As we mentioned in Section 3.1, we used a Laplace smoothing factor of 0.01 in the 
baseline version of SO-PMI. In this section, we explore the impact of varying the 
smoothing factor.
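For reference, here is one way to write the smoothed score. This is a sketch under stated assumptions: the smoothing factor λ is added to every co-occurrence count hits(w NEAR ·), and the positive and negative paradigm-word sets P and N have the same size, so that the hits(w) and corpus-size terms of the two PMI sums cancel.

```latex
\mathrm{SO\text{-}PMI}(w) =
  \sum_{p \in P} \log_2 \frac{\mathrm{hits}(w \;\mathrm{NEAR}\; p) + \lambda}{\mathrm{hits}(p)}
  - \sum_{n \in N} \log_2 \frac{\mathrm{hits}(w \;\mathrm{NEAR}\; n) + \lambda}{\mathrm{hits}(n)}
```

With λ = 0.01 this corresponds to the baseline setting. As λ grows, each fraction approaches λ/hits(p) or λ/hits(n), so every word converges on the same word-independent score, Σ_n log2 hits(n) − Σ_p log2 hits(p). As λ shrinks toward zero, any zero co-occurrence count sends its term toward −∞.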
Figure 3 graphs the accuracy of SO-PMI as a function of the smoothing factor, which 
varies from 0.0001 to 10,000 (note the logarithmic scale), using the AV-ENG corpus and 
the GI lexicon. There are four curves, for four different thresholds on the percentage of 
the full test set that is classified. The smoothing factor has relatively little impact until it 
rises above 10, at which point the accuracy begins to fall off. The optimal value is about 
1, although the difference between 1 and 0.1 or 0.01 is slight. 
Figure 4 shows the results for the same experimental setup, except with the AV-CA corpus. We see
the same general pattern, but the accuracy begins to decline a little earlier, when the
smoothing factor rises above 0.1. The highest accuracy is attained when the smoothing
factor is about 0.1. The AV-CA corpus (approximately 2 × 10^9 words) is more sensitive
to the smoothing factor than the AV-ENG corpus (approximately 1 × 10^11 words). A
smoothing factor of about 0.1 seems to help SO-PMI handle the increased noise due to
the smaller corpus (compare Figure 3 and Figure 4).
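To make the role of the smoothing factor concrete, here is a minimal sketch of a smoothed SO-PMI score computed from hit counts. It assumes that the factor is added to every co-occurrence (NEAR) count and that the positive and negative paradigm sets have the same size, so that hits(word) and the corpus size cancel. The function name `so_pmi`, the two-words-per-side paradigm sets, and all counts are hypothetical.

```python
import math


def so_pmi(near_pos, near_neg, hits_pos, hits_neg, smoothing=0.01):
    """SO-PMI of one word from hit counts, with additive (Laplace) smoothing.

    near_pos[i] -- hits(word NEAR i-th positive paradigm word)
    near_neg[i] -- hits(word NEAR i-th negative paradigm word)
    hits_pos[i], hits_neg[i] -- hits for the paradigm words themselves
    Assumption: `smoothing` is added to every NEAR count, and both paradigm
    sets have the same size, so hits(word) and the corpus size cancel out.
    """
    if len(near_pos) != len(hits_pos) or len(near_neg) != len(hits_neg):
        raise ValueError("each NEAR count needs a matching paradigm-word count")
    if len(hits_pos) != len(hits_neg):
        raise ValueError("positive and negative paradigm sets must be the same size")
    # Sum of log-ratios for the positive words minus the same sum for the negative words.
    # With smoothing=0, a zero NEAR count makes math.log2 raise ValueError.
    positive = sum(math.log2((n + smoothing) / h) for n, h in zip(near_pos, hits_pos))
    negative = sum(math.log2((n + smoothing) / h) for n, h in zip(near_neg, hits_neg))
    return positive - negative


# Hypothetical counts for a rare word, with two paradigm words per side.
near_pos, near_neg = [3, 0], [0, 1]
hits_pos, hits_neg = [1000, 1000], [1000, 1000]

# Sweep the smoothing factor over the same logarithmic range as Figure 3.
for s in (0.0001, 0.01, 1, 100, 10000):
    print(s, round(so_pmi(near_pos, near_neg, hits_pos, hits_neg, s), 4))
```

In this sweep the score shrinks toward zero as the factor grows (zero here only because the hypothetical paradigm-word counts are equal; in general it approaches a constant shared by all words), so large factors swamp the evidence in the counts. Because the factor is added to raw counts, a given value is relatively larger when counts are small, which is one possible reading of why accuracy on the smaller AV-CA corpus begins to decline at a lower smoothing factor.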


[Plot: accuracy versus smoothing factor, logarithmic x-axis from 0.0001 to 10,000]
