but awkwardly (–)” rather than “He ran quickly (+) and awkwardly (–)”. However, it seems less likely that 
HM would work well with nouns and verbs. There is nothing wrong with saying “the rise 
(+) and fall (–) of the Roman Empire” or “love (+) and death (–)”.7 Indeed, “but” would 
not work in these phrases. 
Kamps and Marx [2002] use the WordNet lexical database [Miller 1990] to determine 
the semantic orientation of a word. For a given word, they look at its semantic distance 
from “good” compared to its semantic distance from “bad”. The idea is similar to SO-A, 
except that the measure of association is replaced with a measure of semantic distance
based on WordNet [Budanitsky and Hirst 2001]. This is an interesting approach, but it 
has not yet been evaluated empirically. 
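To make the comparison of distances concrete, the following Python sketch scores a word by how much closer it is to “good” than to “bad” in a synonym graph. The graph is a small hypothetical stand-in for WordNet, distance is taken to be the length of the shortest synonym path, and dividing by the distance between “good” and “bad” to scale scores into [-1, 1] is an assumption of this sketch rather than a detail given in the text.

```python
from collections import deque

# Hypothetical toy synonym graph standing in for WordNet (not real WordNet data).
# Each pair is an undirected "is a synonym of" link.
EDGES = [
    ("good", "fine"), ("fine", "nice"),
    ("good", "decent"), ("decent", "mediocre"),
    ("mediocre", "poor"), ("poor", "bad"),
    ("bad", "awful"), ("awful", "terrible"),
]


def build_graph(edges):
    """Adjacency sets for an undirected graph."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def distance(graph, source, target):
    """Shortest-path length found by breadth-first search, or None if unreachable."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        word, dist = queue.popleft()
        if word == target:
            return dist
        for neighbour in graph.get(word, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def orientation(graph, word):
    """Positive if word is nearer 'good' than 'bad'; scaled to [-1, 1] (assumed)."""
    d_good = distance(graph, word, "good")
    d_bad = distance(graph, word, "bad")
    span = distance(graph, "good", "bad")
    if None in (d_good, d_bad, span):
        raise ValueError(f"{word!r} is not connected to both 'good' and 'bad'")
    return (d_bad - d_good) / span


graph = build_graph(EDGES)
for w in ("nice", "mediocre", "terrible"):
    print(w, orientation(graph, w))
```

With this toy graph, “nice” scores 1.0, “mediocre” 0.0 and “terrible” -1.0. By the triangle inequality the difference of the two distances can never exceed the distance between “good” and “bad”, which is why that normalisation keeps scores within [-1, 1].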
4.2. Classifying Reviews 
Turney [2002] used a three-step approach to classify reviews. The first step was to apply 
a part-of-speech tagger to the review and then extract two-word phrases, such as 
“romantic ambience” or “horrific events”, where one of the words in the phrase was an 
adjective or an adverb. The second step was to use SO-PMI to calculate the semantic 
orientation of each extracted phrase. The third step was to classify the review as positive 
or negative, based on the average semantic orientation of the extracted phrases. If the 
average was positive, then the review was classified as positive; otherwise, negative. The 
experimental results suggest that SO-PMI may be useful for classifying reviews, but the 
results do not reveal how well SO-PMI can classify individual words or phrases. 
Therefore, it is worthwhile to experimentally evaluate the performance of SO-PMI on 
individual words, as we do in Section 5. 
7 The Rise and Fall of the Roman Empire is the title of a book by Edward Gibbon. Love and Death is the title of a movie directed by Woody Allen. 
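The three steps of Turney [2002] can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the original implementation: the input is assumed to be already tagged with Penn Treebank tags, the phrase filter is the simplified rule from the text (at least one adjective or adverb tag) rather than the exact tag patterns of the original work, and the phrase scores come from a hypothetical dictionary standing in for SO-PMI. Phrases without a score are skipped, and a review with no scored phrases is treated as having an average of zero; both choices are also assumptions.

```python
# Penn Treebank tags for adjectives and adverbs.
ADJ_ADV_TAGS = {"JJ", "JJR", "JJS", "RB", "RBR", "RBS"}


def extract_phrases(tagged):
    """Step 1: adjacent word pairs in which at least one word is an adjective or adverb.

    `tagged` is a list of (word, tag) pairs produced by some tagger (assumed).
    """
    phrases = []
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        if t1 in ADJ_ADV_TAGS or t2 in ADJ_ADV_TAGS:
            phrases.append(f"{w1} {w2}".lower())
    return phrases


def classify_review(tagged, so_scores):
    """Steps 2 and 3: look up each phrase's orientation and threshold the average at zero.

    `so_scores` maps phrases to scores; it is a hypothetical stand-in for SO-PMI.
    """
    scores = [so_scores[p] for p in extract_phrases(tagged) if p in so_scores]
    average = sum(scores) / len(scores) if scores else 0.0
    return "positive" if average > 0 else "negative"


tagged = [("Romantic", "JJ"), ("ambience", "NN"), ("and", "CC"),
          ("horrific", "JJ"), ("events", "NNS")]
toy_scores = {"romantic ambience": 2.0, "horrific events": -0.5}  # invented values
print(extract_phrases(tagged))
print(classify_review(tagged, toy_scores))
```

The simplified filter also extracts pairs such as “and horrific”; the more specific tag patterns used in the original work exist to exclude pairs of that kind.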
The review-classification application of SO-A illustrates the value of an automated approach to 
determining semantic orientation. Although it might be feasible to manually create a 
lexicon of individual words labeled with semantic orientation, if an application requires 
the semantic orientation of two-word or three-word phrases, the number of terms 
involved grows beyond what can be handled by manual labeling. Turney [2002] observed 
that an adjective such as “unpredictable” may have a negative semantic orientation in an 
automobile review, in a phrase such as “unpredictable steering”, but it could have a 
positive (or neutral) orientation in a movie review, in a phrase such as “unpredictable 
plot”. SO-PMI can handle multiword phrases by simply searching for them using a 
quoted phrase query.
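The following Python sketch shows how a multiword phrase such as “unpredictable steering” can be scored from hit counts once it is sent as a quoted phrase query. It assumes the hit-count estimate of SO-PMI (the log of the ratio of co-occurrence counts with positive and negative paradigm words, normalised by the paradigm words' own counts), an AltaVista-style query syntax with a NEAR operator, a small smoothing constant added to the co-occurrence counts so that the logarithm is never taken of zero, and a single paradigm pair for brevity. The hit counts are invented; Section 3.1 gives the configuration actually used.

```python
import math


def phrase_query(phrase, paradigm_word):
    """Quoted-phrase query; the NEAR syntax is an assumption (AltaVista style)."""
    return f'"{phrase}" NEAR {paradigm_word}'


def so_pmi(phrase, hits, positive, negative, smoothing=0.01):
    """Hit-count estimate of SO-PMI for a (possibly multiword) phrase.

    log2( prod hits(phrase NEAR p) * prod hits(n)
          / (prod hits(p) * prod hits(phrase NEAR n)) ),
    computed as a sum of logarithms. `hits` maps a query string to a count;
    paradigm words are assumed to have non-zero counts, and the smoothing
    constant on co-occurrence counts is an assumption.
    """
    score = 0.0
    for p in positive:
        score += math.log2(hits(phrase_query(phrase, p)) + smoothing)
        score -= math.log2(hits(p))
    for n in negative:
        score -= math.log2(hits(phrase_query(phrase, n)) + smoothing)
        score += math.log2(hits(n))
    return score


# Invented hit counts standing in for a search engine.
FAKE_HITS = {
    "excellent": 1000, "poor": 1000,
    '"unpredictable steering" NEAR excellent': 2,
    '"unpredictable steering" NEAR poor': 8,
    '"unpredictable plot" NEAR excellent': 12,
    '"unpredictable plot" NEAR poor': 3,
}


def fake_hits(query):
    return FAKE_HITS.get(query, 0)


for phrase in ("unpredictable steering", "unpredictable plot"):
    print(phrase, round(so_pmi(phrase, fake_hits, ["excellent"], ["poor"]), 2))
```

Because the phrase is quoted, the same scoring function serves single words and multiword phrases alike; with these invented counts the steering phrase comes out negative and the plot phrase positive.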
Pang et al. [2002] applied classical text classification techniques to the task of 
classifying movie reviews as positive or negative. They evaluated three different 
supervised learning algorithms and eight different sets of features, yielding twenty-four 
different combinations. The best result was achieved using a Support Vector Machine 
(SVM) with features based on the presence or absence (rather than the frequency) of 
single words (rather than two-word phrases).
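To illustrate presence-based unigram features with a linear SVM, the following Python sketch trains a Pegasos-style stochastic sub-gradient SVM from scratch. It is not the software or configuration used by Pang et al. [2002]: the training sentences are invented, the bias term and the optional projection step of Pegasos are omitted for brevity, and the hyperparameters are arbitrary.

```python
import random


def presence_features(text):
    """Binary unigram features: the set of distinct lower-cased words (presence, not counts)."""
    return set(text.lower().split())


def train_linear_svm(examples, lam=0.01, epochs=50, seed=0):
    """Pegasos-style stochastic sub-gradient training of a linear SVM without a bias term.

    `examples` is a list of (feature_set, label) pairs with label +1 or -1.
    Minimises lam/2 * ||w||^2 + mean hinge loss; returns a sparse weight dict.
    """
    rng = random.Random(seed)
    data = list(examples)
    weights = {}
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for features, label in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = label * sum(weights.get(f, 0.0) for f in features)
            # Regularisation step: shrink every weight by (1 - eta * lam).
            for f in weights:
                weights[f] *= 1.0 - eta * lam
            # Hinge-loss step: only examples inside the margin move the weights.
            if margin < 1.0:
                for f in features:
                    weights[f] = weights.get(f, 0.0) + eta * label
    return weights


def predict(weights, features):
    score = sum(weights.get(f, 0.0) for f in features)
    return 1 if score > 0 else -1


train = [
    ("a wonderful moving film", 1),
    ("great acting and a wonderful script", 1),
    ("a dull boring film", -1),
    ("awful acting and a boring script", -1),
]
model = train_linear_svm([(presence_features(text), y) for text, y in train])
print(predict(model, presence_features("wonderful great")))
print(predict(model, presence_features("boring awful")))
```

Representing a document as a set means that a word repeated ten times yields the same feature value as a word used once, which is the “presence rather than frequency” choice described above.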
We expect that Pang et al.’s algorithm will tend to be more accurate than Turney’s, 
since the former is supervised and the latter is unsupervised. On the other hand, we 
hypothesize that the supervised approach will require retraining for each new domain. 
For example, if a supervised algorithm is trained with movie reviews, it is likely to 
perform poorly when it is tested with automobile reviews. Perhaps it is possible to design 
a hybrid algorithm that achieves high accuracy without requiring retraining. 
Classifying reviews is related to measuring semantic orientation, since it is one of the 
possible applications for semantic orientation, but there are many other possible 
applications (see Section 2). Although it is interesting to evaluate a method for inferring 
semantic orientation, such as SO-PMI, in the context of an application, such as review 
classification, the diversity of potential applications makes it interesting to study semantic 
orientation in isolation, outside of any particular application. That is the approach 
adopted in this paper. 


4.3. Subjectivity Analysis 
Other related work is concerned with determining subjectivity [Hatzivassiloglou and 
Wiebe 2000; Wiebe 2000; Wiebe et al. 2001]. The task is to distinguish sentences (or 
paragraphs or documents or other suitable chunks of text) that present opinions and 
evaluations from sentences that objectively present factual information [Wiebe 2000]. 
Wiebe et al. [2001] list a variety of potential applications for automated subjectivity 
tagging, such as recognizing “flames” [Spertus 1997], classifying email, recognizing 
speaker role in radio broadcasts, and mining reviews. In several of these applications, the 
first step is to recognize that the text is subjective and then the natural second step is to 
determine the semantic orientation of the subjective text. For example, a flame detector 
cannot merely detect that a newsgroup message is subjective; it must also detect that 
the message has a negative semantic orientation. Otherwise, a message of praise could be 
classified as a flame. 
On the other hand, applications that involve semantic orientation are also likely to 
benefit from a prior step of subjectivity analysis. For example, a movie review typically 
contains a mixture of objective descriptions of scenes in the movie and subjective 
statements of the viewer’s reaction to the movie. In a positive movie review, it is 
common for the objective description to include words with a negative semantic 
orientation, although the subjective reaction may be quite positive [Turney 2002]. If the 
task is to classify the review as positive or negative, a two-step approach seems wise. The 
first step would be to filter out the objective sentences [Wiebe 2000; Wiebe et al. 2001], 
and the second step would be to determine the semantic orientation of the words and 
phrases in the remaining subjective sentences [Turney 2002]. 
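The two-step approach can be sketched in Python as follows. Both components are hypothetical stand-ins: the subjectivity test is a toy cue-word heuristic, not the method of Wiebe [2000] or Wiebe et al. [2001], and the word orientations come from a small invented lexicon rather than from SO-PMI. The example review mimics the situation described above: a positive reaction following an objective plot summary full of negative words.

```python
import re

# Hypothetical stand-ins: a cue-word subjectivity test and a tiny invented
# word-orientation lexicon. Neither reproduces Wiebe et al. or SO-PMI.
SUBJECTIVE_CUES = {"i", "loved", "hated", "think", "felt", "brilliant"}
WORD_SO = {"loved": 2.0, "brilliant": 1.5, "hated": -2.0,
           "grim": -0.8, "murder": -1.0, "war": -1.2}


def words(sentence):
    """Lower-cased alphabetic tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def is_subjective(sentence):
    """Step 1 (toy heuristic): a sentence is subjective if it contains a cue word."""
    return any(w in SUBJECTIVE_CUES for w in words(sentence))


def review_orientation(sentences, filter_objective=True):
    """Step 2: average word orientation over the kept (subjective) sentences."""
    kept = [s for s in sentences if is_subjective(s)] if filter_objective else sentences
    scores = [WORD_SO[w] for s in kept for w in words(s) if w in WORD_SO]
    return sum(scores) / len(scores) if scores else 0.0


review = [
    "The film follows a grim murder during the war.",  # objective plot summary
    "I loved it.",                                      # subjective reaction
]
print(review_orientation(review, filter_objective=False))
print(review_orientation(review))
```

Here the unfiltered average is -0.25, so the review would be misclassified as negative, whereas filtering out the objective sentence first yields 2.0.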
5. EXPERIMENTS 
In Section 5.1, we discuss the lexicons and corpora that are used in the following 
experiments. Section 5.2 examines the baseline performance of SO-PMI when it is 
configured as described in Section 3.1. Sections 5.3, 5.4, and 5.5 explore variations on 
the baseline SO-PMI system. The baseline performance of SO-LSA is evaluated in 
Section 5.6 and variations on the baseline SO-LSA system are considered in Section 5.7. 
The final experiments in Section 5.8 analyze the effect of the choice of the paradigm 
words, for both SO-PMI and SO-LSA. 


5.1. Lexicons and Corpora 
The following experiments use two different lexicons and three different corpora. The 
corpora are used for unsupervised learning and the lexicons are used to evaluate the 
results of the learning. The 
