Information Storage and Retrieval

Measuring Praise and Criticism: Inference of
Semantic Orientation from Association
PETER D. TURNEY
National Research Council Canada
and
MICHAEL L. LITTMAN
Rutgers University
________________________________________________________________________
The evaluative character of a word is called its semantic orientation. Positive semantic orientation indicates
praise (e.g., “honest”, “intrepid”) and negative semantic orientation indicates criticism (e.g., “disturbing”,
“superfluous”). Semantic orientation varies in both direction (positive or negative) and degree (mild to strong).
An automated system for measuring semantic orientation would have application in text classification, text
filtering, tracking opinions in online discussions, analysis of survey responses, and automated chat systems
(chatbots). This paper introduces a method for inferring the semantic orientation of a word from its statistical
association with a set of positive and negative paradigm words. Two instances of this approach are evaluated,
based on two different statistical measures of word association: pointwise mutual information (PMI) and latent
semantic analysis (LSA). The method is experimentally tested with 3,596 words (including adjectives, adverbs,
nouns, and verbs) that have been manually labeled positive (1,614 words) and negative (1,982 words). The
method attains an accuracy of 82.8% on the full test set, but the accuracy rises above 95% when the algorithm is
allowed to abstain from classifying mild words.
Categories and Subject Descriptors: H.3.1 [Information Storage and Retrieval]: Content Analysis and
Indexing — linguistic processing; H.3.3 [Information Storage and Retrieval]: Information Search and
Retrieval — information filtering, search process; I.2.7 [Artificial Intelligence]: Natural Language Processing
— text analysis
General Terms: Algorithms, Experimentation
Additional Key Words and Phrases: semantic orientation, semantic association, web mining, text mining, text
classification, unsupervised learning, mutual information, latent semantic analysis
________________________________________________________________________
Authors’ addresses: P.D. Turney, Institute for Information Technology, National Research Council Canada,
M-50 Montreal Road, Ottawa, Ontario, Canada, K1A 0R6, email: peter.turney@nrc.ca; M.L. Littman, Department
of Computer Science, Rutgers University, Piscataway, NJ 08854-8019, USA, email: mlittman@cs.rutgers.edu.

1. INTRODUCTION
In an early study of subjective meaning, Osgood et al. [1957] asked people to rate words
on a wide variety of scales. Each scale was defined by a bipolar pair of adjectives, such
as sweet/sour, rugged/delicate, and sacred/profane. The scales were divided into seven
intervals. Osgood et al. gathered data on the ratings of many words by a large number of
subjects and then analyzed the data using factor analysis. They discovered that three main
factors accounted for most of the variation in the data.
The intuitive meaning of each factor can be understood by looking for the bipolar
adjective pairs that are most highly correlated with each factor. The primary factor, which
accounted for much of the variation in the data, was highly correlated with good/bad,
beautiful/ugly, kind/cruel, and honest/dishonest. Osgood et al. called this the evaluative
factor. The second factor, called the potency factor, was highly correlated with
strong/weak, large/small, and heavy/light. The third factor, activity, was correlated with
active/passive, fast/slow, and hot/cold.
In this paper, we focus on the evaluative factor. Hatzivassiloglou and McKeown
[1997] call this factor the semantic orientation of a word. It is also known as valence in
the linguistics literature. A positive semantic orientation denotes a positive evaluation
(i.e., praise) and a negative semantic orientation denotes a negative evaluation (i.e.,
criticism). Semantic orientation has both direction (positive or negative) and intensity
(mild or strong); contrast okay/fabulous (mild/strong positive) and irksome/horrid
(mild/strong negative). We introduce a method for automatically inferring the direction
and intensity of the semantic orientation of a word from its statistical association with a
set of positive and negative paradigm words.
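
As an illustration, the idea can be written as a single score. This is a sketch under stated assumptions, not a formula quoted from this section: A(w, x) stands for any word-association measure, and P and N are the sets of positive and negative paradigm words; the symbol names are chosen here for exposition.

```latex
\mathrm{SO}_A(w) \;=\; \sum_{p \in P} A(w, p) \;-\; \sum_{n \in N} A(w, n)
```

Under this reading, the sign of the score gives the direction of the orientation and its absolute value gives the intensity.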
It is worth noting that there is a high level of agreement among human annotators on
the assignment of semantic orientation to words. For their experiments, Hatzivassiloglou
and McKeown [1997] created a testing set of 1,336 adjectives (657 positive and 679
negative terms). They labeled the terms themselves and then they validated their labels by
asking four people to independently label a random sample of 500 of the 1,336
adjectives. On average, the four people agreed that it was appropriate to assign a positive
or negative label to 89% of the 500 adjectives. In the cases where they agreed that it was
appropriate to assign a label, they assigned the same label as Hatzivassiloglou and
McKeown to 97% of the terms. The average agreement among the four people was also
97%. In our own study, in Section 5.8, the average agreement among the subjects was
98% and the average agreement between the subjects and our benchmark labels was 94%
(25 subjects, 28 words). This level of agreement compares favourably with validation
studies in similar tasks, such as word sense disambiguation.
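
The text does not state exactly how the average agreement figures were computed. One common reading is mean pairwise agreement, sketched below in Python. The function names and the toy labels are hypothetical and serve only to make the arithmetic concrete.

```python
from itertools import combinations


def pairwise_agreement(labels_a, labels_b):
    """Fraction of items on which two annotators gave the same label."""
    if len(labels_a) != len(labels_b):
        raise ValueError("annotators must label the same items")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def average_agreement(annotations):
    """Mean pairwise agreement over all pairs of annotators."""
    pairs = list(combinations(annotations, 2))
    return sum(pairwise_agreement(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy data: three annotators labeling the same four words.
toy = [
    ["pos", "neg", "pos", "neg"],
    ["pos", "neg", "pos", "pos"],
    ["pos", "neg", "neg", "neg"],
]
avg = average_agreement(toy)  # pair agreements are 0.75, 0.75 and 0.5
```

Agreement with a benchmark labeling can be computed the same way, by comparing each annotator with the benchmark list instead of with the other annotators.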
This paper presents a general strategy for inferring semantic orientation from
semantic association. To provide the motivation for the work described here, Section 2
lists some potential applications of algorithms for determining semantic orientation, such
as new kinds of search services [Hearst 1992], filtering “flames” (abusive messages) for
newsgroups [Spertus 1997], and tracking opinions in online discussions [Tong 2001].
Section 3 gives two examples of our method for inferring semantic orientation from
association, using two different measures of word association, Pointwise Mutual
Information (PMI) [Church and Hanks 1989] and Latent Semantic Analysis (LSA)
[Landauer and Dumais 1997]. PMI and LSA are based on co-occurrence, the idea that “a
word is characterized by the company it keeps” [Firth 1957]. The hypothesis behind our
approach is that the semantic orientation of a word tends to correspond to the semantic
orientation of its neighbours.
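
To make the co-occurrence idea concrete, the following Python sketch computes pointwise mutual information from raw counts, using the standard definition log2(p(x, y) / (p(x) p(y))). The counts are invented for illustration, and the sketch says nothing about how co-occurrence is actually measured in the experiments.

```python
import math


def pmi(count_xy, count_x, count_y, total):
    """PMI in bits, estimated from raw counts over `total` observations."""
    if min(count_xy, count_x, count_y) <= 0 or total <= 0:
        raise ValueError("counts must be positive")
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log2(p_xy / (p_x * p_y))


# Hypothetical counts: each word occurs 100 times in 1000 contexts.
# Co-occurring 10 times is what independence predicts, so PMI is about 0.
independent = pmi(count_xy=10, count_x=100, count_y=100, total=1000)
# Co-occurring 40 times is four times the independent rate, so PMI is about 2.
associated = pmi(count_xy=40, count_x=100, count_y=100, total=1000)
```

A positive value means the two words appear together more often than chance would predict, and a negative value means they appear together less often.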
Related work is examined in Section 4. Hatzivassiloglou and McKeown [1997] have
developed a supervised learning algorithm that infers semantic orientation from linguistic
constraints on the use of adjectives in conjunctions. The performance of their algorithm
was measured by the accuracy with which it classifies words. Another approach is to
evaluate an algorithm for learning semantic orientation in the context of a specific
application. Turney [2002] does this in the context of text classification, where the task is
to classify a review as positive (“thumbs up”) or negative (“thumbs down”). Pang et al.
[2002] have also addressed the task of review classification, but they used standard
machine learning text classification techniques.
Experimental results are presented in Section 5. The algorithms are evaluated using
3,596 words (1,614 positive and 1,982 negative) taken from the General Inquirer lexicon
[Stone et al. 1966]. These words include adjectives, adverbs, nouns, and verbs. An
accuracy of 82.8% is attained on the full test set, but the accuracy can rise above 95%
when the algorithm is allowed to abstain from classifying mild words.
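
The effect of abstaining can be sketched as follows, assuming that "mild" words are those whose score has a small absolute value and that a threshold on that value decides when to abstain. The function name, the threshold, and the toy scores are hypothetical.

```python
def accuracy_with_abstention(scores, gold, threshold=0.0):
    """Return (accuracy, coverage); words with |score| < threshold are skipped.

    A score greater than zero is read as a positive prediction and any
    other score as a negative prediction.
    """
    kept = [(s, g) for s, g in zip(scores, gold) if abs(s) >= threshold]
    if not kept:
        return 0.0, 0.0
    correct = sum((s > 0) == (g == "pos") for s, g in kept)
    return correct / len(kept), len(kept) / len(scores)


# Hypothetical scores: the two errors are both on low-magnitude words.
scores = [2.5, -3.1, 0.2, -0.1, 1.8, -0.3]
gold = ["pos", "neg", "neg", "pos", "pos", "neg"]

full = accuracy_with_abstention(scores, gold)          # every word classified
strict = accuracy_with_abstention(scores, gold, 1.0)   # only strong words kept
```

In this toy data, raising the threshold trades coverage for accuracy: fewer words are classified, but those that remain are classified more reliably.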
The interpretation of the experimental results is given in Section 6. We discuss
limitations and future work in Section 7 and conclude in Section 8.
2. APPLICATIONS
The motivation of Hatzivassiloglou and McKeown [1997] was to use semantic
orientation as a component in a larger system, to automatically identify antonyms and
distinguish near synonyms. Both synonyms and antonyms typically have strong semantic
associations, but synonyms generally have the same semantic orientation, whereas
antonyms have opposite orientations.
Semantic orientation may also be used to classify reviews (e.g., movie reviews or
automobile reviews) as positive or negative [Turney 2002]. It is possible to classify a
review based on the average semantic orientation of phrases in the review that contain
adjectives and adverbs. We expect that there will be value in combining semantic
orientation [Turney 2002] with more traditional text classification methods for review
classification [Pang et al. 2002].
To illustrate review classification, Table 1 shows the average semantic orientation of
sentences selected from reviews of banks, from the Epinions site.¹ In this table, we used
SO-PMI (see Section 3.1) to calculate the semantic orientation of each individual word
and then averaged the semantic orientation of the words in each sentence. Five of these
six randomly selected sentences are classified correctly.
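
The averaging step can be sketched in Python as below. This is a minimal illustration, not the implementation used for the table: the per-word scores are hypothetical stand-ins for SO-PMI values, a plain dictionary lookup replaces the real scoring, and words without a score are simply skipped.

```python
import re


def average_so(sentence, word_so):
    """Average the semantic orientation of the scored words in a sentence."""
    words = re.findall(r"[a-z']+", sentence.lower())
    scores = [word_so[w] for w in words if w in word_so]
    return sum(scores) / len(scores) if scores else 0.0


def classify(sentence, word_so):
    """Label a sentence by the sign of its average semantic orientation."""
    return "positive" if average_so(sentence, word_so) > 0 else "negative"


# Hypothetical per-word scores, invented for this example.
toy_so = {"love": 2.0, "break": -1.0, "exceeded": 1.0, "worse": -3.0}

s1 = "I love the local branch, however communication may break down"
s2 = "Do not bank here, their website is even worse than their actual locations."

avg1 = average_so(s1, toy_so)   # (2.0 + -1.0) / 2
label2 = classify(s2, toy_so)
```

The first sentence shows why averaging matters: it mixes praise and criticism, and the sentence-level label depends on which of the two outweighs the other.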
Table 1. The average semantic orientation of some sample sentences.

Positive Reviews                                                      Average SO
1. I love the local branch, however communication may break down
   if they have to go through head office.                                0.1414
2. Bank of America gets my business because of its extensive branch
   and ATM network.                                                       0.1226
3. This bank has exceeded my expectations for the last ten years.         0.1690

Negative Reviews                                                      Average SO
1. Do not bank here, their website is even worse than their actual
   locations.                                                            -0.0766
2. Use Bank of America only if you like the feeling of a stranger’s
