Measuring Praise and Criticism: Inference of 
Semantic Orientation from Association 
PETER D. TURNEY 
National Research Council Canada 
and 
MICHAEL L. LITTMAN 
Rutgers University 
________________________________________________________________________ 
The evaluative character of a word is called its semantic orientation. Positive semantic orientation indicates 
praise (e.g., “honest”, “intrepid”) and negative semantic orientation indicates criticism (e.g., “disturbing”, 
“superfluous”). Semantic orientation varies in both direction (positive or negative) and degree (mild to strong). 
An automated system for measuring semantic orientation would have application in text classification, text 
filtering, tracking opinions in online discussions, analysis of survey responses, and automated chat systems 
(chatbots). This paper introduces a method for inferring the semantic orientation of a word from its statistical 
association with a set of positive and negative paradigm words. Two instances of this approach are evaluated, 
based on two different statistical measures of word association: pointwise mutual information (PMI) and latent 
semantic analysis (LSA). The method is experimentally tested with 3,596 words (including adjectives, adverbs, 
nouns, and verbs) that have been manually labeled positive (1,614 words) and negative (1,982 words). The 
method attains an accuracy of 82.8% on the full test set, but the accuracy rises above 95% when the algorithm is 
allowed to abstain from classifying mild words. 
Categories and Subject Descriptors: H.3.1 [Information Storage and Retrieval]: Content Analysis and 
Indexing — linguistic processing; H.3.3 [Information Storage and Retrieval]: Information Search and 
Retrieval — information filtering, search process; I.2.7 [Artificial Intelligence]: Natural Language Processing 
— text analysis 
General Terms: Algorithms, Experimentation 
Additional Key Words and Phrases: semantic orientation, semantic association, web mining, text mining, text 
classification, unsupervised learning, mutual information, latent semantic analysis 
________________________________________________________________________ 
Authors’ addresses: P.D. Turney, Institute for Information Technology, National Research Council Canada, 
M-50 Montreal Road, Ottawa, Ontario, Canada, K1A 0R6, email: peter.turney@nrc.ca; M.L. Littman, Department 
of Computer Science, Rutgers University, Piscataway, NJ 08854-8019, USA, email: mlittman@cs.rutgers.edu. 


1. INTRODUCTION 
In an early study of subjective meaning, Osgood et al. [1957] asked people to rate words 
on a wide variety of scales. Each scale was defined by a bipolar pair of adjectives, such 
as sweet/sour, rugged/delicate, and sacred/profane. The scales were divided into seven 
intervals. Osgood et al. gathered data on the ratings of many words by a large number of 
subjects and then analyzed the data using factor analysis. They discovered that three main 
factors accounted for most of the variation in the data.
The intuitive meaning of each factor can be understood by looking for the bipolar 
adjective pairs that are most highly correlated with each factor. The primary factor, which 
accounted for much of the variation in the data, was highly correlated with good/bad, 
beautiful/ugly, kind/cruel, and honest/dishonest. Osgood et al. called this the evaluative 
factor. The second factor, called the potency factor, was highly correlated with 
strong/weak, large/small, and heavy/light. The third factor, activity, was correlated with 
active/passive, fast/slow, and hot/cold. 
In this paper, we focus on the evaluative factor. Hatzivassiloglou and McKeown 
[1997] call this factor the semantic orientation of a word. It is also known as valence in 
the linguistics literature. A positive semantic orientation denotes a positive evaluation 
(i.e., praise) and a negative semantic orientation denotes a negative evaluation (i.e., 
criticism). Semantic orientation has both direction (positive or negative) and intensity 
(mild or strong); contrast okay/fabulous (mild/strong positive) and irksome/horrid 
(mild/strong negative). We introduce a method for automatically inferring the direction 
and intensity of the semantic orientation of a word from its statistical association with a 
set of positive and negative paradigm words. 
Human annotators show a high level of agreement on 
the assignment of semantic orientation to words. For their experiments, Hatzivassiloglou 
and McKeown [1997] created a testing set of 1,336 adjectives (657 positive and 679 
negative terms). They labeled the terms themselves and then they validated their labels by 
asking four people to independently label a random sample of 500 of the 1,336 
adjectives. On average, the four people agreed that it was appropriate to assign a positive 
or negative label to 89% of the 500 adjectives. In the cases where they agreed that it was 
appropriate to assign a label, they assigned the same label as Hatzivassiloglou and 
McKeown to 97% of the terms. The average agreement among the four people was also 
97%. In our own study, in Section 5.8, the average agreement among the subjects was 
98% and the average agreement between the subjects and our benchmark labels was 94% 
(25 subjects, 28 words). This level of agreement compares favourably with validation 
studies in similar tasks, such as word sense disambiguation. 
This paper presents a general strategy for inferring semantic orientation from 
semantic association. To provide the motivation for the work described here, Section 2 
lists some potential applications of algorithms for determining semantic orientation, such 
as new kinds of search services [Hearst 1992], filtering “flames” (abusive messages) for 
newsgroups [Spertus 1997], and tracking opinions in online discussions [Tong 2001]. 
Section 3 gives two examples of our method for inferring semantic orientation from 
association, using two different measures of word association, Pointwise Mutual 
Information (PMI) [Church and Hanks 1989] and Latent Semantic Analysis (LSA) 
[Landauer and Dumais 1997]. PMI and LSA are based on co-occurrence, the idea that “a 
word is characterized by the company it keeps” [Firth 1957]. The hypothesis behind our 
approach is that the semantic orientation of a word tends to correspond to the semantic 
orientation of its neighbours.
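The following is a minimal sketch of this hypothesis, not the implementation evaluated in this paper. It assumes a tiny in-memory corpus, document-level co-occurrence, a small illustrative set of paradigm words, and an arbitrary smoothing constant. The function names and the choice to subtract the summed negative associations from the summed positive ones are assumptions made for the example.

```python
import math
from collections import Counter
from itertools import combinations

# Assumed paradigm words, chosen only for illustration.
POSITIVE = ("good", "excellent")
NEGATIVE = ("bad", "poor")


def count_cooccurrences(documents):
    """Count, per document, which words occur and which word pairs co-occur."""
    word_counts = Counter()
    pair_counts = Counter()
    for doc in documents:
        words = set(doc.lower().split())
        word_counts.update(words)
        pair_counts.update(frozenset(p) for p in combinations(sorted(words), 2))
    return word_counts, pair_counts


def pmi(w1, w2, word_counts, pair_counts, n_docs, smoothing=0.01):
    """Pointwise mutual information: log2(p(w1, w2) / (p(w1) * p(w2))).

    The smoothing constant (an assumption) avoids taking the log of zero.
    """
    joint = (pair_counts[frozenset((w1, w2))] + smoothing) / n_docs
    p1 = (word_counts[w1] + smoothing) / n_docs
    p2 = (word_counts[w2] + smoothing) / n_docs
    return math.log2(joint / (p1 * p2))


def semantic_orientation(word, documents):
    """Association with positive paradigm words minus association with negative ones."""
    word_counts, pair_counts = count_cooccurrences(documents)
    n_docs = len(documents)
    pos = sum(pmi(word, p, word_counts, pair_counts, n_docs) for p in POSITIVE)
    neg = sum(pmi(word, q, word_counts, pair_counts, n_docs) for q in NEGATIVE)
    return pos - neg
```

With a corpus in which "honest" appears only alongside "good" and "excellent", the score is positive; a word that appears only alongside "bad" and "poor" receives a negative score. The magnitude of the score serves as a rough indication of intensity.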
Related work is examined in Section 4. Hatzivassiloglou and McKeown [1997] have 
developed a supervised learning algorithm that infers semantic orientation from linguistic 
constraints on the use of adjectives in conjunctions. The performance of their algorithm 
was measured by the accuracy with which it classifies words. Another approach is to 
evaluate an algorithm for learning semantic orientation in the context of a specific 
application. Turney [2002] does this in the context of text classification, where the task is 
to classify a review as positive (“thumbs up”) or negative (“thumbs down”). Pang et al. 
[2002] have also addressed the task of review classification, but they used standard 
machine learning text classification techniques. 
Experimental results are presented in Section 5. The algorithms are evaluated using 
3,596 words (1,614 positive and 1,982 negative) taken from the General Inquirer lexicon 
[Stone et al. 1966]. These words include adjectives, adverbs, nouns, and verbs. An 
accuracy of 82.8% is attained on the full test set, but the accuracy can rise above 95% 
when the algorithm is allowed to abstain from classifying mild words. 
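The trade-off between accuracy and abstention can be illustrated with a short sketch. It assumes that each word already has a signed semantic orientation score and that a word counts as mild when the magnitude of its score falls below a threshold. The function name, the threshold parameter, and the label strings are hypothetical and serve only the example.

```python
def accuracy_with_abstention(scores, labels, threshold=0.0):
    """Classify by the sign of the score, skipping words with |score| < threshold.

    Returns (accuracy on the classified words, fraction of words classified).
    """
    decided = [(s, y) for s, y in zip(scores, labels) if abs(s) >= threshold]
    if not decided:
        return 0.0, 0.0
    # A prediction is correct when a positive score meets a "positive" label
    # or a non-positive score meets a "negative" label.
    correct = sum((s > 0) == (y == "positive") for s, y in decided)
    return correct / len(decided), len(decided) / len(scores)
```

Raising the threshold removes the words with scores near zero from the evaluation, so accuracy is reported on a smaller and more strongly oriented subset of the test set.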
The interpretation of the experimental results is given in Section 6. We discuss 
limitations and future work in Section 7 and conclude in Section 8. 
2. APPLICATIONS 
The motivation of Hatzivassiloglou and McKeown [1997] was to use semantic 
orientation as a component in a larger system, to automatically identify antonyms and 
distinguish near synonyms. Both synonyms and antonyms typically have strong semantic 
associations, but synonyms generally have the same semantic orientation, whereas 
antonyms have opposite orientations. 
Semantic orientation may also be used to classify reviews (e.g., movie reviews or 
automobile reviews) as positive or negative [Turney 2002]. It is possible to classify a 
review based on the average semantic orientation of phrases in the review that contain 
adjectives and adverbs. We expect that there will be value in combining semantic 
orientation [Turney 2002] with more traditional text classification methods for review 
classification [Pang et al. 2002]. 
To illustrate review classification, Table 1 shows the average semantic orientation of 
sentences selected from reviews of banks, from the Epinions site. In this table, we used 
SO-PMI (see Section 3.1) to calculate the semantic orientation of each individual word 
and then averaged the semantic orientation of the words in each sentence. Five of these 
six randomly selected sentences are classified correctly. 
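The averaging step can be sketched as follows. The sketch assumes that per-word scores are available in a lookup table, that words without a score are skipped, and that a sentence is labeled by the sign of its average. The function names and the scores used in any example are hypothetical, not values from our experiments.

```python
def average_so(sentence, word_so):
    """Mean semantic orientation over the words of a sentence that have a score."""
    words = [w.strip('.,;:!?"\'').lower() for w in sentence.split()]
    scored = [word_so[w] for w in words if w in word_so]
    return sum(scored) / len(scored) if scored else 0.0


def classify(sentence, word_so):
    """Label a sentence by the sign of its average semantic orientation."""
    return "positive" if average_so(sentence, word_so) > 0 else "negative"
```

For example, with invented scores such as `{"love": 2.0, "branch": 0.3, "bank": 0.1, "worse": -3.0}`, the sentence "I love the local branch." has a positive average, whereas "Do not bank here, their website is even worse." has a negative one.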
Table 1. The average semantic orientation of some sample sentences. 

Positive Reviews                                                     Average SO 
1. I love the local branch, however communication may break down 
   if they have to go through head office.                               0.1414 
2. Bank of America gets my business because of its extensive branch 
   and ATM network.                                                      0.1226 
3. This bank has exceeded my expectations for the last ten years.        0.1690 

Negative Reviews                                                     Average SO 
1. Do not bank here, their website is even worse than their actual 
   locations.                                                           -0.0766 
2. Use Bank of America only if you like the feeling of a stranger’s 
