One application of review classification is to provide summary statistics for search
engines. Given the query “Paris travel review”, a search engine could report, “There are
5,000 hits, of which 80% are positive and 20% are negative.” The search results could
also be sorted by average semantic orientation, so that the user could easily sample the
most extreme reviews. Alternatively, the user could include the desired semantic
orientation in the query, “Paris travel review orientation: positive” [Hearst 1992].
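
The summary statistics and sorting described above can be sketched as follows. This is only an illustration under stated assumptions: the `Hit` record, its `avg_so` field, and the example scores are hypothetical, and a hit is counted as positive only when its average semantic orientation is strictly greater than zero.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    url: str       # hypothetical identifier for a search result
    avg_so: float  # hypothetical average semantic orientation of the review


def summarize(hits):
    """Return (percent_positive, percent_negative) for a list of hits.

    Assumption: hits with avg_so <= 0 are counted as negative.
    """
    if not hits:
        return 0.0, 0.0
    positive = sum(1 for h in hits if h.avg_so > 0)
    pct_pos = 100.0 * positive / len(hits)
    return pct_pos, 100.0 - pct_pos


def sort_by_orientation(hits):
    """Order hits from most positive to most negative."""
    return sorted(hits, key=lambda h: h.avg_so, reverse=True)


# Made-up scores, for illustration only.
hits = [Hit("a", 1.5), Hit("b", -0.5), Hit("c", 0.25), Hit("d", 2.0), Hit("e", 0.75)]
pct_pos, pct_neg = summarize(hits)
ranked = sort_by_orientation(hits)
```

The two ends of `ranked` are then the most extreme reviews that a user might want to sample first.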
Preliminary experiments indicate that semantic orientation is also useful for
summarization of reviews. A positive review could be summarized by picking out the
sentence with the highest positive semantic orientation and a negative review could be
summarized by extracting the sentence with the lowest negative semantic orientation.
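
A minimal sketch of this summarization idea, assuming that each sentence has already been assigned a semantic orientation score (the function name and the scores below are hypothetical):

```python
def summarize_review(scored_sentences, review_is_positive):
    """Pick the single most extreme sentence of a review.

    scored_sentences: list of (sentence, so_score) pairs, where so_score is
    an assumed, precomputed semantic orientation for the sentence.
    A positive review yields its highest-scoring sentence; a negative
    review yields its lowest-scoring sentence.
    """
    pick = max if review_is_positive else min
    sentence, _score = pick(scored_sentences, key=lambda pair: pair[1])
    return sentence


# Made-up sentences and scores, for illustration only.
review = [
    ("The hotel was close to the station.", 0.25),
    ("The staff were wonderful.", 2.5),
    ("Breakfast was a little bland.", -0.75),
]
positive_summary = summarize_review(review, review_is_positive=True)
negative_summary = summarize_review(review, review_is_positive=False)
```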
Another potential application is filtering “flames” for newsgroups [Spertus 1997].
There could be a threshold, such that a newsgroup message is held for verification by the
human moderator when the semantic orientation of any word in the message drops below
the threshold.
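
The thresholding rule could look roughly like the sketch below. The word scores, the threshold value, and the simple regular-expression tokenizer are all assumptions made for illustration; words without a score are treated as neutral (0.0), so the sketch only makes sense for a negative threshold.

```python
import re


def needs_moderation(message, word_so, threshold):
    """Return True if any word's semantic orientation is below the threshold.

    word_so: dict mapping a word to an assumed semantic orientation score.
    Words missing from word_so default to 0.0 (treated as neutral).
    """
    words = re.findall(r"[a-z']+", message.lower())
    return any(word_so.get(w, 0.0) < threshold for w in words)


# Hypothetical scores and threshold, for illustration only.
word_so = {"idiot": -3.2, "wrong": -0.9, "thanks": 1.8}
threshold = -2.0

held = needs_moderation("You are an idiot!", word_so, threshold)
passed = needs_moderation("I think you are wrong, thanks anyway.", word_so, threshold)
```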
Tong [2001] presents a system for generating sentiment timelines. This system tracks
online discussions about movies and displays a plot of the number of positive sentiment
and negative sentiment messages over time. Messages are classified by looking for
specific phrases that indicate the sentiment of the author towards the movie, using a
hand-built lexicon of phrases with associated sentiment labels. There are many potential
uses for sentiment timelines: Advertisers could track advertising campaigns, politicians
could track public opinion, reporters could track public response to current events, and
stock traders could track financial opinions. However, with Tong’s approach, it would be
necessary to provide a new lexicon for each new domain. Tong’s [2001] system could
benefit from the use of an automated method for determining semantic orientation,
instead of (or in addition to) a hand-built lexicon.
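
To make the idea of a lexicon-driven sentiment timeline concrete, here is a rough sketch. It is not Tong's system: the three-phrase lexicon, the substring matching, and the (date, text) message format are assumptions chosen only to show how per-day counts of positive and negative messages might be accumulated.

```python
from collections import Counter

# Hypothetical hand-built lexicon: phrase -> sentiment label.
LEXICON = {
    "great movie": "positive",
    "loved it": "positive",
    "waste of time": "negative",
}


def classify(message):
    """Return the label of the first lexicon phrase found, or None."""
    text = message.lower()
    for phrase, label in LEXICON.items():
        if phrase in text:
            return label
    return None


def timeline(messages):
    """Count positive and negative messages per day.

    messages: iterable of (day, text) pairs. Messages that match no
    lexicon phrase are skipped.
    """
    counts = {}
    for day, text in messages:
        label = classify(text)
        if label is not None:
            counts.setdefault(day, Counter())[label] += 1
    return counts


# Made-up messages, for illustration only.
messages = [
    ("2001-05-01", "Great movie, loved it"),
    ("2001-05-01", "What a waste of time"),
    ("2001-05-02", "Loved it!"),
    ("2001-05-02", "No opinion yet"),
]
counts = timeline(messages)
```

Replacing `LEXICON` with an automatically computed semantic orientation would remove the need to rebuild the phrase list for each new domain.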
Semantic orientation could also be used in an automated chat system (a chatbot), to
help decide whether a positive or negative response is most appropriate. Similarly,
characters in software games would appear more realistic if they responded to the
semantic orientation of words that are typed or spoken by the game player.
Another application is the analysis of survey responses to open-ended questions.
Commercial tools for this task include TextSmart² (by SPSS) and Verbatim Blaster³ (by
StatPac). These tools can be used to plot word frequencies or cluster responses into
categories, but they do not currently analyze semantic orientation.
² See http://www.spss.com/textsmart/.
³ See http://www.statpac.com/content-analysis.htm.
3. SEMANTIC ORIENTATION FROM ASSOCIATION
The general strategy in this paper is to infer semantic orientation from semantic
association. The semantic orientation of a given word is calculated from the strength of
its association with a set of positive words, minus the strength of its association with a set
of negative words:
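$$\mathit{Pwords} = \text{a set of words with positive semantic orientation} \tag{1}$$
$$\mathit{Nwords} = \text{a set of words with negative semantic orientation} \tag{2}$$
$$A(\mathit{word}_1, \mathit{word}_2) = \text{a measure of association between } \mathit{word}_1 \text{ and } \mathit{word}_2 \tag{3}$$
$$\text{SO-A}(\mathit{word}) = \sum_{\mathit{pword} \in \mathit{Pwords}} A(\mathit{word}, \mathit{pword}) - \sum_{\mathit{nword} \in \mathit{Nwords}} A(\mathit{word}, \mathit{nword}) \tag{4}$$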
We assume that $A(\mathit{word}_1, \mathit{word}_2)$ maps to a real number. When $A(\mathit{word}_1, \mathit{word}_2)$ is
positive, the words tend to be associated with each other. Larger values correspond to
stronger associations. When $A(\mathit{word}_1, \mathit{word}_2)$ is negative, the presence of one word
makes it likely that the other is absent.
A word, $\mathit{word}$, is classified as having a positive semantic orientation when
$\text{SO-A}(\mathit{word})$ is positive and a negative orientation when $\text{SO-A}(\mathit{word})$ is negative. The
magnitude (absolute value) of $\text{SO-A}(\mathit{word})$ can be considered the strength of the semantic
orientation.
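
The calculation and the sign-based classification can be sketched in a few lines. This is an illustration under stated assumptions rather than the implementation used in the experiments: the association measure is a hypothetical lookup table with made-up values, the one-word positive and negative sets are chosen only to keep the arithmetic easy to follow, and the "undetermined" label for a score of exactly zero is our own choice, since the text does not cover that case.

```python
def so_a(word, pwords, nwords, assoc):
    """Association with the positive set minus association with the negative set."""
    positive = sum(assoc(word, p) for p in pwords)
    negative = sum(assoc(word, n) for n in nwords)
    return positive - negative


def classify(word, pwords, nwords, assoc):
    """Return (orientation, strength): the sign and magnitude of SO-A."""
    score = so_a(word, pwords, nwords, assoc)
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "undetermined"  # assumption: zero is not covered by the text
    return label, abs(score)


# Hypothetical association values, for illustration only.
TOY_ASSOCIATION = {
    ("superb", "good"): 2.0,
    ("superb", "bad"): -1.0,
    ("awful", "good"): -1.5,
    ("awful", "bad"): 2.5,
}


def toy_assoc(word1, word2):
    """Look up a made-up association score; unknown pairs score 0.0."""
    return TOY_ASSOCIATION.get((word1, word2), 0.0)


pwords, nwords = ["good"], ["bad"]
superb = classify("superb", pwords, nwords, toy_assoc)
awful = classify("awful", pwords, nwords, toy_assoc)
```

Passing the association measure in as a function keeps the strategy separate from any particular choice of measure.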
In the following experiments, seven positive words and seven negative words are
used as paradigms of positive and negative semantic orientation:
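$$\mathit{Pwords} = \{\text{good, nice, excellent, positive, fortunate, correct, superior}\} \tag{5}$$
$$\mathit{Nwords} = \{\text{bad, nasty, poor, negative, unfortunate, wrong, inferior}\} \tag{6}$$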
These fourteen words were chosen for their lack of sensitivity to context. For example, a
word such as “excellent” is positive in almost all contexts. The sets also consist of
opposing pairs (good/bad, nice/nasty, excellent/poor, etc.). We experiment with randomly
selected words in Section 5.8.
It could be argued that this is a supervised learning algorithm with fourteen labeled
training examples and millions or billions of unlabeled training examples, but it seems
more appropriate to say that the paradigm words are defining semantic orientation, rather
than training the algorithm. Therefore we prefer to describe our approach as
unsupervised learning. However, this point does not affect our conclusions.
This general strategy is called SO-A (Semantic Orientation from Association).
Selecting particular measures of word association results in particular instances of the