ratios [Dunning 1993] and the Z-score [Smadja 1993]) appear to perform worse than PMI
when used with search engine hit counts.
However, if we do not restrict our attention to measures of word association that are
compatible with search engine hit counts, there are many possibilities. In the next
subsection, we look at one of them, Latent Semantic Analysis.
3.2. Semantic Orientation from LSA
SO-LSA applies Latent Semantic Analysis (LSA) to calculate the strength of the
semantic association between words [Landauer and Dumais 1997]. LSA uses the Singular
Value Decomposition (SVD) to analyze the statistical relationships among words in a
corpus.
The first step is to use the text to construct a matrix X, in which the row vectors
represent words and the column vectors represent chunks of text (e.g., sentences,
paragraphs, documents). Each cell represents the weight of the corresponding word in the
corresponding chunk of text. The weight is typically the tf-idf score (Term Frequency
times Inverse Document Frequency) for the word in the chunk. (tf-idf is a standard tool in
information retrieval [van Rijsbergen 1979].)
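As an illustration of this first step, the following minimal sketch builds a small word-by-chunk matrix with tf-idf weights. It is not the implementation used for SO-LSA. The function name build_tfidf_matrix, the whitespace tokenizer, the toy chunks, and the particular tf-idf variant (raw term count times log(N/df)) are all assumptions made for the example; many other tf-idf weightings exist.

```python
import math
from collections import Counter


def build_tfidf_matrix(chunks):
    """Return (vocab, X), where X[i][j] is the tf-idf weight of word i in chunk j.

    Assumed weighting (one common variant, chosen for illustration only):
    tf = raw count of the word in the chunk, idf = log(N / df).
    """
    # Hypothetical tokenizer: lowercase and split on whitespace.
    tokenized = [chunk.lower().split() for chunk in chunks]
    vocab = sorted({w for tokens in tokenized for w in tokens})
    n_chunks = len(tokenized)

    # Document frequency: the number of chunks in which each word occurs.
    df = Counter(w for tokens in tokenized for w in set(tokens))
    # Term counts per chunk; a Counter returns 0 for absent words.
    counts = [Counter(tokens) for tokens in tokenized]

    X = []
    for w in vocab:  # one row per word
        idf = math.log(n_chunks / df[w])
        X.append([counts[j][w] * idf for j in range(n_chunks)])  # one column per chunk
    return vocab, X


# Toy corpus: each string is one chunk of text.
chunks = ["good movie good plot", "bad movie", "good acting"]
vocab, X = build_tfidf_matrix(chunks)
for word, row in zip(vocab, X):
    print(word, [round(v, 3) for v in row])
```

Rows correspond to words and columns to chunks, matching the layout of X described above. A word that occurs in every chunk receives weight zero under this variant, since log(N/N) = 0.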
The next step is to apply singular value decomposition [Golub and Van Loan 1996] to