boston.lti.cs.cmu.edu

Opinion Detection by Transfer Learning
5000 Forbes Ave, Pittsburgh, PA, USA, 15213

Opinion detection is the main task of the TREC 2006 Blog track, which identifies opinions in text documents from the TREC blog corpus. Because this is the first year of the task, no training data is provided. Knowledge about how people express opinions in other domains, for example movie reviews, product reviews and book reviews, is the best available training data for opinion detection in the blog domain. This work describes how to apply transfer learning to opinion detection. A Bayesian logistic regression framework is used, and knowledge from training data in other domains is captured by a non-diagonal prior covariance matrix. The experimental results show that the approach is effective and achieves an improvement of 32% over the baseline.

H.4 [Information Systems Applications]: Miscellaneous

1. INTRODUCTION

Opinion detection is an emerging topic that attracts more and more interest from researchers in data mining and natural language processing [1][3][5][12][14]. Given a document, the opinion detection task identifies and extracts the opinionated expressions for a certain topic. Some opinions are expressed in a general way, as in "I really like this work", hence words with sentiment polarity play an important role in recognizing the presence of an opinion. On the other hand, many opinions have their own way of expression, for example, "Watching the film is like reading a times portrait of grief that keeps shifting focus to the journalist who wrote it". Given the great variety and complexity of human language, opinion detection is a challenging job.

In 2006, the Text REtrieval Conference (TREC) started a new track to study research topics in the blog domain, with opinion detection in blogs as the main task [7]. Since it is the first year and blog data is fairly new to the research community, there is a lack of training data. Given the lack of training data from the blog corpus, simple supervised learning is not possible. The question is then how to transfer knowledge about opinions from other domains, which have labelled training data.

This paper attempts to use transfer learning techniques [2][8][9][10][11][13] to incorporate features for opinion detection that are common across domains, in order to solve the problem of having no training data. Bayesian logistic regression is the main framework used. The common knowledge is encoded as a non-diagonal covariance matrix for the prior of the regression coefficients. The prior learned from movie and product reviews is used to estimate whether a sentence in the blog domain is an opinion or not. Moreover, unlike classic text classification, opinion detection has its own effective features in the classification process. This paper also describes "Target-Opinion" word pairs and word synonyms and their effects on opinion detection.

The remainder of this paper is organized as follows. Section 2 gives a brief literature review of transfer learning and opinion detection, and explains the existing work done during TREC 2006. Section 3 details the transfer learning algorithm used in the opinion detection process. Section 4 explains feature selection for opinion detection. Section 5 describes the datasets used in this research. Section 6 elaborates the evaluation and experimental results and gives an analysis of the results. Section 7 concludes the paper. Appendix A lists the query topics evaluated.

2. RELATED WORK

2.1 Opinion Detection

Researchers in the Natural Language Processing (NLP) community are the pioneers of the opinion detection task. Turney [14] groups online words whose pointwise mutual information is close to two words, "excellent" and "poor", and then uses them to detect opinions and sentiment polarity. Riloff and Wiebe [5] use a high-precision classifier to get high-quality opinion and non-opinion sentences, then extract surface text patterns from those sentences to find more opinions and non-opinions, and repeat this process to bootstrap. Pang et al. [1] treat opinion and sentiment detection as a text classification problem and use classical classification methods, such as Naive Bayes, Maximum Entropy and Support Vector Machines, with word unigrams to predict them. In another work, Pang and Lee [3] use minimum cuts to cluster sentences based on their subjectivity and sentiment orientation. Researchers from the data mining community also study the problem of opinion mining. Morinaga et al. [12] use word polarity and syntactic pattern matching rules to extract opinions. They also use principal component analysis to create a correspondence between product names and keywords, with the distance on a map showing the closeness.

We participated in the TREC-2006 Blog track evaluation. The main task is opinion detection in the blog domain. The system [6] is divided into two main parts: passage retrieval and opinion classification. During passage retrieval, the topics provided by NIST are parsed and query expansion is done before sending the topics as queries to the Lemur search engine. Documents in the corpus are segmented into passages of around 100 words, which are the retrieval units for the search engine. The top 5,000 passages returned by Lemur are then sent to a binary text classification program, which classifies them into opinions and non-opinions based on the average of their sentence-level subjectivity scores. The performance of the system is among the top five participating groups.

Transfer learning is to learn from other related tasks and apply the learned model to the current task. The most general form of transfer learning is to learn similar tasks from one domain to another, so as to transfer the "knowledge" from one to the other. In the early research on transfer learning, Baxter [2] and Thrun [13] both used hierarchical Bayesian learning methods to tackle this problem. In recent years, Lawrence and Platt [9] and Yu et al. [8] also used hierarchical Bayesian models to learn hyper-parameters. Ando and Zhang [10] proposed a framework for Gaussian logistic regression with transfer learning for the task of classification and also provided a theoretical proof for transfer learning in this setting. They learned from multiple tasks to form a good classifier and applied it to other similar tasks. Raina et al. [11] continued this approach and built informative priors for Gaussian logistic regression. These informative priors actually correspond to the hyper-parameters in other approaches. We closely follow Raina et al.'s approach and adapt it to the opinion detection task.

After retrieving 5,000 paragraphs for each topic, sentence segmentation is done for each paragraph. Though the document is the evaluation unit in the TREC assessment, the sentence is actually a more natural unit for the task of opinion detection, because different opinions can be present in the same document but are much less likely to be present in the same sentence. Therefore, the sentence is selected as the basic unit.

The remaining task is to identify which sentences contain opinions and which do not. This can be considered a binary classification problem. Bayesian logistic regression is the framework used here. Each sentence is represented as $X = [x_1, x_2, \ldots, x_n]$, where $n$ is the total number of word features $x_i$. The entire dataset is represented as $X = \{X^{(1)}, X^{(2)}, \ldots, X^{(m)}\}$, where $m$ is the total number of sentences. A class label for a sentence is either opinion or non-opinion, and is represented by $Y = \{0, 1\}$.

Logistic regression assumes a sigmoid-like data distribution and predicts the class label according to the following formula:

$$P(Y = 1 \mid X = x, \theta) = \frac{1}{1 + \exp(-\theta^T x)}$$

where $\theta$ is the vector of regression coefficients. It is usually learned by coordinate descent, with a guaranteed global optimum.

However, logistic regression, like many other classification and regression algorithms, suffers from overfitting. Usually, when large regression coefficients are observed, prediction accuracy is very sensitive to the test data, and overfitting occurs. To avoid this problem, a multivariate Gaussian prior is usually added on $\theta$. For simplicity, zero mean and equal variance are assumed. Hence the prior is $N(0, \sigma^2 I)$ and the objective is

$$l(Y = 1 \mid X; \lambda) = \sum_{i} \left[ y_i f(x_i) - \log(1 + \exp(f(x_i))) \right] - \frac{\lambda}{2} \theta^T \theta \qquad (2)$$

where $f(x) = \theta^T x$, and the Maximum A Posteriori (MAP) estimation is

$$\theta^* = \arg\max_{\theta} \; l(Y = 1 \mid X; \lambda)$$

where $\theta^*$ is the MAP estimate of $\theta$.

The above prior is the most common prior used in many research problems. It assumes equal variances for all features, which is not valid in real-world settings. Hence, a general prior with non-diagonal covariance, $N(0, \Sigma)$, is used in this research. The MAP estimation becomes:

$$\theta^* = \arg\max_{\theta} \sum_{i} \left[ y_i f(x_i) - \log(1 + \exp(f(x_i))) \right] - \frac{1}{2} \theta^T \Sigma^{-1} \theta$$

To apply the above formula, the value of $cov(\theta_i, \theta_j)$ is required for every pair of regression coefficients $(\theta_i, \theta_j)$. By the definition of covariance, it is the difference between the expectation of the product, $E[\theta_i \theta_j]$, and the product of the individual expectations $E[\theta_i]$ and $E[\theta_j]$:

$$cov(\theta_i, \theta_j) = E[\theta_i \theta_j] - E[\theta_i] E[\theta_j]$$

Given that the prior's mean is 0, both individual expected values equal 0, i.e., $E[\theta_i] = E[\theta_j] = 0$. Therefore, the covariance of any two regression coefficients becomes:

$$cov(\theta_i, \theta_j) = E[\theta_i \theta_j]$$

which is just the expected product of those two coefficients.

3.1 MCMC for Covariance of Pair-wise Coefficients

The covariance of pair-wise regression coefficients can be obtained by a Markov Chain Monte Carlo (MCMC) method. The real covariance cannot be obtained, but it can be closely estimated by the sample covariance. MCMC suggests sampling several small vocabularies that contain the two words corresponding to $\theta_i$ and $\theta_j$. Each small vocabulary is used to train an ordinary logistic regression model whose objective function is defined in equation 2. The sample covariance is obtained by going through the words in each training set and vocabulary:

$$\text{sample covariance}(\theta_i, \theta_j) = \frac{1}{VT} \sum_{v=1}^{V} \sum_{t=1}^{T} \theta_i^{(v,t)} \theta_j^{(v,t)}$$

where $V$ is the number of vocabularies and $T$ is the number of training sets from each vocabulary.

Hence the covariance is due to the randomness of both the vocabularies and the training sets. However, only the covariance due to vocabulary change is desired in our case. Hence a correction step is performed by subtracting a bootstrap estimate of the covariance due to the randomness of the training sets:

$$cov(\theta_i, \theta_j) = \text{sample covariance}(\theta_i, \theta_j) - \frac{1}{VT} \sum_{v=1}^{V} \sum_{t=1}^{T} \left(\theta_i^{(v,t)} - \bar{\theta}_i^{(v)}\right)\left(\theta_j^{(v,t)} - \bar{\theta}_j^{(v)}\right)$$

where $\bar{\theta}_i^{(v)}$ and $\bar{\theta}_j^{(v)}$ are the sample means of the regression coefficients for each vocabulary across different training sets.

By doing the above calculation, the covariance of each pair of regression coefficients can be obtained. However, given that the number of regression coefficients corresponds to the number of word features, the total amount of computation is huge and not feasible. Therefore, a smarter way of calculating just a small number of pair-wise covariances is necessary. Moreover, individual pair-wise covariances can only be used to estimate the relationship between two words, whereas what is needed is an estimate of the relationship among all the words. In other words, a full covariance matrix is needed.

As pointed out in the previous section, it is extremely inefficient to calculate the individual covariance for every pair of word features. Instead, learning indirect common features and representing the word features in terms of those features will dramatically reduce the amount of computation. In this way, pair-wise covariances need to be calculated for only a small fraction of word pairs, and the covariances of the remaining word pairs can be estimated by a transformation from their indirect features. Therefore, the problem of learning the individual covariance for each word pair is turned into the problem of learning the correspondence between an underlying common feature, which is shared by many word pairs, and a word pair itself. Mathematically, if the indirect common features are defined as a feature vector $F_{ij}$, and the small fraction of covariances is defined as $C$, in which all the values are calculated by the method given in section 3.1 and are represented by $c_{ij}$, the objective function to learn the correspondence $\psi$ is given by the following least squares error:

$$\min_{\psi} \sum_{(i,j) \in K} \left(c_{ij} - \psi^T F_{ij}\right)^2 \qquad (10)$$

where $K$ is the set of word pairs whose covariances are calculated directly.

By learning the correspondence between the word features and the indirect common features, i.e., by learning $\psi$, the entire covariance matrix $C$ can be estimated by computing its $(i, j)$th element as:

$$c_{ij} = \psi^T F_{ij}$$

A valid covariance matrix needs to be positive semi-definite (PSD), that is, a Hermitian matrix with all of its eigenvalues nonnegative. In other words, it needs to be a square, self-adjoint matrix with nonnegative eigenvalues. Clearly, the individual pair-wise covariances obtained in section 3.1 are not going to form such a matrix automatically, and the covariance matrix obtained by equation 10 is not PSD either. Hence, a projection from the original covariances onto the PSD cone is necessary to make the matrix usable. The covariance matrix $C$ should be as close to a PSD matrix $\Sigma$ as possible, which is represented by the following mean squared error objective function:

$$\min_{\Sigma \succeq 0} \sum_{i,j} \left(c_{ij} - \Sigma_{ij}\right)^2$$

This can be related to the indirect common features by substituting $c_{ij}$ with $\psi^T F_{ij}$, which gives the objective function

$$\min_{\Sigma \succeq 0} \sum_{i,j} \left(\psi^T F_{ij} - \Sigma_{ij}\right)^2$$

Note that, unlike in equation 10, where $\psi$ is the target to be learned, $\psi$ is a fixed vector here.

As we can see so far, for each concern of how to learn a good covariance matrix, an objective function is found. Solving the first and the second in sequence is less effective and less efficient than solving them as a combined objective function: after the first step the learned covariance matrix $C$ can be highly indefinite, so at the second step many entries need to be adjusted to satisfy the PSD constraints, and the knowledge learned in the first step is wasted and has to be learned again. By combining the two objective functions into one, the PSD constraints are taken into account while learning $\psi$. Therefore, the overall objective function becomes a joint optimization problem and can be represented as:

$$\min_{\psi,\; \Sigma \succeq 0} \; \lambda \sum_{(i,j) \in K} \left(c_{ij} - \psi^T F_{ij}\right)^2 + (1 - \lambda) \sum_{i,j} \left(\psi^T F_{ij} - \Sigma_{ij}\right)^2 \qquad (13)$$

where $\lambda$ is the trade-off coefficient between the two sub-objectives. As $\lambda$ goes to 0, only the PSD constraints are taken care of, and as $\lambda$ goes to 1, only the word pair relationship constraints are taken care of. We set $\lambda = 0.6$ in this research, which is a good trade-off coefficient learned empirically.

The joint optimization problem in equation 13 can be solved by a minimization-minimization procedure, fixing one argument and minimizing over the other. In our case, the objective is alternately minimized over $\psi$ with $\Sigma$ fixed and over $\Sigma$ with $\psi$ fixed. When minimizing over $\psi$, quadratic programming (QP) is sufficient. Many QP solvers are available and can be easily obtained (see the package at http://control.ee.ethz.ch/~joloef/yalmip.php). When minimizing over $\Sigma$, the problem is a special semi-definite program (SDP), which can be solved easily by performing an eigen-decomposition and keeping the nonnegative eigenvalues, and which can be done in any standard SDP solver.

Since equation 13 is convex, which can be proved, a global minimum exists. Therefore, the minimization-minimization procedure repeats the two minimization steps until convergence, which is guaranteed.

Given that there is no training data available in the target domain, transfer learning is the only choice besides manually tagging a corpus. The most naive way of transfer learning is to train a model on some readily available external domain's data, using the external domain's vocabulary to create unigram or bigram features, and to test it on the test corpus, in the hope that some unigram and bigram features are also present in the test corpus. However, different features play different roles in different domains. For example, "movie" is a key word feature that appears in many opinion sentences within the movie review domain, while it is definitely not a key feature for opinion detection in product reviews, since there is a low probability that a sentence talking about a movie is an opinion about some product, for example, a Canon camera. There is an obvious bias between the training set and the test set, and hence this will not result in a very good opinion detection rate. However, this is the baseline transfer learning method used in our experiments since it is the simplest.

Another straightforward way of doing transfer learning is to also use word features from other domains but, instead of using word features from just a single domain, to use common word features appearing in multiple external domains. The purpose is to find word features which are common enough to appear in every opinion-related corpus. For example, in both movie reviews and product reviews, "good" and "I like" indicate a positive opinion, and "disappointed" and "hate" indicate a negative opinion. If only these common opinion-related features are extracted and kept in the vocabulary, the severe bias existing in the above approach is resolved. This is the approach that we used in our submission to the TREC 2006 Blog track [6], and it is one of the experimental options in a later section.

The approach used in this paper is to get a common prior, which carries the common knowledge embedded in different opinion-related corpora, for the logistic regression coefficients. The prior is represented as a Gaussian distribution with non-diagonal covariance, which can be used to represent the word-to-word relationships that are absent in the above two approaches, which treat word features as independent and identically distributed (i.i.d.). This third approach is described in section 3; it forms the word-to-word relationship as a function of indirect common features across different opinion-related corpora. Which indirect features are good for opinion detection is investigated here.

One prominent phenomenon in opinions, and also one of the difficult parts of opinion detection, is that people do not always use "blah blah is good" or "awesome blah blah!" to express opinions; instead, different opinion targets relate to their own customary opinion expressions. For example, we usually say "A person is knowledgeable" and "A computer processor is fast", not "A person is fast" and "A computer processor is knowledgeable". Target-specific opinions are not well identified by a simple word polarity test either. For example, "A computer processor is running like a horse". There are no positive or negative adjectives in the sentence, and a polarity test will say this is not an opinion even though it is.

To model the correspondence between a target and its customary opinion expression, a feature, which is a (target, opinion) pair, is designed to explicitly formulate this correspondence and is kept in the prior covariance matrix. To do so, in the training corpus, "subject and object" pairs, "subject and predicate" pairs and "modifier and subject" pairs are extracted from positive sentences (opinions). In the testing corpus, if one such pair is observed, the corresponding feature is switched on.

Another important feature is word synonyms. If only "This movie is good" is observed in the training corpus, and the testing corpus has a sentence that says "The film is really good", a good opinion detection algorithm should be able to detect the second sentence as an opinion; without synonym information, however, this is not possible. In the setting of Gaussian logistic regression, each entry in the prior covariance matrix can be represented as a linear interpolation of several indirect features. Similar to the "target-opinion" pair described above, whether two words are within the same WordNet [4] synset is also treated as a feature reflected in the covariance matrix. More specifically, if two words appear in the same WordNet synset of the first sense of either noun, verb or adjective, their corresponding feature value is set to 1.

By considering the above two word pair features, the feature vector $F$ that appears in equation 13 can be written as

$$F_{ij} = [1, CO_{ij}, S_{ij}, TO_{ij}]$$

where $CO_{ij}$ is the log co-occurrence of the two words $i$ and $j$ within sentences, $S_{ij}$ is 1 if the two words $i$ and $j$ are in the same WordNet synset and 0 otherwise, and $TO_{ij}$ is the target-opinion pair feature described above.
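The steps described in sections 3 and 4 (building the feature vector F_ij, learning ψ from a small set of known covariances, and keeping Σ positive semi-definite) can be illustrated with a minimal NumPy sketch. It is not the system used in the experiments: the function names, the toy counts and the toy covariances are hypothetical, `log1p` stands in for the log co-occurrence to avoid log 0, TO_ij is assumed to be a 0/1 indicator, and the ψ step uses unconstrained least squares in place of a QP solver.

```python
import numpy as np

def feature_vector(co_count, same_synset, target_opinion):
    """F_ij = [1, CO_ij, S_ij, TO_ij]; log1p and the 0/1 TO_ij are assumptions."""
    return np.array([1.0, np.log1p(co_count), float(same_synset), float(target_opinion)])

def project_to_psd(C):
    """Closest PSD matrix in Frobenius norm: symmetrize, then clip negative eigenvalues."""
    sym = (C + C.T) / 2.0
    eigvals, eigvecs = np.linalg.eigh(sym)
    return (eigvecs * np.clip(eigvals, 0.0, None)) @ eigvecs.T

def learn_prior_covariance(F, known_pairs, c_known, lam=0.6, n_iter=20):
    """Alternate between the psi step and the Sigma step of the joint objective.

    F           : (n, n, d) array holding F_ij for every word pair
    known_pairs : list of (i, j) pairs in K with directly estimated covariances
    c_known     : the covariances c_ij for those pairs
    """
    n, _, d = F.shape
    F_all = F.reshape(n * n, d)                      # row i*n + j holds F_ij
    F_k = np.array([F[i, j] for i, j in known_pairs])
    c_k = np.asarray(c_known, dtype=float)

    # Initialise psi from the known pairs only, then project the implied matrix.
    psi, *_ = np.linalg.lstsq(F_k, c_k, rcond=None)
    sigma = project_to_psd((F_all @ psi).reshape(n, n))

    for _ in range(n_iter):
        # psi step (Sigma fixed): weighted least squares over both sub-objectives.
        A = np.vstack([np.sqrt(lam) * F_k, np.sqrt(1.0 - lam) * F_all])
        b = np.concatenate([np.sqrt(lam) * c_k, np.sqrt(1.0 - lam) * sigma.ravel()])
        psi, *_ = np.linalg.lstsq(A, b, rcond=None)
        # Sigma step (psi fixed): project the matrix of psi^T F_ij onto the PSD cone.
        sigma = project_to_psd((F_all @ psi).reshape(n, n))
    return psi, sigma

# Toy data for three words, e.g. "movie", "film", "good" (all numbers are made up).
co = np.array([[0, 2, 5], [2, 0, 4], [5, 4, 0]])     # sentence co-occurrence counts
syn = np.array([[1, 1, 0], [1, 1, 0], [0, 0, 1]])    # same WordNet synset?
to = np.array([[0, 0, 1], [0, 0, 1], [1, 1, 0]])     # observed target-opinion pair?
F = np.array([[feature_vector(co[i, j], syn[i, j], to[i, j]) for j in range(3)]
              for i in range(3)])

known_pairs = [(0, 0), (1, 1), (2, 2), (0, 1), (0, 2)]
c_known = [1.0, 1.0, 1.0, 0.8, 0.3]
psi, sigma = learn_prior_covariance(F, known_pairs, c_known)
print("psi:", psi)
print("smallest eigenvalue of Sigma:", np.linalg.eigvalsh(sigma).min())
```

The resulting `sigma` could then serve as the covariance Σ of the Gaussian prior N(0, Σ) on the regression coefficients. Because the ψ step here has no constraints, it reduces to ordinary least squares; a QP solver would be needed only if constraints on ψ were added.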
The TREC 2006 Blog corpus is used in this research. It contains 3,201,002 blog articles (TREC reports 3,215,171), posted during the period from December 2005 to February 2006. The blog posts and the comments are from Technorati, Bloglines, Blogpulse and other web hosts.

Passage retrieval is performed to retrieve the top 5,000 passages (or fewer, if there are no more than 5,000 passages in the corpus for a particular query) for each of the 50 TREC Blog Opinion Retrieval topics. The search engine used in this research is Lemur, which retrieves 132,399 passages in total for the 50 topics, or 2,648 passages per topic on average. The retrieved passages are then separated into sentences, and each sentence is classified as an opinion or non-opinion sentence by Gaussian logistic regression with non-diagonal prior covariance, as reported in the previous sections.

There are two external datasets used in this research as training data. The first is a movie review dataset (footnote 3), prepared by Pang and Lee from Cornell University. There are 10,000 movie review sentences in this dataset in total; 5,000 of them are positive examples and 5,000 are non-opinions. All the movie reviews are extracted from the Internet Movie Database.

The other external dataset is a product review dataset (footnote 5), prepared by Hu and Liu from the University of Illinois at Chicago. There are more than 4,000 product review sentences; among them 2,034 are opinions and 2,173 are non-opinions. The product reviews are extracted from customer comments about 2 brands of digital camera (Canon G3, Nikon Coolpix 4300), 1 brand of jukebox (Creative Labs Nomad Jukebox Zen Xtra 40GB), 1 brand of cellphone (Nokia 6610) and 1 brand of DVD player (Apex AD2600 Progressive-scan DVD player). As we can see, they are mostly reviews of electronic products.

3 http://www.cs.cornell.edu/People/pabo/movie-review-
5 http://www.cs.uic.edu/~liub/FBS/FBS.html

The main purpose of the experiments is to test whether the transfer learning approach used in this research is more effective for opinion detection than two other transfer learning methods. Given that we have no training data from the blog corpus, it is not possible to have a "real" baseline that is trained on the blog dataset and tested on the same dataset. Therefore, the baseline system used in the experiments is a Gaussian logistic regression model, with a zero mean, equal variance prior for regularization, trained on an external dataset and tested directly on the target blog dataset. This method is described in more detail in section 4.

Another purpose is to explore the effectiveness of different settings for the current approach. For example, we know that transfer learning is helpful when there is no training data in a certain domain, but how should a good external dataset be chosen as the auxiliary domain? Do multiple external datasets improve the prediction accuracy more than a single one, as usually happens in non-transfer learning, where the more training data there is, the better the prediction performance? As another example, since we do not directly use word features in calculating the non-diagonal prior covariance, what are good indirect features for calculating it? Is the WordNet synset feature better than the target-opinion feature (see section 4)? The experiments conducted in this research answer these questions in the following sections.

The evaluation metrics used in the experiments are precision at different recall levels and mean average precision (MAP). The answers are provided by the TREC qrels, which give the document numbers of those documents that contain an opinion and are related to the Blog opinion retrieval topics. Note that our system is developed for opinion detection at the sentence level, and the averaged score of all the sentences in a retrieved passage, which is part of a document, is returned as the final score. Therefore, to use the TREC qrels for evaluation, we simply extract the unique document numbers that appear in our returned passages, which are ranked by score.

6.1 Effects of Using Non-diagonal Covariance

This experiment compares the following three settings:

* Baseline: Using movie reviews to train the Gaussian logistic regression model with zero mean and equal variance. The vocabulary consists of unigrams and bigrams from movie reviews. The model is directly tested on blog review data.

* Simple feature selection: Using movie reviews and product reviews to train the Gaussian logistic regression model with zero mean and equal variance. The vocabulary consists of the common unigrams and bigrams from both domains. The model is tested on blog review data.

* The proposed approach: Using movie reviews to calculate the prior covariance and train the logistic regression model with the informative prior. The vocabulary is from the blog corpus and is different for each retrieval topic, based on the unigrams and bigrams in the 5,000 retrieved passages. The model is tested on blog review data.

[Figure 1: Comparison of Different Settings of Logistic Regression. Legend: Baseline Logistic Regression; Logistic Regression w Feature Selection]

[Table 1: Mean Average Precision of Transfer Learning]

Figure 1 shows the precision at each recall level for the three tested approaches. As we can see, the approach used in this research gives the best precision at all of the 11-point recall levels. The simple feature selection method also performs better than the baseline system, which indicates that by removing the bias introduced by a single domain of data, the prediction accuracy of transfer learning is improved. It is also obvious that the current approach is a more advanced way of learning task-related common knowledge than simple feature selection.

Table 1 shows the non-interpolated mean average precision of the 3 approaches. Previous research [11] reported that the proposed approach could achieve an improvement of 20%-40% for the text classification task. For our task, we see an improvement of 32% in non-interpolated mean average precision from the baseline to the current approach. Both the experiments in opinion detection and those in text classification show that constructing a non-diagonal prior covariance matrix to incorporate external knowledge is a good way to boost the performance of Gaussian logistic regression for transfer learning.

Target-opinion word pairs and WordNet synonyms are the two main features used in this project. It is reported [11][10] that the WordNet synset feature is very effective for the text classification task: by just using that feature, a 20%-40% improvement in text classification could be observed. Since opinion detection uses text classification techniques, it should be possible to observe similar effects. However, opinion detection is not purely text classification; it is not topic-wise classification, but a binary classification into opinions and non-opinions. Therefore, the WordNet synset feature may not be effective for our task. In section 4, we introduced a feature specially designed for the task of opinion detection, the "Target-Opinion" word pair. Each opinion is about a certain target, and this target usually has its own customary way of expressing opinions about it. There is a clear relationship between the target and the opinion about it.

[Figure legend entries: Transfer Learning by Using Wordnet Synset; Transfer Learning by Using Target-Opinion Pair; Transfer Learning by Using Product Review]

[Table 2: Mean Average Precision of Transfer Learning Using Different Features]

Figure 2 shows the results of an experiment which compares three cases: using just the WordNet synset to create the informative prior, using just target-opinion pairs to create the informative prior, and using both of them. It can be seen that applying the proposed approach with the "Target-Opinion" pair as the single feature does better than using the WordNet synset alone. When both features are used to construct the informative prior covariance, MAP reaches the best performance that the current approach can achieve. Table 2 shows that using the target-opinion pair alone gives a 27% improvement over the baseline and a further 10% improvement over using the WordNet synset alone. This supports our hypothesis that the "Target-Opinion" feature is more suitable for the task of opinion detection. The WordNet synset feature also contributes to the improvement in overall performance, but sometimes, for example at recall level 0.3 in Figure 2, there is no improvement from the baseline to using the WordNet synset alone. This does not mean that it is a bad feature, but it gives us a hint that the WordNet synset will not always be effective for the task of opinion detection.

6.3 Effects on External Dataset Selection

In our TREC-2006 submission, we selected common unigram and bigram features from both the movie review and product review domains, in the belief that the intersection could capture the features common to different domains as long as the task is the same, in this case opinion detection. It is natural to extend this thought and apply it to the approach used in this research, i.e., using both movie reviews and product reviews to train the Gaussian logistic regression model and also using both of them to calculate the prior covariance.

Figure 3 shows the mean average precision at the 11-point recall levels when applying the current approach with different external datasets. Surprisingly, using the movie domain alone gives the best performance. Using product reviews to train the model results in a performance drop compared with using both domains, and using both does not show the additive improvement that we expected. In this case, the negative effect of transfer learning is observed. It tells us that even though transfer learning is effective, sometimes it will not help much if a bad external dataset is selected.

[Figure 4: blog topic category distribution]

In our case, the blog domain (target domain) covers more general topics, as shown in Figure 4. The movie domain (training domain) talks mainly about movies, but also about the people, objects and organizations in the movies, and hence matches the blog domain better. On the other hand, the product domain concentrates on customer reviews of several electronic products; it only helps a certain type of topic in blog opinion detection, not all of them. The experiment tells us that selecting a good external dataset is very important to avoid the negative effect of transfer learning.

This paper describes a transfer learning approach which incorporates common knowledge for the same task from external domains as a non-diagonal informative prior covariance matrix. It provides a way to solve the problem of having too little training data, or even no training data, from the target domain.

The approach is adapted to the task of opinion detection, which has recently become a very interesting research topic. In our TREC-2006 system, opinion detection is separated into two sub-tasks, passage retrieval and text classification. The passage retrieval engine searches for passages related to the query topics and returns them ranked by confidence score. Text classification is a binary classification problem, either opinion or non-opinion. Sentences are the unit on which this classification is performed. Gaussian logistic regression is used as the general framework. In the proposed approach, an informative prior covariance matrix is constructed by incorporating external knowledge of "Target-Opinion" word pairs and WordNet synset information. The results of the experiments show that this is an effective approach, achieving a 32% improvement in mean average precision.

There are two main contributions of this work to the general communities of machine learning and opinion detection: first, it solves the problem of how to perform opinion detection for certain domains with no labelled training data; second, it studies and extends transfer learning to opinion detection and explores important features for this task.

Future work will be a natural extension of the current work. In the experiment on the effect of different external datasets, we found that different datasets help the precision of opinion detection for different blog topics. Therefore, if we do blog topic classification and then use different external datasets as training data for each topic category, a greater improvement over the baseline should be possible.

[1] B. Pang, L. Lee, and S. Vaithyanathan. Thumbs up? Sentiment classification using machine learning techniques. In Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing.
[2] J. Baxter. A Bayesian/information theoretic model of learning to learn via multiple task sampling. Machine Learning, 1997.
[3] B. Pang and L. Lee. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In Proceedings of ACL 2004. ACL.
[4] C. Fellbaum, R. Al-Halimi, et al. WordNet: An Electronic Lexical Database. MIT Press.
[5] E. Riloff and J. Wiebe. Learning extraction patterns for subjective expressions. In Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing. EMNLP, 2003.
[6] H. Yang, L. Si, and J. Callan. Knowledge transfer and opinion detection in the TREC2006 blog track. In Notebook of Text REtrieval Conference 2006. TREC.
[7] I. Ounis, M. de Rijke, C. Macdonald, G. Mishne, and I. Soboroff. Overview of the TREC-2006 blog track. In Notebook of Text REtrieval Conference 2006. TREC, Nov 2006.
[8] K. Yu, V. Tresp, and A. Schwaighofer. Learning Gaussian processes from multiple tasks. In Proceedings of ICML 2005.
[9] N. D. Lawrence and J. C. Platt. Learning to learn with the informative vector machine. In Proceedings of ICML.
[10] R. Ando and T. Zhang. A framework for learning predictive structure from multiple tasks and unlabeled data. ACM Journal of Machine Learning Research, May.
[11] R. Raina, A. Y. Ng, and D. Koller. Transfer learning by constructing informative priors. In Proceedings of the Twenty-second International Conference on Machine Learning. ICML, March 2006.
[12] K. T. T. F. S. Morinaga, K. Yamanishi. Mining product reputations on the web. In Proceedings ofSIGKDD 2002. SIGKDD, 2002.
[13] S. Thrun. Is learning the n-th thing any easier than learning the first? In Proceedings of NIPS 1996. NIPS,1996.
[14] P. D. Turney. Thumbs up or thumbs down? semantic orientation applied to unsupervised classification ofreviews. In Proceedings of ACL 2002. ACL, July 2002.
APPENDIX
A. TREC-2006 BLOG TRACK TOPICS
march of the penguins
larry summers
state of the union speech
ann coulter
abramoff bush
macbook pro
jon stewart
super bowl ads
letting india into the club
arrested development
mardi gras
blackberry
netflix
colbert report
basque
whole foods
cheney hunting
joint strike fighter
muhammad cartoon
barry bonds
cindy sheehan
brokeback mountain
bruce bartlett
coretta scott king
american idol
life on mars
sonic
jihad
hybrid car
natalie portman
fox news report
seahawks
heineken
qualcomm
shimano
west wing
world trade organization
audi
scientology
olympics
intel
jim moran
zyrtec
board chess
oprah

Source: http://boston.lti.cs.cmu.edu/huiyang/11-742/report.pdf
