Opinion Detection by Transfer Learning
5000 Forbes Ave, Pittsburgh, PA, USA, 15213

Opinion detection is the main task of the TREC 2006 Blog track, which identifies opinions in text documents from the TREC blog corpus. Because this is the first year of the task, no training data is provided. Knowledge about how people express opinions in other domains, for example movie reviews, product reviews and book reviews, is the best available training data for opinion detection in the blog domain. This work describes how to apply transfer learning to opinion detection. A Bayesian logistic regression framework is used, and knowledge from training data in other domains is captured by a non-diagonal prior covariance matrix. The experimental results show that the approach is effective and achieves an improvement of 32% over the baseline.

H.4 [Information Systems Applications]: Miscellaneous

1. INTRODUCTION
Opinion detection is an emerging topic that attracts growing interest from researchers in data mining and natural language processing [1][3][5][12][14]. Given a document, the opinion detection task identifies and extracts the opinionated expressions about a certain topic. Some opinions are expressed in a general way, as in "I really like this work"; hence words with sentiment polarity play an important role in recognizing the presence of an opinion. On the other hand, many opinions have their own way of expression, for example, "Watching the film is like reading a times portrait of grief that keeps shifting focus to the journalist who wrote it". Given the great variety and complexity of human language, opinion detection is a challenging job.

In 2006, the Text REtrieval Conference (TREC) started a new track to study research topics in the blog domain, and opinion detection in blogs is the main task [7]. Since it is the first year and blog data is new to the research community, there is a lack of training data. Without training data from the blog corpus, simple supervised learning is not possible. The question is then how to transfer knowledge about opinions from other domains, which have labelled training data.

This paper attempts to use transfer learning techniques [2][8][9][10][11][13] to incorporate common features for opinion detection across different domains, in order to solve the problem of having no training data. Bayesian logistic regression is the main framework used. The common knowledge is encoded as a non-diagonal covariance matrix for the prior of the regression coefficients. The prior learned from movie and product reviews is used to estimate whether a sentence in the blog domain is an opinion or not. Moreover, unlike the classic text classification task, opinion detection has its own effective features for classification. This paper also describes "Target-Opinion" word pairs and word synonyms and their effects on opinion detection.

The remainder of this paper is organized as follows. Section 2 gives a brief literature review of transfer learning and opinion detection and explains the existing work done during TREC 2006. Section 3 details the transfer learning algorithm used in the opinion detection process. Section 4 explains feature selection for opinion detection. Section 5 describes the datasets used in this research. Section 6 presents the evaluation and experimental results and analyzes them. Section 7 concludes the paper. Appendix A lists the evaluated query topics.

2. RELATED WORK
2.1 Opinion Detection
Researchers in the Natural Language Processing (NLP) community are the pioneers of the opinion detection task. Turney [14] groups online words whose pointwise mutual information is close to two words, "excellent" and "poor", and then uses them to detect opinions and sentiment polarity. Riloff and Wiebe [5] use a high-precision classifier to get high-quality opinion and non-opinion sentences, extract surface text patterns from those sentences to find more opinions and non-opinions, and repeat this process to bootstrap. Pang et al. [1] treat opinion and sentiment detection as a text classification problem and use classical classification methods, such as Naive Bayes, Maximum Entropy and Support Vector Machines, with word unigrams to predict them. In another work, Pang and Lee [3] use minimum cuts to cluster sentences based on their subjectivity and sentiment orientation. Researchers from the data mining community also study the problem of opinion mining. Morinaga et al. [12] use word polarity and syntactic pattern matching rules to extract opinions. They also use principal component analysis to create a correspondence between product names and keywords, with the distance on a map showing the closeness.

We participated in the TREC 2006 Blog track evaluation. The main task is opinion detection in the blog domain. The system [6] is divided into two main parts: passage retrieval and opinion classification. During passage retrieval, the topics provided by NIST are parsed and query expansion is done before the topics are sent as queries to the Lemur search engine. Documents in the corpus are segmented into passages of around 100 words, which are the retrieval units for the search engine. The top 5,000 passages returned by Lemur are then sent to a binary text classification program and classified into opinions and non-opinions based on the average of their sentence-level subjectivity scores. The performance of the system is among the top five participating groups.

Transfer learning learns from other related tasks and applies the learned model to the current task. The most general form of transfer learning is to learn similar tasks from one domain to another, so that the "knowledge" is transferred from one to the other. In the early research on transfer learning, Baxter [2] and Thrun [13] both used hierarchical Bayesian learning methods to tackle this problem. In recent years, Lawrence and Platt [9] and Yu et al. [8] also used hierarchical Bayesian models to learn hyper-parameters. Ando and Zhang [10] proposed a framework for Gaussian logistic regression with transfer learning for classification and also provided a theoretical proof for transfer learning in this setting. They learned from multiple tasks to form a good classifier and applied it to other similar tasks. Raina et al. [11] continued this approach and built informative priors for Gaussian logistic regression. These informative priors correspond to the hyper-parameters in other approaches. We follow Raina et al.'s approach closely and adapt it to the opinion detection task.

After retrieving 5,000 passages for each topic, sentence segmentation is done for each passage. Although the document is the evaluation unit in TREC assessment, the sentence is a more natural unit for opinion detection, because different opinions can be present in the same document but are much less likely to be present in the same sentence. Therefore, the sentence is selected as the basic unit.

The remaining task is to identify which sentences contain opinions and which do not. It can be considered a binary classification problem. Bayesian logistic regression is the framework used here. Each sentence is represented as $X = [x_1, x_2, \ldots, x_n]$, where $n$ is the total number of word features $x_i$. The entire dataset is represented as $\mathbf{X} = \{X^{(1)}, X^{(2)}, \ldots, X^{(m)}\}$, where $m$ is the total number of sentences. The class label of a sentence is either opinion or non-opinion, and is represented by $Y \in \{0, 1\}$.

Logistic regression assumes a sigmoid-like data distribution and predicts the class label according to the following formula:

$$P(Y = 1 \mid X = x, \theta) = \frac{1}{1 + \exp(-\theta^T x)}$$

where $\theta$ is the vector of regression coefficients. It is usually learned by coordinate descent, and a global optimum is guaranteed.

However, logistic regression, like many other classification and regression algorithms, suffers from overfitting. Usually, when large regression coefficients are observed, prediction accuracy is very sensitive to the test data, and overfitting occurs. To avoid this problem, a multivariate Gaussian prior is usually added on $\theta$. For simplicity, zero mean and equal variance are assumed. Hence the prior is $N(0, \sigma^2 I)$ and the objective function becomes:

$$l(Y = 1 \mid X; \lambda) = \sum_{i=1}^{m} \left[ y_i f(x_i) - \log\left(1 + \exp(f(x_i))\right) \right] - \frac{\lambda}{2}\, \theta^T \theta \qquad (2)$$

where $f(x) = \theta^T x$, $x_i$ and $y_i$ here denote the feature vector and label of the $i$-th sentence, and $\lambda = 1/\sigma^2$. The Maximum A Posteriori (MAP) estimation is:

$$\theta^* = \arg\max_{\theta}\; l(Y = 1 \mid X; \lambda)$$

where $\theta^*$ is the MAP estimate of $\theta$.

The above prior is the most common one and is used in many research problems. It assumes equal variances for all the features, which is not valid in real-world settings. Hence, a general prior with non-diagonal covariance, $N(0, \Sigma)$, is used in this research. The MAP estimation becomes:

$$\theta^* = \arg\max_{\theta}\; \sum_{i=1}^{m} \left[ y_i f(x_i) - \log\left(1 + \exp(f(x_i))\right) \right] - \frac{1}{2}\, \theta^T \Sigma^{-1} \theta$$

To apply the above formula, the value of $\mathrm{cov}(\theta_i, \theta_j)$ is required for every pair of regression coefficients $(\theta_i, \theta_j)$. By the definition of covariance, it is the difference between the expected product $E[\theta_i \theta_j]$ and the product of the individual expectations $E[\theta_i]$ and $E[\theta_j]$:

$$\mathrm{cov}(\theta_i, \theta_j) = E[\theta_i \theta_j] - E[\theta_i]\,E[\theta_j]$$

Given that the prior's mean is 0, both individual expected values equal 0, i.e., $E[\theta_i] = E[\theta_j] = 0$. Therefore, the covariance of any two regression coefficients becomes:

$$\mathrm{cov}(\theta_i, \theta_j) = E[\theta_i \theta_j]$$

which is just the expected product of those two coefficients.

3.1 MCMC for Covariance of Pair-wise Coefficients
The covariance of a pair of regression coefficients can be obtained by a Markov Chain Monte Carlo (MCMC) method. The true covariance cannot be obtained, but it can be closely estimated by the sample covariance. MCMC suggests sampling several small vocabularies, each containing the two words corresponding to $\theta_i$ and $\theta_j$. For each small vocabulary, an ordinary logistic regression model, whose objective function is defined in equation 2, is trained on each of several training sets. The sample covariance is obtained by going through every training set and vocabulary:

$$\text{sample covariance}(\theta_i, \theta_j) = \frac{1}{VT} \sum_{v=1}^{V} \sum_{t=1}^{T} \theta_i^{(v,t)}\, \theta_j^{(v,t)}$$

where $V$ is the number of vocabularies and $T$ is the number of training sets for each vocabulary.

Hence the covariance is due to the randomness of both the vocabularies and the training sets. However, only the covariance due to vocabulary change is desired in our case. A correction step is therefore performed by subtracting a bootstrap estimate of the covariance due to the randomness of the training sets:

$$\mathrm{cov}(\theta_i, \theta_j) = \text{sample covariance}(\theta_i, \theta_j) - \frac{1}{VT} \sum_{v=1}^{V} \sum_{t=1}^{T} \left(\theta_i^{(v,t)} - \bar{\theta}_i^{(v)}\right)\left(\theta_j^{(v,t)} - \bar{\theta}_j^{(v)}\right)$$

where $\bar{\theta}_i^{(v)}$ and $\bar{\theta}_j^{(v)}$ are the sample means of the regression coefficients for each vocabulary across different training sets.

With the above calculation, the covariance of each pair of regression coefficients can be obtained. However, since the number of regression coefficients equals the number of word features, the total amount of computation is huge and not feasible. Therefore, a smarter way that calculates only a small number of pair-wise covariances is necessary. Moreover, individual pair-wise covariances can only be used to estimate the relationship between two words, whereas what is needed is an estimate of the relationship among all the words. In other words, what is needed is a covariance matrix.

As pointed out in the previous section, it is extremely inefficient to calculate individual covariances for every pair of word features. Instead, learning indirect common features and representing the word features in terms of those features dramatically reduces the amount of computation. In this way, pair-wise covariances need to be calculated for only a small fraction of word pairs, and the covariances of the remaining word pairs can be estimated by a transformation from their indirect features. Therefore, the problem of learning an individual covariance for each word pair is turned into the problem of learning the correspondence between an underlying common feature, which is shared by many word pairs, and a word pair itself. Mathematically, if the indirect common features are defined as a feature vector $F_{ij}$, and the small fraction of covariances is defined as $C$, in which all the values are calculated by the method given in section 3.1 and are represented by $c_{ij}$, the objective function to learn the correspondence $\psi$ is given in the following least squares form:

$$\min_{\psi} \sum_{(i,j) \in K} \left(c_{ij} - \psi^T F_{ij}\right)^2 \qquad (10)$$

where $K$ is the set of word pairs whose covariances are calculated by the method in section 3.1.

By learning the correspondence between the word features and the indirect common features, i.e., by learning $\psi$, the entire covariance matrix $C$ can be estimated by computing its $(i, j)$th element as:

$$c_{ij} = \psi^T F_{ij}$$

A valid covariance matrix needs to be positive semi-definite (PSD), that is, a Hermitian matrix with all of its eigenvalues nonnegative. In other words, it needs to be a square, self-adjoint matrix with nonnegative eigenvalues. Clearly, the individual pair-wise covariances obtained in section 3.1 do not automatically form such a matrix, and the covariance matrix obtained by equation 10 is not guaranteed to be PSD either. Hence, a projection from the original covariances onto the PSD cone is necessary to make the matrix usable. The covariance matrix $C$ should be as close to a PSD matrix $\Sigma$ as possible, which is represented in the following mean squared error objective function:

$$\min_{\Sigma} \sum_{i,j} \left(c_{ij} - \Sigma_{ij}\right)^2 \quad \text{s.t. } \Sigma \succeq 0$$

This can be related to the indirect common features by substituting $c_{ij}$ with $\psi^T F_{ij}$, and the objective function for obtaining $\Sigma$ becomes:

$$\min_{\Sigma} \sum_{i,j} \left(\psi^T F_{ij} - \Sigma_{ij}\right)^2 \quad \text{s.t. } \Sigma \succeq 0$$

Note that, unlike in equation 10, where $\psi$ is the target to be learned, $\psi$ is now a vector of fixed values.

So far, an objective function has been found for each concern in learning a good covariance matrix. Solving the first and the second in sequence is less effective and less efficient than solving them as a combined objective function: at the first step, the learned covariance matrix $C$ can be highly indefinite, so at the second step many entries need to be adjusted to satisfy the PSD constraint, and the knowledge learned in the first step is wasted and has to be learned again. By combining the two objective functions into one, the PSD constraint is taken into account while $\psi$ is being learned. Therefore, the overall objective function becomes a joint optimization problem and can be represented as:

$$\min_{\psi,\, \Sigma}\; \lambda \sum_{(i,j) \in K} \left(c_{ij} - \psi^T F_{ij}\right)^2 + (1 - \lambda) \sum_{i,j} \left(\Sigma_{ij} - \psi^T F_{ij}\right)^2 \quad \text{s.t. } \Sigma \succeq 0 \qquad (13)$$
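
As a minimal sketch of the two sub-objectives above, the following NumPy code fits $\psi$ by ordinary least squares on a few sampled pairs, estimates every covariance entry as $\psi^T F_{ij}$, and projects the result onto the PSD cone by clipping negative eigenvalues. The function names (`fit_psi`, `estimate_covariance`, `project_to_psd`) and the toy numbers are hypothetical, and plain least squares and eigenvalue clipping are assumed stand-ins for whatever solvers were actually used. The sketch treats the two steps separately and does not implement the joint objective in equation 13.

```python
import numpy as np


def fit_psi(F_K, c_K):
    """Least-squares fit of psi so that F_K @ psi approximates c_K.

    F_K: (k, d) array, one indirect feature vector F_ij per sampled word pair.
    c_K: (k,) array of covariances c_ij computed directly for those pairs.
    """
    psi, *_ = np.linalg.lstsq(F_K, c_K, rcond=None)
    return psi


def estimate_covariance(F_all, psi):
    """Estimate the full matrix: entry (i, j) is psi^T F_ij.

    F_all: (n, n, d) array holding F_ij for every word pair.
    """
    return F_all @ psi


def project_to_psd(C):
    """Nearest PSD matrix in Frobenius norm: symmetrize, clip eigenvalues."""
    sym = (C + C.T) / 2.0
    eigvals, eigvecs = np.linalg.eigh(sym)
    clipped = np.clip(eigvals, 0.0, None)
    # Scale each eigenvector (column) by its clipped eigenvalue and recombine.
    return (eigvecs * clipped) @ eigvecs.T


# Toy data (hypothetical): features are [bias, one indirect feature].
F_K = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
c_K = np.array([0.1, 0.3, 0.5])          # generated as 0.1 + 0.2 * feature
psi = fit_psi(F_K, c_K)

# Two words: diagonal pairs have feature value 1, the off-diagonal pair has 2.
F_all = np.array([[[1.0, 1.0], [1.0, 2.0]],
                  [[1.0, 2.0], [1.0, 1.0]]])
C_hat = estimate_covariance(F_all, psi)  # indefinite: eigenvalues 0.8 and -0.2
sigma = project_to_psd(C_hat)

print(psi)
print(C_hat)
print(sigma)
```

For a symmetric input, clipping the negative eigenvalues gives the closest PSD matrix in Frobenius norm, which is why this step needs no general-purpose solver in the sketch.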

In equation 13, $\lambda$ is the trade-off coefficient between the two sub-objectives. As $\lambda$ goes to 0, only the PSD constraint is taken care of, and as $\lambda$ goes to 1, only the word pair relationship constraints are taken care of. We set $\lambda = 0.6$ in this research, a trade-off value chosen empirically.

The joint optimization problem in equation 13 can be solved by a minimization-minimization procedure that fixes one argument and minimizes over the other. In our case, $\psi$ and $\Sigma$ are minimized alternately: $\psi$ is minimized over while $\Sigma$ is fixed, and $\Sigma$ is minimized over while $\psi$ is fixed. When minimizing over $\psi$, quadratic programming (QP) is sufficient. Many QP solvers² are available and can be easily obtained. When minimizing over $\Sigma$, the problem is a special semi-definite program (SDP), which can be solved by performing an eigen-decomposition and keeping the nonnegative eigenvalues, and which any standard SDP solver can handle.

Since equation 13 is convex, which can be proved, a global minimum exists. Therefore, the minimization-minimization procedure repeats the two minimization steps until convergence, which is guaranteed.

Given that there is no training data available in the target domain, transfer learning is the only choice besides manually tagging a corpus. The most naive way of transfer learning is to train a model on some readily available external domain's data, using the external domain's vocabulary to create unigram or bigram features, and to test it on the test corpus in the hope that some unigram and bigram features are also present there. However, features play different roles in different domains. For example, "movie" is a key word feature that appears in many opinion sentences within the movie review domain, while it is definitely not a key feature for opinion detection in product reviews, since a sentence talking about movies has a low probability of being an opinion about some product, for example a Canon camera. There is an obvious bias between the training set and the test set, and hence this approach will not result in a very good opinion detection rate. Nevertheless, this is the baseline transfer learning method used in our experiments, since it is the simplest.

Another straightforward way of doing transfer learning is to also use word features from other domains but, instead of using word features from just a single domain, to use common word features that appear in multiple external domains. The purpose is to find word features that are common enough to appear in every opinion-related corpus. For example, in both movie reviews and product reviews, "good" and "I like" indicate a positive opinion, and "disappointed" and "hate" indicate a negative opinion. If only these common opinion-related features are extracted and kept in the vocabulary, the severe bias of the above approach is resolved. This is the approach that we used in our submission to the TREC 2006 Blog track [6], and it is one of the experimental settings in a later section.

The approach used in this paper is to obtain a common prior for the logistic regression coefficients, which carries the common knowledge embedded in different opinion-related corpora. The prior is represented as a Gaussian distribution with non-diagonal covariance, which can represent the word-to-word relationships that are absent in the above two approaches, both of which treat word features as independent and identically distributed (i.i.d.). This third approach is described in section 3; it forms the word-to-word relationship as a function of indirect common features across different opinion-related corpora. Which indirect features are good for opinion detection is investigated next.

One prominent phenomenon in opinions, and also one of the difficult parts of opinion detection, is that people do not always use "blah blah is good" or "awesome blah blah!" to express opinions; instead, different opinion targets have their own customary opinion expressions. For example, we usually say "A person is knowledgeable" and "A computer processor is fast", not "A person is fast" and "A computer processor is knowledgeable". Target-specific opinions are also not well identified by a simple word polarity test. Take, for example, "A computer processor is running like a horse". There are no positive or negative adjectives in the sentence, and a polarity test will say this is not an opinion even though it is one.

To model the correspondence between a target and its customary opinion expression, a feature consisting of a (target, opinion) pair is designed to formulate this correspondence explicitly, and it is kept in the prior covariance matrix. To do so, "subject and object" pairs, "subject and predicate" pairs and "modifier and subject" pairs are extracted from the positive (opinion) sentences in the training corpus. In the testing corpus, if one such pair is observed, the corresponding feature value is set to 1.

Another important feature is word synonyms. If only "This movie is good" is observed in the training corpus and the testing corpus contains the sentence "The film is really good", a good opinion detection algorithm should be able to detect the second sentence as an opinion; without synonym information, however, this is not possible. In the setting of Gaussian logistic regression, each entry in the prior covariance matrix can be represented as a linear interpolation of several indirect features. Similar to the "target-opinion" pair described above, whether two words are within the same Wordnet [4] synset is also treated as a feature reflected in the covariance matrix. More specifically, if two words appear in the same Wordnet synset of the first sense of either noun, verb or adjective, the corresponding feature value is set to 1.

By considering the two word pair features above, the feature vector $F$ that appears in equation 13 can be defined as:

$$F_{ij} = [1,\; CO_{ij},\; S_{ij},\; TO_{ij}]$$

where $CO_{ij}$ is the log co-occurrence of the two words $i$ and $j$ within sentences, $S_{ij}$ is 1 if the two words $i$ and $j$ are in the same Wordnet synset and 0 otherwise, and $TO_{ij}$ is the target-opinion pair feature for words $i$ and $j$ described above.

² YALMIP package (http://control.ee.ethz.ch/~joloef/yalmip.php).
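
To make the feature vector $F_{ij} = [1, CO_{ij}, S_{ij}, TO_{ij}]$ concrete, here is a small, hedged Python sketch that builds it for a word pair. Several details are assumptions made only for illustration: co-occurrence is counted once per sentence and smoothed as log(1 + count) to avoid log(0), the `synsets` list is a hand-written stand-in for a real WordNet lookup, and $TO_{ij}$ is assumed to be a 0/1 indicator of whether the pair was extracted as a (target, opinion) pair. The function names and toy sentences are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences):
    """Count, for each unordered word pair, the sentences containing both words."""
    counts = Counter()
    for sentence in sentences:
        words = sorted(set(sentence.lower().split()))
        for pair in combinations(words, 2):
            counts[pair] += 1
    return counts


def feature_vector(w1, w2, counts, synsets, target_opinion_pairs):
    """Return [1, CO_ij, S_ij, TO_ij] for the word pair (w1, w2)."""
    key = tuple(sorted((w1, w2)))
    # Assumption: log(1 + count) so that pairs that never co-occur map to 0.
    co = math.log(1 + counts[key])
    # Stand-in for a WordNet lookup: 1 if some synset contains both words.
    same_synset = 1 if any(w1 in s and w2 in s for s in synsets) else 0
    # Assumption: TO_ij is a 0/1 indicator for an extracted (target, opinion) pair.
    is_target_opinion = 1 if (
        (w1, w2) in target_opinion_pairs or (w2, w1) in target_opinion_pairs
    ) else 0
    return [1.0, co, same_synset, is_target_opinion]


# Toy inputs (hypothetical).
sentences = [
    "this movie is good",
    "the film is really good",
    "the processor is fast",
]
synsets = [{"movie", "film"}]
target_opinion_pairs = {("processor", "fast")}

counts = cooccurrence_counts(sentences)
f_movie_film = feature_vector("movie", "film", counts, synsets, target_opinion_pairs)
f_processor_fast = feature_vector("processor", "fast", counts, synsets, target_opinion_pairs)
print(f_movie_film)
print(f_processor_fast)
```

In this toy example "movie" and "film" never share a sentence, so only the synset feature links them, which is the situation the synonym feature is meant to cover.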
The TREC 2006 Blog corpus is used in this research. It contains 3,201,002 blog articles (TREC reports 3,215,171), which were posted during the period from December 2005 to February 2006. The blog posts and the comments are from Technorati, Bloglines, Blogpulse and other web hosts.

Passage retrieval is performed to retrieve the top 5,000 passages (or fewer, if the corpus contains fewer than 5,000 passages for a particular query) for each of the 50 TREC Blog Opinion Retrieval topics. The search engine used in this research is Lemur, which retrieves 132,399 passages in total for the 50 topics, or 2,648 passages per topic on average. The retrieved passages are then separated into sentences, and each sentence is classified as an opinion or non-opinion sentence by Gaussian logistic regression with a non-diagonal prior covariance, as reported in the previous sections.

Two external datasets are used in this research as training data. The first is a movie review dataset³, prepared by Pang and Lee from Cornell University. There are 10,000 movie review sentences in this dataset in total; 5,000 of them are positive examples (opinions) and 5,000 are non-opinions. All the movie reviews are extracted from the Internet Movie Database.

The other external dataset is a product review dataset⁵, prepared by Hu and Liu from the University of Illinois at Chicago. There are more than 4,000 product review sentences, among which 2,034 are opinions and 2,173 are non-opinions. These product reviews are extracted from customer comments about two digital cameras (Canon G3, Nikon Coolpix 4300), one jukebox (Creative Labs Nomad Jukebox Zen Xtra 40GB), one cellphone (Nokia 6610) and one DVD player (Apex AD2600 Progressive-scan DVD player). As we can see, they are mostly reviews of electronic products.

³ http://www.cs.cornell.edu/People/pabo/movie-review-
⁵ http://www.cs.uic.edu/~liub/FBS/FBS.html

The main purpose of the experiments is to test whether the transfer learning approach used in this research is more effective for opinion detection than two other transfer learning methods. Given that we have no training data from the blog corpus, it is not possible to have a "real" baseline that is trained on the blog dataset and tested on the same dataset. Therefore, the baseline system used in the experiments is a Gaussian logistic regression model, with a zero mean, equal variance prior for regularization, trained on an external dataset and tested directly on the target blog dataset. This method is described in more detail in section 4.

Another purpose is to explore the effectiveness of different settings of the current approach. For example, we know that transfer learning is helpful when there is no training data in a certain domain, but how should a good external dataset be chosen as the auxiliary domain? Do multiple external datasets improve the prediction accuracy more than a single one, given that in non-transfer learning more training data usually means better prediction performance? As another example, since we do not directly use word features in calculating the non-diagonal prior covariance, what are good indirect features for calculating it? Is the Wordnet synset feature better than the target-opinion feature (see section 4)? The experiments conducted in this research answer these questions in the following sections.

The evaluation metrics used in the experiments are precision at different recall levels and mean average precision (MAP). The answers are provided by the TREC qrels, which give the document numbers of those documents that contain an opinion and are related to the Blog opinion retrieval topics. Note that our system is developed for opinion detection at the sentence level, and the averaged score of all the sentences in a retrieved passage, which is a part of a document, is returned as the final score. Therefore, to evaluate with the TREC qrels, we simply extract the unique document numbers that appear in our ranked list of returned passages.

[Figure 1: Comparison of Different Settings of Logistic ... (legend entries: Baseline Logistic Regression; Logistic Regression w Feature Selection)]

[Table 1: Mean Average Precision of Transfer Learning ...]

6.1 Effects of Using Non-diagonal Covariance
This experiment compares the following three settings:

* Baseline: Using movie reviews to train the Gaussian logistic regression model with zero mean and equal variance. The vocabulary is the unigrams and bigrams from movie reviews. The model is tested directly on blog review data.

* Simple feature selection: Using movie reviews and product reviews to train the Gaussian logistic regression model with zero mean and equal variance. The vocabulary is the common unigrams and bigrams from both domains. The model is tested on blog review data.

* The proposed approach: Using movie reviews to calculate the prior covariance and to train the logistic regression model with the informative prior. The vocabulary is from the blog corpus and is different for each retrieval topic, based on the unigrams and bigrams in the 5,000 retrieved passages. The model is tested on blog review data.

Figure 1 shows the precision at each recall level for the three tested approaches. As we can see, the approach used in this research gives the best precision at all 11 recall levels. The simple feature selection method also performs better than the baseline system, which indicates that removing the bias introduced by a single domain of data improves the prediction accuracy of transfer learning. It is also clear that the current approach learns task-related common knowledge more effectively than simple feature selection.

Table 1 shows the non-interpolated mean average precision of the three approaches. Previous research [11] reported that the proposed approach could achieve an improvement of 20%-40% on a text classification task. For our task, we see an improvement of 32% in non-interpolated mean average precision from the baseline to the current approach. The experiments in both opinion detection and text classification show that constructing a non-diagonal prior covariance matrix to incorporate external knowledge is a good way to boost the performance of Gaussian logistic regression for transfer learning.

Target-opinion word pairs and Wordnet synonyms are the two main features used in this project. It is reported that the Wordnet synset feature is very effective for the text classification task [11][10]; by using it alone, a 20%-40% improvement in text classification could be observed. Since opinion detection uses text classification techniques, similar effects should be observable. However, opinion detection is not purely text classification: it is not a topic-wise classification but a binary classification of opinions versus non-opinions. Therefore, the Wordnet synset feature may not be effective for our task. In section 4, we introduced a feature specially designed for the task of opinion detection, the "Target-Opinion" word pair. Each opinion is about a certain target, and this target usually has its own customary way of expressing an opinion about it. There is a clear relationship between the target and the opinion about it. Our hypothesis is that this feature is more suitable for opinion detection than the Wordnet synset feature.

Figure 2 shows the results of an experiment that compares three cases: using just the Wordnet synset to create the informative prior, using just target-opinion pairs to create the informative prior, and using both of them. It can be seen that applying the proposed approach with the "Target-Opinion" pair as the single feature does better than using the Wordnet synset alone. When both features are used to construct the informative prior covariance, mean average precision reaches the best performance that the current approach achieves. Table 2 shows that using the target-opinion pair alone gives a 27% improvement over the baseline and a further 10% improvement over using the Wordnet synset alone. This supports our hypothesis: the "Target-Opinion" feature is more suitable for the task of opinion detection. The Wordnet synset feature also contributes to the improvement of overall performance, but sometimes, for example at recall level 0.3 in Figure 2, there is no improvement from the baseline to using the Wordnet synset alone. This does not mean that it is a bad feature, but it hints that the Wordnet synset will not always be effective for the task of opinion detection.

[Figures 2 and 3: plots not recoverable; legend entries: Transfer Learning by Using Wordnet Synset; Transfer Learning by Using Target-Opinion Pair; Transfer Learning by Using Product Review]

[Table 2: Mean Average Precision of Transfer Learning Using Different Features]

6.3 Effects on External Dataset Selection
In our TREC 2006 submission, we selected common unigram and bigram features from both the movie review and product review domains, in the belief that the intersection could capture the common features across different domains as long as the task is the same, in this case opinion detection. It is natural to extend this thought to apply
it to the approach used in this research, i.e., using both movie reviews and product reviews to train the Gaussian logistic regression model and also using both of them to construct the prior covariance.

Figure 3 shows the mean average precision at the 11-point recall levels for applying the current approach with different external datasets. Surprisingly, using the movie domain alone gives the best performance. Using product reviews alone performs worse than using both domains, and using both domains does not show the additive improvement we expected. In this case, the negative effect of transfer learning is observed. It tells us that even though transfer learning is effective, it will sometimes not help much if a bad external dataset is chosen.

In our case, the blog domain (target domain) covers more general topics, as shown in Figure 4. The movie domain (training domain) talks mainly about movies, but also about the people, objects and organizations in the movies, and hence matches the blog domain better. The product domain, on the other hand, concentrates on customer reviews of several electronic products; it only helps a certain type of topic in blog opinion detection, not all of them. The experiment tells us that selecting a good external dataset is very important for avoiding the negative effect of transfer learning.

[Figure 4: blog topic category distribution]

This paper describes a transfer learning approach which incorporates common knowledge for the same task from external domains as a non-diagonal informative prior covariance matrix. It offers a way to solve the problem of having too little, or even no, training data from the target domain.

The approach is adapted to the task of opinion detection, which has recently become a very interesting research topic. In our TREC 2006 system, opinion detection is separated into two sub-tasks, passage retrieval and text classification. The passage retrieval engine searches for passages related to the query topics and returns them ranked by confidence score. Text classification is a binary classification problem, either opinion or non-opinion. Sentences are the unit on which this classification is performed. Gaussian logistic regression is used as the general framework. In the proposed approach, an informative prior covariance matrix is constructed by incorporating external knowledge of "Target-Opinion" word pairs and Wordnet synset information. The experimental results show that this is an effective approach: it achieves a 32% improvement in mean average precision over the baseline.

There are two main contributions of this work to the general communities of machine learning and opinion detection: first, it addresses the problem of how to perform opinion detection for a domain with no labelled training data; second, it studies and extends transfer learning to opinion detection and explores important features for this task.

The future work will be a natural extension of the current work. In the experiment on the effect of different external datasets, we found that different datasets help the precision of opinion detection for different blog topics. Therefore, if we perform blog topic classification and then use different external datasets as training data for each topic category, a greater improvement over the baseline should be achievable.

[1] B. Pang, L. Lee, and S. Vaithyanathan. Thumbs up? Sentiment classification using machine learning techniques. In Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing.
[2] J. Baxter. A Bayesian/information theoretic model of learning to learn via multiple task sampling. Machine Learning, 1997.
[3] B. Pang and L. Lee. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In Proceedings of ACL 2004.
[4] C. Fellbaum, R. Al-Halimi, et al. WordNet: An Electronic Lexical Database. MIT Press.
[5] E. Riloff and J. Wiebe. Learning extraction patterns for subjective expressions. In Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing. EMNLP, 2003.
[6] H. Yang, L. Si, and J. Callan. Knowledge transfer and opinion detection in the TREC2006 blog track. In Notebook of Text REtrieval Conference 2006. TREC.
[7] I. Ounis, M. de Rijke, C. Macdonald, G. Mishne, and I. Soboroff. Overview of the TREC-2006 blog track. In Notebook of Text REtrieval Conference 2006. TREC, Nov 2006.
[8] K. Yu, V. Tresp, and A. Schwaighofer. Learning Gaussian processes from multiple tasks. In Proceedings of ICML 2005.
[9] N. D. Lawrence and J. C. Platt. Learning to learn with the informative vector machine. In Proceedings of ICML.
[10] R. Ando and T. Zhang. A framework for learning predictive structure from multiple tasks and unlabeled data.
ACM Jounal of Machine Learning Research, May
[11] A. Y. N. Rajat Raina and D. Koller. Transfer learning
by constructing informative priors. In Proceedings ofthe Twenty-second International Conference onMachine Learning. ICML, March 2006.
[12] K. T. T. F. S. Morinaga, K. Yamanishi. Mining
product reputations on the web. In Proceedings ofSIGKDD 2002. SIGKDD, 2002.
[13] S. Thrun. Is learning the n-th thing any easier than
learning the first? In Proceedings of NIPS 1996. NIPS,1996.
[14] P. D. Turney. Thumbs up or thumbs down? semantic
orientation applied to unsupervised classification ofreviews. In Proceedings of ACL 2002. ACL, July 2002.
APPENDIX
A. TREC-2006 BLOG TRACK TOPICS
march of the penguins
larry summers
state of the union speech
ann coulter
abramoff bush
macbook pro
jon stewart
super bowl ads
letting india into the club
arrested development
mardi gras
blackberry
netflix
colbert report
basque
whole foods
cheney hunting
joint strike fighter
muhammad cartoon
barry bonds
cindy sheehan
brokeback mountain
bruce bartlett
coretta scott king
american idol
life on mars
sonic
jihad
hybrid car
natalie portman
fox news report
seahawks
heineken
qualcomm
shimano
west wing
world trade organization
audi
scientology
olympics
intel
jim moran
zyrtec
board chess
oprah