
Avoiding Deceptive Annotations in the Semantic Web
Deceptive annotations are becoming an important problem as more and more people start to tag documents, and the problem has become an argument against the Semantic Web. Skeptics believe that developers make mistakes when annotating documents, and developers may even abuse annotations from time to time. Due to the difficulty of detecting and resolving deceptive tags, these skeptics openly wonder whether semantic annotations may bring more trouble than benefit. In this paper we present a deception avoidance resolution method. By adding personal specifications about ontology concepts through instance recognition semantics, Semantic Web users can avoid being deceived by improperly annotated data. At the same time, our deception avoidance strategy also passively discourages annotators from falsely tagging documents by decreasing the profit they can gain from deceptive annotations. Finally, our deception avoidance mechanism still preserves the right to annotate text freely.

Written mainly while this author was on an extended visit [...]

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SAAW '06 Athens, GA, USA Copyright 200X ACM X-XXXXX-XX-X/XX/XX ...$5.00.

Deceptive annotations, or deceptive tags, are becoming more and more of a problem as people start to tag their documents. The problem has, in fact, become an argument against the Semantic Web. As an example, at a recent conference in Boston, Peter Norvig, the Google Director of Search and an AAAI Fellow, asked Tim Berners-Lee, the inventor of the Web and the current director of W3C, a question about deception in the Semantic Web [4]. Norvig said, “We deal every day with people who try to rank higher in the results and then try to sell someone Viagra when that’s not what they are looking for. With less human oversight with the Semantic Web, we are worried about it being easier to be deceptive.” In this question, Norvig reveals one of his concerns about the Semantic Web. Without question, Internet deception is a severe problem. Particularly in the Semantic Web, annotations are easily abused. If we cannot resolve deceptive annotations, we may have negative experiences with Semantic Web applications due to the unsure [...]

In general, there are two opposing opinions on this problem (see http://www.bloghop.com/tagview.htm?itemid=deceptive). Some people believe that deceptive annotation is a type of cheating. In their definition, deceptive annotations are false claims whose purpose is to mislead. Advocates of total freedom on the Internet, however, suggest that everybody has a right to say and write whatever is on their mind. So, in essence no annotations are “deceptive,” but are only [...]

In the Semantic Web, deceptive annotations are the annotations with instances that deviate from their commonly expected meanings. For example, if “UTAH” is annotated as a NATION, this annotation is deceptive because a NATION is commonly understood as an independent country in the world.

At the same time, however, we must not prohibit the freedom of people to annotate as they wish. Things can change and new knowledge is discovered from time to time. For example, annotating “Montenegro” as a NATION before June 3, 2006 would have been deceptive. But it is no longer deceptive after June 3, 2006, when Montenegro declared its independence. Moreover, people should have the freedom to annotate a document according to their own understanding even if it is seen as deceptive by others. For example, a Montenegro independence movement member may annotate “Montenegro” as a NATION even before June 3, 2006. This was what the person believed and expected, although it would certainly have been a deceptive annotation as viewed by others. Because the web is designed to be an open and free space, a resolution to the deceptive annotation problem should not override the freedom of tagging.

There are three strategies we can apply to solve the deceptive annotation problem: deception protection, deception detection, or deception avoidance. A deception protection strategy would allow only trusted authorities to annotate all web pages and would encrypt annotations so that no one can abuse them. Based on current Internet security technologies, we can expect that the deceptive annotation problem can be solved by deception protection methods. A problem with this resolution, however, is that it generally dismisses the right of individual web developers to annotate their own web pages.

A deception detection strategy would check the correctness of mappings between annotated data and their annotations based on formal definitions of rules in ontologies. Such a process is usually expensive to execute, however. For example, to check whether “Montenegro” is a NATION, a process must at least compare the annotating date to the independence date of Montenegro. Even worse, it could be very difficult to construct these rules and agree on them. Both defining rules and processing them would likely be costly. Researchers must first resolve all these sophisticated issues before we could really apply deception detection.

In this paper, we present a deception avoidance strategy. Rather than detecting false annotations, the deception avoidance strategy avoids looking for potentially deceptive cases. Our method is based on two observations and assumptions: (1) users need not care about whether an annotation is deceptive unless they are interested in the annotation; and (2) if users are interested in an annotation, they can avoid being deceived by explicitly and clearly expressing their interests about the annotation. We proffer instance recognition semantics to allow Semantic Web users to specify their personal interests to avoid deceptive annotations. The degree of vulnerability to deceptive annotations depends on how precisely they have specified their instance recognition semantics in ontologies. Moreover, our deception avoidance strategy also passively discourages annotators from falsely tagging documents by decreasing the profit they can gain from deceptive annotations. At the same time, our deception avoidance method still preserves the right of people to annotate freely.

To explain how our strategy works, we briefly introduce instance recognition semantics in Section 2. In Section 3, we show how we use instance recognition semantics in our deception avoidance strategy.

Instance recognition semantics, which can also be called instance semantics recognizers (ISR; we avoid the acronym IRS, Internal Revenue Service, because instance recognition semantics are not tax collectors), are formal specifications that identify instances of a concept C in ordinary text. The text may be unstructured, semi-structured, or fully structured. For Semantic Web applications, the concept C should be a lexical element of a formal ontology (e.g. concepts such as date, time, place, location, name, telephone number, email address, various weights and measures, etc.). Thus, instance recognition semantics of an ontology concept (e.g. Telephone Number) interpret instances in a text fragment (e.g. the contact number in “Call me at 222-1234.”) to have the intensional meaning of the defined concept.

Figure 1 shows a partial ISR declaration we have used in an apartment-rental domain ontology for the concept BedroomCount (the ontology can be found in the DEG web site: [...]). Although recognition patterns can be expressed variously in different syntaxes, in our study we have used Perl-style regular expressions. In general, an ISR declaration includes defined recognition patterns and auxiliary filtering specifications. We specify recognition patterns in an external representation clause. In Figure 1 we specify that any legal instantiation of BedroomCount should be a string of digits representing numbers between 1 and 20. Defined auxiliary filtering specifications help to precisely identify an instance. In Figure 1, we declare the left immediate context (left context phrase) to be a legal word boundary and the right immediate context (right context phrase) to be the regular expression “r(oo)?ms?” with possibly several other words in between, e.g. “large room.” The exception phrase excludes some negative phrases from the previously specified patterns, which is the right context phrase in our example. In our case, we exclude, for example, “bath room” as a legal right context phrase. The context keywords are a carefully selected set of keywords that typically appear close to the concept locations. They are mainly for the purpose of improving the accuracy of automated semantic annotation processes. Although this example looks somewhat complicated, many times ISR declarations can be as simple as a list of potential instances, such as a list of country names.

[Figure 1: partial ISR declaration for BedroomCount. Recoverable lines:]
    exception phrase: \s.*ba(th)?s?\b.*r(oo)?ms?
    context keyword: b(r|d)s? | bdrms? | bed(rooms?)?

ISR augmentations to ontologies help separate the work load between domain experts (who are individual annotators) and data-extraction engineers (who design and build data-extraction engines). This separation is key in our automatic deception avoidance mechanism. Because ISR rules are declarative, domain experts can create instance recognition rules for domain concepts without having to do any programming; and because ISR rules are embedded inside ontologies, domain experts need not be concerned about mapping recognized concepts to domain ontologies, which is a major concern in current semantic annotation approaches [1, 2, 3]. These two properties of ISR rules enable domain experts to create and update their ISR declarations without the need to consult with data-extraction engineers. Since domain experts know their domain best, their ISR declarations can [...]

Using ISR declarations, domain experts implicitly “personalize” the meanings of specified ontology concepts. Here personalize means that domain experts cast the recognition of a generally defined concept to their own expectations. Figure 2 illustrates this idea with a simple example. Without ISR declarations, an arbitrary positive integer number could be a legal instantiation of the concept BedroomCount (in theory there is no restriction why one cannot build a [...]), although in reality we rarely can find a single apartment with more than 4 or 5 bedrooms. With ISR declarations, we can restrict the instantiation of BedroomCount to be between 3 and 4, perhaps because we need an apartment with at least three bedrooms and we do not anticipate ever needing more than four bedrooms. Therefore, our BedroomCount with this ISR declaration becomes a specialization of the BedroomCount without an ISR declaration or with a different (more generalized) ISR declaration. Hence the meaning of the concept BedroomCount is personalized to our perspective. With personalized concepts, ontologies become personalized, augmented by personalized [...]

[Figure 2: A concept with ISR declarations is equivalent to declaring a special subclass of itself.]

Deceptive annotations are harmless if users are not interested in them. For example, if “Viagra” is falsely annotated as a FOOD, users will not be deceived unless they are looking for FOOD. Therefore, users can automatically avoid deceptive annotations in which they are not interested. Moreover, if users are interested in an annotation, they can avoid being deceived by explicitly and clearly expressing their interests about the annotation. For example, if users are looking for FOOD, and they have clearly specified that their FOOD consists of lists of breads, meats, and vegetables, they can also avoid being deceived by “Viagra” since it is not on their list. These two scenarios constitute the basis of our deception avoidance methodology.

By augmenting ISR declarations, ontologies become personalized ontologies. Therefore, any annotations that contradict specified personal interests can be automatically ignored. For example, the following house-rental advertisement is from a real online web site (http://www.villas2000.com/frbvo/homes/3345.php), and we have intentionally annotated it deceptively with our apartment-rental ontology.

    <BedroomCount>3.5</BedroomCount> Bed,
    <BathroomCount>2.5</BathroomCount> Bath
    House with <Feature>Pool</Feature>,
    <Feature>Large LCD HDTV</Feature>,
    <Feature>High speed internet</Feature>

By applying the ISR declarations in Figure 1, however, machines can avoid being deceived by these deceptive annotations because “3.5” is not recognized as a data instance of interest by the specified external representation for the concept BedroomCount. In this process, machines do not generate any logic rules from ontologies to detect the semantic meaning of this annotated data; nor do machines perform any domain identification methods to verify the application domain for this advertisement. Machines avoid this deceptive case simply because of the ISR declaration in the ontology.

On the other hand, perhaps we begin to notice several n.5 bedroom counts. The following example is also from a real online advertisement (http://berlin.craigslist.org/apa/173491092.html, checked [...]), and we have annotated it manually.

    <Feature>Large</Feature> <BedroomCount>2.5</BedroomCount> room apartment 70 qm
    available <AvailableDate>July 1</AvailableDate>

Although we still may not know what the meaning of a “.5 bedroom” is, somebody truly has expressed the number of [...] modify our external representation declaration so that it accepts n.5 as a legal representation for room numbers, or we can keep ignoring them and continue to treat them as deceptive annotations because we do not like n.5 bedrooms. Both choices are fine, and the decision totally depends on [...]

Using this same technique, we can resolve the problem that a deceiver falsely annotates “Viagra” as a FOOD in order to attract more readers to a Viagra-sales web page. This deception may not be easy to detect through ontology reasoning because Viagra is edible, which satisfies one of the crucial features of FOOD. But we can avoid this problem by applying our deception avoidance method. Based on different conditions, there are two ways to avoid this deception. First, if users specify a list of FOOD items that does not contain Viagra, they straightforwardly avoid this deceptive web page based upon unmatched interests. Second, if users are open to trying new foods that they do not know, they can simply leave the external representation of their FOOD declaration blank, which means that they accept whatever is annotated as a FOOD to be FOOD. Then they will be deceived by this deceptive annotation the first time. But after they learn that this is a deception, they can avoid it by simply adding an exception phrase “Viagra” for their external representation about FOOD. Hence they would never be trapped in this deception again. This update avoids not only this deceptive web page, but also all the other web pages that play the same deceptive trick on [...]

In our deception avoidance method, we must emphasize that the vulnerability of users to deceptive annotations depends very much on how carefully users build and improve their ISR declarations. It is fair, however. Just like in any human society, humans who are too lazy to learn will be repeatedly deceived by the same trick. Only if they learn from previous experiences, i.e. only if they update their own ISR declarations by their experiences, can they avoid being deceived again. When we continually update our knowledge by our experiences, we become harder and harder to deceive. Hence our deception avoidance method is partly an [...]

Since our method does not depend on annotations, but rather on recognizers, our method preserves total freedom for annotators to tag whatever they want to any textual content. Our method is applied to the user side rather than the annotator side. While users have the power to avoid what they believe to be deception, annotators can still annotate everything freely. For example, our method does not prohibit annotators from tagging “Viagra” to be a FOOD.

If our deception avoidance methods were used extensively on the web, deceptive annotators would find that they lose much more than they gain by deceptive annotations. For example, the reason deceivers falsely annotate “Viagra” as a FOOD is that they want to increase the hit rate of a web page. With our deception avoidance strategy, real food-seekers will soon learn that this is a deceptive web page and thus avoid visiting it any more. At the same time, real Viagra-seekers may look for annotations such as MEDICINE rather than FOOD because they do not think Viagra is a FOOD. Even if deceivers annotate “Viagra” simultaneously to be both FOOD and MEDICINE, they still decrease their own opportunities to reach their real customers because the thought that Viagra is not FOOD overrides the thought that Viagra is both FOOD and MEDICINE. Therefore, our mechanism not only provides an active deception avoidance method for users, but also becomes a passive deception avoidance strategy from an annotator’s perspective.
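As a rough illustration of the user-side avoidance step, the Python sketch below keeps an annotation only when the user has declared the concept and the annotated value is accepted by that declaration. It is a minimal sketch under stated assumptions, not the authors' implementation: the names ISRDeclaration, exception_phrases, and trusted_annotations are hypothetical, the digit pattern is our own reading of "numbers between 1 and 20", and the left and right context phrases and context keywords of a full ISR declaration are left out.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ISRDeclaration:
    """Hypothetical, simplified stand-in for one concept's ISR declaration."""
    concept: str
    # Regex for legal instances; None models a blank external representation,
    # i.e. the user accepts whatever is annotated with this concept.
    external_representation: Optional[str] = None
    # Values the user has learned to reject (the "exception phrase" idea).
    exception_phrases: List[str] = field(default_factory=list)

    def accepts(self, value: str) -> bool:
        text = value.strip()
        for phrase in self.exception_phrases:
            if re.fullmatch(phrase, text, re.IGNORECASE):
                return False
        if self.external_representation is None:
            return True
        return re.fullmatch(self.external_representation, text) is not None


# Matches simple, non-nested tags such as <BedroomCount>3.5</BedroomCount>.
TAG = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def trusted_annotations(
    annotated_text: str, declarations: Dict[str, ISRDeclaration]
) -> List[Tuple[str, str]]:
    """Return (concept, value) pairs that the user's declarations accept."""
    kept = []
    for concept, value in TAG.findall(annotated_text):
        isr = declarations.get(concept)
        # Concepts the user never declared are simply ignored.
        if isr is not None and isr.accepts(value):
            kept.append((concept, value.strip()))
    return kept


# Assumed pattern: whole numbers from 1 to 20.
bedrooms = ISRDeclaration("BedroomCount", r"(?:[1-9]|1\d|20)")
ad = (
    "<BedroomCount>3.5</BedroomCount> Bed, "
    "<BathroomCount>2.5</BathroomCount> Bath House with <Feature>Pool</Feature>"
)
print(trusted_annotations(ad, {"BedroomCount": bedrooms}))  # prints []
```

In this sketch, a user who is open to new foods would declare FOOD with no external representation and, after being deceived once, add "viagra" to exception_phrases; a user who later decides to accept n.5 bedroom counts would only need to widen the BedroomCount pattern.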
Deceptive annotations are becoming a severe problem as more and more people start to tag web data. The problem has been used as an argument against the realization of the Semantic Web. In this paper we presented a new deception avoidance resolution. By augmenting ontologies with ISR declarations, our method not only provides active deception avoidance for users, but may also passively decrease the rate of deception by reducing the chances that deceivers may obtain benefits from deceptive annotations. We expect that our work may lead to more attention being paid to this important and interesting research problem.
[1] Y. Ding, D.W. Embley, and S.W. Liddle. Automatic creation and simplified querying of semantic web content: An approach based on information-extraction ontologies. In Proceedings of the first Asian Semantic Web Conference (ASWC 2006), LNCS 4185, pages 400–414, Beijing, China, September 2006.
[2] S. Handschuh, S. Staab, and F. Ciravegna. S-cream semi-automatic creation of metadata. In Proceedings of the European Conference on Knowledge Acquisition and Management (EKAW-2002), pages 358–372, Madrid, Spain, October 2002.
[3] A. Kiryakov, B. Popov, I. Terziev, D. Manov, and D. Ognyanoff. Semantic annotation, indexing, and retrieval. Journal of Web Semantics, 2(1):49–79, December 2004.
[4] C. Lombardi. Google exec challenges Berners-Lee.
http://news.zdnet.com/2100-9588 22-6095705.html.

Source: http://ceur-ws.org/Vol-209/saaw06-short02-ding.pdf
