
What is Text?
Content-based Structure
• Describe the strength and the impact of an . . .
ATHENS, Greece (AP) A strong earthquake shook the Aegean Sea island of Crete on Sunday but caused no injuries or damage. The quake had a preliminary magnitude of 5.2 and occurred at 5:28 am (0328 GMT) on the sea floor 70 kilometers (44 miles) south of the Cretan port of Chania. The Athens seismological institute said the temblor’s epicenter was located 380 kilometers (238 miles) south of the capital. No injuries or damage were reported.
Domain-dependent Text Structures
Regina Barzilay
March 1, 2003
What is Text?
A product of structural relations (coherence)
S1: A strong earthquake shook the Aegean Sea island of Crete on Sunday
S2: but caused no injuries or damage.
S3: The quake had a preliminary magnitude of 5.2
Analogy with Syntax
Domain-independent Theory of Sentence Structure
• Fixed set of word categories (nouns, verbs, . . .)
• Fixed set of relations (subject, object, . . .)
Rhetorical Structure
Motivation
• Extract a representative subsequence from a set of sentences
• Find an answer to a question in natural language
• Order a set of information-bearing items into a coherent . . .
• Find the best translation taking context into account
Two Approaches to Text Structure
• Content-based models
• Rhetorical models
  • Rhetorical Structure Theory (Next Class)
Argumentative Zoning
Motivation
• Scientific articles exhibit (consistent across . . .
• Automatic structure analysis can benefit:
  • Q&A
  • summarization
  • citation analysis
BACKGROUND: Many of the recent advances in Question Answering have followed from the insight that systems can benefit by exploiting the redundancy in large corpora.
RELATION TO OTHER WORK: Brill et al. (2001) describe using the vast amount of data available on the WWW to achieve impressive performance . . .
The Web, while nearly infinite in content, is not a complete repository of useful information . . .
OWN CONTRIBUTION: In order to combat these inadequacies, we propose a strategy in which information is extracted from . . .
Today: Domain-Specific Models
• Argumentative Zoning of Scientific Articles
• Supervised (Duboue & McKeown, 2001)
• Unsupervised (Barzilay & Lee, 2004)
Approach
• Goal: Rhetorical segmentation with labeling
• Own work: aim, own, textual
• Background
• Other Work: contrast, basis, other
Examples
• We have proposed a method of clustering words
• Section 2 describes three parsers which are . . .
• Contrast: However, no method for extracting the relationship from superficial linguistic expressions was . . .
Features
• Lexical Features (“other researchers claim that”)
Kappa Statistics
(Siegel & Castellan, 1988; Carletta, 1996)
Kappa corrects the observed agreement $P(A)$ for chance agreement $P(E)$:
$K = \frac{P(A) - P(E)}{1 - P(E)}$
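Kappa, $K = (P(A) - P(E)) / (1 - P(E))$, can be computed directly from two annotators' zone labels. A minimal Python sketch, assuming $P(E)$ is estimated from the pooled category proportions of both annotators (the Siegel & Castellan formulation; Cohen's variant uses each annotator's own proportions instead). The zone labels are made up for illustration:

```python
from collections import Counter


def kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """K = (P(A) - P(E)) / (1 - P(E)) for two annotators labeling the same items.

    P(E) is estimated from the pooled label distribution of both annotators.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("annotators must label the same non-empty set of items")
    n = len(labels_a)
    # P(A): observed proportion of items on which the two annotators agree
    p_a = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # P(E): chance agreement from the pooled category proportions (2n labels in total)
    pooled = Counter(labels_a) + Counter(labels_b)
    p_e = sum((count / (2 * n)) ** 2 for count in pooled.values())
    if p_e == 1.0:
        return 1.0  # degenerate case: a single category was used throughout
    return (p_a - p_e) / (1 - p_e)


# Hypothetical zone labels for six sentences
ann1 = ["BACKGROUND", "OTHER", "OTHER", "OWN", "AIM", "OWN"]
ann2 = ["BACKGROUND", "OTHER", "CONTRAST", "OWN", "AIM", "OWN"]
print(round(kappa(ann1, ann2), 3))
```

Pooling the proportions treats the two annotators as interchangeable, which is the usual choice when the question is whether a coding scheme is reproducible rather than whether two specific coders differ.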
Supervised Content Modeling
• Goal: Find the types of semantic information characteristic of a domain, and constraints on their ordering
• Approach: find patterns in a set of transcripts
Annotated Transcript
He is 58-year-old male. History is significant for Hodgkin’s disease, treated with . . . to his neck, back and chest. Hyperspadias, BPH, hiatal hernia and proliferative lymph edema in his right arm. No IV’s or blood pressure down in the left arm. Medications: Inderal, Lopid, Pepcid, nitroglycerine and heparin. EKG has PAC’s. . . .
Annotation tags: pmh, med-preop, drip-preop
Semantic Sequence
age, gender, pmh, pmh, pmh, pmh, med-preop, med-preop, med-preop, drip-preop, med-preop, ekg-preop, echo-preop, hct-preop, procedure, . . .
Example of Learned Pattern
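One simple way to look for recurring patterns is to count the tag n-grams shared by many transcripts' semantic sequences. The Python sketch below is a simplified stand-in, not Duboue & McKeown's learning procedure; collapsing repeated tags, the n-gram length, the support threshold, and the second and third transcripts are all hypothetical (the first follows the semantic sequence above):

```python
from collections import Counter
from itertools import groupby


def collapse_runs(tags: list[str]) -> list[str]:
    """Merge consecutive repeats, e.g. pmh, pmh, pmh -> pmh."""
    return [tag for tag, _ in groupby(tags)]


def frequent_patterns(sequences: list[list[str]], n: int,
                      min_support: int) -> dict[tuple[str, ...], int]:
    """Tag n-grams that occur in at least `min_support` of the sequences."""
    support: Counter = Counter()
    for seq in sequences:
        seq = collapse_runs(seq)
        # a set, so each n-gram counts at most once per transcript
        grams = {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}
        support.update(grams)
    return {gram: count for gram, count in support.items() if count >= min_support}


transcripts = [
    ["age", "gender", "pmh", "pmh", "pmh", "pmh", "med-preop", "med-preop",
     "med-preop", "drip-preop", "med-preop", "ekg-preop", "echo-preop",
     "hct-preop", "procedure"],
    # hypothetical transcripts, for illustration only
    ["age", "gender", "pmh", "med-preop", "ekg-preop", "procedure"],
    ["age", "pmh", "pmh", "med-preop", "drip-preop", "ekg-preop", "hct-preop"],
]
for pattern, count in sorted(frequent_patterns(transcripts, n=2, min_support=3).items()):
    print(" -> ".join(pattern), count)
```

On these three sequences only the ordering pmh -> med-preop is shared by all transcripts; lowering `min_support` to 2 would also surface pairs such as age -> gender.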
Content Models
• Content models represent topics and their ordering
  • Topics: “strength”, “location”, “casualties”, . . .
  • Order: “casualties” prior to “rescue efforts”
• Assumption: Patterns in content organization are . . .
Evaluation
Similarity in Domain Texts
TOKYO (AP) A moderately strong earthquake with a preliminary magnitude reading of 5.1 rattled northern Japan early Wednesday, the Central Meteorological Agency said. There were no immediate reports of casualties or damage. The quake struck at 6:06 am (2106 GMT) 60 kilometers (36 miles) beneath the Pacific Ocean near the northern tip of the main island of Honshu. . . .
ATHENS, Greece (AP) A strong earthquake shook the Aegean Sea island of Crete on Sunday but caused no injuries or damage. The quake had a preliminary magnitude of 5.2 and occurred at 5:28 am (0328 GMT) on the sea floor 70 kilometers (44 miles) south of the Cretan port of Chania. The Athens seismological institute said the temblor’s epicenter was located 380 kilometers (238 miles) south of the capital. . . .
Computing Content Model
• State-transitions represent ordering constraints
Pattern Detection
• Analogous to motif detection
T1: A B C D F A A B F D
T2: F C A B D D F F
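The motif-detection analogy can be made concrete on the two symbol sequences T1 and T2, reading each letter as the topic label of one sentence (an assumption). A minimal Python sketch that reports every contiguous run of length k shared by both sequences; it does exact matching only, a deliberate simplification:

```python
def shared_motifs(t1: str, t2: str, k: int) -> set[str]:
    """Every length-k contiguous subsequence that occurs in both sequences."""
    s1, s2 = t1.split(), t2.split()
    grams1 = {" ".join(s1[i:i + k]) for i in range(len(s1) - k + 1)}
    grams2 = {" ".join(s2[i:i + k]) for i in range(len(s2) - k + 1)}
    return grams1 & grams2


T1 = "A B C D F A A B F D"
T2 = "F C A B D D F F"
for k in (2, 3):
    print(k, sorted(shared_motifs(T1, T2, k)))
```

On T1 and T2 it finds the shared pairs "A B" and "D F" and no shared run of length 3, which is the kind of local, partially repeated ordering a content model is meant to capture.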
Narrative Grammars
• Propp (1928): fairy tales follow a “story grammar”
• Bartlett (1932): formulaic text structure facilitates . . .
• Wray (2002): texts in multiple domains exhibit . . .
Estimating Emission Probabilities
• Estimation for the “insertion” state: . . .
Initial Topic Induction
Agglomerative clustering with cosine similarity measure (Iyer & Ostendorf, 1996; Florian & Yarowsky, 1999; Barzilay & Elhadad, 2003)
Example cluster:
• The Athens seismological institute said the temblor’s epicenter was located 380 kilometers (238 miles) south of the capital.
• Seismologists in Pakistan’s Northwest Frontier Province said the temblor’s epicenter was about 250 kilometers (155 miles) north of the provincial capital Peshawar.
• The temblor was centered 60 kilometers (35 miles) northwest of the provincial capital of Kunming, about 2,200 kilometers (1,300 miles) southwest of Beijing, a bureau seismologist said.
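Agglomerative clustering of sentences with a cosine similarity measure can be sketched in a few lines. This minimal Python version assumes bag-of-words vectors, average-link merging, and a hypothetical stopping threshold; the cited work may differ in all three choices:

```python
import math
import re
from collections import Counter


def vectorize(sentence: str) -> Counter:
    """Bag-of-words term counts over lowercased word tokens (digits dropped)."""
    return Counter(re.findall(r"[a-z']+", sentence.lower()))


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def agglomerative(sentences: list[str], threshold: float) -> list[list[int]]:
    """Average-link agglomerative clustering of sentence indices.

    Repeatedly merges the two most similar clusters and stops once no pair
    has an average cosine similarity of at least `threshold`.
    """
    vecs = [vectorize(s) for s in sentences]
    clusters = [[i] for i in range(len(sentences))]

    def avg_sim(a: list[int], b: list[int]) -> float:
        return sum(cosine(vecs[i], vecs[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best, pair = -1.0, (0, 1)
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                sim = avg_sim(clusters[x], clusters[y])
                if sim > best:
                    best, pair = sim, (x, y)
        if best < threshold:
            break
        x, y = pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x stays valid
    return clusters


sentences = [
    "The Athens seismological institute said the temblor's epicenter was located "
    "380 kilometers (238 miles) south of the capital.",
    "Seismologists in Pakistan's Northwest Frontier Province said the temblor's epicenter "
    "was about 250 kilometers (155 miles) north of the provincial capital Peshawar.",
    "The temblor was centered 60 kilometers (35 miles) northwest of the provincial capital "
    "of Kunming, about 2,200 kilometers (1,300 miles) southwest of Beijing, a bureau "
    "seismologist said.",
    "There were no immediate reports of casualties or damage.",
    "No injuries or damage were reported.",
]
print(agglomerative(sentences, threshold=0.3))
```

With a threshold of 0.3 the three epicenter sentences form one cluster and the two casualty sentences another. The threshold plays the role of deciding how fine-grained the induced topics are.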
Model Construction
• Determining states, emission and transition probabilities
From Clusters to States
• Each large cluster constitutes a state
• Agglomerate small clusters into an “insert” state
Estimating Transition Probabilities
• $g(c_i, c_j)$ is the number of adjacent sentence pairs in which a sentence from cluster $c_i$ is immediately followed by a sentence from cluster $c_j$
Viterbi re-estimation
• Decode the training data with Viterbi decoding
• Use the new clustering as the input to the parameter . . .
Application: Information Ordering
• Text summarization
• Natural Language Generation
• “get married” prior to “give birth” (in some domains)
Information Ordering: Algorithm
Evaluation: Data
Baselines for Ordering
• “Straw” baseline: Bigram Language model
• “State-of-the-art” baseline (Lapata, 2003):
  • represent a sentence using lexico-syntactic . . .
  • compute pairwise ordering preferences
  • find an optimal global order
Application: Summarization
• specify types of important information
• use information extraction to identify this
• Domain-independent summarization (Kupiec et al.):
  • represent a sentence using shallow features
  • use a classifier
Summarization: Algorithm
Input: source text
Training data: parallel corpus of summaries and source texts (aligned)
• Employ Viterbi on source texts and summaries
• Compute the likelihood of each state generating summary sentences
• Given a new text, decode it and extract sentences . . .
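The estimation and decoding steps of the content model can be sketched as a small HMM in Python: transitions are estimated from adjacency counts $g$ with add-δ smoothing, and Viterbi decoding assigns each sentence of a new text to a topic state. Everything concrete here is an assumption for illustration: the three states and their toy clusters (adapted from the earthquake articles), the training orders, the smoothing constant, the uniform start distribution, and unigram emission models, a simplification chosen for brevity:

```python
import math
import re
from collections import Counter
from collections.abc import Callable


def tokens(sentence: str) -> list[str]:
    """Lowercased word tokens; digits and punctuation are dropped."""
    return re.findall(r"[a-z']+", sentence.lower())


def estimate_transitions(label_seqs: list[list[str]], states: list[str],
                         delta: float = 0.1) -> dict[tuple[str, str], float]:
    """p(s_j | s_i) = (g(s_i, s_j) + delta) / (g(s_i) + delta * m).

    g(s_i, s_j) counts adjacent sentence pairs labeled (s_i, s_j), g(s_i) counts
    all adjacent pairs that start in s_i, and m is the number of states.
    """
    g = Counter()
    for labels in label_seqs:
        g.update(zip(labels, labels[1:]))
    m = len(states)
    trans = {}
    for si in states:
        g_si = sum(g[(si, sj)] for sj in states)
        for sj in states:
            trans[(si, sj)] = (g[(si, sj)] + delta) / (g_si + delta * m)
    return trans


def estimate_emissions(clusters: dict[str, list[str]],
                       delta: float = 0.1) -> Callable[[str, str], float]:
    """Return log p(sentence | state) under add-delta smoothed unigram models."""
    vocab = {w for sents in clusters.values() for s in sents for w in tokens(s)}
    models = {}
    for state, sents in clusters.items():
        counts = Counter(w for s in sents for w in tokens(s))
        # one extra vocabulary slot reserves probability mass for unseen words
        models[state] = (counts, sum(counts.values()) + delta * (len(vocab) + 1))

    def log_emit(state: str, sentence: str) -> float:
        counts, denom = models[state]
        return sum(math.log((counts[w] + delta) / denom) for w in tokens(sentence))

    return log_emit


def viterbi(sentences: list[str], states: list[str],
            trans: dict[tuple[str, str], float],
            log_emit: Callable[[str, str], float]) -> list[str]:
    """Most likely state sequence, assuming a uniform start distribution."""
    best = [{s: -math.log(len(states)) + log_emit(s, sentences[0]) for s in states}]
    back = []  # back[k][s]: best predecessor of state s at sentence k + 1
    for sent in sentences[1:]:
        row, ptr = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + math.log(trans[(p, s)]))
            row[s] = best[-1][prev] + math.log(trans[(prev, s)]) + log_emit(s, sent)
            ptr[s] = prev
        best.append(row)
        back.append(ptr)
    path = [max(states, key=lambda s: best[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


# Hypothetical states, clusters, and training orders adapted from the earthquake texts.
states = ["strength", "location", "casualties"]
clusters = {
    "strength": ["A strong earthquake shook the island",
                 "The quake had a preliminary magnitude of 5.2"],
    "location": ["The epicenter was located south of the capital",
                 "The temblor was centered northwest of the provincial capital"],
    "casualties": ["There were no immediate reports of casualties or damage",
                   "No injuries or damage were reported"],
}
training_orders = [
    ["strength", "casualties", "location"],
    ["strength", "location", "casualties"],
    ["strength", "casualties", "location"],
]
trans = estimate_transitions(training_orders, states)
log_emit = estimate_emissions(clusters)
new_text = ["A moderately strong earthquake rattled northern Japan",
            "No injuries were reported",
            "The epicenter was located north of the provincial capital Peshawar"]
print(viterbi(new_text, states, trans, log_emit))
```

Here decoding labels the three new sentences strength, casualties, location. For summarization, one would then extract the sentences whose decoded states are likely to generate summary sentences; for Viterbi re-estimation, the decoded labels would replace the initial clustering and the parameters would be estimated again.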
Baselines for Summarization
• “Straw” baseline: n leading sentences
• “State-of-the-art” Kupiec-style classifier:
  • Sentence representation: lexical features and . . .
  • Classifier: BoosTexter
Results: Summarization
Results: Ordering
Ordering: Learning Curve
Summarization: Learning Curve

Source: http://www.computing.dcu.ie/~ebicici/Week3/DomainTextStructure.pdf