FTHK 2008
Speaker Abstracts
Semantic Enrichment of the Biomedical Literature
Sophia Ananiadou
National Centre for Text Mining, University of Manchester
Abstract One of the bottlenecks of biological data
integration is linking available databases, ontologies and pathways
to evidence from the vast amount of scientific literature. Text
mining techniques such as named entity recognition, event and
relation extraction add layers of semantic annotation to
documents, thus linking text to biological knowledge. Bio-text
mining applications such as biological information extraction,
semantic searching of large document collections, enrichment of
biological networks, etc., depend on the availability of text
mining tools and resources. Resources such as bio-lexica and
biologically annotated corpora provide the means of linking data
with literature. I will discuss examples of such bio-text mining
applications provided by the National Centre for Text Mining and
methodologies for enriching data from the literature.
Combining text mining with the 'omics: Between the Devil and the
Deep Blue Sea
Douglas Armstrong
Edinburgh Centre for Bioinformatics, University of Edinburgh
Abstract Advances in genome and proteome science offer
unprecedented opportunities to explore the nervous system. The
Genes to Cognition programme (G2C) was established to provide a
framework for studying genes, brain and behaviour in order to
link basic molecular research from genomes and experimental
genetic organisms with human clinical studies of cognition. We
have collected raw datasets from in-house platform technologies
and integrated data mined from the literature in the areas of
psychiatry, human and mouse psychology, cellular neurophysiology
and cell biology, proteomics and biochemistry, molecular
biology, human and mouse genetics and genomics.
Technological and Commercial Perspectives on the TXM Text Mining Programme
Alan Hale
ITI Life Sciences
Abstract As Big Pharma are constantly seeking to reduce
time to market, analysts (e.g., database curators) currently
spend much of their time searching through the ever-expanding
volume of scientific papers (more than 12 million exist). ITI Life
Sciences has invested substantially over three years to address
the challenge of finding and interpreting information of
interest from the ever-increasing volume of published material.
Our TXM Platform Technology facilitates the creation of
searchable databases, giving easy access to critical information
extracted from these papers. The technology uses Natural
Language Processing (NLP) techniques developed by the Language
Technology Group at the University of Edinburgh and
bioinformatics firm Cognia. The platform technology involves
NLP technology for workflow management from journal article to
database entry, including the identification of relevant
documents and the harvesting and collation of factual
information into easily searchable formats (i.e., databases).
BioPharma Information Needs and Production-oriented Literature
Informatics
William Hayes
Biogen Idec
Abstract Information needs and production capabilities of
text analytics and infrastructure drive the utility of text
analytics technologies in bioPharma as in every industry. It
takes a great deal of basic infrastructure to deal with large
document collections that continue to grow, continuously
evolving ontologies, and continuous information streams (news
articles, etc.). Beyond basic management of the raw material,
one needs text analytics technologies that fit into a framework
to allow for integration. Further, the results of text mining
are non-trivial to manage as regards information delivery that
is collaborative, re-usable and integratable. This presentation
will discuss the challenges and some of the successes found in
bioPharma as one example of a customer of text mining in the
biomedical community.
Automated Aides for Generating Scientific Insights
Lawrence Hunter
University of Colorado Denver School of Medicine
Abstract The profusion of high-throughput instruments and
the explosion of new results in the scientific literature,
particularly in molecular biomedicine, are both a blessing and a
curse to the bench researcher. Even knowledgeable and
experienced scientists can benefit from computational tools that
help navigate this vast and rapidly evolving terrain. However,
effective design and implementation of computational tools that
genuinely facilitate the generation of novel and significant
scientific insights remain poorly understood. In this talk, I
will describe a set of efforts that combines natural language
processing for information extraction and graphical network
models for semantic data integration into a system that has
recently played a pivotal role in making a significant
discovery, and also discuss how it might be possible to compare
and evaluate such systems.
Text Mining at Thomson: Medical Litigator and BIOSIS
Peter Jackson
Thomson Corporation
Tim Miller
Thomson Scientific
Abstract This talk describes two different applications
of text mining at The Thomson Corporation, one in the field of
medical information and one in the field of biological
information. Medical Litigator is an internally developed
information service, now on Westlaw, which connects attorneys to
medical information by a combination of query processing and
automatic linking. The underlying technology employs the open
source GATE toolkit for text processing and freely available
metadata resources such as UMLS. In the field of scientific
information, we describe a project to index biological
information from the 1920s to the 1960s employing third-party
software from Temis and MONDECA to apply post-coordinate indexing
to the data and open up valuable early material to researchers.
We conclude with an indication of the level of effort and
expertise required in-house to deliver these two applications
successfully.
FlySlip: Integrating Text Mining with FlyBase Curation
Nikiforos Karamanis
NLIP group / FlyBase, University of Cambridge
Abstract In this talk, I will present an overview of the
FlySlip project, which aimed to produce advanced technology for
biomedical text mining and integrate it with the FlyBase
curation paradigm. I will start by presenting the modules we
developed to mine the biomedical literature. Then, I will
discuss how a curation interface which makes use of this
technology was developed and evaluated under a user-centered
approach.
Biomedical Annotation and Information Extraction at the
University of Pennsylvania (Provisional Title)
Mark Liberman
Linguistic Data Consortium / University of Pennsylvania
Abstract
Biomedical Text Mining at the University of Edinburgh: From
research to reality
Michael Matthews
School of Informatics, University of Edinburgh
Abstract This presentation will start with an overview of
the current work taking place at the University of Edinburgh in
biomedical text mining. The talk will then look in more detail
at the TXM project, a three-year collaborative R&D programme
funded by ITI Life Sciences and carried out jointly with Cognia
EU Ltd, which aims to use text mining techniques for assisted
database curation. The conclusions highlight the need for
supplementing the standard intrinsic measures such as precision,
recall and F1 with equally important extrinsic measures designed
to evaluate system performance in real-world tasks.
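For readers unfamiliar with the intrinsic measures named in this abstract, the following is a minimal sketch of how precision, recall and F1 are commonly computed over sets of annotations. It is not taken from the TXM project; the function name and the gene-name example data are hypothetical and chosen only for illustration.

```python
def precision_recall_f1(predicted, gold):
    """Return (precision, recall, F1) for predicted vs. gold annotations.

    Both arguments are treated as sets; an item counts as correct only
    if it appears in both (exact-match scoring, an assumption here).
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Guard against empty sets so the ratios are always defined.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical data: entities a curator marked (gold) vs. system output.
gold = {"BRCA1", "TP53", "EGFR", "MYC"}
predicted = {"BRCA1", "TP53", "KRAS"}

p, r, f = precision_recall_f1(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} F1={f:.2f}")
# prints: precision=0.67 recall=0.50 F1=0.57
```

Scores of this kind describe agreement with a reference annotation only; they do not by themselves indicate how much a system helps a curator complete a real task, which is the gap that extrinsic measures are meant to address.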
Terrier takes on Biomedical Texts
Iadh Ounis
Department of Computer Science, University of Glasgow
Abstract In this talk we report experiments conducted
using Terrier to identify whether standard Information Retrieval
techniques can be effectively used on biomedical texts. In
particular, we address the important functionality of
automatically expanding the queries for better retrieval
performance. Experiments with the standard TREC Genomics test
collection, containing over 4.5 million Medline abstracts,
demonstrate the robustness and effectiveness of the core
statistics-driven retrieval technology implemented by Terrier.
Applying Natural Language Processing to Clinical Delivery
John Pestian
Cincinnati Children's Hospital, University of Cincinnati
Abstract This presentation will review some of the natural
language processing initiatives at Cincinnati Children's
Hospital Medical Center (CCHMC), Cincinnati, OH. Three
particular initiatives that are in various stages of development
will be discussed: the development of artificial experts for
personalized medicine feature selection, the results of an
international shared task in developing automatic label
assignment, and discourse analysis of suicide notes. With more
than 350 pediatric faculty members, CCHMC is one of the top
three pediatric centers in the USA. Over 800,000 patients are
seen and approximately $1 billion in pediatric care and research
is conducted annually.


