Speaker Abstracts


Semantic Enrichment of the Biomedical Literature
Sophia Ananiadou
National Centre for Text Mining, University of Manchester

Abstract One of the bottlenecks of biological data integration is linking available databases, ontologies, and pathways to evidence from the vast amount of scientific literature. Text mining techniques such as named entity recognition and event and relation extraction add layers of semantic annotation to documents, thus linking text to biological knowledge. Bio-text mining applications such as biological information extraction, semantic searching of large document collections, and enrichment of biological networks depend on the availability of text mining tools and resources. Resources such as bio-lexica and biologically annotated corpora provide the means of linking data with literature. I will discuss examples of such bio-text mining applications provided by the National Centre for Text Mining and methodologies for enriching data from the literature.


Combining text mining with the 'omics: Between the Devil and the Deep Blue Sea
Douglas Armstrong
Edinburgh Centre for Bioinformatics, University of Edinburgh

Abstract Advances in genome and proteome science offer unprecedented opportunities to explore the nervous system. The Genes to Cognition programme (G2C) was established to provide a framework for studying genes, brain and behaviour in order to link basic molecular research from genomes and experimental genetic organisms with human clinical studies of cognition. We have collected raw datasets from in-house platform technologies and integrated data mined from the literature in the areas of psychiatry, human and mouse psychology, cellular neurophysiology and cell biology, proteomics and biochemistry, molecular biology, human and mouse genetics and genomics.


Technological and Commercial Perspectives on the TXM Text Mining Programme
Alan Hale
ITI Life Sciences


Abstract Big Pharma companies constantly seek to reduce time to market, yet analysts (e.g., database curators) currently spend much of their time searching through the ever-expanding volume of scientific papers (>12 million exist). ITI Life Sciences has invested substantially over three years to address the challenge of finding and interpreting information of interest from the ever-increasing volume of published material. Our TXM Platform Technology facilitates the creation of searchable databases, giving easy access to critical information extracted from these papers. The technology uses Natural Language Processing (NLP) techniques developed by the Language Technology Group at the University of Edinburgh and bioinformatics firm Cognia. The platform technology involves NLP technology for workflow management from journal article to database entry, including the identification of relevant documents and the harvesting and collation of factual information into easily searchable formats (i.e., databases).


BioPharma Information Needs and Production-oriented Literature Informatics
William Hayes
Biogen Idec


Abstract Information needs and production capabilities of text analytics and infrastructure drive the utility of text analytics technologies in bioPharma as in every industry. It takes a great deal of basic infrastructure to deal with large document collections that continue to grow, continuously evolving ontologies, and continuous information streams (news articles, etc.). Beyond basic management of the raw material, one needs text analytics technologies that fit into a framework to allow for integration. Further, the results of text mining are non-trivial to manage as regards information delivery that is collaborative, re-usable and integratable. This presentation will discuss the challenges and some of the successes found in bioPharma as one example of a customer of text mining in the biomedical community.


Automated Aides for Generating Scientific Insights
Lawrence Hunter
University of Colorado Denver School of Medicine


Abstract The profusion of high-throughput instruments and the explosion of new results in the scientific literature, particularly in molecular biomedicine, are both a blessing and a curse to the bench researcher. Even knowledgeable and experienced scientists can benefit from computational tools that help navigate this vast and rapidly evolving terrain. However, effective design and implementation of computational tools that genuinely facilitate the generation of novel and significant scientific insights remain poorly understood. In this talk, I will describe a set of efforts that combines natural language processing for information extraction and graphical network models for semantic data integration into a system that has recently played a pivotal role in making a significant discovery, and also discuss how it might be possible to compare and evaluate such systems.


Text Mining at Thomson: Medical Litigator and BIOSIS
Peter Jackson
Thomson Corporation
Tim Miller
Thomson Scientific


Abstract This talk describes two different applications of text mining at The Thomson Corporation, one in the field of medical information and one in the field of biological information. Medical Litigator is an internally developed information service, now on Westlaw, which connects attorneys to medical information by a combination of query processing and automatic linking. The underlying technology employs the open-source GATE toolkit for text processing and freely available metadata resources such as UMLS. In the field of scientific information, we describe a project to index biological information from the 1920s to the 1960s, employing third-party software from Temis & MONDECA to apply post-coordinate indexing to the data and open up valuable early material to researchers. We conclude with an indication of the level of effort and expertise required in-house to deliver these two applications successfully.


FlySlip: Integrating Text Mining with FlyBase Curation
Nikiforos Karamanis
NLIP group / FlyBase, University of Cambridge


Abstract In this talk, I will present an overview of the FlySlip project, which aimed to produce advanced technology for biomedical text mining and integrate it with the FlyBase curation paradigm. I will start by presenting the modules we developed to mine the biomedical literature. Then, I will discuss how a curation interface which makes use of this technology was developed and evaluated under a user-centered approach.


Biomedical Annotation and Information Extraction at the University of Pennsylvania (Provisional Title)
Mark Liberman
Linguistic Data Consortium / University of Pennsylvania

Abstract


Biomedical Text Mining at the University of Edinburgh: From research to reality
Michael Matthews
School of Informatics, University of Edinburgh


Abstract This presentation will start with an overview of the current work taking place at the University of Edinburgh in biomedical text mining. The talk will then look in more detail at the TXM project, a three-year collaborative R&D programme funded by ITI Life Sciences and carried out jointly with Cognia EU Ltd, which aims to use text mining techniques for assisted database curation. The conclusions highlight the need for supplementing the standard intrinsic measures such as precision, recall and F1 with equally important extrinsic measures designed to evaluate system performance in real-world tasks.


Terrier takes on Biomedical Texts
Iadh Ounis
Department of Computer Science, University of Glasgow

Abstract In this talk we report experiments conducted using Terrier to identify whether standard Information Retrieval techniques can be effectively used on biomedical texts. In particular, we address the important functionality of automatically expanding queries for better retrieval performance. Experiments with the standard TREC Genomics test collection, containing over 4.5 million Medline abstracts, demonstrate the robustness and effectiveness of the core statistics-driven retrieval technology implemented by Terrier.


Applying Natural Language Processing to Clinical Delivery
John Pestian
Cincinnati Children's Hospital, University of Cincinnati

Abstract This presentation will review some of the natural language processing initiatives at Cincinnati Children's Hospital Medical Center (CCHMC), Cincinnati OH. Three particular initiatives that are in various stages of development will be discussed: the development of artificial experts for personalized medicine feature selection, the results of an international shared task in developing automatic label assignment, and discourse analysis of suicide notes. With more than 350 pediatric faculty members, CCHMC is one of the top three pediatric centers in the USA. Over 800,000 patients are seen and approximately $1 billion in pediatric care and research is conducted annually.