Tutorial Abstracts

Morning Session
AM1 Demonstration Projects in Clinical Informatics
AM2 Introduction to Factor Graphs and the Sum-Product Algorithm: Applications to Genome Tiling Microarrays and Gene Interaction Networks
AM3 Novel Visualization and Quantitative Analysis Methods in BioImaging
AM4 RNA-interference: the Short and Long of It
AM5 Computational Methods in MS-based Proteomics
Afternoon Session
PM6 Introduction to the Semantic Web for Bioinformatics
PM7 Structure Based Methods for Identifying Protein Function
PM8 Pattern Discovery in Sequences and Structures
PM9 Organizing and Understanding the Biological Data Deluge through Phylogenetics
PM10 Combinatorial and Statistical Approaches to Analyzing Biological Networks


Demonstration Projects in Clinical Informatics
Carol Cain, PhD
Agency for Healthcare Research and Quality

Expected goals and objectives:
Tutorial participants will gain a broad overview of current issues in applied clinical informatics, ranging from focused efforts such as innovative decision support systems to large-scale collaborative efforts to build regional health information exchanges. In addition, we will discuss some of the environmental drivers of such efforts, including new reimbursement models, legislation, regulation, and organizational change.

Intended audience:
This introductory tutorial is intended for a general audience of informatics researchers who would like to become familiar with the landscape of clinical informatics, particularly from a federal perspective. We will begin with the current environment and pressures that healthcare organizations face, such as the Medicare Modernization Act of 2003.

The tutorial will then describe clinical informatics projects across America. These include clinical decision support systems, bringing appropriate information to the point of care, patient-centered interventions, maintaining information, and the needs of non-traditional settings such as rural clinics. We conclude with issues which arise when linking large regional networks into a data exchange, including challenges of data matching, privacy, and standardization.

Although these projects vary widely in scope and content, the tutorial will be structured around research themes which arise when conducting applied research projects in highly complex environments. We will discuss challenges of study design, randomization, and evaluation. What are the appropriate metrics of success, and how should they be measured? How can the quality of data be verified? How do organizational factors influence subjects’ interactions with technology? How would future bioinformatics activities fit in? And finally, what are the open questions in clinical informatics where research activity is needed?

Carol Cain is a PhD graduate of Stanford’s biomedical informatics program. Her research interests include computational simulation of medical workflow, decision-theoretic cost-effectiveness analysis, and the impact of new technology on organizations. She is currently a health IT portfolio manager at the Agency for Healthcare Research and Quality (AHRQ), overseeing projects that demonstrate the value of health IT. She has served as a graduate TA and an ESL teacher, and frequently gives presentations as a representative of AHRQ.



Introduction to Factor Graphs and the Sum-Product Algorithm: Applications to Genome Tiling Microarrays and Gene Interaction Networks
Brendan J. Frey, PhD
Associate Professor, University of Toronto

While dynamic programming has proved to be a powerful tool in the analysis of low-complexity sequence data, many of the most compelling problems in molecular biology involve large numbers of long-range interactions. Recently, a generalization of dynamic programming, called the sum-product algorithm, has been used to solve long-standing, fundamental information processing problems, including Shannon-limit coding on the Gaussian channel and random satisfiability. The sum-product algorithm works by passing messages on edges in a "factor graph," which represents the potential interactions in the system of interest. Other graph-based models, including Bayesian networks and Markov random fields, can be represented and learned more efficiently using factor graphs.

In this tutorial, I will review factor graphs and the sum-product algorithm, and describe in detail how this method has been used to achieve leading-edge results on genome tiling microarray analysis and inference of biomolecular interaction networks. These two problems exemplify a common, difficult challenge in molecular biology: Revealing hidden variables that explain observed data.
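As a rough illustration of the message-passing idea described above, the following minimal sketch runs the sum-product algorithm on a toy chain-shaped factor graph and checks the resulting marginals against brute-force enumeration. The three binary variables, the factor tables, and all function names are hypothetical and chosen only for illustration; they are not taken from the tutorial material.

```python
from itertools import product

# Hypothetical toy factor graph (an assumption for illustration):
#   f1 -- x1 -- f12 -- x2 -- f23 -- x3, all variables binary.
STATES = (0, 1)
f1 = {0: 1.0, 1: 2.0}
f12 = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}
f23 = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 4.0, (1, 1): 1.0}


def normalize(message):
    """Scale a message (state -> value) so that its values sum to one."""
    total = sum(message.values())
    return {state: value / total for state, value in message.items()}


def sum_product_marginals():
    """Compute single-variable marginals by passing messages along the chain."""
    # Forward pass: each factor sums out the variable on its left.
    f12_to_x2 = {b: sum(f1[a] * f12[(a, b)] for a in STATES) for b in STATES}
    f23_to_x3 = {c: sum(f12_to_x2[b] * f23[(b, c)] for b in STATES) for c in STATES}
    # Backward pass: each factor sums out the variable on its right.
    f23_to_x2 = {b: sum(f23[(b, c)] for c in STATES) for b in STATES}
    f12_to_x1 = {a: sum(f12[(a, b)] * f23_to_x2[b] for b in STATES) for a in STATES}
    # A marginal is the normalized product of all messages arriving at a variable.
    return {
        "x1": normalize({a: f1[a] * f12_to_x1[a] for a in STATES}),
        "x2": normalize({b: f12_to_x2[b] * f23_to_x2[b] for b in STATES}),
        "x3": normalize(f23_to_x3),
    }


def brute_force_marginals():
    """Reference answer: enumerate the full joint distribution."""
    joint = {
        (a, b, c): f1[a] * f12[(a, b)] * f23[(b, c)]
        for a, b, c in product(STATES, repeat=3)
    }
    z = sum(joint.values())
    marginals = {}
    for index, name in enumerate(("x1", "x2", "x3")):
        marginals[name] = {
            s: sum(p for key, p in joint.items() if key[index] == s) / z
            for s in STATES
        }
    return marginals


if __name__ == "__main__":
    print(sum_product_marginals())
    print(brute_force_marginals())
```

On a tree-structured graph such as this chain, the messages give exact marginals while avoiding the exponential enumeration used in the reference function; on graphs with cycles the same updates are typically applied iteratively as an approximation.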



Novel Visualization and Quantitative Analysis Methods in BioImaging
David Knowles, PhD
Scientist, Life Sciences Division, Lawrence Berkeley National Laboratory

Recent developments in, and the increased availability of, fluorescence microscopy are providing biologists with powerful new tools for studying cellular and macromolecular events. These new technologies call for novel developments in visualization and quantitative image analysis to aid in the extraction of the information hidden in the enormous amounts of high-resolution, three-dimensional data generated. The goal of this tutorial is to present novel visualization and image analysis techniques currently being developed for two ongoing multidisciplinary projects in the Life Sciences Division of the Lawrence Berkeley National Laboratory.

The tutorial will provide an essential refresher on the underlying physical optics which link the physical and frequency domains and set the theoretical limit to image fidelity. It will cover recent developments in fluorescence microscopy including new fluorescence probes (QDots, nanoparticles & GFPs), and confocal techniques including 2-photon excitation, emission spectral analysis (Zeiss Meta Device), spinning disk techniques (Yokogawa’s CSU-10), line scanning acquisition (Zeiss LSM 5 Live), and Wilson’s grating imager (Zeiss ApoTome). The tutorial will then focus on a range of novel image analysis techniques: automated segmentation of cells and nuclei from 3D images of tissue structures and entire Drosophila embryos; visualization techniques which are essential for qualitative understanding of the 3D data sets and for the quantitative evaluation and development of segmentation and feature extraction techniques; model-based feature extraction methods which allow the quantification of gene expression and the spatial distribution of nuclear components; and shape-context registration techniques which allow multiple embryo images to be registered and overlaid.

The tutorial will conclude by presenting some of the latest biological findings resulting from the application of these techniques, including recent results that shed light on how the nuclear organization within breast epithelia changes during the nonmalignant-to-malignant transformation and how gene expression analysis at cellular resolution is untangling the early transcriptional network of Drosophila.

Dr. Knowles’ Teaching Experience:
  • Lecturer, from 1987 to 1989, in the Pathology Department at the University of British Columbia, of a third year level course in microscopy (Pathology 305, session 1). The course involved 2 hours of lectures and one laboratory period per week. Dr. Knowles was responsible for the course syllabus, the introduction of a laboratory section to the course, writing the exams and evaluation of the students.
  • Sessional Lecturer, 1992, in the Physics Department at the University of British Columbia, of a first year level course (Physics 110). The course involved 2 hours of lectures and 1 hour of tutorial per day, and 2 laboratory periods and 1 exam per week. Dr. Knowles was responsible for writing the lectures and exams, incorporating numerous demonstrations during the lectures, instructing and overseeing the laboratory session, and for all student marking and assessment.
  • Teaching Assistant, from 1987 to 1992, in the Physics Department at the University of British Columbia, teaching laboratories and tutorials for a variety of courses including 1st year physics (Physics 110 and 115), 3rd year optics laboratory (Physics 307) and 4th year continuum mechanics (Physics 406).
  • Physics Instructor, from October 1993 to December 1993, in a temporary appointment at the University College of the Cariboo in Kamloops, B.C. Dr. Knowles taught three degree-level courses in physics (4th year atomic physics, 3rd year electricity and magnetism, and 3rd year optics laboratory) as well as a first year laboratory. He was responsible for writing all the lectures, incorporating demonstrations during lectures, instruction during laboratory sessions, and all student marking and assessment.
  • Student Statements from the 1992 Physics 110 Teacher Evaluation, University of British Columbia:
    "David is very energetic and sophisticated and manages to synthesize Physics 110 into an exciting course. His demonstrations, standing on tables, bouncing off the chalk boards and moving to create effect were particularly useful and stimulating."
    "I was extremely happy with the way this course was taught. Finally, physics makes sense and it was fun and interesting."
  • Student Statements from the 1993 Physics Teacher Evaluation, University College of the Cariboo:
    "Dave is very energetic, precise, respectable and honest. He explains what is necessary to be done to succeed and offers all assistance to his students."
    "David relates very well to people - talks to people at their level. He also shows the relations between various branches of physics."
    "David is an all round excellent instructor."



RNA-interference: the Short and Long of It
Michele Markstein, PhD
Postdoctoral Fellow, Department of Genetics/Howard Hughes Medical Institute, Harvard Medical School

RNA interference (RNAi) is becoming one of the most widely used methods in both academia and the biotech industry to dissect the roles of every gene in the human, mouse, fly, and worm genomes. In short, RNAi works by preventing the flow of genetic information from messenger RNA (mRNA) to protein. The RNAi process is initiated by double-stranded RNAs (dsRNAs), which are recognized by the cell, complexed with proteins and cleaved into short RNAs on the order of 20 basepairs (bp). The 20 bp stretches of dsRNA are then unwound into single-stranded RNAs that target specific cellular mRNAs by the pattern recognition process of complementary basepairing. dsRNAs have been designed against each gene in several animal genomes and used to study what happens when the expression of each corresponding gene is knocked down by the process of RNAi. Such whole genome RNAi approaches have been applied to many biological problems including studies of cancer, neurodegeneration, cell death, signaling networks, immunology, and the RNAi process itself.

While whole genome RNAi studies are very promising and powerful, there is an underlying computational problem in designing dsRNAs with specificity against single genes. As explained above, the recognition process involves basepairing between stretches of RNA on the order of about 20 basepairs. However, as with many biological processes, the pattern recognition process of RNAi is relaxed and allows for mismatches. Thus, many designed dsRNAs turn out to have several off-targets, making it impossible to interpret the outcomes of experiments with these dsRNAs. Although the pattern recognition rules of RNAi are not fully understood, there may be some clues in a class of small RNAs called microRNAs, which the cell produces to negatively regulate its own signaling networks. microRNAs are single-stranded RNAs that fold back on themselves to create dsRNAs, which then become incorporated into the RNAi pathway to terminate the expression of specific genes. By matching microRNAs with their targets, it is becoming possible to discern some of the finer rules of pattern recognition.
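The off-target problem described above can be sketched as an approximate string-matching search. The minimal example below slides every k-mer of a dsRNA sense strand over each unintended transcript and reports transcripts containing a site within a small number of mismatches. It assumes, as a simplification, that near-identity between the sense strand and an mRNA stands in for complementary basepairing by the guide strand, and that mismatches are tolerated uniformly along the site, which the text notes is not fully understood. The sequences, gene names, and the function `find_off_targets` are hypothetical.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_off_targets(dsrna_sense, transcripts, intended, k=20, max_mismatches=2):
    """Return {transcript_name: fewest_mismatches} for unintended transcripts
    that contain a site within max_mismatches of some k-mer of the dsRNA."""
    kmers = {dsrna_sense[i:i + k] for i in range(len(dsrna_sense) - k + 1)}
    hits = {}
    for name, sequence in transcripts.items():
        if name == intended:
            continue  # the intended target is expected to match
        best = None
        for start in range(len(sequence) - k + 1):
            window = sequence[start:start + k]
            for kmer in kmers:
                distance = hamming(kmer, window)
                if distance <= max_mismatches and (best is None or distance < best):
                    best = distance
        if best is not None:
            hits[name] = best
    return hits


# Hypothetical toy data; real designs would use ~20 nt sites and full transcriptomes.
DSRNA_SENSE = "ACGUACGUGGCA"
TRANSCRIPTS = {
    "geneA": "AAACGUACGUGGCAUU",  # intended target, contains the dsRNA sequence
    "geneB": "GGGACGUACGAGGG",    # contains a site one mismatch away
    "geneC": "CCCCCCCCCCCC",      # unrelated sequence
}

if __name__ == "__main__":
    print(find_off_targets(DSRNA_SENSE, TRANSCRIPTS, intended="geneA",
                           k=8, max_mismatches=1))
```

This brute-force scan is quadratic in the number of sites and is meant only to make the problem concrete; genome-scale designs would need an indexed search.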

This tutorial will provide an overview of RNA biology and will focus on active research in: (1) predicting microRNAs and their targets, (2) applications of whole-genome RNAi in academia and industry, and (3) methods for designing highly specific dsRNA libraries.

Intended Audience:
This is an introductory RNA tutorial. Participants should be familiar with basic biological principles, such as how information flows from DNA to RNA to protein, and how DNA and RNA molecules recognize each other by complementary basepairing. This background can be reviewed at the NCBI science primer website:

Michele Markstein is a postdoctoral fellow in the laboratory of Dr. Norbert Perrimon at Harvard Medical School. Michele is using the whole-genome RNAi facility (http://flyrnai.org/) pioneered by the Perrimon lab to study a class of DNA regulatory elements called insulators, with the ultimate goal of understanding how genomes become partitioned into specific units of gene activity. The RNAi facility allows for high-throughput RNAi screening in Drosophila tissue culture cells. While this proves to be a highly efficient and effective means for sorting out gene functions, it is limited by the somewhat artificial biology of cultured cells. It would therefore be useful to also apply RNAi to cells within their natural context, in whole living organisms. Toward this end, Michele is developing a new technology in whole living fruit flies, to systematically knock down the expression of every gene by RNAi.



Computational Methods in MS-based Proteomics
Bobbie-Jo Webb-Robertson, PhD
Senior Research Scientist, Pacific Northwest National Laboratory

In recent years, advances in high-throughput (HTP) technologies and platforms for data-intensive computing have yielded more and more completed genomes. While the genome remains largely unchanged, the proteins in any particular cell change dramatically as genes are turned off and on in response to its environment. As proteins provide the structural and functional framework for cellular life, understanding the dynamic nature of their expression and interaction is necessary to attain a comprehensive representation of biological systems. Traditionally, proteomics used two-dimensional polyacrylamide gel electrophoresis to generate protein maps of expression and/or quantitation. However, this approach only focuses on a small number of proteins and is low-throughput. Currently, technologies employing mass spectrometry (MS) have revolutionized proteomics by offering a platform on which to make HTP measurements that increase sensitivity and specificity at a global scale. This approach theoretically allows the full proteome (all proteins expressed by the genome at a given time) to be measured concurrently. Thus, this technology is fueling the current revolution in proteomics that is advancing the scope of biological research from a simple biochemical analysis of a protein to the characterization of the expression, function, and interaction of proteins on a global scale. However, this new HTP era of proteomics requires computational methods that can make inferences from both the raw and processed experimental data sources.



Introduction to the Semantic Web for Bioinformatics
Kenneth Baclawski, PhD
Associate Professor of Computer Science, College of Computer and Information Science, Northeastern University

Biologists heavily use the web, but the web is geared much more toward human interaction than automated processing. While the web gives biologists access to information, it does not allow them to easily integrate different data sources or to incorporate additional analysis tools. The Semantic Web addresses these problems by annotating web resources and by providing reasoning and retrieval facilities from heterogeneous sources.

This tutorial introduces the basic languages of the Semantic Web from the point of view of the life sciences, especially bioinformatics. The objective is to cover the major web ontology languages, what they mean and how they are used. The emphasis will be on pragmatic application issues. The goal is for participants to have an understanding of the Semantic Web sufficient for them to be able to make decisions about whether and how to use the Semantic Web.
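To make the idea of annotation plus reasoning concrete, here is a minimal sketch that stores annotations as subject-predicate-object triples and applies one inference rule in the style of RDF Schema: an instance of a class is also an instance of its superclasses. It uses plain Python rather than a Semantic Web toolkit, and the `ex:` identifiers, the class hierarchy, and the function names are hypothetical examples, not part of any published ontology.

```python
# Hypothetical annotations written as (subject, predicate, object) triples.
TRIPLES = {
    ("ex:BRCA1", "rdf:type", "ex:TumorSuppressorGene"),
    ("ex:TP53", "rdf:type", "ex:Gene"),
    ("ex:TumorSuppressorGene", "rdfs:subClassOf", "ex:Gene"),
    ("ex:Gene", "rdfs:subClassOf", "ex:BiologicalEntity"),
}


def infer_types(triples):
    """Return the triples plus every rdf:type statement implied by
    rdfs:subClassOf, repeating the rule until nothing new is added."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        subclass_pairs = [(s, o) for s, p, o in inferred if p == "rdfs:subClassOf"]
        for subject, predicate, obj in list(inferred):
            if predicate != "rdf:type":
                continue
            for subclass, superclass in subclass_pairs:
                new_triple = (subject, "rdf:type", superclass)
                if obj == subclass and new_triple not in inferred:
                    inferred.add(new_triple)
                    changed = True
    return inferred


def instances_of(triples, cls):
    """Retrieve all resources that are (directly or by inference) of type cls."""
    return sorted(
        s for s, p, o in infer_types(triples) if p == "rdf:type" and o == cls
    )


if __name__ == "__main__":
    print(instances_of(TRIPLES, "ex:Gene"))
```

The query for `ex:Gene` returns `ex:BRCA1` even though no triple states that directly, which is the kind of retrieval across heterogeneous annotations that a plain keyword search over web pages cannot provide.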

Ken Baclawski is an Associate Professor of Computer Science at Northeastern University. He is also affiliated with the Division of Preventive Medicine of Brigham and Women’s Hospital at the Harvard Medical School. His primary research area is formal ontologies, and he has been actively working in the area of biomedical ontologies since 1992. Prof. Baclawski has been active in the development of the Semantic Web since it was first proposed, being part of the team that developed the DAML+OIL language, later renamed the Web Ontology Language (OWL).

Prof. Baclawski and Prof. Tianhua Niu of the Harvard Medical School have written a book on the subject of the proposed tutorial, titled Ontologies for Bioinformatics. This book has been accepted for publication by the MIT Press as part of their series on Computational Molecular Biology. The book is scheduled to appear in June 2005.



Structure Based Methods for Identifying Protein Function
Mike Liang, PhD candidate
Biomedical Informatics Training Program, Stanford University
D. Rey Banatao, PhD
NSF Postdoctoral Fellow, Department of Chemistry and Biochemistry, University of California, Los Angeles

Atomic resolution structures of biomolecules (proteins and nucleic acids) provide great insight into the chemistry that allows proteins to function and interact. Structural genomics initiatives are aimed at the high-throughput determination of 3D protein structures. Increasingly available 3D data can provide significant insight in determining the molecular function of proteins as well as identifying important functional sites that could be useful for drug targeting. This large volume of 3D structures requires new computational methods to provide rapid analysis and functional annotation of the data.

This tutorial will provide the following background:

  • Brief review of basic principles in 3D structure of proteins
  • Brief overview of 3D structure and function data sources
  • Brief presentation on 3D molecular visualization tools for both web-based and off-line analysis
The majority of the tutorial will focus on methods for inferring protein functional sites from 3D structure including those based on:
  • distance
  • orientation
  • surface geometry
  • physicochemical properties
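As a small illustration of the first item in the list above, a distance-based method can be sketched as selecting the residues whose representative atom lies within a cutoff radius of a point of interest, such as a bound ligand atom. The coordinates below are invented for illustration (they are not taken from any structure file), and the 5.0 Å cutoff and the function name `residues_near` are likewise assumptions.

```python
import math


def residues_near(site, residues, cutoff=5.0):
    """Return the names of residues whose representative atom lies within
    `cutoff` (in angstroms) of the 3D point `site`."""
    return sorted(
        name for name, xyz in residues.items() if math.dist(site, xyz) <= cutoff
    )


# Hypothetical coordinates in angstroms, one representative atom per residue.
RESIDUES = {
    "HIS57": (1.0, 0.0, 0.0),
    "ASP102": (3.0, 0.0, 0.0),
    "SER195": (0.0, 4.0, 0.0),
    "GLY10": (10.0, 10.0, 10.0),
}
LIGAND_ATOM = (0.0, 0.0, 0.0)

if __name__ == "__main__":
    print(residues_near(LIGAND_ATOM, RESIDUES))
```

Orientation, surface geometry, and physicochemical descriptors would add further features per residue; this sketch covers only the distance criterion.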

Participants will leave with a solid understanding of the basic concepts of 3D protein structures, the available data sources for structure-function analysis, the tools available for analysis, and the basic principles behind the tools.

Mike Liang is a Ph.D. candidate in the Biomedical Informatics Program at Stanford University. His research interests lie primarily in annotating likely functional sites in protein structures. His current research is on automatic identification of conserved physicochemical properties around functional sites in 3D structures of proteins (http://feature.stanford.edu/). Liang received his B.S. in Computer Science with a minor in Chemistry from the University of California, San Diego.

Dr. Rey Banatao is an NSF Postdoctoral Fellow in the Yeates Lab in the Department of Chemistry and Biochemistry and the California Nanosystems Institute at the University of California, Los Angeles. His research interests are in protein design using computational and experimental methods with possible applications in biomaterials and nanotechnology. Dr. Banatao received his B.A. in Biochemistry and Molecular Biology from U.C. Berkeley and his Ph.D. in Biological and Medical Informatics from U.C. San Francisco.



Pattern Discovery in Sequences and Structures
Giri Narasimhan, PhD
Bioinformatics Research Group (BioRG), Florida International University

Many fundamental problems in bioinformatics can be cast as a problem of pattern discovery. Pattern discovery can be supervised or unsupervised. Here we will survey existing techniques for pattern discovery and discuss several bioinformatics applications. The techniques to be discussed will include:

  • Basic string algorithms
  • Profiles and profile HMMs
  • Gibbs Sampling
  • Combinatorial approaches
  • Data mining approaches
Applications include:
  • Motif discovery in proteins
  • Detecting regulatory elements in DNA sequences
Since supervised pattern discovery requires the design of a training set, we will discuss implications of the choice of training sets, both positive and negative. Finally, we will discuss approaches to pattern discovery in protein structures and the concept of sequence-structure patterns.
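One of the listed techniques, the sequence profile, can be sketched in a few lines: build per-position residue frequencies from a positive training set of aligned motif instances, then scan a new sequence for the window with the highest log-odds score. The training instances, the uniform background of 0.25, the pseudocount of 1, and the function names are assumptions made for illustration; this is a plain position-specific scoring matrix, not a profile HMM.

```python
import math


def build_profile(instances, alphabet="ACGT", pseudocount=1.0):
    """Per-position residue probabilities from equal-length aligned instances."""
    length = len(instances[0])
    profile = []
    for i in range(length):
        column = [seq[i] for seq in instances]
        total = len(column) + pseudocount * len(alphabet)
        profile.append(
            {a: (column.count(a) + pseudocount) / total for a in alphabet}
        )
    return profile


def best_match(profile, sequence, background=0.25):
    """Return (start, score) of the window with the highest log-odds score."""
    width = len(profile)
    best_start, best_score = None, -math.inf
    for start in range(len(sequence) - width + 1):
        score = sum(
            math.log2(profile[i][sequence[start + i]] / background)
            for i in range(width)
        )
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score


# Hypothetical positive training set: variants of a six-base motif.
TRAINING = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]

if __name__ == "__main__":
    profile = build_profile(TRAINING)
    print(best_match(profile, "GGCGTATAATGCC"))
```

The pseudocount keeps unseen residues from producing a probability of zero, which is one concrete way the choice of training set shapes what the profile can later recognize.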

Giri Narasimhan heads the Bioinformatics Research Group (BioRG) and is a Professor in the School of Computer Science at Florida International University, Miami. He received his B-Tech in Electrical Engineering from the Indian Institute of Technology in Bombay, India, and his PhD in 1989 from the University of Wisconsin – Madison. From 1989 to 2001 he held a faculty position at the University of Memphis, Tennessee. He has written over 60 research articles. His research interests include geometric and graph algorithms, and problems in machine learning, biotechnology and bioinformatics. For more information, visit the URL: http://biorg.cs.fiu.edu



Organizing and Understanding the Biological Data Deluge through Phylogenetics
Indra Neil Sarkar, PhD
Bioinformatics Associate, Division of Invertebrate Zoology, American Museum of Natural History

Advancements in computational and sequencing techniques have led to the availability of massive amounts of biological information pertaining to organisms that span the entire tree of life. However, the organization, representation, and annotation of these volumes of data (available from disparate resources in a wide range of forms) pose a significant challenge for the research community. This tutorial will discuss the various phylogenetic methods that are used for the organization and annotation of biological information. Additionally, there will be discussion about how to use phylogenetic techniques to organize a range of disparate data types, from genotypic to phenotypic information. There will be an overview of many of the available data types that can be used for phylogenetic inference. Finally, applied phylogenetic approaches, such as correlative hypothesis generation and heuristic character-based approaches for phylogenetic classification, will be discussed. The significant challenges in the design and use of phylogenetic methods will be an underlying theme, posing rich theoretical research questions throughout the tutorial.
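As one concrete instance of a character-based approach, the sketch below scores a single character on a fixed rooted binary tree using Fitch's small-parsimony algorithm, which counts the minimum number of state changes the tree requires. Fitch's algorithm is chosen here only as a familiar example; the abstract does not say which methods the tutorial covers. The taxa, character states, and tree shapes are hypothetical.

```python
def fitch(tree, states):
    """Return (candidate_state_set, minimum_changes) for one character.

    `tree` is either a leaf name (str) or a pair (left_subtree, right_subtree);
    `states` maps each leaf name to its observed character state.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # The children can agree on a state, so no change is needed here.
        return shared, left_cost + right_cost
    # The children disagree: keep both options and count one change.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical character (for example, one aligned nucleotide site).
STATES = {"human": "A", "chimp": "A", "mouse": "G", "rat": "G"}
TREE_1 = (("human", "chimp"), ("mouse", "rat"))
TREE_2 = (("human", "mouse"), ("chimp", "rat"))

if __name__ == "__main__":
    print(fitch(TREE_1, STATES)[1], fitch(TREE_2, STATES)[1])
```

Summing this score over many characters gives a criterion for comparing candidate trees, and the same scoring applies whether the characters are genotypic or phenotypic, provided they are coded as discrete states.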



Combinatorial and Statistical Approaches to Analyzing Biological Networks
Eric Xing, PhD
Assistant Professor, School of Computer Science, Carnegie Mellon University
Roded Sharan, PhD
Senior Lecturer, School of Computer Science, Tel-Aviv University

High-throughput technologies enable the systematic assaying of transcript and protein abundance, physical, regulatory and genetic interactions among proteins, and the biochemical, morphological and epigenetic states of the cell. These measurements promise detailed mechanistic pictures of complex cellular processes, challenging conventional biostatistical and computational methods for comprehending, manipulating, and querying such a vast body of data from diverse sources. The multi-aspect, genome-wide data of biological signals underlying regulatory and signaling circuitry can be naturally modeled by a graph, or a network. Rich information regarding the dependencies, interactions, function and conservation of bio-molecules can be extracted from such data based on combinatorial and graph theoretic analyses. Furthermore, recent developments in graphical models---a formalism that exploits the conjoined talents of graph theory and probability theory---provide a powerful language to define expressive distributions of the data, and a systematic computational framework for probabilistic inference.

In this tutorial we will review the emerging field of network biology and survey recent graph-theoretic and statistical machine learning approaches to dissecting protein networks and microarray data, including graph detection algorithms, inference and learning algorithms for Bayesian networks and Markov random fields, and techniques for data integration.

We will demonstrate the application of these methods to analyzing protein-protein interaction networks and transcriptional regulatory networks.

Intended audience:
Researchers in computational biology, systems biology, sequence analysis, machine learning, combinatorial optimization and Bayesian statistics. A graduate-level knowledge of computer algorithms and probability/statistical theory would be helpful but is not required for most of the material to be covered.
