Cambridge Healthtech Institute’s Third Annual
Sequencing Data Storage and Management
IT Infrastructures to Support Data Intensive Science
March 7-8, 2012 | Hilton San Diego Resort | San Diego, California
THURSDAY, MARCH 8
7:30 am Breakfast Presentation: A Scalable Unified Architecture for NGS Data Storage and Management
Jose L. Alvarez, World Wide Director Life Sciences, Data Direct Networks, Inc.
DDN will highlight a scalable, high-performance unified data storage and management solution that will simplify and accelerate genomic research. Instrument data ingestion, sequencing pipeline performance, and intelligent archiving and sharing of research data in a geo-distributed model will be discussed. DDN is partnering with NGS industry leaders to deliver an array of flexible, highly scalable and easy-to-manage storage solutions that are helping research groups around the world accelerate their time to discovery.
8:00 Successful Sequencing Discussion Groups
These focused groups are designed for conference attendees to discuss important and interesting topics related to sequencing and genomic tools. These are moderated discussions with brainstorming and interactive problem solving, allowing conference participants from diverse areas to exchange ideas and experiences and to develop future collaborations around a focused topic. Complimentary coffee is included.
8:45 Chairperson's Remarks
Krishna Sankhavaram, Director, Research Information Systems and Technology, University of Texas MD Anderson Cancer Center
» 8:50 Featured Speaker:
SDSC's Next-Generation Cyberinfrastructure for "Big Data" Applications
Michael L. Norman, Ph.D., Director and Chief Scientific Officer, San Diego Supercomputer Center
The San Diego Supercomputer Center (SDSC) at the University of California, San Diego is deploying a range of new computing, storage, and networking resources to cope with the exponential growth of research data in many disciplines. Among these are the flash-based Gordon data-intensive supercomputer (http://gordon.sdsc.edu), and SDSC Cloud—a high performance data preservation and sharing cloud (http://cloud.sdsc.edu). These and other resources at SDSC are designed for "Big Data" applications of any kind, including NGS. I describe our next-generation CI and its potential applications to NGS based on our work with genomics researchers at UCSD and TSRI.
9:25 G-SQueeZ™: Genomic Sequence-Quality Data Compression and Access
Waibhav "Amol" Tembe, Ph.D., Head, Bioinformatics Center, The Translational Genomics Research Institute
G-SQueeZ™ is a lossless, Huffman coding-based representation designed specifically for sequencing reads. It compresses the reads and provides selective access via indexing without altering their relative order. Data compression of 68% to 81% has been obtained on benchmark datasets, and we have successfully tested integration internally with B-FAST, a popular open-source alignment tool for short sequencing reads. While G-SQueeZ™ can function as standalone software, its true impact can be realized by integrating it into other analysis workflows as a linkable library.
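As a rough illustration of the Huffman coding idea named in this abstract, the sketch below builds a prefix code from symbol frequencies and round-trips a read through it. It is not the G-SQueeZ implementation: the function names are hypothetical, quality scores and indexing are not modeled, and the output is a string of '0'/'1' characters rather than packed bytes.

```python
import heapq
from collections import Counter


def build_huffman_codes(text):
    """Return a {symbol: bitstring} prefix code built from symbol frequencies."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tie_breaker, {symbol: partial_code}).
    # The unique tie_breaker keeps the dicts from ever being compared.
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merging two subtrees prepends one bit to every code beneath them.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(text, codes):
    """Concatenate the code of each symbol into one bitstring."""
    return "".join(codes[s] for s in text)


def decode(bits, codes):
    """Invert encode(); relies on the code being prefix-free."""
    inverse = {c: s for s, c in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)


if __name__ == "__main__":
    read = "ACGTNACGTAAAAAACCCGT"  # made-up example read
    codes = build_huffman_codes(read)
    bits = encode(read, codes)
    print(codes, len(bits), decode(bits, codes) == read)
```

Frequent bases receive shorter codes, which is where the saving over a fixed 8 bits per character comes from; the ratios quoted in the abstract refer to the authors' benchmarks, not to this toy.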
10:00 The X-factor for Sequencing in the Clouds: Trust
Sanjay Joshi, Solutions Architect, Life Sciences, EMC Isilon Storage Division
Despite wishful thinking that people worldwide would readily share their genome maps with no strings attached, security and the personally identifiable information contained within the genome will remain a technology and regulatory issue for generations. We will present a Cloud Trust Framework for Sequencing in the Clouds that covers public and private clouds and their hybrid models.
10:15 Networking Coffee Break in the Exhibit Hall with Poster Viewing
10:45 File Transfer Capabilities with Globus Online
Ian Foster, Ph.D., Director, Computation Institute, Argonne National Lab
11:20 Storing and Sharing NGS Data in a Medical Setting
Krishna Sankhavaram, Director, Research Information Systems and Technology, University of Texas MD Anderson Cancer Center
» 11:55 Featured Speaker:
Managing Research and Clinical NGS Data in a Decentralized Biomedical Environment
Brent Richter, Ph.D., Director, Enterprise Research IS, Partners Healthcare
Next-generation sequencing data and their analysis have been a burden for Academic Medical Centers (AMCs) for several years. As research organizations and core facilities have adjusted to keep pace with the introduction and expansion of these technologies, new business drivers have emerged that continue to place pressure on these same organizations. From the acquisition and integration of additional "big" data sets, such as functional and clinical data, within research to the clinical "translation" of these systems for applications involving clinical samples, the challenges for the AMC are myriad. After a brief introduction to the environment and drivers of these technologies at the hospitals of Massachusetts General, Brigham and Women's, and McLean, the challenges, ranging from technology management to service levels, will be explored in detail.
12:30 pm Luncheon Presentation
High-Speed Data Movement for Effective Global Collaboration in Genomic Research
Daniel Kumi, Director, New Market Development, Aspera
In order to collaborate effectively, international scientific organizations need to implement large-scale computing and networking infrastructures (on-premise, cloud or hybrid), select appropriate network-attached storage systems and integrate the necessary high-speed transport technologies to power the collection and distribution of terabytes of genomic sequencing data to researchers globally. In this session, learn about best practices, requirements and challenges of such IT infrastructure designs and how Aspera powers ultra high-speed data movement in support of global research efforts.
1:45 Chairperson's Remarks
Tom Schwei, Vice President and General Manager, DNASTAR, Inc.
1:50 Metagenomics, from the Bottom Up
Shunsheng (Cliff) Han, Ph.D., Team Leader, Research and Development, Bioscience Division, Los Alamos National Laboratory
The DOE Joint Genome Institute at Los Alamos National Laboratory has a specialized role to assemble and analyze both single microbial genomes and metagenomes. Our goal for this project was to integrate tools and techniques to probe the dynamics of microbial communities. We have developed, integrated, and systematically applied two complementary technologies to explore spatial heterogeneity and temporal response to changes in temperature and precipitation at an experimental field site in Utah and in manipulated experiments in the laboratory. Our novel analysis technique used 20 million signatures, each consisting of 10 contiguous amino acids from reference genomes, to reconstruct phylogenetic and functional profiles from the metagenomic DNA sequences.
2:25 Approaches for Scaling De Novo Assembly of Metagenomic Sequencing
Adina Howe, Ph.D., Research Scientist, Center for Microbial Ecology, Michigan State University
Short-read sequencing technologies are providing unprecedented opportunities to deeply sequence microbial communities in order to characterize complex environments. De novo metagenome assembly, however, is limited by both variable abundance of source organisms within the environment and scalability. We present a compressible de Bruijn graph representation that enables us to explore graph structures and scale de novo assembly by up to 10x in memory and time.
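For readers unfamiliar with the data structure mentioned in this abstract, a plain (uncompressed) de Bruijn graph can be sketched as follows. This is only a textbook illustration with a hypothetical function name; it does not reproduce the compressible representation the speaker describes, and it ignores reverse complements and sequencing errors.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read.

    Every k-mer in a read contributes one edge: prefix -> suffix.
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


if __name__ == "__main__":
    # Two made-up overlapping reads; with k=3 the nodes are 2-mers.
    for node, successors in sorted(build_de_bruijn(["ACGT", "CGTA"], 3).items()):
        print(node, "->", sorted(successors))
```

Storing every k-mer explicitly like this is what makes memory the bottleneck for large metagenomes, which is the scaling problem the talk addresses.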
3:00 Selected Poster Presentation: Evaluation of Deep Exome Sequencing for the Discovery of Novel SNPs and De Novo Mutations by Using the Mouse Model
Yoichi Gondo, Ph.D., Team Leader, Mutagenesis and Genomics Team, RIKEN BioResource Center
3:15 Networking Refreshment Break, Last Chance for Poster and Exhibit Viewing
4:00 Poster Awards
4:15 Bayesian Assembly of Reads from High-Throughput Sequencing
Jonathan Laserson, Research Scientist in Koller Lab, Department of Computer Science, Stanford University
The high-throughput sequencing revolution allows us to take millions of noisy short reads from the DNA in a sample. To recover the true genomes, these reads are assembled by algorithms exploiting their high coverage and overlap. I focus on two scenarios for sequence assembly. The first is de novo assembly, where the reads come from an unknown and diverse population of genomes. The second is variant assembly, where the reads come from short but clonally related genomes, only slightly mutated from each other. In both cases I use the same principled Bayesian approach to design an algorithm that uncovers the composition of the genomic sequences that produced the reads. I will demonstrate the algorithms' performance on real data from various metagenomic environments, and from immune system cells from both healthy and malignant tissues.
4:50 Panel Discussion with Afternoon Speakers
5:30 Close of Conference