
Cambridge Healthtech Institute’s Third Annual
Sequencing Data Storage and Management
IT Infrastructures to Support Data Intensive Science
March 7-8, 2012 | Hilton San Diego Resort | San Diego, California



7:30 am Breakfast Presentation: A Scalable Unified Architecture for NGS Data Storage and Management

Jose L. Alvarez, World Wide Director Life Sciences, Data Direct Networks, Inc.

DDN will highlight a scalable, high-performance unified data storage and management solution that will simplify and accelerate genomic research. Instrument data ingestion, sequencing pipeline performance, and intelligent archiving and sharing of research data in a geo-distributed model will be discussed. DDN is partnering with NGS industry leaders to deliver an array of flexible, highly scalable, and easy-to-manage storage solutions that are helping research groups around the world accelerate their time to discovery.

8:00 Successful Sequencing Discussion Groups

These focused groups are designed for conference attendees to discuss important and interesting topics related to sequencing and genomic tools. They are moderated discussions with brainstorming and interactive problem solving, allowing conference participants from diverse areas to exchange ideas and experiences and to develop future collaborations around a focused topic. Complimentary coffee is included.


Data Transfer/Storage 

8:45 Chairperson's Remarks

Krishna Sankhavaram, Director, Research Information Systems and Technology, University of Texas MD Anderson Cancer Center


8:50 Featured Speaker: SDSC's Next-Generation Cyberinfrastructure for "Big Data" Applications

Michael L. Norman, Ph.D., Director and Chief Scientific Officer, San Diego Supercomputer Center

The San Diego Supercomputer Center (SDSC) at the University of California, San Diego is deploying a range of new computing, storage, and networking resources to cope with the exponential growth of research data in many disciplines. Among these are the flash-based Gordon data-intensive supercomputer and SDSC Cloud, a high-performance data preservation and sharing cloud. These and other resources at SDSC are designed for "Big Data" applications of any kind, including NGS. I will describe our next-generation cyberinfrastructure (CI) and its potential applications to NGS, based on our work with genomics researchers at UCSD and TSRI.

9:25 G-SQueeZ™: Genomic Sequence-Quality Data Compression and Access

Waibhav "Amol" Tembe, Ph.D., Head, Bioinformatics Center, The Translational Genomics Research Institute

G-SQueeZ™, a Huffman coding-based lossless representation specific to sequencing reads, compresses sequence reads and provides selective access via indexing without altering their relative order. Compression of 68% to 81% has been obtained on benchmark datasets, and we have successfully tested integration internally with B-FAST, a popular open-source aligner for short sequencing reads. While G-SQueeZ™ can function as standalone software, its true impact can be realized by integrating it into other analysis workflows as a linkable library.
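For readers unfamiliar with the underlying idea, the following is a minimal sketch of Huffman coding applied to sequence reads. It is not the G-SQueeZ™ implementation, whose details are not given in this abstract. The function names and the choice to encode each read separately (so that a read can be fetched by its position without disturbing read order) are assumptions made for illustration only.

```python
import heapq
from collections import Counter


def build_huffman_codes(symbols):
    """Return a dict mapping each symbol to a prefix-free bit string."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, unique tie-breaker, {symbol: code so far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merging two subtrees prepends one bit to every code beneath them.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode_reads(reads, codes):
    """Encode each read on its own, keeping the original read order."""
    return ["".join(codes[base] for base in read) for read in reads]


def decode_read(bits, codes):
    """Decode one encoded read back to its base string."""
    inverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)


reads = ["ACGTAAAA", "AANNACGT"]
codes = build_huffman_codes("".join(reads))
encoded = encode_reads(reads, codes)
# Selective access: decode only the second read, by its index.
second = decode_read(encoded[1], codes)
```

Bit strings are kept as Python text here for readability; a real tool would pack them into bytes and would typically also compress the quality scores.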

10:00 The X-factor for Sequencing in the Clouds: Trust

Sanjay Joshi, Solutions Architect, Life Sciences, EMC Isilon Storage Division

In spite of wishful thinking that all humans worldwide would readily give up their genome map to share without strings attached, security and the personally identifiable information contained within the genome will remain technology and regulatory issues for generations. We will present a Cloud Trust Framework for Sequencing in the Clouds, covering public and private clouds and their hybrid models.

10:15 Networking Coffee Break in the Exhibit Hall with Poster Viewing

10:45 File Transfer Capabilities with Globus Online

Ian Foster, Ph.D., Director, Computation Institute, Argonne National Lab

11:20 Storing and Sharing NGS Data in a Medical Setting

Krishna Sankhavaram, Director, Research Information Systems and Technology, University of Texas MD Anderson Cancer Center


11:55 Featured Speaker: Managing Research and Clinical NGS Data in a Decentralized Biomedical Environment

Brent Richter, Ph.D., Director, Enterprise Research IS, Partners Healthcare

Next-generation sequencing data and their analysis have been a burden for Academic Medical Centers (AMCs) for several years. As research organizations and core facilities have adjusted to keep pace with the introduction and expansion of these technologies, new business drivers have emerged that continue to place pressure on these same organizations. From the acquisition and integration of additional "big" data sets, such as functional and clinical data, within research to the clinical "translation" of these systems for applications involving clinical samples, the challenges for the AMC are myriad. After a brief introduction to the environment and drivers of these technologies at Massachusetts General, Brigham and Women's, and McLean hospitals, the challenges, ranging from technology management to service levels, will be explored in detail.

12:30 pm Luncheon Presentation: High-Speed Data Movement for Effective Global Collaboration in Genomic Research

Daniel Kumi, Director, New Market Development, Aspera

In order to collaborate effectively, international scientific organizations need to implement large-scale computing and networking infrastructures (on-premise, cloud or hybrid), select appropriate network-attached storage systems and integrate the necessary high-speed transport technologies to power the collection and distribution of terabytes of genomic sequencing data to researchers globally. In this session, learn about best practices, requirements and challenges of such IT infrastructure designs and how Aspera powers ultra-high-speed data movement in support of global research efforts.

Speeding Up Data Intensive De Novo Assembly

1:45 Chairperson's Remarks

Tom Schwei, Vice President and General Manager, DNASTAR, Inc.

1:50 Metagenomics, from the Bottom Up

Shunsheng (Cliff) Han, Ph.D., Team Leader, Research and Development, Bioscience Division, Los Alamos National Laboratory  

The DOE Joint Genome Institute at Los Alamos National Laboratory has a specialized role in assembling and analyzing both single microbial genomes and metagenomes. Our goal for this project was to integrate tools and techniques to probe the dynamics of microbial communities. We have developed, integrated, and systematically applied two complementary technologies to explore spatial heterogeneity and temporal response to changes in temperature and precipitation, both at an experimental field site in Utah and in manipulated experiments in the laboratory. Our novel analysis technique used 20 million signatures, each consisting of 10 contiguous amino acids from reference genomes, to reconstruct phylogenetic and functional profiles from the metagenomic DNA sequences.

2:25 Approaches for Scaling De Novo Assembly of Metagenomic Sequencing 

Adina Howe, Ph.D., Research Scientist, Center for Microbial Ecology, Michigan State University

Short-read sequencing technologies are providing unprecedented opportunities to deeply sequence microbial communities in order to characterize complex environments. De novo metagenome assembly, however, is limited by both variable abundance of source organisms within the environment and scalability. We present a compressible de Bruijn graph representation that enables us to explore graph structures and scale de novo assembly by up to 10x in memory and time. 
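As background, a de Bruijn graph can be held implicitly as a set of k-mers, with edges recovered by testing which one-base extensions are present. The sketch below illustrates that idea only; it is not the speaker's compressible representation, which this abstract does not describe. The exact Python set, the function names, and the omission of reverse complements are all simplifying assumptions.

```python
def kmers(read, k):
    """Yield every length-k substring of a read, left to right."""
    return (read[i:i + k] for i in range(len(read) - k + 1))


def build_kmer_set(reads, k):
    """Collect all k-mers; the set is the node list of the graph."""
    nodes = set()
    for read in reads:
        nodes.update(kmers(read, k))
    return nodes


def successors(kmer, nodes, alphabet="ACGT"):
    """Edges are implicit: a successor shares a (k-1)-base overlap."""
    suffix = kmer[1:]
    return [suffix + base for base in alphabet if suffix + base in nodes]


def predecessors(kmer, nodes, alphabet="ACGT"):
    """Mirror of successors, extending to the left instead."""
    prefix = kmer[:-1]
    return [base + prefix for base in alphabet if base + prefix in nodes]


reads = ["ACGTAC", "GTACGG"]
graph = build_kmer_set(reads, k=4)
branch = successors("TACG", graph)  # a node with two outgoing edges
```

Because only membership queries are needed to traverse the graph, the exact set could in principle be swapped for a more compact membership structure, at the cost of whatever error that structure introduces.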

3:00 Selected Poster Presentation: Evaluation of Deep Exome Sequencing for the Discovery of Novel SNPs and De Novo Mutations by Using the Mouse Model

Yoichi Gondo, Ph.D., Team Leader, Mutagenesis and Genomics Team, RIKEN BioResource Center


3:15 Networking Refreshment Break, Last Chance for Poster and Exhibit Viewing

4:00 Poster Awards

4:15 Bayesian Assembly of Reads from High-Throughput Sequencing

Jonathan Laserson, Research Scientist in Koller Lab, Department of Computer Science, Stanford University

The high-throughput sequencing revolution allows us to take millions of noisy short reads from the DNA in a sample. To recover the true genomes, these reads are assembled by algorithms exploiting their high coverage and overlap. I focus on two scenarios for sequence assembly. The first is de novo assembly, where the reads come from an unknown and diverse population of genomes. The second is variant assembly, where the reads come from short but clonally related genomes, only slightly mutated from each other. In both cases I use the same principled Bayesian approach to design an algorithm that uncovers the composition of the genomic sequences that produced the reads. I will demonstrate the algorithms' performance on real data from various metagenomic environments, and from immune system cells from both healthy and malignant tissues.

4:50 Panel Discussion with Afternoon Speakers

5:30 Close of Conference

