SC12 - The International Conference for High Performance Computing, Networking, Storage and Analysis

Salt Lake City, Utah, USA

10-16 November 2012

Exhibition Dates: 12-15 November 2012

Papers By Session

ACM Gordon Bell Prize I
Toward Real-Time Modeling of Human Heart Ventricles at Cellular Resolution: Simulation of Drug-Induced Arrhythmias
Billion-Particle SIMD-Friendly Two-Point Correlation on Large-Scale HPC Cluster Systems
Extreme-Scale UQ for Bayesian Inverse Problems Governed by PDEs
ACM Gordon Bell Prize II
The Universe at Extreme Scale - Multi-Petaflop Sky Simulation on the BG/Q
4.45 Pflops Astrophysical N-Body Simulation on K Computer - The Gravitational Trillion-Body Problem
Analysis of I/O and Storage
Demonstrating Lustre over a 100Gbps Wide Area Network of 3,500km
Characterizing Output Bottlenecks in a Supercomputer
A Study on Data Deduplication in HPC Storage Systems
Auto-Diagnosis of Correctness and Performance Issues
Parametric Flows - Automated Behavior Equivalencing for Symbolic Analysis of Races in CUDA Programs
MPI Runtime Error Detection with MUST - Advances in Deadlock Detection
Novel Views of Performance Data to Analyze Large-Scale Adaptive Applications
Autotuning and Search-Based Optimization
Portable Section-Level Tuning of Compiler Parallelized Applications
A Multi-Objective Auto-Tuning Framework for Parallel Codes
PATUS for Convenient High-Performance Stencils: Evaluation in Earthquake Simulations
Big Data
On Distributed File Tree Walk of Parallel File Systems
Design and Analysis of Data Management in Scalable Parallel Scripting
Usage Behavior of a Large-Scale Scientific Archive
Breadth First Search
Large-Scale Energy-Efficient Graph Traversal - A Path to Efficient Data-Intensive Supercomputing
Direction-Optimizing Breadth-First Search
Breaking the Speed and Scalability Barriers for Graph Exploration on Distributed-Memory Machines
Checkpointing
Alleviating Scalability Issues of Checkpointing Protocols
Design and Modeling of a Non-Blocking Checkpointing System
McrEngine - A Scalable Checkpointing System Using Data-Aware Aggregation and Compression
Cloud Computing
Cost- and Deadline-Constrained Provisioning for Scientific Workflow Ensembles in IaaS Clouds
Host Load Prediction in a Google Compute Cloud with a Bayesian Model
Scalia: An Adaptive Scheme for Efficient Multi-Cloud Storage
Communication Optimization
Mapping Applications with Collectives over Sub-Communicators on Torus Networks
Optimization Principles for Collective Neighborhood Communications
Optimizing Overlay-Based Virtual Networking through Optimistic Interrupts and Cut-Through Forwarding
Compiler-Based Analysis and Optimization
Compiler-Directed File Layout Optimization for Hierarchical Storage Systems
Tiling Stencil Computations to Maximize Parallelism
Bamboo - Translating MPI Applications to a Latency-Tolerant, Data-Driven Form
Cosmology Applications
Optimizing the Computation of N-Point Correlations on Large-Scale Astronomical Data
First-Ever Full Observable Universe Simulation
Hierarchical Task Mapping of Cell-Based AMR Cosmology Simulations
Datacenter Technologies
Measuring Interference between Live Datacenter Applications
T* - A Data-Centric Cooling Energy Costs Reduction Approach for Big Data Analytics Cloud
ValuePack - Value-Based Scheduling Framework for CPU-GPU Clusters
Direct Numerical Simulations
High Throughput Software for Direct Numerical Simulations of Compressible Two-Phase Flows
Hybridizing S3D into an Exascale Application Using OpenACC
DRAM Power and Resiliency Management
MAGE - Adaptive Granularity and ECC for Resilient and Power Efficient Memory Systems
RAMZzz: Rank-Aware DRAM Power Management with Dynamic Migrations and Demotions
Fast Algorithms
A Framework for Low-Communication 1-D FFT
Scalable Multi-GPU 3-D FFT for TSUBAME 2.0 Supercomputer
Parallel Geometric-Algebraic Multigrid on Unstructured Forests of Octrees
Fault Detection and Analysis
Detection and Correction of Silent Data Corruption for Large-Scale High-Performance Computing
A Study of DRAM Failures in the Field
Fault Prediction under the Microscope - A Closer Look Into HPC Systems
GPU Programming Models and Patterns
Automatic Generation of Software Pipelines for Heterogeneous Parallel Systems
Accelerating MapReduce on a Coupled CPU-GPU Architecture
Early Evaluation of Directive-Based GPU Programming Models for Productive Exascale Computing
Graph Algorithms
Parallel Bayesian Network Structure Learning with Application to Gene Networks
A New Scalable Parallel DBSCAN Algorithm Using the Disjoint-Set Data Structure
A Multithreaded Algorithm for Network Alignment via Approximate Matching
Grid Computing
ATLAS Grid Workload on NDGF Resources: Analysis, Modeling, and Workload Generation
On Using Virtual Circuits for GridFTP Transfers
On the Effectiveness of Application-Aware Self-Management for Scientific Discovery in Volunteer Computing Systems
Grids/Clouds Networking
High Performance RDMA-Based Design of HDFS over InfiniBand
Protocols for Wide-Area Data-Intensive Applications - Design and Performance Issues
Efficient and Reliable Network Tomography in Heterogeneous Networks Using BitTorrent Broadcasts and Clustering Algorithms
Linear Algebra Algorithms
Communication Avoiding and Overlapping for Numerical Linear Algebra
Managing Data-Movement for Effective Shared-Memory Parallelization of Out-of-Core Sparse Solvers
Communication-Avoiding Parallel Strassen - Implementation and Performance
Locality in Programming Models and Runtimes
Designing a Unified Programming Model for Heterogeneous Machines
Legion - Expressing Locality and Independence with Logical Regions
Characterizing and Mitigating Work Time Inflation in Task Parallel Programs
Massively Parallel Simulations
Massively Parallel X-Ray Scattering Simulations
Peta-Scale Lattice Quantum Chromodynamics on a Blue Gene/Q Supercomputer
High Performance Radiation Transport Simulations - Preparing for TITAN
Maximizing Performance on Multi-Core and Many-Core Architectures
Unleashing the High Performance and Low Power of Multi-Core DSPs for General-Purpose HPC
A Scalable, Numerically Stable, High-Performance Tridiagonal Solver Using GPUs
Efficient Backprojection-Based Synthetic Aperture Radar Computation with Many-Core Processors
Memory Systems
What Scientific Applications Can Benefit from Hardware Transactional Memory
Hardware-Software Coherence Protocol for the Coexistence of Caches and Local Memories
Application Data Prefetching on the IBM Blue Gene/Q Supercomputer
Networks
Design of a Scalable InfiniBand Topology Service to Enable Network-Topology-Aware Placement of Processes
Design and Implementation of an Intelligent End-to-End Network QoS System
Looking under the Hood of the IBM Blue Gene/Q Network
New Computer Systems
Cray Cascade - A Scalable HPC System Based on a Dragonfly Network
SGI UV2 - A Fused Computation and Data Analysis Machine
GRAPE-8 - An Accelerator for Gravitational N-Body Simulation with 20.5GFLOPS/W Performance
Numerical Algorithms
High-Performance General Solver for Extremely Large-Scale Semidefinite Programming Problems
A Massively Space-Time Parallel N-Body Solver
A Parallel Two-Level Preconditioner for Cosmic Microwave Background Map-Making
Optimizing Application Performance
Compass - A Scalable Simulator for an Architecture for Cognitive Computing
Optimizing Fine-Grained Communication in a Biomolecular Simulation Application on Cray XK6
Heuristic Static Load-Balancing Algorithm Applied to the Fragment Molecular Orbital Method
Optimizing I/O for Analytics
Byte-Precision Level of Detail Processing for Variable Precision Analytics
Combining In-Situ and In-Transit Processing to Enable Extreme-Scale Scientific Analysis
Efficient Data Restructuring and Aggregation for I/O Acceleration in PIDX
Performance Modeling
Aspen - A Domain Specific Language for Performance Modeling
Dataflow-Driven GPU Performance Projection for Multi-Kernel Transformations
A Practical Method for Estimating Performance Degradation on Multicore Processors and Its Application to HPC Workloads
Performance Optimization
Extending the BT NAS Parallel Benchmark to Exascale Computing
Optimization of Geometric Multigrid for Emerging Multi- and Manycore Processors
NUMA-Aware Graph Mining Techniques for Performance and Energy Efficiency
Resilience
Classifying Soft Error Vulnerabilities in Extreme-Scale Scientific Applications Using a Binary Instrumentation Tool
Containment Domains - A Scalable, Efficient, and Flexible Resiliency Scheme for Exascale Systems
Runtime-Based Analysis and Optimization
Critical Lock Analysis - Diagnosing Critical Section Bottlenecks in Multithreaded Applications
Code Generation for Parallel Execution of a Class of Irregular Loops on Distributed Memory Systems
Visualization and Analysis of Massive Data Sets
Data-Intensive Spatial Filtering in Large Numerical Simulation Datasets
Parallel Particle Advection and FTLE Computation for Time-Varying Flow Fields
Parallel I/O, Analysis, and Visualization of a Trillion Particle Simulation
Weather and Seismic Simulations
Forward and Adjoint Simulations of Seismic Wave Propagation on Emerging Large-Scale GPU Architectures
A Divide and Conquer Strategy for Scaling Weather Simulations with Multiple Regions of Interest