ACM SIGMOD/PODS Conference: Vancouver, 2008
Program: Overview

(JUNE 8) SUNDAY AT A GLANCE
18:30- 20:30	PODS Reception (Currents Restaurant, Westin Main Lobby)

(JUNE 9) MONDAY AT A GLANCE
8:00	Breakfast (Foyer)
08:45- 10:00	PODS Opening and Keynote: Peter Buneman -- Curated Databases (Salon A and B)
10:00- 10:30	Coffee Break (Foyer)
10:30- 12:00	PODS Session 2: Schema Mappings (Salon A)
12:00- 13:15	Lunch Break (on your own)
13:15- 14:45	PODS Session 3: Best Newcomer Award Paper and Invited Tutorial 1 (Salon A)
14:45- 15:00	Coffee Break (Foyer)
15:00- 15:30	PODS Session 4: Alberto O. Mendelzon Test-of-Time Award (Salon A)
15:30- 17:00	PODS Session 5: Streaming Data and Best Paper Award (Salon A and B)
17:00- 17:15	Coffee Break (Foyer)
17:15- 18:30	PODS Session 6: Uncertain and Probabilistic Data (Salon A and B)	NO SESSION
18:30- 18:45		SIGMOD Reception and Undergrad Poster Competition (Canada Place, Oceanview Suites 1-4)
18:45- 21:00	NO SESSION
21:00	PODS Business Meeting (Seymour)

(JUNE 10) TUESDAY AT A GLANCE
8:00	Breakfast (Foyer)
08:45- 09:00	SIGMOD Welcome (Salon A, B and C)
09:00- 10:00	SIGMOD Keynote Talk 1: Sridhar Ramaswamy -- Extreme Data Mining (Salon A, B, and C)
10:00- 10:15	Coffee Break (Foyer)
10:15- 10:30	Coffee Break (Foyer)				SIGMOD Industrial 1 Query Optimization and Performance (Salon F)	Coffee Break (Foyer)
10:30- 12:00	PODS Session 7 Data Exchange (Mackenzie)	SIGMOD Research 1 Tracking Data in Space and Best Paper Award (Salon A)	SIGMOD Research 2 Ranking (Salon B)	SIGMOD Research 3 Privacy and Anonymization (Salon C)		SIGMOD Tutorial Session 1 Provenance and Scientific Workflows, Part 1 (Seymour)	NO SESSION
12:00- 13:30	Lunch Buffet (Foyer)
13:30- 15:00	PODS Session 8 Invited Tutorial 2 (Mackenzie)	SIGMOD Research 4 Streaming Filters (Salon A)	SIGMOD Research 5 Clustering in High Dimensions (Salon B)	SIGMOD Research 6 Skylines (Salon C)	SIGMOD Industrial 2 Database Programming and Performance (Salon F)	SIGMOD Tutorial Session 2 Provenance and Scientific Workflows, Part 2 (Seymour)	NO SESSION
15:00- 15:30	Coffee Break (Foyer)
15:30- 17:30	PODS Session 9 Searching and Clustering (Mackenzie)	SIGMOD Research 7 Special Platforms (Salon A)	SIGMOD Research 8 XML Query Processing (Salon B)	SIGMOD Research 9 Strings and Time (Salon C)	SIGMOD Industrial 3 Streams, Conversations and Verification (Salon F)	SIGMOD Demo Session 1 (Seymour)	SIGMOD Tutorial Session 3 Uncertain and Probabilistic Data (Salon E)
18:00- 20:30	New Researchers Symposium (Salon A)

(JUNE 11) WEDNESDAY AT A GLANCE
7:45	Breakfast (Foyer)
08:30- 09:30	PODS Session 10 Data and Services (Mackenzie)	SIGMOD Keynote Talk 2: Ben Shneiderman -- Extreme Visualization: Squeezing a Billion Records into a Million Pixels (Salon A, B, and C)
09:30- 10:00	Coffee Break (Foyer)
10:00- 12:00	PODS Session 11 XML and Hierarchical Data (Mackenzie)	SIGMOD Research 10 Graphs 1 (Salon A)	SIGMOD Research 11 Security, Privacy and Testing (Salon B)	SIGMOD Research 12 Query Optimization (Salon C)	NO SESSION	SIGMOD Tutorial Session 4 Object/Relational Mapping 2008 (Seymour)
12:00- 14:30	Lunch and SIGMOD Business Meeting (Salon D, E, and F)
14:30- 15:00	Coffee Break (Foyer)
15:00- 17:00	PODS Session 12 Query Processing and Optimization (Mackenzie)	SIGMOD Research 13 Graphs 2 (Salon A)	SIGMOD Research 14 Ordered Data (Salon B)	SIGMOD Research 15 Probabilistic 1 (Salon C)	SIGMOD Industrial 4 Data and Application Integration, Spatial Data (Salon F)	SIGMOD Tutorial Session 5 Information Fusion in Wireless Sensor Networks (Seymour)
17:00- 17:30
17:45- 18:15	Leaving for Museum, pickup from 17:45 to 18:15 at the Westin main entrance
19:00- 21:30	Banquet (Museum of Anthropology)

(JUNE 12) THURSDAY AT A GLANCE
7:45	Breakfast (Grand Ballroom Foyer)
08:30- 09:30	SIGMOD Keynote Talk 3: William O'Connell -- Extreme Streaming: Business Optimization Driving Algorithmic Challenges (Salon A, B, and C)
09:30- 10:00	Coffee Break (Foyer)
10:00- 12:00	SIGMOD Research 16 Transaction and Distribution, and Best Paper (Salon A)	SIGMOD Research 17 Database Integration (Salon B)	SIGMOD Research 18 Probabilistic 2 (Salon C)	SIGMOD Tutorial Session 6 Introduction to Recommender Systems, Part 1 (Salon E)	SIGMOD Products Session 1 (Salon F)	SIGMOD Demo Session 2 Group B (Seymour)
				NO SESSION
12:00- 13:30	Lunch Break (on your own)
13:30- 15:00	SIGMOD Awards and Talks (Salon A, B and C)
15:00- 15:30	Coffee Break (Foyer)
15:30- 17:30	SIGMOD Research 19 Keywords on Structure (Salon A)	SIGMOD Research 20 Tuning and Probing (Salon B)	SIGMOD Research 21 Provenance, Integration, Extraction (Salon C)	SIGMOD Tutorial Session 7 Introduction to Recommender Systems, Part 2 (Salon E)	SIGMOD Products Session 2 (Salon F)	SIGMOD Demo Session 3 Group C (Seymour)

PODS Technical Sessions

PODS Session 2: Schema Mappings

Chair: Alin Deutsch (UC San Diego)

The Recovery of a Schema Mapping: Bringing Exchanged Data Back
Marcelo Arenas, Jorge Perez, Cristian Riveros (PUC Chile)
On the Complexity of Deriving Schema Mappings from Database Instances
Pierre Senellart (INRIA Saclay & Université Paris-Sud), Georg Gottlob (University of Oxford)
Towards a Theory of Schema-Mapping Optimization
Ronald Fagin, Phokion Kolaitis, Alan Nash, Lucian Popa (IBM Almaden)

PODS Session 3: Best Newcomer Award, and Tutorial 1

Chair: Leonid Libkin (University of Edinburgh)

Evaluating Rank Joins with Optimal Cost
Karl Schnaitter, Neoklis Polyzotis (UC Santa Cruz)
Invited Tutorial 1: Effective characterizations of tree logics
Mikolaj Bojanczyk (Warsaw University)

PODS Session 5: Streaming Data, and Best Paper Award

Chair: Foto Afrati (National Technical University of Athens)

Estimating PageRank on Graph Streams
Atish Das Sarma (Georgia Tech), Sreenivas Gollapudi, Rina Panigrahy (Microsoft Research)
A Generic Flow Algorithm for Shared Filter Ordering Problems
Zhen Liu, Srinivasan Parthasarathy, Anand Ranganathan and Hao Yang (IBM T.J. Watson)
Time-Decaying Aggregates in Out-of-order Streams
Graham Cormode, Philip Korn (AT&T Labs) and Srikanta Tirthapura (Iowa State University)

PODS Session 6: Uncertain and Probabilistic Data

Chair: Val Tannen (University of Pennsylvania)

Approximating Predicates and Expressive Queries on Probabilistic Databases
Christoph Koch (Cornell University)
Incorporating Constraints in Probabilistic XML
Sara Cohen, Benny Kimelfeld, Yehoshua Sagiv (Hebrew University)
Query Evaluation with Soft-Key Constraints
Abhay Jha, Vibhor Rastogi, Dan Suciu (University of Washington)

PODS Session 7: Data Exchange

Chair: Georg Gottlob (Oxford University)

Answering Aggregate Queries in Data Exchange
Foto Afrati (National Technical University of Athens), Phokion G. Kolaitis (IBM Almaden)
Data Exchange and Schema Mappings in Open and Closed Worlds
Leonid Libkin and Cristina Sirangelo (University of Edinburgh)
The Chase Revisited
Alin Deutsch (UC San Diego), Alan Nash (IBM Almaden) and Jeff Remmel (UC San Diego)

PODS Session 8: Data Quality (including Invited Tutorial 2) and Data Privacy

Chair: Dan Suciu (University of Washington)

Invited Tutorial 2: Dependencies Revisited for Improving Data Quality
Wenfei Fan (University of Edinburgh & Bell Labs)
Epistemic Privacy
Alexandre Evfimievski, Ronald Fagin, David Woodruff (IBM Almaden)

PODS Session 9: Searching and Clustering

Chair: Christoph Koch (Cornell University)

On Searching Compressed String Collections Cache-Obliviously
Paolo Ferragina (Università di Pisa), Roberto Grossi (Università di Pisa), Rahul Shah (Butler University), Ankur Gupta (Louisiana State University), Jeffrey Scott Vitter (Purdue University)
Approximation Algorithms for Clustering Uncertain Data
Graham Cormode (AT&T Labs) and Andrew McGregor (UC San Diego)
Approximation Algorithms for Co-Clustering
Aris Anagnostopoulos, Anirban Dasgupta, and Ravi Kumar (Yahoo! Research)
The Power of Two Min-hashes in Similarity Search among Hierarchical Data Objects
Sreenivas Gollapudi, Rina Panigrahy (Microsoft Research)

PODS Session 10: Data and Services

Chair: Jianwen Su (UC Santa Barbara)

Static Analysis of Active XML Services
Serge Abiteboul (INRIA-Saclay & Université Paris-Sud), Luc Segoufin (INRIA & LSV - ENS Cachan), Victor Vianu (UC San Diego)
Complexity and Composition of Synthesized Web Services
Wenfei Fan (University of Edinburgh & Bell Labs), Floris Geerts (University of Edinburgh), Wouter Gelade, Frank Neven (Hasselt University & Transnational University of Limburg), Antonella Poggi (Sapienza - Università di Roma)

PODS Session 11: XML and Hierarchical Data

Chair: Marcelo Arenas (PUC Chile)

XPath Evaluation in Linear Time
Mikolaj Bojanczyk and Pawel Parys (Warsaw University)
XPath, Transitive Closure Logic, and Nested Tree Walking Automata
Balder ten Cate (Universiteit van Amsterdam), Luc Segoufin (INRIA & LSV - ENS Cachan)
Local Hoare Reasoning about DOM
Philippa A. Gardner, Gareth D. Smith, Mark J. Wheelhouse, Uri D. Zarfaty (Imperial College)
Annotated XML: Queries and Provenance
Nate Foster, T.J. Green, Val Tannen (University of Pennsylvania)

PODS Session 12: Query Processing and Optimization

Chair: Graham Cormode (AT&T Labs)

Near-Optimal Dynamic Replication in Unstructured Peer-to-Peer Networks
Mauro Sozio, Thomas Neumann, Gerhard Weikum (Max-Planck-Institut fuer Informatik)
Type Inference for Datalog, and its Application to Query Optimisation
Oege de Moor, Damien Sereni, Pavel Avgustinov, and Mathieu Verbaere (Semmle Ltd., Oxford)
Shape Sensitive Geometric Monitoring
Izchak Sharfman, Assaf Schuster, and Daniel Keren (Technion, Haifa)
Tree-width and functional dependencies in databases
Isolde Adler (Humboldt University)

SIGMOD Technical Sessions

SIGMOD Research Session 1: Tracking Data in Space, and Best Paper Award

Chair: Dennis Shasha (New York University)

Capacity Constrained Assignment in Spatial Databases
Leong Hou U (University of Hong Kong), Man Lung Yiu (Aalborg University), Kyriakos Mouratidis (Singapore Management University), Nikos Mamoulis (University of Hong Kong)
Self-Tunable Spatio-Temporal B+-tree for Moving Objects
Su Chen, Beng Chin Ooi, Kian-Lee Tan (National University of Singapore), Mario A. Nascimento (University of Alberta)
Scalable Network Distance Browsing in Spatial Databases
Hanan Samet, Jagan Sankaranarayanan, Houman Alborzi

SIGMOD Research Session 2: Ranking

Chair: Cong Yu (Yahoo!)

Discovering Bucket Orders from Full Rankings
Jianlin Feng (Huazhong University of Science and Technology), Qiong Fang, Wilfred Ng (The Hong Kong University of Science and Technology)
Ad-Hoc Aggregations of Ranked Lists in the Presence of Hierarchies
Nilesh Bansal, Sudipto Guha, Nick Koudas
ARCube: Supporting Ranking Aggregate Queries in Partially Materialized Data Cubes
Tianyi Wu (University of Illinois, Urbana-Champaign), Dong Xin (Microsoft Research), Jiawei Han (University of Illinois, Urbana-Champaign)

SIGMOD Research Session 3: Privacy and Anonymization

Chair: Anthony Tung (National University of Singapore)

Towards Identity Anonymization on Graphs
Kun Liu, Evimaria Terzi (IBM Almaden)
Dynamic Anonymization: Accurate Statistical Analysis with Privacy Preservation
Xiaokui Xiao, Yufei Tao (Chinese University of Hong Kong)
Private Queries in Location Based Services: Anonymizers are not Necessary
Gabriel Ghinita, Panos Kalnis (National University of Singapore), Ali Khoshgozaran, Cyrus Shahabi (University of Southern California), Kian-Lee Tan (National University of Singapore)

SIGMOD Research Session 4: Streaming Filters

Chair: Elke A. Rundensteiner (Worcester Polytechnic)

Near-Optimal Algorithms for Shared Filter Evaluation in Data Stream Systems
Zhen Liu, Srinivasan Parthasarathy, Anand Ranganathan, Hao Yang
Efficient Pattern Matching over Event Streams
Jagrati Agrawal, Yanlei Diao, Daniel Gyllstrom, and Neil Immerman (University of Massachusetts Amherst)
Scalable Regular Expression Matching on Data Streams
Anirban Majumder, Rajeev Rastogi, Sriram Vanama

SIGMOD Research Session 5: Clustering in High Dimensions

Chair: Jiawei Han (University of Illinois, Urbana-Champaign)

CRD: A General Framework Fast Co-clustering on Large Datasets Utilizing Sample-Based Matrix Decomposition
Feng Pan, Xiang Zhang, Wei Wang (University of North Carolina at Chapel Hill)
Outlier-robust Clustering Using Independent Component Analysis
Christian Böhm (University of Munich), Christos Faloutsos (Carnegie Mellon University), Claudia Plant (Technical University of Munich)
Efficient EMD-based Similarity Search in Multimedia Databases via Flexible Dimensionality Reduction
Marc Wichterich, Ira Assent, Philipp Kranen, Thomas Seidl (RWTH Aachen University)

SIGMOD Research Session 6: Skylines

Chair: Yufei Tao (Chinese University of Hong Kong)

Monochromatic and Bichromatic Reverse Skyline Search over Uncertain Databases
Xiang Lian, Lei Chen (Hong Kong University of Science and Technology)
Angle-based Space Partitioning for Efficient Parallel Skyline Computation
Akrivi Vlachou, Christos Doulkeridis, Yannis Kotidis (Athens University of Economics and Business)
Categorical Skylines for Streaming Data
Nikos Sarkas (University of Toronto), Gautam Das (University of Texas at Arlington), Nick Koudas (University of Toronto), Anthony K. H. Tung (National University of Singapore)

SIGMOD Research Session 7: Special Platforms

Chair: Gustavo Alonso (ETH Zurich)

Building a Database on S3
Matthias Brantner, David Graf (28msec), Daniela Florescu (Oracle), Donald Kossmann (28msec & ETH Zurich), Tim Kraska (ETH Zurich)
A Peer-to-Peer DBMS's Path to Stardom: Calibrating the Potential
Mihai Lupu (Singapore-MIT Alliance, National University of Singapore), Beng Chin Ooi, Y. C. Tay (National University of Singapore)
Just-In-Time Query Retrieval Over Partially Indexed Data on Structured P2P Overlays
Sai Wu (National University of Singapore), Jianzhong Li (Harbin Institute of Technology), Beng Chin Ooi, Kian-Lee Tan (National University of Singapore)
Efficient Storage Scheme and Query Processing for Supply Chain Management using RFID
Chun-Hee Lee, Chin-Wan Chung (KAIST)

SIGMOD Research Session 8: XML Query Processing

Chair: Jan Paredaens (University of Antwerp)

Relational-Style XML Query
Taro L. Saito, Shinichi Morishita (University of Tokyo)
Query Biased Snippet Generation in XML Search
Yu Huang, Ziyang Liu, Yi Chen (Arizona State University)
Cooperative XPath Caching
Kostas Lillis and Evaggelia Pitoura (University of Ioannina)
XML Query Optimization in the Presence of Side Effects
Giorgio Ghelli (Universita di Pisa), Nicola Onose (UCSD), Kristoffer Rose, Jerome Simeon (IBM)

SIGMOD Research Session 9: Strings and Time

Chair: Xiaofeng Meng (Renmin University)

Cost-Based Gram Selection for String Collections to Support Approximate Queries Efficiently
Xiaochun Yang, Bin Wang (Northeastern University, China), Chen Li (University of California, Irvine)
Approximate Embedding-based Subsequence Matching of Time Series
Vassilis Athitsos (University of Texas at Arlington), Panagiotis Papapetrou, Michalis Potamias, George Kollios (Boston University), Dimitrios Gunopulos (University of Athens)
Sampling Time-Based Sliding Windows in Bounded Space
Rainer Gemulla, Wolfgang Lehner (Technische Universität Dresden)
Mining Relationships among Interval-based Events for Classification
Dhaval Patel, Wynne Hsu, Mong Li Lee (National University of Singapore)

SIGMOD Research Session 10: Graphs 1

Chair: Divy Agrawal (UC Santa Barbara)

Graphs-at-a-time: Query Language and Access Methods for Graph Databases
Huahai He and Ambuj K. Singh (UC Santa Barbara)
Graph Summarization with Bounded Error
Saket Navlakha (University of Maryland, College Park), Rajeev Rastogi (Yahoo! Labs, Bangalore, India), Nisheeth Shrivastava (Bell Labs Research, Bangalore, India)
Mining Significant Graph Patterns by Scalable Leap Search
Xifeng Yan (IBM T. J. Watson), Hong Cheng, Jiawei Han (University of Illinois at Urbana-Champaign), Philip S. Yu (IBM T. J. Watson)
CSV: Visualizing and Mining Cohesive Subgraphs
Nan Wang, Srinivasan Parthasarathy, Kian-Lee Tan, Anthony K. H. Tung (National University of Singapore)

SIGMOD Research Session 11: Security, Privacy and Testing

Chair: Johann-Christoph Freytag (Humboldt University of Berlin)

Privacy-MaxEnt: Integrating Background Knowledge in Privacy Quantification
Wenliang Du, Zhouxuan Teng, and Zutao Zhu (Syracuse University)
Preservation of Proximity Privacy in Publishing Numerical Sensitive Data
Jiexing Li, Yufei Tao, Xiaokui Xiao (Chinese University of Hong Kong)
Stream Firewalling of XML Constraints
Michael Benedikt (Oxford University), Alan Jeffrey (Bell Labs, Alcatel-Lucent), Ruy Ley-Wild (Carnegie Mellon University)
Generating Targeted Queries for Database Testing
Chaitanya Mishra, Nick Koudas (University of Toronto), Calisto Zuzarte (IBM Toronto)

SIGMOD Research Session 12: Query Optimization

Chair: Naoko Kosugi (NTT, Japan)

Relational Joins on Graphics Processors
Bingsheng He (HKUST), Ke Yang (Zhejiang University), Rui Fang, Mian Lu, Naga K. Govindaraju (Microsoft), Qiong Luo, and Pedro V. Sander (HKUST)
Optimizing Complex Queries with Multiple Relation Instances
Yu Cao, Gopal C. Das, Chee-Yong Chan, Kian-Lee Tan (National University of Singapore)
Dynamic Programming Strikes Back
Guido Moerkotte (University of Mannheim), Thomas Neumann (Max-Planck Institute for Informatics)
Adding Magic to an Optimising Datalog Compiler
Damien Sereni, Pavel Avgustinov, Oege De Moor

SIGMOD Research Session 13: Graphs 2

Chair: Nisheeth Shrivastava (Bell Labs India)

Efficient Aggregation for Graph Summarization
Yuanyuan Tian (University of Michigan), Richard A. Hankins (Nokia Research Center), Jignesh M. Patel (University of Michigan)
Efficient Algorithms for Exact Ranked Twig Pattern Matching over Graphs
Gang Gou, Rada Chirkova (North Carolina State University)
Efficiently Answering Reachability Query on Very Large Directed Graphs
Ruoming Jin, Yang Xiang, Ning Ruan (Kent State University), Haixun Wang (IBM T.J. Watson)
Minimization of Tree Pattern Queries with Constraints
Ding Chen, Chee-Yong Chan

SIGMOD Research Session 14: Ordered Data

Chair: Kyuseok Shim (Seoul National University)

Query-based Partitioning of Documents and Indexes for Information Lifecycle Management
Soumyadeb Mitra, Marianne Winslett, Windsor Hsu
Skippy: a New Indexing Method for Long-Lived Snapshots in the Storage Manager
Ross Shaull, Liuba Shrira and Hao Xu (Brandeis University)
OLAP on Sequence Data
Eric Lo (The Hong Kong Polytechnic University), Ben Kao, Wai-Shing Ho, Sau Dan Lee, Chun Kit Chui, David W. Cheung (The University of Hong Kong)
Improving Suffix Array Locality for Fast Pattern Matching on Disk
Ranjan Sinha (The University of Melbourne), Simon J. Puglisi (RMIT University), Alistair Moffat (The University of Melbourne), Andrew Turpin (RMIT University)

SIGMOD Research Session 15: Probabilistic 1

Chair: Y. C. Tay (National University of Singapore)

Ranking Queries on Uncertain Data: A Probabilistic Threshold Approach
Ming Hua, Jian Pei (Simon Fraser University), Wenjie Zhang, Xuemin Lin (The University of New South Wales & NICTA)
MCDB: A Monte Carlo Approach to Managing Uncertain Data
Peter Haas, Mingxi Wu, Fei Xu, Ravinath Jampani, Christopher Jermaine, Luis Perez
Query Efficiency in Probabilistic XML Models
Benny Kimelfeld, Yuri Kosharovsky and Yehoshua Sagiv (Hebrew University)
Event Queries on Correlated Probabilistic Streams
Christopher Re, Julie Letchner, Magdalena Balazinska, Dan Suciu (University of Washington)

SIGMOD Research Session 16: Transactions and Distribution, and Best Paper Award

Chair: Panos K. Chrysanthis (University of Pittsburgh)

Serializable isolation for snapshot databases
Michael Cahill (University of Sydney and Oracle), Uwe Röhm, Alan Fekete (University of Sydney)
Middleware-based Database Replication: The Gaps between Theory and Practice
Emmanuel Cecchet (EPFL), George Candea (EPFL & Aster Data Systems), Anastasia Ailamaki (EPFL & Carnegie Mellon University)
On Efficient Top-k Query Processing in Highly Distributed Environments
Akrivi Vlachou, Christos Doulkeridis (Athens University of Economics and Business), Kjetil Noorvaeg (NTNU), Michalis Vazirgiannis (Athens University of Economics and Business)
Efficient Bulk Insertion into a Distributed Ordered Table
Adam Silberstein, Brian Cooper, Utkarsh Srivastava, Erik Vee, Ramana Yerneni, Raghu Ramakrishnan

SIGMOD Research Session 17: Database Integration As You Go

Chair: Guy M. Lohman (IBM Almaden)

Interactive Generation of Integrated Schemas
Laura Chiticariu (UC Santa Cruz), Phokion G. Kolaitis, Lucian Popa (IBM Almaden)
Pay-as-you-go User Feedback for Dataspace Systems
Shawn R. Jeffery, Michael J. Franklin (UC Berkeley), Alon Y. Halevy (Google)
Bootstrapping Pay-As-You-Go Data Integration Systems
Anish Das Sarma (Stanford University), Xin Dong (AT&T Labs-Research), Alon Halevy (Google)
Supporting OLAP Operations over Imperfectly Integrated Taxonomies
Yan Qi, K. Selçuk Candan (Arizona State University), Junichi Tatemura, Songting Chen (NEC Laboratories America), Fenglin Liao (UC Santa Barbara)

SIGMOD Research Session 18: Probabilistic 2

Chair: Jun Yang (Duke)

Sampling Cube: A Framework for Statistical OLAP over Sampling Data
Xiaolei Li, Jiawei Han, Zhijun Yin, Jae-Gil Lee, Yizhou Sun (University of Illinois at Urbana-Champaign)
Querying Continuous Functions in a Database System
Arvind Thiagarajan, Samuel Madden (MIT CSAIL)
An efficient filter for approximate membership checking
Kaushik Chakrabarti, Surajit Chaudhuri, Venkatesh Ganti, Dong Xin (Microsoft Research)
Finding Frequent Items in Probabilistic Data
Qin Zhang, Feifei Li, Ke Yi

SIGMOD Research Session 19: Keywords on Structure

Chair: Berthold Reinwald (IBM Almaden)

SQAK: Doing More With Keywords
Sandeep Tata and Guy M. Lohman (IBM Almaden)
EASE: Efficient and Adaptive Keyword Search on Unstructured, Semi-structured and Structured Data
Guoliang Li (Tsinghua University), Beng Chin Ooi (National University of Singapore), Jianhua Feng, Jianyong Wang, Lizhu Zhou (Tsinghua University)
A Graph Method for Keyword-based Selection of the top-K Databases
Quang Hieu Vu (National University of Singapore), Beng Chin Ooi, Dimitris Papadias (Hong Kong University of Science and Technology), Anthony K. H. Tung (National University of Singapore)
Keyword Proximity Search in Complex Data Graphs
Konstantin Golenberg, Benny Kimelfeld and Yehoshua Sagiv (The Hebrew University of Jerusalem)

SIGMOD Research Session 20: Tuning and Probing

Chair: Ken Ross (Columbia University)

Configuration-Parametric Query Optimization for Physical Design Tuning
Nicolas Bruno (Microsoft Research), Rimma V. Nehme (Purdue University)
Automatic Virtual Machine Configuration for Database Workloads
Ahmed A. Soror, Umar Farooq Minhas, Ashraf Aboulnaga, Kenneth Salem (University of Waterloo), Peter Kokosielis, Sunil Kamath (IBM Toronto)
Column-Stores vs. Row-Stores: How Different Are They Really?
Daniel J. Abadi (Yale University), Samuel R. Madden (MIT CSAIL), Nabil Hachem (AvantGarde Consulting, LLC)
OLTP Through the Looking Glass, And What We Found There
Stavros Harizopoulos, Daniel Abadi, Sam Madden, Michael Stonebraker

SIGMOD Research Session 21: Provenance, Integration, Extraction

Chair: Christoph Koch (Cornell)

Efficient Provenance Storage
Adriane Chapman, H.V. Jagadish (University of Michigan), Prakash Ramanan (Wichita State University)
Efficient Lineage Tracking For Scientific Workflows
Thomas Heinis, Gustavo Alonso (ETH Zurich)
Discovering Topical Structures of Databases
Wensheng Wu, Berthold Reinwald, Yannis Sismanis, Rajesh Manjrekar (IBM Almaden)
Toward Best-effort Information Extraction
Warren Shen, Pedro DeRose (University of Wisconsin, Madison), Robert McCann (Microsoft Corp.), AnHai Doan (University of Wisconsin, Madison), Raghu Ramakrishnan (Yahoo! Research)

SIGMOD Industrial Sessions

SIGMOD Industrial Session 1: Query Optimization and Performance

Chair: Jose Blakeley (Microsoft Corporation)

Handling Data Skew in Parallel Joins in Shared-Nothing Systems
Yu Xu, Pekka Kostamaa, Xin Zhou (Teradata), Liang Chen (UCSD)
Efficient and Scalable Statistics Gathering for Large Databases in Oracle 10g
Sunil Chakkappen, Thierry Cruanes, Benoit Dageville, Linan Jiang, Uri Shaft, Hong Su, Mohamed Zait
Grouping and Optimization of XPath Expressions in DB2 pureXML
Andrey Balmin, Fatma Ozcan, Ashutosh Singh, Edison Ting
A Case for Flash Memory SSD in Enterprise Database Applications
Sang-Won Lee (Sungkyunkwan University), Bongki Moon (University of Arizona), Chanik Park (Samsung Electronics), Jae-Myung Kim (Altibase), Sang-Woo Kim (Sungkyunkwan University)

SIGMOD Industrial Session 2: Database Programming and Performance

Chair: Hui-I Hsiao (IBM China Research Lab)

.NET Database Programmability and Extensibility in Microsoft SQL Server
José A. Blakeley, Mat Henaire, Christian Kleinerman, Isaac Kunen, Adam Prout, Vineet Rao (Microsoft Corporation)
Pig Latin: A Not-So-Foreign Language for Data Processing
Christopher Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, Andrew Tomkins (Yahoo! Research)
Supporting Table Partitioning By Reference in Oracle
George Eadon, Eugene Inseok Chong, Shrikanth Shankar, Ananth Raghavan, Jagannathan Srinivasan, Souripriya Das (Oracle)

SIGMOD Industrial Session 3: Streams, conversations, and verification

Chair: Ioana Stanoi (IBM Almaden)

SPADE: The System S Declarative Stream Processing Engine
Bugra Gedik, Henrique Andrade, Kun-Lung Wu (IBM T. J. Watson Research Center), Philip S. Yu (University of Illinois at Chicago), Myungcheol Doo (Georgia Institute of Technology)
Query-Aware Partitioning for Monitoring Massive Network Data Streams
Vladislav Shkapenyuk, Theodore Johnson, Oliver Spatscheck (AT&T Labs - Research), Muthu S. Muthukrishnan (Rutgers University)
Helping Satisfy Multiple Objectives during a Service Desk Conversation
Ullas Nambiar, Himanshu Gupta (IBM India Research Lab), Raju Balakrishnan (Arizona State University), Mukesh Mohania (IBM India Research Lab)
Oracle Database Replay
Leonidas Galanis, Supiti Buranawatanachoke, Romain Colle, Benoît Dageville, Karl Dias, Jonathan Klein, Stratos Papadomanolakis, Leng Leng Tan, Yujun Wang, Graham Wood (Oracle USA), Venkateshwaran Venkataramani (Facebook)

SIGMOD Industrial Session 4: Data and application integration, spatial data

Chair: Patrick O'Neil (University of Massachusetts, Boston)

Damia: Data Mashups for Intranet Applications
David E. Simmen, Mehmet Altinel, Volker Markl, Sriram Padmanabhan, Ashutosh Singh
Effective and Efficient Semantic Web Data Management over DB2
Li Ma, Chen Wang (IBM China Research Laboratory), Jing Lu (Shanghai JiaoTong University), Feng Cao, Yue Pan (IBM China Research Laboratory), Yong Yu (Shanghai JiaoTong University)
Multi-Tenant Databases for Software as a Service
Stefan Aulbach, Torsten Grust (Technische Universität München, Germany), Dean Jacobs (SAP AG, Walldorf, Germany), Alfons Kemper, Jan Rittinger (Technische Universität München, Germany)
Spatial Indexing in Microsoft SQL Server 2008
Yi Fang, Marc Friedman, Giri Nair, Michael Rys, Ana-Elisa Schmidt

SIGMOD Products Day #1

The SAP Transaction Model: Know Your Applications
Rainer Brendle, Shel Finkelstein, Dean Jacobs, Manfred Hirsch and Ulrich Marquard (SAP Labs)
Abstract: SAP's transaction processing model has been highly successful. Our transaction model is closely tailored to the requirements of ERP and other business applications. Characteristics of many of these applications include:
- Transactions may involve multiple conversational interactions with users.
- Most business data is read-only.
- Update operations are insert-mostly.
- Most database transactions are short, but there are some long-running (batch) transactions.
- For all updates, conflicts are rare in practice.
- Potential hotspots (e.g., inventory stock; sequence numbers that must be consecutive for legal reasons) must be addressed.
SAP uses databases as simple transactional stores that administrative staff know how to manage. We have our own Application Database Interface Layer outside the database that handles caching, locking, collection of updates and multi-tenancy. Transaction commit on the actual database is handled using the recently rediscovered two-stage approach, where the first stage permits reads and rollbacks but not writes, and the second stage permits reads and writes but not rollbacks. This approach reduces load on the database, since transactions that update the database will commit with a single message into the database, and because of the two-stage approach, almost all transaction commits succeed. We'll also present scalability results showing the value of SAP's application-oriented transaction model.
Cassandra: A Structured Storage System on a P2P Network
Avinash Lakshman, Prashant Malik, and Karthik Ranganathan (Facebook)
Abstract: Cassandra is a distributed storage system for managing structured data while providing reliability at a massive scale. At this scale, small and large components fail continuously and the way persistent state is managed in the face of these failures drives the reliability and scalability of software systems.
Cassandra has been built to provide a flexible, high-performance storage system to applications. In this presentation we describe the simple data model provided by Cassandra, which enables clients to have dynamic control over the data model, and we describe the design and implementation of Cassandra.
Megastore: A Scalable Data System for User Facing Applications
JJ Furman, Jonas S Karlsson, Jean-Michel Leon, Alex Lloyd, Steve Newman, and Philip Zeyliger (Google Inc.)
Abstract: Megastore provides a rich model and API that facilitates implementation of user-facing applications storing data in Bigtable. Our goal is to enable Google developers to quickly build and launch highly available applications at Google scale. We extend Bigtable to provide strong consistency guarantees and higher-level abstractions such as transactions, secondary indexes and synchronous replication. Megastore takes a practical approach to schema management, providing integrated declarative schemas with rich data extensions, such as logical data partitioning, which is key to achieving high-performance querying and scalable, massively parallel transactions.

SIGMOD Products Day #2

Polestar: Applying search techniques to computation and aggregation of high volume structured data
Sal Visca (SAP Business Objects)
Abstract: The increased adoption of analytics and Business Intelligence platforms generates new challenges such as data explosion (growth of 50% per annum), an increasing number of users, and frequent needs for change. Traditional systems can become bottlenecks because many of them can't deliver fast response times to users or meet the demand for changes. These systems face high-data-volume challenges, like queries that involve access to many millions up to billions of records, as well as challenging response-time requirements for Service Level Agreements (predictable, stable response times for a growing number of users). Advancements in technology and the continued lowering of hardware costs are allowing a shift from traditional disk aggregation techniques to in-memory technology to optimize performance and flexibility when analyzing large amounts of data. In this presentation you will learn how the combination of search techniques through vertical data decomposition with business metadata and advanced data computation algorithms has opened new ways to easily and quickly navigate across large amounts of structured data. SAP BusinessObjects Polestar brings together the simplicity and speed of search with the trust and analytical power of Business Intelligence to provide immediate answers to business questions. Discover the challenges of extending this paradigm to new orders of magnitude through an in-memory data engine and hardware appliances, as well as the potential of enlarging the scope of the approach to unstructured data.
HP Neoview: Workload Scalability for Operational BI
Greg Battas, Awny Al-Omari, Bob Wehrmeister, and Hans Zeller (HP)
Abstract: Scalability has long been a topic in databases, but typically the focus has been on scalability of data size or of computing resources. As the business intelligence market adapts to embrace SOA, data warehouses are increasingly used to inline analytics into business processes, and this shifts the focus of scalability into other areas. Hewlett-Packard has been developing the Neoview database system for this operational BI market and has accordingly characterized the requirement for scalability along three dimensions: Capacity, Concurrency and Complexity. As analytics become a part of operational business processes, the ability to scale concurrency becomes critical. One technique used to significantly scale the concurrency of the Neoview system is Adaptive Segmentation. Adaptive Segmentation works by managing a massively parallel processing (MPP) machine as a collection of virtual segments. The number of virtual segments to be used for each query is determined automatically based on the resource requirements of the query. Each query is assigned to virtual segments for execution in a balanced way. This maximizes concurrency and overall system utilization and minimizes resource contention, resulting in improved concurrent performance. Adaptive Segmentation has been demonstrated to quadruple (3.9X) the throughput of a sample workload from HP's internal data warehouse and double (1.98X) the throughput of one of our retail customers' benchmarks. Adaptive Segmentation is the first of several approaches being implemented by Neoview to concurrently execute a high volume of database access across a very large number of computing resources.
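Editor's illustration (not part of the abstract): the idea of sizing a query's share of the machine from its resource needs and then placing it in a balanced way can be sketched roughly as follows. This is a toy model, not Neoview's implementation. The function names, the `cost_per_node` threshold, the power-of-two segment sizes and the "least-loaded aligned segment" rule are all assumptions made for the example.

```python
def segment_size(cost, num_nodes, cost_per_node=100):
    """Pick how many nodes a query gets (hypothetical rule).

    Doubles the segment size until it covers the estimated cost,
    capped at the size of the whole machine.
    """
    size = 1
    while size < num_nodes and size * cost_per_node < cost:
        size *= 2
    return min(size, num_nodes)


def assign(load, cost):
    """Place a query on the least-loaded segment of the chosen size.

    `load` holds the number of active queries per node and is updated
    in place. Candidate segments are aligned, non-overlapping node
    ranges; ties go to the lowest-numbered segment.
    """
    size = segment_size(cost, len(load))
    starts = range(0, len(load) - size + 1, size)
    start = min(starts, key=lambda s: sum(load[s:s + size]))
    for node in range(start, start + size):
        load[node] += 1
    return start, size


if __name__ == "__main__":
    load = [0] * 8  # an 8-node machine with no running queries
    for cost in (50, 50, 350, 10000):
        print(cost, assign(load, cost))
    print(load)
```

In this toy model, small queries occupy a single node each and leave the rest of the machine free for concurrent work, while only the most expensive query spans every node.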

SIGMOD Demonstration Sessions

SIGMOD Demonstrations Session 1

SGL: A Scalable Language for Data-Driven Games
Robert Albright, Alan Demers, Johannes Gehrke, Nitin Gupta, Hooyeon Lee, Rick Keilty, Gregory Sadowski, Ben Sowell, Walker White (Cornell University)
The DBO Database System
Florin Rusu, Fei Xu, Luis Leopoldo Perez, Mingxi Wu, Ravi Jampani, Christopher Jermaine, Alin Dobra (University of Florida)
Stretch 'n' Shrink: Resizing queries to user preferences
Chaitanya Mishra, Nick Koudas (University of Toronto)
Incorporating String Transformations into Record Matching
Arvind Arasu, Surajit Chaudhuri, Kris Ganjam, Raghav Kaushik
SEMMO: A Scalable Engine for Massively Multiplayer Online Games
Nitin Gupta, Alan Demers, Johannes Gehrke
Orion 2.0: Native Support for Uncertain Data
Sarvjeet Singh, Chris Mayfield, Sagar Mittal, Sunil Prabhakar, Susanne Hambrusch, Rahul Shah (Purdue University)
Building a Global Location Search Service
Vibhuti Sengar, Tanuja Joshi, Joseph Joy, Samarth Prakash (Microsoft Research India)
Freebase: A Collaboratively Created Graph Database For Structuring Human Knowledge
Kurt Bollacker, Timothy Sturge
Querying and Re-Using Workflows with VisTrails
Carlos E. Scheidegger, David Koop, Juliana Freire, and Claudio T. Silva (University of Utah)
HERMES: Aggregative LBS via a Trajectory DB Engine
Nikos Pelekis, Elias Frentzos, Nikos Giatrakos and Yannis Theodoridis (University of Piraeus)
Enriching Topic-Based Publish-Subscribe Systems with Related Content
Rubi Boim, Tova Milo (Tel Aviv University)

SIGMOD Demonstrations Session 2

SchemaScope: a System for Inferring and Cleaning XML Schemas
Geert Jan Bex, Frank Neven, Stijn Vansummeren (Hasselt University and Transnational University of Limburg)
DiMaC: A System for Cleaning Disguised Missing Data
Ming Hua, Jian Pei (Simon Fraser University)
An XML Index Advisor for DB2
Iman Elghandour, Ashraf Aboulnaga (University of Waterloo), Daniel C. Zilio (IBM Toronto Lab), Fei Chiang (University of Toronto), Andrey Balmin, Kevin Beyer (IBM Almaden Research Center), Calisto Zuzarte (IBM Toronto Lab)
Clip: a Tool for Visual Mapping of Hierarchical Schemas
Alessandro Raffio, Daniele Braga, Stefano Ceri (Politecnico di Milano), Paolo Papotti (Università di Roma Tre), Mauricio A. Hernández (IBM Almaden Research Center)
UQBE: Uncertain Query By Example for Web Service Mashup
Junichi Tatemura, Songting Chen (NEC Laboratories America), Fenglin Liao (UCSB), Oliver Po, K. Selcuk Candan, Divyakant Agrawal (NEC Laboratories America)
Muse: A System for Understanding and Designing Mappings
Bogdan Alexe, Laura Chiticariu, Renee Miller, Daniel Pepper, Wang-Chiew Tan
NAGA: Harvesting, Searching and Ranking Knowledge
Gjergji Kasneci, Fabian Suchanek, Georgiana Ifrim, Shady Elbassuoni, Maya Ramanath, Gerhard Weikum
The Spicy System: Towards a Notion of Mapping Quality
A. Bonifati (ICAR - CNR, Italy), G. Mecca, A. Pappalardo, S. Raunich, G. Summa (Università della Basilicata, Italy)
XArch: Archiving Scientific and Reference Data
Heiko Mueller, Peter Buneman, Ioannis Koltsidas
LearnPADS: Automatic Tool Generation from Ad Hoc Data
Kathleen Fisher (AT&T Labs Research), David Walker, Kenny Q. Zhu (Princeton University)

SIGMOD Demonstrations Session 3

Borealis-R: A Replication-Transparent Stream Processing System
Jeong-Hyon Hwang, Sanghoon Cha, Ugur Cetintemel, and Stan Zdonik (Brown University)
TinyCasper: A Privacy-Preserving Aggregate Location Monitoring System in Wireless Sensor Networks
Chi-Yin Chow, Mohamed F. Mokbel and Tian He (University of Minnesota)
The Demaq System: Declarative Development of Distributed Applications
Alexander Böhm, Erich Marth, and Carl-Christian Kanne (University of Mannheim)
ProSem: Joint Optimization of Processing and Dissemination in Wide-Area Publish/Subscribe
Badrish Chandramouli, Jun Yang, Pankaj K. Agarwal, Albert Yu, Ying Zheng (Duke University)
A Demonstration of Cascadia Through a Digital Diary Application
Nodira Khoussainova, Evan Welbourne, Magdalena Balazinska, Gaetano Borriello, Garrett Cole, Julie Letchner, Yang Li, Christopher Re, Dan Suciu, Jordan Walke (University of Washington)
From del.icio.us to x.qui.site: Recommendations in Social Tagging Sites
Sihem Amer-Yahia, Alban Galland, Julia Stoyanovich, Cong Yu
Distributed XQuery and Updates Processing with Heterogeneous XQuery Engines
Ying Zhang and Peter Boncz (CWI Amsterdam)
XQuery in the Browser
Ghislain Fourny, Donald Kossmann, Tim Kraska, Markus Pilman (ETH Zurich), Daniela Florescu (Oracle)
BibNetMiner: Mining Bibliographic Information Networks
Yizhou Sun, Tianyi Wu, Zhijun Yin, Hong Cheng, Jiawei Han (University of Illinois at Urbana-Champaign), Xiaoxin Yin (Microsoft Research), Peixiang Zhao (University of Illinois at Urbana-Champaign)
