Associate Professor
School of Software, Tsinghua University
Room 11-411, Main Building, Tsinghua University, Beijing, China
account@tsinghua.edu.cn (my account is sxsong)
News
I am currently looking for PhD students to work on data quality and time series database systems. Please email me your CV if you are interested.
Research Interests
Data quality over heterogeneous, temporal, graph, and uncertain data.
Time series database
- A system overview of Apache IoTDB. [SIGMOD 2023]
- A comparison of time series data encoding. [VLDB 2022]
Some overview slides on my research topics:
- Time series data cleaning. [slides]
- Event data quality. [slides]
- Data dependencies in the presence of difference. [slides]
Selected Publications
[My DBLP Entry]
- Shaoxu Song, Lei Chen. Integrity Constraints on Rich Data Types.
Springer 2023, ISBN 978-3-031-27176-2 [book]

- Chen Wang, Jialin Qiao, Xiangdong Huang, Shaoxu Song, Haonan Hou, Tian Jiang, Lei Rui, Jianmin Wang, Jiaguang Sun. Apache IoTDB: A Time Series Database for IoT Applications.
ACM SIGMOD International Conference on Management of Data, SIGMOD, 2023. [paper] [slides]
- Chenguang Fang, Shaoxu Song, Haoquan Guan, Xiangdong Huang, Chen Wang, Jianmin Wang. Grouping Time Series for Efficient Columnar Storage.
ACM SIGMOD International Conference on Management of Data, SIGMOD, 2023. [paper] [slides]
- Yunxiang Su, Yikun Gong, Shaoxu Song. Time Series Data Validity.
ACM SIGMOD International Conference on Management of Data, SIGMOD, 2023. [paper] [slides]
- Yunxiang Su, Wenxuan Ma, Shaoxu Song. Learning Autoregressive Model in LSM-Tree based Store.
ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD, 2023. [paper] [slides]
- Yuanhui Qiu, Chenguang Fang, Shaoxu Song, Xiangdong Huang, Chen Wang, Jianmin Wang. TsQuality: Measuring Time Series Data Quality in Apache IoTDB.
International Conference on Very Large Data Bases, VLDB, 2023. [paper] [slides] [demo]
- Haoquan Guan, Ziling Chen, Shaoxu Song. CORE-Sketch: On Exact Computation of Median Absolute Deviation with Limited Space.
International Conference on Very Large Data Bases, VLDB, 2023. [paper] [slides]
- Haoyu Wang, Shaoxu Song. Frequency Domain Data Encoding in Apache IoTDB.
International Conference on Very Large Data Bases, VLDB, 2023. [paper] [slides]
- Tian Jiang, Xiangdong Huang, Shaoxu Song, Chen Wang, Jianmin Wang, Ruibo Li, Jincheng Sun. Non-Blocking Raft for High Throughput IoT Data.
IEEE International Conference on Data Engineering, ICDE, 2023. [paper] [slides]
- Chenguang Fang, Yinan Mei, Shaoxu Song. Matrix Factorization with Landmarks for Spatial Data.
IEEE International Conference on Data Engineering, ICDE, 2023. [paper] [slides]
- Yinan Mei, Shaoxu Song, Chenguang Fang, Ziheng Wei, Jingyun Fang, Jiang Long. Discovering Editing Rules by Deep Reinforcement Learning.
IEEE International Conference on Data Engineering, ICDE, 2023. [paper] [slides]
- Xiaojian Zhang, Hongyin Zhang, Shaoxu Song, Xiangdong Huang, Chen Wang, Jianmin Wang. Backward-Sort for Time Series in Apache IoTDB.
IEEE International Conference on Data Engineering, ICDE, 2023. [paper] [slides]
- Shaoxu Song, Fei Gao, Ruihong Huang, Chaokun Wang. Data Dependencies Extended for Variety and Veracity: A Family Tree.
IEEE Transactions on Knowledge and Data Engineering, TKDE, 2022. [paper] [ICDE poster]
- Chenguang Fang, Shaoxu Song, Yinan Mei, Ye Yuan, Jianmin Wang. On Aligning Tuples for Regression.
ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD, 2022. [paper] [slides]
- Jinzhao Xiao, Yuxiang Huang, Changyu Hu, Shaoxu Song, Xiangdong Huang, Jianmin Wang. Time Series Data Encoding for Efficient Storage: A Comparative Analysis in Apache IoTDB.
International Conference on Very Large Data Bases, VLDB, 2022. [paper] [slides]
- Chenguang Fang, Shaoxu Song, Yinan Mei. On Repairing Timestamps for Regular Interval Time Series.
International Conference on Very Large Data Bases, VLDB, 2022. [paper] [slides]
- Yu Sun, Zheng Zheng, Shaoxu Song, Fei Chiang. Confidence Bounded Replica Currency Estimation.
ACM SIGMOD International Conference on Management of Data, SIGMOD, 2022. [paper] [slides]
- Rui Kang, Shaoxu Song, Chaokun Wang. Conditional Regression Rules.
IEEE International Conference on Data Engineering, ICDE, 2022. [paper] [slides]
- Yuyuan Kang, Xiangdong Huang, Shaoxu Song, Lingzhe Zhang, Jialin Qiao, Chen Wang, Jianmin Wang, Julian Feinauer. Separation or Not: On Handing Out-of-Order Time-Series Data in Leveled LSM-Tree.
IEEE International Conference on Data Engineering, ICDE, 2022. [paper] [slides]
- Zhiwei Chen, Shaoxu Song, Ziheng Wei, Jingyun Fang, Jiang Long. Approximating Median Absolute Deviation with Bounded Error.
International Conference on Very Large Data Bases, VLDB, 2021. [paper] [slides]
- Shaoxu Song, Ruihong Huang, Yu Gao, Jianmin Wang. Why Not Match: On Explanations of Event Pattern Queries.
ACM SIGMOD International Conference on Management of Data, SIGMOD, 2021. [paper] [slides]
- Shaoxu Song, Fei Gao, Ruihong Huang, Yihan Wang. On Saving Outliers for Better Clustering over Noisy Data.
ACM SIGMOD International Conference on Management of Data, SIGMOD, 2021. [paper] [slides]
- Yinan Mei, Shaoxu Song, Chenguang Fang, Haifeng Yang, Jingyun Fang, Jiang Long. Capturing Semantics for Imputation with Pre-trained Language Models.
IEEE International Conference on Data Engineering, ICDE, 2021. [paper] [slides]
- Yu Sun, Shaoxu Song. From Minimum Change to Maximum Density: On S-Repair under Integrity Constraints.
IEEE International Conference on Data Engineering, ICDE, 2021. [paper] [slides] [TKDE version]
- Chaokun Wang, Binbin Wang, Bingyang Huang, Shaoxu Song, Zai Li. FastSGG: Efficient Social Graph Generation Using a Degree Distribution Generation Model.
IEEE International Conference on Data Engineering, ICDE, 2021.
- Shaoxu Song, Yu Sun. Imputing Various Incomplete Attributes via Distance Likelihood Maximization.
ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD, 2020. [paper] [slides]
- Yinan Mei, Shaoxu Song, Yunsu Lee, Jungho Park, Soo-Hyung Kim, Sungmin Yi. Representing Temporal Attributes for Schema Matching.
ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD, 2020. [paper] [slides]
- Yu Sun, Shaoxu Song, Chen Wang, Jianmin Wang. Swapping Repair for Misplaced Attribute Values.
IEEE International Conference on Data Engineering, ICDE, 2020. [paper] [slides]
- Ruihong Huang, Shaoxu Song, Yunsu Lee, Jungho Park, Soo-Hyung Kim, Sungmin Yi. Effective and Efficient Retrieval of Structured Entities.
International Conference on Very Large Data Bases, VLDB, 2020. [paper] [slides]
- Aoqian Zhang, Shaoxu Song, Yu Sun, Jianmin Wang. Learning Individual Models for Imputation.
IEEE International Conference on Data Engineering, ICDE, 2019. [paper] [slides]
- Aoqian Zhang, Shaoxu Song, Jianmin Wang, Philip S. Yu. Time Series Data Cleaning: From Anomaly Detection to Anomaly Repairing.
International Conference on Very Large Data Bases, VLDB, 2017. [paper] [slides]
- Aoqian Zhang, Shaoxu Song, Jianmin Wang. Sequential Data Cleaning: A Statistical Approach.
ACM SIGMOD International Conference on Management of Data, SIGMOD, 2016. [paper] [slides] [VLDBJ version]
- Shaoxu Song, Han Zhu, Jianmin Wang. Constraint-Variance Tolerant Data Repairing.
ACM SIGMOD International Conference on Management of Data, SIGMOD, 2016. [paper] [slides]
- Shaoxu Song, Yue Cao, Jianmin Wang. Cleaning Timestamps with Temporal Constraints.
International Conference on Very Large Data Bases, VLDB, 2016. [paper] [slides] [VLDBJ version]
- Weiguo Zheng, Lei Zou, Wei Peng, Xifeng Yan, Shaoxu Song, Dongyan Zhao. Semantic SPARQL Similarity Search Over RDF Knowledge Graphs.
International Conference on Very Large Data Bases, VLDB, 2016.
- Shaoxu Song, Chunping Li, Xiaoquan Zhang. Turn Waste into Wealth: On Simultaneous Clustering and Cleaning over Dirty Data.
ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD, 2015. [paper] [slides]
- Shaoxu Song, Aoqian Zhang, Lei Chen, Jianmin Wang. Enriching Data Imputation with Extensive Similarity Neighbors.
International Conference on Very Large Data Bases, VLDB, 2015. [paper] [slides] [TKDE version]
- Shaoxu Song, Aoqian Zhang, Jianmin Wang, Philip S. Yu. SCREEN: Stream Data Cleaning under Speed Constraints.
ACM SIGMOD International Conference on Management of Data, SIGMOD, 2015. [paper] [slides] [TODS version]
- Weiguo Zheng, Lei Zou, Xiang Lian, Jeffrey Xu Yu, Shaoxu Song, Dongyan Zhao. How to Build Templates for RDF Question/Answering: An Uncertain Graph Similarity Join Approach.
ACM SIGMOD International Conference on Management of Data, SIGMOD, 2015.
- Jianmin Wang, Shaoxu Song, Xuemin Lin, Xiaochen Zhu, Jian Pei. Cleaning Structured Event Logs: A Graph Repair Approach.
IEEE International Conference on Data Engineering, ICDE, 2015. [paper] [slides] [TODS version]
- Shaoxu Song, Hong Cheng, Jeffrey Xu Yu, Lei Chen. Repairing Vertex Labels under Neighborhood Constraints.
International Conference on Very Large Data Bases, VLDB, 2014. [paper] [slides] [VLDBJ version]
- Shaoxu Song, Lei Chen, Hong Cheng. On Concise Set of Relative Candidate Keys.
International Conference on Very Large Data Bases, VLDB, 2014. [paper] [slides]
- Xiaochen Zhu, Shaoxu Song, Xiang Lian, Jianmin Wang, Lei Zou. Matching Heterogeneous Event Data.
ACM SIGMOD International Conference on Management of Data, SIGMOD, 2014. [paper] [slides] [TKDE version]
- Xiaochen Zhu, Shaoxu Song, Jianmin Wang, Philip S. Yu, Jiaguang Sun. Matching Heterogeneous Events with Patterns.
IEEE International Conference on Data Engineering, ICDE, 2014. [paper] [slides] [TKDE version]
- Jianmin Wang, Shaoxu Song, Xiaochen Zhu, Xuemin Lin. Efficient Recovery of Missing Events.
International Conference on Very Large Data Bases, VLDB, 2013. [paper] [slides] [TKDE version]
- Shaoxu Song, Lei Chen, Hong Cheng. Parameter-Free Determination of Distance Thresholds for Metric Distance Constraints.
IEEE International Conference on Data Engineering, ICDE, 2012. [paper] [slides] [TKDE version]
- Shaoxu Song, Lei Chen, Philip S. Yu. On Data Dependencies in Dataspaces.
IEEE International Conference on Data Engineering, ICDE, 2011. [paper] [slides] [VLDBJ version]
- Shaoxu Song, Lei Chen. Differential Dependencies: Reasoning and Discovery.
ACM Transactions on Database Systems, TODS, 2011. [paper]
- Shaoxu Song, Lei Chen, Mingxuan Yuan. Materialization and Decomposition of Dataspaces for Efficient Search.
IEEE Transactions on Knowledge and Data Engineering, TKDE, 2011. [paper]
- Shaoxu Song, Lei Chen, Jeffrey Xu Yu. Answering Frequent Probabilistic Inference Queries in Databases.
IEEE Transactions on Knowledge and Data Engineering, TKDE, 2011. [paper]
- Xiang Lian, Lei Chen, Shaoxu Song. Consistent Query Answers in Inconsistent Probabilistic Databases.
ACM SIGMOD International Conference on Management of Data, SIGMOD, 2010.
Professional Services
IEEE BigData 2022 PC Vice-Co-Chair
APWeb-WAIM 2017 Workshop Co-Chair, WAIM 2016 Workshop Co-Chair
VLDB 2019 Distinguished Reviewer
CIKM 2017 Outstanding Reviewer
PC Member
- VLDB 2024 / 2023 / 2022 / 2021 / 2019
- ICDE 2024 / 2023 / 2017
- KDD 2023 / 2022 / 2021 / 2020 / 2019 / 2018 / 2016 / 2015
- SIGIR 2023 / 2022 / 2021 / 2020
- CIKM 2023 / 2022 / 2021 / 2020 / 2019 / 2018 / 2017
- DASFAA 2023 / 2022 / 2021 / 2020 / 2019 / 2018 / 2017 / 2015
- WSDM 2023 / 2022 / 2021
- BigData 2021 / 2020 / 2019 / 2018 / 2017 / 2016 / 2015 / 2014 / 2013
- WISE 2022 / 2021 / 2018 / 2017 / 2016
- APWeb-WAIM 2023 / 2022 / 2021 / 2020 / 2019 / 2018
- APWeb 2014 / 2013
- IJCAI 2015, SDM 2022, Globecom 2019, ICPADS 2017 / 2014
- VLDB 2012 PhD track, ICDE 2016 TKDE poster track, EDBT 2016 poster track, …
Journal Associate Editor
- Journal of Computer Science and Technology (JCST), Editorial Board of Young Scientists
- Expert Systems with Applications (ESWA)
ACM JDIQ, BDMA Guest Editor
Reviewer for TODS, VLDBJ, TKDE, PVLDB, JDIQ, DAPD, WWWJ, INS, IEEE TSC, IEEE Intelligent Systems, ESWA, …
Group Members
- Songze Li, PhD, started in 2023
- Zijie Chen, PhD, started in 2023
- Haoquan Guan, PhD, started in 2022
- Jinzhao Xiao, PhD, started in 2022
- Yunxiang Su, PhD, started in 2021
- Rui Kang, PhD, started in 2020
- Chenguang Fang, PhD, started in 2019
- Yinan Mei, PhD, graduated in 2023, now at Huawei
- Yu Sun, PhD, graduated in 2022, now at NKU
- Ruihong Huang, PhD, graduated in 2022, now at FNU
- Fei Gao, PhD, graduated in 2021, now at a government agency
- Aoqian Zhang, PhD, graduated in 2018, Postdoc at University of Waterloo, now at BIT
- Xiaochen Zhu, PhD, graduated in 2015, now at Baidu