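Currently an Algorithm Researcher and Director of the Product Department at BizSeer Technology; Ph.D. from the Department of Computer Science and Technology, Tsinghua University. Research areas include AIOps (intelligent IT operations) and network performance optimization, focusing on applying AI techniques to IT system monitoring, alert analysis, fault localization, and performance optimization to improve the stability and performance of IT systems. Has published 20 papers in total, 13 of them in CCF Class A/B international conferences or journals such as JSAC, TON, KDD, and ESEC/FSE, and has filed and been granted 5 Chinese invention patents. The AIOps systems developed have been deployed with good results at more than 40 enterprises in the banking, securities, telecom, and internet sectors, including China Construction Bank, China Mobile, and Baidu.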

🎓 Education

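  • 2013.09 - 2019.09, Department of Computer Science and Technology, Tsinghua University, Haidian, Beijing, Ph.D., Advisors: Prof. Youjian Zhao and Prof. Dan Pei
  • 2009.09 - 2013.06, Department of Computer Science and Technology, Jilin University, Changchun, Jilin, B.S.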

💻 Work Experience

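  • 2019.08 - present, Beijing BizSeer Technology Co., Ltd., Algorithm Researcher and Director of the Product Department, Beijing
  • 2019.10 - 2021.10, NetMan Lab, Department of Computer Science and Technology, Tsinghua University, Postdoctoral Researcher, Beijing
  • 2015.04 - 2018.04, Baidu Online Network Technology Co., Ltd., Intern, Baidu Front End (BFE) team, Beijing
  • 2014.01 - 2015.04, Baidu Online Network Technology Co., Ltd., Intern, Intelligent Operations (IOP) team, Beijing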

📝 Publications


[1] Yiran Cheng, Bo Cheng, Pengxiang Jin, Yongqian Sun, Xiaohui Nie, Nengwen Zhao, Shenglin Zhang, Dan Pei. “Effective Attribute Selection for Multi-dimensional Root Cause Analysis.” 2022 IEEE 33rd International Symposium on Software Reliability Engineering (ISSRE). IEEE, 2022. (CCF-recommended Class B conference)

[2] Zeyan Li, Nengwen Zhao, Mingjie Li, Xianglin Lu, Lixin Wang, Dongdong Chang, Xiaohui Nie, Li Cao, Wenzhi Zhang, Kaixin Sui, Yanhua Wang, Xu Du, Guoqiang Duan, Dan Pei. “Actionable and interpretable fault localization for recurring failures in online service systems.” Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 2022. (CCF-recommended Class A conference)

[3] Mingjie Li, Minghua Ma, Xiaohui Nie, Kanglin Yin, Li Cao, Xidao Wen*, Zhiyun Yuan, Duogang Wu, Guoying Li, Wei Liu, Xin Yang, Dan Pei. “Mining Fluctuation Propagation Graph Among Time Series with Active Learning.” Database and Expert Systems Applications: 33rd International Conference, DEXA 2022. (CCF-recommended Class C conference)

[4] Mingjie Li, Zeyan Li, Kanglin Yin, Xiaohui Nie, Wenchi Zhang, Kaixin Sui, Dan Pei. “Causal Inference-Based Root Cause Analysis for Online Service Systems with Intervention Recognition.” KDD 2022, Washington, DC, USA, Aug 14-18, 2022. (CCF-recommended Class A conference)

[5] Xianglin Lu, Zhe Xie, Zeyan Li, Mingjie Li, Xiaohui Nie, Nengwen Zhao, Qingyang Yu, Shenglin Zhang, Kaixin Sui, Lin Zhu and Dan Pei. “Generic and Robust Performance Diagnosis via Causal Inference for OLTP Database Systems.” 2022 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid). IEEE, 2022. (CCF-recommended Class C conference)

[6] Canhua Wu, Nengwen Zhao, Lixin Wang, Xiaoqin Yang, Shining Li, Ming Zhang, Xing Jin, Xidao Wen, Xiaohui Nie, Wenchi Zhang, Kaixin Sui, Dan Pei. “Identifying root-cause metrics for incident diagnosis in online service systems.” 2021 IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE). IEEE, 2021. (CCF-recommended Class B conference)

[7] Minghua Ma, Shenglin Zhang, Junjie Chen, Jim Xu, Haozhe Li, Yongliang Lin, Xiaohui Nie, Bo Zhou, Yong Wang, Dan Pei. “Jump-starting multivariate time series anomaly detection for online service systems.” Proceedings of the 2021 USENIX Annual Technical Conference. 2021. (CCF-recommended Class A conference)

[8] Zeyan Li, Junjie Chen, Rui Jiao, Nengwen Zhao, Zhijun Wang, Shuwei Zhang, Yanjun Wu, Long Jiang, Leiqin Yan, Zikai Wang, Zhekang Chen, Wenchi Zhang, Xiaohui Nie, Kaixin Sui, Dan Pei. “Practical root cause localization for microservice systems via trace analysis.” 2021 IEEE/ACM 29th International Symposium on Quality of Service (IWQoS). IEEE, 2021. (CCF-recommended Class B conference)

[9] Yuchao Zhang, Xiaohui Nie*, Junchen Jiang, Wendong Wang, Ke Xu, Youjian Zhao, Martin J. Reed, Haiyang Wang, Guang Yao, Kai Chen. “BDS+: A Centralized Near-Optimal Network System for Inter-Datacenter Data Replication.” IEEE/ACM Transactions on Networking (ToN), 2021. (CCF-recommended Class A journal, SCI-indexed, impact factor: 2.186)

[10] Nengwen Zhao, Junjie Chen, Zhou Wang, Xiao Peng, Gang Wang, Yong Wu, Fang Zhou, Zhen Feng, Xiaohui Nie, Wenchi Zhang, Kaixin Sui, Dan Pei. “Real-time incident prediction for online service systems.” Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 2020. (CCF-recommended Class A conference)

[11] Nengwen Zhao, Junjie Chen, Xiao Peng, Honglin Wang, Xinya Wu, Yuanzong Zhan, Zikai Chen, Xiangzhong Zheng, Xiaohui Nie, Gang Wang, Yong Wu, Fang Zhou, Wenchi Zhang, Kaixin Sui, Dan Pei. “Understanding and handling alert storm for online service systems.” Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering: Software Engineering in Practice. 2020. (CCF-recommended Class A conference)

[12] Ping Liu, Yu Chen, Fuying Wang, Xiaohui Nie, Jing Zhu, Ming Zhang, Dan Pei. “FluxRank: A Widely-Deployable Framework to Automatically Localizing Root Cause Machines for Web Service Failure Mitigation.” The 30th International Symposium on Software Reliability Engineering (ISSRE), 2019. (CCF-recommended Class B conference)

[13] Xiaohui Nie, Youjian Zhao, Zhihan Li, Guo Chen, Kaixin Sui, Jiyang Zhang, Zijie Ye, Dan Pei. “Dynamic TCP Initial Windows and Congestion Control Schemes through Reinforcement Learning.” IEEE Journal on Selected Areas in Communications (JSAC), 2019. (CCF-recommended Class A journal, SCI-indexed, impact factor: 7.172)

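[14] Lei Zou, Jing Zhu, Xiaohui Nie, Ya Su, Dan Pei, Yu Sun. “A Clustering-Based Hotspot Discovery Algorithm for Multi-Dimensional Data” (in Chinese). Journal of Chinese Computer Systems, Vol. 40, No. 3, pp. 465-471, March 2019.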

[15] Xiaohui Nie, Youjian Zhao, Guo Chen, Kaixin Sui, Yazheng Chen, Dan Pei, Miao Zhang, Jiyang Zhang. “Reducing Web Latency through Dynamically Setting TCP Initial Window with Reinforcement Learning.” 2018 IEEE/ACM 26th International Symposium on Quality of Service (IWQoS) (pp. 1-10). IEEE. (CCF-recommended Class B conference)

[16] Yuchao Zhang, Junchen Jiang, Ke Xu, Xiaohui Nie, Martin J. Reed, Haiyang Wang, Guang Yao, Miao Zhang, Kai Chen. “BDS: a centralized near-optimal overlay network for inter-datacenter data replication.” Proceedings of the Thirteenth EuroSys Conference. ACM, 2018. (CCF-recommended Class B conference)

[17] Yongqian Sun, Youjian Zhao, Ya Su, Dapeng Liu, Xiaohui Nie, Yuan Meng, Shiwen Cheng, Dan Pei, Shenglin Zhang, Xianping Qu, Xuanyou Guo. “Hotspot: Anomaly Localization for Additive KPIs with Multi-Dimensional Attributes.” IEEE Access, 2018, 6: 10909-10923. (SCI-indexed, impact factor: 3.557)

[18] Xiaohui Nie, Youjian Zhao, Guo Chen, Kaixin Sui, Yazheng Chen, Dan Pei, Miao Zhang, Jiyang Zhang. “TCP WISE: One initial congestion window is not enough.” 2017 IEEE 36th International Performance Computing and Communications Conference (IPCCC). IEEE, 2017: 1-8. (CCF-recommended Class C conference)

[19] Xiaohui Nie, Youjian Zhao, Kaixin Sui, Dan Pei, Yu Chen, Xianping Qu. “Mining causality graph for automatic web-based service diagnosis.” 2016 IEEE 35th International Performance Computing and Communications Conference (IPCCC). IEEE, 2016: 1-8. (CCF-recommended Class C conference)

[20] Yuchao Zhang, Ke Xu, Guang Yao, Miao Zhang, Xiaohui Nie. “Piebridge: A Cross-DR Scale Large Data Transmission Scheduling System.” In Proceedings of the 2016 ACM SIGCOMM Conference (pp. 553-554). ACM. (CCF Class A conference, poster)

Patents


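  1. Minyi Shao, Xiaohui Nie, Ruming Tang, Xidao Wen, Shiwen Cheng, Yongqian Sun. A call-chain-based root cause localization method, apparatus, device, and storage medium. Grant publication No.: CN116820826B, grant date: 2023/11/24.
  2. Zhekang Chen, Xidao Wen, Ruming Tang, Xiaohui Nie, Shiwen Cheng. An evaluation method and system for fault localization applications based on red-blue adversarial testing. Grant publication No.: CN116302762B, grant date: 2023/08/18.
  3. Xidao Wen, Li Cao, Ruming Tang, Xiaohui Nie, Shiwen Cheng. An abnormal change detection method, apparatus, device, and storage medium. Chinese invention patent No.: CN115391160B, grant date: 2023/04/07.
  4. Ruming Tang, Li Cao, Xiaohui Nie, Dapeng Liu. A network fault troubleshooting method and system. Grant publication No.: CN114785666B, grant date: 2022/10/04.
  5. Xiaohui Nie, Miao Zhang, Gongwei Wu, Jiyang Zhang. A TCP initial window optimization method and system. Grant publication No.: CN108111430B, grant date: 2022/02/25.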

Projects

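  1. National Natural Science Foundation of China (NSFC) project: Unsupervised Anomaly Detection and Localization for Microservices Based on Multimodal Monitoring Data, CNY 570,000, 2021/1/1-2024/12/31, second participant.
  2. National Natural Science Foundation of China (NSFC) project: Research on Performance Management Architecture for Internet Services Based on Big Data Analytics, CNY 860,000, 2015/1/1-2018/12/31, third participant.
  3. China Mobile Joint Research Institute collaborative project: Network Sensing and Diagnosis Technology, CNY 2,125,200, 2021/7/6-2021/12/31, fourth participant.
  4. Shenzhen Securities collaborative project: AIOps Research Cooperation Agreement with Shenzhen Securities Communication Co., Ltd., CNY 890,000, 2020/8/27-2021/12/31, second participant.
  5. Bank of China collaborative project: Research and Initial Practice of AIOps in Network Operations, CNY 840,000, 2019/12/30-2020/12/30, second participant.
  6. Huawei collaborative project: Machine Learning Fault Diagnosis Techniques for ADOTP Template-Combination Anomaly Detection and ADOTS Template-Sequence Anomaly Detection, CNY 1,800,000, 2019/8/9-2020/8/8, second participant.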