International Workshop
on Knowledge Graph:
Heterogeneous Graph Deep Learning and Applications

KDD 2021

10:00 AM - 2:00 PM and 3:00 PM - 7:00 PM (US Central Time), AUGUST 15, 2021

Confirmed Speakers and Panelists

Qi He

LinkedIn

Jie Tang

Tsinghua University

Atlas Wang

The University of Texas at Austin

Nitesh Chawla

University of Notre Dame

Marinka Zitnik

Harvard Medical School

Yizhou Sun

UCLA

Tatiana Erekhinskaya

Lymba

Christos Boutsidis

Goldman Sachs

Keshav Pingali

The University of Texas at Austin

Maria Brbic

Stanford University

Ridho Reinanda

Bloomberg

Pranav Rajpurkar

Stanford University


Knowledge graphs have left their footprint almost everywhere, from virtual assistants in our homes, online shopping, and self-driving cars to stock prediction. Our daily activities are closely intertwined with applications powered by knowledge graphs. They have even entered healthcare, where they facilitate clinical decision making and improve hospital efficiency.

Gartner has predicted that knowledge graph (i.e., connected data with semantically enriched context) applications and graph mining will grow 100% annually through 2022 to enable more complex and adaptive data science. Applying and developing novel deep learning methods on graphs is now one of the hottest topics, in high demand from both academia and industry. Graph convolutional neural networks, graph transformers, graph embeddings, and related methods have achieved strong performance on various downstream tasks.

Knowledge graphs (KGs) are important resources for Artificial Intelligence (AI) solutions that seek to go beyond generating an insight, to interpreting the past, current, and possible future contexts to which the insight applies. Not only can it be misleading to mine data without considering or providing context, but disconnected insights can also be of limited use in complex, real-world situations. To move beyond retail consumer applications and otherwise narrow-AI tasks, it is necessary to address several challenges. For example, few embedding methods can adequately deal with heterogeneous KGs, which comprise different types of nodes and edges. However, this heterogeneity, if properly represented, has the potential to aid in the development of novel deep learning methods (e.g., by offering new ways for data augmentation, contrastive learning, and pre-training models).


10:00 AM - 2:00 PM and 3:00 PM - 7:00 PM (US Central Time), AUGUST 15, 2021

  • 10:00-10:30 AM › Keynote Jie Tang (Tsinghua University)
  • 10:30-11:00 AM › Keynote Yizhou Sun (UCLA)
  • 11:00-11:30 AM › Keynote Nitesh Chawla (University of Notre Dame)
  • 11:30 AM-12:00 PM › Keynote Atlas Wang (UT Austin)
  • 12:00-12:30 PM › Keynote Marinka Zitnik (Harvard)
  • 12:30-1:00 PM › Keynote Pranav Rajpurkar (Stanford)
  • 1:00-2:00 PM › Panel: Graph Deep Learning: Challenges and Opportunities (Moderator: Dr. Ying Ding, Panelists: Jie Tang, Yizhou Sun, Nitesh Chawla, Atlas Wang, Marinka Zitnik, Pranav Rajpurkar)
  • 2:00-3:00 PM BREAK
  • 3:00-3:30 PM › Keynote Qi He (LinkedIn)
  • 3:30-4:00 PM › Keynote Ridho Reinanda (Bloomberg)
  • 4:00-4:30 PM › Keynote Christos Boutsidis (Goldman Sachs)
  • 4:30-5:00 PM › Keynote Tatiana Erekhinskaya (Lymba)
  • 5:00-5:30 PM › Keynote Keshav Pingali (Katana Graph)
  • 5:30-6:00 PM › Keynote Maria Brbic (Stanford)
  • 6:00-6:30 PM › Paper Presentation (10 minutes each)
  • › Paper 1: Jaspreet Singh Dhani, Ruchika Bhatt, Balaji Ganesan, Parikshet Sirohi and Vasudha Bhatnagar. Similar Cases Recommendation using Legal Knowledge Graphs
  • › Paper 2: Yongqi Zhang, Zhanke Zhou and Quanming Yao. AutoSF+: Towards Automatic Scoring Function Design for Knowledge Graph Embedding
  • › Paper 3: Yushi Hirose, Shimbo Masashi and Taro Watanabe. Transductive Data Augmentation with Relational Path Rule Induction for Knowledge Graph Embedding
  • 6:30-7:30 PM › Panel: Knowledge Graph in Industry (Moderator: Dr. Oshani Seneviratne, Panelists: Qi He, Keshav Pingali, Christos Boutsidis, Tatiana Erekhinskaya, Ridho Reinanda, Maria Brbic)

The workshop will be open to all conference attendees. Each submitted paper will be evaluated by three reviewers on novelty, significance, technical soundness, experiments, and presentation. The reviewers will be program committee members or researchers recommended by the members.

Submitted papers should be at most 8 pages long, and demo papers at most 4 pages. All submissions must be prepared using the ACM camera-ready template. Authors are required to submit their papers electronically in PDF format.

Please submit your papers at

Keynote 1 - Pranav Rajpurkar

Extracting Clinical Entities and Relations from Radiology Reports

ABSTRACT: Extracting structured clinical information from free-text radiology reports can enable the use of radiology report information for a variety of critical healthcare applications. In this talk, I'll share our recent work on RadGraph, a dataset of entities and relations in full-text chest X-ray radiology reports based on a novel information extraction schema we designed to structure radiology reports. The talk will cover key lessons in the design and development of the dataset as a multi-disciplinary international collaboration, and cover strategies we used for validation of the quality and utility of the dataset.

Pranav Rajpurkar is an Assistant Professor at Harvard University in the Department of Biomedical Informatics. His research approaches problems in clinical medicine with a computational lens, developing algorithms and datasets that can drive AI technologies to support medical decision making. He co-hosts The AI Health Podcast and co-edits the Doctor Penguin AI Health Newsletter. He instructed the Coursera course series on AI for Medicine, and founded the AI for Healthcare Bootcamp Program. Previously, he completed his PhD, co-advised by Andrew Ng and Percy Liang, at Stanford, where he also received both his Bachelor's and Master's degrees in Computer Science.

➧ Thanks to the speakers for sharing their slides! ↯ Get the document of this keynote by this link.

Keynote 2 - Atlas Wang

Learning generalizable, transferable, and robust representations from unlabeled graphs via contrastive learning

ABSTRACT: Self-supervised learning on graph-structured data has drawn explosive interest. Among the recent advances, graph contrastive learning (GraphCL) has emerged with promising representation learning performance. In this talk, I will start by discussing its general framework designed for homogeneous graphs. We first design four types of graph augmentations and systematically study the impact of their various combinations on multiple datasets and in four different settings: semi-supervised, unsupervised, and transfer learning, as well as adversarial attacks. The results show that, even without tuning augmentation extents or using sophisticated GNN architectures, GraphCL can produce graph representations of similar or better generalizability, transferability, and robustness compared to state-of-the-art methods. I will then discuss our more recent progress on automatically selecting the data augmentation types per dataset. Our proposed minimax selections of augmentations are shown to be generally aligned with previous "best practices" observed from manual tuning, yet are now fully automated, more flexible, and more versatile. I will lastly discuss how to generalize GraphCL from homogeneous to heterogeneous graph data by injecting more knowledge priors in the form of meta-paths.
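As a rough illustration of the kind of contrastive objective this line of work builds on, the sketch below computes an InfoNCE-style loss in plain Python. It is not taken from the talk or from the GraphCL code: the function names `cosine` and `info_nce`, the temperature value, and the toy embeddings are all assumptions made for illustration. In practice the embeddings would come from a GNN encoder applied to two augmented views of each graph.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(view_a, view_b, temperature=0.5):
    """Mean InfoNCE-style loss over a batch.

    view_a[i] and view_b[i] are assumed to be embeddings of two
    augmented views of the same graph i (hypothetical inputs).
    """
    losses = []
    for i, anchor in enumerate(view_a):
        # Similarity of the anchor to every second-view embedding in the batch.
        logits = [cosine(anchor, other) / temperature for other in view_b]
        log_denominator = math.log(sum(math.exp(x) for x in logits))
        # Negative log-probability of picking the matching view.
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings: hand-written vectors, not the output of a real encoder.
view_a = [[1.0, 0.0], [0.0, 1.0]]
matched = info_nce(view_a, [[1.0, 0.0], [0.0, 1.0]])
mismatched = info_nce(view_a, [[0.0, 1.0], [1.0, 0.0]])
print(matched < mismatched)  # prints True
```

With these toy vectors, pairing each graph with its own second view gives a lower loss than pairing it with another graph's view, which is the behaviour contrastive training relies on.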

Professor Atlas Wang is currently an Assistant Professor of Electrical and Computer Engineering at UT Austin, leading the VITA research group. He also holds a visiting researcher position at Amazon. He was an Assistant Professor of Computer Science and Engineering at Texas A&M University from 2017 to 2020. He received his Ph.D. degree in ECE from UIUC in 2016, advised by Professor Thomas S. Huang, and his B.E. degree in EEIS from USTC in 2012. Prof. Wang has broad research interests in machine learning, computer vision, optimization, and their interdisciplinary applications. Most recently, he studies automated machine learning (AutoML), learning to optimize (L2O), robust learning, efficient learning, and graph neural networks. He has received many research awards and scholarships, including most recently an ARO Young Investigator award, an IBM Faculty Research Award, a J. P. Morgan Faculty Research Award, an Amazon Research Award (AWS AI), an Adobe Data Science Research Award, a Young Faculty Fellow of TAMU, and four research competition prizes from CVPR/ICCV/ECCV.

Keynote 3 - Maria Brbic

TrialNet: Knowledge Graph for Predicting Outcomes of Clinical Trials

ABSTRACT: Biomedical knowledge is naturally represented using networks; however, biological networks are often scattered across different resources and not linked with each other. The key to utilizing machine learning to advance new biological discoveries lies in grounding machine learning models in the background knowledge constructed by integrating existing biomedical databases. We present TrialNet, a massive clinical trials knowledge graph grounded in the underlying biology and chemistry of drug and disease mechanisms of action. TrialNet is built over the whole clinical trials database of more than 130,000 interventional clinical trials by automatically structuring clinical trial design entities and linking the extracted entities to the external biological and chemical databases. We present a graph neural network-based framework learnt over TrialNet that maps clinical trials to the low-dimensional embedding space, which effectively captures the trial design and its biological context. We show that our framework can be used to reason over clinical trials, their underlying biology and chemistry, and accurately predict the outcome of clinical trials, such as safety and adverse events. Furthermore, our model is able to capture subtle changes in trial design choices such as eligibility criteria, identifying the populations at risk of developing adverse events.

Maria Brbic is a postdoctoral fellow in Computer Science at Stanford University, where she is working with Prof. Jure Leskovec. Her research focuses on developing new machine learning methods for studying challenging problems in biology and biomedicine. She is involved in projects at Chan Zuckerberg Biohub and Stanford Neuro-omics Initiative. She received her PhD degree in Computer Science from the University of Zagreb while also conducting research at the University of Tokyo and Stanford University as a Fulbright Scholar.

Keynote 4 - Christos Boutsidis

The Anatomy of the Goldman Sachs Knowledge Graph.

ABSTRACT: In this talk, we will describe engineering and machine learning efforts towards building the Goldman Sachs Knowledge Graph. To this end, we combine multiple sources of structured (e.g., trades, transactions) and unstructured (e.g., text, voice) data into a single, highly heterogeneous knowledge graph. Our mission is to capture and understand all the firm relationships related to communication, trading, and money transfers and, more importantly, the relationships between the firm and its clients. From a technology stack viewpoint, and in order to enable efficient processing of billions of data points (e.g., emails, trades, etc.) on a daily basis, we rely on distributed computations and environments, specifically Hadoop, MapReduce, HDFS, HBase, and Java. From an algorithms perspective, the team develops scalable solutions for several graph mining problems such as vertex centrality (e.g., PageRank), vertex similarity, (shortest) paths, vertex deduplication, community detection, and graph embeddings, to name a few. Traditional Machine Learning approaches are also being used to solve (vertex, edge) classification problems at scale. Last but not least, from an applications point of view, our work is used across all Goldman Sachs divisions, for example: 1) within the Compliance Division, to enable the development of regulatory surveillances such as detection of insider trading and anti-money laundering; 2) within the Human Capital Management Division, to enable a contact tracing application very much needed with the firm’s Return to Office approach; 3) within the Investment Banking and Global Markets Divisions, to enable the next generation of Client Relationship Management (CRM) systems.

Christos Boutsidis currently leads Goldman Sachs's Knowledge Graph group. Before that, Christos held Research Scientist positions with Yahoo Research and IBM Research. Even before that, Christos earned a Ph.D. in Computer Science from Rensselaer Polytechnic Institute in May of 2011 and a BS in Computer Engineering from the University of Patras in Greece in July of 2006. Dr. Boutsidis has published over 30 articles in conferences and journals in algorithms, machine learning, and statistical data analysis.

Keynote 5 - Nitesh V. Chawla

Data, Content, Structure, and Order in Heterogeneous Graphs

Keynote 6 - Ridho Reinanda

Financial Knowledge Graph at Bloomberg: Applications and Challenges

ABSTRACT: The Bloomberg Knowledge Graph (bbKG) is a graph-based representation of entities and relationships in the financial world that connects cross-domain data from various sources within Bloomberg. Recent developments in machine learning, knowledge graphs, and language technology have enabled intelligent ways to uncover interesting patterns amongst data that reveal previously hidden insights. By leveraging the entity and relationship information in the knowledge graph, interesting potential applications emerge, especially when combined with other information such as market data and news stories. This talk details how Bloomberg uses the knowledge graph and semantic technologies to enable use cases across the organization, such as linking data across different domains, enriching news stories, and supporting financial analytics centered around entities. We will discuss how examining entity relationships among companies and industries enables us to gain insights when performing a financial analytics study around the impact of COVID-19.

Ridho Reinanda is an AI Research Scientist at Bloomberg who is currently leading the Knowledge Graph team. He obtained his Ph.D. in Information Retrieval at the University of Amsterdam, where he focused on leveraging knowledge graphs for information retrieval tasks and applying IR techniques for knowledge graph maintenance. Since joining Bloomberg, he has focused on building the Bloomberg Knowledge Graph and integrating it into various downstream applications. He recently published a survey in the Foundations and Trends in Information Retrieval book series, titled "Knowledge Graphs: an Information Retrieval Perspective", with collaborators at Bloomberg and the University of Amsterdam.

Keynote 7 - Marinka Zitnik

Few-Shot Learning for Network Biology

ABSTRACT: Prevailing methods for learning on knowledge graphs require abundant label information. However, labeled examples are scarce for problems at the scientific frontier, considerably limiting the methods' use for tasks that require reasoning about new phenomena, such as novel drugs in development, emerging pathogens, and patients with rare diseases. In this talk, I will describe algorithms that enable few-shot learning for network biology. At the core is the notion of local subgraphs that transfer information from one learning task to another, even when each task has only a handful of labeled examples. This principle is theoretically justified as we show that the evidence for a prediction can be found in the local subgraph surrounding target nodes or edges. I will illustrate few-shot learning methods on several problems, including modeling ultra high-order drug combinations and studying proteins across 1,840 species.

Marinka Zitnik is an Assistant Professor at Harvard University with appointments in the Department of Biomedical Informatics, Broad Institute of MIT and Harvard, and Harvard Data Science. Her research recently won best paper and research awards from the International Society for Computational Biology, the International Conference on Machine Learning, the Bayer Early Excellence in Science Award, Amazon Faculty Research Award, Rising Star Award in EECS, and Next Generation Recognition in Biomedicine, being the only young scientist who received such recognition in both EECS and Biomedicine.

Keynote 8 - Yizhou Sun

Combining Representation Learning and Logical Rule Reasoning for Knowledge Graph Inference

ABSTRACT: Knowledge graph inference has been studied extensively due to its wide applications. It has been addressed by two lines of research, i.e., the more traditional logical rule reasoning and the more recent knowledge graph embedding (KGE). Recently, there has been a trend to combine these two worlds. In this talk, we will introduce two recent developments in our group in this direction. First, we propose to leverage logical rules to bring in high-order dependency among entities and relations for KGE. In particular, we introduce probabilistic soft logic (PSL) to capture the semantic loss caused by logical rules, which is critical to address the uncertainty in KG. Second, by limiting the logical rules to definite Horn clauses, we are able to fully exploit the knowledge in logical rules and enable the mutual enhancement of logical rule-based reasoning and KGE in an extremely efficient way, without the need for sampling as done in the earlier methods.
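To make the logical-rule side of this concrete, the sketch below applies definite Horn clauses of one simple shape (a two-atom chain body implying a single head atom) to a set of triples until nothing new can be derived. It is a purely symbolic toy and does not reproduce the PSL or embedding components described in the abstract. The function names, the rule encoding, and the example entities and relations are all hypothetical.

```python
def apply_chain_rule(triples, body, head):
    """Apply one rule of the form body[0](x, y) AND body[1](y, z) => head(x, z)."""
    first, second = body
    inferred = set()
    for x, r1, y in triples:
        if r1 != first:
            continue
        for y2, r2, z in triples:
            if r2 == second and y2 == y:
                inferred.add((x, head, z))
    return inferred


def forward_chain(triples, rules):
    """Repeat rule application until no new triples appear."""
    known = set(triples)
    while True:
        new = set()
        for body, head in rules:
            new |= apply_chain_rule(known, body, head)
        new -= known  # keep only triples not already known
        if not new:
            return known
        known |= new


# Hypothetical facts and rule, chosen only for illustration.
facts = {
    ("alice", "born_in", "paris"),
    ("paris", "located_in", "france"),
}
rules = [(("born_in", "located_in"), "born_in_country")]
closure = forward_chain(facts, rules)
print(("alice", "born_in_country", "france") in closure)  # prints True
```

Because each rule has exactly one head atom and the set of entities and relations is finite, the loop is guaranteed to terminate once every derivable triple has been added.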

Yizhou Sun is an associate professor in the Department of Computer Science at UCLA. She received her Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign in 2012. Her principal research interest is in mining graphs/networks, and more generally in data mining, machine learning, and network science, with a focus on modeling novel problems and proposing scalable algorithms for large-scale, real-world applications. She is a pioneering researcher in mining heterogeneous information networks, with a recent focus on deep learning on graphs/networks. Yizhou has over 150 publications in books, journals, and major conferences. Tutorials on her research have been given at many premier conferences. She is a recipient of Best Student Paper Award, ACM SIGKDD Doctoral Dissertation Award, Yahoo ACE (Academic Career Enhancement) Award, NSF CAREER Award, CS@ILLINOIS Distinguished Educator Award, Amazon Research Awards (twice), and Okawa Foundation Research Award.

Topics (not limited to)

Building KGs using NLP

Heterogeneous graph embedding, graph transformer, and graph convolutional neural network

Contrastive learning in graph mining

Graph deep learning for semantic reasoning

Visual searching and browsing of KGs

Industrial applications of KGs: banking, financing, retail, healthcare, medicine, pharma, etc.

KGs in computer vision, medical imaging

KGs for explainable AI

KGs for AI ethics and misinformation

Important Dates

(all deadlines are midnight UTC)

June 1, 2021: Workshop paper submissions

June 10, 2021: Workshop paper notifications

July 2, 2021: Final submission of workshop program and materials

August 14-18, 2021: Workshop date


Ying Ding

University of Texas at Austin

Ching-Hua Chen


Haoyun Feng


Oshani Seneviratne


Bogdan Arsintescu


Francois Scharffe

Columbia University

Juan Sequeda