International Workshop on Knowledge Graph:
Mining Knowledge Graph for Deep Insights

KDD CONFERENCE 2020

8:00 AM - 8:30 PM (Pacific Daylight Time). Virtual Conference. Aug 24, 2020

New! Due to the COVID-19 pandemic and many requests to extend the deadline, we would like to give authors more time to prepare their submissions. The new submission deadline is June 5th, 2020.

Confirmed Speakers and Panelists

Melliyal Annamalai

Oracle

Michael Atkin

Agnos.ai

Brad Bebee

Amazon

Kim Branson

GSK

Shih-Fu Chang

Columbia University

Anne Cocos

GSK

Jonathan Dry

Tempus

Jiawei Han

UIUC

James Hendler

Rensselaer Polytechnic Institute

Vivek Khetan

Accenture Labs

Ora Lassila

Amazon

Brian Martin

AbbVie

Tudor Oprea

The University of New Mexico

Tom Plasterer

AstraZeneca

Christopher Ré

Stanford University

Juan Sequeda

data.world

David Wild

Data2Discovery

Marinka Zitnik

Harvard University

Introduction

Knowledge graphs (KGs) are becoming the foremost driving force behind Artificial Intelligence (AI). Like human brains, knowledge graphs will become the brains of machines that can connect the dots, perform cognitive inference, and, most importantly, derive insights from vast amounts of heterogeneous data. Cutting-edge machine learning and deep learning algorithms can empower machines to detect hidden patterns and build strong memories beyond human imagination, but if the data is siloed (or disconnected), it is powerless no matter how big it is. A knowledge graph is the necessary step for integrating disparate datasets and building machine-processable knowledge that enables intelligent machine learning and deep learning.

The graph data model will replace the relational data model as the prominent data model for realizing the intelligence of AI, because the relationships in data are critical to understanding the complexity of organizations, people, biological entities, and financial transactions. Gartner predicted that knowledge graph applications and graph mining will grow 100% annually through 2022 to enable more complex and adaptive data science. Given the black-box nature of AI algorithms, explainable AI becomes indispensable for applications that demand transparent decision-making. Knowledge graphs can play an essential role in deciphering hidden connections and complex contexts into traceable paths. Therefore, knowledge graphs have been widely applied in drug discovery, fraud detection, healthcare, financial intelligence, business intelligence, chatbots, virtual assistants, and robots.



Due to the COVID-19 pandemic, we will work closely with the KDD conference organizers to investigate feasible options to make this workshop successful.

Agenda

(Pacific Daylight Time, Aug 24)

The workshop will be open to the whole conference. Each paper will be evaluated by three reviewers on novelty, significance, technical soundness, experiments, and presentation. The reviewers will be program committee members or researchers recommended by the members.

Authors can submit either full papers of 8 pages in length or short papers of 4 pages in length in the ACM format (https://www.acm.org/publications/proceedings-template), with the "sigconf" option. Since we plan to follow a single-blind review process, there is no need to anonymize the author list.

Publication: Authors of high-quality submissions will be invited to submit substantially revised versions to the Data Intelligence journal published by MIT Press (https://www.mitpressjournals.org/loi/dint).

Please submit your papers at https://easychair.org/conferences/?conf=iwkg-kdd2020

Topics (not limited to)

Building KGs using NLP

Visual search and intelligent browsing of KGs

Industrial applications of KGs: banking, financing, retail, healthcare, medicine, pharma, etc.

KG-powered machine learning and deep learning

KGs in computer vision, medical imaging

KGs for AI ethics and misinformation




Machine learning, including deep learning, algorithms on KGs

Visualizing KGs

Inferencing on KGs

Intelligent services using KGs: chatbots, virtual assistants

KGs for explainable AI

Semantic web and KGs

Important Dates

(all deadlines are midnight Alofi Time)

June 5, 2020

Submissions due

June 15, 2020

Acceptance notifications

July 1, 2020

Camera-ready submission

Aug 24, 2020

Workshop date

Organizers

Ying Ding

University of Texas at Austin

Benjamin Glicksberg

Icahn School of Medicine at Mount Sinai

James Hendler

Rensselaer Polytechnic Institute

Edgar Meij

Bloomberg, London, United Kingdom

Francois Scharffe

Columbia University

Jie Tang

Tsinghua University

Fei Wang

Cornell University

Keynote 1 - James Hendler (RPI)


Knowledge Graph Semantics

ABSTRACT: Oh dear, there’s that word again - “semantics!” Isn’t that what doomed that Semantic Web thing and led to knowledge graphs instead? In this talk, I discuss how knowledge graphs, linked data and, yes, semantics are all critically linked and why the latter is still relevant to the growth and scaling of knowledge graphs into the future - and specifically to the ability to extract better data from them.

James Hendler is the Director of the Institute for Data Exploration and Applications and the Tetherless World Professor of Computer, Web and Cognitive Sciences at RPI. He is also acting director of the RPI-IBM Artificial Intelligence Research Collaboration and serves as Chair of the Board of the UK’s charitable Web Science Trust. Hendler has authored over 400 books, technical papers and articles in the areas of Semantic Web, artificial intelligence, agent-based computing and high-performance processing. Hendler was the recipient of a 1995 Fulbright Foundation Fellowship, is a former member of the US Air Force Science Advisory Board, and is a Fellow of the AAAI, BCS, the IEEE, the AAAS and the ACM. He is also the former Chief Scientist of the Information Systems Office at the US Defense Advanced Research Projects Agency (DARPA) and was awarded a US Air Force Exceptional Civilian Service Medal in 2002. He is also the first computer scientist to serve on the Board of Reviewing Editors for Science. In 2010, Hendler was named one of the 20 most innovative professors in America by Playboy magazine and was selected as an “Internet Web Expert” by the US government. In 2013, he was appointed as the Open Data Advisor to New York State, and in 2015 he was appointed a member of the US Homeland Security Science and Technology Advisory Committee. In 2016, he became a member of the National Academies Board on Research Data and Information, and in 2018 he became chair of the ACM’s US technology policy committee and was elected a Fellow of the National Academy of Public Administration.


Keynote 2 - Jiawei Han (UIUC)


Construction of Multi-Dimensional Knowledge Graphs

ABSTRACT: Knowledge graphs have been widely constructed for information retrieval, question answering and other knowledge-based reasoning tasks. However, given the dynamics of data and the diversity of information needs, we argue that “a knowledge graph in need is a good knowledge graph indeed” and consider another kind of knowledge graph: domain/problem-specific, multi-dimensional knowledge graphs. We discuss how to automatically construct such knowledge graphs from massive unstructured text, based on recent research on text mining.

Jiawei Han is the Michael Aiken Chair Professor in the Department of Computer Science, University of Illinois at Urbana-Champaign. He received the ACM SIGKDD Innovation Award (2004), the IEEE Computer Society Technical Achievement Award (2005), the IEEE Computer Society W. Wallace McDowell Award (2009), and Japan's Funai Achievement Award (2018). He is a Fellow of ACM and a Fellow of IEEE, and served as co-Director of KnowEnG, a Center of Excellence in Big Data Computing (2014-2019) funded by the NIH Big Data to Knowledge (BD2K) Initiative, and as the Director of the Information Network Academic Research Center (INARC) (2009-2016), supported by the Network Science-Collaborative Technology Alliance (NS-CTA) program of the U.S. Army Research Lab.


Keynote 3 - Christopher Ré (Stanford University)


Bootleg: Guidable Self-Supervision for Named Entity Disambiguation

ABSTRACT: Mapping textual mentions to entities in a knowledge graph, called Named Entity Disambiguation (NED), is a key step in using knowledge graphs. A key challenge in NED is generalizing to rarely seen (tail) entities. Traditionally, NED uses hand-tuned patterns from a knowledge base to capture rare, but reliable, signals. Hand-built features make it challenging to deploy and maintain NED, especially in multiple locales. While at Apple in 2018, we built a self-supervised system for NED that was deployed in a handful of locales and that improved performance of downstream models significantly. However, due to the fog of production, it was unclear what aspects of these models were most valuable. Motivated by this experience, we built Bootleg, a clean-slate, open-source, self-supervised system to improve tail performance using a simple transformer-based architecture. Bootleg improves tail generalization through a new inverse regularization scheme to favor more generalizable signals automatically. Bootleg-like models are used by several downstream applications. As a result, quality issues fixed in one application may need to be fixed independently in many applications. Thus, we initiate the study of techniques to fix systematic errors in self-supervised models using weak supervision, augmentation, and training set refinement. Bootleg achieves new state-of-the-art performance on the three major NED benchmarks by up to 3.3 F1 points, and it improves performance over BERT baselines on tail slices by 50.1 F1 points.

Bootleg is open-source at http://hazyresearch.stanford.edu/bootleg/.

Christopher Ré is an American computer scientist. He is currently employed by Stanford University, where he is an associate professor. He was awarded a MacArthur Fellowship in 2015. Ré specializes in big data analysis. He co-founded Lattice.io, a data mining and machine learning company that was acquired by Apple in May 2017. More recently, he co-founded SambaNova Systems, based in part on his work on accelerating machine learning.

Keynote 4 - Shih-Fu Chang (Columbia University)


Multimodal Knowledge Graph: Generation Methods, Applications, and Challenges

ABSTRACT: The knowledge graph (KG) has been shown to be a powerful representation that can be used to support critical applications such as reasoning, hypothesis discovery, and QA. However, most current KG research has been limited to the text modality, missing the opportunity to capture rich information embodied in images, videos, and their correlation with text documents. In this talk, I will present emerging research and opportunities in constructing multimodal KGs, such as scene graph generation, multimodal grounding, and common space for multimodal event and argument extraction. I will share successful initial experiences in using such results in tasks such as visual QA or visual commonsense reasoning. More importantly, I will discuss open research challenges such as vocabulary limitation and the semantic gap between text and visual KGs.

Shih-Fu Chang's research is focused on computer vision, machine learning, and multimodal knowledge graphs. His work on content-based visual search in the early 90's set the foundation of this vibrant area. Over the years, he has developed innovative solutions for image/video recognition, multimodal analysis, image forensics, and compact hashing for large-scale search. His work has been used in products and applications like image/video search engines, online human trafficking prevention, and brain-machine interfaces. He was awarded the IEEE Signal Processing Society Technical Achievement Award, the ACM Multimedia Special Interest Group Technical Achievement Award, an Honorary Doctorate from the University of Amsterdam, and the IEEE Kiyo Tomiyasu Award. He received the Great Teacher Award from the Society of Columbia Graduates. He served as Chair of ACM SIGMM during 2013-2017 and as advisor/founder for several companies. In his current capacity as Senior Executive Vice Dean of Columbia Engineering, he plays a key role in the School's strategic planning, special initiatives, international collaboration, and faculty development. He is an Amazon Scholar, a Fellow of the American Association for the Advancement of Science (AAAS), ACM, and IEEE, and an elected Academician of Academia Sinica.


Keynote 5 - Jonathan Dry (Tempus)


Learning patient data graphs to drive biological discovery

ABSTRACT: Graphs can provide a powerful representation of biological data, reflecting the network-oriented nature of biological entities and their interactions. By capturing patient data in graphs, we are able to take advantage of relationships between different entities to more fully understand patient similarity beyond directly overlapping measurements. By borrowing learning between connected nodes in a graph, we can overcome some of the pitfalls common to biological data analysis, such as small data, sparsity and heterogeneity. Furthermore, revealing graph sub-structures associated with phenotypes of interest can provide interpretable biological rationale, overcoming the lack of transparency of some machine learning approaches. We will describe a number of approaches to graph representation learning over patient data and networks. These approaches enable us to integrate prior knowledge with patient multi-omics, clinical information or drug data, towards the goal of predicting biomarkers and mechanisms associated with patient outcomes and drug response.

Jonathan Dry recently joined Tempus as VP of Scientific Discovery, following a long career leading Bioinformatics and Data Science teams for discovery and translational medicine at AstraZeneca. His work includes the development of a number of approaches to harness pre-clinical and clinical multi-omics data sets to reveal new disease understanding and biomarkers for precision medicine.

Keynote 6 - Anne Cocos (GSK)


Accelerating drug discovery with the GSK.ai knowledge graph

ABSTRACT: At GSK.ai we build predictive models of biology in order to discover drug targets more quickly and with higher likelihood of success. One persistent difficulty in working with genetic datasets is the ability to learn and generalize from limited data. By initializing any model with enriched representations of genes, tissues, cells, and other biological entities that encode prior knowledge, we can begin to overcome the challenges of data sparsity. That prior information may be encoded within millions of text documents, existing structured databases, and experimental datasets. The GSK.ai knowledge graph is a platform that enables GSK scientists to jointly leverage structured and unstructured data in order to discover, encode, and predict biomedical knowledge. In this talk I will describe how we are building our graph using a variety of open source frameworks, scaling it to hundreds of billions of triples, and applying it to exciting problems in drug discovery.

Anne Cocos is an AI/ML Engineer at GSK, where she leads the team responsible for developing the GSK.ai knowledge graph, and for building the information extraction services that supply the graph with insights from text at scale. She holds a Ph.D. from the University of Pennsylvania, where she was supported by the Google Ph.D. Fellowship and the AI2 Key Scientific Challenges Award. Before pivoting her career to NLP research, Anne was an end-user of NLP-powered technologies in the U.S. Navy.

Keynote 7 - Brad Bebee (Amazon)


Customer trends: Gaining deeper insights from knowledge graphs

ABSTRACT: Putting data into context, relating it to other information, enables us to answer new questions. We can understand not only what something is, but how it is connected to other things. Understanding these relationships creates new opportunities and enables business transformation by connecting data in new ways; insights are revealed by connecting the unconnected. There is information embedded in these connections within the knowledge graphs that can be revealed using machine and deep learning approaches. We see this broadly in knowledge graphs, identity graphs, and fraud graphs, and we will present examples of how customers in the Health space are gaining deeper insights using these approaches.

Brad Bebee leads Amazon Neptune, Amazon Web Services’ (AWS) fully managed graph database service, and works closely with customers and developers to help them build graph-enabled solutions. Prior to joining AWS, he was the CEO of Blazegraph, where he focused on leveraging products for high performance graph databases and analytics into business and mission areas and was an active open source contributor on the Blazegraph platform. He is a subject matter expert in graph and knowledge representation with experience ranging from the precursors of DARPA's DAML program to more recent work with large-scale data analytics using the Hadoop ecosystem, Accumulo, and related technologies. He has extensive experience in architecture and software modeling methodologies, where he has led and collaborated upon multiple publications, receiving recognition for his research. In 2006, he was selected as a participant in the National Academy of Engineering’s U.S. Frontiers of Engineering Symposium. Over the course of his career, Brad has served as a CEO, CTO, CFO, managed operating divisions, and performed advanced technology development for commercial and public-sector customers. He holds a B.S. in Computer Science from the University of Maryland at College Park.

Keynote 8 - Marinka Zitnik (Harvard Medical School)


Learning actionable representations of interconnected biology

ABSTRACT: Networks pervade biomedical data, from the molecular level to the level of connections between diseases in a person and all the way to the societal level encompassing all human interactions. These interactions at different levels give rise to a bewildering degree of complexity, which is only likely to be fully understood through an integrated systems view and the study of combined, multi-modal networks. In this talk, I describe our efforts to expand the scope and ease the applicability of machine learning in biology and medicine. First, I outline our graph neural networks for knowledge graphs. We show how these methods enable the repurposing of drugs for emerging diseases. We deployed our algorithms to rank drugs for their expected efficacy against SARS-CoV-2. The algorithms predicted several dozen drugs, which we experimentally verified in the wet lab. These drugs rely on network-based actions and cannot be identified using traditional, non-network strategies. We also discuss how the methods enable the discovery of dozens of drug combinations that are safe in patients, with considerably fewer unwanted side effects than today's treatments. Lastly, I describe our efforts in learning actionable representations that allow users of our models to ask what-if questions and receive predictions that are accurate and can be interpreted meaningfully.

Marinka Zitnik is an Assistant Professor at Harvard with appointments in the Department of Biomedical Informatics, Blavatnik Institute, Broad Institute of MIT and Harvard, and Harvard Data Science Initiative. Dr. Zitnik is a computer scientist studying applied machine learning with a focus on challenges brought forward by data in science, medicine, and health. Her methods are used by major institutions, including Baylor College of Medicine, Karolinska Institute, Stanford Medical School, and Massachusetts General Hospital. She has recently been named a Rising Star in EECS by MIT and also a Next Generation in Biomedicine by the Broad Institute, being the only young scientist who received such recognition in both EECS and Biomedicine.

Keynote 9 - Tudor Oprea (The University of New Mexico)


Machine learning prediction and tau-based validation identifies novel Alzheimer’s disease genes relevant to immunity

ABSTRACT: To this day, there are no disease-modifying medicines for Alzheimer’s disease (AD) and related dementias (ADRD). With the daunting exponential increase of ADRD burden and related deaths, the necessity to find new lines of attack is vital. One technological development that may help pave the way for widespread use of artificial intelligence (AI) is accurate data science and machine learning (ML). Using ML may be advantageous in finding new targets for diagnosing, predicting disease onset or prognosis, and for personalized medicine. ML and AI could also assist with AD research. Numerous reports suggest that pathological accumulation of tau is a better predictor of cognitive decline in ADRD patients. Here we report the development of a workflow to validate protein knowledge graph (PKG)/meta-path (m-p)/ML predicted genes based on their ability to alter Alzheimer’s disease related tau pathology in three different experimental systems (human inducible pluripotent stem cell derived neurons (iPSNs) from sporadic AD, human autopsy brain samples, and a neuronal cell culture model of inflammation-induced tau pathology). First, the MPxgb predicted twenty genes involved in various pathways including immune function. Second, mRNA levels of eleven of the twenty predicted genes were significantly altered in the sporadic AD iPSNs compared to control iPSNs. Third, nine of the twenty predicted proteins were upregulated in AD, and five of them were also upregulated in AD iPSNs; siRNA-mediated knockdown of all two genes identified top seven genes, which could significantly reduce AD-relevant tau hyperphosphorylation on Ser199/Ser202 (AT8) and Thr231 (AT180) sites. Finally, three (PIBF1, LILRA3 and CRTAM) of the top seven genes showed the most significant effect on tau phosphorylation, and two (CRTAM and LILRA3) novel genes are in the TREM2-TYROBP AD-risk loci, which are implicated in innate immune pathways.

Tudor Oprea, MD PhD, is Professor of Medicine and Pharmaceutical Sciences, and Chief, Translational Informatics Division, at the Department of Internal Medicine, University of New Mexico School of Medicine in Albuquerque, New Mexico (USA). Author of over 200 publications and book chapters and 9 US patents, Dr. Oprea is currently the PI for the Illuminating the Druggable Genome Knowledge Management Center, an NIH Common Fund initiative. His current research is in the development of validated machine learning and artificial intelligence models for target and drug discovery, by combining numerical and free-text information to model human disease biology and therapeutics.
