ACL 2016 Paper: Modeling concept dependencies in a scientific corpus

A key notion in the TechKnAcq project is the “concept graph”. This page explains what a concept graph is and describes work, presented at the 2016 ACL annual meeting, on automatically discovering dependency relationships between concepts found in a technical corpus.

Abstract

Our goal is to generate reading lists for students that help them optimally learn technical material. Existing retrieval algorithms return items directly relevant to a query but do not return results to help users read about the concepts supporting their query. This is because the dependency structure of concepts that must be understood before reading material pertaining to a given query is never considered. Here we formulate an information-theoretic view of concept dependency and present methods to construct a “concept graph” automatically from a text corpus. We perform the first human evaluation of concept dependency edges (to be published as open data), and the results verify the feasibility of automatic approaches for inferring concepts and their dependency relations. This result can support search capabilities that may be tuned to help users learn a subject rather than retrieve documents based on a single query.
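
As an illustration only, and not the method from the paper, a concept graph can be pictured as a directed graph in which each concept points to the concepts it depends on. The sketch below assumes a small hand-written graph. The concept names, the edges, and the `prerequisites_in_order` helper are all hypothetical, and the paper infers such edges automatically from a corpus rather than listing them by hand. The sketch shows how a dependency structure could be used to order concepts so that prerequisites come before the concept a user asked about.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical concept graph: each concept maps to the set of concepts
# it depends on. These names and edges are invented for illustration.
CONCEPT_GRAPH = {
    "topic models": {"probability", "bag of words"},
    "bag of words": {"tokenization"},
    "probability": set(),
    "tokenization": set(),
    "parsing": {"tokenization"},
}


def prerequisites_in_order(graph, target):
    """Return `target` and its transitive prerequisites, prerequisites first."""
    needed = {}
    stack = [target]
    # Walk the dependency edges to collect only concepts relevant to target.
    while stack:
        concept = stack.pop()
        if concept in needed:
            continue
        deps = set(graph.get(concept, set()))
        needed[concept] = deps
        stack.extend(deps)
    # TopologicalSorter takes {node: predecessors} and yields predecessors first.
    return list(TopologicalSorter(needed).static_order())


reading_order = prerequisites_in_order(CONCEPT_GRAPH, "topic models")
print(reading_order)
```

Here "parsing" is left out of the result because "topic models" does not depend on it, while "tokenization" is placed before "bag of words". The relative order of concepts that do not depend on each other is not fixed by this sketch.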

Paper

You can download a PDF.

Code and Data

The associated code and data for the paper are available for download here.