Research

Mining Script-like Structures from Large-scale Text Sources

Knowing the sequences of events in situations such as eating at a restaurant is an example of commonsense knowledge needed for a broad range of cognitive tasks (e.g., language understanding). This work outlines an approach to mine information about sequential, everyday situations in a topic-driven fashion to produce declarative, script-like representations (cf. Schank's scripts).

Given a topic such as eating at a restaurant, we produce graphs of temporally ordered events involved with the activity referenced by the topic.
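As a rough illustration of what a graph of temporally ordered events might look like, the sketch below stores pairwise "before" relations and derives one consistent ordering with a topological sort. The event names, the `BEFORE` list, and both helper functions are hypothetical and chosen only for this example; the mining step that would produce such relations from text is not shown, and this is not necessarily the representation used in the project.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical "before" relations for the topic "eating at a restaurant".
# Each pair (a, b) means event a is assumed to precede event b.
BEFORE = [
    ("enter restaurant", "be seated"),
    ("be seated", "read menu"),
    ("read menu", "order food"),
    ("order food", "eat food"),
    ("eat food", "pay bill"),
    ("order food", "pay bill"),
    ("pay bill", "leave restaurant"),
]


def build_script_graph(pairs):
    """Map each event to the set of events that must precede it."""
    graph = {}
    for earlier, later in pairs:
        graph.setdefault(earlier, set())
        graph.setdefault(later, set()).add(earlier)
    return graph


def linearize(graph):
    """Return one event ordering consistent with every 'before' edge."""
    return list(TopologicalSorter(graph).static_order())


script = build_script_graph(BEFORE)
ordering = linearize(script)
print(" -> ".join(ordering))
```

A graph rather than a flat list is used here because some events may be only partially ordered relative to one another; a topological sort then yields one of possibly several valid linearizations.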

A Probabilistic Framework for Text to Sketch

Text to Sketch (T2S) is a technology that attempts to produce visual "sketches" from textual accounts. A typical scenario involves a physical description of a scene, perhaps generated by a person lost in an unfamiliar location, which the T2S system uses to generate a sketch that may be registered against a map for accurate geolocation. Our work extends the T2S paradigm by adding a temporal dimension; in this case a sketch is a series of waypoints and routes on a map. Language processing introduces considerable uncertainty, as do the ambiguity and vagueness of the terms used for spatial description; our work provides a probabilistic framework that can produce useful sketches in the face of such uncertainty.

Extracting Hypotheses and Results from the Medical Literature

Most common and complex diseases, such as diabetes and cancer, are influenced at some level by variation in the genome. To truly address the goal of translational research, genetic variation must be taken into consideration. Research done in public health genetics, specifically in the area of single nucleotide polymorphisms (SNPs), is the first step toward understanding human genetic variation. In addition, novel methods are needed to represent and to conduct text mining over textual genotypic data sources. In the context of a genetic study, we developed and evaluated a translational-informatics method that supports both machine-learning text mining and automated inference for identifying key concepts (e.g., hypotheses and results). While there have been biological text mining systems focusing on named-entity recognition, tools for genetic studies that focus on hypotheses and results have been relatively rare.

Hypothesis: [...] we have examined how voluntary exercise can bypass the genetic predisposition for overeating, obesity, and increased serum insulin and leptin levels associated with type 2 diabetes found in the melanocortin-4 receptor (MC4R) knockout (KO) mouse model (1).

Result: Statistically significant differences in fat mass between the MC4R KO sedentary housed mice vs. the exercising MC4R KO mice emerged after wk 2 of the experiment, correlating with total body weight (r = 0.9, P < 0.001).