Discovering and Organizing Hidden-Web Sources

Juliana Freire (PI)
University of Utah

This project was completed in 2012.

Project Description

The problem of retrieving and integrating information available in online databases and services has received considerable attention in both the research and industrial communities. This interest is driven by the quality of the information and by the growing number of online databases: it is estimated that there are several million of them. Several specialized applications attempt to make this hidden information more easily accessible, including metasearchers, hidden-Web crawlers, database directories, and Web information integration systems. Our goal in this project is to build an infrastructure that automates, to a large extent, the process of discovering and organizing hidden-Web sources. This infrastructure will enable people and applications to more easily find the right databases and, consequently, the hidden information they are seeking on the Web. Another important goal of this project is to enable queries over structured data. Toward that end, we have been working on novel algorithms for Web-scale information integration.

Software and Sites

We have developed a hidden-Web crawler and a focused crawler; both tools are available upon request. We have also created a hidden-Web search engine (http://www.deeppeep.org). The site was hosted at the University of Utah and was taken offline when Professor Freire left the university.

Publications

Funding

This project was funded by NSF Award IIS-0713637.
Juliana Freire
Last modified: Wed Nov 21 21:45:40 EST 2012