Characterizing Web linking and usage with hierarchical models
by Wenwu Lou
Ph.D. Computer Science
x, 151 leaves : ill. ; 30 cm
After a decade of rapid growth, the World Wide Web, or the Web for short, has become a central marketplace for information exchange. The rise of the Web has spurred a surge of research that seeks to measure and understand the Web's structural properties, evolutionary dynamics, and navigational patterns.
The primary goal of this work is to advance existing studies of the Web using a graph model that incorporates the Web's inherent hierarchical organization, a characteristic often overlooked in previous studies. We start by proposing a colored graph model for the Web that captures both its explicit hyperlink structure and its latent hierarchical organization at the same time. We then present empirical findings that provide evidence of the influence of the latent structure on the formation of the explicit hyperlink structure of the Web. We further provide theoretical explanations for these findings, using a new class of random graph models in which the evolution of the Web is related to the latent structures intrinsic to the Web itself. Finally, in the context of Web proxy mining, we show that the latent hierarchical structure of the Web also imposes regularities on Web users' navigational behavior, e.g., locality in Web reference, and creates new opportunities for improving the effectiveness and scalability of Web usage-mining algorithms and applications.
The Web is just one example of a wide variety of real-world systems that have latent classes or hierarchical structures embedded in their web-like structure. In this larger context, this research establishes a clean framework for incorporating latent structures in measuring, understanding, and therefore simulating the structural properties and evolutionary dynamics of such systems, including the Internet, citation networks, and social networks.