Importance ordering in linked networks with link weights via reachability
In multiply linked networks (like the world wide web), a common task is to give a measure of importance to the entries, and to compare the relative importance of the elements of a given subset of the network. The famous Google algorithm (in its simplest form) solves this problem by letting a 'random walker' perform a journey on the network, where each step follows a random link from the place where the walker currently is. If the links of the network have weights, these can also be used to determine the probabilities of the next step. A site x is then considered more important than a site y if the walker visits x more often than y in the limit where the journey has very many steps.
Based on the publication [Betz/Le Roux] (see below), a new method for importance ordering can be derived. If comparing the importance of two sites x and y, instead of measuring the frequency of visits to x and y, the idea is to measure the reachability of x from y, i.e. the probability that a random walker starting from x will visit y before it returns to x. If the reachability of y from x is higher than the reverse reachability (x from y), then y is considered more important. It is shown in the cited reference that in the limit of long run times, a quantitative version of the above idea gives the same importance ordering as the Google algorithm. However, while the Google algorithm necessarily computes the importance of all elements of the network at the same time, the reachability algorithm only computes the relative importance of a pair of elements. This also means that when we are only interested in that pair (or a relatively small subset of the network), the algorithm can run much faster and offer the user the flexibility to implement their own link weights. This opens the way to possible applications, and the corresponding set of ideas is the subject of U.S. Patent Appl. No. 15/641,536.
The aim of the project is to test the practical applicability of the method in various situations. Practical problems include the acceleration of the escape of the walk from undesirable regions of the network, applications to realistic systems, parallelization, testing real world scaling properties of the algorithm with the system size, and quality control. The student will conduct numerical experiments, do systematic measurements of run time and result quality, and explore theoretical and practical ideas for improving the algorithm.
Pre-requisites or requirements for the project
good programming skill (preferably C),
good grasp of mathematics or theoretical cumputer science
Recommended literature and preparation
Betz/Le Roux in: Stochastic Processes and Their Applications, 3499-3526 (2017);
Langville/Meyer: Googles Page rank and Beyond, Princeton University Press (2006).
|Project available until||end of Dec 2020|