Accuracy and Sampling Trade-offs for Inferring Internet Router Graph

Accuracy and Sampling Trade-offs for Inferring Internet Router Graph Çigdem Gündüz Bülent Yener There has been an increasing interest on construction of router-level Internet graphs, using traceroute like measurement primitives. Furthermore, many metrics reflecting the properties of the Internet graph have been defined based on these measurements. However, many important questions remain: How much measurements (sampling) one needs to conduct over the Internet to get a good estimation of its properties? How accurately can these properties be computed? What is the impact of sampling techniques on the computation of these properties? This paper shows that most of the metrics are evasive: their exact values cannot be determined without visiting all links in the Internet graph. This indicates a fundamental difficulty: due to its size and dynamics, a complete Internet graph cannot be obtained, thus the metrics are computed under incomplete information. The contribution of this paper is three-fold: First it provides a meta-metric called ([gamma], [sigma])-evasiveness to determine if a metric can be estimated with at least 1-[sigma] accuracy by sampling percentage of data. Second, it demonstrates an important relationship between the kurtosis of measured data and the [sigma] value of sampling. Third, it provides a novel technique to compare different metrics and data collection methods by using the kurtosis of data and their [sigma] values. Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY cs-03-09

Accuracy and Sampling Trade-offs for Inferring Internet Router Graph

Çigdem Gündüz

Bülent Yener

There has been an increasing interest on construction of router-level Internet graphs, using traceroute like measurement primitives. Furthermore, many metrics reflecting the properties of the Internet graph have been defined based on these measurements. However, many important questions remain: How much measurements (sampling) one needs to conduct over the Internet to get a good estimation of its properties? How accurately can these properties be computed? What is the impact of sampling techniques on the computation of these properties? This paper shows that most of the metrics are evasive: their exact values cannot be determined without visiting all links in the Internet graph. This indicates a fundamental difficulty: due to its size and dynamics, a complete Internet graph cannot be obtained, thus the metrics are computed under incomplete information. The contribution of this paper is three-fold: First it provides a meta-metric called ([gamma], [sigma])-evasiveness to determine if a metric can be estimated with at least 1-[sigma] accuracy by sampling percentage of data. Second, it demonstrates an important relationship between the kurtosis of measured data and the [sigma] value of sampling. Third, it provides a novel technique to compare different metrics and data collection methods by using the kurtosis of data and their [sigma] values.

Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY

cs-03-09