By using SIAM Journals Online you agree to abide by the
Terms and Conditions of Use.

©  SIAM

 

SIAM Review

Previous Article
Education
When you think of searching for information on the Web, you probably think of search engines such as Google. But do you think of eigenvectors? You should! Faithful readers of the Education section...
Next Article
Book Reviews
This issue's Book Reviews section begins the new year on a positive note. Reviewers generally liked the broad variety of books under consideration, in contrast to the recent situation when certain bo...

You are logged in to this journal.

A Survey of Eigenvector Methods for Web Information Retrieval

SIAM Rev. Volume 47, Issue 1, pp. 135-161 (2005)

Issue Date: 2005
FULL TEXT OPTIONS   (FREE)
Download PDF (442 kB) View Cart

Web information retrieval is significantly more challenging than traditional well-controlled, small document collection information retrieval. One main difference between traditional information retrieval and Web information retrieval is the Web's hyperlink structure. This structure has been exploited by several of today's leading Web search engines, particularly Google and Teoma. In this survey paper, we focus on Web information retrieval methods that use eigenvector computations, presenting the three popular methods of HITS, PageRank, and SALSA.

©2005 Society for Industrial and Applied Mathematics

KEYWORDS and AMS

Keywords
AMS Subject Classifications
65F15, 65F10, 68P20, 15A18, 15A51, 65C40

PUBLICATION DATA

ISSN:
0036-1445 (print)   1095-7200 (online)
Publisher:
AIP is a member of CrossRef SIAM

REFERENCES (63)

  1. A. Arasu, J. Novak, A. Tomkins, and J. Tomlin, PageRank computation and the structure of the Web: Experiments and algorithms, in Proceedings of the Eleventh International World Wide Web Conference, ACM, New York, 2002.
  2. A.-L. Barabasi, Linked: The New Science of Networks, Perseus, New York, 2003.
  3. Michael Berry, Zlatko Drmač, Elizabeth Jessup, Matrices, vector spaces, and information retrieval, SIAM Rev., 41 (1999), 335–362 [MathRev]
  4. M. W. Berry, P. Wang, and Y. Yang, Mining longitudinal Web queries: Trends and patterns, J. Amer. Soc. Inform. Sci. Tech., 54 (2003), pp. 743–758.
  5. K. Bharat and M. R. Henzinger, Improved algorithms for topic distillation in hyperlinked environments, in Proceedings of the 21st International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 1998, pp. 104–111.
  6. K. Bharat, F. Maghoul, and R. Stata, The term vector database: Fast access to indexing terms for webpages, Computer Networks, 33 (2000), pp. 247–255.
  7. K. Bharat and G. A. Mihaila, When experts agree: Using non-affiliated experts to rank popular topics, ACM Trans. Inform. Systems, 20 (2002), pp. 47–58.
  8. M. Bianchini, M. Gori, and F. Scarselli, PageRank: A circuital analysis, in Proceedings of the Eleventh International World Wide Web Conference, ACM, New York, 2002.
  9. M. Bianchini, M. Gori, and F. Scarselli, Inside PageRank, ACM Trans. Internet Tech., 5 (2005), to appear.
  10. N. Blachman, E. Fredricksen, and F. Schneider, How to Do Everything with Google, McGraw-Hill, New York, 2003.
  11. P. Boldi, B. Codenotti, M. Santini, and S. Vigna, Structural properties of the African web, in Proceedings of the Eleventh International World Wide Web Conference, ACM, New York, 2002.
  12. A. Borodin, G. O. Roberts, J. S. Rosenthal, and P. Tsaparas, Finding authorities and hubs from link structures on the World Wide Web, in Proceedings of the Eleventh International World Wide Web Conference, ACM, New York, 2002, pp. 415–429.
  13. S. Brin and L. Page, The anatomy of a large-scale hypertextual Web search engine, Computer Networks and ISDN Systems, 33 (1998), pp. 107–117.
  14. S. Brin, L. Page, R. Motwami, and T. Winograd, The PageRank Citation Ranking: Bringing Order to the Web, Technical Report 1999-0120, Computer Science Department, Stanford University, Stanford, CA, 1999.
  15. A. Broder, R. Kumar, and M. Maghoul, Graph structure in the Web, in Proceedings of the Ninth International World Wide Web Conference, ACM, New York, 2000, pp. 309–320.
  16. S. Chien, C. Dwork, R. Kumar, and D. Sivakumar, Towards exploiting link evolution, in Proceedings of the Workshop on Algorithms and Models for the Web Graph, 2001.
  17. H. Choi and D. B. Szyld, Application of threshold partitioning of sparse matrices to Markov chains, in Proceedings of the IEEE International Computer Performance and Dependability Symposium IPDS'96, 1996, pp. 158–165.
  18. D. Cohn and H. Chang, Learning to probabilistically identify authoritative documents, in Proceedings of the 17th International Conference on Machine Learning, Stanford, CA, 2000, pp. 167–174.
  19. D. Cohn, H. Chang, and A. McCallum, Creating customized authority lists, in Proceedings of the 17th International Conference on Machine Learning, Stanford, CA, 2000.
  20. D. Cohn and T. Hofmann, The missing link: A probabilistic model of document content and hyperlink connectivity, in Adv. in Neural Inform. Process. Systems 13, 2001.
  21. Tuğrul Dayar, William Stewart, Comparison of partitioning techniques for two-level iterative solvers on large, sparse Markov chains, SIAM J. Sci. Comput., 21 (2000), 1691–1705, Iterative methods for solving systems of algebraic equations (Copper Mountain, CO, 1998) [ISI] [MathRev]
  22. C. Ding, X. He, P. Husbands, H. Zha, and H. Simon, Link Analysis: Hubs and Authorities on the World Wide Web, Technical Report 47847, Lawrence Berkeley National Laboratory, Berkeley, CA, 2001.
  23. C. Ding, X. He, H. Zha, and H. Simon, PageRank, HITS and a unified framework for link analysis, in Proceedings of the 25th ACM SIGIR Conference, Tampere, Finland, 2002, pp. 353–354.
  24. S. T. Dumais, Improving the retrieval of information from external sources, Behavior Res. Methods Instruments and Computers, 23 (1991), pp. 229–236.
  25. M. Faloutsos, P. Faloutsos, and C. Faloutsos, On power-law relationships of the internet topology, in Proceedings of SIGCOMM '99, 1999, pp. 251–262.
  26. A. Farahat, T. Lofaro, J. C. Miller, G. Rae, F. Schaefer, and L. A. Ward, Modifications of Kleinberg's HITS algorithm using matrix exponentiation and web log records, in Proceedings of the ACM SIGIR Conference, 2001, pp. 444–445.
  27. A. Farahat, T. Lofaro, J. C. Miller, G. Rae, and L. A. Ward, Existence and uniqueness of ranking vectors for linear link analysis, SIAM J. Sci. Comput., submitted.
  28. T. H. Haveliwala, Efficient Computation of PageRank, Technical Report 1999-31, Computer Science Department, Stanford University, Stanford, CA, 1999.
  29. T. H. Haveliwala, Topic-sensitive PageRank, in Proceedings of the Eleventh International World Wide Web Conference, ACM, New York, 2002.
  30. T. H. Haveliwala and S. D. Kamvar, The Second Eigenvalue of the Google Matrix, Technical Report 2003-20, Stanford University, Stanford, CA, 2003.
  31. M. R. Henzinger, H. Marais, M. Moricz, and C. Silverstein, Analysis of a Very Large AltaVista Query Log, Technical Report 1998-014, Digital Systems Research Center, 1998.
  32. IBM, The Clever Project, IBM Almaden Research Center; available online at http://www.almaden.ibm.com/cs/k53/clever.html (accessed October 11, 2004).
  33. Elizabeth Jessup, Ilse Ipsen, Improving the accuracy of inverse iteration, SIAM J. Sci. Statist. Comput., 13 (1992), 550–572 [MathRev]
  34. S. D. Kamvar, T. H. Haveliwala, and G. H. Golub, Adaptive Methods for the Computation of PageRank, Technical Report 2003-26, Stanford University, Stanford, CA, 2003.
  35. S. D. Kamvar, T. H. Haveliwala, C. D. Manning, and G. H. Golub, Exploiting the Block Structure of the Web for Computing PageRank, Technical Report 2003-17, Stanford University, Stanford, CA, 2003.
  36. S. D. Kamvar, T. H. Haveliwala, C. D. Manning, and G. H. Golub, Extrapolation methods for accelerating PageRank computations, in Proceedings of the Twelfth International World Wide Web Conference, ACM, New York, 2003.
  37. Jon Kleinberg, Authoritative sources in a hyperlinked environment, J. ACM, 46 (1999), 604–632 [MathRev]
  38. A. Langville and C. Meyer, A reordering for the PageRank problem, SIAM J. Sci. Comput., submitted.
  39. A. N. Langville and C. D. Meyer, Updating the Stationary Vector of an Irreducible Markov Chain, Technical Report crsc02-tr33, Mathematics Department, North Carolina State University, Raleigh, NC, 2002.
  40. A. N. Langville and C. D. Meyer, Deeper inside PageRank, Internet Math. J., to appear.
  41. C. P.-C. Lee, G. H. Golub, and S. A. Zenios, A Fast Two-Stage Algorithm for Computing PageRank and Its Extensions, Technical Report SCCM-2003-15, Scientific Computation and Computational Mathematics, Stanford University, Stanford, CA, 2003.
  42. R. Lempel and S. Moran, The stochastic approach for link-structure analysis (SALSA) and the TKC effect, in Proceedings of the Ninth International World Wide Web Conference, ACM, New York, 2000.
  43. M. Marchiori, The quest for correct information of the web: Hyper search engines, in Proceedings of the Sixth International World Wide Web Conference, ACM, New York, 1997.
  44. A. O. Mendelzon and D. Rafiei, Topic: Measuring Webpage Reputation; available online at http:// www.cs.toronto.edu/db/topic/about.html (accessed September 19, 2002).
  45. A. O. Mendelzon and D. Rafiei, An autonomous page ranking method for metasearch engines, in Proceedings of the Eleventh International World Wide Web Conference, ACM, New York, 2002.
  46. Carl Meyer, Matrix analysis and applied linear algebra, Society for Industrial and Applied Mathematics (SIAM), 2000xii+718, With 1 CD-ROM (Windows, Macintosh and UNIX) and a solutions manual (iv+171 pp.) [MathRev]
  47. V. Migallon, J. Penades, and D. B. Szyld, Experimental study of parallel iterative solutions of Markov chains with block partitions, in Numerical Solutions of Markov Chains (NSMC'99), B. Plateau, W. J. Stewart, and M. Silva, eds., Prensas Universitarias de Zaragoza, 1999, pp. 96–110.
  48. C. Moler, The world's largest matrix computation, Matlab News and Notes, October 2002, pp. 12–13.
  49. A. Y. Ng, A. X. Zheng, and M. I. Jordan, Link analysis, eigenvectors and stability, in Proceedings of the Seventh International Joint Conference on Artificial Intelligence, 2001.
  50. A. Y. Ng, A. X. Zheng, and M. I. Jordan, Stable algorithms for link analysis, in Proceedings of the 24th Annual International ACM SIGIR Conference, 2001.
  51. Gopal Pandurangan, Prabhakar Raghavan, Eli Upfal, Using PageRank to characterize Web structure, Lecture Notes in Comput. Sci., Vol. 2387, Springer, Berlin, 2002, 330–339 [MathRev]
  52. J. U. Quevedo, A. G. Covarrubias, and S. H. Huang, Improving retrieval by querying and examining prestige, in Proceedings of the Eleventh International World Wide Web Conference, ACM, New York, 2002.
  53. D. Rafiei and A. O. Mendelzon, What is this page known for? Computing webpage reputations, in Proceedings of the Ninth International World Wide Web Conference, ACM, New York, 2000, pp. 823–835.
  54. M. Richardson and P. Domingos, The intelligent surfer: Probabilistic combination of link and content information in PageRank, in Adv. in Neural Inform. Process. Systems 14, 2002, pp. 1441–1448.
  55. C. Ridings, PageRank Explained: Everything You've Always Wanted to Know about PageRank; available online at http://www.rankwrite.com/ (accessed May 22, 2002).
  56. M. Sobek, Google Dance: The Index Update of the Google Search Engine, eFactory: Internet-Agentur KG, http://dance.efactory.de/ (accessed October 11, 2004).
  57. William Stewart, Introduction to the numerical solution of Markov chains, Princeton University Press, 1994xx+539 [ZentralblattMath] [MathRev]
  58. W. J. Stewart and W. Wu, Numerical experiments with iteration and aggregation for Markov chains, ORSA J. Comput., 4 (1992), pp. 336–350.
  59. J. A. Tomlin, A new paradigm for ranking pages on the World Wide Web, in Proceedings of the Twelfth International World Wide Web Conference, ACM, New York, 2003.
  60. M. Totty and M. Mangalindan, As Google becomes Web's gatekeeper, sites fight to get in, Wall Street Journal, February 26, 2003.
  61. A. C. Tsoi, G. Morini, F. Scarselli, and M. Hagenbuchner, Adaptive ranking of webpages, in Proceedings of the Twelfth International World Wide Web Conference, ACM, New York, 2003.
  62. D. Zhang and Y. Dong, An efficient algorithm to rank web resources, Computer Networks, 33 (2000), pp. 449–455.
  63. X. Zhang, M. W. Berry, and P. Raghavan, Level search schemes for information filtering and retrieval, Inform. Process. Management, 37 (2001), pp. 313–334. [Inspec]