An algebraic method for calculating PageRank
DOI:
https://doi.org/10.15587/1729-4061.2018.131275Keywords:
site graph, page ranks, transition matrix, damping coefficient, teleportation matrixAbstract
An algebraic method is proposed for finding PageRank estimates for pages of websites. The amount of calculation in the proposed method does not depend on the value of the damping coefficient, which allows obtaining more accurate estimates of the rankings of PageRank in comparison with analogues. A distinctive feature of the proposed method is a step-by-step performance of calculations simultaneously with the work of the graph traversal algorithm. The comparative analysis of algorithms for traversing graphs has shown that, in contrast to the depth search algorithm, the breadth search algorithm gives a more orderly arranged matrix of transitions, which has the blockwise Hessenberg form. The use of this circumstance makes it possible to reduce significantly the amount of calculations by the proposed method. The resulting equations describing the proposed method have a block structure that allows efficient distribution of the entire volume of operations to parallel computational threads . Proceeding from the fact that the bulk of the calculations can be performed while the graph traversal algorithm is running, the study has determined the conditions under which the proposed method makes it possible to obtain PageRank values faster than the known iterative algorithms. The applicability area of the developed method is, first of all, its use in direct verification of the reliability of posting advertising materials on a relevant web resource; therefore, it is limited to specific Internet sites or segments with no more than 104 or 105 pages.
References
- Marciel, M., Cuevas, R., Banchs, A., González, R., Traverso, S., Ahmed, M., Azcorra, A. (2016). Understanding the Detection of View Fraud in Video Content Portals. Proceedings of the 25th International Conference on World Wide Web – WWW ’16. doi: 10.1145/2872427.2882980
- Alternative Ad Networks to Open Up New Channels of Growth in 2018. Available at: https://www.singlegrain.com/blog-posts/pay-per-click/44-ad-networks-will-help-open-new-channels-growth/
- Agency Trading Desks. Available at: https://adexchanger.com/Agency_Trading_White_Paper.pdf
- What’s the Difference Between Using a DSP and an Agency Trading Desk? Available at: http://weevermedia.com/uncategorised/whats-difference-using-dsp-agency-trading-desk/
- Thompson, J. What Is a DSP, SSP and Ad exchange, and how do they fit together? Available at: https://www.bannerconnect.net/what-is-a-dsp-ssp-and-ad-exchange/
- Scott, S. The $8.2 Billion Adtech Fraud Problem That Everyone Is Ignoring. TechCrunch. Available at: https://techcrunch.com/2016/01/06/the-8-2-billion-adtech-fraud-problem-that-everyone-is-ignoring/
- Reichow, J. (2017). TUNE Fraud Series: Types of advertising fraud. TUNE. Available at: https://www.tune.com/blog/types-of-advertising-fraud/
- Ricci, M. (2017). Why it’s Time to Take a Stand Against Ad Fraud. MartechAdvisor. Available at: https://www.martechadvisor.com/articles/ads/why-its-time-to-take-a-stand-against-ad-fraud/
- Leskovec, J., Rajaraman, A., Ullman, J. D. (2014). Mining of Massive Datasets. Cambridge University Press. doi: 10.1017/cbo9781139924801
- Cormen, T. H., Leiserson, C. E., Rivest, R. L., Stein, C. (2009). Introduction to Algorithms. MIT press and McGraw-Hill, 1313.
- Brin, S., Page, L. (1998). The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, 30 (1-7), 107–117. doi: 10.1016/s0169-7552(98)00110-x
- Polyak, B. T., Tremba, A. A. (2012). Regularization-based solution of the PageRank problem for large matrices. Automation and Remote Control, 73 (11), 1877–1894. doi: 10.1134/s0005117912110094
- Del Corso, G. M., Gullí, A., Romani, F. (2005). Fast PageRank Computation via a Sparse Linear System. Internet Mathematics, 2 (3), 251–273. doi: 10.1080/15427951.2005.10129108
- PageRank. Wikipedia. Available at: https://en.wikipedia.org/wiki/PageRank
- Harris, L., Papanikolaou, I., Shi, J., Strott, D., Tan, Y., Zhong, C., Changhui, T. (2015). Eigensystems for Large Matrices. AMSC460.
- Sargolzaei, P., Soleymani, F. (2010). PageRank Problem, Survey And Future Research Directions. International Mathematical Forum, 5 (19), 937–956.
- Wu, G., Wei, Y. (2007). A Power–Arnoldi algorithm for computing PageRank. Numerical Linear Algebra with Applications, 14 (7), 521–546. doi: 10.1002/nla.531
- Kamvar, S. D., Haveliwala, T. H., Golub, G. H. (2003). Extrapolation methods for accelerating PageRank computations. Technique Report SCCM 03-02.2003. Stanford University.
- Wu, G., Wei, Y. (2010). An Arnoldi-Extrapolation algorithm for computing PageRank. Journal of Computational and Applied Mathematics, 234 (11), 3196–3212. doi: 10.1016/j.cam.2010.02.009
- Kamvar, S., Haveliwala, T., Golub, G. (2004). Adaptive methods for the computation of PageRank. Linear Algebra and Its Applications, 386, 51–65. doi: 10.1016/j.laa.2003.12.008
- Yin, J.-F., Yin, G.-J., Ng, M. (2011). On adaptively accelerated Arnoldi method for computing PageRank. Numerical Linear Algebra with Applications, 19 (1), 73–85. doi: 10.1002/nla.789
- Kim, M.-S. (2012). Towards Exploiting GPUs for Fast PageRank Computation of Large-Scale Networks. Proceeding of the third International Conference on Emerging Databases.
- Rungsawang, A., Manaskasemsak, B. (2012). Fast PageRank Computation on a GPU Cluster. 2012 20th Euromicro International Conference on Parallel, Distributed and Network-Based Processing. doi: 10.1109/pdp.2012.78
- Liu, C., Li, Y. (2015). A Parallel PageRank Algorithm with Power Iteration Acceleration. International Journal of Grid and Distributed Computing, 8 (2), 273–284. doi: 10.14257/ijgdc.2015.8.2.24
- Srivastava, A. K., Srivastava, M., Garg, R., Mishra, P. K. (2017). Parallel PageRank Algorithms: A Survey. International Journal on Recent and Innovation Trends in Computing and Communication, 5 (5), 470–473.
- Choudhari, P., Baikampadi, E., Patil, P., Gadekar, S. (2015). Parallel and Improved PageRank Algorithm for GPU-CPU Collaborative Environment. International Journal of Computer Science and Information Technologies, 6 (3), 2003–2005.
- Yang, P., Zhou, L. (2016). Research on PageRank Algorithm parallel computing Based on Hadoop. Proceedings of the 2016 4th International Conference on Mechanical Materials and Manufacturing Engineering. doi: 10.2991/mmme-16.2016.40
- Gleich, D., Zhukov, L., Berkhin, P. Fast Parallel PageRank: A Linear System Approach. Available at: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.592.64&rep=rep1&type=pdf
- Gantmakher, F. (2004). Theory of matrices. Moscow: Nauka, 581.
- Wright, G. Probability, linear algebra, and numerical analysis: the mathematics behind Google'sTM PageRankTM. Available at: http://math.boisestate.edu/~wright/courses/m297/google_talk.pdf
Downloads
Published
How to Cite
Issue
Section
License
Copyright (c) 2018 Vladislav Vlasyuk, Oleg Galchonkov, Alexander Nevrev
This work is licensed under a Creative Commons Attribution 4.0 International License.
The consolidation and conditions for the transfer of copyright (identification of authorship) is carried out in the License Agreement. In particular, the authors reserve the right to the authorship of their manuscript and transfer the first publication of this work to the journal under the terms of the Creative Commons CC BY license. At the same time, they have the right to conclude on their own additional agreements concerning the non-exclusive distribution of the work in the form in which it was published by this journal, but provided that the link to the first publication of the article in this journal is preserved.
A license agreement is a document in which the author warrants that he/she owns all copyright for the work (manuscript, article, etc.).
The authors, signing the License Agreement with TECHNOLOGY CENTER PC, have all rights to the further use of their work, provided that they link to our edition in which the work was published.
According to the terms of the License Agreement, the Publisher TECHNOLOGY CENTER PC does not take away your copyrights and receives permission from the authors to use and dissemination of the publication through the world's scientific resources (own electronic resources, scientometric databases, repositories, libraries, etc.).
In the absence of a signed License Agreement or in the absence of this agreement of identifiers allowing to identify the identity of the author, the editors have no right to work with the manuscript.
It is important to remember that there is another type of agreement between authors and publishers – when copyright is transferred from the authors to the publisher. In this case, the authors lose ownership of their work and may not use it in any way.