Becoming popular on web is hard, staying popular harder: study

Only slightly more than half of the most popular web pages right now would have been on a similar list a year ago, according to a Canadian researcher, a finding that underscores the constantly changing structure of the World Wide Web.

On the surface, the World Wide Web appears to be a static structure, with establishment web sites like Google, Yahoo and Wikipedia consistently among the most visited. But take a closer look at individual web pages instead of sites, and measure each page's popularity by the number of other pages that link to it, and a different picture appears: as in high school or high fashion, popularity on the web is a fickle thing, according to a Canadian researcher.

Only slightly more than half of the most popular web pages right now would have been on a similar list a year ago, according to University of Regina assistant professor Nima Sarshar, a finding that underscores the constantly changing structure of the web.

Sarshar, working with two colleagues from the University of California, Los Angeles, looked at how talent, experience and the deletion of web pages affected the overall popularity of a page, where a page's popularity is measured not in page views, but by the number of links it attracts.

Sarshar and his UCLA colleagues, lead author Vwani Roychowdhury and Joseph Kong, published their findings in the recent issue of the Proceedings of the National Academy of Sciences of the U.S.A. They expected to find that experience plays an important role, since the longer a web page has been around, the more links it will accumulate.

And in a case of the rich getting richer, the more links a page accumulates, the more likely it will pop up in a search engine like Google.

But what role "talent" played was less clear, so to measure it they tracked which pages had accumulated 1,000 links during a 12-month period.

A web page that started the survey of 22 million web pages with more than 1,000 links was defined as an "experienced" page, while one that finished the 12-month survey with more than 1,000 links was defined as a "winner."

What they found was that even with search engines more likely to send readers to established pages, 48 per cent of the winners at the end of the 12 months were not experienced.

The result is surprising, they say, since it is extremely rare for a less-established web page to receive 1,000 links.
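
The definitions above can be made concrete with a minimal Python sketch. It is an illustration only, not the researchers' code: the function name `classify_pages`, the page names and the link counts are all invented for the example.

```python
THRESHOLD = 1000  # link count used in the study's definitions


def classify_pages(start_links, end_links, threshold=THRESHOLD):
    """Split pages into 'experienced' and 'winners' by link count.

    start_links / end_links map a page name to its number of incoming
    links at the start and end of the survey period.
    Returns (experienced, winners, share of winners not experienced).
    """
    # "Experienced": more than `threshold` links at the start.
    experienced = {p for p, n in start_links.items() if n > threshold}
    # "Winner": more than `threshold` links at the end.
    winners = {p for p, n in end_links.items() if n > threshold}
    # Winners that were not experienced at the start.
    newcomers = winners - experienced
    share = len(newcomers) / len(winners) if winners else 0.0
    return experienced, winners, share


# Invented toy data: four pages, link counts at start and end.
start = {"a": 1500, "b": 200, "c": 1200, "d": 50}
end = {"a": 1800, "b": 1100, "c": 900, "d": 1300}

experienced, winners, share = classify_pages(start, end)
print(sorted(experienced), sorted(winners), round(share, 2))
```

In this toy data, two of the three winners started below the threshold. In the study itself, the corresponding share was 48 per cent.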

Constant growth

Part of the reason for this, they say, is that the web is constantly growing, with the number of new pages growing at a rate that could be as high as 35 per cent annually. At the same time, they say, pages are constantly being deleted at a rate of about 10 per cent per month. And every time a page is deleted, the pages it pointed to die a little in popularity, too.

The net result is that for every new page created, 0.77 are deleted.
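
As a rough illustration of what that ratio implies, the short Python sketch below works through the arithmetic. The function name `net_change` and the figure of one million new pages are assumptions chosen for the example; only the 0.77 ratio comes from the study.

```python
DELETION_RATIO = 0.77  # pages deleted per new page created, as reported


def net_change(pages_created, deletion_ratio=DELETION_RATIO):
    """Return (pages_deleted, net_pages_added) for a number of new pages."""
    pages_deleted = pages_created * deletion_ratio
    return pages_deleted, pages_created - pages_deleted


# Hypothetical input: one million newly created pages.
deleted, net = net_change(1_000_000)
print(f"deleted: {deleted:,.0f}, net added: {net:,.0f}")
```

Under that assumption, about 770,000 pages disappear for every million created, so the web gains only about 230,000 pages while most of the million are effectively replacements.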

"Web page structure evolves in a matter of weeks," said Sarshar. "It really tells us that within a year or two, the web is a totally new web."

Translating these numbers to the actual number of web pages is a difficult task, since no one can say for certain how many pages exist on the web at any given moment.

Sarshar, Roychowdhury and Kong put the total number at over 12 billion, but acknowledge that is just a best estimate.

In July, Google software engineers, posting on the company blog, said their systems that process links found one trillion unique URLs on the web at once. But they admit addresses are not equivalent to web pages, and suggest that strictly speaking, the number of web pages is infinite, since a web calendar could have a "next day" link that could send the user to a new "page" theoretically forever.

Regardless of which estimate one looks at, the 22 million pages the authors looked at represent a tiny fraction of the total size of the web, Sarshar acknowledges. He said a longer-term study that casts a wider net would give greater insight into the changing shape of the web. But he and his colleagues see parallels between the way web pages connect to each other and existing societal models.

"The balancing act between experience and talent on the web allows newly introduced pages with novel and interesting content to grow quickly and surpass older pages," the authors write.

"In this regard, it is much like what we observe in high-mobility and meritocratic societies: People with entitlement continue to have access to the best resources, but there is just enough screening for fitness that allows for talented winners to emerge and join the ranks of the leaders."