Google’s Gary Illyes said on the latest episode of the Search Off The Record podcast that in 2022 Google is looking to make crawling more efficient and environmentally friendly. And while Google is experimenting with IndexNow as one possible way to do that, Gary said that if Google does use it, it probably won’t be “in the form that people expect.”
The discussion starts at about 2:30 into the podcast.
He said one way to do this is for Google to reexamine its refresh crawls and find ways to recrawl some pages and URLs less often. Discovery crawls are for new URLs that Google has not yet indexed, while refresh crawls revisit URLs Google has already crawled to see whether the page has been updated or has new signals. So Google might crawl older URLs less often, or at least more efficiently.
Gary said “how can we reduce even more Googlebots and other crawlers, Google crawlers’ footprint on the Internet, on the environment. And then, if you think about it, one thing that we do and we might not need to do that much is refresh crawls. Which means that once we discovered a document, a URL, then we go, we crawl it, and then, eventually, we are going to go back and revisit that URL. That is a refresh crawl. And then every single time we go back to that one URL, that will always be a refresh crawl. Now, how often do we need to go back to that URL?”
Gary added, “you could say that, for example, if you take the CNN or Wall Street Journal homepage, which is changing every five seconds, then we do need to go back very often. But then the About page of either of these news outlets, they don’t change too often. So you don’t have to go back there that much. And often, we can’t estimate this well, and we definitely have room for improvement there on refresh crawls, because sometimes, it just seems wasteful that we are hitting the same URL over and over again. Sometimes we are hitting 404 pages, for example, for no good reason or no apparent reason. And all these things are basically stuff that we could improve on and then reduce our footprint even more.”
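To make the refresh-crawl idea concrete, here is a minimal sketch of one common adaptive recrawl heuristic. It is not Google’s method; Gary did not describe how Google actually schedules recrawls. The names UrlState and refresh, the 24-hour starting interval, and the 15-minute and 30-day limits are all hypothetical. The idea is to halve the revisit interval when a page’s content changed since the last crawl, double it when nothing changed, and back off faster on 404s, so a busy homepage gets revisited often while a static About page drifts toward the maximum interval.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

# Hypothetical bounds, chosen only for illustration.
MIN_INTERVAL_HOURS = 0.25        # never revisit more often than every 15 minutes
MAX_INTERVAL_HOURS = 24.0 * 30   # always revisit at least every 30 days


@dataclass
class UrlState:
    url: str
    interval_hours: float = 24.0      # initial guess: revisit once a day
    last_hash: Optional[str] = None   # fingerprint of the last fetched content


def refresh(state: UrlState, status: int, content: bytes) -> float:
    """Record one refresh crawl and return the next revisit interval in hours."""
    if status == 404:
        # A missing page is unlikely to come back soon, so back off aggressively.
        state.interval_hours = min(MAX_INTERVAL_HOURS, state.interval_hours * 4)
        return state.interval_hours

    digest = hashlib.sha256(content).hexdigest()
    if state.last_hash is not None:
        if digest != state.last_hash:
            # Content changed since the last visit: come back sooner.
            state.interval_hours = max(MIN_INTERVAL_HOURS, state.interval_hours / 2)
        else:
            # Nothing changed, so this visit was wasted: wait longer next time.
            state.interval_hours = min(MAX_INTERVAL_HOURS, state.interval_hours * 2)
    state.last_hash = digest
    return state.interval_hours
```

For example, after five crawls of a homepage whose top story changes every time, its interval falls from 24 hours to 1.5 hours, while five crawls of an unchanged About page push its interval up to 384 hours. The first crawl only records a fingerprint, because there is nothing yet to compare against.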
Gary then commented on IndexNow, saying Google is experimenting with it, but that if Google does adopt it, it probably won’t be in the form people expect. He said “but IndexNow could be something that might be useful, and we are running some experiments to see if that’s the case. Probably, it’s not going to be in the form that people expect, but we’ll see. But I can definitely see that it might prove useful in some cases at least.”
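IndexNow is an open protocol, launched by Microsoft Bing and Yandex, that lets a site notify participating search engines when URLs are added or changed instead of waiting for a refresh crawl. As a sketch of what a submission looks like, the function below builds (but does not send) a JSON POST to the shared api.indexnow.org endpoint, assuming the site hosts its key file at the root as https://<host>/<key>.txt. The helper name build_indexnow_request and the example key are hypothetical, and nothing here implies Google accepts these submissions; per Gary, Google is only running experiments.

```python
import json
import re
import urllib.request
from urllib.parse import urlsplit

INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"
# IndexNow keys are 8 to 128 characters of letters, digits, and dashes.
KEY_PATTERN = re.compile(r"^[A-Za-z0-9-]{8,128}$")


def build_indexnow_request(host, key, urls, endpoint=INDEXNOW_ENDPOINT):
    """Build (without sending) an IndexNow bulk submission for URLs on `host`."""
    if not KEY_PATTERN.match(key):
        raise ValueError("key must be 8-128 letters, digits, or dashes")
    for url in urls:
        # Every submitted URL must belong to the host that owns the key.
        if urlsplit(url).hostname != host:
            raise ValueError(f"{url} is not on {host}")
    payload = {
        "host": host,
        "key": key,
        # Assumes the key file sits at the site root.
        "keyLocation": f"https://{host}/{key}.txt",
        "urlList": list(urls),
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )
```

Passing the returned request to urllib.request.urlopen would actually send it. Building the request separately keeps the validation testable without network access.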
They also discussed XML sitemaps and their problems as a signal for discovery and crawling. Many sites change the lastmod date in their XML sitemaps even when the URL has not changed, so Gary said “we are just not going to use it,” because people don’t generate these sitemaps correctly and accurately.
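The lastmod problem Gary describes typically comes from sitemap generators that stamp every URL with the current date on each build. A minimal sketch of the opposite approach, assuming you can get each page’s content at build time, is to store a content hash per URL and only move lastmod forward when the hash changes. The function names and the in-memory store are hypothetical; a real site would persist the store between builds.

```python
import hashlib
import xml.etree.ElementTree as ET
from datetime import date
from typing import Dict, Tuple

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Maps URL -> (content hash, lastmod date). Persist this between builds.
Store = Dict[str, Tuple[str, date]]


def update_lastmod(store: Store, url: str, content: str, today: date) -> date:
    """Bump lastmod only when the page content actually changed."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    previous = store.get(url)
    if previous is None or previous[0] != digest:
        store[url] = (digest, today)
    return store[url][1]


def build_sitemap(store: Store) -> str:
    """Serialize the store as a sitemaps.org urlset with <loc> and <lastmod>."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in sorted(store):
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        # W3C Datetime date form, e.g. 2022-01-03.
        ET.SubElement(entry, "lastmod").text = store[url][1].isoformat()
    return ET.tostring(urlset, encoding="unicode")
```

One design choice worth considering: hashing only the main content, rather than the full HTML, avoids treating a rotating sidebar or a rendered timestamp as a real change.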
Crawling and Ranking
There is also a misconception that crawling more leads to more or higher rankings, and Gary and John said that is not true. John asked, “So I guess that’s kind of also a misconception that people have in that they think if a page gets crawled more, it’ll get ranked more. Is that correct that that’s a misconception, or is that actually true?” Gary answered, “it’s a misconception.”
Google might be more transparent about how crawling works in 2022, so be ready for that.
Forum discussion at Twitter.