Post by amirmukaddas on Mar 11, 2024 7:41:56 GMT
I would like to say something important right away: if a page doesn't affect search, you can still keep it on the site. However, if the irrelevant pages greatly outnumber the ones you want visible on Google, you can mark them with a noindex meta robots tag or with a link tag carrying a canonical attribute. But these measures will not reduce the consumption of crawling resources; they will only keep those pages out of search results. You can block the crawling of entire areas of the site using the robots.txt file, or alternatively by sending an X-Robots-Tag header from the .htaccess file (which, strictly speaking, controls indexing rather than crawling). But once again you will not have solved the problem in the best way, because if the resources to be blocked are the most crawled paths, and those paths are distributed throughout the site, telling Google to block them is an anomaly.
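For reference, here is a minimal sketch of the page-level directives mentioned above. The domain and paths are placeholders, not taken from any real case:

```html
<!-- In the <head> of a page you want kept out of Google's index -->
<meta name="robots" content="noindex">

<!-- Or, for a near-duplicate page, point at the preferred version -->
<link rel="canonical" href="https://www.example.com/preferred-page/">
```

Remember: Google still has to crawl a page to see either of these tags, which is exactly why they do not save crawl resources.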
To be clear, the robots.txt file is not designed to tell the bot, on each page, what to follow and what not to follow. You solve this problem upstream, by not putting inappropriate paths on the page in the first place, or alternatively by serving them as dynamic loads rather than explicit URLs. We'll dig into this aspect in a bit. So if you have this problem, forget about fixing it with robots.txt.

Areas to be cut on the company website

A case I discussed a few days ago involves a company that sells products. There are around a hundred pieces of content to be made visible, but on the same website there is an area dedicated to press reviews, with thousands of articles (copied from elsewhere) dating back to 2009.
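For completeness, this is what blocking a whole area looks like; the path is a placeholder. Note that robots.txt works by path prefix, not per link on a page, which is the limitation described above:

```
# robots.txt at the site root — blocks crawling of one directory.
# It cannot distinguish individual links within a page.
User-agent: *
Disallow: /press-reviews/
```

If the unwanted paths are scattered across every template on the site rather than grouped under one prefix, no Disallow rule will cleanly separate them, and that is the real anomaly to fix.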
These articles have no relevance to search, they are resources recovered from elsewhere, and they vastly outnumber the pages to be ranked. In this case the thing to do would be to take an axe and remove everything, but if that is impossible because of political decisions already made, then rather than putting everything in noindex or blocking the crawl with robots.txt, a solution that seemed ideal to me is to move this gigantic mass of pages to a subdomain dedicated to press reviews, as independent as possible from the main domain in terms of internal links. That way Google's spiders will only follow the relevant pages on the main site, and at the same time we will have avoided deleting the historical archive. You can apply this solution to areas of your site that are not relevant to the business model, especially when they contain a large number of pages.
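If the archive is moved to a subdomain, the old URLs can be redirected there. A sketch for an Apache .htaccess, assuming mod_rewrite is enabled and using placeholder names for the directory and subdomain:

```apacheconf
# .htaccess on www.example.com — send old press-review URLs
# to the dedicated subdomain (names are examples, adjust to your site)
RewriteEngine On
RewriteRule ^press-reviews/(.*)$ https://press.example.com/$1 [R=301,L]
```

The 301 preserves any bookmarks or external links to the old archive, while removing the internal links to it from the main site is what actually keeps Googlebot focused on the relevant pages.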