Before learning how to find orphan pages, you should first understand what an orphan page is.
An orphan page is a web page that no other page on your website links to.
In this blog post, you will learn how to find orphan pages and why fixing them is important for Search Engine Optimization (SEO).
Finding web pages that have no incoming links is tedious, but not impossible. If there are pages on your website that users and search engines such as Google and Bing can’t reach, then this is a problem that you need to fix.
These pages are called “orphan pages”.
Now that you know what orphan pages are, we will discuss why fixing them is important for SEO and how to find every orphan page on your website.
There are two main ways that search engines, like Google, usually find newly created web pages.
The first one: the search engine’s crawler follows a link from another web page.
The second one: the search engine’s crawler finds the URL listed in your XML sitemap.
If you want Google and other search engines to crawl and index all the pages of your website, they need to be able to find all of them.
You may be wondering why orphan pages are an issue for SEO.
The answer is that search engines cannot find orphan pages by following links, so pages with no links pointing to them are likely to go unindexed and not show up in search results.
Even if orphan pages are listed in your website’s XML sitemap, they are still a problem for SEO.
What are the disadvantages of orphan pages?
- Orphan pages are not helpful for either your website’s users or search engine crawlers.
- Visitors can’t reach those pages through the natural structure of your website, so any vital information on those pages is wasted.
- Orphan pages can create a frustrating user experience.
- Without internal links, search engines pass no authority to the page and have no semantic or structural context in which to evaluate it.
- With no way of knowing where the page fits into your website, search engines can find it more difficult to determine which queries the page is relevant to.
How to find and fix them
1–Identify Your Crawlable Pages
To find the orphan pages, you will need a list of all the URLs that can be reached by crawling the links on your website.
There are several crawlers that you can use for this, such as Screaming Frog’s SEO Spider.
Whatever crawler you use, make sure that it is set to crawl only pages that are indexable by search engines.
In other words, the crawler should skip pages that are marked ‘noindex’ or hidden from search engines by robots.txt.
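As a rough illustration of what “indexable” means here, the sketch below checks a single page against robots.txt rules and a robots meta tag using only Python’s standard library. The function name `is_indexable`, the example.com URLs, and the sample robots.txt content are assumptions made for illustration. A dedicated crawler also handles signals this sketch ignores, such as the X-Robots-Tag HTTP header.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


class RobotsMetaParser(HTMLParser):
    """Records whether the page has <meta name="robots" content="...noindex...">."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        is_robots_tag = attr.get("name", "").lower() == "robots"
        if is_robots_tag and "noindex" in attr.get("content", "").lower():
            self.noindex = True


def is_indexable(url, html, robots_txt=ROBOTS_TXT, user_agent="*"):
    """Return True if robots.txt allows the URL and the HTML has no noindex meta tag."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch(user_agent, url):
        return False  # hidden from search engines by robots.txt
    meta = RobotsMetaParser()
    meta.feed(html)
    return not meta.noindex
```

A crawler configured this way would keep only the URLs for which `is_indexable` returns `True`.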
Start crawling from the homepage of the website.
Before crawling, make sure to use the canonical URL, including the correct protocol (HTTP or HTTPS) and the correct www or non-www version.
Once the crawler has finished crawling your website, export the URLs to a spreadsheet.
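To show what “reachable by crawling links” means in practice, here is a minimal sketch of a breadth-first crawl that starts at the homepage and follows internal links only. It is not how any particular SEO tool works. The `crawl` function, the `fetch` callback, and the in-memory `SITE` dictionary are all hypothetical stand-ins so that the example runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def crawl(start_url, fetch):
    """Return the set of same-host URLs reachable by following links.

    fetch(url) is assumed to return the page HTML as a string,
    or None when the page cannot be loaded.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    reachable = set()
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        reachable.add(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links and drop "#fragment" parts.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return reachable


# Hypothetical three-page site: nothing links to /old-offer.
SITE = {
    "https://example.com/": '<a href="/about">About</a> <a href="https://other.example/">Ext</a>',
    "https://example.com/about": '<a href="/">Home</a> <a href="#team">Team</a>',
    "https://example.com/old-offer": "<p>No page links here.</p>",
}
crawled_urls = crawl("https://example.com/", SITE.get)
```

In this example `crawled_urls` contains the homepage and `/about` but not `/old-offer`, which is exactly the kind of page the later steps are meant to surface.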
2–Resolve Two Common Causes of Orphan Pages
There are two major causes of orphan pages that should be addressed immediately.
Both causes are essentially duplicate pages that should redirect automatically and consistently to a single URL.
If they don’t redirect, it is likely that some versions of the pages are not linked to and, as a result, are orphans.
In this scenario, the fact that these pages are orphans isn’t the primary issue; the fact that they are duplicates is.
These pages may come up later while you are looking for orphan pages and will need to be dealt with, so it is a good idea to get them out of the way beforehand.
Non-Canonical https/http or www/non-www
Every page on your website should use either http or https consistently (preferably https), and either www or non-www consistently.
To check whether this is the case, try typing all four variations of your website’s homepage (http and https, each with and without www) into the browser.
All four variations should redirect automatically to exactly the same URL.
If any of these variations doesn’t redirect properly, it can be a sign of a similar problem across your wider site.
In that case, try checking other URLs using that specific variation to see whether the issue is more widespread.
Try a few other pages of your website and check your website’s .htaccess file to make sure that the redirects are set up properly.
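If you would rather script this check than type each address by hand, the sketch below builds the four homepage variations and reports where each one ends up after redirects. The function names and the example.com domain are assumptions for illustration. `final_url` makes a real HTTP request, so it only works with network access and a domain you supply.

```python
from urllib.request import urlopen


def homepage_variants(domain):
    """Build the http/https and www/non-www variations of a homepage."""
    bare = domain[4:] if domain.startswith("www.") else domain
    return [
        f"{scheme}://{prefix}{bare}/"
        for scheme in ("http", "https")
        for prefix in ("", "www.")
    ]


def final_url(url, timeout=10):
    """Follow redirects and return the URL that was finally served (needs network)."""
    with urlopen(url, timeout=timeout) as response:
        return response.geturl()


def redirect_report(domain):
    """Map each homepage variation to the URL it finally resolves to."""
    return {variant: final_url(variant) for variant in homepage_variants(domain)}
```

When the redirects are set up correctly, every value in the dictionary returned by `redirect_report` is the same URL. If the values differ, that variation is worth checking on other pages too.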
3–Get a List of URLs from Google Analytics
One of the best places to start looking for orphan pages is your Google Analytics data.
As long as a page has the Google Analytics tracking code installed, if it has ever been visited, there is a record of it somewhere in Google Analytics.
To get the list of URLs, navigate to Behavior > Site Content > All Pages in the left sidebar of Google Analytics.
Because orphan pages are difficult to find, the number of times they have been visited is likely to be very low.
On the ‘All Pages’ report, click “Pageviews” so that the arrow points upwards, showing that the list is sorted in ascending order from least to most pageviews.
This will move the pages that are most likely to be orphan pages to the top of the list. Set the start date back to a time before Google Analytics was installed, then click the Apply button.
In the bottom right of the page, click the Show rows drop-down menu and select the highest number of rows, such as 5,000 at a time.
Google Analytics will take a little time to fetch all the data. Be patient during this process and don’t rush, or you risk crashing your browser.
Once the data has loaded, go to the top right corner, select the export option, and export the data as a Google Sheet, Excel file, or CSV file.
The next step is to copy the URLs from the exported analytics file into your orphan page spreadsheet.
The exported entries are page paths rather than full URLs, so for the data to be useful, you will have to convert them into URL format.
Insert a new column and fill it down with your homepage URL. Then, in the next column over, use the CONCAT() formula in Microsoft Excel to combine the two into a full URL.
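The same combining step can be done outside the spreadsheet. The sketch below is one way to do it, assuming the exported rows are page paths such as `/blog/post-1`. The `to_full_url` name, the homepage value, and the sample paths are made up for illustration.

```python
def to_full_url(homepage, page_path):
    """Join a homepage URL and a page path without doubling the slash."""
    return homepage.rstrip("/") + "/" + page_path.strip().lstrip("/")


# Hypothetical values standing in for the homepage column and the exported paths.
homepage = "https://www.example.com/"
page_paths = ["/", "/blog/post-1", "/pricing/"]

analytics_urls = [to_full_url(homepage, path) for path in page_paths]
```

Stripping the slash on both sides before joining avoids results like `https://www.example.com//blog/post-1`, which would not match the crawled URLs when the lists are compared.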
4–Identify Your Orphan Pages/URLs
To identify the orphan URLs, compare the list of crawlable URLs from your crawl with the list of URLs from Google Analytics in your spreadsheet.
Any URL that appears in the Analytics list but not in the crawl list is an orphan page. Copy these URLs into a new spreadsheet where you can work through fixing them.
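The comparison itself boils down to a set difference: URLs that Analytics has seen but the crawl never reached. Here is a minimal sketch of that idea. The `find_orphans` and `normalize` names and the sample URLs are illustrative, and ignoring trailing slashes and `#fragments` is an assumption that may not suit every site.

```python
from urllib.parse import urldefrag


def normalize(url):
    """Trim whitespace, drop any #fragment, and ignore a trailing slash (an assumption)."""
    without_fragment, _fragment = urldefrag(url.strip())
    return without_fragment.rstrip("/")


def find_orphans(crawled_urls, analytics_urls):
    """Return URLs present in the Analytics list but missing from the crawl list."""
    crawled = {normalize(url) for url in crawled_urls}
    visited = {normalize(url) for url in analytics_urls}
    return sorted(visited - crawled)


# Hypothetical lists standing in for the two spreadsheet columns.
crawl_list = ["https://example.com/", "https://example.com/about"]
analytics_list = [
    "https://example.com",
    "https://example.com/about/",
    "https://example.com/old-landing-page",
]
orphans = find_orphans(crawl_list, analytics_list)
```

With these sample lists, only `https://example.com/old-landing-page` ends up in `orphans`, because the other two Analytics URLs match crawled pages once the trailing slash is ignored.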
5–How to Fix Them
The final step is to link to these pages from other relevant pages of your website, which resolves the issue and makes your SEO more effective.