How to find the source of URL(s)
If you want to find the source file that contains a URL, you can use the “Download URLs list with source files” report.
To access the report, log into your account, go to the main “View” screen, and click the green “View URLs” button to bring up the master list of URLs for that country.
When the master list of URLs appears, click the green “Download URLs list with source files” button at the top. This gives you a list of all the URLs for that market, along with the source(s) we imported them from.
The output of this report will have at least two columns:
- URL – This is the URL that is in the database
- Source 1 – This is the first source file in which the URL was found
- Source 2 – This is the second source file in which the URL was found, if any
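Once the report is downloaded, looking up the sources for a given URL can be scripted. The following is a minimal sketch only: it assumes the download is a comma-separated file with a header row followed by one row per URL, which may not match the actual export format. The function name load_sources and the sample URLs and file names are hypothetical.

```python
import csv
import io


def load_sources(report_text):
    """Map each URL to the list of non-empty source values on its row.

    Assumes (hypothetically) a CSV layout: URL, Source 1, Source 2, ...
    """
    reader = csv.reader(io.StringIO(report_text))
    next(reader, None)  # skip the header row
    sources = {}
    for row in reader:
        if not row or not row[0].strip():
            continue  # ignore blank lines
        url = row[0].strip()
        # Keep only the source columns that actually contain a value.
        sources[url] = [cell.strip() for cell in row[1:] if cell.strip()]
    return sources


# Made-up sample data standing in for a downloaded report.
sample = (
    "URL,Source 1,Source 2\n"
    "https://example.com/us/,sitemap-us.xml,append-us.txt\n"
    "https://example.com/us/about,sitemap-us.xml,\n"
)
report = load_sources(sample)
print(report["https://example.com/us/"])
```

With a real download, the file contents would be read from disk and passed in instead of the sample string.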
Typical Import Source Example
In the screen capture below, most of the URLs for this market come from CMS-generated XML site maps. A few, highlighted in yellow, came from an append file, most likely to add URLs that were missing and/or receiving errors in Google Search Console due to missing alternate pages.
Multiple Source Import Example
In the example below, we can see a more complex import setup. Some URLs come from as many as four different sources.
- Source Column 1 – The green entries are URLs that were in the imported CMS-generated XML site map. Source 1 also contains some light yellow and blue entries, which indicate additional XML site maps that were imported and contained the URL.
- Source Column 2 – These are being imported from the client’s DeepCrawl account via the API.
- Source Column 3 – These URLs came from various lists of URLs that were appended using the append function of Auto Import. It is good practice to check these source files and remove any URLs that other sources already provide, in case those URLs change later.
- Source Column 4 – The red entries represent an error file, judging by its name, that the client team imported to add URLs and fix some problems. Note that some of these entries also appear in column 1, indicating that the error file was the only source for those URLs.
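The check on appended files described above can also be scripted against the downloaded report. This is a rough sketch under the same assumption that the report is a comma-separated file with a header row. The function name redundant_append_urls, the sample rows, and the file names are hypothetical, and you supply the names of your own append files.

```python
import csv
import io


def redundant_append_urls(report_text, append_sources):
    """Return URLs found in an append file AND in at least one other source.

    append_sources is a set of source names that you know are append files.
    """
    reader = csv.reader(io.StringIO(report_text))
    next(reader, None)  # skip the header row
    redundant = []
    for row in reader:
        if not row or not row[0].strip():
            continue  # ignore blank lines
        found = [cell.strip() for cell in row[1:] if cell.strip()]
        in_append = any(name in append_sources for name in found)
        in_other = any(name not in append_sources for name in found)
        if in_append and in_other:
            redundant.append(row[0].strip())
    return redundant


# Made-up sample data standing in for a downloaded report.
sample = (
    "URL,Source 1,Source 2,Source 3\n"
    "https://example.com/de/,sitemap-de.xml,crawl-export,append-de.txt\n"
    "https://example.com/de/neu,append-de.txt,,\n"
)
candidates = redundant_append_urls(sample, {"append-de.txt"})
print(candidates)
```

URLs returned here are candidates for removal from the append file, because another source already provides them. URLs whose only source is the append file are left out of the result and should stay in it.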