We use DeepCrawl’s API to extract URLs from your scheduled SEO diagnostic crawls. The API uses a two-step process.
The first step asks DeepCrawl to generate its “Indexable URL Report” for each site that you have set up. This brilliant filtering system removes from the export any URLs that have robots blocks or canonicals, which could otherwise slip through because they return a 200 status. This minimizes the number of pages with errors that are added to HREFLang Builder.
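To make the filtering idea concrete, here is a minimal sketch of what an "indexable" check might look like. It is not DeepCrawl's actual logic: the `CrawledPage` record, its fields, and the rule that a canonical pointing to a different URL disqualifies a page are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class CrawledPage:
    """Hypothetical crawl record; field names are assumptions."""
    url: str
    status: int
    robots_blocked: bool = False
    canonical: Optional[str] = None  # None means no canonical tag was found


def is_indexable(page: CrawledPage) -> bool:
    """Return True only for pages that look safe to map."""
    if page.status != 200:
        return False
    if page.robots_blocked:
        return False
    # A canonical pointing somewhere else means this URL is not the preferred one.
    if page.canonical is not None and page.canonical != page.url:
        return False
    return True


def indexable_urls(pages: Iterable[CrawledPage]) -> List[str]:
    """Keep only the URLs of indexable pages, preserving crawl order."""
    return [page.url for page in pages if is_indexable(page)]
```

The point of the sketch is that a 200 status alone is not enough: a page can respond successfully and still be blocked or canonicalized away.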
The second step downloads these reports into our system and, once all of them are downloaded, triggers the mapping and XML generation process.
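The two steps can be sketched roughly as follows. This is an outline of the flow only, not the real integration: `request_report`, `download_report`, and `on_all_downloaded` are hypothetical callables standing in for the DeepCrawl API requests and the mapping/XML trigger, whose actual endpoints are not described here.

```python
from typing import Callable, Dict, Iterable, List


def collect_indexable_urls(
    sites: Iterable[str],
    request_report: Callable[[str], str],
    download_report: Callable[[str], List[str]],
    on_all_downloaded: Callable[[Dict[str, List[str]]], None],
) -> Dict[str, List[str]]:
    """Run the two-step flow for every configured site.

    All three callables are placeholders (assumptions) for the real
    API requests and the downstream mapping step.
    """
    # Step 1: ask for a report to be generated for each site.
    report_ids = {site: request_report(site) for site in sites}

    # Step 2: download every report that was requested.
    urls_by_site = {
        site: download_report(report_id)
        for site, report_id in report_ids.items()
    }

    # Only after all downloads are complete do we trigger mapping.
    on_all_downloaded(urls_by_site)
    return urls_by_site
```

Passing the steps in as callables keeps the sketch independent of any particular API client and makes the ordering easy to test with stubs.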
The biggest benefits of using the DeepCrawl API integration are that we do not need to validate your URLs, which reduces the requests against your site, and that you get a dynamic source of URLs that is frequently updated by your diagnostic crawls.
Even if we are using XML sitemaps or another source for URLs, you can augment that source with data you already have in DeepCrawl. As we have shown, it is difficult to get a complete set of URLs unless we are using multiple sources.
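Combining sources amounts to taking the union of the URL lists while removing duplicates. The sketch below shows one way to do that; the helper names and the normalization rules (lower-casing the scheme and host, dropping fragments) are assumptions for illustration and not a description of how HREFLang Builder merges sources.

```python
from typing import Iterable, List
from urllib.parse import urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Lower-case the scheme and host and drop any #fragment."""
    parts = urlsplit(url.strip())
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query, "")
    )


def merge_url_sources(*sources: Iterable[str]) -> List[str]:
    """Union several URL lists, keeping the first-seen order."""
    seen = set()
    merged = []
    for source in sources:
        for url in source:
            key = normalize(url)
            if key not in seen:
                seen.add(key)
                merged.append(key)
    return merged


sitemap_urls = ["https://Example.com/a", "https://example.com/b"]
crawl_urls = ["https://example.com/a#top", "https://example.com/c"]
print(merge_url_sources(sitemap_urls, crawl_urls))
```

Here the crawl contributes `/c`, which the sitemap was missing, while the duplicate `/a` is only kept once.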
There are a few requirements for using the DeepCrawl API as your primary or incremental source for URLs.
The integration process is pretty straightforward, but as you may expect when integrating third-party applications, there are some potential issues. The following are provided for your consideration:
If getting a clean and complete source of URLs has been a challenge for you, or you want to augment what you have, follow these instructions to Set Up DeepCrawl API Auto Updates.