Lumar (DeepCrawl) API Considerations

Overview

We use Lumar’s GraphQL API to extract URLs from your scheduled SEO diagnostic crawls. Lumar is sunsetting the previous API that allowed us to import the specific “200 Indexable” report directly; we can still import the URLs via GraphQL, but we now do more filtering during the import. The API integration uses a two-step process.

The first step asks DeepCrawl to generate its “Indexable URL Report” for each site you have set up. This filtering removes URLs that are blocked by robots directives or canonicalized elsewhere but would otherwise return a 200 status, minimizing the number of error-prone pages added to HREFLang Builder.
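The kind of indexability filtering applied during import can be sketched as follows. This is a minimal illustration only: the field names (`http_status`, `robots_noindex`, `canonical_url`) are hypothetical and do not reflect Lumar’s actual report schema.

```python
# Illustrative sketch of indexability filtering during import.
# Field names below are assumptions, not Lumar's real schema.

def is_indexable(page: dict) -> bool:
    """Keep only URLs that return 200, are not blocked from indexing,
    and are self-canonical (or have no canonical set)."""
    if page.get("http_status") != 200:
        return False
    if page.get("robots_noindex"):
        return False
    canonical = page.get("canonical_url")
    if canonical and canonical != page["url"]:
        return False
    return True

crawl_pages = [
    {"url": "https://example.com/", "http_status": 200},
    {"url": "https://example.com/old", "http_status": 301},
    {"url": "https://example.com/dup", "http_status": 200,
     "canonical_url": "https://example.com/"},
    {"url": "https://example.com/hidden", "http_status": 200,
     "robots_noindex": True},
]

indexable = [p["url"] for p in crawl_pages if is_indexable(p)]
print(indexable)  # → ['https://example.com/']
```

Only the self-canonical, unblocked 200 page survives; the redirect, the canonicalized duplicate, and the noindexed page are dropped before they can reach HREFLang Builder.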

The second step downloads each report into our system; once all of them are downloaded, the mapping and XML generation process is triggered.
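The two-step flow above can be sketched as request-then-poll logic. Everything here is an assumption for illustration: the GraphQL operation names and the stand-in client are hypothetical, not Lumar’s real API, and a real integration would use Lumar’s documented endpoint and schema.

```python
import time

# Hedged sketch of the two-step flow: (1) ask the API to generate an
# "Indexable URL Report" per site, (2) poll until every report is ready,
# then hand off to mapping/XML generation. Operation names are
# illustrative, not Lumar's actual GraphQL schema.

GENERATE_REPORT = """
mutation GenerateReport($crawlId: ID!) {
  createReportDownload(crawlId: $crawlId, reportType: INDEXABLE_URLS) { id }
}
"""

class FakeLumarClient:
    """Stand-in transport so the sketch runs without network access."""
    def __init__(self):
        self._polls = {}

    def execute(self, query, variables):
        if "createReportDownload" in query:
            report_id = f"report-{variables['crawlId']}"
            self._polls[report_id] = 0
            return {"id": report_id}
        # Pretend each report becomes ready on its second status poll.
        report_id = variables["reportId"]
        self._polls[report_id] += 1
        ready = self._polls[report_id] >= 2
        return {"status": "GENERATED" if ready else "GENERATING"}

def fetch_reports(client, crawl_ids, poll_delay=0.0):
    # Step 1: request report generation for every site's latest crawl.
    report_ids = [client.execute(GENERATE_REPORT, {"crawlId": c})["id"]
                  for c in crawl_ids]
    # Step 2: wait until ALL reports are generated before the
    # mapping and XML build is triggered.
    pending = set(report_ids)
    while pending:
        for rid in list(pending):
            state = client.execute("query ReportStatus", {"reportId": rid})
            if state["status"] == "GENERATED":
                pending.remove(rid)
        if pending:
            time.sleep(poll_delay)
    return report_ids

print(fetch_reports(FakeLumarClient(), ["site-a", "site-b"]))
# → ['report-site-a', 'report-site-b']
```

The key design point is that downstream processing waits for the full set of reports, matching the behavior described above.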

The biggest benefits of the DeepCrawl API integration are that we do not need to validate your URLs, which reduces requests against your site, and that you get a dynamic source of URLs refreshed with each scheduled diagnostic crawl.

Even if we use XML sitemaps or another source of URLs, you can augment that set with data you already have in DeepCrawl. As we have shown, it is difficult to get a complete set of URLs without using multiple sources.
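Combining multiple URL sources amounts to a de-duplicated union. A minimal sketch (the example URLs are illustrative):

```python
def merge_url_sources(*sources):
    """Union multiple URL sources, preserving first-seen order and
    dropping duplicates, so no single incomplete source limits coverage."""
    seen = set()
    merged = []
    for source in sources:
        for url in source:
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

sitemap_urls = ["https://example.com/", "https://example.com/a"]
deepcrawl_urls = ["https://example.com/a", "https://example.com/b"]
print(merge_url_sources(sitemap_urls, deepcrawl_urls))
# → ['https://example.com/', 'https://example.com/a', 'https://example.com/b']
```

Each source contributes URLs the others miss, which is why a crawl-based source like DeepCrawl pairs well with XML sitemaps.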

Requirements

There are a few requirements for using the DeepCrawl API as your primary or incremental source of URLs.

  1. You must have a current Lumar account; this is not included in our costs.
  2. You must have scheduled crawls set up to crawl the site(s).
  3. You should only archive a crawl after the next update, so there is always one current crawl. If you archive crawls too quickly, we may need to make a second request for the archive, then open the archive and process the data, which will extend your build time.
  4. You must give us access to the account via your API key (we do not need login access, only your API information).

Considerations

The integration process is fairly straightforward, but as you might expect when integrating third-party applications, there are some potential issues. The following are provided for your consideration:

  1. Your DeepCrawl account limits and budget – if you plan to use the crawl results as your primary source of URLs for HREFLang Builder, ensure your DeepCrawl account has sufficient credits for a full crawl of the site(s) at the update intervals you want. We have had clients whose URL caps were set lower than the number of URLs on the site.
  2. Our system requests the most recent crawl – if you run out of credits or stop the crawl, we may not get a current list of URLs. We plan to add a dashboard alert that compares dates and flags any sites whose reports are not current.
  3. We do not trigger crawls of your site via the API – especially if you are using DeepCrawl as your URL source, you should use its scheduler functionality to set up crawls at appropriate intervals. We suggest starting with a weekly crawl.
  4. Set crawl restrictions in DeepCrawl – we take what is presented to us without exception, so if you want or need any crawl restrictions, set them in Phase 2 of your DeepCrawl project setup workflow.
  5. Lumar error management – as with any crawler or dynamic tool, errors can happen and can impact your DeepCrawl results. Lumar has an excellent help guide on how to fix website crawl errors; for any additional questions, please consult your Lumar customer support representative.
  6. Indexable URL Report creation and exporting – the time it takes to generate each report depends on the number of URLs in the crawl and the number of indexable URLs. If your crawl has completed when we request the report, it is typically generated within a few minutes. If the callback fails, we retry after 1 hour, then 12 hours, then 24 hours. If it still fails, we rebuild the report from the most recent source we have and alert you.
  7. Generating updates in HREFLang Builder – your final consideration is when to generate updates. If your crawls run on the weekend, set HREFLang Builder to update weekly on Monday or Tuesday.
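The retry schedule for report exports (item 6 above: 1 hour, then 12, then 24, then fall back and alert) can be sketched as a small lookup; the function name and return convention are illustrative, not the actual implementation.

```python
# Hedged sketch of the report-export retry schedule described above:
# retry after 1 hour, then 12, then 24; after that, rebuild from the
# most recent stored source and alert. Delays are returned in seconds.

RETRY_DELAYS_HOURS = [1, 12, 24]

def next_retry_delay(failed_attempts: int):
    """Seconds to wait before the next attempt, or None when retries
    are exhausted and the fallback/alert path should run instead."""
    if failed_attempts < 1:
        return 0  # first request: no delay
    if failed_attempts <= len(RETRY_DELAYS_HOURS):
        return RETRY_DELAYS_HOURS[failed_attempts - 1] * 3600
    return None  # give up: rebuild from last good source and alert

print([next_retry_delay(n) for n in range(5)])
# → [0, 3600, 43200, 86400, None]
```

Returning `None` (rather than another delay) makes the "rebuild and alert" branch explicit to the caller once all three retries have failed.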

If getting a clean, complete source of URLs has been a challenge for you, or you want to augment what you have, follow these instructions to set up Lumar API Auto Updates.
