Archives

Does the SearchIQ crawler automatically reindex changes on my site?

October 13, 2020

Our delta crawl runs hourly and checks the post date in the sitemap to index new or updated posts. For the delta crawl to pick up a post, its ‘lastmod’ field in the sitemap must be later than the date the post was last indexed.
However, URLs that have been deleted from the sitemap require a re-index; otherwise the deleted pages may continue to appear in search results.
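The comparison described above can be illustrated with a short sketch. This is not SearchIQ's implementation; it only shows, under stated assumptions, how a 'lastmod' check against a last-indexed timestamp might look, and how URLs missing from the sitemap could be spotted. The sample sitemap, the example.com URLs, and the function names `urls_to_reindex` and `removed_urls` are all hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap used only for illustration.
SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/old-post</loc><lastmod>2020-10-01T09:00:00+00:00</lastmod></url>
  <url><loc>https://example.com/updated-post</loc><lastmod>2020-10-13T12:30:00Z</lastmod></url>
  <url><loc>https://example.com/new-post</loc><lastmod>2020-10-14</lastmod></url>
</urlset>"""


def parse_lastmod(value):
    """Parse a lastmod value; values without a timezone are assumed to be UTC."""
    parsed = datetime.fromisoformat(value.strip().replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def urls_to_reindex(sitemap_xml, last_indexed):
    """Return URLs whose lastmod is later than the last indexed time."""
    root = ET.fromstring(sitemap_xml)
    changed = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        if loc and lastmod and parse_lastmod(lastmod) > last_indexed:
            changed.append(loc.strip())
    return changed


def removed_urls(sitemap_xml, indexed_urls):
    """Return previously indexed URLs that no longer appear in the sitemap."""
    root = ET.fromstring(sitemap_xml)
    listed = {loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS) if loc.text}
    return sorted(set(indexed_urls) - listed)


last_indexed = datetime(2020, 10, 13, 8, 0, tzinfo=timezone.utc)
changed = urls_to_reindex(SITEMAP, last_indexed)
removed = removed_urls(SITEMAP, ["https://example.com/old-post", "https://example.com/deleted-post"])
print(changed)
print(removed)
```

Note that a date-only 'lastmod' is treated here as midnight UTC, so a post updated later on the same day as the last crawl would not be picked up in this sketch; a full timestamp avoids that ambiguity.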


What is XPath, and how can I use it to improve crawler accuracy?

June 14, 2017

XPath is a query language that provides an easy way to select nodes from an XML/HTML document. You can read more about the language here.

Below we provide some examples:

  • Post title: /html/head/title
  • Short description: /html/head/meta[@name="description"]/@content
  • Post image: //div[@id="container"]/article/img/@src
  • Post content: //div[@id="container"]/article
  • Post author: //div[@id="container"]/span[@id="author"]

If your posts use different HTML structures, you can set two or more XPath queries separated by commas.
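To see how queries like the ones above select nodes, here is a minimal sketch using only Python's standard library. Its ElementTree module supports only a subset of XPath, so the queries are written relative to the root element and attribute values are read with `.get()` instead of `/@attr`. The sample page and the helper `first_match` are hypothetical, and trying comma-separated queries in order until one matches is an assumption about the behaviour, not a description of the crawler.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page (not taken from a real site).
PAGE = """<html>
  <head>
    <title>Hello world</title>
    <meta name="description" content="A short summary" />
  </head>
  <body>
    <div id="container">
      <article><img src="/img/cover.png" />Body text</article>
      <span id="author">Jane</span>
    </div>
  </body>
</html>"""


def first_match(root, queries):
    """Try each comma-separated query in order; return the first element found.

    Splitting on commas is naive: it breaks queries that contain a comma themselves.
    """
    for query in queries.split(","):
        node = root.find(query.strip())
        if node is not None:
            return node
    return None


root = ET.fromstring(PAGE)  # root is the <html> element

title = first_match(root, "./head/title").text
description = first_match(root, "./head/meta[@name='description']").get("content")
image = first_match(root, ".//div[@id='container']/article/img").get("src")
author = first_match(root, ".//div[@id='container']/span[@id='author']").text

# The first query matches nothing on this page, so the second one is used.
article = first_match(
    root, ".//div[@class='post']/article, .//div[@id='container']/article"
)
content = "".join(article.itertext()).strip()

print(title, description, image, author, content, sep=" | ")
```

Real-world HTML is often not well-formed XML, so a lenient HTML parser with full XPath support would normally be needed; the standard-library parser is used here only to keep the example self-contained.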


How does the SearchIQ crawler index my site?

June 14, 2017

TBA


Can SearchIQ index my site if it is not publicly accessible?

June 14, 2017

If your site is a WordPress site, it does not need to be publicly accessible, because indexing is performed via the WordPress plugin.
Otherwise, your site is indexed by our crawler, which requires the site to be publicly accessible.


I’ve set my robots.txt to disallow indexing of a folder or subpath, but it’s still showing in search results. Why?

June 14, 2017

If you provide us with a sitemap and the folder/subpath is listed there, we will ignore your robots.txt. To work around this without changing your sitemap, you can use our blacklist feature.
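One way to find the affected pages is to check which sitemap URLs your robots.txt disallows; those are the candidates for the blacklist. The sketch below does this with Python's standard `urllib.robotparser`. The robots.txt rules, the example.com URLs, the `"*"` user agent, and the function name `conflicting_urls` are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt and sitemap entries.
ROBOTS_TXT = """User-agent: *
Disallow: /private/
"""

SITEMAP_URLS = [
    "https://example.com/blog/hello",
    "https://example.com/private/draft",
]


def conflicting_urls(robots_txt, urls, user_agent="*"):
    """Return the URLs that robots.txt disallows for the given user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


conflicts = conflicting_urls(ROBOTS_TXT, SITEMAP_URLS)
print(conflicts)
```

Any URL this reports is listed in the sitemap but disallowed by robots.txt, so it would need to be blacklisted or removed from the sitemap to keep it out of search results.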
