Results Ranking

From Yfittu

Audience 
This page is intended for developers interested in what happens behind the scenes, or for anyone who is simply curious. It is not intended for end users.
WARNING
This page is currently only a draft. It outlines the concepts, but a lot of rewording and reorganisation is still necessary. (Of course, this paragraph should be removed once the page is clear and well organised.)
Purpose of this page
This page describes how pages are ranked so that search results can be sorted by relevance. It does not cover the ranking used for indexing, which is designed to distribute the indexing work well and to give more important pages and sites a higher priority.


General site and page relevance scoring


Note: this score is also taken into account when deciding whether or not to index a page (see the indexing robot).

Criteria:

  • document analysis (anti-spamdexing, anti-phishing, global lexical evaluation, spelling evaluation): what can be inferred by analysing the page alone, without taking other pages into consideration
  • number of links FROM other sites that point here, and where they come from (these must never penalize a page: a site must not be able to harm another site by using spamdexing techniques)
  • number of internal links (to detect the most important pages on a site)
  • links TO other sites (here, links to some sites may imply penalties: a site that attempts to use spamdexing techniques should be penalized)
  • date of the page's last update, as known to the indexing robot
  • and many others...
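As a rough illustration of how these criteria could be combined into a single score, here is a minimal Python sketch. The feature names, the weights, the log scaling and the freshness half-life are all hypothetical; the page does not specify any of them. The sketch only encodes the two rules stated above: links from other sites can never lower a page's score, while links to sites flagged for spamdexing do lower it.

```python
import math
from dataclasses import dataclass, field

# Hypothetical weights: none of these values come from the original design.
W_CONTENT = 1.0        # document analysis (spam, phishing, lexical, spelling)
W_INBOUND = 0.5        # links FROM other sites
W_INTERNAL = 0.2       # links from other pages of the same site
W_SPAM_LINK = 0.3      # penalty per outbound link TO a spamming site
FRESHNESS_HALF_LIFE_DAYS = 180.0


@dataclass
class PageFeatures:
    content_quality: float                                  # 0.0..1.0, from analysing the page alone
    inbound_sites: set[str] = field(default_factory=set)    # distinct sites linking here
    internal_links: int = 0                                 # links from the same site
    outbound_sites: set[str] = field(default_factory=set)   # sites this page links to
    days_since_update: float = 0.0                          # last update known to the robot


def static_score(page: PageFeatures, spam_sites: set[str]) -> float:
    # Inbound links are counted per distinct site and log-scaled. Links from
    # spamming sites are simply ignored, so they can neither raise nor lower
    # the score: an attacker cannot harm another site this way.
    inbound = math.log1p(len(page.inbound_sites - spam_sites))
    internal = math.log1p(page.internal_links)
    # Only links TO spamming sites are penalized.
    spam_outbound = len(page.outbound_sites & spam_sites)
    # Freshness halves every FRESHNESS_HALF_LIFE_DAYS since the last known update.
    freshness = 0.5 ** (page.days_since_update / FRESHNESS_HALF_LIFE_DAYS)
    return (W_CONTENT * page.content_quality
            + W_INBOUND * inbound
            + W_INTERNAL * internal
            - W_SPAM_LINK * spam_outbound
            + freshness)
```

With this shape, adding a link from a spamming site leaves a page's score unchanged, whereas linking to one lowers it.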

Each criterion is evaluated (by a single peer) at indexing time, not at search time, so a thorough analysis is preferred, even if it costs some CPU time.

Relevance according to the match with the search terms


Criteria:

  • best (i.e. smallest) distance between the search words
  • number of times they occur
  • importance of the searched text in the page, both in absolute terms and as a proportion of the page; this depends on formatting (bold, title attributes) and on position (beginning, end or middle...)
  • titles of links that point to this page
  • text in the URL
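One possible way to turn these criteria into a number is sketched below in Python. It assumes each page has already been split into lowercased token lists per field (body, title, bold text, anchor text of inbound links, URL words); the field names, the boosts and the way the parts are added together are hypothetical. The proximity part computes the smallest window of body words containing every search term, using a standard sliding-window technique.

```python
from __future__ import annotations

import math
from collections import Counter

# Hypothetical boosts for where a term occurs.
FIELD_BOOST = {"title": 3.0, "bold": 2.0, "anchor": 2.0, "url": 1.5, "body": 1.0}


def min_span(positions: dict[str, list[int]]) -> int | None:
    """Smallest window (in words) containing every term, or None if a term is absent."""
    if not positions or any(not p for p in positions.values()):
        return None
    events = sorted((pos, term) for term, plist in positions.items() for pos in plist)
    counts: Counter = Counter()
    covered, left, best = 0, 0, None
    for right_pos, term in events:
        counts[term] += 1
        if counts[term] == 1:
            covered += 1
        # Shrink the window from the left while it still contains all terms.
        while covered == len(positions):
            left_pos, left_term = events[left]
            span = right_pos - left_pos + 1
            best = span if best is None else min(best, span)
            counts[left_term] -= 1
            if counts[left_term] == 0:
                covered -= 1
            left += 1
    return best


def match_score(fields: dict[str, list[str]], terms: list[str]) -> float:
    terms = list(dict.fromkeys(t.lower() for t in terms))  # dedupe, keep order
    score = 0.0
    for name, tokens in fields.items():
        if not tokens:
            continue
        hits = sum(tokens.count(t) for t in terms)
        # Absolute count (log-damped) plus rate relative to the field length.
        score += FIELD_BOOST.get(name, 1.0) * (math.log1p(hits) + hits / len(tokens))
    body = fields.get("body", [])
    positions = {t: [i for i, tok in enumerate(body) if tok == t] for t in terms}
    span = min_span(positions)
    if span is not None:
        # Adjacent terms give the maximum bonus of 1.0; wider windows give less.
        score += len(terms) / span
    return score
```

The minimum-window search runs in time proportional to the total number of term occurrences (after sorting them), which keeps the proximity criterion cheap even for long pages.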

This evaluation is made when a search request arrives, but everything that can be done at indexing time to speed it up should be done then.
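For example, word positions can be recorded when a page is indexed, so that at search time the occurrence counts and word distances are read straight from the index instead of re-parsing the page. The Python sketch below shows this with a positional inverted index; the class and method names are hypothetical, and tokenisation is reduced to lowercasing and splitting on whitespace.

```python
from __future__ import annotations

from collections import defaultdict


class PositionalIndex:
    """Hypothetical per-peer index: term -> {doc_id: [word positions]}."""

    def __init__(self) -> None:
        self.postings: dict[str, dict[str, list[int]]] = defaultdict(dict)
        self.doc_length: dict[str, int] = {}

    def add_document(self, doc_id: str, text: str) -> None:
        # Indexing time: tokenise once and record every position of every term.
        tokens = text.lower().split()
        self.doc_length[doc_id] = len(tokens)
        for pos, term in enumerate(tokens):
            self.postings[term].setdefault(doc_id, []).append(pos)

    def positions(self, doc_id: str, terms: list[str]) -> dict[str, list[int]]:
        # Search time: one dictionary lookup per term, no access to the page itself.
        return {t: self.postings.get(t.lower(), {}).get(doc_id, []) for t in terms}

    def candidates(self, terms: list[str]) -> set[str]:
        # Documents that contain every search term.
        sets = [set(self.postings.get(t.lower(), {})) for t in terms]
        return set.intersection(*sets) if sets else set()
```

Usage: after `idx.add_document("d1", "The quick brown fox")`, the call `idx.positions("d1", ["quick", "fox"])` returns `{"quick": [1], "fox": [3]}` without touching the page text again.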