Basic Architecture
- Purpose
- This document will provide a high level view of the structure of the software. It will highlight the basic components and their arrangement. It is not intended as an exhaustive design.
- Audience
- This is intended for all developers interested in what lies behind the scenes, or for anyone who is simply curious. It is not intended for users.
Introduction
The basic premise of the software is that many decentralised clients index and share pages and queries.
Structure
The client will be split into 4 core modules:
- Indexer
- Searcher
- GUI
- Storage
These will talk directly to each other, forming the basis of the client.
Core Modules
Searcher
This module will do the work of actually finding the pages that the user will like. It will parse them to find the interesting terms and then pass them to the Indexer.
The searching will be achieved in 3 ways:
- Using a virtual proxy: all of the browser's traffic would pass through the virtual proxy, so we would be able to index pages as the user reads them.
- Using the browser's cache (we should prefer the first method when possible) (Pascal: just use this feature on initial installation??)
- On instruction from the Indexer, it will retrieve and parse specific pages, and pass the links that it finds back to the Indexer. This is the yfittu indexing robot: like all indexing robots, it follows links from already indexed pages. Because this consumes bandwidth, it should run when the connection is not fully in use, and this behaviour should be a modifiable option. This is perhaps a little similar to how Googlebot and most other indexing robots do things.
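The parsing step described above can be sketched as follows. This is only an illustration using the Python standard library, not the project's actual code: the names `PageParser` and `parse_page` are hypothetical, and treating every lowercase alphanumeric word as an "interesting term" is an assumption made to keep the example short.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Collects visible terms and outgoing links from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.terms = []   # candidate terms to hand to the Indexer
        self.links = []   # absolute addresses found in <a href="...">
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the address of the page.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Assumption: a "term" is any run of letters or digits, lowercased.
        if self._skip_depth == 0:
            self.terms.extend(re.findall(r"[a-z0-9]+", data.lower()))


def parse_page(url, html):
    """Return (terms, links) for a page that has already been fetched."""
    parser = PageParser(url)
    parser.feed(html)
    parser.close()
    return parser.terms, parser.links
```

Fetching is deliberately left out: the same function could be fed by the virtual proxy, by the browser's cache, or by the indexing robot, and the links it returns are what would be passed back to the Indexer.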
Indexer
The basic idea of the Indexer is to store, search for, and retrieve addresses of pages that a user may find interesting. Its specific duties can be split up as follows:
- Storage
- Taking the addresses of new pages and the terms linked to them, and storing them in an organised way.
- Getting addresses and terms from peers and storing them. Caching previous results propagated by the P2P network (a result doesn't contain as much information as a fully indexed page, but it is useful to index pages corresponding to frequent requests).
- Searching
- Taking a search query and applying it to currently stored data.
- Passing the query to peers. (The results they return will be added to the currently indexed data.)
- Get Searcher to 'investigate' new pages.
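As a rough illustration of the storage and searching duties listed above, the sketch below keeps a mapping from each term to the set of addresses that contain it. The class name `Index`, its methods, and the rule that a page must contain every term of the query are all assumptions made for the example. Ranking and the peer-to-peer exchange are not shown.

```python
from collections import defaultdict


class Index:
    """Minimal in-memory index: term -> set of page addresses."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add_page(self, address, terms):
        """Store the terms linked to a page (local or received from a peer)."""
        for term in terms:
            self._postings[term.lower()].add(address)

    def search(self, query):
        """Return the addresses that contain every term of the query."""
        terms = query.lower().split()
        if not terms:
            return set()
        result = None
        for term in terms:
            # .get() avoids creating empty entries for unknown terms.
            postings = self._postings.get(term, set())
            result = set(postings) if result is None else result & postings
        return result


index = Index()
index.add_page("http://a.example/", ["peer", "search"])
index.add_page("http://b.example/", ["peer", "network"])
matches = index.search("peer search")
```

Results returned by peers could be stored through the same `add_page` call, which is one simple way to let cached peer results answer later queries.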
GUI
This will be how the user interacts with the software. There will be 2 ways to interact with the system:
- Searching for results
- Setting preferences/administering the software
Interfaces
In addition, there will be separate interface modules that link the core modules to the OS-specific parts of the software. The Indexer and Searcher modules will require P2P and network interfaces, the Storage module will require a disk interface, and the GUI will require an interface too. The aim is to keep the core modules identical across all platforms and to confine the OS-specific code to the interfaces.
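One way to picture this separation is an abstract interface that the core module depends on, with a concrete implementation supplied per platform. The sketch below is hypothetical: `DiskInterface`, `InMemoryDisk` and `Storage` are illustrative names, and the in-memory implementation merely stands in for real OS-specific code.

```python
from abc import ABC, abstractmethod


class DiskInterface(ABC):
    """What the core Storage module expects from any platform."""

    @abstractmethod
    def read(self, key):
        """Return the bytes stored under key, or None if absent."""

    @abstractmethod
    def write(self, key, data):
        """Store bytes under key."""


class InMemoryDisk(DiskInterface):
    """Stand-in for an OS-specific implementation, useful for testing."""

    def __init__(self):
        self._blobs = {}

    def read(self, key):
        return self._blobs.get(key)

    def write(self, key, data):
        self._blobs[key] = data


class Storage:
    """Core module: identical on every platform, knows only DiskInterface."""

    def __init__(self, disk):
        self._disk = disk

    def save_terms(self, address, terms):
        self._disk.write(address, " ".join(terms).encode("utf-8"))

    def load_terms(self, address):
        data = self._disk.read(address)
        return data.decode("utf-8").split() if data else []


storage = Storage(InMemoryDisk())
storage.save_terms("http://a.example/", ["peer", "search"])
```

Porting to a new platform would then mean writing another `DiskInterface` subclass, while `Storage` itself stays unchanged.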
Topics to be discussed
In order to define exactly how the yfittu project will work, and to make it efficient, there are some topics that we should discuss in depth. Here is a short list of some of them:
- Temporal priorities: search requests should be propagated quickly through the network and peers should answer as soon as possible, whereas the time taken to index pages can be increased if that improves the efficiency of the first two tasks.
- Distribution of the indexing work: the computers that are part of the network shouldn't all index the same part of the web, and the best results of frequent requests should be indexed by more peers than badly ranked pages.
- Peer-to-peer network organisation (in a way that suits the temporal priorities and the distribution of indexing).
- Data propagated through the peer-to-peer network: mainly search requests and results, not the content of indexed pages.
- Computer resource monitoring (the CPU and bandwidth used for page indexing should not get in the way of the yfittu user's own activities).
- Criteria for page ranking
- Search expressions
- Protections against "spamdexing", misuse of the network by a peer, intrusive behaviors...

