Previously, self_html_function was a function taking all of its parameters as arguments. As new optional parameters were being added, the function had too many arguments, and every call site would have had to be modified each time an argument was added. Therefore, it has been moved to a configuration structure with a `run` function taking only one argument, the HTML string.
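As an illustration of this change, here is a minimal sketch of the configuration-structure pattern. It is written in Python only for illustration; the name `SelfHtmlConfig`, the options `base_url` and `strip_scripts`, and the processing done in `run` are all hypothetical and are not taken from the project.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class SelfHtmlConfig:
    """Hypothetical configuration; the real option names are not given in the text."""

    base_url: Optional[str] = None  # assumed option: prefix for root-relative src links
    strip_scripts: bool = False     # assumed option: remove <script> elements

    def run(self, html: str) -> str:
        """Process one HTML string using the options stored on the structure."""
        if self.strip_scripts:
            html = re.sub(
                r"<script\b.*?</script>", "", html,
                flags=re.DOTALL | re.IGNORECASE,
            )
        if self.base_url is not None:
            html = html.replace('src="/', f'src="{self.base_url}/')
        return html


# Call sites only pass the HTML string; adding a new optional field later
# does not require touching them.
config = SelfHtmlConfig(strip_scripts=True)
cleaned = config.run("<p>Hi</p><script>alert(1)</script>")
```

With this shape, a new optional parameter becomes a new field with a default value, so existing calls to `run` keep working unchanged.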
Scope of the project
This project mainly aims at providing a unified interface for several newspapers. Side objectives are to provide a web API and different clients, such as a web UI or chatbots.
Several large components are planned for this project (this is an initial draft and may change later):
@startuml
frame "backend" {
[Retrieval tools] as retrieval_tools
[Article representation] as article_repr
[Automatic retrieval] as auto_retrieve
[Atom/RSS adapters] as rss
[Cache DB] as cache
[Newspaper\n(Mediapart, …)] as newspaper
() "Newspaper" as np_i
newspaper -up- np_i
[Article location] as article_location
[API] as api
() "API" as api_i
api -up- api_i
article_location ..> np_i
api -> article_location
api -> rss
newspaper -> retrieval_tools: uses to implement
article_location --> article_repr: uses
retrieval_tools -up-> article_repr: uses
auto_retrieve --> rss: watches
auto_retrieve --> article_location
auto_retrieve --> cache: stores in
}
frame "Web ui" {
[Web UI] as webui
[HTML renderer] as html_rend
[Pdf exporter] as pdf_rend
[Articles] as articles
webui --> html_rend
webui --> pdf_rend
webui -> articles
articles ..> api_i
}
[Chatbot] as chatbot
chatbot ..> api_i
actor User
User ..> webui
User ..> chatbot
actor "Newspaper programmer" as newspaper_programmer
newspaper_programmer ..> newspaper: implements
@enduml
A task queue could be added later to space out requests to the newspaper websites.
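One possible shape for such a queue is sketched below, assuming a simple single-threaded design that enforces a minimum delay between tasks. The class name `SpacedQueue` and its parameters are hypothetical; the project has not defined this component yet.

```python
import time
from collections import deque
from typing import Callable, Deque, List, Optional


class SpacedQueue:
    """Runs queued tasks in order, leaving at least `min_interval` seconds between starts."""

    def __init__(
        self,
        min_interval: float,
        sleep: Callable[[float], None] = time.sleep,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.min_interval = min_interval
        self._sleep = sleep  # injectable so the delay can be faked in tests
        self._clock = clock
        self._tasks: Deque[Callable[[], object]] = deque()
        self._last_start: Optional[float] = None

    def submit(self, task: Callable[[], object]) -> None:
        """Add a task (for example, one HTTP request) to the end of the queue."""
        self._tasks.append(task)

    def run_all(self) -> List[object]:
        """Run every queued task, sleeping when the previous one started too recently."""
        results: List[object] = []
        while self._tasks:
            if self._last_start is not None:
                wait = self.min_interval - (self._clock() - self._last_start)
                if wait > 0:
                    self._sleep(wait)
            task = self._tasks.popleft()
            self._last_start = self._clock()
            results.append(task())
        return results


# Usage: three placeholder tasks spaced by 10 ms.
queue = SpacedQueue(min_interval=0.01)
for name in ("a", "b", "c"):
    queue.submit(lambda name=name: name)
queue_results = queue.run_all()
```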
Implementation plan
Phase I
- `Newspaper` interface: used to retrieve articles from newspaper websites
- minimal chatbot (uses libraries directly)
- `ArticleLocation`: library for using several `Newspaper` implementations and retrieving an article from a given URL
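To make the relation between the two Phase I components concrete, here is a rough sketch, assuming that each `Newspaper` implementation can say whether it handles a URL and that `ArticleLocation` picks the first matching one. The method names `handles` and `retrieve`, the `ExampleNewspaper` class, and the host `news.example.org` are illustrative assumptions, not the project's actual API.

```python
from abc import ABC, abstractmethod
from typing import Iterable
from urllib.parse import urlparse


class Newspaper(ABC):
    """Interface a newspaper programmer implements (method names are assumed)."""

    @abstractmethod
    def handles(self, url: str) -> bool:
        """Return True if this newspaper can retrieve the given URL."""

    @abstractmethod
    def retrieve(self, url: str) -> str:
        """Return the article found at the given URL."""


class ExampleNewspaper(Newspaper):
    """Stand-in implementation; it performs no network access."""

    def handles(self, url: str) -> bool:
        return urlparse(url).hostname == "news.example.org"

    def retrieve(self, url: str) -> str:
        return f"<html>article at {url}</html>"


class ArticleLocation:
    """Selects the newspaper matching a URL and retrieves through it."""

    def __init__(self, newspapers: Iterable[Newspaper], url: str) -> None:
        self.url = url
        self.newspaper = next((n for n in newspapers if n.handles(url)), None)
        if self.newspaper is None:
            raise ValueError(f"no newspaper handles {url}")

    def retrieve(self) -> str:
        return self.newspaper.retrieve(self.url)


location = ArticleLocation([ExampleNewspaper()], "https://news.example.org/a")
article_html = location.retrieve()
```

Keeping the URL dispatch in `ArticleLocation` means clients such as the chatbot do not need to know which newspaper a link belongs to.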
Phase II
- Article representation: having a (beta) unified representation for downloaded
articles
- adding this representation to `Newspaper`
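A unified representation could start as a small immutable record, as in the sketch below. Every field name here (`url`, `title`, `body_html`, `authors`, `published`) is an assumption chosen for illustration; the plan does not specify which data an article carries.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Article:
    """Hypothetical unified article record shared by all newspapers."""

    url: str
    title: str
    body_html: str
    authors: List[str] = field(default_factory=list)
    published: Optional[str] = None  # assumed to be an ISO 8601 date string

    def summary(self) -> str:
        """One-line description usable by any client (chatbot, web UI)."""
        by = ", ".join(self.authors) or "unknown author"
        return f"{self.title} ({by})"


article = Article(
    url="https://news.example.org/a",
    title="Example title",
    body_html="<p>Body</p>",
    authors=["A. Writer", "B. Editor"],
)
article_summary = article.summary()
```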
Phase III
- Cache
- Atom/RSS adapters
- automatic retrieval
Phase IV
- API
- chatbot (uses the API)
Phase V
- web UI