

WARCs

Similarly to downloading enclosures, you can download the X-URL URLs, which point to the articles themselves. An HTML document may depend on various other resources, like images and stylesheets. GNU Wget can download it together with all its page requisites and, moreover, can save the whole result in WARC format.

$ ./feeds-warcs
[...]
www.darkside.ru_news_rss/warcs/20220218-145755-www.darkside.ru_news_140480.warc
[...]
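A minimal sketch of what fetching a single article into a WARC might look like, assuming POSIX sh and GNU Wget. The helper names url_to_name and fetch_warc are hypothetical, and the "timestamp-name.warc" naming is only inferred from the sample output above. The real feeds-warcs and cmd/warcs scripts may use different Wget options.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the idea; NOT the actual feeder scripts.

# url_to_name URL: drop the scheme and trailing slashes, then turn the
# remaining slashes into underscores (naming inferred from sample output),
# e.g. https://www.darkside.ru/news/140480/ -> www.darkside.ru_news_140480
url_to_name() {
    printf '%s\n' "$1" |
        sed -e 's|^[a-zA-Z][a-zA-Z0-9+.-]*://||' -e 's|/*$||' -e 's|/|_|g'
}

# fetch_warc URL DESTDIR: download the page with its requisites and record
# all requests and responses into DESTDIR/<timestamp>-<name>.warc.
# Returns Wget's exit status.
fetch_warc() {
    url=$1
    dest=$2
    name=$(date +%Y%m%d-%H%M%S)-$(url_to_name "$url")
    # The mirrored files themselves are not needed: the WARC keeps everything.
    tmp=$(mktemp -d) || return
    # With --no-warc-compression Wget appends ".warc" to the name itself.
    wget --quiet --page-requisites --span-hosts \
        --directory-prefix="$tmp" \
        --no-warc-compression --warc-file="$dest/$name" \
        "$url"
    rc=$?
    rm -rf "$tmp"
    return $rc
}
```

For example, fetch_warc https://www.darkside.ru/news/140480/ some/dir would write some/dir/<timestamp>-www.darkside.ru_news_140480.warc. The --span-hosts option is included because requisites are often served from other hosts, and with --page-requisites alone Wget only fetches those from the article's own host.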

WARC files are not compressed by default. Optionally, you can both view and compress them with the help of tofuproxy. Once you have a pile of *.warc files, you can simply add them to a running tofuproxy:

$ for w (feeds/*/warcs/*.warc) print $w:a > path/to/tofuproxy/fifos/add-warcs

Then visit the http://warc/ URL (with tofuproxy already acting as your proxy) to list the archived URLs and visit them.
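The add-warcs loop above relies on zsh syntax: the short "for w (...)" form and the :a modifier, which expands to an absolute path. A rough POSIX sh equivalent is sketched below, assuming realpath is available (it ships with GNU coreutils and the BSD base systems) and that tofuproxy reads one path per line, as the zsh print produces. The add_warcs function name is hypothetical.

```shell
#!/bin/sh
# add_warcs FEEDS_DIR TARGET (hypothetical helper): write the absolute path
# of every FEEDS_DIR/*/warcs/*.warc file, one per line, into TARGET
# (normally tofuproxy's fifos/add-warcs FIFO).
add_warcs() {
    for w in "$1"/*/warcs/*.warc; do
        # An unmatched glob stays literal in POSIX sh, so skip it.
        [ -e "$w" ] || continue
        # Plays the role of zsh's $w:a modifier.
        realpath "$w"
    done > "$2"
}
```

Usage: add_warcs feeds path/to/tofuproxy/fifos/add-warcs. Redirecting the whole loop opens the FIFO once instead of once per file.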

Of course, you can also download just a single feed:

$ cmd/warcs path/to/FEED [optional overridden destination directory]