8 commits

Author SHA1 Message Date
Ross Williams
310b4d1242 Add htmltotext extractor
Saves HTML text nodes and selected element attributes in
`htmltotext.txt` for each Snapshot. Primarily intended to be used
for search indexing.
2023-10-23 21:42:32 -04:00
Cristian
f6ce1de882 fix: archivebox version was being called as root 2020-10-27 09:15:14 -05:00
Cristian
e1d0b8bce7 feat: Initialize django at the beginning 2020-10-26 07:45:21 -05:00
Cristian
62ed11a5ca fix: Improve headers handling 2020-09-24 12:55:51 -05:00
ttimasdf
e3329be291 tests: add test for mercury-parser 2020-09-22 18:44:12 -05:00
Cristian
8aa7b34de7 tests: Add readability to ignored methods in tests 2020-08-11 08:58:49 -05:00
Cristian
5429096c30 tests: Add mechanism to avoid using extractors that we are not testing 2020-08-04 08:42:30 -05:00
Cristian
d5fc13b34e refactor: Move pytest fixtures to its own file 2020-07-07 08:36:58 -05:00