67 Commits

Author SHA1 Message Date
mickael
61c3ae6130 fix: define retrievals on import 2019-05-01 17:43:28 +02:00
rofl0r
2bacf77c8c split ppf into two programs, ppf/scraper 2019-01-18 22:53:35 +00:00
rofl0r
8be5ab1567 ppf: move insert function into dbs.py 2019-01-18 21:43:17 +00:00
rofl0r
5fd693a4a2 ppf: remove more unneeded stuff 2019-01-18 19:55:54 +00:00
rofl0r
d926e66092 ppf: remove unneeded stuff 2019-01-18 19:53:55 +00:00
rofl0r
b0f92fcdcd ppf.py: improve urignore code readability 2019-01-18 19:52:15 +00:00
rofl0r
4a41796b19 factor out http related code from ppf.py 2019-01-18 19:30:42 +00:00
rofl0r
0dad0176f3 ppf: add new field proxies_added to be able to rate sites
sqlite3 urls.sqlite "alter table uris add proxies_added INT"
sqlite3 urls.sqlite "update uris set proxies_added=0"
2019-01-18 15:44:09 +00:00
mickael
f489f0c4dd set retrievals to 0 for new uris 2019-01-13 16:50:54 +00:00
rofl0r
69d366f7eb ppf: add retrievals field so we know whether an url is new
use

sqlite3 urls.sqlite "alter table uris add retrievals INT"
sqlite3 urls.sqlite "update uris set retrievals=0"
2019-01-13 16:40:12 +00:00
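The schema update described in this commit can also be applied from Python. The following is a minimal sketch, not code from the repository: the helper name `add_int_column` is hypothetical, and it assumes the table and column names come from trusted code, since SQL identifiers cannot be bound as parameters.

```python
import sqlite3

def add_int_column(conn, table, column):
    """Add an INT column set to 0 on every row, unless the table already has it.

    Hypothetical helper; table and column must be trusted identifiers.
    """
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    existing = [row[1] for row in conn.execute("PRAGMA table_info(%s)" % table)]
    if column in existing:
        return False
    conn.execute("ALTER TABLE %s ADD %s INT" % (table, column))
    conn.execute("UPDATE %s SET %s=0" % (table, column))
    conn.commit()
    return True

if __name__ == "__main__":
    # In-memory database standing in for urls.sqlite.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE uris (url TEXT)")
    conn.execute("INSERT INTO uris VALUES ('http://example.com/')")
    add_int_column(conn, "uris", "retrievals")
    print(conn.execute("SELECT url, retrievals FROM uris").fetchall())
```

Checking `PRAGMA table_info` first makes the update safe to run more than once, which the plain `sqlite3` command lines are not.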
rofl0r
54e2c2a702 ppf: simplify statement 2019-01-13 16:40:12 +00:00
rofl0r
2f7a730311 ppf: use slice for the 500 rows limitation 2019-01-13 16:40:12 +00:00
mickael
7c7fa8836a patch: 1y4C 2019-01-13 16:40:12 +00:00
rofl0r
24d2c08c9f ppf: make it possible to import a file containing proxies directly
using --file filename.html
2019-01-11 05:45:13 +00:00
rofl0r
ecf587c8f7 ppf: set newly added sites to 0,0 (err/stale)
we use the tuple 0,0 later on to detect whether a site is new or not.
2019-01-11 05:23:05 +00:00
rofl0r
8b10df9c1b ppf.py: start using stale_count 2019-01-11 05:08:32 +00:00
rofl0r
d2cb7441a8 ppf: add optional debug output 2019-01-11 05:03:40 +00:00
rofl0r
b6dba08cf0 ppf: only extract ips with port >= 10 2019-01-11 03:29:13 +00:00
rofl0r
122847d888 ppf: fix bug referencing removed db field 2019-01-11 02:53:16 +00:00
mickael
4c6a83373f split databases 2019-01-11 00:25:01 +00:00
rofl0r
087559637e ppf: improve cleanhtml() and cache compiled re's
now it transforms e.g. '<td>118.114.116.36</td>\n<td>1080</td>'
correctly.
(the newline was formerly preventing success)
2019-01-10 19:22:21 +00:00
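To illustrate the transformation this commit describes, here is a minimal sketch under stated assumptions. It is not the repository's cleanhtml(); the names `clean_html`, `extract_proxies`, and the three patterns are hypothetical. It replaces tags with a colon and collapses any run of whitespace and colons into one, so a newline between table cells no longer separates the IP from the port.

```python
import re

# Patterns are compiled once at import time so repeated calls reuse them.
TAG_RE = re.compile(r"<[^>]*>")
SEP_RE = re.compile(r"[\s:]+")
PROXY_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3}):(\d{1,5})\b")

def clean_html(text):
    """Turn tags into ':' and collapse whitespace/colon runs into a single ':'."""
    return SEP_RE.sub(":", TAG_RE.sub(":", text)).strip(":")

def extract_proxies(text):
    """Return every ip:port pair found in the cleaned text."""
    return ["%s:%s" % match for match in PROXY_RE.findall(clean_html(text))]

if __name__ == "__main__":
    print(extract_proxies("<td>118.114.116.36</td>\n<td>1080</td>"))
```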
mickael
383ae6f431 fix: no uris were tested because commented 2019-01-10 00:21:57 +00:00
mickael
da4f228479 discard urls who fail at first test 2019-01-09 23:38:59 +00:00
mickael
15dee0cd73 add -intitle:pdf to searx query 2019-01-09 23:30:55 +00:00
mickael
e94644a60e searx: loop for 10 pages on each searx instance 2019-01-09 22:55:55 +00:00
mickael
8993727f03 changed regex 2019-01-09 20:07:28 +00:00
mickael
33887385f0 is_usable_proxy: group the first 2 lines 2019-01-09 19:23:09 +00:00
mickael
9828db79d4 is_usable_proxy(): dont check twice if A < 1 2019-01-09 19:11:05 +00:00
mickael
6f0d5c1ffa modify and rename should_i_... function
> remove :port from D
> check if octets are within a correct range
2019-01-09 19:01:55 +00:00
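The two checks listed in this commit (drop the `:port` suffix, then verify each octet's range) could look roughly like the sketch below. The function name `is_valid_ipv4` is hypothetical, and the repository's actual function may apply further rules not shown here.

```python
def is_valid_ipv4(address):
    """Return True if address, with any ':port' suffix removed, has four octets in 0..255."""
    host = address.split(":", 1)[0]  # remove :port if present
    octets = host.split(".")
    if len(octets) != 4:
        return False
    # isascii() guards against non-ASCII digit characters that int() rejects.
    return all(o.isascii() and o.isdigit() and int(o) <= 255 for o in octets)

if __name__ == "__main__":
    for candidate in ("118.114.116.36:1080", "256.1.1.1:80", "1.2.3"):
        print(candidate, is_valid_ipv4(candidate))
```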
mickael
a74d6dfce8 do not save invalid IPs 2019-01-09 00:42:28 +00:00
rofl0r
6e4c45175e ppf: add safeguards against tor outage 2019-01-08 15:48:38 +00:00
rofl0r
1f3179de48 ppf: check for valid ports 2019-01-08 04:30:50 +00:00
rofl0r
9ccf8b7854 ppf: write dates as int 2019-01-08 04:19:09 +00:00
rofl0r
38d89f5bd9 ppf: add option for number of http retries 2019-01-08 03:30:31 +00:00
rofl0r
115c4a56f5 ppf: honor timeout 2019-01-08 03:25:52 +00:00
rofl0r
f16f754b0e implement combo config parser
allows all options to be overridden by command line.

e.g.
[watchd]
threads=10
debug=false

--watch.threads=50 --debug=true
2019-01-08 02:17:10 +00:00
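A combined config parser of the kind this commit describes can be sketched as follows. This is an illustration under assumptions, not the repository's implementation: the function name `load_config` is hypothetical, a `--section.option=value` prefix is assumed to match the INI section name exactly, and an option given without a section prefix is assumed to override that option in every section that defines it.

```python
import configparser

def load_config(ini_text, argv):
    """Parse INI text, then apply --section.option=value overrides from argv."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    config = {section: dict(parser.items(section)) for section in parser.sections()}
    for arg in argv:
        if not arg.startswith("--") or "=" not in arg:
            continue  # not an override argument
        key, value = arg[2:].split("=", 1)
        section, _, option = key.rpartition(".")
        if section:
            config.setdefault(section, {})[option] = value
        else:
            # Assumed rule: a bare option overrides every section that defines it.
            for options in config.values():
                if option in options:
                    options[option] = value
    return config

if __name__ == "__main__":
    ini = "[watchd]\nthreads=10\ndebug=false\n"
    print(load_config(ini, ["--watchd.threads=50", "--debug=true"]))
```

Values stay strings in this sketch; converting them to int or bool is left to the caller.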
rofl0r
e7b8d526c0 ppf: print url if fetching failed 2019-01-08 00:46:41 +00:00
mickael
1b3ce72efc add and use combining class 2019-01-07 23:19:14 +00:00
mickael
1288dca38f fixme: change var names 2019-01-07 21:41:41 +00:00
mickael
aeff09d2b3 move math function inside the sql statement 2019-01-07 21:11:08 +00:00
rofl0r
898c8f36ee ppf: fix cpu hogs 2019-01-07 15:38:51 +00:00
rofl0r
ad7c7fce67 ppf: use timeout and only 1 try for http 2019-01-07 05:37:44 +00:00
mickael
8b15faf84d ppf: change user-agent; use headers 2019-01-06 23:29:30 +00:00
mickael
3223cc82c4 use http2.py instead of requests 2019-01-06 22:22:42 +00:00
mickael
1a025f102f only load search/bad terms when "search" arg is enabled 2019-01-06 18:31:42 +00:00
mickael
5e9f8baf56 remove unused imports 2019-01-06 18:27:06 +00:00
mickael
64d9da9156 sleep even when no proxies are added 2019-01-06 02:58:58 +00:00
mickael
63b77043ac minor changes
remove comments, minimal code reorganization
2019-01-06 01:35:18 +00:00
mickael
84a1de26c3 sqlite: do not create tables with "duration" column 2019-01-06 00:50:35 +00:00
mickael
d93f4dcaf2 introduce success_count and total_duration (proxylist.sqlit
run those commands to update the database:

sqlite3 proxylist.sqlite "alter table proxylist add success_count int"
sqlite3 proxylist.sqlite "alter table proxylist add total_duration int"
sqlite3 proxylist.sqlite "update proxylist set success_count=0,total_duration=0"
2019-01-05 22:24:38 +00:00