Commit Graph

113 Commits

Author SHA1 Message Date
rofl0r
d926e66092 ppf: remove unneeded stuff 2019-01-18 19:53:55 +00:00
rofl0r
b0f92fcdcd ppf.py: improve urignore code readability 2019-01-18 19:52:15 +00:00
rofl0r
4a41796b19 factor out http related code from ppf.py 2019-01-18 19:30:42 +00:00
rofl0r
0dad0176f3 ppf: add new field proxies_added to be able to rate sites
sqlite3 urls.sqlite "alter table uris add proxies_added INT"
sqlite3 urls.sqlite "update uris set proxies_added=0"
2019-01-18 15:44:09 +00:00
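The two sqlite3 commands above can also be applied from Python. This is only a sketch, not code from the repository: the helper name add_int_column is hypothetical, and it assumes the table and column names are trusted identifiers, since they are interpolated into the SQL text.

```python
import sqlite3

def add_int_column(conn, table, column, default=0):
    """Add an INT column and initialise it, unless it already exists.

    Hypothetical helper; table/column must be trusted identifiers.
    """
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk)
    cols = [row[1] for row in conn.execute('PRAGMA table_info(%s)' % table)]
    if column in cols:
        return False  # already migrated, nothing to do
    conn.execute('ALTER TABLE %s ADD COLUMN %s INT' % (table, column))
    conn.execute('UPDATE %s SET %s = ?' % (table, column), (default,))
    conn.commit()
    return True
```

Calling add_int_column(sqlite3.connect("urls.sqlite"), "uris", "proxies_added") a second time is a no-op, so the migration can be rerun safely.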
mickael
f489f0c4dd set retrievals to 0 for new uris 2019-01-13 16:50:54 +00:00
rofl0r
69d366f7eb ppf: add retrievals field so we know whether an url is new
use

sqlite3 urls.sqlite "alter table uris add retrievals INT"
sqlite3 urls.sqlite "update uris set retrievals=0"
2019-01-13 16:40:12 +00:00
rofl0r
54e2c2a702 ppf: simplify statement 2019-01-13 16:40:12 +00:00
rofl0r
2f7a730311 ppf: use slice for the 500 rows limitation 2019-01-13 16:40:12 +00:00
mickael
7c7fa8836a patch: 1y4C 2019-01-13 16:40:12 +00:00
rofl0r
24d2c08c9f ppf: make it possible to import a file containing proxies directly
using --file filename.html
2019-01-11 05:45:13 +00:00
rofl0r
ecf587c8f7 ppf: set newly added sites to 0,0 (err/stale)
we use the tuple 0,0 later on to detect whether a site is new or not.
2019-01-11 05:23:05 +00:00
rofl0r
8b10df9c1b ppf.py: start using stale_count 2019-01-11 05:08:32 +00:00
rofl0r
d2cb7441a8 ppf: add optional debug output 2019-01-11 05:03:40 +00:00
rofl0r
b6dba08cf0 ppf: only extract ips with port >= 10 2019-01-11 03:29:13 +00:00
rofl0r
122847d888 ppf: fix bug referencing removed db field 2019-01-11 02:53:16 +00:00
mickael
4c6a83373f split databases 2019-01-11 00:25:01 +00:00
rofl0r
087559637e ppf: improve cleanhtml() and cache compiled re's
now it transforms e.g. '<td>118.114.116.36</td>\n<td>1080</td>'
correctly.
(the newline was formerly preventing success)
2019-01-10 19:22:21 +00:00
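A rough illustration of the idea in the commit above: tags and whitespace (including newlines) are collapsed into a single separator before ip:port pairs are matched, and the regular expressions are compiled once at module level. The names clean_html and extract_proxies and the exact patterns are assumptions for this sketch, not the actual cleanhtml() from ppf.py.

```python
import re

# Compiled once and reused (hypothetical patterns, not the ones in ppf.py)
_TAG_RE = re.compile(r'<[^>]*>')
_SEP_RE = re.compile(r'[\s:]+')
_PROXY_RE = re.compile(r'(\d{1,3}(?:\.\d{1,3}){3}):(\d{1,5})')

def clean_html(text):
    """Replace every tag with ':' and collapse whitespace/colon runs."""
    text = _TAG_RE.sub(':', text)
    return _SEP_RE.sub(':', text)

def extract_proxies(html):
    """Return 'ip:port' strings found in the cleaned text."""
    return ['%s:%s' % pair for pair in _PROXY_RE.findall(clean_html(html))]
```

With this sketch, extract_proxies('<td>118.114.116.36</td>\n<td>1080</td>') yields ['118.114.116.36:1080'], because the newline between the two cells is folded into the separator.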
mickael
383ae6f431 fix: no uris were tested because commented 2019-01-10 00:21:57 +00:00
mickael
da4f228479 discard urls that fail the first test 2019-01-09 23:38:59 +00:00
mickael
15dee0cd73 add -intitle:pdf to searx query 2019-01-09 23:30:55 +00:00
mickael
e94644a60e searx: loop for 10 pages on each searx instance 2019-01-09 22:55:55 +00:00
mickael
8993727f03 changed regex 2019-01-09 20:07:28 +00:00
mickael
33887385f0 is_usable_proxy: group the first 2 lines 2019-01-09 19:23:09 +00:00
mickael
9828db79d4 is_usable_proxy(): don't check twice if A < 1 2019-01-09 19:11:05 +00:00
mickael
6f0d5c1ffa modify and rename should_i_... function
> remove :port from D
> check if octets are within a correct range
2019-01-09 19:01:55 +00:00
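The two steps listed in the commit above (drop the :port part, then range-check the octets) might look roughly like this. The function name octets_in_range is hypothetical, and the rule that the first octet must be at least 1 is an assumption taken from the later "A < 1" commit message; the real is_usable_proxy() may apply further checks.

```python
def octets_in_range(proxy):
    """Check the IPv4 part of an 'ip:port' (or bare 'ip') string.

    Hypothetical sketch: first octet 1..255, remaining octets 0..255.
    """
    host = proxy.split(':', 1)[0]      # remove :port if present
    parts = host.split('.')
    if len(parts) != 4:
        return False
    if not all(p.isdigit() for p in parts):
        return False                   # rejects empty, signed or non-numeric parts
    octets = [int(p) for p in parts]
    if octets[0] < 1:                  # assumed rule: first octet must not be 0
        return False
    return all(o <= 255 for o in octets)
```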
mickael
a74d6dfce8 do not save invalid IPs 2019-01-09 00:42:28 +00:00
rofl0r
6e4c45175e ppf: add safeguards against tor outage 2019-01-08 15:48:38 +00:00
rofl0r
1f3179de48 ppf: check for valid ports 2019-01-08 04:30:50 +00:00
rofl0r
9ccf8b7854 ppf: write dates as int 2019-01-08 04:19:09 +00:00
rofl0r
38d89f5bd9 ppf: add option for number of http retries 2019-01-08 03:30:31 +00:00
rofl0r
115c4a56f5 ppf: honor timeout 2019-01-08 03:25:52 +00:00
rofl0r
f16f754b0e implement combo config parser
allows all options to be overridden by command line.

e.g.
[watchd]
threads=10
debug=false

--watch.threads=50 --debug=true
2019-01-08 02:17:10 +00:00
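One way such a combined parser could work is sketched below with the standard configparser module. It is not the parser from this commit: the function name load_config is hypothetical, and the handling of a bare option such as --debug=true (applied to every section that already defines that key) is an assumption.

```python
import configparser

def load_config(ini_text, argv):
    """Parse INI text, then let --section.key=value arguments override it."""
    cp = configparser.ConfigParser()
    cp.read_string(ini_text)
    for arg in argv:
        if not arg.startswith('--') or '=' not in arg:
            continue                       # not an override, ignore it
        name, value = arg[2:].split('=', 1)
        if '.' in name:
            section, key = name.split('.', 1)
            if section != cp.default_section and not cp.has_section(section):
                cp.add_section(section)
            cp.set(section, key, value)
        else:
            # Assumption: a bare key overrides it in every section defining it
            for section in cp.sections():
                if cp.has_option(section, name):
                    cp.set(section, name, value)
    return cp
```

For example, load_config("[watchd]\nthreads=10\ndebug=false\n", ["--watchd.threads=50", "--debug=true"]) returns a parser where getint("watchd", "threads") is 50 and getboolean("watchd", "debug") is True.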
rofl0r
e7b8d526c0 ppf: print url if fetching failed 2019-01-08 00:46:41 +00:00
mickael
1b3ce72efc add and use combining class 2019-01-07 23:19:14 +00:00
mickael
1288dca38f fixme: change var names 2019-01-07 21:41:41 +00:00
mickael
aeff09d2b3 move math function inside the sql statement 2019-01-07 21:11:08 +00:00
rofl0r
898c8f36ee ppf: fix cpu hogs 2019-01-07 15:38:51 +00:00
rofl0r
ad7c7fce67 ppf: use timeout and only 1 try for http 2019-01-07 05:37:44 +00:00
mickael
8b15faf84d ppf: change user-agent; use headers 2019-01-06 23:29:30 +00:00
mickael
3223cc82c4 use http2.py instead of requests 2019-01-06 22:22:42 +00:00
mickael
1a025f102f only load search/bad terms when "search" arg is enabled 2019-01-06 18:31:42 +00:00
mickael
5e9f8baf56 remove unused imports 2019-01-06 18:27:06 +00:00
mickael
64d9da9156 sleep even when no proxies are added 2019-01-06 02:58:58 +00:00
mickael
63b77043ac minor changes
remove comments, minimal code reorganization
2019-01-06 01:35:18 +00:00
mickael
84a1de26c3 sqlite: do not create tables with "duration" column 2019-01-06 00:50:35 +00:00
mickael
d93f4dcaf2 introduce success_count and total_duration (proxylist.sqlit
run those commands to update the database:

sqlite3 proxylist.sqlite "alter table proxylist add success_count int"
sqlite3 proxylist.sqlite "alter table proxylist add total_duration int"
sqlite3 proxylist.sqlite "update proxylist set success_count=0,total_duration=0"
2019-01-05 22:24:38 +00:00
rofl0r
af8f82924f fix logic so threads do an orderly shutdown
basically the issue was that the main loop received the SIGINT
and therefore broke out before reaching the parts of the code
that care about bringing down the child threads.

therefore there's now a finish() method that needs to be called
after stop().

because sqlite db objects insist on being used from the thread that
created them, the DB cleanup operations are done from the thread
that controls the db.

for standalone operation, in order to keep the main thread alive,
an additional run() method is used. this is not necessary when
used via ppf.py.
2019-01-05 17:17:27 +00:00
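The shutdown sequence described above can be sketched as follows. The class and attribute names (Watcher, stopping, controller) are hypothetical and the worker body is a placeholder; only the stop()/finish()/run() split and the rule that the sqlite connection is opened and closed in the controlling thread come from the commit message.

```python
import sqlite3
import threading
import time

class Watcher:
    def __init__(self, db_path=':memory:', workers=2):
        self.db_path = db_path
        self.n_workers = workers
        self.stopping = threading.Event()
        self.workers = []
        self.db_closed = False
        self.controller = threading.Thread(target=self._control)

    def start(self):
        self.controller.start()

    def _work(self):
        while not self.stopping.is_set():
            self.stopping.wait(0.05)       # placeholder for real work

    def _control(self):
        # sqlite3 connections must be used from the thread that created them,
        # so the connection is opened and closed here.
        db = sqlite3.connect(self.db_path)
        self.workers = [threading.Thread(target=self._work)
                        for _ in range(self.n_workers)]
        for w in self.workers:
            w.start()
        self.stopping.wait()               # block until stop() is called
        for w in self.workers:
            w.join()                       # bring down the child threads
        db.commit()
        db.close()
        self.db_closed = True

    def stop(self):
        self.stopping.set()                # only signals, does not wait

    def finish(self):
        self.controller.join()             # wait for cleanup to complete

    def run(self):
        # Standalone use: keep the main thread alive until Ctrl-C.
        self.start()
        try:
            while True:
                time.sleep(1)
        except KeyboardInterrupt:
            self.stop()
            self.finish()
```

A caller that embeds the watcher would call start(), and on shutdown stop() followed by finish(); run() is only needed when nothing else keeps the main thread alive.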
rofl0r
bb3da7122e ppf: properly reraise keyboard interrupts 2019-01-05 17:11:08 +00:00
rofl0r
9ac3ed45d6 rewrite threading code in jobwatchd
now it distributes the tasks properly among all threads,
and it can be used as a standalone program.
there are some minor performance issues which will be fixed shortly.
2019-01-05 06:35:41 +00:00
rofl0r
ffbe450aee outsource configuration to external module 2019-01-05 03:47:03 +00:00