rofl0r
d926e66092
ppf: remove unneeded stuff
2019-01-18 19:53:55 +00:00
rofl0r
b0f92fcdcd
ppf.py: improve urignore code readability
2019-01-18 19:52:15 +00:00
rofl0r
4a41796b19
factor out http related code from ppf.py
2019-01-18 19:30:42 +00:00
rofl0r
0dad0176f3
ppf: add new field proxies_added to be able to rate sites
sqlite3 urls.sqlite "alter table uris add proxies_added INT"
sqlite3 urls.sqlite "update uris set proxies_added=0"
2019-01-18 15:44:09 +00:00
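The two sqlite3 commands above add the column and zero it for existing rows. Below is a minimal sketch of the same migration using Python's standard sqlite3 module. The helper name add_int_column is hypothetical, and the PRAGMA table_info check is an addition that makes the migration safe to run twice.

```python
import sqlite3

def add_int_column(conn, table, column):
    """Add an INT column and set it to 0 for existing rows, unless it exists.

    Hypothetical helper mirroring 'alter table ... add ... INT' followed by
    'update ... set ...=0'. Table and column names are interpolated directly
    (SQL cannot bind identifiers), so only pass trusted names.
    """
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute("PRAGMA table_info(%s)" % table)}
    if column in existing:
        return False
    conn.execute("ALTER TABLE %s ADD %s INT" % (table, column))
    conn.execute("UPDATE %s SET %s=0" % (table, column))
    conn.commit()
    return True
```

For example, add_int_column(sqlite3.connect("urls.sqlite"), "uris", "proxies_added") would apply this commit's change to an existing database.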
mickael
f489f0c4dd
set retrievals to 0 for new uris
2019-01-13 16:50:54 +00:00
rofl0r
69d366f7eb
ppf: add retrievals field so we know whether a url is new
to update the database, run:
sqlite3 urls.sqlite "alter table uris add retrievals INT"
sqlite3 urls.sqlite "update uris set retrievals=0"
2019-01-13 16:40:12 +00:00
rofl0r
54e2c2a702
ppf: simplify statement
2019-01-13 16:40:12 +00:00
rofl0r
2f7a730311
ppf: use slice for the 500 rows limitation
2019-01-13 16:40:12 +00:00
mickael
7c7fa8836a
patch: 1y4C
2019-01-13 16:40:12 +00:00
rofl0r
24d2c08c9f
ppf: make it possible to import a file containing proxies directly
using --file filename.html
2019-01-11 05:45:13 +00:00
rofl0r
ecf587c8f7
ppf: set newly added sites to 0,0 (err/stale)
we use the tuple 0,0 later on to detect whether a site is new or not.
2019-01-11 05:23:05 +00:00
rofl0r
8b10df9c1b
ppf.py: start using stale_count
2019-01-11 05:08:32 +00:00
rofl0r
d2cb7441a8
ppf: add optional debug output
2019-01-11 05:03:40 +00:00
rofl0r
b6dba08cf0
ppf: only extract ips with port >= 10
2019-01-11 03:29:13 +00:00
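A minimal sketch of extracting ip:port pairs while skipping ports below 10. The regex, the function name extract_proxies, and the upper bound of 65535 (the largest TCP port number) are assumptions for illustration, not ppf's own code.

```python
import re

# Hypothetical pattern: dotted-quad IPv4 address, a colon, then 2 to 5 digits.
PROXY_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3}):(\d{2,5})\b")

def extract_proxies(text, min_port=10, max_port=65535):
    """Return 'ip:port' strings whose port lies in [min_port, max_port]."""
    found = []
    for ip, port in PROXY_RE.findall(text):
        # The regex already rejects 1-digit ports; the range check also drops
        # zero-padded values like '08' and anything above max_port.
        if min_port <= int(port) <= max_port:
            found.append("%s:%s" % (ip, port))
    return found
```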
rofl0r
122847d888
ppf: fix bug referencing removed db field
2019-01-11 02:53:16 +00:00
mickael
4c6a83373f
split databases
2019-01-11 00:25:01 +00:00
rofl0r
087559637e
ppf: improve cleanhtml() and cache compiled re's
now it transforms e.g. '<td>118.114.116.36</td>\n<td>1080</td>'
correctly.
(previously, the newline prevented a match)
2019-01-10 19:22:21 +00:00
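A hypothetical reimplementation of the idea, not ppf's actual cleanhtml(): the patterns are compiled once at module level and reused on every call, and \s* between table cells lets the newline match. Joining adjacent cells with ':' so the result is '118.114.116.36:1080' is an assumption about what "correctly" means here.

```python
import re

# Compiled once at import time rather than on every call.
# Illustrative patterns only; not taken from ppf.
_CELL_BOUNDARY = re.compile(r"</td>\s*<td[^>]*>", re.IGNORECASE)
_TAG = re.compile(r"<[^>]+>")

def cleanhtml(html):
    # \s* also swallows the newline between the two cells, the case the
    # commit message says used to fail.
    html = _CELL_BOUNDARY.sub(":", html)
    # Drop any remaining tags such as the outer <td> and </td>.
    return _TAG.sub("", html)
```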
mickael
383ae6f431
fix: no uris were tested because the relevant code was commented out
2019-01-10 00:21:57 +00:00
mickael
da4f228479
discard urls that fail the first test
2019-01-09 23:38:59 +00:00
mickael
15dee0cd73
add -intitle:pdf to searx query
2019-01-09 23:30:55 +00:00
mickael
e94644a60e
searx: loop for 10 pages on each searx instance
2019-01-09 22:55:55 +00:00
mickael
8993727f03
changed regex
2019-01-09 20:07:28 +00:00
mickael
33887385f0
is_usable_proxy: group the first 2 lines
2019-01-09 19:23:09 +00:00
mickael
9828db79d4
is_usable_proxy(): don't check twice if A < 1
2019-01-09 19:11:05 +00:00
mickael
6f0d5c1ffa
modify and rename should_i_... function
> remove :port from D
> check that octets are within the valid range
2019-01-09 19:01:55 +00:00
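A sketch of the octet check described above, assuming proxies are stored as 'A.B.C.D:port' strings. has_valid_octets is a hypothetical name. Rejecting A < 1 (0.x.x.x) follows the "A < 1" check mentioned in the is_usable_proxy() commits.

```python
def has_valid_octets(proxy):
    """Check the IPv4 part of an 'A.B.C.D:port' string (hypothetical helper)."""
    host = proxy.rsplit(":", 1)[0]  # remove ':port' from the last octet (D)
    parts = host.split(".")
    # Exactly four purely ASCII-digit octets are required.
    if len(parts) != 4 or not all(p.isascii() and p.isdigit() for p in parts):
        return False
    a, b, c, d = (int(p) for p in parts)
    # A must be at least 1; every octet must fit in a byte.
    return 1 <= a <= 255 and all(0 <= x <= 255 for x in (b, c, d))
```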
mickael
a74d6dfce8
do not save invalid IPs
2019-01-09 00:42:28 +00:00
rofl0r
6e4c45175e
ppf: add safeguards against tor outage
2019-01-08 15:48:38 +00:00
rofl0r
1f3179de48
ppf: check for valid ports
2019-01-08 04:30:50 +00:00
rofl0r
9ccf8b7854
ppf: write dates as int
2019-01-08 04:19:09 +00:00
rofl0r
38d89f5bd9
ppf: add option for number of http retries
2019-01-08 03:30:31 +00:00
rofl0r
115c4a56f5
ppf: honor timeout
2019-01-08 03:25:52 +00:00
rofl0r
f16f754b0e
implement combo config parser
allows all options to be overridden on the command line.
e.g.
[watchd]
threads=10
debug=false
--watchd.threads=50 --debug=true
2019-01-08 02:17:10 +00:00
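A minimal sketch of such a combo parser built on Python's configparser, using the section name watchd from the example. combo_config is a hypothetical name. Applying a dotless flag like --debug=true to every section that already defines that option is an assumption, since the message does not say how bare flags are resolved.

```python
import configparser

def combo_config(ini_text, argv):
    """Parse INI text, then apply '--section.option=value' overrides from argv.

    Hypothetical sketch: a flag without a dot is applied to every section that
    already defines that option (an assumed rule).
    """
    cfg = configparser.ConfigParser()
    cfg.read_string(ini_text)
    for arg in argv:
        if not arg.startswith("--") or "=" not in arg:
            continue  # ignore anything that is not --key=value
        key, value = arg[2:].split("=", 1)
        if "." in key:
            section, option = key.split(".", 1)
            if section != cfg.default_section and not cfg.has_section(section):
                cfg.add_section(section)
            cfg.set(section, option, value)
        else:
            for section in cfg.sections():
                if cfg.has_option(section, key):
                    cfg.set(section, key, value)
    return cfg
```

Overrides are stored as strings, so typed access still goes through getint() or getboolean().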
rofl0r
e7b8d526c0
ppf: print url if fetching failed
2019-01-08 00:46:41 +00:00
mickael
1b3ce72efc
add and use combining class
2019-01-07 23:19:14 +00:00
mickael
1288dca38f
fixme: change var names
2019-01-07 21:41:41 +00:00
mickael
aeff09d2b3
move math function inside the sql statement
2019-01-07 21:11:08 +00:00
rofl0r
898c8f36ee
ppf: fix cpu hogs
2019-01-07 15:38:51 +00:00
rofl0r
ad7c7fce67
ppf: use timeout and only 1 try for http
2019-01-07 05:37:44 +00:00
mickael
8b15faf84d
ppf: change user-agent; use headers
2019-01-06 23:29:30 +00:00
mickael
3223cc82c4
use http2.py instead of requests
2019-01-06 22:22:42 +00:00
mickael
1a025f102f
only load search/bad terms when "search" arg is enabled
2019-01-06 18:31:42 +00:00
mickael
5e9f8baf56
remove unused imports
2019-01-06 18:27:06 +00:00
mickael
64d9da9156
sleep even when no proxies are added
2019-01-06 02:58:58 +00:00
mickael
63b77043ac
minor changes
remove comments, minimal code reorganization
2019-01-06 01:35:18 +00:00
mickael
84a1de26c3
sqlite: do not create tables with "duration" column
2019-01-06 00:50:35 +00:00
mickael
d93f4dcaf2
introduce success_count and total_duration (proxylist.sqlit
run these commands to update the database:
sqlite3 proxylist.sqlite "alter table proxylist add success_count int"
sqlite3 proxylist.sqlite "alter table proxylist add total_duration int"
sqlite3 proxylist.sqlite "update proxylist set success_count=0,total_duration=0"
2019-01-05 22:24:38 +00:00
rofl0r
af8f82924f
fix logic so threads do an orderly shutdown
basically the issue was that the main loop received the SIGINT
and therefore broke out before reaching the parts of the code
that handle bringing down the child threads.
so there's now a finish() method that needs to be called
after stop().
because sqlite dbs insist on being used from the thread that created
the object, the DB cleanup operations are done from the thread
that controls it.
for standalone operation, in order to keep the main thread alive,
an additional run() method is used. this is not necessary when
used via ppf.py.
2019-01-05 17:17:27 +00:00
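A minimal sketch of the stop()/finish() split, not jobwatchd's actual code. The Watcher class and its loop are hypothetical. stop() only sets an event, finish() joins the worker thread, and the sqlite3 connection is opened and closed inside the worker. This matters because, by default, a Python sqlite3 connection may only be used by the thread that created it.

```python
import sqlite3
import threading
import time

class Watcher:
    """Hypothetical sketch of a controller thread that owns its db."""

    def __init__(self, db_path=":memory:"):
        self.db_path = db_path
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._loop)
        self.ticks = 0

    def start(self):
        self._thread.start()

    def _loop(self):
        # Connection created here, so only this thread may use or close it.
        db = sqlite3.connect(self.db_path)
        try:
            while not self._stop_event.is_set():
                db.execute("select 1")  # stand-in for real work
                self.ticks += 1
                self._stop_event.wait(0.01)
        finally:
            db.close()  # cleanup runs in the thread that owns the db

    def stop(self):
        # Only signals the worker; returns immediately.
        self._stop_event.set()

    def finish(self):
        # Blocks until the worker (and its db cleanup) has ended.
        self._thread.join()

    def run(self):
        # Standalone mode: keep the main thread alive until Ctrl-C.
        self.start()
        try:
            while True:
                time.sleep(1)
        except KeyboardInterrupt:
            pass
        finally:
            self.stop()
            self.finish()
```

When embedded in another program, that program calls start() itself and later stop() followed by finish(); run() is only needed to keep a standalone process alive.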
rofl0r
bb3da7122e
ppf: properly reraise keyboard interrupts
2019-01-05 17:11:08 +00:00
rofl0r
9ac3ed45d6
rewrite threading code in jobwatchd
now it distributes the tasks properly among all threads,
and it can be used as a standalone program.
there are some minor performance issues which will be fixed shortly.
2019-01-05 06:35:41 +00:00
rofl0r
ffbe450aee
outsource configuration to external module
2019-01-05 03:47:03 +00:00