Mickaël Serneels
c729bf666e
searx: use sample instances
...
don't loop over *all* instances
2019-05-01 17:43:28 +02:00
rofl0r
207574c815
import.txt: add chinese site
2019-05-01 17:43:28 +02:00
rofl0r
bf7ec03fbf
fetch.py: factor out twice used var
2019-05-01 17:43:28 +02:00
rofl0r
096ee21286
urignore: add some rules suppressing SEO spam
2019-05-01 17:43:28 +02:00
mickael
310b01140a
irc: implement use_ssl = 2
...
0: disabled, 1: enabled, 2: maybe
default is 0
2019-05-01 17:43:28 +02:00
mickael
0eebe4daff
populate import.txt
2019-05-01 17:43:28 +02:00
mickael
61c3ae6130
fix: define retrievals on import
2019-05-01 17:43:28 +02:00
mickael
0d1316052c
add servers.txt.sample
2019-03-05 22:29:16 +00:00
mickael
ceb840b00f
remove nonexistent server
2019-03-05 22:29:16 +00:00
mickael
1ad5ca53e5
take care of old proxies
...
test old proxies during free time
2019-03-05 22:29:16 +00:00
rofl0r
2bacf77c8c
split ppf into two programs, ppf/scraper
2019-01-18 22:53:35 +00:00
rofl0r
8400eab7ee
insert_proxies: remove 500-at-a-time logic
...
it's now done by mysqlite.py executemany.
2019-01-18 21:50:48 +00:00
rofl0r
8be5ab1567
ppf: move insert function into dbs.py
2019-01-18 21:43:17 +00:00
rofl0r
aba74c8eab
mysqlite.py: improve
...
1) use a common try/except block for all ops
2) do not display query and args when DB is locked (could be several
hundred rows)
3) re-raise non locking-related exceptions (e.g. a wrong sql statement)
4) split executemany rows into chunks of 500 (so the caller doesn't have
to do it)
2019-01-18 20:42:15 +00:00
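A minimal sketch of points 3) and 4) of aba74c8eab, assuming a plain sqlite3 connection. The names `chunks` and `executemany_chunked`, the retry count and the delay are hypothetical and are not taken from mysqlite.py; only the chunk size of 500 comes from the commit message.

```python
import sqlite3
import time

CHUNK_SIZE = 500  # rows per executemany call, as stated in the commit message


def chunks(rows, size=CHUNK_SIZE):
    """Yield successive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def executemany_chunked(conn, query, rows, retries=5, delay=0.1):
    """Run `query` for all rows in chunks, retrying a chunk while the DB is locked.

    `retries` and `delay` are illustrative values, not the project's.
    """
    for chunk in chunks(rows):
        for _ in range(retries):
            try:
                conn.executemany(query, chunk)
                conn.commit()
                break
            except sqlite3.OperationalError as e:
                if 'locked' not in str(e):
                    # Not a locking problem (e.g. a wrong SQL statement): re-raise.
                    raise
                # Do not print query or rows here: a chunk may hold hundreds of rows.
                print('database is locked, retrying')
                time.sleep(delay)
        else:
            raise sqlite3.OperationalError(
                'database still locked after %d retries' % retries)
```

With this shape the caller passes the full row list and never slices it itself.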
rofl0r
5fd693a4a2
ppf: remove more unneeded stuff
2019-01-18 19:55:54 +00:00
rofl0r
d926e66092
ppf: remove unneeded stuff
2019-01-18 19:53:55 +00:00
rofl0r
b0f92fcdcd
ppf.py: improve urignore code readability
2019-01-18 19:52:15 +00:00
rofl0r
b99f83a991
fetch.py: improve readability of extract_urls
2019-01-18 19:32:37 +00:00
rofl0r
4a41796b19
factor out http related code from ppf.py
2019-01-18 19:30:42 +00:00
rofl0r
0dad0176f3
ppf: add new field proxies_added to be able to rate sites
...
sqlite3 urls.sqlite "alter table uris add proxies_added INT"
sqlite3 urls.sqlite "update uris set proxies_added=0"
2019-01-18 15:44:09 +00:00
rofl0r
0734635e30
watchd main thread: be less nervous
2019-01-18 15:35:19 +00:00
rofl0r
ddee92d20f
watchd: introduce configurable 'outage_threshold'
2019-01-18 15:34:49 +00:00
mickael
aaac14d34e
worker: add threading lock
...
add a lock so the same proxy is not scanned multiple times when
a small number of jobs is handed to the worker
2019-01-13 16:50:54 +00:00
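A minimal sketch of the idea in aaac14d34e: job hand-out is guarded by a lock so that two worker threads can never take the same proxy. `JobQueue`, `take` and `run_workers` are hypothetical names used for illustration; the actual worker code may be organised differently.

```python
import threading


class JobQueue:
    """Hands out each pending proxy to exactly one worker thread."""

    def __init__(self, proxies):
        self._pending = list(proxies)
        self._lock = threading.Lock()

    def take(self):
        """Return the next proxy, or None when no jobs are left."""
        with self._lock:  # check-and-pop must be atomic across threads
            if self._pending:
                return self._pending.pop()
            return None


def run_workers(proxies, nthreads=8):
    """Start `nthreads` workers and return the proxies they 'scanned'."""
    queue = JobQueue(proxies)
    scanned = []
    scanned_lock = threading.Lock()

    def worker():
        while True:
            proxy = queue.take()
            if proxy is None:
                return
            # A real worker would test the proxy here; we only record it.
            with scanned_lock:
                scanned.append(proxy)

    threads = [threading.Thread(target=worker) for _ in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return scanned
```

Even with more threads than jobs, each proxy appears in the result once.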
mickael
f489f0c4dd
set retrievals to 0 for new uris
2019-01-13 16:50:54 +00:00
rofl0r
69d366f7eb
ppf: add retrievals field so we know whether an url is new
...
use
sqlite3 urls.sqlite "alter table uris add retrievals INT"
sqlite3 urls.sqlite "update uris set retrievals=0"
2019-01-13 16:40:12 +00:00
rofl0r
bc41bad9de
dbs.py: remove unused column hash
2019-01-13 16:40:12 +00:00
rofl0r
54e2c2a702
ppf: simplify statement
2019-01-13 16:40:12 +00:00
rofl0r
2f7a730311
ppf: use slice for the 500 rows limitation
2019-01-13 16:40:12 +00:00
rofl0r
d209356a85
comboparse: fix bug with bool cmd args always True
2019-01-13 16:40:12 +00:00
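The commit message for d209356a85 does not say how the bug arose. One common cause of "bool args always True" in Python is converting the command-line string with `bool()`, since any non-empty string is truthy. The sketch below shows that pitfall and one possible fix; it uses argparse and a hypothetical `--debug` option, and is not the comboparse code.

```python
import argparse


def str2bool(value):
    """Convert a command-line string to a bool, rejecting unknown spellings."""
    v = value.strip().lower()
    if v in ('1', 'true', 'yes', 'on'):
        return True
    if v in ('0', 'false', 'no', 'off'):
        return False
    raise argparse.ArgumentTypeError('expected a boolean, got %r' % value)


def build_parser():
    parser = argparse.ArgumentParser()
    # type=bool would turn '--debug False' into True, because bool('False')
    # is True; an explicit converter avoids that.
    parser.add_argument('--debug', type=str2bool, default=False)  # hypothetical option
    return parser
```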
mickael
7c7fa8836a
patch: 1y4C
2019-01-13 16:40:12 +00:00
rofl0r
24d2c08c9f
ppf: make it possible to import a file containing proxies directly
...
using --file filename.html
2019-01-11 05:45:13 +00:00
rofl0r
ecf587c8f7
ppf: set newly added sites to 0,0 (err/stale)
...
we use the tuple 0,0 later on to detect whether a site is new or not.
2019-01-11 05:23:05 +00:00
rofl0r
8b10df9c1b
ppf.py: start using stale_count
2019-01-11 05:08:32 +00:00
rofl0r
d2cb7441a8
ppf: add optional debug output
2019-01-11 05:03:40 +00:00
rofl0r
b6dba08cf0
ppf: only extract ips with port >= 10
2019-01-11 03:29:13 +00:00
rofl0r
122847d888
ppf: fix bug referencing removed db field
2019-01-11 02:53:16 +00:00
rofl0r
7d59404d31
watchd: add totals statistics
2019-01-11 00:52:11 +00:00
mickael
4c6a83373f
split databases
2019-01-11 00:25:01 +00:00
mickael
b85cb863ba
remove more dead servers
2019-01-11 00:25:01 +00:00
rofl0r
5e774b4e2a
config.py: put section name in var
...
avoids errors due to typos
2019-01-11 00:25:01 +00:00
rofl0r
ef9158015f
proxywatchd: make checktime constants configurable
...
this requires only saving the last checked time in `tested`.
you can run the following sql statement to update the existing values
in the database:
sqlite3 proxylist.sqlite \
"update proxylist set tested=tested-(1800+(failed*3600)) where failed < 6"
2019-01-11 00:25:01 +00:00
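A sketch of what ef9158015f implies, read off the SQL above: `tested` formerly held the last check time plus 1800 + failed*3600 seconds, and now holds only the last check time, with the interval computed on demand. The function and constant names are hypothetical; the values 1800, 3600 and the `failed < 6` condition are the ones in the SQL statement.

```python
CHECKTIME = 1800          # base interval in seconds (from the SQL above)
PERFAIL_CHECKTIME = 3600  # extra seconds per failure (from the SQL above)
MAX_FAIL = 6              # rows with failed >= 6 are left untouched by the SQL


def next_check(tested, failed, base=CHECKTIME, perfail=PERFAIL_CHECKTIME):
    """Time at which a proxy last checked at `tested` is due again."""
    return tested + base + failed * perfail


def is_due(tested, failed, now, base=CHECKTIME, perfail=PERFAIL_CHECKTIME):
    """True if the proxy should be re-tested at time `now`."""
    return now >= next_check(tested, failed, base, perfail)


def migrate_tested(old_tested, failed):
    """Python equivalent of the SQL update: recover the last check time."""
    if failed < MAX_FAIL:
        return old_tested - (1800 + failed * 3600)
    return old_tested
```

Because only the last check time is stored, changing `base` or `perfail` in the configuration takes effect without touching the database.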
rofl0r
087559637e
ppf: improve cleanhtml() and cache compiled re's
...
now it transforms e.g. '<td>118.114.116.36</td>\n<td>1080</td>'
correctly.
(the newline was formerly preventing success)
2019-01-10 19:22:21 +00:00
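A sketch of the behaviour described in 087559637e, assuming the intended result for the example is `118.114.116.36:1080`. The regular expressions are illustrative and are not the ones used in ppf; they are compiled once at module level, which is one way to cache them. The two-digit minimum for the port is also an assumption.

```python
import re

# Compiled once at import time instead of on every call.
_TAG_RE = re.compile(r'<[^>]*>')
_SEP_RE = re.compile(r'[\s:]+')
_PROXY_RE = re.compile(r'\b(\d{1,3}(?:\.\d{1,3}){3}):(\d{2,5})\b')


def cleanhtml(raw):
    """Replace tags with ':' and collapse whitespace/colon runs into one ':'."""
    text = _TAG_RE.sub(':', raw)
    # Newlines between table cells are folded into the separator here,
    # so they no longer split the ip from its port.
    return _SEP_RE.sub(':', text)


def extract_proxies(raw):
    """Return all 'ip:port' strings found in an HTML fragment."""
    return ['%s:%s' % pair for pair in _PROXY_RE.findall(cleanhtml(raw))]
```

Usage:

```python
extract_proxies('<td>118.114.116.36</td>\n<td>1080</td>')
# ['118.114.116.36:1080']
```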
rofl0r
befb346941
proxywatchd: preliminary support for ip caching
...
whenever we make a socks4 check, the ip of the destination server
needs to be resolved because socks4 does not support server-side
dns resolution. in order to prevent doing the same lookups over
and over, we now manually resolve the ip before first usage, and
store it in a cache.
2019-01-10 19:22:21 +00:00
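A minimal sketch of the caching idea in befb346941: resolve a hostname once, keep the result, and reuse it for later socks4 checks. `resolve_cached` and the module-level cache are hypothetical; the commit itself is described as preliminary, and this sketch does not expire entries.

```python
import socket
import threading

_ip_cache = {}
_ip_cache_lock = threading.Lock()


def resolve_cached(hostname, resolver=socket.gethostbyname):
    """Return the IPv4 address for `hostname`, resolving it at most once.

    `resolver` is injectable so the function can be exercised without network
    access. Two threads may race on a cold entry and both resolve it; that
    costs one extra lookup but is otherwise harmless.
    """
    with _ip_cache_lock:
        ip = _ip_cache.get(hostname)
    if ip is None:
        ip = resolver(hostname)  # done outside the lock: lookups can be slow
        with _ip_cache_lock:
            _ip_cache[hostname] = ip
    return ip
```

The cached address can then be handed to the socks4 client in place of the hostname, since socks4 expects the client to supply the destination ip.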
rofl0r
7067a8199f
rocksock: bump to latest
2019-01-10 19:22:21 +00:00
rofl0r
10d6b3afd8
servers.txt: remove dead server
2019-01-10 19:22:21 +00:00
mickael
383ae6f431
fix: no uris were tested because commented out
2019-01-10 00:21:57 +00:00
mserneels
0cbc434883
Merge branch 'experiment' into 'master'
...
Experiment
See merge request mserneels/ppf!11
2019-01-09 23:40:54 +00:00
mickael
da4f228479
discard urls that fail the first test
2019-01-09 23:38:59 +00:00
mickael
15dee0cd73
add -intitle:pdf to searx query
2019-01-09 23:30:55 +00:00
mickael
e94644a60e
searx: loop for 10 pages on each searx instance
2019-01-09 22:55:55 +00:00