Commit Graph

182 Commits

Author SHA1 Message Date
Mickaël Serneels
15fc29abc4 externalize searx instances into new file "searx.instances" 2019-05-01 17:43:28 +02:00
Mickaël Serneels
c194d5cfc7 scraper: add debug option 2019-05-01 17:43:28 +02:00
Mickaël Serneels
0155c6f2ad ppf: check content-type (once) before trying to download/extract proxies
avoid trying to extract stuff from pdf and such (only accept text/*)

REQUIRES:
sqlite3 websites.sqlite "alter table uris add content_type text"

Don't test known uris:
sqlite3 websites.sqlite "update uris set content_type='text/manual' WHERE error=0"
2019-05-01 17:43:28 +02:00
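
The content-type check in 0155c6f2ad can be pictured roughly as below. This is only a sketch under assumptions: the helper name `is_text_content_type` is hypothetical and is not taken from ppf, and the actual code may inspect the header differently. It shows one way to accept only `text/*` media types while ignoring parameters such as `charset`.

```python
def is_text_content_type(content_type):
    """Return True when a Content-Type header value has the form text/*.

    Hypothetical helper for illustration; not the function used in ppf.
    """
    if not content_type:
        return False
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(';', 1)[0].strip().lower()
    return media_type.startswith('text/')
```

Under this sketch, the placeholder value `text/manual` written by the second sqlite3 command would also pass the check, which would keep already-known uris eligible.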
Mickaël Serneels
e19c473514 update imports.txt 2019-05-01 17:43:28 +02:00
Mickaël Serneels
75318209ab oldies_multi: change default value from 100 to 10 2019-05-01 17:43:28 +02:00
Mickaël Serneels
d09244d04d proxywatchd: fix Exception error
Exception in thread Thread-9:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "proxywatchd.py", line 200, in workloop
    job.run()
  File "proxywatchd.py", line 123, in run
    sock, proto, duration, tor, srv, failinc = self.connect_socket()
ValueError: need more than 5 values to unpack
2019-05-01 17:43:28 +02:00
Mickaël Serneels
7aea9a3e53 irc: minimize possible response code 2019-05-01 17:43:28 +02:00
Mickaël Serneels
7b9f8b2e00 create socks4_resolve()
moves socks4 resolution out of socket_connect block
2019-05-01 17:43:28 +02:00
Mickaël Serneels
bad4d25bcf make watchd.tor_safeguard a configurable option (default: True) 2019-05-01 17:43:28 +02:00
Mickaël Serneels
59eea18bca update urignore 2019-05-01 17:43:28 +02:00
Mickaël Serneels
6427d4a645 remove that specific blogspot url 2019-05-01 17:43:28 +02:00
Mickaël Serneels
475f10560e search: more changes 2019-05-01 17:43:28 +02:00
Mickaël Serneels
8900153871 set default error value to 1 for new urls 2019-05-01 17:43:28 +02:00
Mickaël Serneels
fdd486f73c remove '-intitle:pdf' from default search 2019-05-01 17:43:28 +02:00
Mickaël Serneels
a2783bdfcf don't loop over every searx instance
randomly pick one per search, instead
2019-05-01 17:43:28 +02:00
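
The change described in a2783bdfcf, picking one instance per search rather than iterating over all of them, might look like the sketch below. The function name `pick_instance` and the example URLs are assumptions made for illustration; the commit does not show how the instance list is stored.

```python
import random

def pick_instance(instances, rng=random):
    """Pick a single search instance at random for this search.

    `instances` is assumed to be a non-empty list of base URLs.
    """
    return rng.choice(instances)

# Placeholder URLs, not real searx instances.
EXAMPLE_INSTANCES = ['https://searx-a.example', 'https://searx-b.example']
```

Passing a seeded `random.Random` as `rng` makes the choice reproducible in tests.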
Mickaël Serneels
67aec84320 fix Exception error
Exception in thread Thread-8:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "proxywatchd.py", line 191, in workloop
    job.run()
  File "proxywatchd.py", line 114, in run
    sock, proto, duration, tor, srv, failinc = self.connect_socket()
  File "proxywatchd.py", line 76, in connect_socket
    sock.send('%s\n' % random.choice(['NICK', 'USER', 'JOIN', 'MODE', 'PART', 'INVITE', 'KNOCK', 'WHOIS', 'WHO', 'NOTICE', 'PRIVMSG', 'PING', 'QUIT']))
  File "rocksock.py", line 279, in send
    return self.sock.sendall(buf)
  File "/usr/lib/python2.7/ssl.py", line 741, in sendall
    v = self.send(data[count:])
  File "/usr/lib/python2.7/ssl.py", line 707, in send
    v = self._sslobj.write(data)
error: [Errno 32] Broken pipe
2019-05-01 17:43:28 +02:00
Mickaël Serneels
003a9074d2 make server file configurable 2019-05-01 17:43:28 +02:00
Mickaël Serneels
c729bf666e searx: use sample instances
don't loop over *all* instances
2019-05-01 17:43:28 +02:00
rofl0r
207574c815 import.txt: add chinese site 2019-05-01 17:43:28 +02:00
rofl0r
bf7ec03fbf fetch.py: factor out twice used var 2019-05-01 17:43:28 +02:00
rofl0r
096ee21286 urignore: add some rules suppressing SEO spam 2019-05-01 17:43:28 +02:00
mickael
310b01140a irc: implement use_ssl = 2
0: disabled, 1: enabled, 2: maybe
default is 0
2019-05-01 17:43:28 +02:00
mickael
0eebe4daff populate import.txt 2019-05-01 17:43:28 +02:00
mickael
61c3ae6130 fix: define retrievals on import 2019-05-01 17:43:28 +02:00
mickael
0d1316052c add servers.txt.sample 2019-03-05 22:29:16 +00:00
mickael
ceb840b00f remove nonexistent server 2019-03-05 22:29:16 +00:00
mickael
1ad5ca53e5 take care of old proxies
test old proxies during free time
2019-03-05 22:29:16 +00:00
rofl0r
2bacf77c8c split ppf into two programs, ppf/scraper 2019-01-18 22:53:35 +00:00
rofl0r
8400eab7ee insert_proxies: remove 500-at-a-time logic
it's now done by mysqlite.py executemany.
2019-01-18 21:50:48 +00:00
rofl0r
8be5ab1567 ppf: move insert function into dbs.py 2019-01-18 21:43:17 +00:00
rofl0r
aba74c8eab mysqlite.py: improve
1) use a common try/except block for all ops
2) do not display query and args when DB is locked (could be several
   hundred rows)
3) re-raise non locking-related exceptions (e.g. a wrong sql statement)
4) split executemany rows into chunks of 500 (so the caller doesn't have
   to do it)
2019-01-18 20:42:15 +00:00
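
The four points listed for aba74c8eab can be sketched with the standard `sqlite3` module as follows. This is not the code from mysqlite.py: the function names, the retry count, and the delay are assumptions chosen for illustration. Only the chunk size of 500 and the lock-versus-other-error distinction come from the commit message.

```python
import sqlite3
import time

CHUNK_SIZE = 500  # chunk size named in the commit message

def chunks(rows, size=CHUNK_SIZE):
    """Yield successive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]

def executemany_chunked(conn, query, rows, retries=5, delay=0.1):
    """Run executemany chunk by chunk, retrying only on a locked database.

    `retries` and `delay` are hypothetical parameters, not from the source.
    """
    for chunk in chunks(rows):
        for _ in range(retries):
            try:
                conn.executemany(query, chunk)
                conn.commit()
                break
            except sqlite3.OperationalError as exc:
                if 'locked' not in str(exc):
                    raise  # e.g. a wrong SQL statement: re-raise as is
                # Deliberately not printing query/args: a chunk can hold
                # several hundred rows.
                time.sleep(delay)
        else:
            # Every attempt hit a locked database.
            raise sqlite3.OperationalError('database is locked')
```

With this shape the caller passes the full row list and no longer needs its own 500-at-a-time loop, which matches the follow-up removal in 8400eab7ee.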
rofl0r
5fd693a4a2 ppf: remove more unneeded stuff 2019-01-18 19:55:54 +00:00
rofl0r
d926e66092 ppf: remove unneeded stuff 2019-01-18 19:53:55 +00:00
rofl0r
b0f92fcdcd ppf.py: improve urignore code readability 2019-01-18 19:52:15 +00:00
rofl0r
b99f83a991 fetch.py: improve readability of extract_urls 2019-01-18 19:32:37 +00:00
rofl0r
4a41796b19 factor out http related code from ppf.py 2019-01-18 19:30:42 +00:00
rofl0r
0dad0176f3 ppf: add new field proxies_added to be able to rate sites
sqlite3 urls.sqlite "alter table uris add proxies_added INT"
sqlite3 urls.sqlite "update uris set proxies_added=0"
2019-01-18 15:44:09 +00:00
rofl0r
0734635e30 watchd main thread: be less nervous 2019-01-18 15:35:19 +00:00
rofl0r
ddee92d20f watchd: introduce configurable 'outage_threshold' 2019-01-18 15:34:49 +00:00
mickael
aaac14d34e worker: add threading lock
add a lock to keep the same proxy from being scanned multiple times when
a small number of jobs is handed to a worker
2019-01-13 16:50:54 +00:00
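
The idea behind aaac14d34e, guarding shared state so two worker threads do not scan the same proxy, could be sketched like this. The class `ScanTracker` and its methods are hypothetical; the commit message does not say how the lock is actually used in the worker.

```python
import threading

class ScanTracker(object):
    """Track which proxies are currently being scanned (illustrative only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_progress = set()

    def claim(self, proxy):
        """Return True if the caller may scan `proxy`, False if it is taken."""
        with self._lock:
            if proxy in self._in_progress:
                return False
            self._in_progress.add(proxy)
            return True

    def release(self, proxy):
        """Mark `proxy` as no longer being scanned."""
        with self._lock:
            self._in_progress.discard(proxy)
```

The check and the insertion happen under the same lock, so only one thread can win the claim for a given proxy even when few jobs are spread over many threads.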
mickael
f489f0c4dd set retrievals to 0 for new uris 2019-01-13 16:50:54 +00:00
rofl0r
69d366f7eb ppf: add retrievals field so we know whether an url is new
use

sqlite3 urls.sqlite "alter table uris add retrievals INT"
sqlite3 urls.sqlite "update uris set retrievals=0"
2019-01-13 16:40:12 +00:00
rofl0r
bc41bad9de dbs.py: remove unused column hash 2019-01-13 16:40:12 +00:00
rofl0r
54e2c2a702 ppf: simplify statement 2019-01-13 16:40:12 +00:00
rofl0r
2f7a730311 ppf: use slice for the 500 rows limitation 2019-01-13 16:40:12 +00:00
rofl0r
d209356a85 comboparse: fix bug with bool cmd args always True 2019-01-13 16:40:12 +00:00
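
The commit message for d209356a85 does not state the cause of the bug. One common way boolean command-line arguments end up always True in Python is converting the raw string with `bool()`, since any non-empty string is truthy. The sketch below illustrates that pitfall and a possible fix; `parse_bool` is a hypothetical helper, not the function from comboparse, and the set of accepted spellings is an assumption.

```python
def parse_bool(value):
    """Interpret a command-line string as a boolean (hypothetical helper)."""
    if isinstance(value, bool):
        return value
    # Accepted "true" spellings are an assumption for this sketch.
    return str(value).strip().lower() in ('1', 'true', 'yes', 'on')

# The pitfall: bool() on a non-empty string is always True.
naive = bool('False')
fixed = parse_bool('False')
```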
mickael
7c7fa8836a patch: 1y4C 2019-01-13 16:40:12 +00:00
rofl0r
24d2c08c9f ppf: make it possible to import a file containing proxies directly
using --file filename.html
2019-01-11 05:45:13 +00:00
rofl0r
ecf587c8f7 ppf: set newly added sites to 0,0 (err/stale)
we use the tuple 0,0 later on to detect whether a site is new or not.
2019-01-11 05:23:05 +00:00
rofl0r
8b10df9c1b ppf.py: start using stale_count 2019-01-11 05:08:32 +00:00