Your Name
78b29a1187
some changes
2021-01-24 03:52:56 +01:00
Mickaël Serneels
fe2353acb2
update urignore
2019-05-30 21:17:46 +02:00
Mickaël Serneels
d6b1880ade
urignore: modify entry
2019-05-17 23:00:18 +02:00
Mickaël Serneels
f179080cca
use geoloc
...
now saves proxy's country in db
2019-05-17 22:59:32 +02:00
Mickaël Serneels
eeedf9d0a1
extract urls only from the same domain? (default: False)
...
setting this option will make ppf not follow external links when extracting uris
2019-05-14 21:24:29 +02:00
Mickaël Serneels
b226bc0b03
check if bad url *after* building the url
2019-05-14 19:31:19 +02:00
Mickaël Serneels
eeae849e12
space2tab
2019-05-14 19:29:30 +02:00
Mickaël Serneels
bcaf7af0e7
extract_urls(): only when stale_count = 0
2019-05-13 23:49:35 +02:00
Mickaël Serneels
e2122a27d9
ppf: strip extracted uris
2019-05-13 23:48:55 +02:00
Mickaël Serneels
225b76462c
import_from_file: don't add empty url
2019-05-13 23:48:55 +02:00
Mickaël Serneels
99330204bc
add new ignores
2019-05-13 23:48:55 +02:00
Mickaël Serneels
c241f1a766
make use of dbs.insert_urls()
2019-05-01 23:19:50 +02:00
Mickaël Serneels
c8d594fb73
add url extraction
...
urls get extracted from a webpage when the page contains proxies
this allows ppf to "learn" as many links as possible from a working website
2019-05-01 22:58:23 +02:00
rofl0r
866f308322
proxywatchd: remove bogus blanket exception handler
...
this would catch *any* exception, including typos
2019-05-01 20:05:57 +01:00
rofl0r
01435671c1
add latest rocksock
2019-05-01 20:04:30 +01:00
Mickaël Serneels
0fb706eeae
clean code
2019-05-01 17:43:29 +02:00
Mickaël Serneels
9a624819d3
check content type
2019-05-01 17:43:29 +02:00
Mickaël Serneels
0962019386
add own searx instance
2019-05-01 17:43:29 +02:00
Mickaël Serneels
70b6285394
scraper: more changes
2019-05-01 17:43:29 +02:00
Mickaël Serneels
482cf79676
scraper: make query configurable (Proxies, Websites, Search)
...
--scraper.query = 'pws'
2019-05-01 17:43:28 +02:00
Mickaël Serneels
15fc29abc4
externalize searx instances into new file "searx.instances"
2019-05-01 17:43:28 +02:00
Mickaël Serneels
c194d5cfc7
scraper: add debug option
2019-05-01 17:43:28 +02:00
Mickaël Serneels
0155c6f2ad
ppf: check content-type (once) before trying to download/extract proxies
...
avoid trying to extract stuff from pdf and such (only accept text/*)
REQUIRES:
sqlite3 websites.sqlite "alter table uris add content_type text"
Don't test known uris:
sqlite3 websites.sqlite "update uris set content_type='text/manual' WHERE error=0"
2019-05-01 17:43:28 +02:00
Mickaël Serneels
e19c473514
update imports.txt
2019-05-01 17:43:28 +02:00
Mickaël Serneels
75318209ab
oldies_multi: change default value from 100 to 10
2019-05-01 17:43:28 +02:00
Mickaël Serneels
d09244d04d
proxywatchd: fix Exception error
...
Exception in thread Thread-9:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "proxywatchd.py", line 200, in workloop
    job.run()
  File "proxywatchd.py", line 123, in run
    sock, proto, duration, tor, srv, failinc = self.connect_socket()
ValueError: need more than 5 values to unpack
2019-05-01 17:43:28 +02:00
Mickaël Serneels
7aea9a3e53
irc: minimize possible response code
2019-05-01 17:43:28 +02:00
Mickaël Serneels
7b9f8b2e00
create socks4_resolve()
...
moves socks4 resolution out of socket_connect block
2019-05-01 17:43:28 +02:00
Mickaël Serneels
bad4d25bcf
make watchd.tor_safeguard a configurable option (default: True)
2019-05-01 17:43:28 +02:00
Mickaël Serneels
59eea18bca
update urignore
2019-05-01 17:43:28 +02:00
Mickaël Serneels
6427d4a645
remove that specific blogspot url
2019-05-01 17:43:28 +02:00
Mickaël Serneels
475f10560e
search: more changes
2019-05-01 17:43:28 +02:00
Mickaël Serneels
8900153871
set default error value to 1 for new urls
2019-05-01 17:43:28 +02:00
Mickaël Serneels
fdd486f73c
remove '-intitle:pdf' from default search
2019-05-01 17:43:28 +02:00
Mickaël Serneels
a2783bdfcf
don't loop over every searx instance
...
randomly pick one per search, instead
2019-05-01 17:43:28 +02:00
Mickaël Serneels
67aec84320
fix Exception error
...
Exception in thread Thread-8:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "proxywatchd.py", line 191, in workloop
    job.run()
  File "proxywatchd.py", line 114, in run
    sock, proto, duration, tor, srv, failinc = self.connect_socket()
  File "proxywatchd.py", line 76, in connect_socket
    sock.send('%s\n' % random.choice(['NICK', 'USER', 'JOIN', 'MODE', 'PART', 'INVITE', 'KNOCK', 'WHOIS', 'WHO', 'NOTICE', 'PRIVMSG', 'PING', 'QUIT']))
  File "rocksock.py", line 279, in send
    return self.sock.sendall(buf)
  File "/usr/lib/python2.7/ssl.py", line 741, in sendall
    v = self.send(data[count:])
  File "/usr/lib/python2.7/ssl.py", line 707, in send
    v = self._sslobj.write(data)
error: [Errno 32] Broken pipe
2019-05-01 17:43:28 +02:00
Mickaël Serneels
003a9074d2
make server file configurable
2019-05-01 17:43:28 +02:00
Mickaël Serneels
c729bf666e
searx: use sample instances
...
don't loop over *all* instances
2019-05-01 17:43:28 +02:00
rofl0r
207574c815
import.txt: add chinese site
2019-05-01 17:43:28 +02:00
rofl0r
bf7ec03fbf
fetch.py: factor out twice used var
2019-05-01 17:43:28 +02:00
rofl0r
096ee21286
urignore: add some rules suppressing SEO spam
2019-05-01 17:43:28 +02:00
mickael
310b01140a
irc: implement use_ssl = 2
...
0: disabled, 1: enabled, 2: maybe
default is 0
2019-05-01 17:43:28 +02:00
mickael
0eebe4daff
populate import.txt
2019-05-01 17:43:28 +02:00
mickael
61c3ae6130
fix: define retrievals on import
2019-05-01 17:43:28 +02:00
mickael
0d1316052c
add servers.txt.sample
2019-03-05 22:29:16 +00:00
mickael
ceb840b00f
remove nonexistent server
2019-03-05 22:29:16 +00:00
mickael
1ad5ca53e5
take care of old proxies
...
test old proxies during free time
2019-03-05 22:29:16 +00:00
rofl0r
2bacf77c8c
split ppf into two programs, ppf/scraper
2019-01-18 22:53:35 +00:00
rofl0r
8400eab7ee
insert_proxies: remove 500-at-a-time logic
...
it's now done by mysqlite.py executemany.
2019-01-18 21:50:48 +00:00
rofl0r
8be5ab1567
ppf: move insert function into dbs.py
2019-01-18 21:43:17 +00:00