Commit Graph

203 Commits

Author SHA1 Message Date
Your Name e15b9d2994 more changes 2021-02-04 23:06:37 +01:00
Your Name 78b29a1187 some changes 2021-01-24 03:52:56 +01:00
Mickaël Serneels fe2353acb2 update urignore 2019-05-30 21:17:46 +02:00
Mickaël Serneels d6b1880ade urignore: modify entry 2019-05-17 23:00:18 +02:00
Mickaël Serneels f179080cca use geoloc
now saves proxy's country in db
2019-05-17 22:59:32 +02:00
Mickaël Serneels eeedf9d0a1 extract urls only from the same domain? (default: False)
setting this option will make ppf not follow external links when extracting uris
2019-05-14 21:24:29 +02:00
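A minimal Python 3 sketch of what the same-domain option in eeedf9d0a1 does, assuming links are taken from `href` attributes with a naive regex. `extract_urls` and `same_domain_only` are illustrative names, not ppf's actual code. It also strips each link and builds the absolute URL before filtering, in the spirit of commits e2122a27d9 and b226bc0b03.

```python
import re
from urllib.parse import urljoin, urlparse

# Naive href matcher; a real scraper would use an HTML parser.
HREF_RE = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def extract_urls(html, base_url, same_domain_only=False):
    """Collect absolute http(s) links found in href attributes of `html`.

    When same_domain_only is True, links whose host differs from the host
    of base_url are dropped, i.e. external links are not followed.
    """
    base_host = urlparse(base_url).netloc.lower()
    urls = []
    for raw in HREF_RE.findall(html):
        # Strip whitespace, then build the absolute URL before any filtering.
        url = urljoin(base_url, raw.strip())
        parsed = urlparse(url)
        if parsed.scheme not in ('http', 'https'):
            continue  # skip mailto:, javascript:, etc.
        if same_domain_only and parsed.netloc.lower() != base_host:
            continue  # external link
        if url not in urls:
            urls.append(url)
    return urls


page = '<a href=" /list ">a</a> <a href="https://other.example/p">b</a>'
print(extract_urls(page, 'https://site.example/index.html', same_domain_only=True))
```

With the option off, both links are returned; with it on, only `https://site.example/list` survives.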
Mickaël Serneels b226bc0b03 check if bad url *after* building the url 2019-05-14 19:31:19 +02:00
Mickaël Serneels eeae849e12 space2tab 2019-05-14 19:29:30 +02:00
Mickaël Serneels bcaf7af0e7 extract_urls(): only when stale_count = 0 2019-05-13 23:49:35 +02:00
Mickaël Serneels e2122a27d9 ppf: strip extracted uris 2019-05-13 23:48:55 +02:00
Mickaël Serneels 225b76462c import_from_file: don't add empty url 2019-05-13 23:48:55 +02:00
Mickaël Serneels 99330204bc add new ignores 2019-05-13 23:48:55 +02:00
Mickaël Serneels c241f1a766 make use of dbs.insert_urls() 2019-05-01 23:19:50 +02:00
Mickaël Serneels c8d594fb73 add url extraction
urls get extracted from a webpage when the page contains proxies

this allows ppf to "learn" as many links as possible from a working website
2019-05-01 22:58:23 +02:00
rofl0r 866f308322 proxywatchd: remove bogus blanket exception handler
this would catch *any* exception, including typos
2019-05-01 20:05:57 +01:00
rofl0r 01435671c1 add latest rocksock 2019-05-01 20:04:30 +01:00
Mickaël Serneels 0fb706eeae clean code 2019-05-01 17:43:29 +02:00
Mickaël Serneels 9a624819d3 check content type 2019-05-01 17:43:29 +02:00
Mickaël Serneels 0962019386 add own searx instance 2019-05-01 17:43:29 +02:00
Mickaël Serneels 70b6285394 scraper: more changes 2019-05-01 17:43:29 +02:00
Mickaël Serneels 482cf79676 scraper: make query configurable (Proxies, Websites, Search)
--scraper.query = 'pws'
2019-05-01 17:43:28 +02:00
Mickaël Serneels 15fc29abc4 externalize searx instances into new file "searx.instances" 2019-05-01 17:43:28 +02:00
Mickaël Serneels c194d5cfc7 scraper: add debug option 2019-05-01 17:43:28 +02:00
Mickaël Serneels 0155c6f2ad ppf: check content-type (once) before trying to download/extract proxies
avoid trying to extract proxies from PDFs and similar non-text files (only accept text/*)

REQUIRES:
sqlite3 websites.sqlite "alter table uris add content_type text"

Don't test known uris:
sqlite3 websites.sqlite "update uris set content_type='text/manual' WHERE error=0"
2019-05-01 17:43:28 +02:00
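A sketch of the "check content-type once" idea from 0155c6f2ad, using the standard `sqlite3` module and the two statements from the commit message. The minimal `uris` schema (`url`, `error`), the helper names, and treating a missing header as non-text are assumptions. Note that the `text/manual` marker presumably works because it still passes a `text/*` check.

```python
import sqlite3


def is_text_media(content_type):
    """True if a Content-Type value names a text/* media type."""
    if not content_type:
        return False  # assumption: a missing header is treated as non-text
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(';', 1)[0].strip().lower()
    return media_type.startswith('text/')


def uris_needing_check(conn):
    """URLs whose content type has not been recorded yet (checked once)."""
    rows = conn.execute("SELECT url FROM uris WHERE content_type IS NULL")
    return [url for (url,) in rows]


# Demo on an in-memory database with an assumed minimal uris schema.
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE uris (url TEXT, error INTEGER)")
conn.executemany("INSERT INTO uris VALUES (?, ?)",
                 [('http://a.example/list', 0), ('http://b.example/new', 1)])
# The two statements from the commit message:
conn.execute("ALTER TABLE uris ADD content_type TEXT")
conn.execute("UPDATE uris SET content_type='text/manual' WHERE error=0")
print(uris_needing_check(conn))  # ['http://b.example/new']
```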
Mickaël Serneels e19c473514 update imports.txt 2019-05-01 17:43:28 +02:00
Mickaël Serneels 75318209ab oldies_multi: change default value from 100 to 10 2019-05-01 17:43:28 +02:00
Mickaël Serneels d09244d04d proxywatchd: fix Exception error
Exception in thread Thread-9:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "proxywatchd.py", line 200, in workloop
    job.run()
  File "proxywatchd.py", line 123, in run
    sock, proto, duration, tor, srv, failinc = self.connect_socket()
ValueError: need more than 5 values to unpack
2019-05-01 17:43:28 +02:00
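The `ValueError` in d09244d04d arises when `connect_socket()` returns a tuple shorter than the six names on the left of the assignment. The log does not show which return path was short; the sketch below (a hypothetical stand-in, not the real `proxywatchd.py` method) shows one way to keep every path at a fixed arity with a `namedtuple`.

```python
from collections import namedtuple

# Field names mirror the unpacking shown in the traceback above.
ConnectResult = namedtuple('ConnectResult', 'sock proto duration tor srv failinc')


def connect_socket(ok):
    """Hypothetical stand-in: every return path yields all six fields."""
    if ok:
        return ConnectResult('sock', 'socks5', 0.42, 'tor-host', 'irc.example', 0)
    # Failure path: fill the unused slots with None instead of returning
    # a shorter tuple, which is what triggers the unpacking error.
    return ConnectResult(None, None, None, None, None, 1)


sock, proto, duration, tor, srv, failinc = connect_socket(False)
print(sock, failinc)  # None 1
```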
Mickaël Serneels 7aea9a3e53 irc: minimize possible response code 2019-05-01 17:43:28 +02:00
Mickaël Serneels 7b9f8b2e00 create socks4_resolve()
moves socks4 resolution out of socket_connect block
2019-05-01 17:43:28 +02:00
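Plain SOCKS4 (unlike SOCKS4a and SOCKS5) carries only a 4-byte IPv4 destination, which is why a hostname must be resolved by the client before connecting, as 7b9f8b2e00 factors out. The sketch below uses only the standard library; `socks4_resolve` borrows the commit's name but is not ppf's implementation, and `socks4_connect_request` is a hypothetical helper that builds the CONNECT request bytes.

```python
import ipaddress
import socket
import struct


def socks4_resolve(host):
    """Return an IPv4 address string for host, resolving names locally."""
    try:
        return str(ipaddress.IPv4Address(host))  # already an IPv4 literal
    except ValueError:
        return socket.gethostbyname(host)  # DNS lookup (IPv4 only)


def socks4_connect_request(host, port, userid=b''):
    """Build a SOCKS4 CONNECT request: VN=4, CD=1, DSTPORT, DSTIP, USERID, NUL."""
    ip = socks4_resolve(host)
    return struct.pack('>BBH', 4, 1, port) + socket.inet_aton(ip) + userid + b'\x00'


print(socks4_connect_request('127.0.0.1', 6667).hex())  # 04011a0b7f00000100
```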
Mickaël Serneels bad4d25bcf make watchd.tor_safeguard a configurable option (default: True) 2019-05-01 17:43:28 +02:00
Mickaël Serneels 59eea18bca update urignore 2019-05-01 17:43:28 +02:00
Mickaël Serneels 6427d4a645 remove that specific blogspot url 2019-05-01 17:43:28 +02:00
Mickaël Serneels 475f10560e search: more changes 2019-05-01 17:43:28 +02:00
Mickaël Serneels 8900153871 set default error value to 1 for new urls 2019-05-01 17:43:28 +02:00
Mickaël Serneels fdd486f73c remove '-intitle:pdf' from default search 2019-05-01 17:43:28 +02:00
Mickaël Serneels a2783bdfcf don't loop over every searx instance
randomly pick one per search instead
2019-05-01 17:43:28 +02:00
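A sketch of picking one searx instance per search (a2783bdfcf) from a list loaded out of a file like "searx.instances" (15fc29abc4). The file format assumed here (one URL per line, blank lines and `#` comments ignored) and both function names are illustrative, not taken from ppf.

```python
import random


def load_instances(text):
    """Parse one instance URL per line; blank lines and '#' comments skipped."""
    instances = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith('#'):
            instances.append(line)
    return instances


def pick_instance(instances, rng=random):
    """Choose a single instance for this search instead of looping over all."""
    if not instances:
        raise ValueError('no searx instances configured')
    return rng.choice(instances)


pool = load_instances("# public instances\nhttps://searx.example\n\nhttps://search.example\n")
print(pick_instance(pool, random.Random(0)))
```

Passing a seeded `random.Random` makes the choice reproducible in tests; production code can rely on the module-level default.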
Mickaël Serneels 67aec84320 fix Exception error
Exception in thread Thread-8:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "proxywatchd.py", line 191, in workloop
    job.run()
  File "proxywatchd.py", line 114, in run
    sock, proto, duration, tor, srv, failinc = self.connect_socket()
  File "proxywatchd.py", line 76, in connect_socket
    sock.send('%s\n' % random.choice(['NICK', 'USER', 'JOIN', 'MODE', 'PART', 'INVITE', 'KNOCK', 'WHOIS', 'WHO', 'NOTICE', 'PRIVMSG', 'PING', 'QUIT']))
  File "rocksock.py", line 279, in send
    return self.sock.sendall(buf)
  File "/usr/lib/python2.7/ssl.py", line 741, in sendall
    v = self.send(data[count:])
  File "/usr/lib/python2.7/ssl.py", line 707, in send
    v = self._sslobj.write(data)
error: [Errno 32] Broken pipe
2019-05-01 17:43:28 +02:00
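The traceback in 67aec84320 comes from writing to a TLS socket whose peer has already closed it. A narrow handler avoids the problem that 866f308322 later calls out (a blanket handler "would catch *any* exception, including typos"). This is a Python 3 sketch with a hypothetical `send_probe` helper that takes any object with a `sendall()` method; it is not the actual fix in `proxywatchd.py`.

```python
def send_probe(sock, line):
    """Send one IRC command line; return False if the peer has gone away.

    Only OSError is caught. In Python 3 that covers BrokenPipeError,
    ConnectionResetError and ssl.SSLError, while typos and other bugs
    still raise instead of being silently swallowed.
    """
    try:
        sock.sendall(('%s\n' % line).encode('ascii'))
        return True
    except OSError:
        return False


class ClosedPeer(object):
    """Fake socket whose remote end has closed the connection."""

    def sendall(self, data):
        raise BrokenPipeError(32, 'Broken pipe')


print(send_probe(ClosedPeer(), 'PING'))  # False
```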
Mickaël Serneels 003a9074d2 make server file configurable 2019-05-01 17:43:28 +02:00
Mickaël Serneels c729bf666e searx: use sample instances
don't loop over *all* instances
2019-05-01 17:43:28 +02:00
rofl0r 207574c815 import.txt: add Chinese site 2019-05-01 17:43:28 +02:00
rofl0r bf7ec03fbf fetch.py: factor out twice used var 2019-05-01 17:43:28 +02:00
rofl0r 096ee21286 urignore: add some rules suppressing SEO spam 2019-05-01 17:43:28 +02:00
mickael 310b01140a irc: implement use_ssl = 2
0: disabled, 1: enabled, 2: maybe
default is 0
2019-05-01 17:43:28 +02:00
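A sketch of how a tri-state `use_ssl` setting (310b01140a) could map to a per-connection decision. The log only says "2: maybe"; interpreting that as a random coin flip per connection is an assumption, and `resolve_use_ssl` is a hypothetical name.

```python
import random


def resolve_use_ssl(use_ssl, rng=random):
    """Map the use_ssl setting to a per-connection decision.

    0 = never, 1 = always, 2 = "maybe" (assumed here to mean a coin flip
    for each connection).
    """
    if use_ssl == 0:
        return False
    if use_ssl == 1:
        return True
    if use_ssl == 2:
        return rng.random() < 0.5
    raise ValueError('use_ssl must be 0, 1 or 2, got %r' % (use_ssl,))
```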
mickael 0eebe4daff populate import.txt 2019-05-01 17:43:28 +02:00
mickael 61c3ae6130 fix: define retrievals on import 2019-05-01 17:43:28 +02:00
mickael 0d1316052c add servers.txt.sample 2019-03-05 22:29:16 +00:00
mickael ceb840b00f remove nonexistent server 2019-03-05 22:29:16 +00:00
mickael 1ad5ca53e5 take care of old proxies
test old proxies during free time
2019-03-05 22:29:16 +00:00
rofl0r 2bacf77c8c split ppf into two programs, ppf/scraper 2019-01-18 22:53:35 +00:00
rofl0r 8400eab7ee insert_proxies: remove 500-at-a-time logic
it's now done by mysqlite.py executemany.
2019-01-18 21:50:48 +00:00
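A sketch of replacing manual 500-at-a-time batching (8400eab7ee) with a single `executemany()` call. It uses the standard `sqlite3` module directly rather than ppf's `mysqlite.py` wrapper; the `proxylist` table, its schema, and `INSERT OR IGNORE` for duplicates are assumptions.

```python
import sqlite3


def insert_proxies(conn, proxies):
    """Insert all proxies in a single executemany() call.

    executemany() runs the statement once per parameter tuple, so no
    manual chunking is needed. Duplicates are ignored.
    """
    with conn:  # one transaction for the whole batch
        conn.executemany(
            "INSERT OR IGNORE INTO proxylist (proxy) VALUES (?)",
            ((p,) for p in proxies))


conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE proxylist (proxy TEXT PRIMARY KEY)")  # assumed schema
batch = ['10.0.%d.%d:8080' % (i // 256, i % 256) for i in range(1200)]
insert_proxies(conn, batch + ['10.0.0.5:8080'])  # last entry is a duplicate
print(conn.execute("SELECT COUNT(*) FROM proxylist").fetchone()[0])  # 1200
```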