Avoid trying to extract content from PDFs and other non-text responses (only accept text/* content types).
REQUIRES:
sqlite3 websites.sqlite "alter table uris add content_type text"
Don't re-test known URIs:
sqlite3 websites.sqlite "update uris set content_type='text/manual' WHERE error=0"
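A minimal sketch of the text/* filter described above. The function name is hypothetical and not taken from the project; it assumes the raw Content-Type header value is available as a string. Under this check, the 'text/manual' placeholder written by the update statement would also pass.

```python
def is_text_content_type(header_value):
    """Return True when a Content-Type header value names a text/* type."""
    if not header_value:
        return False
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = header_value.split(';', 1)[0].strip().lower()
    return media_type.startswith('text/')
```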
Exception in thread Thread-9:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "proxywatchd.py", line 200, in workloop
    job.run()
  File "proxywatchd.py", line 123, in run
    sock, proto, duration, tor, srv, failinc = self.connect_socket()
ValueError: need more than 5 values to unpack
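The ValueError above means connect_socket() returned five values on some path while the caller unpacks six. One possible way to avoid this, sketched with a hypothetical stand-in function and made-up values (only the six variable names come from the traceback), is to return the same number of items on every path:

```python
def connect_socket_sketch(ok):
    """Hypothetical stand-in for connect_socket(); always returns 6 values."""
    if not ok:
        # Failure path: pad with None so the caller can still unpack six names.
        return None, None, None, None, None, 1
    # Success path: placeholder values, for illustration only.
    return 'sock', 'socks5', 0.25, '127.0.0.1:9050', 'irc.example.org', 0


# Unpacking works on both paths because the arity is identical.
sock, proto, duration, tor, srv, failinc = connect_socket_sketch(False)
```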
Exception in thread Thread-8:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "proxywatchd.py", line 191, in workloop
    job.run()
  File "proxywatchd.py", line 114, in run
    sock, proto, duration, tor, srv, failinc = self.connect_socket()
  File "proxywatchd.py", line 76, in connect_socket
    sock.send('%s\n' % random.choice(['NICK', 'USER', 'JOIN', 'MODE', 'PART', 'INVITE', 'KNOCK', 'WHOIS', 'WHO', 'NOTICE', 'PRIVMSG', 'PING', 'QUIT']))
  File "rocksock.py", line 279, in send
    return self.sock.sendall(buf)
  File "/usr/lib/python2.7/ssl.py", line 741, in sendall
    v = self.send(data[count:])
  File "/usr/lib/python2.7/ssl.py", line 707, in send
    v = self._sslobj.write(data)
error: [Errno 32] Broken pipe
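The broken pipe above escapes from send() and kills the worker thread. A minimal sketch of one way to contain it, assuming any object with a sendall() method; the helper name is hypothetical and this does not use rocksock's own API or exception types:

```python
import errno
import socket


def send_or_fail(sock, data):
    """Return True if data was sent, False if the peer closed the connection."""
    try:
        sock.sendall(data)
        return True
    except socket.error as e:
        # A closed peer is an expected failure for a proxy check, not a crash.
        if e.errno in (errno.EPIPE, errno.ECONNRESET):
            return False
        # Anything else is unexpected and is passed on to the caller.
        raise
```

The caller can then count a False result as a failed check instead of letting the exception terminate the thread.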
1) use a common try/except block for all operations
2) do not display the query and args when the DB is locked (could be
several hundred rows)
3) re-raise exceptions that are not related to locking (e.g. an invalid
SQL statement)
4) split executemany rows into chunks of 500 (so the caller doesn't have
to do it)
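The four points above could look roughly like the sketch below. It uses the standard sqlite3 module; the function names, retry count and delay are assumptions for illustration and are not the project's actual wrapper.

```python
import sqlite3
import time

CHUNK_SIZE = 500  # chunk size named in the notes above


def chunks(rows, size=CHUNK_SIZE):
    """Yield successive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def run_with_retry(op, retries=5, delay=0.1):
    """Common try/except for all operations (retries and delay are assumed)."""
    attempt = 0
    while True:
        try:
            return op()
        except sqlite3.OperationalError as e:
            # Only a locked database is retried; anything else, such as an
            # invalid SQL statement, is re-raised unchanged.
            if 'locked' not in str(e) or attempt >= retries:
                raise
            attempt += 1
            # Deliberately omit the query and args: they may hold hundreds of rows.
            print('database is locked, retrying (%d/%d)' % (attempt, retries))
            time.sleep(delay)


def executemany_chunked(conn, query, rows):
    """Run executemany in chunks so the caller does not have to split rows."""
    rows = list(rows)
    for chunk in chunks(rows):
        run_with_retry(lambda c=chunk: conn.executemany(query, c))
    conn.commit()
```

A caller would pass the full row list, for example executemany_chunked(conn, 'insert into t values (?)', rows), and let the helper handle splitting and lock retries.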