mickael
|
61c3ae6130
|
fix: define retrievals on import
|
2019-05-01 17:43:28 +02:00 |
|
rofl0r
|
2bacf77c8c
|
split ppf into two programs, ppf/scraper
|
2019-01-18 22:53:35 +00:00 |
|
rofl0r
|
8be5ab1567
|
ppf: move insert function into dbs.py
|
2019-01-18 21:43:17 +00:00 |
|
rofl0r
|
5fd693a4a2
|
ppf: remove more unneeded stuff
|
2019-01-18 19:55:54 +00:00 |
|
rofl0r
|
d926e66092
|
ppf: remove unneeded stuff
|
2019-01-18 19:53:55 +00:00 |
|
rofl0r
|
b0f92fcdcd
|
ppf.py: improve urignore code readability
|
2019-01-18 19:52:15 +00:00 |
|
rofl0r
|
4a41796b19
|
factor out http related code from ppf.py
|
2019-01-18 19:30:42 +00:00 |
|
rofl0r
|
0dad0176f3
|
ppf: add new field proxies_added to be able to rate sites
sqlite3 urls.sqlite "alter table uris add proxies_added INT"
sqlite3 urls.sqlite "update uris set proxies_added=0"
|
2019-01-18 15:44:09 +00:00 |
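The two sqlite3 commands in 0dad0176f3 add `proxies_added` and backfill it with 0. They fail if run a second time because the column already exists. The following is a minimal sketch of an idempotent version using Python's `sqlite3` module. The helper name `add_int_column` is hypothetical and is not part of ppf.

```python
import sqlite3

def add_int_column(conn, table, column, default=0):
    """Add an INT column to `table` if it is missing, then backfill it with `default`."""
    # PRAGMA table_info returns one row per column; field 1 is the column name.
    # Identifiers cannot be bound as parameters, so `table`/`column` must be trusted.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column not in existing:
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} INT")
        conn.execute(f"UPDATE {table} SET {column} = ?", (default,))
        conn.commit()
```

Usage would be `add_int_column(sqlite3.connect("urls.sqlite"), "uris", "proxies_added")`. Running it again does nothing.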
|
mickael
|
f489f0c4dd
|
set retrievals to 0 for new uris
|
2019-01-13 16:50:54 +00:00 |
|
rofl0r
|
69d366f7eb
|
ppf: add retrievals field so we know whether an url is new
to update an existing database, run:
sqlite3 urls.sqlite "alter table uris add retrievals INT"
sqlite3 urls.sqlite "update uris set retrievals=0"
|
2019-01-13 16:40:12 +00:00 |
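Commit 69d366f7eb uses `retrievals` to tell whether a URL is new, and f489f0c4dd sets it to 0 for newly inserted URIs. Under those two commits, a new URL is simply a row with `retrievals = 0`. This is a hedged sketch of that bookkeeping. The `url` column name and the helper functions are assumptions for illustration, not ppf's actual code.

```python
import sqlite3

def add_uri(conn, url):
    """Insert a URI with retrievals=0 so it is recognised as new (assumed schema)."""
    conn.execute("INSERT INTO uris (url, retrievals) VALUES (?, 0)", (url,))
    conn.commit()

def mark_retrieved(conn, url):
    """Bump the retrieval counter after the URI has been fetched."""
    conn.execute("UPDATE uris SET retrievals = retrievals + 1 WHERE url = ?", (url,))
    conn.commit()

def new_uris(conn):
    """Return the URIs that have never been retrieved."""
    return [row[0] for row in conn.execute(
        "SELECT url FROM uris WHERE retrievals = 0 ORDER BY url")]
```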
|
rofl0r
|
54e2c2a702
|
ppf: simplify statement
|
2019-01-13 16:40:12 +00:00 |
|
rofl0r
|
2f7a730311
|
ppf: use slice for the 500 rows limitation
|
2019-01-13 16:40:12 +00:00 |
|
mickael
|
7c7fa8836a
|
patch: 1y4C
|
2019-01-13 16:40:12 +00:00 |
|
rofl0r
|
24d2c08c9f
|
ppf: make it possible to import a file containing proxies directly
using --file filename.html
|
2019-01-11 05:45:13 +00:00 |
|
rofl0r
|
ecf587c8f7
|
ppf: set newly added sites to 0,0 (err/stale)
we use the tuple 0,0 later on to detect whether a site is new or not.
|
2019-01-11 05:23:05 +00:00 |
|
rofl0r
|
8b10df9c1b
|
ppf.py: start using stale_count
|
2019-01-11 05:08:32 +00:00 |
|
rofl0r
|
d2cb7441a8
|
ppf: add optional debug output
|
2019-01-11 05:03:40 +00:00 |
|
rofl0r
|
b6dba08cf0
|
ppf: only extract ips with port >= 10
|
2019-01-11 03:29:13 +00:00 |
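Commit b6dba08cf0 keeps only `ip:port` matches whose port is at least 10. Commit 1f3179de48 checks for valid ports. This sketch combines both ideas in one filter. The upper bound of 65535 is the TCP port range. The regex and the function name `extract_proxies` are illustrative assumptions, not the project's pattern.

```python
import re

# Compiled once: a dotted quad followed by ":port" (1 to 5 digits).
PROXY_RE = re.compile(r'\b(\d{1,3}(?:\.\d{1,3}){3}):(\d{1,5})\b')

def extract_proxies(text, min_port=10, max_port=65535):
    """Return "ip:port" strings whose port lies in [min_port, max_port]."""
    found = []
    for ip, port in PROXY_RE.findall(text):
        if min_port <= int(port) <= max_port:
            found.append(f"{ip}:{port}")
    return found
```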
|
rofl0r
|
122847d888
|
ppf: fix bug referencing removed db field
|
2019-01-11 02:53:16 +00:00 |
|
mickael
|
4c6a83373f
|
split databases
|
2019-01-11 00:25:01 +00:00 |
|
rofl0r
|
087559637e
|
ppf: improve cleanhtml() and cache compiled re's
now it transforms e.g. '<td>118.114.116.36</td>\n<td>1080</td>'
correctly.
(the newline was formerly preventing success)
|
2019-01-10 19:22:21 +00:00 |
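Commit 087559637e makes `cleanhtml()` handle a newline between table cells and caches the compiled regexes. The original function body is not shown here. The following is a hypothetical reimplementation that reproduces the documented example. Its approach is to compile the patterns once at module level, replace a cell boundary (with any whitespace, including `\n`) by `:`, and then strip the remaining tags.

```python
import re

# Compiled once at import time and reused on every call.
_CELL_BREAK = re.compile(r'</td>\s*<td[^>]*>', re.IGNORECASE)
_TAG = re.compile(r'<[^>]+>')

def cleanhtml(html):
    """Join adjacent <td> cells with ':' and drop all remaining tags."""
    joined = _CELL_BREAK.sub(':', html)
    return _TAG.sub('', joined).strip()
```

Because every adjacent cell is joined, a row with more than two columns yields extra colon-separated fields. The `ip:port` extraction step would then have to pick out the relevant pair.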
|
mickael
|
383ae6f431
|
fix: no uris were tested because the code was commented out
|
2019-01-10 00:21:57 +00:00 |
|
mickael
|
da4f228479
|
discard urls that fail the first test
|
2019-01-09 23:38:59 +00:00 |
|
mickael
|
15dee0cd73
|
add -intitle:pdf to searx query
|
2019-01-09 23:30:55 +00:00 |
|
mickael
|
e94644a60e
|
searx: loop for 10 pages on each searx instance
|
2019-01-09 22:55:55 +00:00 |
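Commit e94644a60e loops over 10 result pages on each searx instance. This sketch shows one way to build those request URLs. It assumes the instance accepts `q` and `pageno` query parameters on `/search`. The instance URLs and the generator name are placeholders, not the project's configuration.

```python
from urllib.parse import urlencode

def searx_page_urls(instances, query, pages=10):
    """Yield one search URL per (instance, page) pair, pages numbered from 1."""
    for base in instances:
        for pageno in range(1, pages + 1):
            params = urlencode({"q": query, "pageno": pageno})
            yield f"{base.rstrip('/')}/search?{params}"
```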
|
mickael
|
8993727f03
|
changed regex
|
2019-01-09 20:07:28 +00:00 |
|
mickael
|
33887385f0
|
is_usable_proxy: group the first 2 lines
|
2019-01-09 19:23:09 +00:00 |
|
mickael
|
9828db79d4
|
is_usable_proxy(): don't check twice if A < 1
|
2019-01-09 19:11:05 +00:00 |
|
mickael
|
6f0d5c1ffa
|
modify and rename should_i_... function
> remove :port from D
> check if octets are within a correct range
|
2019-01-09 19:01:55 +00:00 |
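The `is_usable_proxy` checks described in 6f0d5c1ffa and 9828db79d4 are: strip `:port` from the last octet, reject `A < 1`, and require every octet to be in range. They can be sketched as follows. This is an illustration under those stated rules only, not the function as committed.

```python
def is_usable_proxy(proxy):
    """Return True if `proxy` ("A.B.C.D:port") carries a plausible IPv4 address."""
    host = proxy.split(':', 1)[0]        # remove ":port" so D is a bare number
    octets = host.split('.')
    if len(octets) != 4 or not all(o.isdigit() for o in octets):
        return False
    a, b, c, d = (int(o) for o in octets)
    if a < 1:                            # 0.0.0.0/8 is reserved, never a usable host
        return False
    return all(o <= 255 for o in (a, b, c, d))
```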
|
mickael
|
a74d6dfce8
|
do not save invalid IPs
|
2019-01-09 00:42:28 +00:00 |
|
rofl0r
|
6e4c45175e
|
ppf: add safeguards against tor outage
|
2019-01-08 15:48:38 +00:00 |
|
rofl0r
|
1f3179de48
|
ppf: check for valid ports
|
2019-01-08 04:30:50 +00:00 |
|
rofl0r
|
9ccf8b7854
|
ppf: write dates as int
|
2019-01-08 04:19:09 +00:00 |
|
rofl0r
|
38d89f5bd9
|
ppf: add option for number of http retries
|
2019-01-08 03:30:31 +00:00 |
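Commit 38d89f5bd9 makes the number of HTTP retries configurable, and 115c4a56f5 honors a timeout. The sketch below shows the retry loop in isolation. ppf's own HTTP layer (`http2.py`, per 3223cc82c4) is not shown, so the request is abstracted as a hypothetical `fetch(url, timeout)` callable that raises `OSError` subclasses (timeouts, connection errors) on failure.

```python
import time

def fetch_with_retries(fetch, url, retries=1, timeout=15, delay=0.0):
    """Call fetch(url, timeout) up to `retries` times; return None if every try fails."""
    for attempt in range(retries):
        try:
            return fetch(url, timeout)
        except OSError:
            # TimeoutError and ConnectionError are both OSError subclasses.
            if attempt + 1 < retries and delay:
                time.sleep(delay)
    return None
```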
|
rofl0r
|
115c4a56f5
|
ppf: honor timeout
|
2019-01-08 03:25:52 +00:00 |
|
rofl0r
|
f16f754b0e
|
implement combo config parser
allows all options to be overridden by command line.
e.g.
[watchd]
threads=10
debug=false
--watchd.threads=50 --debug=true
|
2019-01-08 02:17:10 +00:00 |
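The combo config parser in f16f754b0e reads an ini file and lets command-line options override it. The following is a minimal sketch built on the standard `configparser` module. It handles only the `--section.key=value` form, for example `--watchd.threads=50`, and ignores bare `--key=value` options, because the commit does not say how those map to sections. The name `load_config` is hypothetical.

```python
import configparser

def load_config(ini_text, argv):
    """Parse ini_text, then apply --section.key=value overrides from argv."""
    cfg = configparser.ConfigParser()
    cfg.read_string(ini_text)
    for arg in argv:
        if not arg.startswith('--') or '=' not in arg:
            continue                      # not an override of the form --name=value
        name, value = arg[2:].split('=', 1)
        if '.' not in name:
            continue                      # bare keys are not handled in this sketch
        section, key = name.split('.', 1)
        if not cfg.has_section(section):
            cfg.add_section(section)
        cfg.set(section, key, value)
    return cfg
```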
|
rofl0r
|
e7b8d526c0
|
ppf: print url if fetching failed
|
2019-01-08 00:46:41 +00:00 |
|
mickael
|
1b3ce72efc
|
add and use combining class
|
2019-01-07 23:19:14 +00:00 |
|
mickael
|
1288dca38f
|
fixme: change var names
|
2019-01-07 21:41:41 +00:00 |
|
mickael
|
aeff09d2b3
|
move math function inside the sql statement
|
2019-01-07 21:11:08 +00:00 |
|
rofl0r
|
898c8f36ee
|
ppf: fix cpu hogs
|
2019-01-07 15:38:51 +00:00 |
|
rofl0r
|
ad7c7fce67
|
ppf: use timeout and only 1 try for http
|
2019-01-07 05:37:44 +00:00 |
|
mickael
|
8b15faf84d
|
ppf: change user-agent; use headers
|
2019-01-06 23:29:30 +00:00 |
|
mickael
|
3223cc82c4
|
use http2.py instead of requests
|
2019-01-06 22:22:42 +00:00 |
|
mickael
|
1a025f102f
|
only load search/bad terms when "search" arg is enabled
|
2019-01-06 18:31:42 +00:00 |
|
mickael
|
5e9f8baf56
|
remove unused imports
|
2019-01-06 18:27:06 +00:00 |
|
mickael
|
64d9da9156
|
sleep even when no proxies are added
|
2019-01-06 02:58:58 +00:00 |
|
mickael
|
63b77043ac
|
minor changes
remove comments, minimal code reorganization
|
2019-01-06 01:35:18 +00:00 |
|
mickael
|
84a1de26c3
|
sqlite: do not create tables with "duration" column
|
2019-01-06 00:50:35 +00:00 |
|
mickael
|
d93f4dcaf2
|
introduce success_count and total_duration (proxylist.sqlit
run these commands to update the database:
sqlite3 proxylist.sqlite "alter table proxylist add success_count int"
sqlite3 proxylist.sqlite "alter table proxylist add total_duration int"
sqlite3 proxylist.sqlite "update proxylist set success_count=0,total_duration=0"
|
2019-01-05 22:24:38 +00:00 |
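With `success_count` and `total_duration` from d93f4dcaf2, a proxy's average response time is `total_duration / success_count`. That division is only valid once at least one check has succeeded. This is a hedged sketch of updating and querying both fields with `sqlite3`. The `proxy` column name is an assumption, and the duration unit is whatever the checker records.

```python
import sqlite3

def record_success(conn, proxy, duration):
    """Count one successful check and add its duration to the running total."""
    conn.execute(
        "UPDATE proxylist SET success_count = success_count + 1, "
        "total_duration = total_duration + ? WHERE proxy = ?",
        (duration, proxy))
    conn.commit()

def average_durations(conn):
    """Return (proxy, mean duration) for proxies with at least one success."""
    # CAST avoids integer division; the WHERE clause avoids dividing by zero.
    return conn.execute(
        "SELECT proxy, CAST(total_duration AS REAL) / success_count "
        "FROM proxylist WHERE success_count > 0 ORDER BY proxy").fetchall()
```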
|