diff --git a/ROADMAP.md b/ROADMAP.md
index b0b49ee..952fd1a 100644
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -37,11 +37,12 @@ Profiling-driven optimizations and source pipeline hardening.
 | Item | Status | Description |
 |------|--------|-------------|
 | Extraction short-circuits | done | Guard clauses in fetch.py extractors |
-| Skip shutdown on failed sockets | pending | Avoid 39s/session wasted on dead connections |
-| SQLite connection reuse (odin) | pending | Cache per-greenlet, eliminate 2.7k opens/session |
-| Lazy-load ASN database | pending | Defer 3.6s startup cost to first lookup |
-| Add more seed sources (100+) | pending | Expand beyond 37 hardcoded URLs |
-| Protocol-aware source weighting | pending | Prioritize SOCKS5-yielding sources |
+| Skip shutdown on failed sockets | done | Track _connected flag, skip shutdown on dead sockets |
+| SQLite connection reuse (odin) | done | Per-greenlet cached handles via threading.local |
+| Lazy-load ASN database | done | Defer ipasn.dat parsing to first lookup |
+| Add more seed sources (100+) | done | Expanded to 120+ URLs with SOCKS5-specific sources |
+| Protocol-aware source weighting | done | Dynamic SOCKS boost in claim_urls scoring |
+| Sharpen error penalty in URL scoring | done | Reduce erroring URL claim frequency |

 ## Phase 2: Proxy Diversity and Consumer API

@@ -79,6 +80,13 @@ Worker-driven link discovery from productive pages.
 | Item | Date | Description |
 |------|------|-------------|
+| Sharpen URL error penalty | 2026-02-22 | error*0.5 cap 4.0 + stale*0.2 cap 1.5 |
+| SOCKS5 source expansion | 2026-02-22 | Added 10 new SOCKS5-specific sources |
+| SQLite connection reuse | 2026-02-22 | Per-greenlet cached handles via threading.local |
+| Lazy-load ASN database | 2026-02-22 | Deferred ipasn.dat to first lookup |
+| Socket shutdown skip | 2026-02-22 | _connected flag, skip shutdown on dead sockets |
+| Protocol-aware weighting | 2026-02-22 | Dynamic SOCKS boost in claim_urls scoring |
+| Seed sources expanded | 2026-02-22 | 37 -> 120+ URLs |
 | last_seen freshness fix | 2026-02-22 | Watchd updates last_seen on verification |
 | Periodic re-seeding | 2026-02-22 | Reset errored sources every 6h |
 | ASN enrichment | 2026-02-22 | Pure-Python ipasn.dat reader + backfill |
diff --git a/TASKLIST.md b/TASKLIST.md
index 8a6c0c8..bf588e3 100644
--- a/TASKLIST.md
+++ b/TASKLIST.md
@@ -8,16 +8,11 @@ Active execution queue. Ordered by priority.

 | # | Task | File(s) | Notes |
 |---|------|---------|-------|
-| 1 | Skip socket.shutdown on failed connections | rocksock.py | ~39s/session saved on workers |
-| 4 | Add more seed sources (100+) | dbs.py | Expand PROXY_SOURCES list |
-| 6 | Protocol-aware source weighting | httpd.py, ppf.py | Prioritize SOCKS5-yielding sources |

 ## Queued

 | # | Task | File(s) | Notes |
 |---|------|---------|-------|
-| 2 | SQLite connection reuse on odin | httpd.py | Cache per-greenlet handle |
-| 3 | Lazy-load ASN database | httpd.py | Defer to first lookup |
 | 12 | API filters on /proxies (proto/country/ASN) | httpd.py | Consumer query params |
 | 8 | Graduated recheck intervals | proxywatchd.py | Fresh proxies checked more often |

@@ -25,6 +20,13 @@ Active execution queue. Ordered by priority.
 | # | Task | Date |
 |---|------|------|
+| - | Sharpen URL error penalty scoring | 2026-02-22 |
+| - | Add SOCKS5-specific sources (10 new) | 2026-02-22 |
+| 3 | Lazy-load ASN database | 2026-02-22 |
+| 2 | SQLite connection reuse on odin | 2026-02-22 |
+| 1 | Skip socket.shutdown on failed connections | 2026-02-22 |
+| 4 | Add more seed sources (100+) | 2026-02-22 |
+| 6 | Protocol-aware source weighting | 2026-02-22 |
 | - | Extraction short-circuits | 2026-02-22 |
 | - | last_seen freshness fix | 2026-02-22 |
 | - | Periodic re-seeding | 2026-02-22 |
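
Reviewer note: the ROADMAP entry "error*0.5 cap 4.0 + stale*0.2 cap 1.5" is terse, so here is one possible reading as a minimal sketch. It assumes each term grows linearly with a counter and then saturates at its cap, and that the two terms are summed. The function name `url_penalty` and the parameters `error_count` and `stale_count` are hypothetical; this diff does not show the actual scoring code in claim_urls.

```python
def url_penalty(error_count, stale_count):
    """Hypothetical penalty term for URL scoring (names are assumptions).

    Each component grows linearly with its counter and then saturates,
    so a chronically failing URL is penalized heavily but boundedly.
    """
    error_term = min(error_count * 0.5, 4.0)  # error*0.5, capped at 4.0
    stale_term = min(stale_count * 0.2, 1.5)  # stale*0.2, capped at 1.5
    return error_term + stale_term


if __name__ == "__main__":
    # A URL with 2 errors and 5 stale fetches vs. one that is far past both caps.
    print(url_penalty(2, 5))
    print(url_penalty(100, 100))
```

Under this reading the error term saturates after 8 errors and the stale term after 8 stale fetches, so the combined penalty never exceeds 5.5.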