I'm making extensive whois queries in generating spam reports (everyone needs a hobby, right...). Which...is...slow....
...so I was excited to discover "jwhois", a caching whois client. This creates a cache (/var/cache/jwhois/jwhois.db) of previously requested domains. In spam lookups this is convenient, as 100 domains account for well over half my spam (1403 total domains recorded).

Problem is: jwhois only caches lookups where it already knows the server. The trick, then, is to seed the cache. I'd already performed a host lookup on some 5000+ spams I've received (since early November! -- and yes, caching DNS helps tons), so the following does the trick:

Assuming domains in /tmp/spamdomains-ranked, in the following format (modify recipe to suit):

------------------------------------------------------------------------
     1     345 kornet.net
     2     156 freeserve.com
     3     148 comcast.net
     4     138 rr.com
     5     132 guangzhou.gd.cn
     6     107 uu.net
     7     104 attbi.com
     8      95 dacom.co.kr
     9      67 pacbell.net
    10      64 wanadoo.fr
------------------------------------------------------------------------

    for dom in $(
        # Extract domains from list, get rid of any numeric IPs which
        # have snuck through.
        awk '{print $3}' /tmp/spamdomains-ranked |
            sed -e '/^[0-9]\{1,3\}\.[0-9]\{1,3\}/d'
    )
    do
        echo -e "\n>>> $dom <<<"
        # Recursive query.  Query the second time, using the
        # WHOIS server indicated by the first pull
        jwhois -h $(
            jwhois -h whois.internic.net $dom | head | grep '^\[' |
                tail -1 | sed -e 's/[][]//g' -e 's/^$/whois.internic.net/'
        ) $dom | head -2
    done

...that's a serial query, which can bog down on timeouts for any given domain.
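A minimal sketch of the server-extraction step from the recursive query, pulled out into a function so it can be tried without touching the network. The name referral_server and the sample input are made up for illustration, and the bracketed-line format is only assumed from the grep/sed filter above, not checked against jwhois documentation. Unlike the inline filter, it also falls back to whois.internic.net when no bracketed line turns up at all.

```shell
#!/bin/sh
# referral_server (hypothetical name): read whois output on stdin and
# print the server named in the last bracketed line among the first ten
# lines.  Falls back to whois.internic.net when no such line is found.
referral_server() {
    server=$(head | grep '^\[' | tail -1 | sed -e 's/[][]//g')
    echo "${server:-whois.internic.net}"
}

# Made-up sample input, for illustration only.
# prints: whois.example.net
printf '[whois.internic.net]\n[whois.example.net]\nDomain: example.com\n' |
    referral_server
```

In the loop above it would slot in as: jwhois -h "$( jwhois -h whois.internic.net "$dom" | referral_server )" "$dom"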
To speed processing, batch requests, e.g.:

    step=40     # Number of requests to batch in simultaneous submits
    for s in $( seq 1 $step $( wc -l /tmp/spamdomains-ranked | awk '{print $1}' ) )
    do
        e=$(( s + step - 1 ))
        echo "e: $e"
        for dom in $(
            awk '{print $3}' /tmp/spamdomains-ranked |
                sed -e '/^[0-9]\{1,3\}\.[0-9]\{1,3\}/d' |
                sed -ne "${s},${e}p"
        )
        do
            # Background each lookup so the whole batch is in flight at
            # once.  Output from concurrent lookups may interleave; the
            # point here is seeding the cache.
            {
                echo -e "\n>>> $dom <<<"
                jwhois -h $(
                    jwhois -h whois.internic.net $dom | head | grep '^\[' |
                        tail -1 | sed -e 's/[][]//g' -e 's/^$/whois.internic.net/'
                ) $dom | head -2
            } &
        done
        wait    # Let this batch finish before starting the next
    done

Alternatively, sleep for 5-20 seconds between batches rather than 'wait'ing.

What I don't have is a way to periodically repeat this seeding, which would be useful, though using the recursive lookup in scripts could satisfy most needs.

Peace.

--
Karsten M. Self <[EMAIL PROTECTED]>        http://kmself.home.netcom.com/
 What Part of "Gestalt" don't you understand?

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?