Using a shadowsocks-netty SOCKS Proxy with HttpClient4

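Preface
Recently I wanted to batch-download some videos from overseas websites. I had previously written a proxy program, shadowsocks-netty, and planned to use it directly as the client-side proxy.
HttpClient4 also supports SOCKS proxies, so I decided to use HttpClient4 to access the overseas sites and video resources.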

HttpClient4 version

<dependency>
	<groupId>org.apache.httpcomponents</groupId>
	<artifactId>httpclient</artifactId>
	<version>4.3.6</version>
</dependency>

Accessing a website
The proxy IP and port are set to localhost and 1080.
The overseas site to access has the hostname www.google.com.
The code is as follows:

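import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.Socket;
import java.net.SocketTimeoutException;

import org.apache.http.HttpHost;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.protocol.HttpClientContext;
import org.apache.http.config.Registry;
import org.apache.http.config.RegistryBuilder;
import org.apache.http.conn.ConnectTimeoutException;
import org.apache.http.conn.socket.ConnectionSocketFactory;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;
import org.apache.http.protocol.HttpContext;
import org.apache.http.util.EntityUtils;

public class ClientExecuteSOCKS {

	/** Proxy parameters: IP + port **/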
	private static String PROXY_IP = "localhost";
	private static int PROXY_PORT = 1080;

	public static void main(String[] args) throws Exception {
		Registry<ConnectionSocketFactory> reg = RegistryBuilder.<ConnectionSocketFactory>create()
				.register("http", new MyConnectionSocketFactory()).build();
		PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager(reg);
		CloseableHttpClient httpclient = HttpClients.custom().setConnectionManager(cm).build();
		try {
			InetSocketAddress socksaddr = new InetSocketAddress(PROXY_IP, PROXY_PORT);
			HttpClientContext context = HttpClientContext.create();
			context.setAttribute("socks.address", socksaddr);

			HttpHost target = new HttpHost("www.google.com", 80, "http");
			HttpGet request = new HttpGet("/");

			System.out.println("Executing request " + request + " to " + target + " via SOCKS proxy " + socksaddr);
			CloseableHttpResponse response = httpclient.execute(target, request, context);
			try {
				System.out.println("----------------------------------------");
				System.out.println(response.getStatusLine());
				String htmlStr = EntityUtils.toString(response.getEntity());
				System.out.println(htmlStr);
			} finally {
				response.close();
			}
		} finally {
			httpclient.close();
		}
	}

	static class MyConnectionSocketFactory implements ConnectionSocketFactory {

		public Socket createSocket(final HttpContext context) throws IOException {
			InetSocketAddress socksaddr = (InetSocketAddress) context.getAttribute("socks.address");
			Proxy proxy = new Proxy(Proxy.Type.SOCKS, socksaddr);
			return new Socket(proxy);
		}

		public Socket connectSocket(final int connectTimeout, final Socket socket, final HttpHost host,
				final InetSocketAddress remoteAddress, final InetSocketAddress localAddress, final HttpContext context)
				throws IOException, ConnectTimeoutException {
			Socket sock;
			if (socket != null) {
				sock = socket;
			} else {
				sock = createSocket(context);
			}
			if (localAddress != null) {
				sock.bind(localAddress);
			}
			try {
				sock.connect(remoteAddress, connectTimeout);
			} catch (SocketTimeoutException ex) {
				throw new ConnectTimeoutException(ex, host, remoteAddress.getAddress());
			}
			return sock;
		}
	}
}

The code above is the example that ships with HttpClient, slightly modified.
Start shadowsocks-netty first,
then run ClientExecuteSOCKS.

1. The run fails with the following error:

I/O exception (org.apache.http.NoHttpResponseException) caught when processing request to {}->http://www.google.com:80: 
The target server failed to respond

On the server side of shadowsocks-netty, shadowsocks-netty-server, the log shows:

org.netty.proxy.ClientProxyHandler$2 - connect fail host = 67.15.129.210,port = 80,inetAddress = /67.15.129.210

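The connection to the IP address obtained from DNS resolution failed. Over several attempts the resolved IP address kept changing, so the request sometimes succeeded and sometimes failed.

To work around this, pass the domain name through to the proxy instead of a locally resolved IP. The code is modified as follows: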

sock.connect(remoteAddress, connectTimeout);

Change the line above to:

sock.connect(InetSocketAddress.createUnresolved(remoteAddress.getHostName(), remoteAddress.getPort()),connectTimeout);
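The effect of this change can be checked without any network access. The following is a minimal sketch and not part of the original program: the class name UnresolvedAddressDemo and the helpers toUnresolved and resolvedSample are made up for illustration, and the IP address is simply the one from the server log above. It shows that the rebuilt address keeps only the host name and port and drops the resolved IP.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

class UnresolvedAddressDemo {

    // Hypothetical helper: the same expression as in the modified connect call above.
    static InetSocketAddress toUnresolved(InetSocketAddress remoteAddress) {
        return InetSocketAddress.createUnresolved(remoteAddress.getHostName(), remoteAddress.getPort());
    }

    // Builds an already-resolved address from a fixed name/IP pair, without a DNS lookup.
    static InetSocketAddress resolvedSample() {
        try {
            byte[] ip = {67, 15, (byte) 129, (byte) 210};
            return new InetSocketAddress(InetAddress.getByAddress("www.google.com", ip), 80);
        } catch (UnknownHostException e) {
            // only thrown when the byte array has an illegal length
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        InetSocketAddress resolved = resolvedSample();
        InetSocketAddress unresolved = toUnresolved(resolved);
        System.out.println("resolved:   " + resolved + " isUnresolved=" + resolved.isUnresolved());
        System.out.println("unresolved: " + unresolved + " isUnresolved=" + unresolved.isUnresolved());
    }
}
```

When a socket created with a SOCKS proxy connects to an unresolved address, the JDK should send the host name to the proxy in the connect request, so name resolution happens on the proxy side rather than locally.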

2. Running it again fails with the following error:

Caused by: org.apache.http.ProtocolException: The server failed to respond with a valid HTTP response
	at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:151)
	at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
	at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:260)
	at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:161)
	at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:153)
	at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:271)
	at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:123)
	at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:254)
	at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:195)
	at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:86)
	at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:108)
	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
	... 2 more

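Stepping into the parseHead method of DefaultHttpResponseParser with a debugger shows that the HTTP status line read each time is " HTTP/1.1 200 OK", with two extra blank characters in front,
so the comparison fails. Further analysis shows that the data exchanged during the four-step SOCKS handshake was not fully consumed, so leftover handshake bytes end up in front of the real response data.
Compare SocksCmdResponse from Netty with the SocksSocketImpl class in the JDK: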

public void encodeAsByteBuf(ByteBuf byteBuf) {
        byteBuf.writeByte(protocolVersion().byteValue());
        byteBuf.writeByte(cmdStatus.byteValue());
        byteBuf.writeByte(0x00);
        byteBuf.writeByte(addressType.byteValue());
        switch (addressType) {
            case IPv4: {
                byte[] hostContent = host == null ?
                        IPv4_HOSTNAME_ZEROED : NetUtil.createByteArrayFromIpAddressString(host);
                byteBuf.writeBytes(hostContent);
                byteBuf.writeShort(port);
                break;
            }
            case DOMAIN: {
                byte[] hostContent = host == null ?
                        DOMAIN_ZEROED : host.getBytes(CharsetUtil.US_ASCII);
                byteBuf.writeByte(hostContent.length);   // domain length
                byteBuf.writeBytes(hostContent);   // domain value
                byteBuf.writeShort(port);  // port value
                break;
            }
            case IPv6: {
                byte[] hostContent = host == null
                        ? IPv6_HOSTNAME_ZEROED : NetUtil.createByteArrayFromIpAddressString(host);
                byteBuf.writeBytes(hostContent);
                byteBuf.writeShort(port);
                break;
            }
        }
    }

Part of the connect method of SocksSocketImpl is shown below:

switch (data[1]) {
        case REQUEST_OK:
            // success!
            switch(data[3]) {
            case IPV4:
                addr = new byte[4];
                i = readSocksReply(in, addr, deadlineMillis);
                if (i != 4)
                    throw new SocketException("Reply from SOCKS server badly formatted");
                data = new byte[2];
                i = readSocksReply(in, data, deadlineMillis);
                if (i != 2)
                    throw new SocketException("Reply from SOCKS server badly formatted");
                break;
            case DOMAIN_NAME:
                len = data[1];
                byte[] host = new byte[len];
                i = readSocksReply(in, host, deadlineMillis);
                if (i != len)
                    throw new SocketException("Reply from SOCKS server badly formatted");
                data = new byte[2];
                i = readSocksReply(in, data, deadlineMillis);
                if (i != 2)
                    throw new SocketException("Reply from SOCKS server badly formatted");
                break;
        ......
}

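shadowsocks-netty returns an addressType of DOMAIN. For that type the server writes a length byte after the four-byte header, while the client code above takes the length from data[1], the status byte. The format written and the format read do not match, which leaves stray bytes in the stream.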
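The mismatch can be reproduced offline by counting bytes. The sketch below is only a simulation under stated assumptions: it does not call Netty or the JDK classes, the class and method names are invented for illustration, and it assumes that the zeroed domain written when host is null is a single zero byte. It encodes a success reply the way the encodeAsByteBuf excerpt does, consumes it the way the SocksSocketImpl excerpt does, and reports how many bytes are left unread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

class SocksReplyMismatchDemo {

    static final int IPV4 = 1;
    static final int DOMAIN_NAME = 3;

    // Mirrors the encoder excerpt: version, status, reserved, address type, address, port.
    static byte[] encodeReply(int addressType, byte[] hostContent, int port) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(0x05);                    // protocol version
        out.write(0x00);                    // status: success
        out.write(0x00);                    // reserved
        out.write(addressType);
        if (addressType == DOMAIN_NAME) {
            out.write(hostContent.length);  // domain length byte
        }
        out.write(hostContent, 0, hostContent.length);
        out.write((port >> 8) & 0xFF);      // port, high byte first
        out.write(port & 0xFF);
        return out.toByteArray();
    }

    // Mirrors the client excerpt and returns the number of bytes it leaves unread.
    static int leftoverAfterClientParse(byte[] reply) {
        ByteArrayInputStream in = new ByteArrayInputStream(reply);
        byte[] data = new byte[4];
        readFully(in, data);
        switch (data[3]) {
            case IPV4:
                readFully(in, new byte[4]);
                readFully(in, new byte[2]);
                break;
            case DOMAIN_NAME:
                int len = data[1];          // as in the excerpt: this is the status byte
                readFully(in, new byte[len]);
                readFully(in, new byte[2]);
                break;
            default:
                throw new IllegalStateException("address type not covered by this sketch");
        }
        return in.available();
    }

    static void readFully(ByteArrayInputStream in, byte[] buf) {
        int off = 0;
        while (off < buf.length) {
            int n = in.read(buf, off, buf.length - off);
            if (n < 0) {
                throw new IllegalStateException("reply ended early");
            }
            off += n;
        }
    }

    public static void main(String[] args) {
        byte[] domainReply = encodeReply(DOMAIN_NAME, new byte[] {0x00}, 0); // assumed zeroed domain
        byte[] ipv4Reply = encodeReply(IPV4, new byte[4], 0);
        System.out.println("DOMAIN leftover bytes: " + leftoverAfterClientParse(domainReply));
        System.out.println("IPv4 leftover bytes:   " + leftoverAfterClientParse(ipv4Reply));
    }
}
```

Under these assumptions the DOMAIN reply leaves two bytes behind, which would match the two extra characters seen in front of the status line, while the IPv4 reply is consumed completely.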

This can be fixed by changing the addressType returned by shadowsocks-netty to IPv4. The relevant code is in SocksServerConnectHandler:

	private SocksCmdResponse getSuccessResponse(SocksCmdRequest request) {
		return new SocksCmdResponse(SocksCmdStatus.SUCCESS, SocksAddressType.IPv4);
	}

After the change the program runs correctly and prints:

HTTP/1.1 200 OK
<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en"><head><meta content="Search the world's ...page content omitted...</body></html>

Downloading a video
Part of the code for downloading a video is shown below:

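import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;

import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.protocol.HttpClientContext;
import org.apache.http.config.Registry;
import org.apache.http.config.RegistryBuilder;
import org.apache.http.conn.socket.ConnectionSocketFactory;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

public class ClientExecuteSOCKS2 {

	/** Proxy parameters: IP + port **/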
	private static String PROXY_IP = "localhost";
	private static int PROXY_PORT = 1080;

	public static void main(String[] args) throws Exception {
		Registry<ConnectionSocketFactory> reg = RegistryBuilder.<ConnectionSocketFactory>create()
				// reuses the socket factory nested in ClientExecuteSOCKS (same package)
				.register("http", new ClientExecuteSOCKS.MyConnectionSocketFactory()).build();
		PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager(reg);
		CloseableHttpClient httpclient = HttpClients.custom().setConnectionManager(cm).build();
		try {
			InetSocketAddress socksaddr = new InetSocketAddress(PROXY_IP, PROXY_PORT);
			HttpClientContext context = HttpClientContext.create();
			context.setAttribute("socks.address", socksaddr);
			HttpGet request = new HttpGet("http://xxxxxxx.mp4");
			CloseableHttpResponse response = httpclient.execute(request, context);

			InputStream is = null;
			OutputStream os = null;
			try {
				System.out.println(response.getStatusLine());
				is = response.getEntity().getContent();
				System.out.println(response.getEntity().getContentLength());
				os = new FileOutputStream(new File("D:\\tmp.mp4"));
				byte tmp[] = new byte[1024];
				int l;
				while ((l = is.read(tmp)) != -1) {
					os.write(tmp, 0, l);
				}
				os.flush();
			} finally {
				if (response != null) {
					response.close();
				}
				if (is != null) {
					is.close();
				}
				if (os != null) {
					os.close();
				}
			}
		} finally {
			httpclient.close();
		}
	}
}
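The manual read/write loop and the three null checks in the finally block can be shortened with try-with-resources and Files.copy. The sketch below is one possible variant rather than the code used in this post: the class StreamToFileDemo and its methods saveTo and roundTrip are hypothetical names, and a byte array stands in for the HTTP response body so that the example runs without a proxy.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class StreamToFileDemo {

    // Hypothetical helper: copies the whole stream to target and returns the byte count.
    static long saveTo(InputStream body, Path target) throws IOException {
        try (InputStream in = body) { // the stream is closed even if the copy fails
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    // Stand-in for a download: writes data to a temp file and returns the resulting file size.
    static long roundTrip(byte[] data) {
        try {
            Path target = Files.createTempFile("download-demo", ".bin");
            try {
                saveTo(new ByteArrayInputStream(data), target);
                return Files.size(target);
            } finally {
                Files.deleteIfExists(target);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("bytes written: " + roundTrip(new byte[4096]));
    }
}
```

In the download program, the stream returned by response.getEntity().getContent() could be passed to a helper like saveTo in place of the loop, with the response itself still closed afterwards.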

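Summary
There are many ways to download videos from overseas sites, such as browser plugins. This post relies on a client-side SOCKS5 proxy and uses HttpClient4 to download the resources, which is easier to automate and control.
This post is intended for learning purposes only.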