How to use Unix commands to troubleshoot network connection problems

In the era of the Internet of Things, network troubleshooting skills have become more and more important. Thanks to its small footprint and reliability, Linux is the most popular operating system on numerous web application servers, Docker images, AWS virtual machines, GCP pods, etc.

For example, suppose you get a (PagerDuty) alert about connection timeout exceptions on one of your web servers, and the exception shows that the target URL http://xyznetwork.blogspot.com/2017/08/xyznetwork-how-to_5.html is not reachable.

DNS lookup

Troubleshooting starts with the hostname lookup. You need to know whether your DNS server is able to resolve the hostname part of the URL to an IP address.

The following command:

nslookup xyznetwork.blogspot.com

will reply with an IP address or complain that "server can't find xyznetwork.blogspot.com: NXDOMAIN".
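The two checks in this section, pulling the hostname out of the failing URL and asking DNS for it, can be scripted. A minimal sketch, assuming bash (or an sh that supports local): url_host and resolve_host are hypothetical helper names, url_host only handles simple scheme://host[:port]/path URLs, and resolve_host uses dig +short, which prints just the answer data (a CNAME target, if any, followed by the addresses).

```shell
# url_host URL: strip the scheme, the path and any :port from a simple URL.
# Assumption: no user@ part and no IPv6 literal in the URL.
url_host() {
  local rest="${1#*://}"       # drop "http://" or "https://"
  rest="${rest%%/*}"           # drop everything from the first "/"
  printf '%s\n' "${rest%%:*}"  # drop ":port" if present
}

# resolve_host HOST: print the A-record answers for HOST, or fail.
resolve_host() {
  local host="$1" answer
  # dig +short prints only answer data; lines starting with ";" are
  # error messages such as ";; connection timed out", so drop them.
  answer=$(dig +short "$host" A | grep -v '^;')
  if [ -n "$answer" ]; then
    printf '%s resolves to:\n%s\n' "$host" "$answer"
  else
    printf '%s: no A record returned\n' "$host" >&2
    return 1
  fi
}
```

For example: resolve_host "$(url_host http://xyznetwork.blogspot.com/2017/08/xyznetwork-how-to_5.html)".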

nslookup also allows a reverse lookup, from an IP address back to a hostname. As you may have already guessed, the reverse lookup

nslookup 172.217.12.129

won't resolve to xyznetwork.blogspot.com: too many blog URLs map to the same IP address, so the IP address won't map back to any particular blog URL.
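Under the hood, a reverse lookup asks for the PTR record of a special name built from the IP address with its octets reversed under in-addr.arpa. The sketch below (ptr_name is a hypothetical helper, IPv4 only, assuming bash or an sh with local) builds that name; dig -x performs the same reverse query without the manual step.

```shell
# ptr_name IP: build the in-addr.arpa name that a reverse (PTR) lookup
# queries. IPv4 only; the input is assumed to be a dotted quad.
ptr_name() {
  local IFS=.
  set -- $1   # split the dotted quad into $1..$4 on "."
  printf '%s.%s.%s.%s.in-addr.arpa\n' "$4" "$3" "$2" "$1"
}

# Equivalent reverse queries (these need network access):
#   dig -x 172.217.12.129 +short
#   dig "$(ptr_name 172.217.12.129)" PTR +short
```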


dig xyznetwork.blogspot.com

will give you more information about the DNS lookup process, including the technical details of the responses from the DNS servers.

With the +trace flag, dig reveals a trace of the DNS lookup process, including which DNS servers were queried and which of them gave the authoritative answer for the IP address.

dig +trace xyznetwork.blogspot.com

dig's flag system also makes it a good command for scripting.
For example, the most common DNS queries are:

  1. A (the IPv4 address),
  2. TXT (text annotations),
  3. MX (mail exchanges),
  4. NS (name servers).
By default, dig performs an A query. The following commands issue other types of queries, and +noall +answer controls which part of the output is printed to stdout (here, only the answer section).


dig xyznetwork.blogspot.com MX +noall +answer
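The four common query types listed above can be issued in one loop. A minimal sketch, assuming bash: query_common_types is a hypothetical name, and DIG is an assumed override hook that defaults to dig, so you can substitute another binary (or echo, to preview the commands without network access).

```shell
# query_common_types HOST: run dig once per common record type,
# printing only the answer section of each response.
# DIG is a hypothetical override hook; it defaults to the real dig.
query_common_types() {
  local host="$1" type
  for type in A TXT MX NS; do
    printf '== %s ==\n' "$type"
    "${DIG:-dig}" "$host" "$type" +noall +answer
  done
}
```

Run query_common_types xyznetwork.blogspot.com for the real queries, or DIG=echo query_common_types xyznetwork.blogspot.com to print the four dig commands instead.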

=============================================
demo>dig xyznetwork.blogspot.com NS +noall +answer

; <<>> DiG 9.10.6 <<>> xyznetwork.blogspot.com NS +noall +answer
;; global options: +cmd
xyznetwork.blogspot.com. 23 IN CNAME blogspot.l.googleusercontent.com.
demo>dig xyznetwork.blogspot.com MX +noall +answer

; <<>> DiG 9.10.6 <<>> xyznetwork.blogspot.com MX +noall +answer
;; global options: +cmd

xyznetwork.blogspot.com. 2943 IN CNAME blogspot.l.googleusercontent.com.
=============================================

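In both cases the answer section contains only a CNAME record: xyznetwork.blogspot.com is an alias for blogspot.l.googleusercontent.com, and the number before IN is the record's TTL in seconds.

If your DNS servers have no problem resolving the hostname, the next step is to check whether the IP address is reachable.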

Routing to the target IP

The simple ping command is the first one we should issue.

ping 172.217.12.129

If the ping replies come back fast and stable, we at least know that routing from the source IP to the target IP works and that no firewall between them is dropping ICMP packets.
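On Linux and macOS, ping keeps running until you press Ctrl-C, so in a script it helps to bound it with -c. The sketch below (hypothetical helper names, assuming bash) sends four pings and pulls the packet-loss percentage out of the summary line; the sed pattern assumes the usual Linux (iputils) or macOS wording, "... 0% packet loss" or "... 0.0% packet loss".

```shell
# packet_loss: read ping output on stdin and print the loss percentage
# from the summary line (e.g. "0" or "0.0"); prints nothing if absent.
packet_loss() {
  sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# ping_check TARGET: send 4 echo requests, report loss and the rtt line.
ping_check() {
  local target="$1" out loss
  out=$(ping -c 4 "$target" 2>&1)
  loss=$(printf '%s\n' "$out" | packet_loss)
  printf '%s: %s%% packet loss\n' "$target" "${loss:-unknown}"
  # When replies arrived, the last line is the min/avg/max rtt summary.
  printf '%s\n' "$out" | tail -n 1
}
```

For example, ping_check 172.217.12.129; a loss of 0 with a steady average round-trip time matches the "fast and stable" case.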

If the ping doesn't go through, there are many possibilities: there is no route to the IP, a firewall is blocking us, the target host has disabled ping replies, the gateway doesn't let ping traffic through, and so on, just to mention a few.

As a special note, you can ping the broadcast address to help figure out the first hop of the routing process (on Linux, ping refuses a broadcast address unless you add the -b flag):

ping 255.255.255.255

When you ping the broadcast address 255.255.255.255, every host on your LAN that answers broadcast pings replies from its own IP address. One of them could be the network gateway, which is usually your router; one of them is the host you issued the ping command from; the rest are other hosts. Many systems ignore broadcast pings (Linux does by default), so the list can be incomplete. If you don't want a host to be discovered by its neighbors, you can block broadcasts at the network gateway or configure the host's firewall to ignore ping traffic.

To learn more about the route, use the traceroute command:

traceroute 172.217.12.129

The traceroute command displays the route taken by packets across an IP network from your host to the target IP. The IP addresses the packets traverse are displayed in order, one hop per line. It also shows how systems are connected to each other, letting you see how your ISP connects to the Internet as well as how the target system is connected. Many routers block or ignore traceroute probes, making the target system's topology invisible to users.

If ping and traceroute show that there is no route to the target IP, we still cannot draw a conclusion from these two commands alone, since some network nodes might be blocking ICMP or the probes traceroute sends.
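Because a silent hop may only mean that a router drops one kind of probe, it can be worth retrying traceroute with a different probe type. A sketch under stated assumptions: -I (ICMP echo probes) exists in both the Linux and macOS traceroute, while -T -p 80 (TCP SYN probes to port 80) is specific to the Linux traceroute implementation and needs root; -I may also need root on some systems. count_silent_hops and trace_variants are hypothetical helper names.

```shell
# count_silent_hops: read traceroute output on stdin and count hops
# where all three probes timed out ("* * *").
count_silent_hops() {
  grep -c '^ *[0-9][0-9]* *\* \* \*'
}

# trace_variants TARGET: trace the same target with three probe types.
trace_variants() {
  local target="$1"
  echo "== UDP probes (default) =="
  traceroute -n "$target"
  echo "== ICMP echo probes =="
  traceroute -n -I "$target"
  echo "== TCP SYN probes to port 80 (Linux traceroute) =="
  sudo traceroute -n -T -p 80 "$target"
}
```

For example, traceroute -n 172.217.12.129 | count_silent_hops; if the count drops sharply with -I or -T, the silent routers were filtering the default probes rather than failing to forward traffic.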

However, since our web application could previously connect to the target URL, we know for sure that, when everything is working, HTTP port 80 on the target host must be open.

Check port availability

nc -zv xyznetwork.blogspot.com 80

To check whether a port is open on a particular host, we can use netcat. The advantages of the above command over "telnet xyznetwork.blogspot.com 80" are:

  • the telnet command might be disabled or not installed,
  • nc prints the result and then exits, so we can script it for multiple hosts and ports.
For HTTPS connections the port is 443. Issue the following command to check the port the HTTPS protocol needs:

nc -zv xyznetwork.blogspot.com 443
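Because nc prints its result and exits, checking several hosts and ports takes only a loop. A minimal sketch, assuming bash: check_ports is a hypothetical name, -w 3 sets a 3-second timeout (supported by the common OpenBSD and macOS nc variants), and NC is an assumed override hook that defaults to nc.

```shell
# check_ports HOST PORT...: report each port as open or closed/filtered.
# Returns non-zero if any port could not be reached.
# NC is a hypothetical override hook; it defaults to the real nc.
check_ports() {
  local host="$1" port status=0
  shift
  for port in "$@"; do
    if "${NC:-nc}" -z -w 3 "$host" "$port" 2>/dev/null; then
      echo "$host:$port open"
    else
      echo "$host:$port closed or filtered"
      status=1
    fi
  done
  return "$status"
}
```

For several hosts, wrap it in another loop, e.g. for h in xyznetwork.blogspot.com example.com; do check_ports "$h" 80 443; done (example.com is just a placeholder second host).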

Using openssl, we can check the certificate (which carries the public key) of the host server and make sure the target host is what we think it is:

openssl s_client -connect xyznetwork.blogspot.com:443

This command also tests that the SSL/TLS handshake works between your host and the target host.
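Interactively, s_client stays connected waiting for input, so for a scripted check it is common to feed it /dev/null and hand the certificate to openssl x509. A sketch under assumptions, assuming bash: -servername sends the hostname via SNI (older OpenSSL releases do not send it automatically, which matters for shared hosts such as blogspot), and show_cert and tls_verify_code are hypothetical helpers; tls_verify_code extracts the number from s_client's "Verify return code:" line, where 0 means the certificate chain verified.

```shell
# show_cert HOST [PORT]: print subject, issuer and validity dates of the
# certificate the server presents.
show_cert() {
  local host="$1" port="${2:-443}"
  openssl s_client -connect "$host:$port" -servername "$host" \
      </dev/null 2>/dev/null |
    openssl x509 -noout -subject -issuer -dates
}

# tls_verify_code: read s_client output on stdin and print the numeric
# verify result (0 = certificate chain verified).
tls_verify_code() {
  sed -n 's/.*Verify return code: \([0-9][0-9]*\).*/\1/p' | head -n 1
}
```

For example: openssl s_client -connect xyznetwork.blogspot.com:443 -servername xyznetwork.blogspot.com </dev/null 2>/dev/null | tls_verify_code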


At this point, if your DNS servers can resolve the hostname to the target IP address, there is a working route from your host to the target IP, and the HTTP or HTTPS port is open, we have to check the application layer.

Check that the HTTP protocol is working

curl http://xyznetwork.blogspot.com/2017/08/xyznetwork-how-to_5.html

curl https://xyznetwork.blogspot.com/2017/08/xyznetwork-how-to_5.html

Try using curl to issue an HTTP GET/POST request to the target URL. If the web server application hosted on the target IP is working, we should get HTML code wrapped in the HTTP response. In the above example, since the target URL is a web page, a GET request is all we need to get the HTTP response back from the web server.

The curl command displays the plain-text HTML code that web browsers such as Google Chrome and Firefox use to render the colorful web page.
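When you only need to know whether the HTTP layer answers, printing the status code is often clearer than reading the HTML. A minimal sketch, assuming bash: http_status uses curl's -w '%{http_code}' write-out variable (curl prints 000 when no HTTP response arrived), and classify_status is a hypothetical helper whose messages are rough hints, not diagnoses.

```shell
# http_status URL: print only the HTTP status code, discarding the body.
http_status() {
  curl -sS -o /dev/null --max-time 10 -w '%{http_code}' "$1"
}

# classify_status CODE: translate a status code into a rough hint.
classify_status() {
  case "$1" in
    000) echo "no HTTP response (DNS, TCP, TLS or timeout failure)" ;;
    2??) echo "success" ;;
    3??) echo "redirect (add -L to follow it)" ;;
    403) echo "forbidden (possibly a proxy or firewall ACL)" ;;
    407) echo "proxy authentication required" ;;
    4??) echo "client error" ;;
    5??) echo "server error" ;;
    *)   echo "unexpected status: $1" ;;
  esac
}
```

For example: code=$(http_status https://xyznetwork.blogspot.com/2017/08/xyznetwork-how-to_5.html); echo "$code: $(classify_status "$code")"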

If curl cannot communicate with the target web server even with the correct HTTP method (the default is GET), headers, protocol, URL string, request parameters, request body, etc., then it is time to escalate the issue to your organization's network operations center.

Your network operations center might reply: "Hey, we recently applied new firewall rules. Your access to outside URLs must go through a proxy server: the proxy server is dev.fancycorpproxy.com and the proxy port is 8080."

curl -x 'dev.fancycorpproxy.com:8080' http://xyznetwork.blogspot.com/2017/08/xyznetwork-how-to_5.html

Then try the curl command with the proxy. If a response comes back from xyznetwork.blogspot.com, that could explain the connectivity issue. If the proxy server gives you something like 403 Forbidden, contact the fancycorp IT administrator at email blabla; they need to add a new firewall rule or a new proxy ACL.
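The direct-versus-proxy comparison described above can be scripted the same way. A sketch under assumptions, assuming bash: compare_proxy and proxy_verdict are hypothetical names, --noproxy '*' keeps the "direct" request from silently picking up proxy environment variables such as http_proxy, and the verdict wording only restates the reasoning of this section.

```shell
# proxy_verdict DIRECT_CODE PROXY_CODE: interpret two curl status codes.
proxy_verdict() {
  local direct="$1" via="$2"
  case "$via" in
    2??|3??)
      if [ "$direct" = 000 ]; then
        echo "direct path blocked; the proxy is required"
      else
        echo "proxy path works; direct request returned $direct"
      fi ;;
    403) echo "proxy refuses this URL; ask IT for a firewall rule or proxy ACL" ;;
    000) echo "no response through the proxy (proxy unreachable or timed out)" ;;
    *)   echo "proxy returned $via; investigate further" ;;
  esac
}

# compare_proxy URL PROXY: fetch URL directly and via PROXY, then compare.
compare_proxy() {
  local url="$1" proxy="$2" direct via
  direct=$(curl -sS -o /dev/null --max-time 10 --noproxy '*' \
    -w '%{http_code}' "$url" 2>/dev/null)
  via=$(curl -sS -o /dev/null --max-time 10 -x "$proxy" \
    -w '%{http_code}' "$url" 2>/dev/null)
  printf 'direct: %s, proxy: %s\n' "$direct" "$via"
  proxy_verdict "$direct" "$via"
}
```

For example: compare_proxy http://xyznetwork.blogspot.com/2017/08/xyznetwork-how-to_5.html dev.fancycorpproxy.com:8080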

If your network operations center has no explanation, the problem may be on the remote side. In this case the remote side is Google (google.com), so you probably won't get to this point, and if you do, the other side probably already knows about the issue.

Check your own application

Assuming the application that logged the connection timeout is a Java application, we need to inspect the network connectivity of the process reporting the issue.

netstat -nuapt | grep java

The netstat command (Linux net-tools syntax) lists the TCP and UDP sockets, both listening ports and established connections (-a), of processes with java in their name; -p adds the PID/program name column, which needs root to show processes owned by other users. You can figure out whether there are established connections to the target server, whether the debug port is open, or whether someone is currently connecting to the process via a local connection, which may indicate a reverse proxy setup on the host, etc.
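To narrow that output down to connections toward the target server, the columns can be filtered. A sketch assuming Linux net-tools netstat output, where column 5 is the foreign address and column 6 the TCP state; established_to is a hypothetical helper, and on newer systems ss -tnp gives similar information with a different column layout.

```shell
# established_to IP: read netstat -nt... output on stdin and print the
# foreign address of each ESTABLISHED TCP connection to IP.
# Matching on "IP:" avoids 172.217.12.13 matching 172.217.12.1, etc.
established_to() {
  awk -v ip="$1" '$6 == "ESTABLISHED" && index($5, ip ":") == 1 { print $5 }'
}
```

For example: sudo netstat -ntp | grep java | established_to 172.217.12.129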

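If you are worried about a rogue host on your network, a telltale check is arp -a: it lists the IP and MAC addresses of the hosts on your local network that your machine has recently communicated with (traffic to remote hosts only shows up as the gateway's entry). Do the IPs you are connecting to have the MAC addresses they are supposed to have?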
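That check can be scripted against a MAC address you already trust. A sketch under assumptions, assuming bash: mac_for_ip and normalize_mac are hypothetical helpers; mac_for_ip relies on the "name (ip) at mac ..." layout that both the macOS and the Linux net-tools arp -a print, and normalize_mac pads each byte to two lower-case hex digits, because macOS drops leading zeros (the listing that follows shows 4 rather than 04).

```shell
# mac_for_ip IP: read arp -a output on stdin and print the MAC for IP.
mac_for_ip() {
  awk -v want="($1)" '$2 == want { print $4 }'
}

# normalize_mac MAC: lower-case, two hex digits per byte (4 -> 04).
normalize_mac() {
  local IFS=: part out=""
  for part in $1; do
    out="$out$(printf '%02x' "0x$part"):"
  done
  printf '%s\n' "${out%:}"
}

# Example (needs a LAN; the expected value is a hypothetical MAC you
# recorded earlier for your gateway):
#   expected=f6:4f:5a:04:7b:f2
#   actual=$(arp -a | mac_for_ip 192.168.1.1)
#   [ "$(normalize_mac "$actual")" = "$expected" ] || echo "gateway MAC changed: $actual"
```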

demo>arp -a
openrg.home (192.168.1.1) at f6:4f:5a:4:7b:f2 on en1 ifscope [ethernet]
? (224.0.0.251) at 1:0:5e:0:0:fb on en1 ifscope permanent [ethernet]
? (239.255.255.250) at 1:0:5e:7f:ff:fa on en1 ifscope permanent [ethernet]

In the above example, all three entries are normal:


  • 192.168.1.1 is the gateway. 
  • 224.0.0.251 is the address for the multicast DNS (mDNS) protocol. The mDNS protocol resolves hostnames to IP addresses within small networks that do not include a local name server. It is a zero-configuration service, using essentially the same programming interfaces, packet formats and operating semantics as the unicast Domain Name System (DNS). 
  • 239.255.255.250 is used for UPnP (Universal Plug and Play)/SSDP (Simple Service Discovery Protocol) by various vendors to advertise the capabilities of (or discover) devices on a VLAN. macOS, Microsoft Windows, iOS and other operating systems and applications use this protocol. Client devices can use it to advertise their capabilities to other devices.
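The MAC addresses of the two multicast entries are not random: IPv4 multicast maps onto Ethernet by placing the low-order 23 bits of the group address after the fixed prefix 01:00:5e (RFC 1112). The sketch below (multicast_mac is a hypothetical helper, assuming bash) reproduces both entries from the listing, written with leading zeros.

```shell
# multicast_mac IP: Ethernet MAC for an IPv4 multicast group address.
# Keeps the low 23 bits: the top bit of the second octet is dropped.
multicast_mac() {
  local IFS=.
  set -- $1   # split the dotted quad into $1..$4
  printf '01:00:5e:%02x:%02x:%02x\n' "$(( $2 & 0x7f ))" "$3" "$4"
}
```

So multicast_mac 224.0.0.251 gives 01:00:5e:00:00:fb and multicast_mac 239.255.255.250 gives 01:00:5e:7f:ff:fa, matching the arp -a entries above.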

