Every distributed application runs in a collaborative environment made up of services and method calls. In such an environment, most services are hosted on different platforms and remote machines that talk to each other over the network. This distribution makes such applications harder to debug.
During the development of a J2EE-based core banking system, I faced many debugging challenges as a member of the team. The solution uses remote, distributed components that work together over the network through message queuing, following SOA principles.
A DB2 database, IBM MQ, IBM WebSphere and a Swing rich client talk to each other by sending and receiving a large number of messages. The architecture also uses Mule ESB, Spring, JMS and a large number of configuration files. We found and fixed most of the bugs using the simple techniques below.
Use a Map. Draw a schematic map of the environment in which the problem occurs. Mark the servers, clients, firewalls and routers, along with the IP address and ports that each one uses in the scenario. Perhaps an IP address, a port or a firewall policy has changed and caused the problem. Use ping and telnet to make sure that, at the very least, the service listeners are reachable.
If the service used to work and the problem appeared later, a configuration change is the most likely cause.
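The reachability check from the map step can also be scripted, so that every endpoint on the map is tested the same way. The following is a minimal sketch rather than a complete tool: the class name PortCheck and the method isReachable are hypothetical, and it only tests whether a TCP listener accepts a connection, which is roughly what a manual telnet check tells you.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: a scripted equivalent of "telnet host port".
class PortCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            // connect() fails with an IOException on refusal, timeout or unknown host
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

A call such as PortCheck.isReachable("mq-host", 1414, 2000) would then report whether a listener answers on that port; the host name and port here are placeholders, not values from the project. A successful connection only shows that something is listening, not that the service behind it is healthy.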
Interpret Log Files. Review log files and exception messages carefully. Most exception messages point quite directly to the problem. On the other hand, logging is not a centralized activity in a layered distributed application: each layer or component may log its exceptions separately, so check them all. Read the log files like a detective and use your imagination to work out what went wrong.
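Because each component writes its own log, it often helps to read the logs as one timeline. The sketch below merges lines from several sources by timestamp. It rests on assumptions that may not hold for your logs: every entry is assumed to start with an ISO-8601 timestamp such as 2024-01-15T10:00:02, the clocks of the machines are assumed to be roughly in sync, and the names LogMerger and Entry are hypothetical. Lines without a timestamp, such as stack trace lines, inherit the time of the line before them.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical helper: merges log lines from several components into one timeline.
class LogMerger {

    /** One log line tagged with its time and the component that wrote it. */
    static final class Entry {
        final LocalDateTime time;
        final String source;
        final String text;

        Entry(LocalDateTime time, String source, String text) {
            this.time = time;
            this.source = source;
            this.text = text;
        }
    }

    /** Keys are component names, values are the lines of that component's log. */
    static List<Entry> merge(Map<String, List<String>> linesBySource) {
        List<Entry> entries = new ArrayList<>();
        for (Map.Entry<String, List<String>> log : linesBySource.entrySet()) {
            LocalDateTime last = LocalDateTime.MIN;
            for (String line : log.getValue()) {
                last = parseTimestamp(line, last);
                entries.add(new Entry(last, log.getKey(), line));
            }
        }
        // List.sort is stable, so continuation lines stay behind their parent line
        entries.sort(Comparator.comparing((Entry e) -> e.time));
        return entries;
    }

    /** Assumes a 19-character ISO-8601 prefix; returns fallback if there is none. */
    static LocalDateTime parseTimestamp(String line, LocalDateTime fallback) {
        if (line.length() < 19) {
            return fallback;
        }
        try {
            return LocalDateTime.parse(line.substring(0, 19));
        } catch (DateTimeParseException e) {
            return fallback;
        }
    }
}
```

Reading the merged list from top to bottom shows, for example, whether the queue reported a warning just before the application logged a failed send.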
Go Deeper. Adjust the logging level so that the logger captures more detailed messages.
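As an illustration, here is how the level could be raised programmatically with java.util.logging from the standard library. This is a sketch under assumptions: the project may well use a different logging framework, where the level is usually changed in a configuration file instead, and the class name VerboseLogging is hypothetical. The point it shows is that both the logger and its handler filter by level, so both need adjusting.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper: turns on detailed console output for one logger.
class VerboseLogging {

    static Logger enableDetail(String loggerName) {
        Logger logger = Logger.getLogger(loggerName);
        logger.setLevel(Level.FINE);

        // A ConsoleHandler defaults to INFO, so it would drop FINE messages
        Handler handler = new ConsoleHandler();
        handler.setLevel(Level.FINE);
        logger.addHandler(handler);

        // Avoid printing each message a second time through the root handlers
        logger.setUseParentHandlers(false);
        return logger;
    }
}
```

Keep a reference to the returned logger while debugging, and restore the normal level afterwards, since detailed logging can slow the system down and flood the log files.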
Set Breakpoints to Watch. Set breakpoints to watch what happens at run time. Make sure the endpoints send and receive messages correctly. Then step into the inner layers to find out what happened.
Use Replacements. Replace the component you suspect with one that is known to work, to find out whether the original is working properly. For example, if you have another application server, message queue or database available, use it as the situation requires.
Check Configuration Files. Make sure the build routines have done their job completely. Maven and Ant write the values of variables into the built versions of configuration files such as context.xml, web.xml and other XML files. A lack of privileges or a wrong configuration may prevent the build process from finishing its task. So check the built versions of the configuration files.
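One quick check on a built configuration file is to look for variables that the build left unreplaced. The sketch below assumes that variables are written in the ${name} style and that a leftover ${name} in the built file indicates a substitution that did not happen; the class name PlaceholderCheck is hypothetical. Some frameworks, Spring among them, resolve ${name} placeholders at run time, so a match is a hint to look closer rather than proof of a broken build.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: lists ${...} variables still present in a built config file.
class PlaceholderCheck {

    // Matches ${name} and captures the name between the braces
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Returns the names of all ${...} placeholders found in the given file content. */
    static List<String> unresolved(String content) {
        List<String> names = new ArrayList<>();
        Matcher matcher = PLACEHOLDER.matcher(content);
        while (matcher.find()) {
            names.add(matcher.group(1));
        }
        return names;
    }
}
```

Reading the built web.xml or context.xml into a string and passing it to unresolved gives the list of variables worth checking against the build properties.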
Be Patient. In a distributed system, a method call does not complete as fast as it does in a standalone application. So check whether the timeout values are large enough. Sometimes increasing a timeout value is the key to solving the problem.
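To judge whether a timeout is large enough, it helps to measure how long the remote call really takes. The following sketch runs a call with a time limit and reports the elapsed time, or -1 if the limit was exceeded. The class TimedCall and the method measure are hypothetical names, and the Callable stands in for whatever remote call you are investigating.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: measures a call against a candidate timeout value.
class TimedCall {

    /** Returns the elapsed milliseconds, or -1 if the call exceeded timeoutMillis. */
    static long measure(Callable<?> call, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            long start = System.nanoTime();
            Future<?> future = executor.submit(call);
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        } catch (TimeoutException e) {
            return -1; // the call is still running; shutdownNow() below interrupts it
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("call failed", e.getCause());
        } finally {
            executor.shutdownNow();
        }
    }
}
```

Running the measurement several times, under realistic load, gives a better basis for choosing a timeout than a single run; a value only slightly above the typical duration is likely to fail intermittently.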
Do It Faster. Shorter compile-and-run cycles on a lighter setup are an effective way to test more situations quickly. So use lighter application servers and databases during development. For example, use Tomcat instead of WebSphere and MySQL rather than DB2.