Motivation: The need for improvement

Every three hours a trace is taken at each of 21+ monitors. The current methodology gives us a ninety-second trace, which is then followed by three hours during which our post-processing, analysis, and retrieval may be done.

The ratio of three hours of processing time to ninety seconds of trace time may seem like plenty, but as link usage continues to grow and the links themselves grow faster, we begin to run into time constraints that can lead to corrupted analyses, corrupted traces, or outright loss of a trace.

On top of our current problems, we are constantly receiving requests for longer traces, on the order of hours instead of minutes. We have also toyed with the possibility of real-time analyses or continuous traces. Our current tools and methods are not even close to being able to run in real time for any substantial amount of traffic.

Strategy: Reimplementation

All analyses were previously done in Perl. While Perl is a fine language for prototyping and cross-platform compatibility, its efficiency as an interpreted language is less than optimal. Since we do not wish to change our methodology in any significant way, we must improve the performance of our existing methods. This can be accomplished either by reducing the run time of the analyses on current resources or by improving the resources themselves.

Our resources are many, varying in performance, and cost-prohibitive to upgrade. Even for resources deployed in the future, given improvements in the quality and speed of the deployed architectures, we will still face analysis bottlenecks. An improvement in the software seems justified in this situation. I chose to rewrite all of the analysis tools used on the passive monitors, starting with the most time-consuming ones. The obvious place to start was the tsh2ta Perl script, which performs a preliminary analysis required by all other analyses and has been known to fail due to resource exhaustion.

Implementation problems

One of the luxuries afforded by Perl is its implicit hash structure. Reimplementing a program that relies on the efficiency of hashes is very difficult in C, especially since adequate, free, standard, and efficient abstract hash tables do not seem to exist for C. I tried out several alternatives, including glib, publib, the possibility of writing my own, and a chained hash implementation by Jeff Brown.
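
To illustrate the gap, below is a minimal sketch of the kind of chained hash table such a reimplementation needs. The names, table size, and flow-key format are hypothetical (this is not the actual chash or tsh2ta code); it only shows the bookkeeping that a one-line Perl hash increment performs implicitly.

    #include <stdlib.h>
    #include <string.h>

    /* Minimal chained hash table sketch (hypothetical; not the chash API).
     * Each bucket holds a singly linked list of entries, so collisions are
     * resolved by chaining. */
    #define NBUCKETS 65536

    struct entry {
        char         *key;      /* e.g. "srcIP dstIP srcPort dstPort" */
        unsigned long count;    /* per-flow packet or byte counter    */
        struct entry *next;     /* next entry in this bucket's chain  */
    };

    static struct entry *buckets[NBUCKETS];

    static unsigned hash(const char *s)
    {
        unsigned h = 5381;
        while (*s)
            h = h * 33 + (unsigned char)*s++;
        return h % NBUCKETS;
    }

    /* Find an entry, creating it with a zero count if it is missing --
     * roughly what a Perl hash does for free on first use of a key. */
    static struct entry *lookup_or_insert(const char *key)
    {
        unsigned h = hash(key);
        struct entry *e;

        for (e = buckets[h]; e != NULL; e = e->next)
            if (strcmp(e->key, key) == 0)
                return e;

        e = malloc(sizeof(*e));
        e->key   = strdup(key);
        e->count = 0;
        e->next  = buckets[h];
        buckets[h] = e;
        return e;
    }

A caller would then increment lookup_or_insert(flowkey)->count for every packet; memory management, sizing, and error handling are all left to the programmer, which is exactly the work Perl hides.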

After wading through the API for glib, I discounted it as too complex for our purposes. Publib looked more promising, but consultations with Jeff led me away from it because of implementation issues and possible performance limitations with the data sets I planned to work with. Eventually, I decided to use Jeff's chash and asked him to tailor it in a few respects so that it would suit my application.

My last problem with regard to hash tables was iterating over them. I found that deleting records from a hash while iterating through its elements can cause bizarre and unexpected results. I managed a workaround for this problem using a global list.
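
In outline, the workaround amounts to a two-pass sweep: the table is only read during iteration, and deletions are deferred to a second pass that works from the global list. The sketch below builds on the hypothetical chained table above; the names and the expiry condition are illustrative, not the actual chash interface or tsh2ta logic.

    /* Deferred-deletion workaround sketch (hypothetical names; the real
     * code uses Jeff Brown's chash).  Entries are never removed while the
     * table is being walked; their keys are pushed onto a global list and
     * removed in a second pass. */
    #define MAX_DOOMED 65536

    static const char *doomed[MAX_DOOMED];   /* keys queued for deletion */
    static int ndoomed;

    static void delete_key(const char *key)
    {
        unsigned h = hash(key);
        struct entry **pp = &buckets[h];

        while (*pp != NULL) {
            if (strcmp((*pp)->key, key) == 0) {
                struct entry *dead = *pp;
                *pp = dead->next;             /* unlink from the chain */
                free(dead->key);
                free(dead);
                return;
            }
            pp = &(*pp)->next;
        }
    }

    static void sweep_finished_flows(void)
    {
        int i;
        struct entry *e;

        ndoomed = 0;

        /* Pass 1: iterate, but only mark entries for deletion.  A zero
         * count stands in here for whatever "this record is finished"
         * test the analysis really uses. */
        for (i = 0; i < NBUCKETS; i++)
            for (e = buckets[i]; e != NULL; e = e->next)
                if (e->count == 0 && ndoomed < MAX_DOOMED)
                    doomed[ndoomed++] = e->key;

        /* Pass 2: with no iteration in progress, deletion is safe. */
        for (i = 0; i < ndoomed; i++)
            delete_key(doomed[i]);
    }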

Initial Performance Findings

My first test was on a huge trace taken at AIX which had caused the Perl tsh2ta implementation to lock up. The initial results were very promising: the C implementation finished in about 10 minutes, whereas the Perl script, run on the same machine under the same load, was killed prematurely after approximately 300 minutes.