During the earlier days of collectl development, a third design consideration was minimizing the amount of storage used for raw files. At the time disks were smaller, and the amount of data collected was small enough that being more judicious about what was saved felt like it made a difference. However, since the inclusion of process, slab and now interrupt data, along with the arrival of larger disks, the amount of storage used has become less of a consideration. In other words, collectl doesn't spend extra data collection cycles trying to be more selective about what is recorded if doing so is going to add to the overhead.
    # time collectl -scdnm -i0 -c8640 -f /tmp
    real 0m9.711s
    user 0m7.480s
    sys 0m2.140s

Here you can see that collectl uses about 10 seconds out of 86400, or about 0.01% of the cpu, to collect cpu, disk, network and memory data (8640 samples is one day's worth at a 10-second interval). If we repeat the test for just cpu, the time drops to under 4 seconds, so if performance is really critical you can improve things by recording less data or by recording it less frequently. The point is that these are tradeoffs only you can make if you feel collectl is using too many resources.
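The percentage quoted above can be reproduced with a little shell arithmetic. The following is only a sketch: `to_seconds` and `overhead_pct` are hypothetical helper names, not part of collectl, and it assumes the `NmS.SSSs` duration format that `time` printed in the run above.

```shell
#!/bin/sh
# Convert a `time`-style duration such as "0m9.711s" or "1m8.453s" to seconds.
to_seconds() {
    echo "$1" | LC_ALL=C awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Percentage of one day (86400 seconds) taken up by the given number of seconds.
overhead_pct() {
    LC_ALL=C awk -v s="$1" 'BEGIN { printf "%.4f\n", s / 86400 * 100 }'
}

real=$(to_seconds "0m9.711s")   # "real" value from the run above
echo "elapsed: ${real}s, share of a day: $(overhead_pct "$real")%"
```

For the run above this works out to roughly 0.011% of a day, in line with the "about 0.01%" figure.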
So what happens if you take a different processing path and save collectl data in plot format? This adds the overhead of parsing the /proc data and performing some basic math on the values. If we use the same command as above and include -P:
    # time collectl -scdnm -i0 -c8640 -P -f /tmp
    real 0m20.607s
    user 0m17.970s
    sys 0m2.580s

we can see that this takes a little over twice the overhead, even though it is still pretty low.
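To see where the extra cost of plot format lands, the two sets of timings can be compared component by component. This is a minimal sketch using a hypothetical `ratio` helper; the numbers are copied from the raw-format and plot-format runs shown in this section.

```shell
#!/bin/sh
# Ratio of two durations given in seconds, printed to two decimal places.
ratio() {
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# Arguments are: plot-format seconds, then raw-format seconds.
echo "real: $(ratio 20.607 9.711)x"
echo "user: $(ratio 17.970 7.480)x"
echo "sys:  $(ratio 2.580 2.140)x"
```

The user time grows by about 2.4x while the system time grows by only about 1.2x, which is consistent with the extra work being parsing and math rather than additional reads of /proc.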
One other example worth mentioning is process monitoring overhead, which is the highest-overhead operation collectl can perform and one of the reasons it has its own monitoring interval. The overhead of collecting this type of data can vary quite broadly depending on how many processes are running at the time. On a system with only 138 processes, look at this:
    # time ./collectl.pl -sZ -i0 -c8640 -f /tmp
    real 1m8.453s
    user 0m54.650s
    sys 0m13.430s

Note that collectl is also smart enough to only look at 1/6 as many samples of process data, since that is the default relationship of process monitoring to other subsystem data. This also leads to a way to further optimize process monitoring. If you are monitoring a specific set of processes, say http daemons, collectl no longer has to look at as much data in /proc, and so we now see:
    # time ./collectl.pl -sZ -Zchttp -i0 -c8640 -f /tmp
    real 0m5.721s
    user 0m4.480s
    sys 0m1.180s

In fact, if we know there are never going to be any new http processes appearing (collectl looks for new processes that match selection strings by default), we can suppress that search as well:
    # time ./collectl.pl -sZ -Zchttp -i0 -c8640 --procopts p -f /tmp
    real 0m5.080s
    user 0m3.930s
    sys 0m1.130s

And things get even better, to the point that you could imagine monitoring these processes at the same interval as everything else with almost no additional overhead!
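The three process-monitoring runs can be put side by side to make the effect of filtering easier to see. The sketch below is illustrative only: `summarize` is a hypothetical helper, the labels are made up for this example, and the seconds are the "real" values reported above.

```shell
#!/bin/sh
# Read lines of "<label> <seconds>" and print each run's elapsed time, its
# share of a day (86400 s), and its speedup relative to the first line.
summarize() {
    LC_ALL=C awk '
        NR == 1 { base = $2 }
        { printf "%-22s %8.3fs %7.4f%% %6.1fx\n", $1, $2, $2 / 86400 * 100, base / $2 }
    '
}

summarize <<'EOF'
all-processes 68.453
http-only 5.721
http-only-no-new-pids 5.080
EOF
```

Restricting collection to the http processes cuts the elapsed time by roughly a factor of 12, and skipping the search for new processes brings that to about 13.5.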
Another wrinkle - process I/O statistics
Since the inclusion of process I/O stats, collectl now has to read an additional set of data for each process, specifically /proc/pid/io, which adds over 25% to the total process data collection load. For most users, collectl's overhead is low enough that this extra cost is not a problem. Remember, we're talking about an increase of 25% to a relatively small number. But for those concerned with this extra overhead, a new --procopts value of I was added in V3.6.1 which suppresses the reading of process I/O stats.
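To get a feel for how many extra files this means on a given system, you can count the per-process io entries under /proc. This is a rough sketch and not how collectl itself gathers data: `count_proc_files` is a hypothetical helper, and `stat` is used only as a stand-in for per-process data that is read regardless of the I/O setting.

```shell
#!/bin/sh
# Count per-process files named $2 (e.g. "stat" or "io") under the proc
# root $1, looking only at directories whose names start with a digit (pids).
count_proc_files() {
    root=$1
    name=$2
    n=0
    for f in "$root"/[0-9]*/"$name"; do
        # An unmatched glob is left as a literal pattern, so test existence.
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

echo "processes with a stat file: $(count_proc_files /proc stat)"
echo "additional io files:        $(count_proc_files /proc io)"
```

On a system like the 138-process one measured earlier, that would be on the order of 138 additional small reads per process sample.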
updated Jan 15, 2012