Saturday, December 03, 2011

You really are already retired...

I had an interesting conversation the other night with the stray cat. On May 21st 2014 he turns 59 1/2 and will "consider" retiring "depending on the situation." I have news for you, you have already retired in place.

Another part of the conversation dealt with a situation we are having with one part of our system. We (very badly) support a remote duplication of our financial system for the purpose of failover in case the primary fails. We have numerous problems with this system, and 18 months ago there was a series of planning meetings to determine what areas we would work to improve it. I had pushed for enhancing the ability to stop and restart the system at known data points, increasing its flexibility and providing better recovery to network issues. The support people insisted this was not necessary, and we should switch the data transport over to a system know as NFS. My contention was that the current system was faulting out because of a bad network, and switching to NFS was not going to solve the underlying problem. The support people stated they had an example of a client running both the existing TCP and an NFS data mount they put in place, and the TCP solution faulted out their NFS based backup they kept on working. In the end we went with changing over to use a new NFS based solution. After 18 months of development, testing, and deployment, (over 1200 hours in total effort) the results are in. Indeed the NFS based solution stays up and running when there are network issues, however it quietly corrupts the data without reporting any issues.

What does this have to do with the stray cat? When we were discussing the issue, he thought that the NFS was the proper solution, because from an application level it was the easiest to implement. My thought is that it does not matter how 'easy' it is to implement, if the transport ends up corrupting the data it is worthless. My other thought is that we are now worse off in that with the TCP solution, the clients knew there was a problem because when it failed it announced the fact with an email and/or page. Now, the data is corrupted, and we don't know about it until we need it. He also brought up another issue with the software, not realizing that I had fixed it over 2 years ago. This is why I consider him retired in place, retired people bore you with stories about things that happened in the past, regardless of how applicable they are to today.

0 Comments:

Post a Comment

<< Home