Thursday, April 24, 2025

Audacious Goals - Fail!

To follow up on my Audacious Goals post: the client that went to RFP gave notice in March of 2023, about 8 months after that post, that they had selected our newest competitor for their core solution.  They will be moving to that core about 48 months - sounds better than 4 years - from the notice date, so roughly 2 years out from now.  For me this was no surprise; I provided some of the reasons why they had not been happy with our service, and those reasons go back about 8 years prior to their leaving.  Right up until they gave formal notice to us, management was scrambling, starting work and moving people around, to try and save them.  Given the client's long-standing grievances and their perception of the limitations of our software, I knew they were leaving our platform; it was just a matter of where they landed.

Timeline of events I was involved with:

2017 January

Our current performance improvement project has notified the client that they will not be part of the first round of efforts but will be part of the second round, in deference to a different client.  We know we have daytime performance issues with this client that need to be addressed.  (These issues would be the root cause of a number of crisis situations with them in the coming years, with a badly implemented fix causing a crash, and a proper fix implemented mere months before they formally notified us that they were leaving.)

2017 July 

The client has experienced 30-45 second delays (verified by us) accessing the Loan records interactively.  This was traced to sub-optimal code that was accessing 10,000 parameter records for each of a number of fields on the Loan record.  This occurred in combination with the core DBMS process being bottlenecked due to some other bad processing.
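For illustration, here is a minimal Python sketch of that access pattern - the names, the data, and the use of Python are mine, not our actual code - showing the difference between re-reading a ~10,000-row parameter table once per field and reading it once and reusing it:

```python
# Hypothetical illustration only; the real core is nothing like this.
# The point is the access pattern: re-reading a ~10,000-row parameter table
# once per Loan field vs. reading it once and reusing it.

PARAMETER_TABLE = {code: f"description {code}" for code in range(10_000)}

READS = 0  # count how many times the full table gets pulled

def read_parameter_records():
    """Stand-in for the expensive call that pulls all ~10,000 parameter rows."""
    global READS
    READS += 1
    return dict(PARAMETER_TABLE)

def describe_loan_slow(loan_fields):
    # Sub-optimal pattern: the whole parameter table is re-read for each field.
    result = {}
    for field, code in loan_fields.items():
        params = read_parameter_records()
        result[field] = params.get(code, "<unknown>")
    return result

def describe_loan_fast(loan_fields):
    # Fixed pattern: one read, reused for every field on the Loan record.
    params = read_parameter_records()
    return {f: params.get(c, "<unknown>") for f, c in loan_fields.items()}

loan = {f"field_{i}": i * 37 for i in range(50)}   # 50 parameter-driven fields

READS = 0
describe_loan_slow(loan)
print("slow path table reads:", READS)   # 50 full-table reads

READS = 0
describe_loan_fast(loan)
print("fast path table reads:", READS)   # 1 full-table read
```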

2017 August

Client has been experiencing multiple issues off and on for over a month.  The issues and our response became so acute that their CEO was in contact with our division president and the parent company's C-level executives.  There were several concurrent issues; the most egregious was that a few members who logged in to home banking were shown information for a different member.  In the same timeframe we had multiple down events and multiple slowdowns.

2017 Fall (Don't have the date)

Due to a change in prioritization, the performance improvement project was shut down.  I turned to my direct manager during that meeting and told him we still had issues at this client and we needed to investigate them to determine their root cause.

2018 October

Client has been experiencing recurring issues with their ATM processing slowing to a relative crawl.  On multiple nights, the ATM requests started slowing down, taking 2 to 3 seconds to process each request.  This does not sound like much, but due to request volume and queuing the overall turnaround time for a request eventually built up to about 30 seconds.  After much diagnosis it was found that one Account involved in fraudulent activity had 100 active share records, 87 active card records, and 14,400 active transaction hold records.  Some sub-optimal code had to sift through all of that multiple times for each transaction, and in combination with the processing mode the system was in at the time, this resulted in the backlog.  The fraud came in when the overall turnaround time hit 15 seconds: the ATM network went into 'stand-in' mode, and when that happened all 87 cards were used simultaneously to withdraw money across the entire ATM network.  Ouch.
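To make the compounding concrete, here is a toy single-server queue in Python.  The arrival spacing and the "normal" service time are assumptions of mine; only the 2-3 second degraded service time and the 15 second stand-in cutoff come from the incident.  Once the per-request service time exceeds the arrival spacing, the backlog grows with every request until it crosses the stand-in threshold and keeps climbing toward 30 seconds.

```python
# Toy model only: arrival spacing and "normal" service time are assumptions;
# the 2-3 s degraded service time and the 15 s stand-in cutoff come from the
# incident described above.
ARRIVAL_INTERVAL = 2.0    # assumed: a new ATM request every 2 seconds
NORMAL_SERVICE   = 0.5    # assumed: healthy per-request processing time (s)
DEGRADED_SERVICE = 2.5    # degraded per-request processing time (s)
STANDIN_CUTOFF   = 15.0   # network switches to stand-in at this turnaround

def turnaround_times(n_requests, service_time):
    """Single-server FIFO queue: request i arrives at i * ARRIVAL_INTERVAL and
    must wait for every earlier request to finish before it is processed."""
    times = []
    server_free_at = 0.0
    for i in range(n_requests):
        arrival = i * ARRIVAL_INTERVAL
        start = max(arrival, server_free_at)
        server_free_at = start + service_time
        times.append(server_free_at - arrival)       # queue wait + service
    return times

normal = turnaround_times(60, NORMAL_SERVICE)
degraded = turnaround_times(60, DEGRADED_SERVICE)

print(f"normal:   worst turnaround {max(normal):.1f} s")      # stays at 0.5 s
print(f"degraded: worst turnaround {max(degraded):.1f} s")    # ~32 s and climbing
hit = next(i for i, t in enumerate(degraded) if t >= STANDIN_CUTOFF)
print(f"stand-in threshold ({STANDIN_CUTOFF:.0f} s) crossed at request #{hit}")
```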

2019 July

It's baaack . . . The slowdown accessing the Loan records has recurred.  There was an initial project to stopgap the issue with the intent to follow up with a full project to minimize these reads across the system.  The latter was not implemented.

2020 May

The client crashed during their afternoon rush and through the teller and branch end-of-day processing.  The short of it is that a fix for an issue that was mostly specific to them, but did occur sporadically for other large clients, was improperly implemented.  A workaround for that was created, but when the client installed the release containing the fix, the workaround was never put in place, and they crashed.  They were pissed (no other word for it) at us, and rightfully so.

2020 October/November

The client hired a new person to interface with our company.  Her attitude seemed a little off in the first couple of meetings.  By early November our CSR for their company, our CSR manager, and their interface person were having meetings where they were talking past each other.  Their interface person wanted to examine the defect trends and our CSR manager only wanted to talk about the absolute backlog.  This person and her attitude were an indicator of how unhappy they were with our service.

2021 March

The client ended up implementing a business change to work around the area that caused their crash in May of 2020.  Yes, 8 months on we still had not implemented a fix for this area, and the client had to change instead of us fixing it.

A second issue involved running concurrent posting programs.  This works correctly, but in combination with a third issue it caused a slowdown.  What did we tell them?  That they could only run one at a time.  We should have investigated this further to determine what that third item was.

2021 April

The client is now 2 full years behind on installing our software in production and will install 2 yearly releases back to back.  From the meeting I was in, it is clear to me that they are now scared of our software.  This is also my first note that they are looking to replace us.

2021 October

We have been having a series of meetings with the client over the last 6 months.  In this month's meeting the client stated they are going to review what processes and/or functions within our software impact the DBMS process.  In other words, our client is going to research our software to determine what is wrong with it.  This whole thing is a reaction to their crashes over the prior year, most of which were preventable with some good decisions (like not deferring maintenance on known broken items).

2021 December

I wrote the following about a status meeting, "The opening comments regarding the business changes that <the client> has made in an attempt to reduce their daytime load was just devastating: They have held their branch expansion (c o v i d impacted this); They have shut down their day-time automated submission system and are down to process that are required for operations; They are considering shutting down part of our off-host integration service to reduce load"

We found out that an important performance improvement configuration had not been implemented on their primary production server.  This was a project based on hard-won (deep investigation) knowledge that produces a 10-20% performance improvement under normal load and, most importantly for this client, an even larger improvement on highly loaded systems.  This performance improvement setup should have been put in place when they installed their current system back in mid-2020.

2022 January

Implemented the performance improvement configuration change.  There is another performance improvement change available at the system level.  This change recovers processor performance we are 'leaving on the floor' according to the systems expert we brought in (at no small expense).

2022 February

The end of the line for the current crisis.  Not sure how many of the items for improving the client's system performance were implemented in the end.  For sure we had them adjust their processing to work around the issues they had run up against.

2022 September

Per an in-house conversation, the client feels we "have a gun to their head" concerning converting to our new platform.  Also, they have not made a decision about what platform they will be moving to.  They are doing a Proof of Concept with our newest competitor.

2023 January

The client is finally testing the proper fix for what crashed them in May of 2020.  It only took us nearly 3 years to get a 12-line fix in place.  And we are only doing it now to try and save them as a client.

2023 March

The client formally notified us that they are going to our competitor.  No surprise here.

Wednesday, April 23, 2025

Now it makes sense

It's been a while, and it's been a bad year personally and professionally, but I'll be posting here semi-regularly going forward.  My company has gone through quite a lot of upset, almost all of it self-inflicted.  But today there was a meeting of the Product Development staff with all hands present.  Half the people in the room I did not recognize, which shows the level of turnover in the last 2 years.  Until today I did not understand why work levels had been pushed so hard, resulting in that turnover.

I'll be putting out a number of posts in the coming days showing what has been going on.  But here is my take on it.  Our parent company is putting together a platform that is intended to support both Banking and Credit Union core operations.  All well and good, if properly executed.  Until today I understood that to mean it would function as a front-end to our existing core products (both the Banking and the Credit Union back-ends).  Today it was stated that the new platform will replace all of the core functionality of the existing Banking and Credit Union systems.  And it was discussed that we will be transitioning to the new system, with a timeline of 5 years being bandied about.  However, what was actually said was that the new platform should be ready to support production clients in a minimum of 5 years, and at that point we would be able to start transitioning our clients to it.  All of the decisions I will be showing as questionable make sense if you assume our software only has to last 5 years or so and the existing staff will not be brought forward to work on the new system.  You don't have to care if the existing system gets a bad reputation; there will be a new system to show the clients.  You don't have to worry about burning out the existing staff, because we are not going to be retained for the new system.

Now it makes sense.  In the coming days I'll go through some of the stupid things that have been done based on the idea that we only have to keep developing on our current platform for 5 years or so.

Note, I said to show the clients.  I didn't say it would be usable for the majority of them, especially the largest ones, after that 5 year time span.  The reason is that automation and customizations that don't make sense for small operations become nearly mandatory for large operations.  This is true for both Banks and Credit Unions.  These automations and our ability to support customizations are what make our Credit Union software attractive to our largest clients.  There is also a non-trivial 3rd party ecosystem that exists around our Credit Union software.  We also have to review all of the custom-coded interfaces that we have built for our clients, some of which are one-offs that will need to be retained.  All of this will need to be reviewed and re-implemented to transition each client from the old system to the new one.  We also need to train each CU's staff on the new system and ensure there is adequate support from us to answer questions post-transition.  All of this for each of our 700 clients.  If we transition 1 per day, that alone would take about 2 1/2 years to complete.  Any way I look at this, I'm thinking that from the day the new system is production ready (it is only in demo mode for a very limited set of functionality today), converting all 700 of our clients within a 5 year time span would be a decent result.  Optimistically it is 5 years to get to that initial production stage, so at a bare minimum this is a 10 year undertaking before we can retire our current core system.  Pessimistically, call it 15 years.
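For what it's worth, here is a back-of-envelope version of that math.  Every figure below is my assumption or a rough number from the paragraph above; nothing here is official.

```python
# Back-of-envelope only; every figure is a rough assumption.
CLIENTS = 700
CONVERSIONS_PER_DAY = 1           # assumed steady pace of one client per day
WORKING_DAYS_PER_YEAR = 260       # roughly 5 days a week, 52 weeks

raw_conversion_years = CLIENTS / (CONVERSIONS_PER_DAY * WORKING_DAYS_PER_YEAR)
print(f"raw conversion pace: {raw_conversion_years:.1f} years")     # ~2.7

YEARS_TO_PRODUCTION_READY = 5     # optimistic figure from the meeting
CONVERSION_WINDOW_YEARS = 5       # allowing for review, re-implementation,
                                  # training, and post-transition support

total = YEARS_TO_PRODUCTION_READY + CONVERSION_WINDOW_YEARS
print(f"bare-minimum timeline to retire the current core: {total} years")  # 10
```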

So the idea that we only have to keep developing on our existing software for 5 more years, and that we can plan to that, is nonsensical.

But at least what has been going on now makes sense.