The customer, a leading global investment bank, runs a variety of bespoke and packaged systems in its financial markets business unit. One of the key applications is a web-based share trading system that it provides to a wide range of its portfolio-managed clients. The system collects real-time price feeds from stock exchanges and enables clients to monitor market movements and place buy and sell orders for immediate execution. Timing is critical for these orders – if the market moves before an order is processed, clients may lose large amounts of money. This system had a long history of unreliability, with the following specific problems:
In addition, the system had failed every disaster recovery test, drawing negative comment from internal audit.
Youngblood was asked to provide a senior consultant DBA to deal with the various issues. The consultant diagnosed a number of problem areas.
Windows Performance Monitor provided first-level diagnostic information. In-house staff and other consultants had already set up many counters, but their interpretation of the results was incomplete. The consultant immediately identified that I/O was critically slow, caused by a poorly configured SAN. SQL wait statistics also indicated inefficient code and a defunct indexing strategy. The progressively poorer performance of the system was attributable to poor change control, inadequate testing, and a lack of understanding of SQL Server design principles when developing application code. In addition, poorly controlled access to production data had resulted in changes that damaged data integrity. Large volumes of queries severely affected system performance as well as the client experience.
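As an illustration of how wait statistics like those mentioned above can be interpreted, the following minimal Python sketch ranks wait types by their share of total resource wait time. It assumes rows shaped like the `wait_type`, `wait_time_ms`, and `signal_wait_time_ms` columns of SQL Server's `sys.dm_os_wait_stats` view. The sample figures are invented for illustration and are not taken from the customer's system, and the helper name `top_waits` is hypothetical.

```python
from typing import NamedTuple


class WaitStat(NamedTuple):
    wait_type: str
    wait_time_ms: int          # total time spent waiting
    signal_wait_time_ms: int   # portion spent waiting for CPU after being signalled


def top_waits(stats, limit=5):
    """Rank wait types by resource wait time and report each one's share."""
    # Resource wait = total wait minus the signal (CPU queue) portion.
    resource = [(s.wait_type, s.wait_time_ms - s.signal_wait_time_ms) for s in stats]
    total = sum(ms for _, ms in resource)
    if total == 0:
        return []
    ranked = sorted(resource, key=lambda item: item[1], reverse=True)
    return [(name, ms, round(100.0 * ms / total, 1)) for name, ms in ranked[:limit]]


# Invented sample data, for illustration only.
sample = [
    WaitStat("PAGEIOLATCH_SH", 620_000, 20_000),
    WaitStat("WRITELOG", 210_000, 10_000),
    WaitStat("CXPACKET", 150_000, 50_000),
    WaitStat("LCK_M_X", 110_000, 10_000),
]

for name, ms, pct in top_waits(sample):
    print(f"{name:<16} {ms:>8} ms {pct:>5}%")
```

In a profile like this invented one, a dominant share of I/O-related waits would point toward the storage layer first, while lock and parallelism waits would point toward code and indexing.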
The Youngblood consultant undertook the following remedial actions:
- Redesigned the I/O subsystem: optimized multipathing and channel utilization, redesigned LUNs on the SAN, and balanced file distribution and I/O across LUNs
- Identified and corrected problem code step by step, working purely at the SQL statement level: optimized multithreading, minimized hash and sort warnings, and redesigned and implemented a practical and effective indexing strategy with the associated code revisions
- Implemented strict security and access control, not allowing direct access to live data or execution of unapproved queries
- Implemented effective testing and change control processes
- Drafted complete production documentation, including feeds, normal processes, problem resolution, database design, data architecture, etc.
- Designed and fully documented a disaster recovery strategy and process from scratch
- Established effective monitoring and alerting (a proprietary Youngblood development)
The following were measured:
- Deadlocks and timeouts – None have occurred for the last six months
- Response times – Dropped to between one and two seconds
- Batch window – Reduced from 6 hours to 2 hours
- Query throughput time – Reduced from an average of 2 seconds to 100 milliseconds
- Client satisfaction – The client base has tripled where previously it was declining (note that it was not necessary to purchase new capacity)
- Financial loss and claims – Negligible
- Disaster recovery – 20-second failover time compared to 2 days previously. A DR test was passed for the first time, with no criticism from internal audit.
- All database events are automatically communicated via SMS to the DBA team
- Hardware and code performance are continuously monitored by automatic processes with real-time alerts via SMS to the DBA team
- Automated health-state monitoring has been implemented (growth, index utilization, etc.)
- 100% compliance with an extremely demanding SLA: 5 minutes to respond, 10 minutes to correct.
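The SLA figures above can be checked mechanically against incident records. The following is a minimal Python sketch of such a check, not the Youngblood monitoring product. It assumes that both limits are measured from the moment the alert is raised, which the SLA wording does not specify. The `Incident` record and function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import NamedTuple

RESPOND_LIMIT = timedelta(minutes=5)
CORRECT_LIMIT = timedelta(minutes=10)


class Incident(NamedTuple):
    raised: datetime      # when the alert was sent
    responded: datetime   # when a DBA acknowledged it
    corrected: datetime   # when the problem was resolved


def meets_sla(incident):
    # Assumption: both limits run from the time the alert was raised.
    return (incident.responded - incident.raised <= RESPOND_LIMIT
            and incident.corrected - incident.raised <= CORRECT_LIMIT)


def compliance_pct(incidents):
    """Percentage of incidents handled within both SLA limits."""
    if not incidents:
        return 100.0
    met = sum(1 for i in incidents if meets_sla(i))
    return 100.0 * met / len(incidents)


# Invented incidents, for illustration only.
t0 = datetime(2024, 1, 1, 9, 0, 0)
incidents = [
    Incident(t0, t0 + timedelta(minutes=2), t0 + timedelta(minutes=7)),
    Incident(t0, t0 + timedelta(minutes=4), t0 + timedelta(minutes=10)),
]
print(f"SLA compliance: {compliance_pct(incidents):.0f}%")
```

If the correction limit is instead meant to run from the time of response, only the second comparison in `meets_sla` would need to change.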
The application of Youngblood’s disciplined approach and deep understanding of Microsoft technology has changed a strategic business offering from a serious risk into a strong contributor to the bottom line and a potent brand-builder for complementary marketing.