This is an adventure whose best part I learnt about through some reliable sources. As I was not actually involved in that part of the story, there could be flaws in what I state. There are two parts to this story. I will first relate the part in which I was fully involved and therefore have complete knowledge. Before we get to it, I will dwell on a bit of history.
Reliance (the story refers to Reliance Telecom Limited, which was later merged with Reliance Communications Limited) used our (Siemens Information Systems Limited) systems, SmartPay and GABS, for their pre-paid and post-paid operations. SmartPay was interfaced with eServ IN (Intelligent Network), where calls were charged. An IN allows a pre-paid mobile user to make a call, send an SMS or use any other service only if there is enough balance in the user's account on the IN. To keep enough balance in the account, the pre-paid user needs to recharge it from time to time. In the Reliance setup, a pre-paid user could recharge by buying a voucher in the market and redeeming it through the IVRS (Interactive Voice Response System) or USSD (Unstructured Supplementary Service Data). With the IVRS and USSD, users could recharge their accounts themselves without anyone's help. A user could also visit a Reliance Store and recharge by paying across the counter, or send an SMS with the voucher information.
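The balance rule the IN enforces can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration: the `PrepaidAccount` class, the `authorize_usage` and `recharge` functions and the use of integer paise for amounts are hypothetical, and a production IN is far more involved (for example, a call is usually charged as it progresses rather than through one fixed debit).

```python
from dataclasses import dataclass


@dataclass
class PrepaidAccount:
    """Hypothetical IN-side view of a pre-paid subscriber's account."""
    balance: int  # remaining balance in paise (assumed unit)


def authorize_usage(account: PrepaidAccount, cost: int) -> bool:
    """Allow a call, SMS or other usage only if the balance covers its cost.

    On success the cost is debited; on failure the balance is left untouched.
    """
    if cost < 0:
        raise ValueError("cost must be non-negative")
    if account.balance < cost:
        return False  # not enough balance: the IN blocks the usage
    account.balance -= cost
    return True


def recharge(account: PrepaidAccount, amount: int) -> None:
    """Credit a recharge (voucher, store payment or SMS) to the account."""
    if amount <= 0:
        raise ValueError("recharge amount must be positive")
    account.balance += amount
```

For example, an account holding 100 paise can make one 60-paise call; a second such call is refused until the account is recharged.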
I will get a bit technical here. Please do not run away, as we will return from this diversion very soon. An IVRS or USSD setup consists of IVRS or USSD servers and applications, which are machines built for Computer Telephony Integration (CTI). These machines receive signals from mobile phones carrying requests such as "recharge my account using this voucher number and secret code". They interpret the message and forward it to the systems (generally called back-end systems) where the request is processed. In the case of Reliance, we had provided a socket server, which received the message from the IVRS and/or USSD and performed the necessary validations. If everything was in order, it updated the IN to record the increase in the user's account balance and/or validity.
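The request flow above can be sketched in Python as a small line-oriented TCP server. The original socket server was written in Visual Basic .NET and its message format is not described here, so everything below is hypothetical: the `RECHARGE|msisdn|voucher_number|secret_code` request layout, the `OK`/`ERR` replies, and the in-memory `VOUCHERS` and `IN_ACCOUNTS` tables that stand in for the voucher database and the eServ IN.

```python
import socketserver
import threading
from dataclasses import dataclass


@dataclass
class Voucher:
    secret_code: str
    amount: int          # recharge value in paise (assumed unit)
    validity_days: int   # validity the voucher adds to the account
    used: bool = False


@dataclass
class Account:
    balance: int = 0
    validity_days: int = 0


# Hypothetical stand-ins for the voucher database and the IN.
VOUCHERS = {"1234567890": Voucher(secret_code="4321", amount=10000, validity_days=30)}
IN_ACCOUNTS = {"9830000000": Account()}
_lock = threading.Lock()  # serialises check-and-update across client threads


def process_request(message: str) -> str:
    """Validate one recharge request and, if it is valid, update the IN."""
    parts = message.strip().split("|")
    if len(parts) != 4 or parts[0] != "RECHARGE":
        return "ERR|BAD_FORMAT"
    _, msisdn, voucher_no, secret = parts
    with _lock:
        account = IN_ACCOUNTS.get(msisdn)
        if account is None:
            return "ERR|UNKNOWN_SUBSCRIBER"
        voucher = VOUCHERS.get(voucher_no)
        if voucher is None or voucher.secret_code != secret:
            return "ERR|INVALID_VOUCHER"
        if voucher.used:
            return "ERR|VOUCHER_ALREADY_USED"
        # All validations passed: credit balance and validity, then burn the voucher.
        account.balance += voucher.amount
        account.validity_days += voucher.validity_days
        voucher.used = True
        return f"OK|{account.balance}|{account.validity_days}"


class RechargeHandler(socketserver.StreamRequestHandler):
    """Reads one request line from the IVRS/USSD side and writes one reply line."""

    def handle(self) -> None:
        line = self.rfile.readline().decode("ascii", errors="replace")
        reply = process_request(line)
        self.wfile.write((reply + "\n").encode("ascii"))


def serve(host: str = "127.0.0.1", port: int = 9000) -> None:
    """Run the server until interrupted (port 9000 is an arbitrary choice)."""
    with socketserver.ThreadingTCPServer((host, port), RechargeHandler) as server:
        server.serve_forever()
```

Calling `serve()` starts the listener; an IVRS or USSD front end would then open a connection, send one line such as `RECHARGE|9830000000|1234567890|4321` and read back `OK|10000|30`. Holding the lock across both the validation and the update ensures that two concurrent requests cannot redeem the same voucher twice.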
During the delivery of the SmartPay-eServ IN integration project, we had to change the socket server, as it needed to interface with the new eServ IN. We took up this task towards the end of the project, as it was not a major development effort; besides, the Mediation Team was developing the most complex part of the eServ IN interfacing. Subrata was supposed to change the socket server. When the time came, he refused to program it: he found that the interface provided by the Mediation Team did not meet his expectations, and he reasoned that we would end up delivering an inoperable solution. I was in a total fix and asked Somenath to develop it. Somenath tried for 2 days but was unable to make any headway. So, I had no choice but to develop it myself. The existing code was in Visual Basic, and I found it beyond my comprehension. Time was running out. So, I gathered all the requirements from whatever documents existed and from Subrata, studied the library developed by the Mediation Team and installed it on my machine. Equipped with this, I developed the socket server afresh in Visual Basic .NET over the next 3 to 4 days. It was great fun: I, the boss, sat writing the program while all my team members flocked around me, advised me on what needed doing and boosted my morale as if they were cheering on a struggling long-distance runner. We tested the program over the next 2 weeks. This was a critical component, as any fault in it would have a huge financial impact. We fired requests at it from a simulator in our office, and the socket server processed them correctly.
When the time came, we deployed the socket server along with the rest of the components. When the system went live, most of the components worked. However, I soon received complaints that the socket server was going down every half hour. I went to Reliance to study the problem and found that the program steadily consumed more memory and crashed when the memory was exhausted. In other words, the program had a memory leak. I got down to debugging it. It was a huge credibility issue, as I had made some very bold decisions during the project. I lost sleep and spent most of the day and night in the office trying out different tricks with all the knowledge I had. Meanwhile, I assigned Abhijit and Sivajit to work two 12-hour shifts at the Reliance office. Their job was to restart the socket server every half hour to reduce the impact. I kept sending fixes through Somenath for installation. However, the problem would not go away. When there was still no solution after a week, Reliance escalated the matter to our management. Mr. Neeraj Vyas, our General Manager, came to Kolkata. I declared that I had checked everything in my program and could not find anything wrong, so it would be prudent to also check the library provided by the Mediation Team. The Mediation Team said this was unnecessary, as there was nothing wrong with the library. My greatest stroke of luck that day was that Mr. Vyas told the Mediation Team to bring their program code on a laptop to my workstation. I opened my program as well. All of us, including Mr. Vyas, went through the programs line by line. Soon, we found that the library was leaking memory. The mystery was solved, and the Mediation Team was told to fix the problem. However, that session was enough for me to see what was inside the library.
I got my team together that night. With everyone surrounding my workstation again, I modified the socket server to remove the Mediation Team's library and replace it with our own code for that function. We tested the program through the night. The next afternoon, we deployed the new program without letting anyone know. For the next 2 days, we studied the system. The socket server no longer needed restarting, and its memory usage stayed where it should. Meanwhile, the Mediation Team fixed their library and gave it to me. I told them that I would make the needed changes to my program, test it for 3 days and then deploy the solution. Everyone agreed. After a week, I asked Reliance to send a mail about the IVRS performance. Reliance replied that the traffic processed by the IVRS was increasing and there had been no breakdown. So, everyone was happy.
My boss wanted a detailed analysis of the issue and the solution. Now I had to tell everyone what was actually running at Reliance. The Mediation Team was very disappointed; they did an analysis and produced a report stating that the solution was not sustainable. Here was another challenge. So, I asked them to quantify, taking the expected growth in traffic into account, how many days the solution could keep functioning. I never received a response; the matter died a silent death, and the socket server lived on. It helped that there was no complaint from Reliance.
All this while, recharges through SMS and through the SmartPay GUI were handled by a different program. As we had to support two sets of programs, I had the SMS handler and the SmartPay GUI modified to use the socket server. Now the socket server handled more load, with no performance or reliability issues. I was very happy. However, as I realized later, this was a very bad decision: though the socket server performed as required, I had created a single point of failure. If it went down, every recharge channel would go down with it.
The socket server needed changes when we introduced USSD at Reliance. As USSD offered more possibilities, new features had to be introduced, and the socket server had to handle a much bigger load. We updated the socket server. By this time, Somenath had become an expert on it. There were no issues with the new deployment. Everything was running smoothly, and we moved on to other activities.
With all this history behind us, I come to the first part of my story. One evening, at around 5 PM, I received a call from Reliance saying that the socket server was not working and the recharge function was completely down. This is a major issue for a telecom operator, as an outage of the recharge function for even 10 minutes results in a huge loss of revenue. Somenath went to Reliance and soon reported that the interface to eServ IN was failing because eServ IN was not processing our requests. We checked our logs and found nothing unusual in our requests to eServ IN. Then Vamsi from eServ informed us that they had deployed a fix that day and had forgotten to tell us that it changed the PI command for recharge. We told Reliance that there were two options: 1) roll back the eServ IN fix, which was very expensive, or 2) change our system, which would take about 2 hours. Reliance opted for the second option, and we were soon in our office modifying the socket server. We deployed our solution and observed the system for some time. Everything was back to normal, and once again everyone was happy.
While all this drama was being played out in Kolkata, Reliance offices were being smashed in Siliguri and across North Bengal. This was because most Reliance customers had been unable to recharge for over 3 hours. The customers were extremely agitated because they wanted to vote for their candidate in the Indian Idol television show and could not, as they did not have enough balance in their accounts and were unable to recharge. The Indian Idol final was scheduled for the next day, and it had two contestants: Prashant from Siliguri (who was a police constable) and a contestant from Assam. This was the most crucial stage for sending votes. The state administration eventually restored law and order. The next day, the final show was telecast, and after some very tense final moments, Prashant won the Indian Idol crown.
My reliable source told me that, after the system was restored, Reliance wrote a program that pumped SMS votes into the system to compensate for the time when the people of North Bengal could not vote. The program must have been super efficient, as public votes carry the greater weight in choosing the winner.