I recently initiated an idea that led to a team from Pramati Technologies (Antony Raj, Pragadeeswaran, Vasanth, Arun Kumar, and me) participating in the nationwide Hack for Social Good Contest organized by SPI Cinemas (17-18 December 2016). Team 2.0 went on to win Best Design for our Infracity: Making India’s Roads Better app, and, the judges told us, we lost Best Team by a whisker.
The win was no accident. What we did during the hackathon’s 36-hour sprint (after our pitch made the initial cut into the top 15 of 120 proposals) was no different from how I plan and run a project, day in and day out, as part of my work at Pramati.
Our key differentiator was, simply, results.
In today’s techscape, many engineers get certified as Scrum Masters (CSM). A vigilant figure who often goes by the name of Scrum Master ensures that the culture of failing fast is embedded into the DNA of every member of the team.
But from what I see on the job and in the industry, it is hard to separate aggression from Scrum. The people who think and act like I-know-this-there-is-nothing-new-here are the ones who keep repeating the same mistakes. Although many teams say they follow agile, when it comes to delivery they invariably choke, one way or another.
What we’ve achieved in my team, using our own template for agile, is a near-zero risk of missing a deadline. You need acceleration right at the start, and a solid Scrum Master who can think end to end.
Which brings me to my reason for writing: I see a lot of agile teams around me failing, even though they may have completed CSM certification and worked for many years in Scrum.
So, what do we do that works? Here is a closer look at the carefully conceived and implemented model we’ve been following for the past 2+ years, a model behind a streak of 40+ successful releases. Any team can adopt this template with a few tweaks.
Let me begin with a high-level overview of what happens in a given sprint timeline:
The actual sprint happens between the two green circles (below). A few events occur within a sprint and a few outside it. For instance, the IPM (Iteration Planning Meeting) happens a day before the sprint starts, on the day the previous sprint ends. The Retrospective Meeting happens a day before the sprint ends. There is a series of events on this timeline, discussed in detail below.
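The timeline above can be sketched as a simple day-offset map. This is an illustrative sketch for a two-week (10-working-day) sprint; only the IPM and Retrospective offsets come from the text, the rest are assumed slots.

```python
# Illustrative day-offset map for a two-week (10 working day) sprint.
# Day 0 = sprint start; negative offsets fall before the sprint.
# Only IPM (-1) and Retrospective (day 9, a day before the sprint ends)
# are from the text; the other slots are assumptions.
SPRINT_EVENTS = {
    "IPM": -1,               # a day before the sprint starts
    "Test Case Review": 2,   # QA uses the first two days for test cases
    "QA Meet #1": 3,         # assumed mid-sprint slot
    "Code Freeze": 7,        # assumed: leaves room for regression
    "QA Meet #2": 8,         # assumed
    "Retrospective": 9,      # a day before the sprint ends
    "Deployment": 10,        # assumed: after regression sign-off
}

def agenda(day):
    """Return the events scheduled for a given sprint day."""
    return [name for name, d in SPRINT_EVENTS.items() if d == day]
```

A quick lookup like `agenda(-1)` tells you the IPM is the only pre-sprint event.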
Although most teams working in the agile mode of development would be well-versed in all this jargon, I wanted to emphasize exactly how we operate our sprints.
The following are the typical events/meetings that take place without fail in every sprint.
- Grooming
- IPM (Iteration Planning Meeting)
- Test Case Review
- QA Meet (twice a sprint)
- Code Freeze
- Retrospective
- Deployment
Let us dive into what happens during each of these meetings, keeping in mind where exactly they appear in the timeline. The timing is key and has evolved over time.
Grooming:
A new user story is presented to the team for the first time by the Product Owner, who explains the need for the story from a business standpoint and its priority. The developers and QA actively go through the story and raise their doubts and concerns. Based on this initial set of questions, we decide whether the story can be moved to the “Already Groomed” phase or whether it needs to go back from Product to the business for more vetting. Once a story reaches the “Already Groomed” state, it is ready to be estimated and picked up as a work item during the Planning Meeting. Mandatory attendees: PM/Dev/QA.
IPM (Iteration Planning Meeting):
The IPM is the next phase in the sprint lifecycle. The sprint goal is defined at the start by the Product Owner. Capacity and velocity are determined by the Scrum Master based on points delivered in the past: the Scrum Master collects the availability of all the developers and QA for the sprint duration to gauge the firepower on hand, notes any holidays during that time, both onshore and offshore, and arrives at an acceptable velocity after discussing it with the entire team. Prior to the IPM, the Product Owner is expected to have already ordered all the stories by priority. A 10% buffer in capacity is reserved for urgent customer feedback or production issues. Only stories from the “Already Groomed” bucket are considered for the sprint. All the developers vote their estimates in points for a user story, and based on the average, the Scrum Master determines the points for that story. We keep doing this until we hit the velocity we’ve set for ourselves.
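The capacity math can be sketched roughly as below. The 10% buffer is from the text; the pro-rating formula and the helper names are my assumptions, since in practice the Scrum Master settles the final number with the team.

```python
def available_person_days(team_days_off, sprint_days, common_holidays):
    """Sum each member's working days: sprint length minus shared
    holidays (onshore/offshore merged here for simplicity) minus
    personal days off. team_days_off maps member name -> days off."""
    return sum(max(0, sprint_days - common_holidays - off)
               for off in team_days_off.values())

def plan_velocity(past_velocity, full_person_days, available_days, buffer=0.10):
    """Pro-rate last sprint's delivered points by availability, then
    shave off the 10% buffer kept for urgent customer feedback or
    production issues. The pro-rating itself is an assumption."""
    raw = past_velocity * (available_days / full_person_days)
    return round(raw * (1 - buffer))
```

For example, a team that delivered 30 points at full strength (50 person-days) but has only 45 person-days available would commit to roughly 24 points after the buffer.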
Typical Point System followed:
- 0 points – Usually bugs and miscellaneous dev tasks/errands. These hardly take any time for a developer to complete.
- 1 point – An easy task that can typically be completed within a day with no help. From QA’s perspective too, this should be the simplest item to test.
- 2 points – Still an easy item, but requires more testing than a typical 1-point story.
- 3 points – Slightly complicated, with a considerable impact across the codebase. Testing a 3-point story is moderately difficult and requires thorough test-case preparation.
- 5 points – A complicated piece usually requiring research, development, and coordination. A maximum of three 5-point stories is allowed per sprint for a typical velocity of 30 points.
- 5+ points – Very rare! We usually don’t pick these up; they are deemed undoable or split into multiple tickets.
If you notice, the point system follows the Fibonacci series. Scrum commonly includes two more values, 8 and 13, but that is not a strict criterion. We are content with a maximum of 5 points and have stopped there.
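The averaging step can be sketched as follows. Snapping the average to the *nearest* allowed value is my assumption; the text only says the Scrum Master decides based on the average of the votes.

```python
ALLOWED_POINTS = [0, 1, 2, 3, 5]  # Fibonacci values, capped at 5 as above

def story_points(votes):
    """Average the developers' votes and snap to the nearest allowed
    point value. Nearest-value snapping is an assumption; in practice
    the Scrum Master makes the final call."""
    avg = sum(votes) / len(votes)
    return min(ALLOWED_POINTS, key=lambda p: abs(p - avg))
```

So votes of 3, 5, and 5 average to about 4.3 and land on a 5-point story.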
Test Case Review:
After IPM, the QA team takes the first two days to prepare test cases for all the stories that were added, using JIRA. For each story, there is a high-level description capturing the story’s objective, followed by granular test cases covering the detailed steps. These test cases are then shared with the Product Owner and the developers, who go through them and come to an understanding. If a test case is found invalid, it is vetted again by the QA team; otherwise, an agreement is reached between the Product Owner, QA, and development groups.
QA Meet:
This is a forum designed to bring together the Manual QA, Functional QA, and Data Automation QA teams with the Engineering Manager to strategize and review overall QA metrics. It happens once or twice each sprint. The objective is to continuously improve both manual and automation processes by going through the past sprint’s metrics.
Code Freeze:
Code Freeze is the phase of the sprint where feature development comes to a halt and the code is prepared for deployment to the staging environment. No features should be added to the codebase after the code/feature freeze. Only regression bugs and high-priority show-stopper bugs are allowed, and even those very selectively. This is also when QA and developers can suggest items to be moved out of the sprint due to unforeseen reasons, with the Product Owner approving such moves, and when the QA team cuts the release branch from the main branch for regression. This build later becomes the final build deployed to Production.
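The freeze gate can be expressed as a small predicate. The issue-dict shape and the `po_approved` flag are illustrative assumptions; the allowed types come straight from the rule above.

```python
def allowed_after_freeze(issue):
    """After Code Freeze, only regression bugs and high-priority
    show-stoppers get in -- and even then selectively, modeled here
    as an explicit Product Owner approval flag (an assumption)."""
    ok_types = {"regression", "showstopper"}
    return issue["type"] in ok_types and issue.get("po_approved", False)
```

A late feature is rejected outright, while an approved regression fix passes the gate.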
Branching Strategy:
This is a very important aspect of development, and different companies follow different strategies. In our case, development happens on the “master” branch until Code Freeze. A sprint branch is cut from master on Code Freeze day, and only minimal changes/bug fixes go to the sprint branch. The last successfully regressed tag of the sprint branch is then deployed to Production. Post deployment, the sprint branch is merged back to master.
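The branch flow above can be written out as an ordered checklist of git steps. Only the sequence comes from the text; the branch and tag naming scheme is my assumption.

```python
def release_flow(sprint_no):
    """Return the git steps for one sprint's release flow, in order.
    Branch/tag names (sprint-N, -rc1) are assumptions; the sequence
    mirrors the branching strategy described above."""
    b = f"sprint-{sprint_no}"
    return [
        f"git checkout -b {b} master",  # cut sprint branch on Code Freeze day
        f"git tag {b}-rc1 {b}",         # tag each regression candidate
        f"git checkout {b}-rc1",        # deploy the last successfully regressed tag
        "git checkout master",
        f"git merge {b}",               # merge back to master post deployment
    ]
```

Keeping the flow as data like this makes it easy to print as a runbook or feed to automation.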
Retrospective:
The retrospective happens after the end of each sprint. It covers the following broad categories while doing a thorough post-mortem on the highs and lows of the sprint that just ended:
- What went well? – The things that went right and as expected in the previous sprint. Examples include delivering stories ahead of time and a satisfactory turnaround on customer-related queries directed at the Product and development teams.
- What did not go well? – The things that did not go right in the previous sprint. Examples include a story getting delayed or removed from the sprint. The team tries to analyze the root cause of such problems.
- What could be improved? – Collective feedback from the entire team on the aspects of the development practice that could be improved in future sprints.
- Action items review from the past retro – The action items the team agreed to take up in the previous sprint are reviewed along with the status of each. Any incomplete action items are typically carried over into the current sprint.
- Action items and owner assignment – Based on the feedback under “What did not go well?” and “What could be improved?”, action items are created to minimize such occurrences in the future. An owner, who could be from QA, Dev, or Product, is identified for each action item, and the items are noted and followed up in the next sprint.
Deployment:
Once the regression build gets sign-off from the Dev, QA, and Product teams, the software is tagged with a timestamp. An OPS ticket is created for the OPS Engineers to coordinate the deployment activities. The OPS Engineer has all the steps required for the deployment, and a Deployment Meeting is scheduled with the Product Manager, QA team, development representative, and the OPS Engineer who does the actual deployment. The OPS Engineer keeps a log of every deployment in GitHub, where the steps are captured, follows those steps, and then informs the entire team about the deployment.
The OPS Engineer deploys in two phases. In the first phase, the software is deployed to a single node that faces no internet traffic, achieved by taking that node out of the Load Balancer. The QA team performs a high-level smoke test against that node, and if the QA and Product teams feel comfortable, they ask the OPS Engineer to deploy to the remaining nodes. If QA raises a flag during this phase, the deployment is halted, the OPS Engineer rolls back the code using the rollback script, and the deployment is deemed a failure; end users, however, see no downtime or impact. If the first phase is deemed successful, the OPS Engineer proceeds to deploy to the rest of the nodes, and the deployment is broadcast to the respective stakeholders. PS: We never had to exercise a rollback for even a single release!
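The two-phase flow can be sketched as a small driver function. The function and parameter names are illustrative; in reality the deploy, smoke test, and rollback are manual or scripted OPS/QA steps.

```python
def two_phase_deploy(nodes, deploy, smoke_test, rollback):
    """Phase 1: deploy to a single node pulled out of the load balancer
    and smoke-test it; roll back and abort on failure, so end users see
    no downtime. Phase 2: roll out to the remaining nodes."""
    canary, rest = nodes[0], nodes[1:]
    deploy(canary)
    if not smoke_test(canary):
        rollback(canary)
        return False  # deployment deemed a failure; users unaffected
    for node in rest:
        deploy(node)
    return True       # broadcast success to stakeholders
```

Plugging in stub functions makes the control flow easy to verify: a failed smoke test stops after one node and triggers the rollback, while a passing one rolls out everywhere.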
Post Deploy Monitoring:
After each deployment, ownership of verifying that all systems work correctly is divided across several health checkpoints: some owned by the OPS team, some by the Data team, some by the Product Manager, and some by the developers. I have worked in small teams where all of these verticals were handled by one or two people, so it is vital that the core developers be versatile enough to switch and play any of these roles. This phase becomes especially important for releases carrying software updates from multiple sprints, when many details need attention: monitoring application health, alert management, uptime checks, and so on.
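The ownership split can be captured as a simple map. The owner categories come from the text; the concrete check names here are assumptions for illustration.

```python
# Illustrative post-deploy checklist. The owner split (OPS/Data/Product/
# Developer) is from the text; the specific check names are assumptions.
POST_DEPLOY_CHECKS = {
    "application health": "OPS",
    "uptime checks": "OPS",
    "alert management": "OPS",
    "data pipeline sanity": "Data",
    "feature spot checks": "Product Manager",
    "error-log review": "Developer",
}

def checks_owned_by(owner):
    """List the post-deploy checkpoints a given owner is responsible for."""
    return sorted(c for c, o in POST_DEPLOY_CHECKS.items() if o == owner)
```

In a small team, one person may end up running `checks_owned_by` for several owners at once, which is exactly why versatility matters.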
I know this is a lengthy read, and most of these steps are practiced elsewhere in different forms and flavors. But the idea behind this article is to share a system that has reached a level of maturity where the process can be adopted as-is and can help organizations build highly cohesive agile teams.
If you are a follower of Scrum too, what nuances do you apply? If you have anything interesting to share, do leave a comment below.
Acknowledgments: Many thanks to Akhila Ramnarayan for providing valuable feedback, and structuring the article!