Dev from trenches

A look at quality

Only a few days remain to finish the new version of the product and you are working on the last features. You are a bit behind schedule when the Operations team suddenly reports a critical bug. The customer is nervous and demands a fix because of the loss of service the bug is causing. You have to stop and solve it. It takes you three hours to analyze its cause, another four to fix it, and three more to test it. You have lost two days and will now deviate even further from the plan.

This is a typical situation if you work on product development. What matters is how often it occurs. If it happens very frequently, the consequences are disastrous. The team is under continuous stress because of the constant changes of context and pace. Solving a bug under the pressure of an [SLA] is not the same as developing a new feature. The team loses motivation because it feels that its day-to-day work consists of nothing more than fixing bugs in a hurry. You cannot follow the established plan. Every day is an adventure and the days become very long. The delivery dates agreed with the customer are constantly missed. The extra cost of bug resolution keeps increasing. There is a risk of lowering the quality of the new features to try to recover lost time, and then the story repeats itself indefinitely.

Bugs and more bugs keep appearing. Once again the team has to stop developing the current version to correct them. Customer expectations of the product are not being met. At the same time, the expansion of the product is unstoppable because of a change in the client's business model. Now more than ever, a product with good quality is needed, and it must evolve rapidly to adapt to new business requirements. This pressure inevitably leads to a Big Bang. And suddenly, boom! Everything explodes so that something new can be born.

Then, in isolated moments of lucidity, you remember the good software engineering practices that ensure good code quality. You begin to appreciate and value this quality out of pure necessity. The quality of your product must be improved.

Quality Appreciation

Everything has blown up. The relationship with the customer has deteriorated and the position of the product has plummeted. You need to stop for a while and analyze the situation.

The customer requires a quality product: well-designed, usable user interfaces that ensure a good user experience, and a product that offers good service to users and is free of critical bugs. Bugs should be resolved quickly, and the evolution of the product needs to be fast and not very expensive.
The company needs to make a profit from manufacturing the product, with a low maintenance cost, a cost-effective evolution and a level of quality that ensures the continuity of the business.

The problem is that we don't know how to make a product to meet all these requirements.

The first thing you have to accept is that, even if you call what you are making a "product", it probably does not have enough quality to deserve the name.

Improving the quality of an already manufactured product has a cost, probably a much higher one than if we had built it with quality from scratch. We must be willing to bear this cost and decide whether or not to continue with the improvement process. We will need to analyze what volume of business the product brings in and what future business we foresee. The reward is a product whose quality improves; as a consequence, the volume of bugs drops considerably and the client gains confidence in the product. This will open up new business opportunities. Even so, it is difficult for us to accept this improvement process and its extra cost.

Team

A good quality product can only be manufactured by a team with good skills. If your product has poor quality, the team may not be able to do better. It is also possible that the team does have the potential but that working inertia holds it back, or that a working pace set by very short delivery times prevents it. If we are in the latter situation and we are giving higher priority to deliveries than to quality, we are mortgaging our future.

It takes time and effort to build a quality team. And beyond this, a team culture that appreciates and cares for quality. When this is achieved then good quality products can be made. However, a quality culture is built by doing, experimenting, comparing and not only defining processes from theory. Improving the quality of our product will shape this culture of quality.

Someone from outside may need to join the team to lead the quality improvement. The strategy will be defined with the rest of the team, as they are the ones who know the product best, but this new person will be the driver of the change. The original team will have to evolve and adapt to new processes and ways of working.

The root cause of producing poor quality software is that you do not see the consequences of bad software development practices with regard to the service that the product offers. It is not that there is a single bad practice that causes a low quality product, but a series of bad practices, whose accumulated consequences cause a lousy product. Each line of code counts and has its own small consequence that will accumulate with the others. And if you have a product with 500,000 lines of code, then do the calculations...

The point is to get the team to assign the appropriate value to each line of the product code. The most effective way in the short term is to make them see that the reported bugs are a consequence of poor quality. I find [Matt Mullenweg's] approach interesting: each person on the team spends at least one week a year on the Operations team supporting users, to see first-hand the service the product offers.

Another way is to let them see that introducing a new feature requires modifying parts of the product that are conceptually unrelated to that feature. The team will see that the same part of the product has been modified - indirectly - several times and will be fed up with never reaching a final implementation. Yet another way is to let them see how their estimates for modifying existing code are not met because of the poor quality of that code. In the long term it is best to compare the bugs that occurred with previous versions against the bugs that occur now with higher quality versions, and to show how their day-to-day work has improved.

Once the team is aware of the importance of quality, they may not know how to improve it. You will have to show them good practices in development to achieve their improvement. This set of good practices will define our quality reference framework.

Once this quality reference framework has been identified, the team may not know how to fit its application into their day-to-day work. It will be necessary to define a [refactoring strategy] and a [continuous improvement] strategy so that the worst quality parts are improved progressively.

At this point, the team becomes ready to produce good quality software. Now this quality must be improved iteratively and incrementally, as the product needs to continue evolving and must continue to provide service at the same time.

Customer

The customer also perceives the quality of the product, from a very different dimension. He will perceive it in the manufacturing phases where he is most involved.

If we deliver quality functional analyses in which we not only describe the features functionally but also identify the edge cases that occur in his business and ask for his involvement, then the client will perceive that a good job is being done. He will make an effort to find the best solution in each case. You will begin to discuss with the client whether it is worth developing a specific and complex part of a feature, taking into account the cost of manufacturing, maintaining and evolving it.

These in-depth discussions are what make such decisions possible. The typical arguments between customer and supplier about whether or not something is within the scope of the contract will be left behind.

If the customer sees in the acceptance test report that there is a sufficient volume of tests, that the tests cover most of the cases and that the regression tests of the product have been updated appropriately, he will be confident. Not because bugs will never appear in production, but because the bugs that could arise will not have a great impact on his business.

The higher the quality the customer perceives during manufacturing and the fewer bugs that arise in production, the more confidence he will have in the product. And the more confident he is, the more possibilities he will see for evolving the product.

Investing in quality is investing in the long term.

Company

Our company requires quality certifications to compete for some contracts.

When your manufacturing process passes a quality audit and you verify that you meet most of the requirements, it means that you are on the right track, for two reasons: you will not have to keep parallel documentation to justify that you comply with the quality system, and it is evident that you are doing things well.

The processes you follow and the assets you produce emerged from pure necessity, in the constant attempt to improve quality. Because they arose from this need, they are already mature and accepted by the entire team. What's more, the team won't be able to stop following them. As these processes and assets have become part of your manufacturing process, they are sure to be lightweight to apply and follow; otherwise developers and testers would not have accepted them.

Quality minima

A product without quality cannot survive for long in the market while offering service. As time passes, the product will evolve and its entropy will grow. If the product was created quickly to take advantage of a business opportunity, quality was probably not a priority. If, on the contrary, it was created with quality from the beginning, that does not guarantee that the quality will be maintained as it evolves. To maintain quality, you need to make an effort and reserve a budget for it. How this effort is spread over the life of the product is another story.

The team that initially created the product keeps changing. New members join and others leave. The knowledge that people have about the product does not stay constant over time. However, a minimum of knowledge must be ensured in order to maintain and evolve the product. This minimum is not constant either, since the product may offer increasingly domain-specific services, so the team discovers the domain incrementally.

As we can see, all these dimensions vary over time. However, we need to hold on to invariable characterizations that lead us to work in the right direction to ensure quality.

A car is assembled from parts on a production line following well-defined phases. If you don't have a well-defined manufacturing process, you won't be able to make a good quality product. Each production phase, each produced asset and each team role must exist because it is necessary, not because a quality standard mentions it. You know that something is necessary because, if you eliminate it from your process, you leave a hole in the manufacturing.

A few months ago we started to break down the monolithic architecture of our product into various microservices. We have already developed the new features using this microservices paradigm. However, the first version we created with this mixed architecture was a disaster. We had not put enough focus on the assembly between microservices and monolithic components. What we did in the next version was to include an assembly phase in the manufacturing process, placing it as early as possible. We called this new manufacturing process "assembly first". We also identified a new role on the team, the "assembler", to whom we assigned specific tasks and responsibilities. This is just one example of a phase that we identified as necessary because of a change in the product architecture.

Car parts are purchased from external suppliers. These suppliers have validated that each part works correctly on its own. They have machines that test and ensure the quality of each piece. During manufacturing, the car will pass assembly tests, safety tests, usability tests, etc.

Your product must be testable in parts. It is not enough to have end-to-end acceptance tests. You need tests at the component level. The unit and integration test code is as important as the feature code. A quality product should have more lines of code dedicated to testing the product than lines of product code. If there is no good coverage of unit and integration tests, it is a sign that the code has not been executed as often as necessary. You cannot get a complete picture of your product until you have run it thousands of times and observed its behavior at runtime. There is no other way to see the dynamic behavior of software. On paper, analysis and design will put up with anything. You will need emulators and simulators to help you test at the component and integration level. They play the same role as the machines that test car parts.
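
As a minimal sketch of what a component-level test with a simulator can look like, the following Python example tests one component in isolation by replacing its neighbor with a scripted stand-in. The names SensorSimulator, OverheatMonitor and run_scenario are hypothetical and invented for illustration; they do not come from our product.

```python
class SensorSimulator:
    """Hypothetical simulator: replays a scripted list of temperature readings."""

    def __init__(self, readings):
        self._readings = list(readings)

    def read(self):
        # Return the next scripted reading, in the order the test supplied them.
        return self._readings.pop(0)


class OverheatMonitor:
    """Hypothetical component under test.

    Reports an alarm once `consecutive` readings in a row exceed `limit`.
    """

    def __init__(self, sensor, limit, consecutive=2):
        self._sensor = sensor
        self._limit = limit
        self._consecutive = consecutive
        self._streak = 0

    def poll(self):
        # Count consecutive hot readings; any normal reading resets the count.
        if self._sensor.read() > self._limit:
            self._streak += 1
        else:
            self._streak = 0
        return self._streak >= self._consecutive


def run_scenario(readings, limit=80.0):
    """Drive the monitor through one scripted scenario and collect alarm states."""
    monitor = OverheatMonitor(SensorSimulator(readings), limit)
    return [monitor.poll() for _ in readings]


if __name__ == "__main__":
    # One hot reading alone must not raise the alarm; two in a row must.
    assert run_scenario([70, 85, 90, 60, 95]) == [False, False, True, False, False]
```

Because the simulator is scripted, the same scenario can be replayed thousands of times, and awkward sequences that are hard to provoke with a real device become cheap to test.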

The volatility of software, which comes from changes in the interfaces between components and in their behavior, makes these tests at the component level and at the end-to-end level even more necessary. To guarantee this testing, you need to be thinking about how to test the product even while you are designing it. The product architecture should be designed from several perspectives: functionality, performance, the environment in which the product will be deployed, testing and mocking of components, etc.

The car will be sold offering a warranty for a few years. If a fault arises during this guarantee or even outside it, you take it to the car repair shop, they detect the fault and correct it. Unless the fault is very serious, the price will be affordable. The fault may be located in a specific part, and a replacement will be requested from the supplier and it will be mounted replacing the faulty part in the car.

You have put your product into production and it is offering service. Suddenly a critical bug is identified. When you analyze it, however, you realize that it cannot be solved or that fixing it could destabilize the product. Can you imagine taking your car to the repair shop and the mechanic telling you that he has no solution for the fault, or that he does not know how long it will take to fix? Such cases surely exist, but I doubt there are as many as in software.

A car maker plans to release a new model next year. It has already identified the parts that need to be improved and how these changes affect the assembly, the manufacturing process and the testing. And you? The client asks you to include a new feature or modify an existing one, and as a result of the analysis you conclude that before making these changes you have to refactor half of your product. For the customer this is inconceivable!

If you don't know the limitations of your product, then you don't have a product. We consider it normal that a car manufacturer knows the limitations of its car: maximum speed, impact resistance, maximum load capacity, etc. And you don't even know the business cases that your product doesn't cover? If you do not know these limitations, then you have not thoroughly analyzed your business, your model, or the code of your product. And of course you don't have enough test coverage to have identified these limitations before putting the product into production.

Your product cannot have assumptions in the code that are not directly related to its limitations or to the technical documentation of the product. How many times have we seen code take the first element of a list because it assumes the collection holds only one element? Hundreds of times. Or take, in the else branch, an object filled with meaningless default data. Or do nothing and leave a trace that will be buried in some huge log file...
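
One way to turn such a hidden assumption into an explicit, documented limitation is sketched below in Python. The helper name exactly_one is hypothetical; the idea is simply that code which expects a single element should say so and fail loudly when the expectation is broken, instead of silently taking the first element.

```python
def exactly_one(items, what="item"):
    """Return the single element of `items`, or fail loudly.

    Hypothetical helper: it replaces the silent `items[0]` idiom with an
    explicit check, so a broken assumption surfaces as an error, not as a bug.
    """
    items = list(items)
    if len(items) != 1:
        raise ValueError(f"expected exactly one {what}, got {len(items)}")
    return items[0]


if __name__ == "__main__":
    # The assumption holds: behaves like taking the first element.
    print(exactly_one(["ACME-001"], what="active contract"))

    # The assumption is broken: the caller finds out immediately.
    try:
        exactly_one(["ACME-001", "ACME-002"], what="active contract")
    except ValueError as error:
        print(error)
```

The same idea applies to the else branch with default data: raising an error there documents the limitation far better than an object without meaning.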

The client asks you a question about the behavior of your product in a specific case of their business. You ask the team and nobody knows. You dig into the code and you cannot confirm its behavior beyond doubt. How can this happen, and so often? Can you imagine a customer asking a car salesman the equivalent question and the salesman not knowing how to answer? OK, we accept that software evolves very quickly. However, you must regulate the speed at which your product evolves or failure is guaranteed. Your team must assimilate this evolution and will need time to consolidate it.

Note that I have not gone into too many technical details. I have set the level very low and yet I bet that most products do not meet these minimums that guarantee a minimum quality.

If a car part is manufactured with defects, the economic repercussions are serious, since 1,000,000 units may already have been manufactured. Software with a lot of bugs, in contrast, still goes into production. The subsequent cost of correcting those bugs is not taken into account. The intangibility of software works against it.
This intangibility contributes to the lack of discipline and rigor in software manufacturing. It is true that tight deadlines and the pressure to release your product don't help. But it is also true that even when these constraints do not exist, there is still neither discipline nor rigor.

Continuous improvement

The key to recovering the quality of a product is continuous improvement, even more so if the product must continue to evolve and provide service in parallel. Continuous improvement consists of making changes to the code by applying good practices in an incremental and iterative way.

As we saw earlier, a set of good practices is grouped into a quality reference framework. Once we have applied the whole framework, we can define a new framework that incorporates further good practices.

You will need to define at least two frameworks: one for legacy code and one for new code. This is because not all the good practices defined for new code can be applied to legacy code: the versions of the language may differ, the development framework may not be the same, or the cost of applying the entire framework may not be acceptable at that moment.

Legacy code

Senior developers will improve legacy code. They have been working with the product for a long time and understand the domain, the coded model and the product features. They will know what features users most use, in order to focus on the features that provide the most value.

We will apply different techniques to improve incrementally depending on the characteristics of the code to be modified.

I have often come across complex feature implementations trying to cover all possible cases and implementing very complex logic for it. However, when you test it, it does not work even in the most basic and frequent cases. To solve this we can apply the technique of code simplification.

This technique can be applied in various situations: when the code does not implement important features of the product, when its quality is so bad that it cannot even be maintained, when the features are not visible to end users, or when the code is quite isolated from the rest of the product and has almost no dependencies.

The code simplification technique consists of reducing the complexity of the code as much as possible and then successively applying the reference frameworks to this simplified code. The first step is to analyze the existing code: see what its input and output data are and what its dynamic behavior is. Observing this behavior by running the code is not always practical, as it can be very slow, so unless the code is very critical we avoid it and infer the behavior by reading the code instead. The code is then rewritten to cover the most representative business cases, leaving out the less common ones. The cases left out are recorded in the documentation so that they are not lost. Next, unit tests and integration tests are written to ensure that the modified code works. Finally, the technical documentation is produced from this code.
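
The steps above can be sketched in Python as follows. Everything here is hypothetical: the discount rule, the rates and the list of uncovered cases are invented for illustration. Failing loudly on uncovered cases is my own choice in this sketch; the technique itself only requires that those cases be recorded in the documentation.

```python
# Hypothetical rates, in percent, inferred from reading the legacy code.
DISCOUNT_PERCENT = {"regular": 5, "premium": 10}
MINIMUM_ORDER = 100

# Cases deliberately left out of this iteration. In practice this list
# lives in the technical documentation.
UNCOVERED_CASES = (
    "customer types other than regular and premium",
    "negative amounts (refunds)",
)


def discount(customer_type, amount):
    """Simplified rewrite covering only the most representative cases."""
    if customer_type not in DISCOUNT_PERCENT or amount < 0:
        # An uncovered case is reported explicitly instead of being guessed at.
        raise NotImplementedError(f"uncovered case: {customer_type!r}, {amount!r}")
    if amount < MINIMUM_ORDER:
        return 0.0
    return amount * DISCOUNT_PERCENT[customer_type] / 100


# Representative business cases, written down as (arguments, expected result).
REPRESENTATIVE_CASES = [
    (("regular", 50), 0.0),
    (("regular", 200), 10.0),
    (("premium", 150), 15.0),
]


def check_representative_cases():
    """Unit-test the simplified code against the representative cases."""
    for args, expected in REPRESENTATIVE_CASES:
        assert discount(*args) == expected, args
    return len(REPRESENTATIVE_CASES)


if __name__ == "__main__":
    print(check_representative_cases(), "representative cases pass")
```

In the next improvement iteration, an entry moves from UNCOVERED_CASES into the code and into REPRESENTATIVE_CASES at the same time.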

At this point the code has not been fully recovered. However, the most important thing is that you have taken control of your code. If bugs occur, we will be able to resolve them in an acceptable time and at an acceptable cost.

There is a risk of bugs occurring in cases that we did not cover during the simplification. If one occurs, we will check the documentation to see whether it corresponds to a non-covered case, correct it and update the documentation. Paradoxically, users have rarely reported bugs after I applied this technique.

In the next iteration of improvement we will either apply more good practices from the current frame of reference or we will move on to the next frame of reference. Likewise, more cases than those registered in previous iterations will be covered.

New code

The continuous improvement in the new code will be to gradually apply the new good practices of the reference frameworks. In this case, the main objective is to optimize the manufacturing process of the new code to compensate for the extra effort and cost of recovering the quality of the legacy code.

We may become obsessed with continually improving quality. It is not counterproductive if you establish a progressive plan. It can be improved indefinitely as long as we analyze its cost and benefit, both in the short and long term. The team has to learn to apply the new frameworks and this takes time and effort.

To soften this obsession, it is good to do a retrospective. Go back to the past and see what code was being produced, what the manufacturing process was like, the volume of bugs reported by the testing team, the bugs reported by users or any other indicator that we consider useful. This will allow us to see the substantial improvement that we have accumulated and that is not easily perceived on a day-to-day basis.

Between two lands

It may happen that the same person on the team is improving legacy code and coding new code. This person then has a great challenge.

When he starts to modify legacy code, both frameworks are in his mind and he clearly perceives the remarkably superior quality of the new code. However, he should only consider the legacy code framework, and this will create frustration: at not being able to apply all the good practices of the new framework, and at not being able to spend as much time as he would like throwing away the legacy code and redoing it.

On the other hand, he will be hopeful that after several iterations of improvement he can restore the desired quality in the legacy code. These opposing feelings are very interesting to observe.

It is a good idea for several people to code in both legacy code and new code. The appreciation of quality will be reinforced.

Effects

As product quality increases, the importance that the team attaches to quality will also increase. Basically because their day to day is improving. There is less and less stress from having to solve bugs quickly.

As product quality improves, customer satisfaction increases, and this satisfaction is very positive feedback for the team. The team sees that the product it has been developing is well received.

Once the team sees the need to develop quality code, they begin to appreciate code as their most precious asset. The code no longer belongs to whoever wrote or last modified it. It belongs to no one and to everyone. Every line counts and is questioned. If nobody understands what it does, it is simply removed. How many times do we see "!= null" checks on variables! You read the code and quite possibly the variable can never be null. You decide to remove the check, which only hurts the readability of the code, and if the code has to blow up, let it blow up. With luck, the failure will surface during testing. The objective is to constantly eliminate uncertainty in the execution of the code. The team wants to understand what each line of the product does. It seems obvious, but this is almost never the case.
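
To illustrate the point about redundant null checks, here is a small Python sketch (with None playing the role of null). The parcel data and both function names are hypothetical. The defensive version hides bad data behind a plausible result; the strict version is shorter, easier to read, and fails during testing if the data is not what the code assumes.

```python
def total_weight_defensive(parcels):
    """Defensive style: every None is silently skipped, so bad data goes unnoticed."""
    if parcels is None:
        return 0
    total = 0
    for parcel in parcels:
        if parcel is not None and parcel.get("weight") is not None:
            total += parcel["weight"]
    return total


def total_weight(parcels):
    """Strict style: assumes well-formed data and fails loudly otherwise."""
    return sum(parcel["weight"] for parcel in parcels)


if __name__ == "__main__":
    parcels = [{"weight": 2}, {"weight": 3}]
    print(total_weight_defensive(parcels), total_weight(parcels))

    # A parcel without a weight: the defensive version returns a wrong total
    # without complaint, the strict one raises KeyError.
    broken = [{"weight": 2}, {}]
    print(total_weight_defensive(broken))
    try:
        total_weight(broken)
    except KeyError as error:
        print("missing field:", error)
```

Whether a given check is really redundant is a judgment the team makes by reading the code; the sketch only shows what is gained when it is.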

When someone on the team goes in to modify a component of the product and sees a part with poor quality, he warns the rest of the team. Its severity is analyzed and it is recorded as technical debt. This debt may or may not be paid off at that moment, depending on the [refactoring strategy] defined and on the need. Good quality code makes the team cohesive. They all row in the same direction and collaborate in achieving that good quality and in paying off the technical debt.

Some people on the team do not want to improve. They may not have seen the need for improvement. They may not have suffered the consequences of poor quality because they have never had to fix bugs in a hurry. Or they may simply not want to make the effort to code while ensuring good quality. In this case, an imbalance occurs. The people who do appreciate quality begin to see that the code some others write is not of good quality. Their culture no longer allows them to leave that code as it is, and they have to redo it. Conflicts begin. The people who care about quality do not want to work with the others. The team is divided and a decision has to be made. People who have not adapted to the new culture must leave the team. That is the decision.

When a new person joins the team, it is difficult to integrate them into our culture. This person questions the defined good practices and does not know why they were defined. He cannot compare the code that is currently being written against the legacy code. A good approach is for this person to take part in an iteration of improving some simple legacy code. Another, more radical approach is to get him involved in bug fixing. However, these approaches have their risks and must be properly evaluated before being applied.

Maximum viable capacity

I have been practicing Wing Tsun for years. Like any martial art, it takes time and dedication to acquire good ability. It requires many repetitions of body movements with full attention and awareness. During practice I have observed how you tend to want to go faster than you can, to the detriment of the quality of your movements and at the cost of excessive body rigidity. You even do this with a newly learned technique. If you have not yet acquired the biomechanics of the technique, you will hardly be able to execute it with speed.
Observe yourself and be aware of your limitations when coding. If you cannot guarantee that simple code works correctly, do not complicate it even more. It is not within your reach. Your ego will enjoy coding with complex data structures and flows, but the end result will be disastrous. Focus on going step by step, acquiring good coding skills according to your level. As you master that level, you can code more complex things.

Mastery consists of coding a simple solution to a complex problem. The simplicity of the solution will obviously be relative to the complexity of the problem.

Quality in prototypes

The quality of a product that provides a 24x7 service is not the same as the quality of a prototype that will demonstrate the viability of an idea. Your product may include a prototype at some point.

Most likely, the model coded in the prototype has been simplified and does not cover all cases in the domain. Or maybe it has neither unit tests nor integration tests. Or perhaps integration with the rest of the components is not the most suitable technically but the one that required the least modifications in the rest of the product. Acceptance tests may not exist or may not be complete. The same can happen with the technical documentation of the prototype.

It may also happen that the development of this prototype has not been done by the team that develops the product. There was pressure to have the prototype as soon as possible and the product team already had in its roadmap several versions with scope committed to the client.

However, the objective of the prototype was met. Its manufacturing cost was minimized with these cuts and its development was shortened.

The problem comes when you want this prototype to also provide the same 24x7 service as the product without modifying any of it. Without analyzing its consequences, the prototype goes into production and the expected happens. There are too many bugs that are very expensive or even impossible to solve. There will be no other option but to refactor this prototype and recover its quality to reach the quality of the product.

If the company decided to have it developed by a team other than the product team, then effort will have to be put into transferring development to the product team. This will be neither a simple task nor one free of conflicts, and the conflicts will be aggravated if the teams belong to different departments.

An extreme case is when the prototype goes into production offering service without having it refactored to raise its quality and the product includes new features that affect this prototype. The prototype model does not match the product model. Now you are forced to refactor the prototype and bear this unexpected cost.

When estimating the cost of manufacturing the prototype, it will be necessary to include the cost of refactoring it to recover its quality and the cost of the transfer between teams.

Define an ad-hoc manufacturing process for prototype development. Thus, when a prototype is developed, you will know which phases and assets of the product manufacturing process have not been produced. It will be difficult for the team to follow this custom manufacturing process. They will have a hard time leaving unit or integration tests uncoded, for example. It will include uncertainty that may materialize into a cost overrun during the product demonstration phase.

Quality maintenance

We are immersed in a continuous and incremental improvement of the quality of the product. We put special focus on recovering the quality of the legacy code and dedicate a lot of effort to it. However, we should not overlook how the new code is produced. Let's remember that the most senior team members are recovering the legacy code, while the most junior ones are writing new code according to the defined quality framework.

There is a risk that the new code will also be of poor quality, despite our having defined the quality reference framework and redefined the manufacturing processes. To mitigate this risk, we will include code audits that raise non-conformities and suggestions for improving the code as early as possible.

It is very frustrating to see legacy code being recovered while new code is still of bad quality. Above all, it is frustrating for the people who are overexerting themselves on the legacy code.

Another aspect to maintain is the culture of quality. It helps to keep the team stable, without too much staff turnover. After all, culture is formed, defined and maintained by the people on the team. If staff turnover is not low, you will have to spend more time helping new people get to know and embrace our culture.

Quality must be derived from the manufacturing process. Its production phases, the assets generated, the roles and competencies identified within the team inevitably lead you to manufacture a quality product. Once this is established the quality will be maintained. The team will pick up pace and follow this manufacturing process in each version of the product. In addition, you will have constant positive feedback because Operations will hardly report product bugs. Now they will be sure that what they manufacture has quality.

However, the company has to think carefully if it really wants to achieve good quality. Now it will no longer be worth giving more weight to delivery times or increasing profit to the detriment of quality. The team will not know how to manufacture otherwise unless another manufacturing process is redefined to achieve this new goal. And following this new process will take time. Some people on the team will even refuse to work in this way.





