My Decision Theory

On my way to work I usually take a bus. Once I arrived at the bus stop a little late and had to wait for the next bus. The timetable said the next bus would come in 12 minutes. My ride is two stops, which takes 4 minutes by bus, or 20 minutes on foot.

I decided to walk.

Now, mathematically, it was the wrong decision. Waiting 12 minutes and then riding the bus for 4 minutes gives 16 minutes, which is less than 20. But that day was very cold, so I figured I’d better walk and warm myself up than stand at the bus stop for 12 minutes, possibly catching a cold. So even if the decision was mathematically wrong, it was correct from a health point of view.

Several minutes into the walk, I watched a bus drive past me. What I had forgotten while making my decision is that two different bus lines pass my stop, and either one takes me to work. I had looked up just one timetable and forgotten about the second.

As a consequence of this decision, I came into work several minutes later than I should have. Normally, this is not a good thing. But I had worked a little longer the previous day, and I didn’t have any meetings scheduled, so it didn’t cause any major trouble. On the positive side, I walked for 20 minutes, which was better for my health.

So I made a decision that was wrong both mathematically (16 minutes is less than 20) and logically (there was another bus line), but it didn’t have any major negative consequences, and indeed was even good for my health.

Crazy, but this is how the world is. We make wrong decisions but reap only positive consequences. Sometimes we make perfectly correct and elegant decisions that become a huge source of negative ones.

I’m still trying to understand how to handle it.

And this, by the way, is why I always laugh when I hear CS academics speak about “reasoning about your code” and “formal proofs of correctness”. They seem to think the biggest problem of the software industry is figuring out whether 16 is less than 20.

Childhood remembered

One of the things done right in the Russian school system is the vacation schedule. The summer vacation is the longest and covers the whole summer; it starts at the beginning of June and ends on August 31st. This is almost three months of the best weather of the year. For children it is so long that it feels like an eternity, and the most exciting part is the first several days. That unforgettable feeling that you finally don’t have to do anything you don’t want to. Those summertime streets and places that belong almost completely to children, because the adults are at work. The anticipation of a trip, maybe for a couple of weeks, together with your parents. And best of all, the feeling that this joy will (almost) never end.

In 2010, Ринат Тимеркаев made an animated movie about his home city, which neighbors the city where I was born. The nature, architecture, and overall feeling are very similar and remind me of my own home city during the summer vacation. The movie is called “I love you”, meant as “I love you, my home city”. I didn’t quite feel that was a good fit. This year, somebody mixed the song “Childhood Remembered” by Kevin Kern into the movie, and suddenly it clicked for me. Now it is the perfect movie for remembering my summer vacations.

Four Weeks of Bugfixing

The hardest bug I’ve ever fixed in my life took me four weeks to find. The bug report itself was pretty simple, but I have to give some context first.

I was one of the developers of Smart TV software, and the bug was related to the part of the software responsible for playing video files stored on a USB stick or drive. The CPU available for this task was a 750 MHz ARM chip, which clearly didn’t have enough power to decode video (let alone HD video) in software. Luckily, every digital TV set has a hardware H.264 decoder, and our SoC was flexible enough that we could use it programmatically. This way, we were able to support H.264 playback (too bad for you, DivX and VC-1 owners).

Technically, the SoC provided a number of building blocks, including a TS demuxer, an audio decoder, a video decoder, a scaler and multi-layer display device, and a DMA controller to transfer the data between the blocks. Some blocks were present more than once (for example, for the PIP feature you naturally need two video decoders), and the blocks could be dynamically and freely interconnected programmatically, building a hardware-based video-processing pipeline. In theory, one could configure the pipeline by writing the proper bits and bytes into specified configuration registers of the corresponding devices. In practice, the chip manufacturer provided an SDK for the chip, so you only had to call a pretty well-designed set of C functions. The SDK was intended to run in kernel mode of a Linux kernel, and it came from the manufacturer together with all the build scripts needed to build the kernel.

Furthermore, this SDK was wrapped and extended by more kernel-side code, first to avoid depending on a particular SoC, and second to expose some devices to user mode, where the rest of the Smart TV software was running. So to play video programmatically, one opened a particular device from user mode as a file and wrote into it a TS stream containing the video and audio data.

Sadly, many people out there have invented a lot of container formats besides TS. Therefore, our software had to detect the container format of the file being played, demux the elementary streams out of it, mux them again into a TS stream, and hand that over to the kernel-mode code. The kernel code would pass the TS bytes to the DMA device, which would feed the hardware TS demuxer, which would send the video elementary stream to the hardware video decoder, where it would finally be decoded and displayed.

For the user mode, we could have implemented all possible container formats ourselves (which would have meant job security for the next 10 years or so). Fortunately, the Smart TV software was well architected and used the GStreamer framework (for Windows developers: an open-source alternative to DirectShow). The framework is written in C (to be fast) with GLib (to be object-oriented) and provides a pipeline container where you can put filters and interconnect them. Some filters read data (sources), some process data (e.g. mux or demux), and some consume data (sinks). When the pipeline starts playing, the filters agree on which one will drive the pipeline; the driver pulls data from all filters before it in the pipeline and pushes data into all filters after it. Our typical pipeline looked like this (in simplified form): “filesrc ! qtdemux ! mpegtsmux ! our_sink”. As you would expect from such a framework, there is also a lot of stuff related to events, state machines, and memory management.

So now, back to the bug report. It looked like this: when playing a TS file from USB memory, you can seek forward and backward with no limitation. When playing any other container format, you can seek forward, but you cannot seek backward. When seeking backward, the video freezes for several seconds, and then playback continues from the position it was at before the seek.

This is the sort of bug that I think might be fixed in a day or two. I mean, it works with TS, it doesn’t work with MP4, it is fully reproducible; just find out what differs between the two cases and you’ve caught it.

The GStreamer pipeline in the TS case looked like this: “filesrc ! our_sink”. So the culprit had to be either qtdemux or mpegtsmux. I built another MP4 demuxer and replaced qtdemux with it. Negative, the bug was still there. No wonder: it also appeared with other container formats. I couldn’t replace mpegtsmux, because I hadn’t found any alternatives. So the only thing I could do was use the pipeline “filesrc ! qtdemux ! mpegtsmux ! filesink”, write the output into a file, and then dump the TS structure and look for irregularities.
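For reference, that debugging pipeline can be spelled out in a few lines of C against the GStreamer 0.10-era API we were using. The file names here are placeholders, and the simplified qtdemux-to-mpegtsmux link assumes a video-only MP4; a file with an audio track would need its pads linked explicitly:

    /*
     * Sketch of the debugging pipeline described above (GStreamer 0.10
     * era). File names are placeholders.
     */
    #include <gst/gst.h>

    int
    main (int argc, char *argv[])
    {
      GError *error = NULL;

      gst_init (&argc, &argv);

      GstElement *pipeline = gst_parse_launch (
          "filesrc location=test.mp4 ! qtdemux ! mpegtsmux "
          "! filesink location=dump.ts", &error);
      if (pipeline == NULL) {
        g_printerr ("Failed to build pipeline: %s\n", error->message);
        return 1;
      }

      gst_element_set_state (pipeline, GST_STATE_PLAYING);

      /* Block until the file is fully remuxed (or an error occurs). */
      GstBus *bus = gst_element_get_bus (pipeline);
      GstMessage *msg = gst_bus_timed_pop_filtered (bus,
          GST_CLOCK_TIME_NONE, GST_MESSAGE_EOS | GST_MESSAGE_ERROR);

      gst_message_unref (msg);
      gst_object_unref (bus);
      gst_element_set_state (pipeline, GST_STATE_NULL);
      gst_object_unref (pipeline);
      return 0;
    }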

If you know the TS format, you surely sympathize with me already. TS is a wicked, complicated format that repeats some meta-information every 188 bytes, so the dump of several seconds of video took megabytes. After reading it, I didn’t find anything suspicious. Then I converted my test MP4 video into a TS using some tool, dumped that TS, and compared. Well, there were some differences, in particular in how often the PCR was transmitted. In theory, PCR is just a system clock and should not influence playback at all, but in practice we already knew about hardware bugs in the decoder making it allergic to unclean PCR signaling. I spent some time trying to improve the PCR, but this didn’t help either.

I then played the dumped TS file, and I could see the backward seek that I had done during the recording. This convinced me that mpegtsmux was also bug-free. The last filter I could suspect was our own sink. Implementing a GStreamer filter is not easy to get right the first time. So I went through all the functions, all the states, all the events, read up on what a proper implementation should look like, and found a lot of issues. Besides numerous memory leaks, we generated garbage during seeks. Specifically, GStreamer needs things to work in the following way:

1. The seek command arrives at the pipeline and a flush event is sent to all filters.

2. All filters are required to drop all buffered information to prepare themselves for the new data streamed from the new location.

3. When all filters have signaled that they are flushed, the pipeline tells the pipeline driver to change the playback location.

4. After the seek, the new bytes start flowing in the pipeline.

Our code conformed to this procedure somewhat, but did the cleanup prematurely, so that after the cleanup some stale data polluted our buffers before the data from the new location arrived.
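Here is a minimal sketch of how a sink element is supposed to react to that flush sequence, again in the 0.10-era API; OurSink and its helper are hypothetical stand-ins, not our actual product code:

    /*
     * Minimal sketch of a sink-pad event handler honoring the flush
     * sequence above (GStreamer 0.10-era API). OurSink and
     * our_sink_drop_buffers() are hypothetical stand-ins.
     */
    #include <gst/gst.h>

    typedef struct {
      GstElement parent;
      gboolean   flushing;
    } OurSink;

    static void
    our_sink_drop_buffers (OurSink *self)
    {
      /* Drop everything buffered so far (details omitted). */
    }

    static gboolean
    our_sink_event (GstPad *pad, GstEvent *event)
    {
      OurSink *self = (OurSink *) GST_PAD_PARENT (pad);

      switch (GST_EVENT_TYPE (event)) {
        case GST_EVENT_FLUSH_START:
          /* Steps 1-2: stop accepting data, drop what is buffered. */
          self->flushing = TRUE;
          our_sink_drop_buffers (self);
          break;
        case GST_EVENT_FLUSH_STOP:
          /* Steps 3-4: only from this point on may data flow again.
           * Cleaning up before FLUSH-STOP, as our code did, lets stale
           * pre-seek buffers slip back in. */
          self->flushing = FALSE;
          break;
        default:
          break;
      }
      return gst_pad_event_default (pad, event);
    }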

I couldn’t explain why it worked with TS but not with MP4, but I figured that fixing this would make our product better anyway, so I fixed it. As you can imagine, this didn’t solve the original problem.

At this point I realized I had to go into the kernel. This was a sad prospect, because every time I changed anything in the kernel, I had to rebuild it, put the update on a USB stick, insert it into the TV set, flash the internal SoC memory with the new kernel, and reboot the chip. Sometimes I broke the build process, the new kernel wouldn’t even boot, and I had to rescue the chip. But I had no other choice: I was out of ideas for what else I could do in user space, and I suspected that in kernel space we had a similar issue with garbage during seeks.

So I bravely read the implementation of the sink device and changed it so that it would explicitly receive a flush signal from user space, flush the internal buffer of the Linux device, and signal back to user space that it was ready; only then would I unlock the GStreamer pipeline and allow it to perform the seek and start streaming from the new location.

It didn’t help.

I went further and flushed the DMA device too. It didn’t help. Flushing the video decoder device didn’t help either.

At this point I started to experiment with the flush order. If I flushed the DMA first, the video decoder might starve from lack of data and get stuck. But if I flushed the decoder first, the DMA would immediately feed it more stale data. So perhaps I had to disconnect the DMA from the video decoder first, then flush the decoder, then the DMA, and then reconnect them? Implemented that. Nope, it didn’t work.

Well, perhaps the video decoder was allergic to asynchronous flushes? I implemented code that waited until the video decoder reported it had just finished a video frame, and then flushed it. Nope, this wasn’t it.

As a next step, I subscribed to all hardware events of all devices and dumped them. Well, that was more megabytes of logs to read. It didn’t help that video playback was a very fragile process per se. Even when playing video that looked perfectly fine on the screen, the decoder and the TS demuxer would routinely complain about being out of sync, losing sync, or being unable to decode a frame.

After some time trying to see a pattern, the only thing I could tell was that after a forward seek, the video decoder would complain for some frames but eventually recover and start producing valid video frames. After a backward seek, the video decoder never recovered. Hmm, could it be something in the H.264 stream itself preventing the decoder from working?

Usually, one doesn’t think of elementary streams as having a format. They are just BLOBs that somehow contain the picture. But of course they have an internal structure, which is normally dealt with only by authors of encoders and decoders. I went back to GStreamer and looked through, file by file, all the filters from the pipeline producing the bug. Finally, I found that mpegtsmux has a file with “h264” in its name, and this immediately rang an alarm in my head. Because, well, TS is one abstraction level higher than H.264; why the hell does mpegtsmux have to know about the existence of H.264?

It turned out that an H.264 bitstream carries in its internal structure the so-called SPS/PPS, the sequence and picture parameter sets, which are basically the configuration for the video decoder. Without the proper configuration, it cannot decode video. In most container formats, this configuration is stored once, somewhere in the header. The decoder normally reads the parameters once before playback starts and uses them to configure itself. Not so in TS. By its nature, TS is not a file format but a streaming format. It is designed so that you can start playing from any position in the stream. This means all important information has to be repeated every now and then. And that means that when an H.264 stream gets packed into TS, the SPS/PPS data also has to be repeated regularly.

This is the piece of code responsible for this repetition: http://cgit.freedesktop.org/gstreamer/gst-plugins-bad/tree/gst/mpegtsmux/mpegtsmux_h264.c?h=0.11#n232 As you can see, during normal playback it inserts the contents of h264_data->cached_es every SPS_PPS_PERIOD seconds. This works perfectly well until you seek. But look at how the diff is calculated in line 234, and how last_resync_ts is stored in line 241. GST_BUFFER_TIMESTAMP is, as you can imagine, the timestamp of the current video sample passing through the muxer. When we seek backwards, the next time we enter this function GST_BUFFER_TIMESTAMP will be much less than last_resync_ts, so the diff will be negative, and thus the SPS/PPS data won’t be re-sent until we reach the playback position from before the seek.

To fix the bug, one can either use the system time instead of the playback time, or reset last_resync_ts during the flush event. Either would be just a one-line change in the code.
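If the mechanics are hard to picture, here is a tiny, self-contained simulation of that logic. The names mirror those in mpegtsmux_h264.c, but the timestamps are simplified to plain seconds:

    /*
     * Self-contained simulation of the resync logic described above.
     * SPS_PPS_PERIOD and last_resync_ts mirror the names in
     * mpegtsmux_h264.c; timestamps are simplified to plain seconds.
     */
    #include <stdio.h>

    #define SPS_PPS_PERIOD 5 /* re-send SPS/PPS every 5 seconds */

    int main(void)
    {
        double last_resync_ts = 0.0;
        /* Playback runs 0..12 s, then seeks back to 2 s. */
        double ts[] = { 0, 3, 6, 9, 12, /* seek! */ 2, 5, 8, 11, 14 };

        for (int i = 0; i < 10; i++) {
            double diff = ts[i] - last_resync_ts;
            if (diff > SPS_PPS_PERIOD) {
                printf("t=%5.1f diff=%6.1f -> insert SPS/PPS\n", ts[i], diff);
                last_resync_ts = ts[i];
            } else {
                printf("t=%5.1f diff=%6.1f -> skip\n", ts[i], diff);
            }
        }
        return 0;
    }

After the simulated seek, diff stays negative and not a single SPS/PPS insertion happens until playback passes the old position – exactly the freeze from the bug report. Resetting last_resync_ts in the flush handler (or deriving diff from the system clock) makes diff sane again, which is the one-line fix.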

Now the careful reader might ask: why could the TS file I recorded with mpegtsmux at the beginning of this adventure be played at all? The answer is simple. At the beginning of that file (i.e. before I seeked), the H.264 data comes with repeated SPS/PPS. At some point (where I seeked during the recording), the SPS/PPS stop being sent, and some seconds later they appear again. Because the SPS/PPS data is the same for the whole file, the first instance already configures the video decoder properly. During an actual seek in MP4 playback, on the other hand, the video decoder is flushed, and with it the SPS/PPS configuration; this is exactly the point where the decoder relies on the repeated SPS/PPS in the TS stream to recover, and exactly the point where they stop coming from mpegtsmux.

Four weeks of searching. Eight hours a day, five days a week. Tons of information read and understood. Dozens of other, smaller bugs fixed along the way. All to find a single buggy line of code among the 50 million lines in the source folder. A large haystack contains, by my estimate, 40 to 80 million individual straws, making this bugfixing adventure literally the equivalent of finding a needle in a haystack.

Best romantic movie of 2013

Male: Oh snap… Oh shit!!
Female: What!? What!? What!? Sashka, Sashka!

Female: Shit… Such an idiot!
Male: I was waiting until he decided which way to turn…

If you don’t immediately see why this is romantic, consider the following:

  • The woman (presumably Sashka’s wife) goes from calm, sleepy small talk into absolute panic within a second of hearing the tone of her husband’s voice – even though she can’t see the danger herself yet. She knows exactly how her husband ticks and fully trusts his assessment of the situation.
  • After recognizing the danger, she keeps repeating her husband’s name – not “mama”, not uncontrolled screaming. Again, full trust in the man and his ability to control the situation.
  • The man, seeing the danger of a frontal collision with the truck – which at this speed would inevitably mean serious injury or even death for everyone in the car – nevertheless keeps calm and makes an extremely hard but singularly right decision: to do nothing and wait until it is clear which path the truck driver is trying to choose. And then, at the right millisecond, to make the one short maneuver that escapes the danger.
  • His first words afterwards are directed to his wife: he apologizes for the waiting, and thus for the fear it caused her, by explaining why he had to wait.

As a man, I find these 20 seconds quite romantic. And I keep asking myself what my own actions would be in a similar situation.

Four Seasons of Enterprise

In the beginning, there is no enterprise, just a couple of founders fascinated by a single idea and working hard to realize it. The startup does not earn much money, and there are barely any employees, so I suppose it might feel just like a (very hardcore) hobby, or a side gig. There are no formally defined roles. Everybody does everything, everybody is responsible for everything, and everybody can see everyone else’s real contribution. This is the Enterprise Spring, full of can-do mentality.

The Enterprise Summer begins when the company starts earning a substantial amount of money and hires its 20th employee. The founders, now CEOs, suddenly realize that “they” (their company) are earning much more money than they would ever have been able to earn on their own – and that they are responsible for making this revenue increase, not decrease. They also realize that dozens of employees trust them and rely on the stability of the company to plan their lives, pay off mortgages, and so on. This is huge responsibility and huge pressure. And, for sure, a lot of sleepless nights with a single thought running through your head: “How are we going to survive?”

I have had the chance to observe several founders in several companies hitting this level. They were all good-hearted, creative, smart, modest, and ethical people. But I could see, day by day, how this pressure melted, squeezed, or at least severely bent their personalities. At some point, you have to ignore the interests of your friends for the sake of the enterprise. At some point, you have to make hard, unpopular decisions and stop some projects, because your enterprise can’t handle too many projects at once and has to focus more sharply to survive. You have to cut off parts of the body to save the rest. And one day, you have to lay somebody off for the first time. If you didn’t have grey hairs before, this is the time for the first one.

At this stage, enterprises usually have very loyal staff, and everyone takes a very entrepreneurial approach: everybody knows exactly how the company earns money, what he or she has to do to help earn it, and what will happen if someone stops earning it. Summer Enterprises that don’t have enough staff of this kind die very quickly.

The first formally defined roles appear, for a very practical and fully transparent reason that everyone can follow: division of labor reduces overhead, thus helps earn more money, and thus helps the enterprise survive. With roles come responsibility and some formal processes. The individual contribution of each person starts getting fuzzy because of the division of labor, so the first non-monetary KPIs appear. Non-monetary KPIs lead to the first “locality problems”, where some people over-optimize their own KPI at the expense of other departments and of overall revenue. But because the company is still on the edge of profitability and fighting for survival, these problems are usually detected by the CEOs in time and fixed.

At some point, the enterprise gains momentum. Some kind of flywheel appears, generating ever more revenue and income, seemingly by itself. In the Enterprise Autumn, the company starts hiring more and more staff. The company’s survival depends less and less on the individual contributions or decisions of any single employee. There is more and more process. At this point, the CEOs realize that they have finally achieved the nirvana they envisioned so eagerly during those earlier sleepless nights, and they start focusing on preserving the status quo. Minimizing, or at least managing, the risk of destroying the flywheel is prioritized over trying new ways of earning money. Every single department is culturally trimmed to minimize risk and avoid mistakes. As a result, major innovation ceases.

Usually, at this point, more and more people who play corporate politics get hired.

Remember how people felt before the 20th century? Mankind was so small compared with nature that nobody gave a second thought to cutting down the last tree in a forest or spilling waste into a river. The “well, when this forest is cut down, we’ll just move on to the next one” attitude. Only in the 20th century did people finally realize that the Earth is a closed and rather limited ecosystem. The Enterprise Summer works like ecological thinking: everybody is aware that any single major fuck-up can end in a global meltdown. Everybody is an Entrepreneur. In Enterprise Autumn companies, by contrast, there are a lot of people with that pre-20th-century attitude. They know the momentum is huge and the flywheel is big, so they can afford to put their own career interests above the interests of the enterprise.

This is why Autumn Enterprises are so full of corporate politics. And from a certain point of view, one can at least understand it. After all, the well-being of a living, breathing person should be valued more than some abstract 0.01% uplift in the revenues of a soulless corporate monster earning money so that some minority can buy a second yacht. So no wonder some people find it ethical to play corporate politics and even enjoy the game. Others have to participate to protect themselves. Yet others just fly under the radar and opt out.

Another consequence of corporate politics is the rise of huge locality problems, where a narrow focus on one’s own department’s KPIs prevails, often at the expense of overall revenue, and nobody is left who can untangle these problems.

But no momentum lasts forever. Either too many locality problems or some sudden external market shift damages the flywheel, so that it no longer rotates as effortlessly as before. This is the Enterprise Winter. By this point, the company usually has a long history of corporate politics, so that

a) all of its most important posts are occupied by corporate politicians with non-ecological thinking, and

b) most of the ecologically thinking Entrepreneurs have either left the company or remain in outsider roles without any real influence.

To fix the flywheel, or to find a new one, the enterprise needs (more) Entrepreneurs. But the corporate politicians (correctly) see them as a danger to themselves and fight them.

Different things can happen now, depending on the balance of power between the two groups. The Entrepreneurs might win the battle, or at least manage to fix the flywheel while constantly under attack. Or the personal interests of the corporate politicians might happen to be best served by a project that also fixes the flywheel. Or the flywheel has so much energy that the company survives for years and years even in a damaged state, until some lucky external influence fixes it. Microsoft’s flywheel was severely damaged around 10 years ago, and they have demonstrated both spectacular flywheel repairs and awful additional flywheel damage since then. Apple went through a similarly long period, the 12 years without Jobs.

But in the worst case, if the flywheel is weak and corporate politics prevails, the death agony begins: every possible bit of short-term potential is sucked out of the flywheel, then staff are laid off, and then the remaining assets are sold.

On corporate politics

My father lived in the USSR for 54 years before moving to Germany. In all that time, he owned only two cars.

Owning a car in the Soviet Union was something only for people with big balls. The story started with the near impossibility of buying one. You could not just save money, go to a shop, and buy a car; there were simply no dealerships. New cars were among the scarcest goods in the country, so they weren’t sold but rather distributed. A state-owned company would get a quota of several cars per year, and the local trade union committee would personally distribute them among the most politically active employees.

And no, “distributed” didn’t mean the cars were free – only the right to buy was free; for the car itself you still had to pay the full price, which was around four to six years of a lead engineer’s salary – exorbitant. Nevertheless, more people wanted to buy a car than the yearly quota allowed, so a waiting list was organized.

When I was 10 years old, I asked my father why we didn’t have a car yet, and he told me he was on the waiting list and, given the current situation, we would get the right to buy a car in around 10 years. Somehow this answer satisfied me. First, we had enough time to save money. Second, I figured I would be around 20 when we got the car, so if I got my driving license at 18, I’d only have to wait two years.

Of course, there were also used cars in the Soviet Union. But first, they were only available on the illegal black market (for ideological reasons, the government didn’t like the idea), and second, their price was not much below the price of a new car, given that new cars were virtually impossible to buy.

But even after buying a car, your story was just beginning. Assembly quality was awful, so after buying a brand-new car you had to go through each and every part of it and fix things, because many parts weren’t properly installed or nuts weren’t screwed on all the way. And there is more: the car’s design was even worse. Easily corroding materials were used without proper coating. Parts that had to be serviced regularly were not designed to be easily removed and reinstalled. For other parts, life-extending improvements were developed by car owners and popularized within the owner community. So the usual procedure after buying a car was to remove many of its parts (including dismounting and opening the engine), check them, fix the defects, apply improvements, coat all surfaces with anti-corrosion agent, and assemble everything back – this time properly.

The improvements, as well as the proper procedures for removing and mounting parts, were popularized by the magazine “Za Rulem”, the one and only car magazine in the USSR, with 4 million subscribers. For many car owners, servicing the car themselves was the only way to own one. Well, there were some government-owned car service stations in the USSR, but they were even more challenging to use than buying a car. You had to wait months for an appointment. And at the appointment, you would typically be told that some scarce spare part needed for your car was not in stock, so you either had to pay an exorbitant bribe (around a monthly salary) for the part to be “magically” found in stock, or bring your own part, obtained illegally on the black market.

For these reasons, my father avoided the service stations altogether and serviced his car himself in the garage. To be able to do that, he had a welding machine, a lathe, a milling tool, a car lifting and tilting device, and all kinds of saws, drills, hammers, and wrenches – as well as all kinds of fluids, raw materials, and spare parts. His garage neighbors came to my father whenever they needed some tool, and my father went to his neighbors whenever he needed another pair of helping hands.

If the term “garage neighbor” means nothing to you: to get to the garage from where we lived, my father had to walk 30 minutes to the nearest bus stop, ride the bus for 20 minutes, then walk another 20 minutes to a big industrial park, in one part of which several thousand garages had been built. One of them belonged to my father. He would typically unlock the door, drive the car out, check it over, quickly fix whatever new problem he’d found, then drive all the way back to our house to pick up my mom and me, and then we were off to wherever we were heading (e.g. to a food market to shop for the next week). After arriving home, the whole procedure had to be done in reverse.

Every car trip cost my father around three hours of walking, riding the bus, and servicing the car. In all the time our family had a car, I can’t remember a single day when my father had time for hobbies, sports, culture, or any other recreation. When I talk with him about it, he becomes very sorrowful and regretful and complains that his whole life was miserably wasted on the effort of making a decent living, including “having a car”.

I love my father, and with all my heart I hate everything responsible for his sorrow. One of the factors is communism. Communism means the state is organized like one huge corporation. There is no room for a free market, nor for any agile mechanics, under communism. Living in a communist state means being part of a huge bureaucratic corporation, with a lot of politics going on. And corporate politics is the second, most important factor I hate. Anyone who has ever lived in a corporate state can immediately give plenty of very concrete examples of how corporate politics translates directly into a miserable life.

It is not that a planned economy cannot even theoretically provide a reasonable supply of cars. Just plan enough cars, invest enough money, and deliberately overproduce to account for all kinds of defects as well as demand spikes. It all sounds manageable.

The problems start when some mid-level boss in the weapons ministry starts playing power games against his colleague in the car ministry and wins, and the state therefore spends more money producing tanks than cars. Not that the weapons boss genuinely thinks his motherland urgently needs more tanks than cars. It is all about his power versus the power of the car boss. And because, as a mid-level boss, he already owns the best available car, he has no personal interest in there being more of them. And the people? The people play no role in his game of corporate politics. In a communist, corporate state, the people have no say.

Corporate politics is, in my opinion, the ultimate source of unhappiness, dissatisfaction, depression, health problems, and deaths caused by the mishandling of patients; the source of all kinds of waste, including the waste of non-renewable energy and materials, of all kinds of cultural and knowledge losses, of ecological dangers (Fukushima was not a technical problem, it was a corporate one), and of many, many more.

The only people who think they profit from corporate politics are the ones playing the game; but statistically, most of them lose the battle most of the time. And even the ones who win gain only more power and more money, not more happiness or more life. Because you can’t be truly happy unless your conscience is clean, and theirs is not.

How M should an MVP be?

Minimum Viable Product is now mainstream. But what exactly does it mean?

In my opinion, the MVP is just an instance of a more general principle: Fail Fast. In other words, if you are going to fail, it is better to fail at the very beginning, reducing the amount of investment you burn.

If my idea is good, using an MVP is counterproductive: early adopters will get a bad first impression due to missing advanced features or overall lack of polish, and we will have to spend much more money later just to make them give us another chance.

If my idea is bad, the MVP will save us a lot of money.

Because there is no sure way to know beforehand whether my idea is good or bad, it is safer to assume it is bad and go with the MVP.

But exactly how minimal should the product be? Do we reduce the feature set? Ignore usability? Save on proper UX and design? Does it mean the product may be slow, unresponsive, unstable? Can its source code be undocumented and unmaintainable?

Well, the point of an MVP is to reduce the overall investment. The principle behind it is to invest just enough to achieve a sound and valid market test, and no more. This means that when scoping an MVP, you tend to cut whatever costs you the most.

For example, let’s assume we have a product development team that needs only 1 day to design a screen, 3 days to develop the backend for that screen, and 10 days to develop the frontend. Naturally, MVPs produced by this team will tend to have great visuals and a very good backend combined with an awful, buggy UI.

Now let’s assume a team needs a week to design one screen, 1 day to develop the frontend, and 5 days to develop the backend. That team’s MVPs will tend to have an ugly but responsive and user-friendly UI that often has to show a loading animation because of a sluggish backend.

What does this mean?

It means that teams capable of designing and fully developing one screen per day get a double advantage: not only will their MVP be released sooner (or, alternatively, have more features, better looks and performance, and a more user-friendly UI), but it can also be a well-balanced and therefore mature-looking product (and looking mature is an advantage).

It also means that if you want to identify where your business has capacity issues, just look at your typical MVPs: if some areas are substantially worse than others, you know which parts of the product team can be improved.

Client Driven Development

When I first tried out test-driven development (around 1998, I think), I was fascinated by how it helped me design better APIs. My unit tests were the first clients of my code, so my classes acquired logical, easy-to-use interfaces quite automatically.

Some time later I realized that if you have a lot of unit tests, they can detect regressions and therefore support you during refactoring. I implemented two projects, each of which took a couple of years, and wrote around 200 unit tests for each.

And then I stopped writing unit tests in such numbers. My unit tests really did detect regressions from time to time – around five times a year. But the effort of writing and maintaining them was much higher than the benefit of detecting a regression before manual testing.

Still, I missed the first advantage of TDD, the logical and easy-to-use interfaces. So I started doing Client Driven Development.

The problem with unit tests is that they don’t have any direct business value per se. They might serve business goals, but only very indirectly. I replaced them with client code that does have direct business value.

For example, say I’m developing a RESTful web service. I roughly know what kinds of queries and responses it must support. I start by developing an HTML page. In it, I write an <a> tag pointing to the web service with all the proper parameters (say, <a href="/orders?status=open">, where the path and parameter are whatever the service is supposed to accept). I might write some text around it documenting the parameters of the service. Then I open the page in the browser and click the link, which predictably gives me a 404 error, because the web service is not yet implemented. I then proceed to implement it, reloading my page and generally using it in place of a unit test.

Of course, this approach has the drawback that, unlike a unit test, it doesn’t check the returned values, so the page cannot be run automatically. If you want, you can replace the link with an AJAX call and check the returned values – personally, I don’t believe that effort pays off at the end of the day. More important is that this page has immediate business value. You can use it as rough, unpolished documentation for your web service. You can send it to your customer, or to another team writing a client, and so on.

If the web service is designed in a way that is hard to exercise with <a> and <form> tags, I write some JavaScript or Silverlight code to call it properly. In this case, the page might take on more business-relevant functions. For example, on load it might request data from the web service and display it in a sortable, scrollable grid and let you edit it, providing a very low-level “admin” interface to the service.

This approach is not confined to web development. I’ve used it, for example, for inter-process communication, and if my code has not yet been refactored away, it is now flying in passenger airplanes and running inside TV sets in many living rooms. In this variant, I start developing the inter-process communication by creating a bash script or a trivial console app that sends messages to the other process, implementing the corresponding command-line options as I go. When that’s ready, I start developing the receiving side inside the running process. This has a similar effect on API design as unit testing, but with the advantage that you can use the tool during debugging, or even in production, for example in startup scripts.
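To make the idea concrete, here is a sketch of such a trivial console app in C; the FIFO path and the command syntax are made up for the example:

    /*
     * Sketch of a "trivial console app" client: it sends one command
     * line to another process through a named pipe. The FIFO path and
     * command syntax are hypothetical; a real tool would grow one
     * command-line option per message the receiver understands.
     */
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        if (argc < 2) {
            fprintf(stderr, "usage: %s <command> [args...]\n", argv[0]);
            return 1;
        }

        /* The receiving process is assumed to read commands from here
         * (e.g. a pipe created with: mkfifo /tmp/myservice.cmd). */
        FILE *fifo = fopen("/tmp/myservice.cmd", "w");
        if (fifo == NULL) {
            perror("fopen");
            return 1;
        }

        for (int i = 1; i < argc; i++)
            fprintf(fifo, "%s%c", argv[i], i + 1 < argc ? ' ' : '\n');

        fclose(fifo);
        return 0;
    }

The receiving process just reads lines from the pipe and dispatches them, and the same little tool later doubles as a debugging aid or a building block for startup scripts.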

I’m not the inventor of this approach – indeed, I see it in many open-source projects – but I’m not aware of any official name for it.