Four Seasons of Enterprise

In the beginning, there is no enterprise, just a couple of founders fascinated by a single idea and working hard to realize it. The startup does not earn much money, and there are barely any employees, so that I suppose it might feel just like a (very hardcore) hobby. Or a side gig. There are no formally defined roles. Everybody is doing everything, and everybody is responsible for everything, and everybody can see the real contribution of each other. There is an Enterprise Spring feeling, full of the can-do mentality.

The Enterprise Summer begins, when the enterprise starts earning substantial amount of money and hires their 20th employee. The founders, now CEOs, suddenly realize that “they” (their company) are earning much more money than they would have been ever able to earn by their own. And they are responsible that this revenue increases, not decreases. Also, they realize that dozens of their employees trusted them and build on stability of the company to plan their life, pay off mortgages and so on. This is a huge responsibility and huge pressure. And for sure, a lot of sleepless nights, with a single thought running through your head: “how are we going to survive?”

I had a chance to observe several founders in several companies, hitting this level. They were all good-hearted, creative, smart, modest and ethical people. But I could see, day for day, how this pressure had melted, squeezed or at least severely bent their personality. At some point, you have to ignore interests of your friends, in the sake of the enterprise. At some point, you have to make unpopular, hard decisions and stop some projects, because your enterprise can’t handle too much projects at once and has to focus more sharply to survive. You have to cut parts of the body to save the rest. And on some day, you have to lay off somebody for the first time. If you didn’t had grey hairs before, this is the time for the first one.

At this stage, enterprises usually have a very loyal staff, and everyone has a very entrepreneurial approach: everybody knows exactly, how are we earning money, what does he or she has to do to help earning money, and what will happen if someone stops earning money. Summer Enterprises that don’t have enough staff of this kind, die very quickly.

First formally defined roles appear, out of a very practical and extremely transparent reason (that everyone can follow): that the division of work will reduce overhead, and thus help earning more money, and thus help the enterprise to survive. With roles comes responsibility, and some formal processes. The individual contribution of every single person starts getting fuzzy, because of division of labor, so that first non-monetary KPIs appear. Non-monetary KPIs lead to the first “locality problems”, where some people tend to over-optimize their own KPI, at the expense of some other departments, and the overall revenue. But because the company is still on the profitability edge and is fighting for its survival, these problems are usually timely detected by CEOs and fixed.

At some point, the enterprise gets a momentum. Some kind of a flywheel appears, generating ever more revenue and income, seemingly by itself. In the Enterprise Autumn, the company starts hiring more and more staff. Survival of the company is getting less and less dependent on individual contribution or individual decisions of any single employee. There will be more and more process. At this point, CEOs realize that they have finally achieved the nirvana they had envisioned so eagerly in their sleepless nights before, and start focusing on the conservation of the status quo. Minimizing or at least managing risks of destroying the flywheel is prioritized above trying some new ways earning money. Every single department is culturally trimmed to minimize risks and avoid mistakes. As a result, any major innovation ceases.

Usually, at this point, more and more people playing corporate politics are hired.

Remember the feeling of people before the 20th century? The mankind was so small compared with the nature that no one made any second thought cutting the last tree in the forest or spilling waste into a river. The “Well, when this forest is cut down, we’ll just move on to the next forest” attitude. Only in the 20th century, people have finally realized that the Earth is a closed and pretty limited ecosystem. The Enterprise Summer is just like ecological thinking — everybody is aware that any single major fuck-up can end up with a global meltdown. Everybody is an Entrepreneur. On the contrary, in the Enterprise Autumn companies, there are a lot of people with the middle age attitude. They know that the momentum is huge and flywheel is big, so that they can allow putting their own career interests above the interests of the enterprise.

This is why Autumn Enterprises are so full of corporate politics. And from some particular point of view, one can at least understand it. After all, the well-being of a living, breathing person should be valued more than some abstract 0,01% uplift in revenues of some soulless corporate monster, earning money for some minority to allow them to buy a second yacht. So no wonder some people feel it ethical to do corporate politics and enjoy playing politic games. Others have to participate to protect themselves. Yet another just go under radar and opt out.

Another consequence of the corporate politics is the rise of huge locality problems, where the narrow focus on the KPIs of my own department prevails, often at the expense of the overall revenue, and there is nobody who can untangle these problems.

But no momentum can be forever. Either the too much of locality problems, or some external sudden market shift damages the flywheel, so that it cannot rotate so effortlessly than before. This is the time of the Enterprise Winter. At this point, the company usually has a long history of corporate politics, so that

a) all of its most important posts are occupied by corporate politicians with a non-ecological thinking, and

b) most of ecologically thinking Entrepreneurs have either left the company, or remained on an outsider role without any real influence.

To fix the flywheel, or to find out a new one, the enterprise needs (more) Entrepreneurs. But the corporate politicians (correctly) see them as a danger for themselves and fight them.

Different things can now happen depending on balance of power between the two groups. Entrepreneurs might win the battle, or at least manage to fix the flywheel while being constantly under attack. Or personal interests of corporate politicians might accidentally be best represented by a project that also fixes the flywheel. Or the flywheel has so much energy that it allows the company to survive for years and years, even in the damaged state, and then, another lucky external influence might fix it. Microsoft’s flywheel has been severely damaged around 10 years ago, and they have demonstrated both spectacular flywheel repairs and awful additional flywheel damages since than. Apple had experienced a similarly long period, the 12 years without Jobs.

But in the worst case, if the flywheel is weak and the corporate politics prevails, the agony might start, with all possible short-term potentials being sucked out of the flywheel, then staff getting laid off, and then all remaining assets being sold.

How M should an MVP be?

Minimum Viable Product is now mainstream. But what exactly does it mean?

In my opinion, MVP is just an example of a more generic principle: Fail Fast. In other words, if you have to fail, it is better to fail in the very beginning, reducing the amount of burned investment.

If my idea is good, using MVP is counterproductive: some early adopters will get bad first impression due to lack of some advanced features or overall unpolishness, and we will need to spend much more money later just to make them to give us another chance.

If my idea is bad, MVP will save us a lot of money.

Because there is no sure way to know if my idea good or bad beforehand, it is safer to assume it is bad and go with the MVP.

But how exactly minimal the product should be? Do we want to reduce the feature set? Or don’t care about usability? Or save on proper UX and design? Does it mean it may be slow, unresponsive, unstable? Can its source code be undocumented and unmaintainable?

Well, the reason of MVP is reducing the overall investment. The principle behind it, is investing just that much to achieve a sound and valid market test, and not more. This means, when deciding about MVP, you tend to cut the area what costs you most.

For example, let’s assume we have a product development team that needs only 1 day to design a screen, 3 days to develop the backend for that screen, and 10 days to develop the frontend. It is naturally, that MVPs produced by this team would tend to have great visuals combined with an awful and buggy UI and a very good backend.

Let’s assume now that a team needs a week to design one screen, 1 day to develop the frontend, and 5 days to develop the backend. MVPs of that team would tend to have ugly, but responsive and user-friendly UI that would often need to show the loading animation because of a sluggish backend.

What does it mean?

This means that a double advantage is given to teams capable of designing and fully developing one screen per day: not only their MVP will be released sooner (or alternatively can have more features, better look and performance and more user-friendly UI), but also it can be a well-balanced and therefore mature-looking product (that’s an advantage to be mature-looking).

And this also means, if you want to identify where your business has capacity issues, just look at your typical MVPs: if some areas of them are substantially worse than other, you know what areas of the product team can be improved.

Client Driven Development

When I first tried out the test-driven development (it was around 1998, I think), I was fascinated how it helped me to design better APIs. My unit tests were the first clients of my code, so that my classes obtained a logical and easy-to-use interface, quite automatically.

Some time later I’ve realized that if you have a lot of unit tests, they can detect regressions and therefore support you during refactoring. I’ve implemented two projects, each took a couple of years, and have written around 200 unit tests for each.

And then I’ve stopped writing unit tests in such big counts. My unit tests have really detected some regressions from time to time. That is, around 5 times a year. But the efforts writing and maintaining them were much higher than any advantages of having detected a regression before manual testing.

But still, I was missing the first advantage of TDD, the logical and easy-to-use interfaces. So I’ve started to do Client Driven Development.

The problem with the unit tests is that they don’t have any direct business value per se. They might be helpful for business goals, but in a very indirect way. I’ve replaced them with a client code that does have some direct business value.

For example, I’m developing a RESTful web service. I roughly know what kind of queries and responses it must support. I start with developing a HTML page. In there, I write an <a> tag with all the proper parameters to the web service. I then might write some text around it, documenting the parameters of the service. Then I open this page in the browser and click on the link, which predictably gives me a 404 error, because the web service is not yet implemented. I then proceed with implementing it, reloading my page and generally using it in place of a unit test.

Of course, this approach has the drawback that, unlike in a unit test, I don’t check the returning values and thus this page cannot be run automatically. If you want, you still can replace this link with an AJAX call and check the returning values — I personally don’t believe that these efforts would pay off at the end of the day. More important is that this page has an immediate business value. You can use it as a rough and unpolished documentation for your web service. You can send it to your customer, or another team writing some client, etc.

If the web service is designed in a way that it is hard to get away with <a> and <form> tags, I would write some JavaScript or Silverlight code to call it properly. In this case, the page might have more business-relevant functions. For example, when it loads, it might request and display some data from the web service, in a sortable and scrollable grid, and allow you to edit them, providing you with a very low-level “admin” interface to the service.

This approach is not constrained by web development. I’ve used it for example for inter-process communication, and if my code has not yet been refactored out, it is flying now in passenger airplanes, and running inside of TV sets in many living rooms. In this variant, I start developing the inter-process communication by creating a bash script or a trivial console app that would send the messages to another process. I implement corresponding command-line options for them. When I’m ready, I start developing the receiving part, inside of some running process. This has the similar effect on the API design as unit testing, but has the advantage that you can use it during debugging, or even in production, for example in some startup scripts.

I’m not an inventor of this approach, indeed I often see this approach in many open source projects, but I’m not aware of any official name for it.

 

Smart TV application software architecture

Someone come to my blog searching the phrase in the title of this post. To avoid disappointments of future visitors, here is a gist of what the architecture looks like.

First of all, let’s interpret the architecture very broadly as “most important things you need to do to get cool software”. According to this, here is what you need to do:

1) Put a TV set on the developer’s desk. And no, not “we have one TV set in the nearby room, he can go test the app when needed”. And no, not “there is a TV set on a table only 3 meters away”. Each developer must have an own device.

2) Get a development firmware for the device so that you’ll get access to all log files (and ideally, access to the command line). A TV set is a Linux running WebKit or Opera browser.

3) Most Smart TVs support CE-HTML and playing H.264 / AAC videos in MP4 format. Just read the CE-HTML standard and create a new version of your frontend. Alternatively, you might try to use HTML5, because many Smart TVs would translate remove control presses as keyboard arrow key presses; and some Smart TVs support the <video> tag.

4) If you’re interested in a more tight integration with the TV, eg. be able to display live TV in your interface, or switch channels, or store something locally, you need to choose a target ecosystem, because unfortunately there is no standard app API spec today.

MUSEing

For some reason, I meet people every day who don’t agree with my MUSE framework or at least implicitly have a different priority framework in their minds. Usually it looks like this:

“Let us conceive, specify, develop, bugfix and release the product, and then ask the marketing guys to market it”. Well, what if we first ask marketing, what topics can bring us the cheapest leads and then conceive the product around them, or at least not against them?

“Solution A is better from the usability standpoint than solution B, therefore we should do A”. Well, B is better for motivation, because it looks more beautiful, and beauty sells. I don’t care if something is easy to use, if it looks so ugly that nobody would want to use it.

“So does this bug prevents users from using the feature, or it is just some optics, and the functionality is all in place and working?” Well, users first needs to have a reason to use our product, second must be able to understand the product. Unless these two requirements are satisfied, it doesn’t matter if the functionality is working. It is different from, say, enterprise software, where users are in the work setting and have to use the software. In the entertainment market, nobody has to read the book, listen the song, or watch the movie to the end. Or use our web site.

“MVC is a great idea, because it allows us to decouple logic from view. Let’s quickly find and use some MVC framework for HTML5!” Yes, MVC is a great idea for enterprise software, because it makes UI easier to test, allows designers and developers to work in parallel, and provides reusable components in a very natural and effortless way. But one of the MVC drawbacks is that you can easily hit a performance issue in the UI, because this architecture prevents you from squeezing 100% of the performance from the UI technology. Besides, HTML5 MVC frameworks often wrap or abstract away DOM objects and events, and therefore make it complicated to define exactly, what is shown to the user. Another MVC drawback is that it helps you to believe that reusing exactly the same UI components overall in the app is always a good idea, because hey it is good for implementation and good for usability. But a seductively looking beautiful design is more important. Even if it looks a little different on different screens.

And sometimes it is quite hard for me to understand these opinions, because MUSE sounds for me so natural and logical that I can’t imagine any other consistent priority framework.

MUSE — an attempt of product priority framework

Steve Jobs said, product management is deciding what not to do.

But how do you decide?

This is the priority framework I’m trying to live today. It works like a Maslow’s pyramid: until the first level is solved, it is not possible or not necessary to solve the second level.

Motivation. People must be motivated to start using the product. If they don’t even see a reason to start using it, nothing else matters.

  • Good marketing or packaging.
  • Product / Market fit.
  • Seducing optic (design for conversion).

Usability. People must be able to use the product, and have fun using it. If they fail to use the product, nothing else matters.

  • It must be not too hard to learn how to use the product.
  • People must understand the product.
  • Any unnecessary task people have to do must be eliminated.
  • If possible, limit the product with the functions that are easy to make easy to use. Or at least start with these features.

Service. This is where the functionality and features come.

  • Not just a function, but a service, solving a user’s problem.
  • Not just a service, but a trusted first-class high-quality service, with a sincere smile, and the good feeling of having made a right choice.

lovE. This is the ultimate dream.

  • People that are attached to your product.
  • People that identify part of themselves with your product.
  • People that not only recommend your product, but start flame wars defending its drawbacks.

 

Software Architecture Quality

What is a good software architecture?

Too many publications answer to this question by introducing some set of practices or principles considered to be best or at least good. The quality of the architecture is then measured by the number of best practices it adheres.

Such a definition might work well for student assignments or in some academia or other non-business settings.

In software business, everything you ever spend time or efforts on, must be reasonably well related with earning money, or at least with an expectation to earn or save money.

Therefore, in business, software architecture is a roadmap how to achieve functional requirements, and non-functional requirements, such as run-time performance, availability, flexibility, time-to-market and so on. Practices and principles are just tools that may or may not help achieving the goals. By deciding, which principles and practices to use, and which tools to ban, a software architect creates the architecture, and the better it is suited for implementing the requirements, the better the architecture quality is.

Like everything in the nature, good things don’t appear without an equivalent amount of bad things. If some particular software architecture has a set of advantages helping it to meet requirements, it has to have also a set of drawbacks. In an ideal world, software architect is communicating with business so that the drawbacks are clearly understood and strategically accepted. Note that any architecture has its disadvantages. The higher art and magic of successful software development is to choose the disadvantages nobody cares about, or at least can live with.

For already implemented software, the architecture can be “reverse engineered” and the advantages and disadvantages of that architecture can be determined.

And here is the thing. Implemented software is something very hard to change, and so its architecture. Thus, the set of advantages and disadvantages remains relatively stable. But the external business situation or even company’s strategy might change, and change abruptly. Features of the architecture that were advantages before, might become obstacles. Drawbacks that were irrelevant before, might become show stoppers.

The software architecture quality might change from being good to being bad, in just one moment of time, and without any bit of actual software being changed!

Here are two examples of how different software architectures can be. This is for an abstract web site that has some public pages, some protected area for logged-in users, some data saved for users, a CMS for editorial content and some analytics and reporting module.

Layered cake model

On the server side, there is a database, a middleware generating HTML, and a web server.

When a HTTP request comes, it gets routed to a separate piece of middleware responsible for it. This piece uses an authentication layer (if needed) to determine the user, the session layer to retrieve the current session from a cookie, persistence layer to get some data from the database (if needed), and then renders the HTML response back to the user.

Because of these tight dependencies, the whole middleware runs as a single process so that all layers can be immediately and directly used.

If AJAX requests are made, it is handled in the same way on the server side, except that JSON is rendered instead of HTML. If the user is has to input some data, a form tag is used in HTML, which leads to a post to the server, which is handled by the server-side framework “action” layer, extracting the required action and the form variables. The middleware logic writes then the data into the database.

CMS writes editorial data in the database, where it can be found by middleware and used for HTML rendering.

A SQL database is used, and tables are defined with all proper foreign constraints so that data consistency is always ensured. Because everything is stored in the same database, analytics and reporting can be done directly on production database, by performing corresponding SQL statements.

Advantages:

  • This architecture is directly supported by many web frameworks and CMSes.
  • Very easy to code due to high code reuse and IDE support, especially for simple sites.
  • Extremely easy to build and deploy. The middleware can be build as a single huge executable. Besides of it, just a couple of static web and configuration files have to be deployed, and a SQL upgrade script has to be executed on the database.
  • AJAX support in the browsers is not required; the web page can be easily implemented in the old Web 1.0 style. Also, no JavaScript development is required (but still possible).
  • Data consistency.
  • No need for a separate analytics DB.

Disadvantages:

  • Because of the monolithic middleware, parts of the web site cannot be easily modified and deployed separately from each other. Every time the software gets deployed, all its features have to be re-tested.
  • On highly loaded web sites, when using the Web 1.0 model, substantial hardware resources have to be provisioned to render the HTML on the server side. If AJAX is introduced gradually and as an after-thought, the middleware often continues to be responsible for HTML rendering, because it is natural to reuse existing HTML components. So that the server load doesn’t decrease.
  • Each and every request generates tons of read and write SQL requests. Scaling up a SQL database requires a very expensive hardware. Scaling out a SQL database requires very complicated and non-trivial sharding and partitioning strategies.
  • The previous two points lead to a relatively high latency of user interaction, because each HTTP request, on a loaded web site, tends to need seconds to execute.

Gun magazine model

Web server is configured to deliver static HTML files. When such a static page is loaded, it uses AJAX to retrieve dynamic parts of the user interface. CMS generates them in form of static html files or eg. mustache templates.

The data for the user interface is also retrieved with AJAX: frontend speaks with web services implemented on the server side, which always respond with pure data (JSON).

There are different web services, each constituting a separate area of API:

  • Authentication service
  • User profile service
  • One service for each part of the web site. For example, if we’re making a movie selling site, we need a movie catalog service, a movie information service, a shopping cart service, a checkout service, a video player service, and maybe a recommendation engine service.

Each area of API is implemented as a separate web application. This means, it has its own middleware executable, and it has its own database.

Yes, I’ve just said that. Every web service is a fully separate app with its own database.

In fact, they may even be developed using different programming languages, and use SQL or NoSQL databases as needed. Or they can be just 3rd-party code, bought or used freely for open-source components.

If a web service needs data it doesn’t own, which should be per design a rare situation, it communicates with other web services using either a public or a private web API. Besides, a memcached network can be employed, storing the session of currently logged in users that can be used by all web services. If needed, a separate queue such as RabbitMQ can be used for communication between the web services.

Advantages:

  • UI changes can be implemented and deployed separately from data and logic changes.
  • Different web services can be implemented and deployed independently from each other. Less re-testing is required before deployment, because scope of changes is naturally limited.
  • Different parts of the web site can be scaled independently from each other.
  • Other frontends can be plugged in (mobile web frontend, native mobile and desktop apps, as well as Smart TV optimized web frontend)
  • Static files and data-only communication style ensure lowest possible UI latency. This is especially true for returning users, who will have almost anything cached in their browsers.

Disadvantages:

  • Build and Deployment is more complicated: at very least, some kind of package manager with automatic dependency checking has to be used.
  • Middleware code reuse is more complicated (but still possible).
  • Data might get inconsistent, and software has to be developed in a way it can still behave in a reasonable and predictable way in case of inconsistencies.
  • AJAX support on the client side is a requirement.
  • A separate data-warehouse or at least a SQL mirroring database is required to run cross-part and whole-site analytics.

Tabu search

There is no simple solution how to achieve the best or at least a good conversion rate. Usually, one just makes an assumption, develops a page, tracks user behavior, analyses the data — and then makes another assumption. This repeats until one can find a satisfying conversion rate, or one runs out of time.

Interestingly enough, there is a quite large amount of mathematical work regarding a similar problem. In the mathematical slang, they call it the global optimization problem.

Mathematically, conversion rate optimization can be described as finding local (or at best, global) maximum of function f(X), where X is a vector of different factors influencing conversion rate. Unlike a typical mathematical optimization problem, the function f is not known beforehand, and both the function f and the factors included in the vector X might change after each optimization step.

There are a lot of metaheuristics formulating different strategies that can be used for the optimization. I’ve just read about a very small subset of them, and I find that the strategies are very interesting. Especially, their names. Who has said mathematicians are not creative?

This is my interpretation of them, as applied to the conversion rate optimization problem.

Gradient descent
You make a small change of all factors in X at the same time, to maximize the conversion rate as you see it. Then you measure it. Then you try to understand, what factor change worked positively and what negatively, and revert changes in wrong factors while increasing the change in the right factors. Iterate, until a local maximum is found.

Hill climbing
Start with a baseline of factors X. Now change the first factor x1 in X. Measure conversion rate. Revert x1 and change second factor x2 in X. Measure. Repeat testing, until all factors in X are tested. This procedure can also be done parallelly using a multi-variate A/B testing. However the testing is done, at the end of the day we can find out, which winning factor xi has made the largest improvement in conversion rate. Now establish a new baseline X’, which is just like X, but with the improved winning factor xi. Continue iterating from this new point, until a local maximum is found. It is quite probably that in each iteration a different factor will be improved.

These two strategies have the following drawbacks:

  • Walking on ridges is slow. Imagine the N-dimensional space of factors X, and the function f as a very sharp ridge leading to a peak. You might be constantly landing on a side of ridge, so that you are forced to move to the peak in zigzag. This takes valuable time.
  • Plateaus are traps. Imagine that no change of the current baseline X is measurably better than X. You know you’re on a plateau. Perhaps it is also the highest possible point. Or perhaps, if you make some steps, you’ll find an even higher point. The problem is, in which direction should you go (i.e. which factors do you want to change?)
  • Only local maximum might be found. If your starting point is nearer to a local maximum than to a global maximum, you’ll reach the smaller peak and stop, because any small change of the current baseline X will be measurably worse than X.

To fight these problems, a number of strategies has been developed.

In the shotgun hill climbing, when you’re stuck, you just restart the search from a random position that is sufficiently different from your latest one.

In the simulated annealing, you are allowed to go into directions what actually show worse conversion rates, but for some short time. This can help to jump over abysses in function f, if they are not too wide.

And in tabu search, if one is stuck, one explicitly ignores all the positive factor changes and explores all the other factors (that previously didn’t play a big role).

How to handle errors

It seems I’m going through my old articles and re-writing them. The first version of How to estimate has been written in 2008, and I wrote the first version of this article around 2004. Originally it was focused around exceptions, but here I want to talk about

  • Checking errors
  • Handling errors
  • Designing errors

Checking errors

In OCaml programming language, you can define so called variant types. A variant type is a composite over several other types; an expression or function can then have the value belonging to either one of those types:

type int_or_float = Int of int | Float of float

(* This type can for example be used like this: *)
let pi_value war_time =
    if war_time then Int(4) else Float(3.1415)

# pi_value true
Int 4

# pi_value false
Float 3.1415

(you can try this code online http://try.ocamlpro.com/)

OCaml is a very statically typed, very safe language. This means, if you use this function, it will force you to handle both the Int and the Float cases, separately:

# pi_value true + 10  (* do you expect answer 14? no, you'll get an error: *)
Error: This expression has type int_or_float
       but an expression was expected of type int

(* what you have to do is for example this: *)
# match pi_value true with
      Int(x) -> x + 10
    | Float(y) -> int_of_float(y) + 10
int 14

The last line checks if the returning value of pi_value true is actually an Int or a Float, and executes different expressions in each case. If you forget to handle one of the possible return types in your match clause, OCaml will remind you.

One of the idiomatic usages of variant types in OCaml is error handling. You define the following type:

type my_integer = Int of int | Error

Now, you can write a function returning a value of type “integer or error”:

let foobar some_int =
    if some_int < 5 then Int(5 - some_int) else Error

# foobar 3
Int 2

# foobar 7
Error

Now, if you want to call the foobar, you have to use the match clause, and you should handle all possible cases. For example:

let blah a b =
    match (foobar a, foobar b) with
          (Int(x), Int(y)) -> x + y
        | (Error , Int(y)) -> y
        | (Int(x), Error)  -> x
        | (Error , Error)  -> 42

Not only this language design feels very clean, but also it helps to understand that errors are just return values of functions. They are part of the function codomain (function range), together with the “useful” return values. From this point of view, not checking and not being able to process error values returned by a function, should feel equally strange as if we wouldn’t be able to process some particular integer return value.

Still, often I don’t check for errors. I think, it is related to the design of many mainstream languages, making it harder to emulate variant types or to return several values. Let’s take C for example:

int error_code;
float actual_result = 0;

error_code = foo(input_param, &actual_result);

if(error_code < 0) {
  // handle error
} else {
  // use actual_result
}

and compare it with OCaml, which is both safer and more concise:

type safe_int = Int of int | ErrorCode of int

match foo input_param with
      Float(f) -> (* use result *)
   |  ErrorCode(e) -> (* handle the error *)

Unfortunately, most of us have to use mainstream languages. Error checking makes source code less readable, therefore I try to counteract it by using a uniform specific code style for error handling (eg. same variable names for error codes and same code formatting).

Recap: checking for errors is the same as being able to handle the whole spectrum of possible return values. Make it part of your code style guide.

Handling errors

In Smalltalk, exceptions (any many other things) are implemented not as a magical part of the language itself. They are just normal classes in the standard library. (if you’re ok with Windows, try Smalltalk with my favorite free implementation, otherwise go for the cross-platform market leader. In Smalltalk, REPL is traditionally called a Workspace)

Processor "This global constant gives 
           you the scheduler of Smalltalk 
           green threads"

Processor activeProcess "This returns the green thread 
                         being currently executed"

Processor activeProcess exceptionEnvironment "This gives the 
                                              current ExceptionHandler"

my_faulty_code := [2 + 2 / 0] "This produces a BlockClosure, 
                               which is also known as closure, 
                               lambda or anonymous method 
                               in other languages"

my_faulty_code on: ZeroDivide 
               do: [ :ex | Transcript 
                              display: ex; 
                              cr] "This will print the 
                                   ZeroDivide exception 
                                   to the console"

The latter line of code does roughly the following:

  1. The method #on:do: of the class BlockClosure creates a new object ExceptionHandler, passing ZeroDivide as the class of exceptions this handler cares about, and the second BlockClosure, which will be evaluated, when the exception happens.
  2. It temporarily saves the current value of Processor activeProcess exceptionEnvironment
  3. Sets the newly created ExceptionHandler as the new Processor activeProcess exceptionEnvironment
  4. Stores the previously saved value of exception handler in the outer property of the new ExceptionHandler.

This effectively creates a stack of ExceptionHandlers, based on a trivial linked list, and having its head (the top) in Processor activeProcess exceptionEnvironment.

Now, when you throw an exception:

ZeroDivide signal

the signal method of the Exception class, which ZeroDivide inherits from, starts with the ExceptionHandler currently defined in Processor activeProcess exceptionEnvironment and loops over the linked list, until it finds an ExceptionHandler suitable for this. Then, the exception object passes itself to the handler.

Not only it looks very clean and is a brilliant example of proper OOD, but also it helps to understand that exceptions is just a construct allowing you not to check for the error in the immediate function caller, but propagate it backwards in the call stack.

Now why is it important?

Because one thing is to check for error, and another thing is to handle it, meaningfully. The latter is not always possible in the immediate caller.

Deciding how to handle errors, meaningfully, is one of the advanced aspects of software development. It requires understanding of the software system I’m working on, as a whole, and the motivation to make code as user-friendly as possible — in the most generic sense: my code can be used by linking and calling it from another code; or an end-user would execute it and interact with it; or somebody will try to read, to understand, to debug and to modify my code.

What makes things worse is the realization that most of time, sporadic run-time errors happen in a very small percentage of use-cases, and therefore they are usually associated with a quite small estimated business value loss. Therefore, from the business perspective, only a small time budget can be provided for error handling. We all know that and when under time pressure, the first thing most developers compromise is error handling.

Therefore, every time I decide how to handle an error, I try to answer all of the following questions:

  • How far should we go trying to recover from the error, given the limited time budget?
  • If the user is blocked waiting for results of our execution, how to unblock him, but (if possible) not to make him angry?
  • If the user is not blocked, should we inform him at all?
  • If we assume a software bug being the reason of an error, how to help testers to find it, and developers to fix it?
  • If we assume an issue with installation or environment, how to help admins to fix it?

Usually, this all boils down to one of the following error handling strategies (or a combination of them):

  • Silently swallow the error.
  • Just log the exception.
  • Immediately fully crash the app.
  • Just try again (max. N times, or indefinitely).
  • Try to recover, or at least to degrade gracefully.
  • Inform the user.

I’ll try to describe a typical situation for each of the handling strategies.

I’m using a third-party library that throws an exception in 20% of cases when I use it. When this exception is thrown, the required function will still be somehow performed by the library. I will catch this specific class of exceptions and swallow them, writing a comment about it in the exception handler.

I’m writing a tracking function, which will be used 10 times a second to send user’s mouse position from the web browser back to the web server. When posting to the server fails for first time, I will log the error (including all information available), and either swallow all other errors, or log every 10th error.

The technology I’m using for my app allows me to define what to do, if an exception is unhandled. In this handler, I will implement a detection if my app is running on development or staging; or in production. When running on production, I will log the error and then swallow it. If it not possible to swallow the error, I’ll inform the user about unexpected error (if possible using some calming down graphic). Not on production, I will crash the app immediately and write a core dump, or at least log the most accurate and complete information about the exception and the current app state. This will help both me and testers to detect even smaller problems, because it will help creating a no-bug-tolerance mindset.

I’m writing a logger class for an app working in the browser. The log is stored on the web server. If sending the log message fails, I will repeat the post 3 times with some delay. If it still fails, I’ll try to write it into the local offline storage. If writing in this storage fails, I will output it to the console.

I’m writing some boring enterprise app with a lot of forms. User clicks on a button to pre-populate the form with data from the previous week. In this event handler, I will place a try/catch block, and my exception handler will log the exception. I will then go chat with the UX designer to decide if and how exactly to inform the user about the issue.

Recap: handling errors is not trivial — there are no hard and fast rules, and you need to know about the whole system and think about usability do to it properly.

Desiging errors

Designing errors is deciding when and how to signal error conditions from a reusable component (framework). If handling errors is complicated, because you have to know the overall context and think about usability, designing errors is in order of magnitude more complicated, because you have to imagine all possible systems, contexts and situations where your code will or can be used, and design your errors so that they can be handled easily (or at the very least, can be handled reasonably).

Frankly speaking, I haven’t designed an error system (yet) I were particularly proud about, and I think this complicated topic is pretty subjective and a question of your style.

My personal style is to believe that my framework or library is just a guest, and the calling code is a host. As a guest, one must respect decisions of the host and do not try to force any specific usage pattern. This is why most (but not all) of my properties and methods are public. I don’t know how the host is going to communicate with me, and I’m not going to force one specific style over him, or declare some topics taboo. I still specifically mark preferred class members though, so that I can indicate my own communication preferences (or suggested API) to the host. I also warn the host in the comments that all members outside of the suggested API are subject to change without notice. But ultimately, it is up to host what to use and how.

This approach has the drawback that the developer writing the calling code has to understand must more about my component, and that he has less support from IDE and compiler detecting usages of class members that I don’t recommend to use. But it has the advantage of giving the greatest possible freedom to the host. And I believe that I have to trust in host that he won’t use his freedom to shoot himself in the leg.

When designing errors, I think I should follow the same idea. This means:

  • If possible, do not force the caller to check for errors. It is his choice. Maybe he is just prototyping some quick and dirty idea and needs my component for a throw-away code.
  • If possible, do not handle errors in the component, but let the calling code to handle them. Or at least, make it configurable. Specifically, do not retry or try to recover from an error, because retrying or recovering takes time and resources, and the host might have a different opinion about what error handling is appropriate. Provide an easy API for retrying/recovering though, so that if the caller decides to recover, it would be easy for him. Another example: do not log errors, or at least make the logging configurable. In one of my recent components I’ve violated this recommendation. When the server didn’t respond timely, 3 lines were added to the log instead of one: the first one has been added by a transport layer, the second one from the business logic that actually needed data from the server, and the third one from the UI event handler. This is unnecessary. Logs must be readable, and usability of the logging is one of the most important factors separating well designed error handling from the bad designed one.
  • Do not provide text messages describing the error. The caller might be tempted to use them “as is” to show them in a message box. And then he’ll get problems, when his app will need to be translated into another language.
  • Provide different kinds of errors, when you anticipate that their handling might be different. For example, if the component is a persistency layer, provide one kind of exceptions for problems related with network communication with the database, and other kind of exceptions for logical problems like non-unique primary key when inserting a new record, or non-existent primary key when updating a record.
  • Add as much additional information into the error as possible. In one of my projects I went so far: when downloading and parsing of some feed from the web server failed, I’ve added the full http request (including headers and body) and full http response, along with the corresponding timestamps, into the error.
  • If possible, always try to signal errors using one and the same mechanism. In my recent project, some of my functions have signaled the error by returning 0, other functions by returning –1, and yet another one has accepted a pointer to the result code as argument.

Recap: Designing errors is even more complicated than handling them.

My Web Toolkit

Sooner or later, every developer creates an own toolkit of small utilities, which is reused from project to project. I cannot share sources of my toolkit, because I don’t own them. But I think, describing some of my ideas can help others. Here is top 3 of my favorite web development ideas.

3. MagicConf

Magic numbers are normally replaced with named constants. But if you think about it, making those constants to parameters often makes more sense in web apps.

Unlike system code, where magic numbers might represent some port address or byte offset in MP4 file, typical web app magic numbers are, for example, timeouts and number of retries, or animation duration, or URLs to various backend web services (who has said magic “numbers” cannot be strings?). Making all those things to be parameters can suddenly both make development and debugging much more efficient and enjoyable; and create more business value a for very little investment.

Normally I gather all those things as properties of a singleton object. When this object gets created, its properties get some reasonable default values. Next, when the app is fully loaded and starts initialization, I overwrite the default property values by reading the corresponding app configuration. Next, I parse the URL query string parameters of current document. When the name of a query string parameter can be found as a property of my singleton object, I overwrite its value again.

Why is it cool? It is useful in uncountable number of ways. Here are just some examples.

  • I can change the backend my web app is speaking to, by appending &BackendUrl=xxx in my browser address box. Very handy for development, or for checking how the new frontend version behaves with the current live version of the backend.
  • When developing a complicated animation, I can set its duration to several seconds, to see exactly, how it goes from key point to key point.
  • When developing an error overlay telling the user about the timeout, I can set the timeout to 10 milliseconds (I always have all time and duration parameters in milliseconds) and get the timeout for each request. No need to put Thread.Sleep() somewhere in the backend, recompile it, then waiting 30 seconds every test run, then removing the sleep again…
  • A designer can come by to me, we run an animation, she tells me what to change, I write another parameter in the URL, reload the page — here we go, we can see the change in action. And after several iterations, when she has prepared the perfect UX, we can copy the browser address line and just send it to some other decision maker for final approval.
  • You need to demo your app to someone important, but your backend is not fully operational. You just send them a link with BackendUrl parameter pointing to your test server containing mock data.
  • Because I tend to replace most of important literals with parameters, someone can change almost anything about how my app works or looks like, just by changing its configuration file. No need to bother me with trivial change requests.
  • Finally, the most important one. My web app on production behaves badly. I cannot reproduce it locally, but its reproducible with the app on the live server. I go to the live server, add URL parameter LoggingVerbosity=255, and then read a LOT of things in the console.

Yes, I leave MagicConf in production code — for purpose.

Q. But is this secure (for us)?
A. As soon as your visitor has obtained the web app, it belongs to him. Changing any source code is almost as easy as changing a URL parameter. The backend web services have to make sanity and security check anyway.

Q. But is this secure (for our customers)? eg. XSS, CSRF, other JavaScript injection things…
A. As soon as we don’t eval the parameters, we should be reasonably safe. When in doubt, perform a security review of your code, or use your common sense. For example, I wouldn’t use MagicConf in a web page for credit card processing.

Q. But this means we lose control over the app!
A. As soon as your visitor has obtained the web app, it is his app, not yours.

Q. But visitors can change our UX and then share a link on Facebook, damaging our CI.
A. Some mega corporations spend millions of dollars to animate folks on the Internet to remix their logos or advertisements. If you have visitors that are ready to spend time remixing your app, it is a good problem to have.

2. Mockend

I prefer having a pure API between the web app and the backend: the backend provides data, and is responsible for authentication, authorization and payment. Everything else happens in the frontend. Having it this way, normally results in a very concise, truly RESTful communication. And as a side effect, it is easy to mock.

When I develop web apps, I always mock the backend, unless I’m also the one who develops the backend. In the cases when I’m also backend developer, I mock the backend only in 50% of cases.

The process can for example look like this:

  1. I create a new file with .json (or .xml — it depends on project) extension and open it in my text editor. I also create a new Word file, and write there “Project X API Specification”.
  2. Then I write first piece of the JSON data I need my web app to get from backend, and save the file.
  3. Then I configure the directory with this file in my local web server, and point my web app to this directory to use it as mock backend (or Mockend, for short)
  4. Next, I write the web app code to get and use the JSON data in my UI.
  5. Run the app, see the data in the UI. Possibly change the data, add or remove data parameters or re-shuffle fields.
  6. When I’m satisfied, I copy the JSON from this mock file into my Word file, and create another chapter for this RESTful endpoint. If I need to, I describe the data exchange format in the Word more thoroughly, for example, provide minimal and maximal limits for integers.
  7. Send a copy of the Word file to backend for development.
  8. Rinse and repeat for the next RESTful endpoint.

Why is it cool?

  • My development timeline doesn’t depend on the backend developer.
  • My software quality and stability don’t depend on the backend developer. When I’m ready, I can immediately show my app, finely working on my mock files.
  • My web app can be delivered to QA and customer approval much sooner, if backend development happens to be slower than me.
  • I define the API, and I define it to be perfect for the web app. While the backend can usually generate data in any format, they can’t tell a comfortable format from a not very comfortable one. I can.

The process can also be backend driven, if needed:

  1. I call the existing backend with proper parameters, and save the result into my local Mockend. Now I have the full control over the data, and can for example create some other mock files testing extreme conditions (no result, too large result, format errors, etc).
  2. I write the web app to get data from my local mock file and use it in the UI.
  3. Rinse and repeat.

1. Modularity

Last time I’ve developed a browser-based web app for money, I’ve used Microsoft Silverlight. It is much cleaner than HTML5 in that, that a) there are apps, which are downloadable zip archives of graphical assets and compiled code, just like mobile apps, and b) it has a traditional control model, with control tree, user-defined controls, and all that stuff. There is also a plugin framework allowing to package separate controls into apps, and then download one app from another and get access to all its code and assets — dynamically in the run-time.

This has enabled the following coolness:

  1. You develop a control, for example a carousel with wet floor reflections (who has said iTunes?)
  2. You package it into an own app, and in the Main method of it, you instantiate it using some dummy data (of course, parametrized with Magic Conf)
  3. At this point, you can already show this app to your customer and QA, unit test it, let UX designers tinker with its animations, PMs approve it, and so on.
  4. At the same time, in your main app (the real app) you develop loading of this carousel plugin, and instantiate the control feeding it with real data from the backend (or your Mockend).
  5. When you’re ready with that, you can show the carousel as part of the real app, and get final approval for it.

A neat side effect of this approach is that all your modules are built as separate app files, so that if you take some care defining the proper interfaces between your main app and the controls, you are off for a modular deployment. This means, if somebody suddenly needs to replace carousel with a iPad-like grid, you can just develop a new control, package it into the app, and then deploy just this app on the server, replacing the old carousel app with it. Nothing will get hurt.

Oh, and it improves load time of your app, because plugins are loaded on-demand.

As for the HTML5 world, I’m still lacking hands on experience. Web components als things like Mustache seem to go to proper direction. I’m still not sure though, if they are not against the HTML spirit and the ideology of most popular frameworks.

Categories

Recent Tweets

  • New blog post: How to merge, 3/3 http://t.co/gFjDEJVs 5 years ago
  • Oh boy, how quick time goes by. Do you recognize that puppy toy language of 90ies in this young dynamic professional http://t.co/J4k53xn4 5 years ago
  • New blog post: How to merge, 2/3 http://t.co/wXIUAEZZ 5 years ago
  • New blog post: How to merge, 1/3 http://t.co/1lOQUp56 5 years ago
  • Just posted a photo http://t.co/pCUA758O 5 years ago

Follow Me on Twitter

Powered by Twitter Tools

Archive