Four Advices For Product Managers of Machine Learning Products

The hype around Big Data and Machine Learning goes on and on, and more and more businesses seem to obtain competitive edge by developing data-based products. As a product manager, everyone is considering to use this new tech in their area.

Having made some first experiences designing data-based products, I want to share the lessons learned.

Artificial Intelligence is about having Less Control

In a traditional software product, we fully control its internal logic. On contrary, with a data-based product, we give up some part of the logic to the Machine Learning model. This is fully intended. In fact, this is why we want to use an A.I. at all.

For example, if we want to recognize images of kittens, we could define exact rules of how to process and transform the colors of pixels, by ourselves. But this would be a tedious and complex, if not impossible task, at least for human software developers. Instead, we would train a ML model that would accept images on its input, and would output the detection result, “somehow”.

And here is the dark side of the coin: this “somehow” means that we
1) cannot explain how exactly the model has made its decision, and
2) cannot find “just that one single screw” in the model to tune its behavior, because ML Models can contain millions of “screws” you can tune, and as a human, you cannot find the right one; and finally
3) we have to accept that the ML model is making right detections often, but not all the time.

In the practice, this all leads to the following advices:

1. Design for a Mistake

Most ML models produce results that are only statistically correct – i.e. only some big part of the users will obtain correct or at least satisfactory results. Howewer, some sizable part of the users will not get satisfactory results, and this part could be uncomfortably large (compared to the traditional products), especially because the ML model can make mistakes also in the situations, which for us humans would appear inacceptable, for example it could recognize a kitten in the photo of your CEO.

It is our task as product manager to proactively counteract the possible user dissatisfaction. Here are some ideas how to do so.

Human Moderation

We are working on a gallery of images with kittens and we want that only kittens can be published.
Wrong: if no kitten is detected in the uploaded image, prevent the user to post it.
Right: inform the user that the image is sent to manual moderation process, and hire moderators.
Alternatively: post the image in an “unsorted” category, let users tag inappropriate images, and automatically hide the image detected as “no kittens” after even the very first user complain.

Sort Instead of Hide

We have a baby products shop. We want to recognize the possible age of the users baby and to show them only items suitable for their age.
Wrong: hide the items of an inappropriate age
Right: sort the items of an inappropriate age “below the fold”
Alternatively: hide the inappropriate items, but provide good visible buttons “<<< Articles for younger babies” and “Articles for older babies >>>”

Suggest Instead of Fill

We have a marketplace for used products and we want to generate the headline for new listings automatically, to speedup and simplify the listing publishing process
Wrong: make the headline not editable and generate it automatically
Right: Fill the generated headline as predefined (default) value of the text box, remove it as soon as user starts typing something else.

Note that to be able to implement this idea, the overall publishing workflow has to be changed: instead of starting with the entering a headline, the user can start by uploading article images or filling some structured data, which is needed to generate the headline for him.

Upsell

We want to help users to estimate the value of their real estate.
Wrong: ask user to fill a form, then output the result of the model.
Right: output a wide range of valuations as soon as user has typed the location, and let the user either to enter more data to reduce the valuation range, or, if measuring or getting the data is too complicated for them, suggest them to order a human evaluation service.

Feedback to Calm Down

We want to show some ads that are as relevant to the users as possible.
Wrong: just show the ads produced by a recommender model.
Right: additionally, show a button allowing the users to give us the feedback (“Less of this topic”) or to turn off the ads (see the Upsell idea).

2. Test in Production

A traditional QA process includes comparing the documented intended logic of the product with the actual product behavior. But because we don’t know the exact logic of the A.I. models, we also cannot document it so the testers also cannot check it. Besides, even if we could document the logic, this document would be so complicated that it would be infeasible to test all the cases.

This leaves us with the following options:

Predefined Mockups

In your services, define some “magic” identifiers that would prevent going through the A.I. pipeline and return predefined results back. For example, we can add a logic into our kitten recognition service that would always return “kitten” for images of 1×2 pixels, and “no kitten” for images of 2×1 pixels, given that it is unprobably that real users would use this image sizes in production. A tester would be able at least to test the non-A.I. parts of the product (uploading, publishing, searching etc)

Exploratory Testing

The traditional exploratory testing is still possible with data-based products, but is always more expensive. For example, the testers would need to prepare testing sets with images of kittens, dogs, horses, lions, dolphins, NSFW-images, etc.

The exploratory testing is especially laborous for products that act on the previous user behavior, such as recommenders, especially if recommenders use some out-of-band data like current date and time, the weather outside, or any interesting shows running currently on TV.

A/B Testing

Therefore, the data-based products rely on A/B testing much more often than any traditional product, because that would compensate for the lack of the traditional QA, which is often infeasible due to time or budget constraints.

3. Document for Reproducibility

Software documentation is often part of prescribed software development processes. In traditional products, the documentation of their business logic is often considered to be the main and most important part.

As a product manager of a data-based product, you can often be confronted with the question, “Where I can find the documentation how the product decides to do this and that in a such and such situation”. And then you will need to explain that there is no documentation, because nobody knows the logic and nobody is able to know it either.

Here, we have to make one step back and remind ourselves, that the main goal of documentation is to enable the maintenance and further development of the software. For the data-based products, the key factors for this is being able to reproduce the model training, and being able to re-train the model with some new, or modified data. There are some tools on the market to manage the datasets (the most popular being Pachyderm), while we have also created our very own framework for this: https://github.com/Immowelt/iwlearn

4. Create Product Ideas with Data Mining, or Be Flexible

The usual process of product idea discovery includes doing interviews with the users and other UX research, filling out business canvas and performing SWAT analysis, scheduling brainstormings, and a lot of other important and effective tools.

What it does not include, is, to check whether the data in your data lake really has the quality level required to implement the Machine Learning model you need.

To give you some examples of what can diminish your data quality:

something that won’t be collected from user and cannot be inferred from other data. For example, if we don’t ask how many rooms there is in the apartment, it is very hard or impossible to realiably infer it from text or images,
something that is not validated during collection, for example users can enter literally anything as their zip code,
something that has been collected, but disappeared in further data processing steps. For example, the user id could be actually collected during his item search process, but then be removed in the later steps due to the data minimization principle of EU GDPR. In this case we cannot create recommendation models based on what else items this user was interested in the past. This is to demonstrate that low data quality not always means some computer bugs that can be quickly fixed, but could also be a well intended state.

Unexpected low data quality can hit you badly. In the worst case scenario you would successfully pitch an idea, get it on your company roadmap, start implementing it and only after investing 80% of the data science efforts you would recognize that the resulting ML Model cannot be used as expected, for example because it has a very low accuracy, so too many users would be annoyed by it.

Be Flexible

This is why you need to have a plan B for this kind of situations. One typical solution would be using some traditional business logic instead of the A.I., and accept the possible less than expected uplift of the products KPIs. Another solution would be to understand, what exactly part of the available data has good quality, and to quickly conceive, pitch, prioritize and implement a completely different product, possibly with some other target group, product focus or KPIs, but at least technically feasible as it uses a good quality data.

Mine Data

Another, a fundamentally different way to approach this problem is to generate product ideas with Data Mining.

Your data scientists would start with looking at data about some topic (for example, the behavior of users who like kittens) and start to mangle it in different ways, for example creating clusters of users (white kitties versus black kitties lovers), visualizing user retention, cohort analysis and so on. As a by-product, the data scientists will identify and eliminate the bad data: all that bots and crawlers, and users posting NSFW pictures, and can generate some insights that hopefully would be useful to generate an idea.

What if they can discover that a sizable part of the users “like” good kitten images within minutes after they appear online? You can make a new product feature out of it!

For example, for any new kitten image you would find out the users that most probably would like it, and send them a push notification, so that they can enjoy and like the image immediately, thereby increasing their retention on the web site.

The advantage of this process is that you’re guaranteed that you have data at least in some reasonable quality, before you start pitching and prioritizing your product idea.

The disadvantage is that you cannot guarantee that data mining would produce any insights related to the current company goals and focus user groups, so that you need to be flexible here too and be able to make good products, even though they not always contribute to the current company focus.