Maxim Fridental

On Death of the Russian Culture

I was born in the USSR. When my country died, I didn’t feel anything special, maybe just a little hope: this was a new opportunity to do better, to learn from our mistakes, to become a member of a peaceful global world.

Besides, I’ve clearly separated the country – a bureaucratic, infected with the communist ideology, useless monster that have destroyed lakes and rivers and killed millions of its own citizens in Gulag – from the Russian culture.

With over 1000 years of history, the Russian culture came up with novel philosophical ideas to answer major questions of life, discovered natural laws, invented useful technology and created art on the international level.

The Russian Culture has survived enslavement by the Mongols, Tsarist Regimes, the Communist Revolution, and two World Wars, so I didn’t need to worry back then, when USSR fell apart. I’ve had my Russian Culture. It would live with me and I’d pass it on to the new generations.

Well, not after February 24th, 2022.

All the dear old fairy tales my parents have read to me in my childhood, all the lullaby songs they sang to me, all their explanations of what is good and what is bad, all the Russian ways I know how to live life and how to be successful and how to resolve conflicts and how to solve problems, all the religion and philosophical ideas, all my favorite, deeply meaningful songs, smart movies, tender cartoons, awesome books, and finally, the most important, my understanding of what is Love and how to love that I have learned from my mom – all of it became obsolete.

Yes, it still lives within me. But how am I supposed to pass it over to the new generations? The culture of the very that nation that did Bucha, Mariupol, Charkow, that has raped, beaten, tortured, and murdered children, women, elderly and innocent civilians in the hundreds of towns and villages in Ukraine. The culture of people who lied about their military involvement in Krim and Donbass, who poisoned their opponents worldwide, who threaten the whole world with their nuclear weapons, who let their leaders to brainwash them, who have shelled pregnant women, murdered tiny girls in front of their mothers, destroyed homes of hundreds of thousands of people and inflicted hunger and energy emergency around the world?

How can I ask my nephew if he wants me to read a Russian fairy tale, if I am deeply ashamed of being Russian and I am calling myself “German” in public now?

Millions of brilliant Russians had ideas, discoveries, lessons learned, and art the whole world needed to hear and to benefit from. Now, all of it is gone. Now, a big part of me is dead.

Data Lakes are Technical Debt

I’ve been working on big data since 2014 and I’ve managed to avoid taking the technical debt of data lakes so far. Here is why.

Myth of reusing existing text logs

For the purpose of this post, let’s define: a data lake is a system allowing a) to store data from various sources in their original format (included unstructured / semistructured) and b) to process this data.

Yes, you can copy your existing text log files into a data lake, and run any data processing on them in the second step.

This processing could be either a) converting them into a more appropriate storage format (more about it in just a minute) or b) working with the actual information – for example, exploring it, creating reports, extracting features or execute business rules.

The latter is a bad, bad technical debt:

Text logs are not compressed and not stored by columns, and have no secondary indexes, so that you waste more storage space, RAM, CPU time, energy, carbon emissions, upload and processing times, and money if you casually work with the actual information contained in there.
Text logs doesn’t have a schema. Schemas in data pipelines play the same role as strict static typing in programming languages. If somebody would just insert one more column in your text log somewhere in the middle, then your pipeline will in the best case fail weeks or months later (if you execute it for example only once a month), on in the worst case it will produce garbage as a result, because it wouldn’t be able to detect the type change dynamically.

Never work off text logs directly.

A more appropriate format for storage and for data processing is a set of relational tables stored in compressed, columnar format, with a possibility to add secondary indexes, projections, and with a fixed schema checking at least column names and types.

And if we don’t work off the text logs directly, it makes no sense to copy them into a data lake – first to avoid the temptation to use them “just for this one quick and dirty one-time report”, but also because you can read the logs from the system where they are originally stored, convert them into a proper format, and ingest into your relational columnar database.

Yes, data lakes would provide a backup for the text logs. But YAGNI. The only use-case where you would need to re-import some older logs is some nasty hard-to-find bug in the import code. This happens rarely enough to be willing to use much cheaper backup solutions than the data lake.

Another disadvantage of working with text logs in data lakes is that it motivates to produce even more technical debt in the future.

Our data scientist needs a little more information? We “just add” one more column into our text log. But, at some point, the logs become so big and bloated that you won’t be able to read them with your naked eye in any text editor, so that you’d lose the primary goal of any text log: tracing the state of the system to enable offline debugging. And if you add this new column in the middle, some old data pipelines can silently break and burn on you.

Our data scientist needs information from our new software service? We will just write it in a new text log, because we’re already doing it in our old system and it “kinda works”. But in fact, logging some information:

logging.info('Order %d has been completed in %f ms' % (order_nr, time))

takes roughtly as much effort as inserting it into the proper, optimal, schema-checked data format:

db.insert(action='order_completed', order=order_nr, duration_ms=time)

but the latter saves time, energy, storage and processing costs, and possible format mistakes.

Myth of decoupled usage

You can insert all the data you have now, store it in the data lake, and if somebody needs to use it later, they will know where to find it.

Busted. Unused data is not an asset, it is a liability:

you pay for storage
you always need to jump over it when you scroll down a long list of data buckets in your lake,
you might have personal data there so you have one more copy to check if you need to fulfill GDPR requirements,
the data might contain passwords, secure tokens, company secrets or some other sensitive information that might be stolen, or could leak.
Every time you change the technology or the clould provider of your data lake, you have to spend time, effort and money to port this unused data too.

Now, don’t get me wrong. Storage is cheap, and nothing makes me more angry at work than people who would delete or not store data, just because they want to save storage costs. Backup storage is not so expensive as data lake storage, and de-personalized parts of data should be stored forever, just in case we might need them (but remember: YAGNI).

Storing unused data in a data lake is much worse than storing it in an unused backup.

Another real-world issue preventing decoupled usage of data is how quickly the world change. Even if the data stored in the data lake is minutiously documented up to the smallest detail – which is rarely the case – time doesn’t stand still. Some order types and licensing conditions become obsolete, some features don’t exist any more, and the code that has been producing data is already removed, not only from the master branch, but also from the code repository altogether, because at some point the company was switching from SVN to git and they had decided to drop the history older than 3 years, and so on.

You will find column names that nobody can understand, and column values that nobody can interpret. And this would the best case. In the worst case, you would find an innocent and fairly looking column named “is_customer” with values 0 and 1, and you will mistake it for a paying user and you will use it for some report going up to the C-level, only to painfully cringe, after somebody would suddenly remember that your company had an idea to build up a business alliance 10 years ago, and this column was used to identify potential business partners for that cooperation.

I only trust the data I collect myself (or at least I can read and fully understand the source code collecting it).

The value of the most data is exponentially decaying with time.

Myth of “you gonna need it anyway”

It goes like this: you collect data in small batches like every minute, every hour or every day. Having many small files makes your data processing slow, so you re-partition them, for example into monthly partitions. At this point you can also use a columnar, checked store and remove unneeded data. These monthly files are still to slow to be used for online, interactive user traffic (with expected latency of milliseconds) so you run next aggregation step and then shove the pre-computed values into a some quick key-value store.

Storing the original data in its original format in the lake in the first step feels to be scientifically sound. It makes the pipeline uniform, and is a prerequisite for reproducability.

And at the very least, you will have three or more copies of that data (in different aggreation state and formats) somewhere anyway, so why not storing one more, original copy?

I suppose, this very widespread idea comes from some historically very popular big data systems like Hadoop, Hive, Spark, Presto (= AWS Athena), row-based stores like AWS Redshift (=Postgresql) or even document-based systems like MongoDB. Coincidentally, these systems are not only very popular, but also have very high latency and / or waste a lot of hardware resources, given the fact that some of them written on Java (no system software should be ever written in Java), or use storage concepts not suitable for big data (document or row-stores). With these systems, there is no other way than to duplicate data and store it pre-computed in different formats according to the consumption use-case.

But we don’t need to use popular software.

Modern column-based storage systems based on the principles discovered with Dremel and MonetDB are so efficient that in the most use-cases (like 80%) you can store your data exactly once, in a format that is suitable for a wide variety of queries and use-cases and deliver responses with sub-second latency for simple queries.

Some of these database systems (in alphabetical order):

Clickhouse
DuckDB
Exasol
MS�SQL Server 2016 (COLUMNSTORE index)
Vertica

A direct comparison of Clickhouse running in an EC2 instance with data stored in S3 and queried by Athena (for some specific data mix and query types that are typical at my current employer Paessler AG) has shown that in this particular case Clickhouse is 3 to 30 times quicker and at the same time cheaper than the naive Athena implementation.

Is it possible to speed up the Athena? Yes, if you pre-aggregate some information, and pre-compute some other information, and store it in the DynamoDb. You’ll then get it cheaper than Clickhouse, and “only” 50% to 100% slower. Is it worth having three copies of data and employing a full time DBA monitoring the data pipelines health for all that pre-aggregating and pre-computing, as well as using three different APIs to access the data (Athena, DynamoDB and PyArrows)? YMMV.

Summary

Data lakes facilitate technical debt:

Untyped data (that can lead to silent, epic fuck-ups)
Waste of time
Waste of money
Waste of hardware
Waste of energy and higher carbon footprint
Many copies of the same data (that can get out of sync)
Can be against of the data minimization principle of GDPR
Can create additional security risks
Can easily become data grave if you don’t remove dead data regularly

Avoid data lakes if you can. Mind the technical debt you are agreeing on and be explicit about it, if you still have to use them.

Flywheel Triangles

For a business to survive and become somewhat sustainable, it needs a self-sustaining business process to earn money, such as it would be very hard to destroy it by management errors or market changes.

I’ve heard it to be called “Flywheel” at StayFriends.

Business flywheels are positive feedback loops leading to business growth, and can be depicted as triangles. Here is for example how the flywheel of StayFriends looked like:

Users generate content. Content could be used to generate ads, or sold to other users, and the resulting revenue could be used to buy ads, bringing new users.

This effect was called “viral loop” back then, but now I understand that this kind of flywheels exist in any successful business and are not limited to social networks.

This is for example the flywheel of Axinom in its early years:

Any custom projects developed based on the AxCMS resulted to more generalized features being added to this CMS, and to more good looking references for it, so that it has attracted more customers, and thus it has generated new projects. Note how the quality of the produced software was not part of this triangle. Theoretically, you could run projects that have left customers dissatisfied, but you had still added something to AxCMS and could use it to win other customers.

Winning new customers might be harder than winning new projects of an existing customers, so that Axinom was working on a second flywheel:

Having several flywheels supporting each other seems to be a feature of companies demostrating sustainable growth. Here is for example the landscape of flywheels of Immowelt (including a potential new one):

It is interesting to see that even a damaged flyweel can support the business for decades. I can demonstrate it on example of Metz, a TV manufacturer. Initially, when television was not yet ubiquitos, the company was participating in the following flywheels:

Where Metz has owned only the lower triangle, but the growth has happened in the upper two: people discovered some new cool show they wanted to see, then needed to buy their first TV set for that, having done that, they became TV Viewers, that motivated creation of new TV shows both directly as well as because of more money from the advertisers. This worked until virtually every family had a TV set. From this time on, the link between TV Sets and TV Viewers became broken, so Metz remained with this:

Basically, they had only the pair of dealer -> TV sets, and a very weak third corner: by producing very high-quality TV sets, they could consistently win various Tests (eg. by Stiftung Warentest) and this could helped to win a little more electronics dealers.

I guess, the company has survived over 50 years in this state.

Every company is interested in having a healthy flywheel and in participating in to several flywheels at the same time. I think, the most realistic way adding a new flywheel would be reusing existing one or two nodes and adding a new triangle.

For example, Metz could try to grow true Metz fans, having basically the strategy of Apple and XiaoMi:

Well, in theory it looks good, but we all know that in the practice, there are all kind of problems, starting from missing investment budget, not enough innovation talents in R&D department, law and regulations preventing some flywheels, strong market competition, etc.

The reason I’ve written this post is that the idea of depicting the flywheels by triangles came to me in a dream, and for some reason, I was very sure in my dream that it absolutely must have at least three corners. I cannot explain this logically, but anecdotally, if we look at the Marx’s formula:

It is striking that it doesn’t provide any non-trivial insights of how to start growing business.

An alle Extrovertierten

Liebe Extrovertierte!

Die Ausgangsbeschr�nkungen haben mein Alltagsleben gar nicht ver�ndert. Ich mache die gleichen Sachen wie zuvor, treffe (fast) die gleichen Menschen wie zuvor, bin gleich oft drau�en wie zuvor. Und das, obwohl ich den Ausgangsbeschr�nkungen folgeleiste.

Ich bekomme nun mit, dass einige von euch es nicht mehr aushalten k�nnen, alleine zuhause zu sein, euch gar einsam f�hlen und eure Veranstaltungen und Networking vermissen.

Es tut mir sehr leid f�r euch. Aber.

Endlich wisst ihr, wie sich Introvertierte in der von euch regierten Welt f�hlen.

Ich w�re euch also dankbar, n�chstes Mal:

– bevor ihr mich als Kunde anruft, obwohl ich mir klar die Kommunikation per E-Mail gew�nscht hatte, weil ihr eure Finger nicht wundtippen wollen, und ein Telefonat f�r euch einfacher und angenehmer ist,

– bevor ihr mich als Dienstleister zu einem Workshop bei euch vor Ort einl�dt, und mir eine PPT mit dem Auftrag vorliest, anstatt die PPT per E-Mail zu verschicken und dann zwei-drei Fragen per E-Mail zu beantworten, nur weil ihr mich aus reiner Neugier pers�nlich kennenlernen m�chtet,

– bevor ihr mich als euren Mitarbeiter zu einer Teambuilding-Ma�nahme einl�dt, wo zwangzig gut bezahlte Erwachsene den ganzen Tag einander farbige Plastikballe zuwerfen, und sp�ter in Schwimmflossen r�ckw�rts gegeneinander rennen m�ssen, weil ihr denkt, es macht ja so viel Spa�,

– bevor ihr mich quer durch Deutschland anfliegen l�sst und euch mein Diagramm aus einer PDF-Datei vorlesen lasst, weil ihr es anders nicht versteht, denn ihr habt es trotz Grundschule, mittleren Reife, Abitur und Studium nicht geschafft zu lernen, wie man Deutsch selbstst�ndig lesen und verstehen kann,

– also bevor ihr n�chstes Mal all solche typisch extrovertierten Dinge macht, denkt mal bitte an diesen einen Monat, als ihr zuhause alleine sitzen musstet und euch unwohl gef�hlt habt, und fragt euch, ob es vielleicht in eurem Alltag den introvertierten Mitmenschen gegen�ber respektvoller gewesen w�re:

– den gleichen Kommunikationsweg zu nutzen, den sie f�r sich gew�hlt haben, auch wenn ihr eure Finger etwas wundtippen, Informationen selbstst�ndig lesen statt vorgelesen bekommen und eure Neugier b�ndigen m�sstet,

– das Programm von euren Firmenfeier und Teambuilding-Veranstaltungen im Voraus zu ver�ffentlichen, damit die Introvertierten sich rechtzeitig darauf einstellen und ggf. ihre Bedenken melden k�nnen.

Danke f�r eure Aufmerksamkeit.

Sanit�r f�r IT-ler

Irgendwann hat man angefangen, Wasserleitungen nicht nach Kundenvorgabe, sondern in einer Reihe von fest definierten Gr��en herzustellen.

Der Bauherr konnte zwar nicht mehr auf Millimeter genau festlegen, wie gro� seine R�hre waren. Konnte daf�r aber Wasserr�hre von unterschiedlichen Herstellern miteinander kombinieren und von der Konkurrenz der Hersteller profitieren. Die Hersteller haben auch davon profitiert, weil sie a) die R�hre als Massenware herstellen, und b) auf Vorrat und nicht mehr auf Bestellung produzieren konnten – nimmt ein Kunde eine Rohr nicht ab, so kann es ein beliebiger anderer Kunde kaufen.

Nun, abgesehen von der L�nge haben runde R�hre zwei weitere Dimensionen: Innendurchmesser und Au�endurchmesser. Da damals die Differenz dazwischen immer gleich blieb, hat man eine der zwei Dimensionen f�r �berfl��ig gehalten und R�hre immer nach dem Innendurchmesser gekennzeichnet. Also eine 1” Rohr hatte den Innendurchmesser von 1 Zoll. Das war aus der Usability Sicht und aus Sicht der Kundenorientierung damals vorbildlich. Erstens, es hat den Kunden eher der Durchsatz des Wassers in seinem Netz interessiert, also im Endeffekt der Innendurchmesser. Wie viel Platz die Rohr in oder an der Wand nimmt, war f�r den Bauherr zweitr�ngig (damals waren W�nde ja auch dicker). Zweitens, alle Unterlagen wie Preislisten, Angebote, Rechnungen, Buchungen und Lieferscheine waren einfacher zu schreiben und zu lesen, weil eben nur eine Zahl statt zwei darin kommuniziert werden musste.

So. What could possibly go wrong?

Dann kam man aber auf die Idee, dass es einfacher und schneller ist, Wasserleistungen mit Gewinde miteinander zu verbinden, als sie immer vor Ort schwei�en oder loten zu m�ssen. F�r Gewindeverbindungen braucht man aber Fittings – also kleine Rohrst�cke in der gew�nschten Form (ein Winkel, ein T-St�ck, eine Muffe usw). Wenn man Fittings auf oder einschraubt, gibt es dann immer eine Innen- und eine Au�engewinde. Wenn ich z. B. eine Rohr verl�ngern m�chte, scheide ich an deren Ende eine Au�engewinde auf und schraube darauf ein Fitting auf, der entsprechend Gr��e Innengewinde haben muss. Wenn ich eine 1” Rohr habe, wie gro� muss die Innengewinde des Fittings sein? Sie muss so gro� sein, wie der Au�endurchmesser der Rohr. Also bei einer 1” Rohr war es damals 1”5/16. Also man h�tte die passende Fittings damals auch so bezeichnen k�nnen: ein 1”5/16 Fitting. Tja, das Usability w�re dann aber schlecht. Wenn der Kunde eine Rohr mit 1 Zoll hatte, musste er daran denken, dass der passende Fittig die Gr��e 1”5/16 hat, und woher soll er das denn wissen, man hat ihm ja die ganze Zeit zuvor den Au�endurchmesser der Rohr nirgendswo kommuniziert (siehe oben, aus Usability-Gr�nden).

Deswegen hat man sich damals wirklich daf�r entschlossen, die Fittings so wie die passende R�hre zu kennzeichnen. Wenn ich also eine 1” Muffe kaufe, also ein Fitting mit zwei Innengewinden, dann hat diese Muffe keine einzige Gr��e, die sich auf 1 Zoll bel�uft. Sondern der Fitting ist so gro�, dass er auf eine Rohr mit Innendurchmesser 1” passt, wenn man denn darauf eine Au�engewinde aufschneidet.

Soo. Des passd scho. War aber knapp. Als ein IT-ler sp�rt man hier schon langsam ein Geruchlein.

Und dann kam man aber auf die Idee! Darauf, dass Gusseisen nicht das einzige Material f�r Wasserr�hre sein kann, und hat angefangen, Stahl, Kupfer, Messing usw. zu verwenden.

Nun, da g�be es aber ein kleines Problemchen. Man braucht weniger von den neuen Materialien, um gleichwertig stabile R�hre zu bekommen. Also man hatte damals zum Beispiel einen kleineren Au�endurchmesser bei gleichem Innendurchmesser machen k�nnen. Hat man aber nicht gemacht. Warum nicht? Weil man ja schon so viele Fittings produziert und verbaut hat, die indirekt einen bestimmten Au�endurchmesser verlangen. Wir erinnern uns, die 1” Muffe hat die Innengewinde von 1”5/16, damit sie auf eine Gusseisen-Rohr mit 1” Innendurchmesser passt. Eine Stahl-Rohr mit einem Innendurchmesser von 1 Zoll h�tte den Au�endurchmesser von nur ca. 1”1/8. Der alte Fitting w�re dann um 3/16” zu gro�.

Deswegen hat man sich damals wirklich daf�r entschlossen, bei den neuen R�hren den Innendurchmesser zu vergr��ern! Und den Au�endurchmesser zu behalten! Und die R�hre immer noch nach dem nicht mehr vorhandenen Innendurchmesser zu kennzeichnen!

Das ist so herrlich, deswegen jetzt nochmals zum mitschreiben.

Wenn ich heute eine 1” Stahlrohr kaufe, hat sie weder den Innen- noch den Au�endurchmesser von einem Zoll. Sondern, ihr Au�endurchmesser ist so gro�, wie er bei einer Gusseisenrohr irgendwann mal war, und diese Gusseisenrohr hatte damals den Innendurchmesser von einem Zoll.

Tja. Der einfachste Weg, zu sich einen Berater in einem Baumarkt zu holen, ist in die Sanit�r-Abteilung mit einer Schiebelehre zu marschieren. “(Oh, oh, oh, der Kunde misst gerade die Gewinde eines Fittings mit der Schiebelehre, das kann nur schief gehen�) Guten Tag, kann ich Ihnen helfen?”.

Sooo.

Und dann sind die Franz�sen gekommen. Die mit ihrem Metre. Und haben zurecht moniert, dass sich die ganze Welt auf das metrische System geeinigt hat, und was haben hier alle diese Z�lle zu suchen? Deswegen haben die R�hre in den neuen Anwendungen (also nicht in der Haustechnik, aber z.B. f�r Raketen) tendenziell metrische Gr��en und metrische Gewinden. Es ist also gar nicht so unwahrscheinlich, ein Druckmessger�t zu kaufen, dessen Anschluss eine M20 Gewinde hat. Es hat also einen Durchmesser von exakt 20 mm. Der n�chste Fitting w�re dann der 1/2” Fitting, der mit 18,5mm etwas zu klein gewesen w�re. Abgesehen davon haben die metrischen Gewinden eine andere Dichte, was die Anzahl von Faden angeht.

Da m�ssen wir uns endlich gl�cklich sch�tzen, dass die Gewindeverbindungen in der Haustechnik nicht mehr zum State of the Art geh�ren.

Endlich gibt es Mehrschichtverbundr�hre, die mit ihrem Au�endurchmesser und St�rke, mit metrischen Einheiten gekennzeichnet werden. Also z.B. 16×2,2 f�r eine Rohr mit 16mm Au�endurchmesser und 2,2mm stark. Diese R�hre kann man von Hand biegen, und in Sekunden durch Verpressen miteinander verbinden. Und eine 100 Meter Rolle kann von einer Person getragen werden.

Es gibt nur ein kleines Problemchen. Alle Hersteller haben den Wikipedia-Eintrag �ber den Walled Garden gelesen und versuchen, auf ihr System zu locken. Wer “fremde” R�hre oder Fittings verwendet, kriegt keine Garantie. Also wenn die Rohr oder die Fittings alle sind, ist da nichts mit “mal schnell in den Baumarkt fahren und neue holen”, man muss im System bleiben und Nachschub dort besorgen, wo man es fr�her gekauft hat. Es gibt um 36 unterschiedliche Presskonturen, die jeweils nur auf passende Fittings angewendet werden k�nnen. Sie k�nnte man alle mit einer Presszange verpressen (mit auswechselbaren Pressbacken). Das kostet aber 25 Euro pro Pressbacke. Und dann verliert man ebenfalls die Garantie. Man soll das Originalger�t des Herstellers verwenden, das f�r den gleichen Job pl�tzlich das 10-fache kostet (wir sprechen hier um mehr als 1000 Euro f�r ein kleines Akku-Ger�t).

Ich w�re daf�r, dass man in Urheber- und Patentrecht auch den Tatbestand eines “Gestaltungsmisbrauchs” einf�hren w�rde. Ich habe n�mlich den Verdacht, dass man exakte Gr��e des Fittings nicht preisgibt, nicht weil es auf diese Gr��en ankommt, um das Fitting besser als von der Konkurrenz machen (stabiler, bequemer, preiswerter), sondern einzeln und allein um zu verhindern, dass die Konkurrenz preiswerte und bessere kompatible Tools und Zubeh�r herstellen kann.

Sooooo.

Als w�re das alles. Isses aber ned.

Frage: ich habe eine Rohr, die mit 1/2” gekennzeichnet ist. Ich m�chte sie durch eine Wand durchf�hren. Wie gro� muss das Loch in der Wand sein? Antwort: 40mm. Wie komme ich darauf? Nun, die 1/2” betr�gt bekanntlich 12,25mm. Diese Zahl ist ja aber wie wir gelernt haben heutzutage irrelevant und wir schmei�en sie gleich weg. Wir schauen stattdessen in die Tabelle an und finden heraus, dass der Au�endurchmesser einer 1/2” Rohr 20mm betr�gt. Nun, laut EnEV m�ssten wir die Rohr noch d�mmen. Hierf�r ist die D�mmst�rke von 50% des Rohrdurchmessers erforderlich. 50% von 20mm ist 10mm. Die 10mm Schicht rund um die Rohr von 20mm herum, macht die finale Gr��e von 40mm.

Ich finde diese L�sung auch sehr elegant (nicht!). Die Gesetzgeber haben sich vielleicht �berlegt, ob sie gleich die Hersteller vorged�mmte R�hre herstellen lassen. Als Verbraucher h�tte man dann eine 40-er Rohr gekauft, die man mit der Schiebelehre messen und 40mm bekommen kann, die man auch mit einem 40er Fitting verbinden und in ein 40-er Loch reinschieben kann, usw. Doch so einfach ist es nicht – es gibt ja noch sehr viele alte Installationen, die man sowieso nachtr�glich d�mmen m�sste. Und �berhaupt, wo k�men wir denn hin, dann w�rden wir doch dieses wunderbares seit Jahrhunderten sich bewehrtes Zoll-System verlieren! Das geht ned! So ungef�hr ist es wohl verlaufen. Nun m�ssen die Installateure immer mit zwei Durchmesser hantieren. Einmal f�r die Rohr und Fittings wie sie sind. Und einmal f�r das Ganze, nachdem es fertig isoliert ist.

Es gibt viele IT-ler, sie sich f�r ihren Quellcode sch�men. Zu viele. Eine Kur davon k�nnte es sein, mal mit den eigenen H�nden eine alte Sanit�r-Installation zu renovieren.

Four Advices For Product Managers of Machine Learning Products

The hype around Big Data and Machine Learning goes on and on, and more and more businesses seem to obtain competitive edge by developing data-based products. As a product manager, everyone is considering to use this new tech in their area.

Having made some first experiences designing data-based products, I want to share the lessons learned.

Artificial Intelligence is about having Less Control

In a traditional software product, we fully control its internal logic. On contrary, with a data-based product, we give up some part of the logic to the Machine Learning model. This is fully intended. In fact, this is why we want to use an A.I. at all.

For example, if we want to recognize images of kittens, we could define exact rules of how to process and transform the colors of pixels, by ourselves. But this would be a tedious and complex, if not impossible task, at least for human software developers. Instead, we would train a ML model that would accept images on its input, and would output the detection result, “somehow”.

And here is the dark side of the coin: this “somehow” means that we
1) cannot explain how exactly the model has made its decision, and
2) cannot find “just that one single screw” in the model to tune its behavior, because ML Models can contain millions of “screws” you can tune, and as a human, you cannot find the right one; and finally
3) we have to accept that the ML model is making right detections often, but not all the time.

In the practice, this all leads to the following advices:

1. Design for a Mistake

Most ML models produce results that are only statistically correct – i.e. only some big part of the users will obtain correct or at least satisfactory results. Howewer, some sizable part of the users will not get satisfactory results, and this part could be uncomfortably large (compared to the traditional products), especially because the ML model can make mistakes also in the situations, which for us humans would appear inacceptable, for example it could recognize a kitten in the photo of your CEO.

It is our task as product manager to proactively counteract the possible user dissatisfaction. Here are some ideas how to do so.

Human Moderation

We are working on a gallery of images with kittens and we want that only kittens can be published.
Wrong: if no kitten is detected in the uploaded image, prevent the user to post it.
Right: inform the user that the image is sent to manual moderation process, and hire moderators.
Alternatively: post the image in an “unsorted” category, let users tag inappropriate images, and automatically hide the image detected as “no kittens” after even the very first user complain.

Sort Instead of Hide

We have a baby products shop. We want to recognize the possible age of the users baby and to show them only items suitable for their age.
Wrong: hide the items of an inappropriate age
Right: sort the items of an inappropriate age “below the fold”
Alternatively: hide the inappropriate items, but provide good visible buttons “<<< Articles for younger babies” and “Articles for older babies >>>”

Suggest Instead of Fill

We have a marketplace for used products and we want to generate the headline for new listings automatically, to speedup and simplify the listing publishing process
Wrong: make the headline not editable and generate it automatically
Right: Fill the generated headline as predefined (default) value of the text box, remove it as soon as user starts typing something else.

Note that to be able to implement this idea, the overall publishing workflow has to be changed: instead of starting with the entering a headline, the user can start by uploading article images or filling some structured data, which is needed to generate the headline for him.

Upsell

We want to help users to estimate the value of their real estate.
Wrong: ask user to fill a form, then output the result of the model.
Right: output a wide range of valuations as soon as user has typed the location, and let the user either to enter more data to reduce the valuation range, or, if measuring or getting the data is too complicated for them, suggest them to order a human evaluation service.

Feedback to Calm Down

We want to show some ads that are as relevant to the users as possible.
Wrong: just show the ads produced by a recommender model.
Right: additionally, show a button allowing the users to give us the feedback (“Less of this topic”) or to turn off the ads (see the Upsell idea).

2. Test in Production

A traditional QA process includes comparing the documented intended logic of the product with the actual product behavior. But because we don’t know the exact logic of the A.I. models, we also cannot document it so the testers also cannot check it. Besides, even if we could document the logic, this document would be so complicated that it would be infeasible to test all the cases.

This leaves us with the following options:

Predefined Mockups

In your services, define some “magic” identifiers that would prevent going through the A.I. pipeline and return predefined results back. For example, we can add a logic into our kitten recognition service that would always return “kitten” for images of 1×2 pixels, and “no kitten” for images of 2×1 pixels, given that it is unprobably that real users would use this image sizes in production. A tester would be able at least to test the non-A.I. parts of the product (uploading, publishing, searching etc)

Exploratory Testing

The traditional exploratory testing is still possible with data-based products, but is always more expensive. For example, the testers would need to prepare testing sets with images of kittens, dogs, horses, lions, dolphins, NSFW-images, etc.

The exploratory testing is especially laborous for products that act on the previous user behavior, such as recommenders, especially if recommenders use some out-of-band data like current date and time, the weather outside, or any interesting shows running currently on TV.

A/B Testing

Therefore, the data-based products rely on A/B testing much more often than any traditional product, because that would compensate for the lack of the traditional QA, which is often infeasible due to time or budget constraints.

3. Document for Reproducibility

Software documentation is often part of prescribed software development processes. In traditional products, the documentation of their business logic is often considered to be the main and most important part.

As a product manager of a data-based product, you can often be confronted with the question, “Where I can find the documentation how the product decides to do this and that in a such and such situation”. And then you will need to explain that there is no documentation, because nobody knows the logic and nobody is able to know it either.

Here, we have to make one step back and remind ourselves, that the main goal of documentation is to enable the maintenance and further development of the software. For the data-based products, the key factors for this is being able to reproduce the model training, and being able to re-train the model with some new, or modified data. There are some tools on the market to manage the datasets (the most popular being Pachyderm), while we have also created our very own framework for this: https://github.com/Immowelt/iwlearn

4. Create Product Ideas with Data Mining, or Be Flexible

The usual process of product idea discovery includes doing interviews with the users and other UX research, filling out business canvas and performing SWAT analysis, scheduling brainstormings, and a lot of other important and effective tools.

What it does not include, is, to check whether the data in your data lake really has the quality level required to implement the Machine Learning model you need.

To give you some examples of what can diminish your data quality:

something that won’t be collected from user and cannot be inferred from other data. For example, if we don’t ask how many rooms there is in the apartment, it is very hard or impossible to realiably infer it from text or images,
something that is not validated during collection, for example users can enter literally anything as their zip code,
something that has been collected, but disappeared in further data processing steps. For example, the user id could be actually collected during his item search process, but then be removed in the later steps due to the data minimization principle of EU GDPR. In this case we cannot create recommendation models based on what else items this user was interested in the past. This is to demonstrate that low data quality not always means some computer bugs that can be quickly fixed, but could also be a well intended state.

Unexpected low data quality can hit you badly. In the worst case scenario you would successfully pitch an idea, get it on your company roadmap, start implementing it and only after investing 80% of the data science efforts you would recognize that the resulting ML Model cannot be used as expected, for example because it has a very low accuracy, so too many users would be annoyed by it.

Be Flexible

This is why you need to have a plan B for this kind of situations. One typical solution would be using some traditional business logic instead of the A.I., and accept the possible less than expected uplift of the products KPIs. Another solution would be to understand, what exactly part of the available data has good quality, and to quickly conceive, pitch, prioritize and implement a completely different product, possibly with some other target group, product focus or KPIs, but at least technically feasible as it uses a good quality data.

Mine Data

Another, a fundamentally different way to approach this problem is to generate product ideas with Data Mining.

Your data scientists would start with looking at data about some topic (for example, the behavior of users who like kittens) and start to mangle it in different ways, for example creating clusters of users (white kitties versus black kitties lovers), visualizing user retention, cohort analysis and so on. As a by-product, the data scientists will identify and eliminate the bad data: all that bots and crawlers, and users posting NSFW pictures, and can generate some insights that hopefully would be useful to generate an idea.

What if they can discover that a sizable part of the users “like” good kitten images within minutes after they appear online? You can make a new product feature out of it!

For example, for any new kitten image you would find out the users that most probably would like it, and send them a push notification, so that they can enjoy and like the image immediately, thereby increasing their retention on the web site.

The advantage of this process is that you’re guaranteed that you have data at least in some reasonable quality, before you start pitching and prioritizing your product idea.

The disadvantage is that you cannot guarantee that data mining would produce any insights related to the current company goals and focus user groups, so that you need to be flexible here too and be able to make good products, even though they not always contribute to the current company focus.

Law of the Architecture Decomposition by User Interface

Abstract

One of the key aspects of the software architectures is the choice of the decomposition principle – how the software is divided into parts and via which interfaces these parts communicate to each other. There are several factors influencing the decomposition of software, some of them are technical or logical by nature, others are not related to computer science (such as politics, time and budget constraints, etc). In this post, I’m discussing yet another factor influencing the software architectures.

Prior Work

M. Conway stated in 1967:

Organizations are to produce software architectures which are copies of the communication structures of these organizations.

Before reading this law, I didn’t know that software architecture can be influenced by factors not related to its purpose (stated in technical and business terms). Even after reading it, I was sure that it was just some cynical rant and in reality, most of architectures are not influenced by it. I was wrong and you were right, Mr. Conway.

Still, I think I have noticed yet another non-CS factor influencing architectures – unrelated to politics and organizational charts.

The Law

Software architectures are decomposed using the same interfaces between the modules, that the software developers are using to communicate with their computers.

Examples

The punch cards generation

Software developers who have used punch cards as the primary user interface to their computers, tend to think in terms of memory maps. You can in fact visualize the memory map of computer code by putting several stacks of punch cards on top of each other in the right order. So, the typical interface between the modules either uses some well known RAM address location to pass parameters and to receive the results, or uses concepts like stacks or decks or ring buffer, which can also easily be modeled using the punch cards.

Typical programming language: Assembly.

Typical architecture decomposition tool: a big piece of paper hanging on the wall representing the memory map, with pencil marks on it defining the usage of each memory cell.

The command line generation

Software developers who worked with terminals in the command line see their computers as something accepting a command with some parameters. So their typical programming language is C – a C function is more or less a command with some parameters. In the UNIX world it is indeed often the case that an API C function and the command line executable bear the same name, so having learned how to use it in the shell you could easily use it exactly the same way in your C program.

Typical programming language: C

Typical architecture decomposition tools: structured programming, functions, modules with several related functions, ony some of them are publicly available to other modules and build the formal “interface” of the module that can be statically checked by the linker.

The GUI generation

Software developers who think about computers as something having a GUI, tend to think in terms of objects and containers of objects. The desktop is a container of windows, and each window is a container for other UI elements, and an UI element is a container of methods that are to be executed on mouse click.

Typical programming language: Smalltalk, C++, Java, C#, a lot of others

Typical architecture decomposition tools: OOP, polymorphic classes with virtual methods and inheritance. The modules can now contain several classes, and only some of them are publicly available to other modules. There is a granular control of what parts of software are visible to whom and what can be overridden. One or several modules can be put together in a package, and be installed and managed by package managers. They are versioned, have explicit dependency tracking, and semi-automatic documentation.

The web generation

Software developers who spend most of their time in the web browser or in the mobile apps like chats and social networks, tend to think in terms of hosts and resources available on that hosts.

Typical programming language: NodeJs, PHP, but actually they don’t matter that much any more

Typical architecture decomposition tools: RESTful web services (like web pages) and Message Busses (like Chats). Just like the computer device doesn’t matter for the web generation (as they interact not with their device, but with some web site somewhere in the cloud), their software can both run in a 1 inch by 1 inch small device on their wrist, or have its parts running in 10 different countries all around the globe – without changing anything in the architecture.

The generation Future

Observing the exponential growth of artificial intelligence and virtual reality, we could assume that the next generation of software developers will neither interact with a computer, nor interact with some internet resource. Instead, their own cognitive abilities will be augmented by silicon-based intelligence and some combination of local and global information networks. We can imagine that this augmentation could be seamless, so that they don’t even recognize, whether their order of a salad instead of a burger was their own decision, or an advice of their health monitor, worried by the blood sugar levels. The will not interact with any hardware, software or resource – these things will be just parts of the peoples personalities.

Typical programming language: actually, not a programming language, but a machine learning framework. Probably, something implementing the Neural Turing Machine.

Typical architecture decomposition tools: something from the areas related to AI psychology, AI parenting, AI training, education, self-control, self-esteem etc.

Zombieland

Ich habe schon mal geh�rt, wie einige deutsche Betriebe als Zombieland bezeichnet werden. Fr�her habe ich immer gedacht, die Bezeichnung w�rde nur darauf anspielen, dass zu viele Vorruhest�ndler besch�ftigt sind und dass alles so langsam und tollpatschig abl�uft.

Je mehr Unternehmen ich aber detailliert kennenlerne, desto mehr weitere �hnlichkeiten ich finde.

Einheitlichkeitszwang

Zombies werden ja von einem Nekromanten gesteuert und bewegen und verhalten sich also einheitlich. So auch in einem Zombieland-Betrieb: es wird auf Teufel komm raus daf�r gek�mpft, dass alles einheitlich ist. Und das trotz des gesunden Menschenverstandes, denn es ist ja offensichtlich, dass je flexibler man auf pers�nliche Befindlichkeiten und Anforderungen jedes einzelnen Mitarbeiters eingeht, desto produktiver und motivierter sie sind. Als Scheinbegr�ndung wird entweder vorgegeben, dass es zu viel Aufwand sei, jeden pers�nlich zu managen. Das z�hlt nicht, weil gro�e Teams immer in kleinere aufgeteilt, und die Zust�ndigkeiten delegiert werden k�nnen. Oder es wird behauptet, dass die Einheitlichkeit von Prozessen, Abl�ufen und Zielen das einzig faire w�re. Das widerspricht dann aber nicht nur dem normalen Menschenverstand, sondern allen Best Practices, die in der Rechtswissenschaft existieren. Das Gesetz soll f�r alle gleich sein, nicht die Rechte und Pflichten! Die Gesetzgebung bem�ht sich n�mlich sehr wohl darum, dass Menschen in unterschiedlichen Ausgangssituationen auch unterschiedliche Anspr�che und Verpflichtungen haben.

Das aggressive Zombiewerden

Zombies greifen gesunde Menschen an, damit daraus weitere Zombies entstehen. Das ist im Interesse des Nekromanten, denn er bekommt eine Situation, in der seine Macht automatisch gr��er wird. F�r gew�hnlich geht die Zombie-Gefahr von einer Abteilung oder einem Standort aus. Gesunde Teile des Unternehmens, die mit Zombies zusammenarbeiten m�ssen oder wollen, werden nach und nach zombiesiert. Das passiert zum Beispiel auf Terminen mit 20 Beteiligten, die mehrere Tage dauern und wo eine Fragestellung diskutiert wird, die innerhalb einer Stunde von 3 Menschen beschlossen werden kann. In Wirklichkeit geht es auf solchen Terminen darum, dass Zombies die Gehirne von ihren gesunden Kollegen auffressen.

�blicherweise gibt es auch keine M�glichkeit, einen Zombie zu heilen – zumindest so lange der Nekromant an der Macht ist. Gesunde Menschen wehren sich, entweder indem sie sich von den betroffenen Teilen der Firma abschirmen, oder indem sie die Firma verlassen.

Unfreiheit

Zombies treffen niemals ihre eigene Entscheidungen, sondern setzten die Entscheidungen von den Vorgesetzten um. Damit Zombies unter Kontrolle gehalten werden k�nnen, bekommen sie keinerlei wichtigen Informationen dar�ber, was im Betrieb passiert. Jegliche wichtigen �nderungen und Entscheidungen werden unter vorgehaltenen Hand, geheim, von einem kleinen Gremium von f�hrenden Nekromanten beschlossen.

Fehlende Produktivit�t

Niemand kann einen Zombie vorstellen, der mal etwas konstruktives oder produktives tut – ein Haus baut, eine Stra�e sauber macht oder einen wissenschaftlichen Experiment durchf�hrt. Die meiste Zeit verbringen Zombies damit, die gesunden Abteilungen anzugreifen, sich von ihrem Nekromanten indoktrinieren lassen, oder mit Vort�uschung einer produktiven T�tigkeit.

Wenn eine Webseite innerhalb einer Woche entwickelt und live genommen wird, sprechen wir �ber eine normale Produktivit�t. Wenn es 4 Wochen dauert, kann es daran liegen, dass das Team zu wenig kompetente Mitarbeiter hat oder st�ndig von anderen Aufgaben abgelenkt wird. Wenn man aber die gleiche Seite 12 Monate baut, dabei Tonnen von Dokumentation und Tests produziert, Dutzend Male quer durch Deutschland zu notwendigen Abstimmungen f�hrt, und dann trotzdem nicht live geht, weil noch die Zuarbeit von 8 weiteren Teams n�tig ist, und die Software-Architektur der gesamten Firma angepasst werden m�sste – dann wei�t man, was man hat.

Ausweg

Wir in IT Bereich haben vergleichsweise viel Gl�ck damit, dass nur wenige Menschen sich damit abfinden k�nnen, Zombie zu werden. Es ist in unserem Job normal zu erwarten, dass man jeden Tag etwas Produktives tun kann – zum Beispiel eine Software erstellen, die der Firma ein bisschen hilft und die auch die Welt etwas besser macht.

Und obwohl ich meinen aktuellen Arbeitgeber nicht sch�nreden m�chte und wir auch hier und da Zombie-Befall haben, muss ich ehrlich sagen, dass ich letzte Woche viel Spa� an sehr kreativen Arbeit in einem gesunden Team hatte, und es nicht abwarten kann, morgen wieder zur Arbeit zu kommen, um mein kleines Projekt abzuschlie�en und mit den Kollegen zu besprechen.

Shameless Plug: we’re hiring

Ruhe vor dem digitalen Storm

Deutschland steuert auf seinen Untergang zu, weil hier die Digitalisierung scheinbar nur aus diesen zwei Teilen besteht:

fl�chendeckendes schnelles Internet,
mehr PC in den Schulen.

Und das Problem besteht nicht einmal darin, dass es zunehmend weniger Bedarf an Internet auf dem Land gibt, weil mehr und mehr Menschen in die Stadt ziehen. Und nicht darin, dass es nicht mehr zeitgem�� ist, PC als �ffentliche Kommunikationsmitteln anzusehen: sie sind mittlerweile so pers�nlich wie Zahnb�rste, so dass man die Nutzung von eigenen PCs und Handys in der Schule zulassen sollte, statt das Geld von Steuerzahler f�r schuleigene PCs auszugeben.

Das eigentliche Problem ist, dass die Digitalisierung eine Revolution der Gesellschaft ist – viel gr��er, als die Erfindung von Buchdruck war – und dass die Politiker in Deutschland das nicht begreifen.

Die Erfindung von Buchdruck spaltete die Kirche (was so “nebenbei” den 30-j�hrigen Krieg mit 6 Mio. Toten verursachte) und war eine Voraussetzung f�r die nachfolgende B�rger-Revolutionen und Untergang von Monarchien, weil Menschen durch die Alphabetisierung und Entstehen vom Massen-Bildungswesen nicht mehr so leicht regierbar geworden sind. Die Wissenschaft entwickelte sich, was die enorme Weiterentwicklung von Medizin und Technik zur Folge hatte. Viele etablierten Gesch�ftsmodelle und Berufe gingen unter, weil im Prinzip jeder alles lernen konnte, sobald es ein Lehrbuch dazu gab.

B�cher gab es auch vor dem Buchdruck. Der Buchdruck hat nur die Verbreitung von B�chern billiger gemacht. Die Digitalisierung ist gewaltiger. Nicht nur ist die Verbreitung von Informationen praktisch kostenlos geworden. Sondern auch das Erstellen von der Information hat sich verbilligt. Viele Informationen lassen sich automatisch durch Algorithmen, teilweise auch mit K.I. oder neuerdings mit IoT erstellen. Wir steuern auf die Welt zu, wo jeder Zugriff auf alles haben kann, was �berhaupt m�glich zu erfahren ist.

Und das wird alles noch einmal umkrempeln (oder, wie man in Silicon Valley sagt, disrupten).

Nein, die Digitalisierung von Massenmedien ist nicht, die Web-Seiten mit elektronischen Kopien von Zeitungen zu erstellen. Eine Zeitung als solche ist nur deswegen entstanden, weil es billiger war, mehrere Artikel auf einem gro�en Blatt Papier zu drucken und zu liefern, als die Artikel einzeln zu verbreiten. Das ist nicht mehr so. Somit abonniert man keine Zeitungen mehr, sondern einzelne Autoren. Und ja, man tut es auf Facebook oder Youtube, und nicht auf faz.de.

Nein, die Digitalisierung vom Finanzamt hei�t nicht, dass man die gleichen Formulare auch online abschicken kann. Sondern, dass man die Werbungskosten und die au�ergew�hnliche Belastungen direkt in seinem Online-Banking an das Finanzamt weiterleitet. Und dass man Sekunden sp�ter eine Steuererstattung hierzu �berwiesen bekommt.

Nein, die Digitalisierung von Bildung besteht nicht darin, Schulen zu digitalisieren, sondern sie abzuschaffen! Nichts ist schlimmer f�r die Kinder, als eine Massenausbildung. Millionen von Menschen regen sich wegen Massentierhaltung auf, schicken aber ihre Kinder artig in die Schulen, wo ihr Gehirne mental vergewaltigt werden. Das aktuelle Schulsystem ist schlimm f�r die schnelldenkenden Sch�ler, die sich zu Tode langweilen m�ssen. Es ist aber viel schlimmer f�r die langsamdenkenden Kinder und f�r Legastheniker, die st�ndig frustriert sind und irgendwann anfangen, sich zu sch�men und sich als vermeintlich “minderwertig” abzuschreiben. Stattdessen wird jedes schulpflichtige Kind an MOOC Kursen teilnehmen, in dem Tempo und Umfang, die f�r es am optimalsten ist, und die digitale Nachweise dessen an die Jugend�mter schicken. Das Erlernen von praktischen Fertigkeiten und die Beherrschung von dem eigenen K�rper (Sport, Tanzen, Malen, Hobeln, Dechseln, Mauern, Radfahren, Reitern usw) geh�rt meiner Meinung nach nicht zur Allgemeinbildung und kann auf Kursen oder ggf. auf Praktika in den Firmen erlernt werden.

Nein, die Digitalisierung von privaten Unternehmen besteht nicht nur aus papierlosem B�ro und Verteilung von Informationen �ber Slack statt mit einem Aushang. Stellt euch vor, die Arbeitnehmer w�rden die M�glichkeit erhalten, vor der Einstellung sowohl in die Firmenb�cher zu schauen, als auch zu lernen, wie ihre zuk�nftige Chefs ihre Entscheidungen treffen und sich generell verhalten. Mit nur ein wenig mehr Transparenz w�rden wir all die Office-Politiker dort schicken, wo sie hin geh�ren – in die Geschichtsb�cher.

Nein, die Digitalisierung von Demokratie ist nicht, dass jede Partei eine Web-Seite mit ihrem Programm erstellt. Die Parteien sind nur deswegen entstanden, weil die Wahl-Kampagnen f�r Einzelpersonen zu teuer sind. Sie sind zu teuer, weil a) es f�r die W�hler so kompliziert und aufwendig ist, w�hlen zu gehen und b) weil die W�hler f�r mehrere Jahre eine Wahl treffen m�ssen. Das erste Problem ist mit Online-Wahlen bereits gel�st. Das zweite Problem k�nnte gel�st werden, wenn man nicht mehr jede 4 Jahre, sondern t�glich jeden Politiker (ab)w�hlen kann. Das klingt kompliziert f�r den W�hler, ist es aber nicht, weil er nicht mehr riesige mehrj�hrige Programme lesen muss, sondern nur noch das Geschehen der letzten Woche bewerten muss. Die Wahlen sind deswegen so selten, weil es fr�her zu teuer war, die Wahlzettel zu drucken, sie zu verteilen, dann sie wieder einzusammeln, zu z�hlen, die Wahlprogramme zu drucken, und verteilen usw. Das alles ist heute kostenlos m�glich.

Und weil unsere Politik und unsere Gesellschaft gar nicht begreifen k�nnen (begreifen wollen???), wie gewaltig die Digitalisierung ist, die �berall in der Welt passiert und von manchen Global Players auch entsprechend umfassend umgesetzt wird (z.B. in China), denke ich, dass wir uns auf die schlimme Zeiten und mehrj�hrige hybride Kriege einstellen m�ssen. Ja, irgendwann geht die heutige Generation von Politikern und kommt die n�chste, die ihre Pubert�t online ausgelebt hat. Aber dann wird es unaufholbar zu sp�t.

Und es hilft nichts, Milliarden von Steuergelder f�r F�rderung von K.I. auszugeben. Denn das Hauptproblem ist bei uns nicht, dass wir zu wenig K.I. haben, sondern dass niemand unter den Entscheidern bereit ist, es auch wirklich disriptiv einzusetzen.

MVP, oder?

Ich mag es nicht, wenn Begriffe falsch verwendet werden. Menschen, die sorglos mit W�rter ungehen, sind auch sorglos in ihrem Denkprozess, ziehen deswegen h�ufiger falsche R�ckschl�sse und bringen sich und ihre Umgebung (also auch mich) unn�tig in Gefahr.

Heute m�chte ich �ber das MVP sprechen (Minimum Valiable/Viable Product).

Zun�chst einmal ist es ein Product – also etwas, f�r was Kunden Geld oder Geld-Ersatz (z.B. Informationen, Leads, Traffic usw) bezahlen k�nnen. Also nicht etwas, wo die Kunden nur sagen k�nnen, sie h�tten daf�r sogar Geld bezahlt. Und nicht etwas, was die Kunden lieben, oder was sie gut finden oder den Freunden empfehlen. Nein. Ein MVP muss von den Kunden tats�chlich gekauft werden k�nnen – und wird dann auch tats�chlich geliefert und seiner Bestimmung nach verwendet. Wir merken uns: Fake Doors sind kein MVP, weil sie nicht gekauft und nicht geliefert werden. A/B Tests sind meistens auch kein MVP, weil sie kein Produkt, sondern nur eine Seite oder ein Feature des Produkts testen. Nur wenn man einen A/B Test so durchf�hrt, dass man selbst ein eigenes Bier braut und es neben einem Markenbier in einen Ladenregal stellt und verkauft, und danach die Conversion berechnet, kann man �ber einen MVP sprechen.

Eine weitere Konsequenz, die wir uns merken sollten: wenn eine Organization zu viele Produkt-Managers besch�ftigt, was meiner Erfahrung nach des �fteren der Fall ist, verantwortet nur ein Bruchteil von denen ein Produkt. Solche Sachen wie Startseite, Login, Payment, Detail-Seite sind alle keine Produkte, weil sie von den Kunden nicht gekauft werden. Somit werden die Produkt-Managers zu den Feature-Managers degradiert und k�nnen eigentlich nur in seltensten F�llen MVPs entwickeln.

Dann gibt es auch dieses “Minimum Valuable”. Das Produkt Auto hat einen Motor, R�der und einen CD-Player. Ohne CD-Player wird dieses Produkt zu einem MVP. Ohne R�der oder Motor, wird das Auto wom�glich auch verkauft (das ist dann die Frage des Preises), ist aber kein MVP, weil das Produkt hiermit nicht die Masse der Kunden trifft. Anders gesagt, wenn die Kundenbed�rfnisse als eine mehrdimensionale Gau�-Kurve vorstellbar sind, beinhaltet das MVP die Features innerhalb einer Standardabweichung von dem Mittelwert – also nicht so viele Features, dass die Mehrheit der Kunden zufrieden sind (das w�re dann das klassische Produkt), sondern nur die Features, mit denen man mit dem geringsten Aufwand eine erhebliche Kundengruppe erreichen kann. Wir merken uns: wenn wir weniger Sprints haben, als n�tig w�re, um etwas Vern�nftiges sinnvoll zu entwickeln, wird das Ergebnis nicht automatisch zu einem MVP. Sondern wom�glich zu einem Auto ohne R�der – etwas, was 0,3% der Menschen kaufen w�rden und ist somit kein valides MVP.

Und zu guter Letzt, geh�rt zu jedem Begriff auch die bestimmungsgema�e Verwendung. Wann setzt man ein MVPs ein? Man hat eine Produktidee und m�chte testen, ob sie gut ankommt, ohne dabei Geld und Zeit f�r Marketing-Research und Umfragen, oder vollwertige klassische Produkte ausgeben zu wollen – in der Situation, wo Fake Doors keine aussagekr�ftige Testergebnisse liefern. Wir merken uns: wenn ein Fake Door ausreichend ist, soll man lieber es machen, weil MVP teurer ist, denn es muss gebaut und geliefert werden. Fake Doors k�nnen aber z.B. dann nicht eingesetzt werden, wenn Kunden zuerst eine kostenlose Version nutzen m�ssten, um �berhaupt f�r sich das Bed�rfnis zu entwickeln, so ein Produkt zu verwenden – bevor sie zu einer kostenpflichtigen Version gef�hrt werden.

Es ist wichtig zu verstehen: MVP ist deswegen ein Produkt, weil man eben eine Produkt-Idee damit testet. Soll eine Feature-Idee getestet werden, gibt es daf�r andere M�glichkeiten: der klassische A/B Test oder eben die Fake Doors. Und anders herum: muss die Produkt-Idee nicht getestet werden (weil sie z.B. von der Konkurrenz bereits benutzt wird), braucht man auch kein MVP – nicht jede Produktentwicklung soll unbedingt eine MVP-Phase beinhalten.

Warum ist es mir wichtig, zwischen A/B Test und MVP zu unterscheiden? Wenn ein A/B Test erfolgreich ist, wird ein bestehendes Produkt um 3% bis 10% verbessert. Wenn ein MVP erfolgreich ist, bekommt man einen weiteren Standbein – ein weiteres Core-Business, was im Idealfall sogar unabh�ngig von den bestehenden Produkten funktioniert und so die Marktrisiken diversifiziert. Auch wenn 100% von einem MVP in Euro ausgedruckt weniger sein k�nnen, als die 3% von dem Core-Business, sprechen wir hier um ganz unterschiedliche Entwicklungen. Im ersten Fall will man mit viel Aufwand von 99,99% auf 99,999% aufsteigen. Im zweiten Fall geht man in die Breite und macht quick wins.

Die traurige Realit�t ist in Deutschland so, dass MVP sich fast zu einem Kargo-Kult avanciert hat. Man sieht sich nur so gerne als der zweite Steve Jobs, und m�chte mit den gleichen Toys spielen, wie die Big Boys in Silicon Valley. Der beste Weg zur Besserung w�re, zun�chst einmal mit sich selbst ehrlich zu sein und aufzuh�ren, seine eigene Deliverables als MVPs zu bezeichnen.