Category Archives: decision theory

Microsoft Dynamics 365 Fraud Protection public preview is now available – Microsoft Dynamics 365

We are pleased to announce the Dynamics 365 Fraud Protection public preview is now available! With Dynamics 365 Fraud Protection we’re providing a cloud-based service for online merchants to help increase revenue, lower fraud-related costs, and improve customer experience. In this release, Fraud Protection preview supports payment fraud detection and account creation protection.

Source: Microsoft Dynamics 365 Fraud Protection public preview is now available – Microsoft Dynamics 365

Good work team – very proud of our achievement!

 

2019 Edelman Finalist: Microsoft | ORMS Today News

Source: 2019 Edelman Finalist: Microsoft | ORMS Today News

 

Microsoft’s Fraud Management System is one of the finalists competing in the 2019 INFORMS Franz Edelman Award, which recognizes ways that operations research and analytics are improving how people live and work around the globe.

As part of the Resoundingly Human series, we recently did a podcast with Ashley Kilgore from INFORMS, where we talk about the science and technology of our work.

https://pubsonline.informs.org/do/10.1287/orms.2019.02.31p/full/

#INFORMS #Microsoft #FraudManagement

“In this episode, we are joined by Jay Nanduri, Distinguished Engineer, and Anand Oka, Principal Group Program Manager with Microsoft to learn how Microsoft leveraged O.R. to create a fraud detection system that identifies and reduces online fraudulent activity, while protecting legitimate consumer purchases and saving tens of millions of dollars.”

Flawed analysis, failed oversight: How Boeing and FAA certified the suspect 737 MAX flight control system | The Seattle Times

The discrepancy over this number is magnified by another element in the System Safety Analysis: The limit of the system’s authority to move the tail applies each time MCAS is triggered. And it can be triggered multiple times, as it was on the Lion Air flight. One current FAA safety engineer said that every time the pilots on the Lion Air flight reset the switches on their control columns to pull the nose back up, MCAS would have kicked in again and “allowed new increments of 2.5 degrees.” “So once they pushed a couple of times, they were at full stop,” meaning at the full extent of the tail swivel, he said.

Source: Flawed analysis, failed oversight: How Boeing and FAA certified the suspect 737 MAX flight control system | The Seattle Times

 

Highly recommend reading this – a case study of how errors and mistakes that may individually seem small can cascade into a catastrophe: a single point of failure (little/no redundancy) with a faulty sensor, a shockingly poorly designed automatic control algorithm that accumulates errors and overrides human intervention, lack of visibility and training for pilots, criminally poor documentation, a desire to get to market fast by avoiding drawing attention to the novelties of the system that would have needed extra certification … This reminds me of the Challenger disaster, which was caused by a similar litany of missteps.

We live in a complex world of machines and artificial intelligence. We trust our lives on a daily basis to these systems – in planes, in cars, even in our homes. Complex systems have complex paths to failure, and they have the very real problem that small errors can add up to disasters without being noticed individually.

Knee-jerk reactions like blaming all artificially intelligent control or automation are not the right answer. We cannot become Luddites. But we do need to become much more cognizant of what it takes to build safe and reliable complex systems. This is not like shipping a social networking or photo sharing app. These are mission critical systems and people’s lives depend on them! Our education system, regulatory system, and work ethics all need to be held to a much higher standard, or unfortunately we are going to see many more of these types of disasters.

 

Looking Beyond the Internet of Things – The New York Times

And products that respond to their owner’s tastes — something already seen in smartphone upgrades, connected cars from BMW or Tesla, or entertainment devices like the Amazon Echo — could change product design.

Source: Looking Beyond the Internet of Things – The New York Times

Nice to see that the thoughts I have been expressing in my series of posts on modern data informed product development resonate with those of other leading luminaries … 🙂

Cars’ Voice-Activated Systems Distract Drivers, Study Finds – The New York Times

The research shows that the technology can be a powerful distraction, and a lingering one.

Source: Cars’ Voice-Activated Systems Distract Drivers, Study Finds – The New York Times

This article talks about an interesting study that draws some very intriguing conclusions. For example:

A surprising finding was that the off-task performance in the DRT task differed significantly from single-task performance. Given that drivers were not engaged in any secondary-task activities during the off-task portions of the drive, it suggests that there were residual costs that persisted after the IVIS interaction had terminated.

Some things I like about the study: they use actual telemetry gathered from video cameras installed in the car (and analyzed later) to measure reaction times, they combine that data with qualitative feedback from the drivers, and they report statistical significance (error bars).

Cognitive workload was determined by a number of performance measures. These were derived from the DRT task, subjective reports, and analysis of video recorded during the experiment. 

All that is good.

What bothers me, though, is the small sample size, which can potentially lead to bias (a statistical significance test cannot cure the bias problem). They also do not run an A/B experiment, i.e. people using the voice tech vs. people not using it; rather, they just give an absolute measurement. This is probably the single biggest counter-argument: we cannot really establish “causation” without an A/B test.

 

Here are some excerpts from the paper mentioned in the article that could raise flags:

Two hundred fifty-seven subjects participated in a weeklong evaluation of the IVIS interaction in one of 10 different model-year 2015 automobiles.

Sample size too small?

Following approval from the Institutional Review Board, participants were recruited by word of mouth and flyers posted on the University of Utah campus. They were compensated $250 upon completion of the weeklong study.

Maybe the people who respond to such recruiting are predominantly bad (or good) drivers; that would immediately pollute the study.

 

Next, participants completed one circuit around the 2.7-mile driving loop, located in the Avenues section of Salt Lake City, UT in order to become familiar with the route itself. The route provided a suburban/residential driving environment and contained seven all-way controlled stop signs, one two-way stop sign, and two stoplights.

This environment may be too artificial and may not really represent real-world conditions. What about weather, street events, pedestrians, accidents, stalled cars on the road…?

All my objections could be completely addressed by using in-app telemetry from real users out in the wild, who number in the hundreds of millions. By properly choosing control groups (controlling for all factors like car type, age, etc., while leaving only voice tech as the differentiator), we could then easily establish or disprove causality. In-app telemetry could also include many more signals, like braking, acceleration and erratic swerving, rather than just reaction time. Lastly, the data would be collected in situ, and hence would reflect the real-world variety of conditions like rain, traffic, pedestrians and so on. You get my drift. The conclusions of such studies would then be beyond reproach and could lead to appropriate laws and regulations rather than pointless bickering by uninformed guests on Fox or CNN!
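To make the A/B argument concrete, here is a minimal sketch of the kind of comparison I have in mind, run on synthetic stand-in data. The column names (`uses_voice_tech`, `reaction_time_ms`) and the numbers are hypothetical, not from any real telemetry schema:

```python
# Minimal sketch of an A/B comparison on *hypothetical* in-app telemetry.
# The synthetic data below is only a stand-in for real driving logs.
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(0)
n = 10_000  # drivers per group

# Stand-in telemetry for two matched driver populations:
# a control group (no voice tech) and a treatment group (voice tech enabled).
control = pd.DataFrame({
    "uses_voice_tech": False,
    "reaction_time_ms": rng.normal(900, 150, n),
})
treatment = pd.DataFrame({
    "uses_voice_tech": True,
    "reaction_time_ms": rng.normal(950, 150, n),
})
telemetry = pd.concat([control, treatment], ignore_index=True)

# Welch's t-test: does voice-tech use shift the mean reaction time?
a = telemetry.loc[~telemetry.uses_voice_tech, "reaction_time_ms"]
b = telemetry.loc[telemetry.uses_voice_tech, "reaction_time_ms"]
t_stat, p_value = stats.ttest_ind(b, a, equal_var=False)

print(f"mean control   = {a.mean():.1f} ms")
print(f"mean treatment = {b.mean():.1f} ms")
print(f"Welch t = {t_stat:.2f}, p = {p_value:.2g}")
```

In a real deployment the assignment to the two groups would be randomized, or at least carefully matched on car type, age, geography and so on; that is what turns a correlational comparison into a causal one.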

 

Decision making in agile product development

In my series of posts on data informed agile product development, I have thus far talked about the earlier stops on this journey.

Today I would like to talk about decision making based on metrics. The decisions that a business needs to make usually fall into the following categories:

  • What new products and  features to build?
  • When and how to ship those features?
  • How to react to events happening in products in the wild?

While these questions  may seem very varied, there is actually a consistent underlying theme to how you can decide on them using data and telemetry. The main point I want to make today is that when making any decision, there is always a tension between agility and stability. Data informed decisions allow a better resolution  of this tension.

Agility is the ability to adapt to and capitalize on changing conditions in the business environment. Reacting to a competitor and coming out with a similar feature, detecting bugs or vulnerabilities in the product and releasing patches or making recalls, or reacting to social media and changing the terms of use of a product are all facets of agility.

Stability is the ability to maintain the quality of business in terms of reliability, reputation and trustworthiness. Shipping new features in software without breaking old ones, adding new hardware that remains compatible with older systems and interfaces, and verifying stray or long-term impacts of new features or products before making them available broadly (something that is very important in the pharma industry) are all facets of stability.

It may seem obvious to you but it is still worthwhile stating that you cannot have perfect agility and perfect stability at the same time. The trade-off between agility and stability is quantitatively described in terms of an operating curve on the axes of mean time to failure (that measures stability) and mean time to recovery (which measures agility).

For example in the car industry, the mean time to making a major recall is a measure of stability, while the amount of time required to execute the recall on the affected population of cars is a measure of agility. Why are they in tension? Because performing a recall in a hurry or in a slapdash manner without doing  a proper fix of an underlying systemic problem will inevitably mean that a new recall will be needed pretty soon, as the same problem manifests in another manner.
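To make the two axes concrete, here is a minimal sketch of how MTTF and MTTR might be estimated from a simple incident log. The log format (detection and recovery timestamps) is purely illustrative:

```python
# A minimal sketch of estimating the two axes of the operating curve from a
# hypothetical incident log; the timestamps below are made up.
from datetime import datetime

# Each tuple: (time the failure was detected, time recovery completed)
incidents = [
    (datetime(2019, 1, 3, 8, 0), datetime(2019, 1, 3, 20, 0)),
    (datetime(2019, 2, 10, 9, 30), datetime(2019, 2, 11, 1, 30)),
    (datetime(2019, 3, 22, 14, 0), datetime(2019, 3, 22, 22, 0)),
]
observation_start = datetime(2019, 1, 1)

# MTTR: average time from detection to recovery (measures agility).
mttr_hours = sum(
    (end - start).total_seconds() / 3600 for start, end in incidents
) / len(incidents)

# MTTF: average uptime between the end of one incident and the start of the
# next (measures stability), counting the interval before the first incident.
starts = [observation_start] + [end for _, end in incidents[:-1]]
mttf_hours = sum(
    (fail - up).total_seconds() / 3600 for up, (fail, _) in zip(starts, incidents)
) / len(incidents)

print(f"MTTR ≈ {mttr_hours:.1f} h (agility), MTTF ≈ {mttf_hours:.1f} h (stability)")
```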

[Figure: agility vs. stability operating curves]

What operating point you choose on this operating curve is entirely up to you, and marks you as an aggressive or conservative business.  So this is obviously a control knob you have. Surprisingly, this is not actually the most important control knob!

There is another, more important, control knob you have, namely that you can choose the operating curve itself! 

When you make decisions without using any data, you are operating on the curve marked “pure guessing” in the figure shown above, which means that you are essentially making random decisions. The operating curve of a business that is not data driven is very poor indeed. On this curve you may simulate a lot of agility (take actions quickly, hence have a small MTTR), but an alarming number of your actions result in failure. Or, you may choose to sit on your hands and take few bold actions, in which case your failure rate is small but you can never react well to changing circumstances. And then there are all the mediocre combinations in between. So basically this operating curve is awful, and businesses that live here don’t tend to last very long!

However, as you start using data to inform your business decisions (and in particular your engineering decisions), you start moving up a ladder of operating curves. The more comprehensive and real-time your data gets, the better your operating curve becomes. This is shown by the thick arrow in the figure.

Finally, if you actually have an omniscient Oracle to help you, who knows everything about every event, and who knows it instantaneously, you can operate on the best possible operating curve, where mean time to failure is infinite (because you always anticipate every failure) and mean time to recovery is very small, limited only by the physics of designing, manufacturing and shipping your product. (The fact that even an Oracle-driven business cannot have perfect agility is noteworthy. If a service company such as an electricity company suffers outages and discovers them precisely and instantaneously, it is still going to take a non-zero number of hours to send out the crews and do the repairs. Thus the time to recovery cannot be zero.)

It is not surprising that practically every business operates far away from the ideal oracle curve, though many businesses do a decent job of getting within shouting distance. More frightening and surprising is the fact that a large proportion of businesses operate on curves that are only marginally better than random guessing. (Think of all the ridiculously expensive and catastrophic mergers and acquisitions that big corporations make!)

Real-time acquisition of in-product telemetry allows a corporation to work on a very decent operating curve, and this acts as a great competitive advantage. It can innovate quickly, but without trashing its existing products and systems. It can react to world events in a jiffy but still manage not to ship bummers frequently.

Once you see telemetry as giving a competitive advantage in terms of the stability-vs-agility operating curve, you suddenly realize that it has Darwinian consequences.

If you recall, many posts ago I said I had an epiphany of sorts about modern product development – well, this was that epiphany:

The successful modern company uses every possible source of data about its product – especially in-product telemetry. It tries to gather that data as quickly and comprehensively as it can – in real time if possible. And it has systems in place to process that data and produce actionable metrics with a matching rapidity. It is not as if the metrics are infallible – they can at times guide the business to making a wrong decision. But the probability of this happening is much smaller than if you just made ad-hoc random decisions. The modern data informed company therefore avoids making large earth-shattering decisions. Instead it takes small incremental actions, and always checks the consequences by observing the telemetry. And most importantly, it makes these incremental innovations very frequently, in effect integrating and leveraging this probability advantage.

So that is the recipe for success:

  • Use telemetry to ensure you work on a good operating curve of MTTF vs. MTTR.
  • Make frequent incremental innovations to integrate and realize the probability advantage of good decisions that is inherent in that curve! (The toy simulation below illustrates this compounding effect.)
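Here is a toy Monte Carlo sketch, with made-up numbers of my own, of what “integrating the probability advantage” buys you: many small decisions that are each only modestly more likely to be right than wrong produce a reliable positive outcome, while a few big bets at the same per-decision accuracy remain a gamble.

```python
# Toy Monte Carlo illustration (made-up numbers): a small per-decision edge
# compounds over many incremental decisions but not over a few big bets.
import numpy as np

rng = np.random.default_rng(42)
trials = 100_000

def expected_outcome(n_decisions, p_good, payoff_per_decision):
    """Simulate a year of decisions; each good decision pays +payoff,
    each bad one costs -payoff. Returns the mean and 5th-percentile outcome."""
    wins = rng.binomial(n_decisions, p_good, size=trials)
    outcome = (2 * wins - n_decisions) * payoff_per_decision
    return outcome.mean(), np.percentile(outcome, 5)

# Data-informed shop: 200 small decisions, each 60% likely to be right.
incremental = expected_outcome(n_decisions=200, p_good=0.60, payoff_per_decision=1)

# Gut-feel shop: 2 huge decisions with the same 60% per-decision accuracy.
big_bets = expected_outcome(n_decisions=2, p_good=0.60, payoff_per_decision=100)

print("incremental: mean = %.0f, 5th percentile = %.0f" % incremental)
print("big bets:    mean = %.0f, 5th percentile = %.0f" % big_bets)
```

The expected value is the same in both cases; what the incremental strategy crushes is the downside tail.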

 

Postulating the Hypothesis

As promised last time, I would like to take you on an illustrated tour of “data driven agile product development”. The scenic viewpoints we will be visiting on this tour are hypothesis formulation, instrumentation, data staging, experiment setup, sampling, statistical modelling, metric development, and finally hypothesis resolution and decision making. And of course, as I mentioned in my earlier post, this is an iterative process, so there is always going to be rinse and repeat.

Obviously, we cannot have an illustrated tour without an example. I was first tempted to use a comfortable example like web search, recommendation engines, social networking, productivity applications or any one of such online web services. But on second thought, that is not challenging enough. Firstly it may lead you to think that the arguments I am making are applicable to web based services only and not to brick and mortar services and products. That is definitely not the case. Secondly it does not allow me to showcase the enabling power of the Internet of Things technology, which I strongly feel is one of the trifecta of technologies that will drive the future of agile product development. Thirdly, by using a physical commonplace product I want to test whether my ideas, admittedly inspired by agile development of online services, stand the test of generalization to the offline world. It is easy to explain special cases with specialized arguments. True joy comes from discovering a generic principle that is applicable with few caveats, if any.

So I have decided to take a more accessible example of a product, namely the ubiquitous automobile.

There are plenty of interesting questions a car maker or the transportation authority could ask about an automobile, and the answers to those could drive how new models of cars are designed, how they are regulated, and how car-related infrastructure (roads, gas stations, traffic lights, etc.) is laid out. Out of those many questions, I have decided to take one particularly relevant one.

You may have heard the recent debate about whether there should be heads-up displays (HUDs) on the windshields of cars. The technology poses a hard question (or, in smarty pants language, “poses an ambiguous hypothesis”). On one hand one may argue that displaying information on the windshield may encourage the user to keep his/her eyes pointed straight ahead rather than surreptitiously glancing at the cellphone in his/her lap while driving. This may improve safety, and reduce the number of distracted driving accidents. On the other hand, many people, including many knowledgeable physiologists and experts in man-machine interfaces, contend that having something in your field of vision is no guarantee of it being noticed. There is a well known selective attention trick that our brain can play on us. If the stuff displayed on the screen is sufficiently engaging to the user, he may just mentally block out the stuff happening beyond the screen on the road itself, with disastrous consequences.

And of course there is a third type of argument, namely that any kind of distraction from driving is bad, be it digital displays (on the windshield, on a phone screen or in-dash), playing music, phone calls (with a hands-free device or not), or even eating or talking with a co-passenger (not to mention falling asleep at the wheel)!

Clearly the hypothesis is not an easy one, and discussions about it can easily degenerate into polemic. If the question were something simpler like “is it worth having brakes in a car?”, we would have near complete unanimity. And hence the decision to put brakes in a car was probably taken by some executive in the 19th century based on “gut feeling”. But how about questions like: is wearing a seat belt useful? Is a blood alcohol level of 0.07 safe for driving? Does the presence of traction control really reduce accidents? They get progressively more nuanced, and it starts getting difficult to instantly denounce either answer as a clear “flat earth” delusion. The HUD hypothesis is probably far out on this difficulty scale, and hence it definitely merits a data driven answer.

So we have climbed the first step of the data driven journey! We have formulated a hypothesis that deserves data driven decision making. This hypothesis can quickly and easily be exploded into many variants and iterated upon: If HUDs do indeed tend to reduce crash risk, how much of the screen should they cover? What kind of things should be displayed? What colors and fonts should they be rendered with? Is animation worse than static rendering? Should they be automatically turned off in certain situations like bad weather, heavy traffic or tiredness of the driver? On the other hand, if any kind of HUD tends to increase the risk of a crash, how does it fare as compared to cell phone use? Is the presence of both types of displays being used together more harmful than the presence of each one in isolation? Does the age of the driver matter to the outcome? Does the market (country) make a difference? Does left hand drive vs. right hand drive have any bearing? (You may laugh at this last question, but consider that our brains and our bodies have a “handedness” – does this handedness extend to the field of vision, and hence have an interaction with the handedness of the driver’s location in the car and on the road?)

The purpose of flooding you with all these variant hypotheses is to show you that while we first asked a nice clean question (does a HUD make a car safer?), it may be impacted by a whole bunch of other factors, and we may not even anticipate many of them! So how can we solve our original problem?

The answer is: let the data speak for itself. And I really mean this in a statistical sense, and not simply as a call to action. An unbiased sample of data will automatically contain the influence of all these factors, both those that are detectable and those that are not. Detectable factors like fonts and colors in the HUD, or handedness of the driver, can and should be logged, so that we can later pivot our analysis on them. (I will talk about instrumentation in the next post.) But even the factors that are not easily detectable, say whether the driver was tired or whether the weather was bad, will still be present in the data in proportion to their occurrence in real life, even if they could not be logged explicitly, and so a data based decision will be able to account for them. For example, if HUDs are no problem in good weather but are a serious hazard in (rare) stormy weather, then a sufficiently large unbiased sample of data should pick up those rare but dangerous stormy weather situations too, and a good metric will tease out the badness of those cases.
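As a sketch of what “pivoting on logged factors” could look like, here is a toy example on entirely synthetic trip data. The column names (hud_enabled, weather, near_miss) and the planted storm effect are my own illustration, not real findings or any real car maker’s schema:

```python
# A minimal sketch, on hypothetical telemetry, of pivoting a safety metric
# on logged factors. All names and rates below are made up for illustration.
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
n = 200_000  # one row per logged trip

weather = rng.choice(["clear", "rain", "storm"], size=n, p=[0.80, 0.17, 0.03])
hud = rng.random(n) < 0.5

# Synthetic ground truth: HUDs are harmless in clear weather but raise the
# near-miss rate in (rare) storms; this is the effect we hope the data surfaces.
base_rate = np.where(weather == "storm", 0.020, 0.002)
rate = base_rate * np.where((weather == "storm") & hud, 3.0, 1.0)
near_miss = (rng.random(n) < rate).astype(float)

trips = pd.DataFrame({"weather": weather, "hud_enabled": hud, "near_miss": near_miss})

# Pivot the near-miss rate on the logged factors.
pivot = trips.pivot_table(index="weather", columns="hud_enabled",
                          values="near_miss", aggfunc="mean")
print(pivot)
```

Because the sample is unbiased, storms appear in roughly their real-world proportion, so even a rare condition that was never part of the original hypothesis surfaces the moment you slice on it.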

Next stop: Instrumentation. Now that we have identified a question that needs a data driven answer, how do we decide what data to look at, and how do we get hold of that data?

A Math Problem From Singapore Goes Viral: When Is Cheryl’s Birthday? – NYTimes.com

A Math Problem From Singapore Goes Viral: When Is Cheryl’s Birthday? – NYTimes.com.

A nice problem to hone your probabilistic/logical reasoning. The trick is to see every piece of data for what it is and how it adds to previous knowledge. It is also important to keep in mind, at every step, three types of knowledge:

  1. What Albert knows
  2. What Bernard knows
  3. What is common knowledge (i.e. what you, the reader, know).

Happy reasoning!
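If you want to check your reasoning mechanically, here is a minimal sketch that solves the standard version of the puzzle (the usual ten candidate dates) by straightforward enumeration, filtering the candidates statement by statement:

```python
# Solve the standard Cheryl's Birthday puzzle by enumeration.
DATES = [
    ("May", 15), ("May", 16), ("May", 19),
    ("June", 17), ("June", 18),
    ("July", 14), ("July", 16),
    ("August", 14), ("August", 15), ("August", 17),
]

def unique_days(dates):
    """Days appearing exactly once, i.e. days that would let Bernard know."""
    return {d for _, d in dates if sum(1 for _, dd in dates if dd == d) == 1}

# Statement 1 (Albert): "I don't know, but I know Bernard doesn't know either."
# So Albert's month must contain no uniquely identifying day.
step1 = [
    (m, d) for m, d in DATES
    if not any(dd in unique_days(DATES) for mm, dd in DATES if mm == m)
]

# Statement 2 (Bernard): "At first I didn't know, but now I know."
# Among the dates surviving step 1, Bernard's day must now be unique.
step2 = [(m, d) for m, d in step1 if d in unique_days(step1)]

# Statement 3 (Albert): "Then I also know."
# Among the dates surviving step 2, Albert's month must now be unique.
step3 = [(m, d) for m, d in step2 if sum(1 for mm, _ in step2 if mm == m) == 1]

print(step3)  # expected: [('July', 16)]
```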

PS: It is possible to produce probabilistic variants of this problem by tweaking the dates so that certain uniqueness properties are not satisfied. In that case the riddle could be posed as: What single binary question can you ask Bernard to resolve the birthday?