Category Archives: instrumentation

Looking Beyond the Internet of Things – The New York Times

And products that respond to their owner’s tastes — something already seen in smartphone upgrades, connected cars from BMW or Tesla, or entertainment devices like the Amazon Echo — could change product design.

Source: Looking Beyond the Internet of Things – The New York Times

Nice to see that the thoughts I have been expressing in my series of posts on modern data-informed product development resonate with those of other leading luminaries … 🙂

Cars’ Voice-Activated Systems Distract Drivers, Study Finds – The New York Times

The research shows that the technology can be a powerful distraction, and a lingering one.

Source: Cars’ Voice-Activated Systems Distract Drivers, Study Finds – The New York Times

This article talks about an interesting study that draws some very intriguing conclusions. For example:

A surprising finding was that the off-task performance in the DRT task differed significantly from single-task performance. Given that drivers were not engaged in any secondary-task activities during the off-task portions of the drive, it suggests that there were residual costs that persisted after the IVIS interaction had terminated.

Some things I like about the study: they use actual telemetry, gathered from video cameras installed in the car (and analyzed later), to measure reaction times and the like. They also combine that data with qualitative feedback from the drivers, and they report statistical significance (error bars).

Cognitive workload was determined by a number of performance measures. These were derived from the DRT task, subjective reports, and analysis of video recorded during the experiment. 

All that is good.

What bothers me, though, is the small sample size, which can potentially lead to bias (a statistical significance test cannot cure a bias problem). And they do not even run an A/B experiment, i.e., people using the voice tech vs. people not using it; they just report an absolute measurement. This is probably the single biggest counter-argument: we cannot really prove causation without an A/B test.

 

Here are some excerpts from the paper mentioned in the article that could raise flags:

Two hundred fifty-seven subjects participated in a weeklong evaluation of the IVIS interaction in one of 10 different model-year 2015 automobiles.

Sample size too small?

Following approval from the Institutional Review Board, participants were recruited by word of mouth and flyers posted on the University of Utah campus. They were compensated $250 upon completion of the weeklong study.

Maybe the people who respond to such recruitment are predominantly bad (or good) drivers; that would immediately pollute the study.

 

Next, participants completed one circuit around the 2.7-mile driving loop, located in the Avenues section of Salt Lake City, UT in order to become familiar with the route itself. The route provided a suburban/residential driving environment and contained seven all-way controlled stop signs, one two-way stop sign, and two stoplights.

This environment may be too artificial and may not really represent real-world conditions. What about weather, street events, pedestrians, accidents, stalled cars on the road … ?

All my objections could be completely addressed by using in-app telemetry from real users out in the wild, who number in the hundreds of millions. By properly choosing control groups (controlling for all factors like car type, age, etc., leaving only voice tech as the differentiator) we could then easily establish or disprove causality. In-app telemetry could also capture many more things, like braking, acceleration, and erratic swerving, rather than just reaction time. Lastly, the data would be collected in situ, and hence would reflect the real-world variety of conditions like rain, traffic, pedestrians, etc. You get my drift. The conclusions of such studies would then be beyond reproach and could lead to appropriate laws and regulations, rather than pointless bickering by uninformed guests on Fox or CNN!
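To make this concrete, here is a minimal sketch (Python with pandas and SciPy – my own illustration, not anything from the study) of what such a stratified comparison could look like. The file name and the column names (car_type, age_band, uses_voice_tech, reaction_time_ms) are all hypothetical:

```python
import pandas as pd
from scipy import stats

telemetry = pd.read_csv("driving_telemetry.csv")  # hypothetical export

# Control for obvious confounders by stratifying, leaving voice tech
# as the only differentiator within each stratum.
strata = ["car_type", "age_band"]
voice = telemetry["uses_voice_tech"].astype(bool)
treated, control = telemetry[voice], telemetry[~voice]
control_groups = control.groupby(strata)

for key, t_group in treated.groupby(strata):
    if key not in control_groups.groups:
        continue  # no comparable control drivers in this stratum
    c_group = control_groups.get_group(key)
    # Welch's t-test: the two groups may have unequal variances
    t_stat, p_value = stats.ttest_ind(
        t_group["reaction_time_ms"], c_group["reaction_time_ms"],
        equal_var=False)
    diff = t_group["reaction_time_ms"].mean() - c_group["reaction_time_ms"].mean()
    print(key, f"diff={diff:.1f} ms", f"p={p_value:.4f}")
```

With hundreds of millions of users, even small effects would show up with tight error bars in every stratum.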

 

The fallacy of perfection

In this post I would like to round out my series on agile, data-informed product engineering by talking about how to drive the right kind of data culture in your organization. The main message I want to get across is that data does not have to be perfect in order to be useful. I am basically going to expand on a theme that has been aptly discussed in Douglas Hubbard’s book “How to Measure Anything”.

An in-product telemetry system is very different from transaction processing systems like those used in banking, airline reservations, or medical records management. Sometimes the only thing they have in common is that they are big data systems. But that is hardly of much interest to us. As I have said before, data being “big” is simply a matter of scale, not necessarily a matter of quality or significance. The differences between transaction and telemetry systems, on the other hand, are of critical importance.

In a transactional system there is little to no tolerance for error of any kind. For example, if you deposit 20 checks in your bank and only 19 of them show up in your account, that is likely not acceptable to you. Even a minor mistake cannot be tolerated – like paying in $200 and the system only acknowledging $199. Or suppose you book airline tickets on 20 occasions and once in a while the airline simply forgets your booking and sends you home from the airport – that is probably going to give you a very bad day too. Or suppose your doctor orders several important blood count measurements and the report is missing, or mistaken about, some of the counts – that could lead to a life-threatening situation. In all these cases the expectation of data quality is very high and the tolerance for error is very low.

But from the point of view of data-driven product engineering, these kinds of systems are over-engineered and downright boring in an intellectual sense. It is like buying a million-dollar car to go to the grocery store every day. Yes, the ride is great and catches attention, but it is a very poor use of resources. Businesses cannot afford to indulge in such luxury, because it prevents them from investing resources in actual product innovation. Great cars are those that give you real bang for the buck – cars that have been engineered with care, where trade-offs have been made in a well-thought-out manner. Anybody can build a great car with a million dollars. It takes a talented engineering team to build a great car with $20K.

Similarly, the aim of a telemetry system for agile product development is to gather enough data, with enough fidelity, to make good and timely decisions. It is perfectly okay not to gather all the data (i.e., to sample your data), and even to make occasional errors in data collection (allowing some noise), as long as the answers to critical questions are not impacted. And it takes some significant data science to figure out what “enough” means and what the optimal sampling strategy is.

Historically, telemetry and A/B experimentation originated in web services like search and music downloads. In these cases, since the back end sees every transaction, it can do very rich logging on the server side itself. In such systems, if you can log 10% of the data you might as well log 100% without much overhead. (Storage is essentially free.) And the investment you make in achieving a certain data quality for 10% of the data immediately accrues to 100% data collection too, without additional cost.

But the equation changes completely when you collect data from software clients that run on machines in the end user’s hands, or more generally from physical products that are present in the user’s life – on his body, in his home, in his workplace, and generally in his environment. In these scenarios the user really pays a cost for collecting telemetry. For example, some of his network bandwidth is consumed, which, for metered devices, means actual dollars spent. It also consumes battery power, and hence reduces usable life on a single charge. Finally, it may actually degrade the user experience with the product, because the product is spending some of its resources, like CPU and memory, on collecting and transporting telemetry. So sampling the telemetry in a careful and well-thought-out manner is extremely critical for in-product telemetry.

But the good news is that, if done properly, heavily (down)sampled data can still give you the same quality of insights as the full data. Suppose you want to find out the average height of all the people who work in your organization. Do you really need to measure each of them from head to toe before answering the question? Obviously not. Why? Because you probably do not need accuracy beyond, say, +/- 1 cm, and that error margin can be achieved from a random sample of much smaller size. This is elementary statistics, and yet it is often overlooked. (The “random” part is of course very important, because by choosing a non-random sample, such as only men, or only people who work in the boiler room, you can easily bias your answer.) Asking for an unreasonably high level of accuracy and completeness in data, such as 99.999%, may sound very responsible and diligent. But if your hypothesis could be answered with, say, only a 1% sample, then your insistence on 99.999% is quite wrong and irresponsible!
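The back-of-the-envelope arithmetic is instructive. A tiny sketch, assuming (purely for illustration) that heights have a standard deviation of about 10 cm:

```python
from math import ceil

z = 1.96        # z-score for 95% confidence
sigma = 10.0    # assumed std dev of height, in cm (an illustration, not a measurement)
margin = 1.0    # desired margin of error, in cm

n = ceil((z * sigma / margin) ** 2)
print(n)  # 385 people – whether the org has 1,000 employees or 100,000
```

Note that the required sample size depends on the margin of error and the spread of the data, not on the size of the organization.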

For one, it may be an unrealistically high standard to meet even after throwing state-of-the-art technology at the problem. Secondly, it can grossly delay your product life cycle and put you at a competitive disadvantage. And thirdly, insisting on an unrealistic level of completeness from existing signals may cause you to lose focus on collecting new, hitherto untouched, types of signals. A cardinal lesson from information theory and machine learning is that new types of signals often give more additional information than more densely sampled data from existing signals. For example, instead of trying to drive the fidelity of click telemetry in your app or web page to five nines, first consider other types of telemetry, such as cursor hovers, swipes, pagination, and other controls.

So next time someone says to you “we need 99.999% data completeness!”, do vigorously push back and ask them: “Why do you need five nines? What kind of question are you asking that requires this level of comprehensiveness?” Almost always you will find that they have not thought about the scenario in much depth at all.

Selecting a good bouquet of signals for in-product telemetry, and a good sampling rate and strategy, is as much an art as it is a science. It is something you get better and better at as you work in this field. You need a sense of the hypotheses being posed and of the accuracy and confidence you need in your answers. You also need a good instinctive feel for how your users are actually using your product and for which types of signals are most informative about their experience. And finally, you need a good sense of the competitive landscape – how fast your competitors are evolving, and what level of agility is needed to catch up and overtake them (if you are behind) or to maintain a safe lead (if you are already ahead). This is where you start putting on multiple hats – sometimes that of a data scientist, sometimes that of a product/program manager, and sometimes simply that of the end user of your product, because he/she does deserve your empathy and understanding!

As Head-Up Displays Become Common, Distraction Becomes an Issue – The New York Times

To automakers, the technology makes for safer driving because the driver does not need to look down for information. The illuminated graphics, which may be white or colored, are transparent, so that the driver actually looks through them onto the road ahead. But to skeptics, head-up displays are yet another informational distraction for the already data-overloaded driver.

Source: As Head-Up Displays Become Common, Distraction Becomes an Issue – The New York Times

For those of you following my series of posts on data-informed engineering, this article should not come as a surprise. It is the working example I have been using all along.

Many more such generic pieces are likely to appear on this topic. But the problem is, no one is quoting any real data-driven insights yet.

Car makers say “user feedback has been good”. Skeptics say “it is obviously a distraction”. But what does the actual telemetry say? I have yet to see any actual data or metrics. So for now we are stuck in a he-said-she-said limbo.

When you buy a car, a bunch of arcane numbers gets thrown at you – mpg here, mpg there, torque, horsepower, crash rating. But these are mostly system health metrics. As I have remarked before, similar system health metrics are generated (more or less) no matter who sits in the driver’s seat.

But what we really need are driver experience and driver success metrics. And these are likely to be extremely dependent on the type of person behind the wheel. Bring the human into the picture!

Unless we study the impact of these displays on the driving habits of a broad, representative selection of real drivers, the jury is still out. And the only way to get those metrics is to deeply instrument the car from a user-interaction perspective. Rather than showing me mpg, tell me how many hard brakes that car model experiences per mile driven. Rather than telling me the peak torque, tell me how often that car model runs a red light. Juxtapose these numbers against well-selected control groups. And then let me, as a buyer, make an informed decision.
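For concreteness, here is the kind of computation I have in mind – a hypothetical sketch, with an assumed threshold and made-up file and field names, not any manufacturer’s actual pipeline:

```python
import pandas as pd

HARD_BRAKE_G = 0.45  # assumed deceleration threshold, in g

events = pd.read_csv("brake_events.csv")  # hypothetical: one row per braking event
trips = pd.read_csv("trips.csv")          # hypothetical: one row per trip, with miles

# Count hard-braking events per model and normalize by miles driven.
hard = events[events["peak_decel_g"] >= HARD_BRAKE_G]
hard_brakes_per_mile = (hard.groupby("car_model").size()
                        / trips.groupby("car_model")["miles"].sum())
print(hard_brakes_per_mile.sort_values())  # a driver-experience metric, by model
```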

Instrumentation – where the rubber meets the road

Last time we talked about formulating the hypothesis (asking the question) about our product that needs a data-driven resolution. We saw that ambiguous questions necessarily need data-driven answers, because our intuition often fails due to the complexity and subtlety of the factors involved.

Once the hypothesis has been formulated, instrumentation of the product is the way to gather data. In the old days of industrial engineering, instrumentation actually referred to measurement instruments such as gauges and dials placed at strategic points in a plant (such as a nuclear power plant or a chemical plant), and these apparatuses would typically measure gross quantities like temperature, pressure, flow, and density. Subsequently, in the days of Web 1.0 and 2.0, instrumentation was also interpreted as the metering of web-based services like search or file sharing, where the service provider would count things like clicks, views, and downloads. Of late, the word instrumentation has started to recover its old-fashioned meaning, wherein we mean measurement of the physical properties of a product related to its location, usage, and proximity to other entities. This has happened primarily due to the advent of the Internet of Things, which allows various kinds of sensors to be inexpensively placed in products and to push their data into the cloud, often over wireless channels.

So yes, instrumentation for us means metering of a product or service as well as measurement of its physical properties and processes. The problem, of course, is that there is no upper bound to how much, and in what detail, one can instrument a product. Recalling our example of a car, we could instrument diverse system health properties of the car such as

  • tire pressure and wear
  • engine diagnostics (fuel consumption, richness of mixture, timing of valves)
  • properties of the battery
  • properties and fluid levels of hydraulic, cooling and lubrication systems
  • and so on.

We could also instrument its parking location and ambient proximity properties (home/office/parking lot), including duration of parking.

Finally, we could instrument a rich set of attributes about how the driver and passengers are using the vehicle, such as

  • speed and acceleration
  • braking, skids, hydroplaning, swerving
  • how many times, and which, doors are opened and closed, including moonroofs and sunroofs
  • how seats, mirrors, windows, and other user-specific controls are adjusted

Be assured that I am merely scratching the surface here. Every leading car manufacturer probably has teams of PhDs thinking deeply about what signals to instrument, why, and how.

But the truth of the matter is that there is always the possibility that you forgot or neglected to instrument something that does in fact have a significant impact on answering your question. That is the nature of the beast – there is no such thing as perfect instrumentation. Which brings me to the most important message I want to get across: our instrumentation system, above all, needs to be adaptive and adaptable. The incremental cost of instrumenting a new sensor or metering a new property should be minimal and should never become the bottleneck. For example, if during our data analysis phase on the safety of HUDs we conclude that knowing the weather conditions is an absolutely important piece of knowledge, we should be able to add a sensor to the car (during a routine recall or service appointment, or in the next model of the car) that can measure the presence of moisture, snow, or fog on the windshield, and get the telemetry of that signal flowing into our cloud with minimal effort. This kind of extensible instrumentation will be at the heart of the future data-driven product. It will encapsulate, in miniature, the same principle of extreme flexibility and agility that the entire product development cycle is driving towards.
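In software terms, here is a minimal sketch of what such extensibility could look like – a toy signal registry of my own invention (all names and the stub sensor reads are assumptions):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Signal:
    name: str
    read: Callable[[], float]  # how to sample the underlying sensor
    sample_rate_hz: float      # how often the telemetry loop samples it

REGISTRY: list[Signal] = []

def register(signal: Signal) -> None:
    # The telemetry loop iterates over REGISTRY, so a new sensor
    # requires no changes anywhere else in the pipeline.
    REGISTRY.append(signal)

def read_speed() -> float:
    return 0.0  # stub: would query the vehicle's speed sensor

def read_moisture() -> float:
    return 0.0  # stub: would query the new windshield moisture sensor

# Existing instrumentation...
register(Signal("speed_mph", read_speed, 10.0))

# ...and the new signal identified during analysis: one declaration.
register(Signal("windshield_moisture", read_moisture, 1.0))
```

The point is the shape of the design, not the details: adding the moisture signal is one line, not a re-engineering effort.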

Having said that, we do need to use some domain knowledge and common sense to choose the set of signals to instrument in the first iteration. One general principle is to instrument all extrinsic signals – signals generated by the particular choices the user makes when using the product. These signals are first-class citizens and a veritable goldmine of intelligence, because they come as close as one can get to actually reading the mind of the user. Thus, in our example, user-interaction signals like speed, acceleration, braking, starting, and stopping are all highly valuable and must be instrumented. Furthermore, these signals need to be sampled at a rate faster than the typical time constant of human actions (which is of the order of a second). Thus speed and braking need to be instrumented at a sub-second level, otherwise we risk throwing away critical information.

Intrinsic signals, by contrast, are typically instrumented around system-level processes that the user is not directly aware of and does not directly influence, such as the air-fuel mixture ratio of the engine, the minute adjustments made by the traction control system, the timing of the engine valves, fluid levels, etc. These signals are obviously important for understanding product health, but they do not directly tell us much about the user. That is, almost any user could sit in the driver’s seat and produce practically the same intrinsic signals, in similar road and weather conditions. For the same reason, they can be sampled at a variety of rates, depending solely on the physics of each signal. For example, fluid levels could be measured once a day, while valve timing has to be measured with millisecond accuracy. It does not really have anything to do with human time constants.
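Put as a rate table, the contrast might look like this – every signal name and number below is illustrative, not a spec:

```python
SAMPLE_RATES_HZ = {
    # extrinsic: must out-pace human actions (order of a second)
    "speed": 10.0,
    "braking": 10.0,
    "steering_angle": 10.0,
    # intrinsic: set by the physics of each subsystem, not by the human
    "valve_timing": 1000.0,          # millisecond scale
    "fuel_air_ratio": 100.0,
    "coolant_level": 1.0 / 86400.0,  # once a day
}
```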

Historically, such intrinsic, system-health signals have received inordinate attention because they could be directly used in product engineering and fine-tuning. They allowed the engineers to check that the product they built was functioning in the field according to their a priori expectations and specifications. But these signals did not actually give much bang for the buck in terms of product innovation, because they did not tell the engineers whether they had designed and built the right product in the first place. For example, the engineers could know if the efficiency of their internal combustion engine was good enough, but that would not tell them that certain users could greatly benefit from a hybrid, or even a fully electric vehicle, given the nature of their driving habits. This is the classic “don’t know what I don’t know” problem.

Therefore, in my view, while intrinsic signals are important and need to be logged at a certain level, we need to start paying much more attention to extrinsic signals which directly measure the man-machine interface.

Coming back to our HUD problem, I would, for example, invest the money needed to track the user’s eye gaze. If this means putting a stereoscopic camera and depth sensor in the car, then so be it. Similarly, I would want to pay close attention to how the user changed his driving habits when the HUD was shown to him, so I would minutely instrument acceleration, braking, and swerving, as well as minor crashes and dents (fender benders).

Designing good instrumentation is a very hard and very crucial problem. Concentrating on instrumenting mundane signals simply because you have the technical know-how to do it, while ignoring difficult signals because they are outside your comfort zone, is a recipe for disaster. (I highly recommend reading the book How to Measure Anything by Douglas Hubbard!) Treating instrumentation and measurement as first-class citizens of your product, and not merely as a band-aid that you put on as an afterthought, is crucial for modern data-driven engineering. People who bravely tackle difficult instrumentation problems should be encouraged and rewarded.

Generating reams and reams of useless data is no panacea for product engineering problems. (Which is the reason I feel pretty uncomfortable with the phrase BIG data.) Having diverse and relevant data is much more crucial. When the product attains scale and market share, the BIG part will follow automatically!