Last time we talked about formulating the hypothesis (asking the question) about our product that needs a data-driven resolution. We saw that ambiguous questions necessarily need data-driven answers because our intuition often fails due to the complexity and subtlety of the factors involved.
Once the hypothesis has been formulated, instrumentation of the product is the way to gather data. In the old days of industrial engineering, instrumentation actually referred to measurement instruments such as gauges and dials placed at strategic points in a plant (such as a nuclear power plant or a chemical plant), and these apparatuses would typically measure gross quantities like temperature, pressure, flow and density. Subsequently, in the days of Web 1.0 and 2.0, instrumentation was also interpreted as the metering of web-based services like search or file sharing, where the service provider would count things like clicks, views, and downloads. Of late the word instrumentation has started to recover its old-fashioned meaning, wherein we mean measurement of physical properties of a product related to its location, usage and proximity to other entities. This has happened primarily due to the advent of the Internet of Things, which allows various kinds of sensors to be inexpensively placed in products and to push their data into the cloud, often over wireless channels.
So yes, instrumentation for us means metering of a product or a service as well as measurement of its physical properties and processes. The problem of course is that there is no upper bound to how much, and in what detail, one can instrument a product. Recalling our example of a car, we could instrument diverse system health properties of the car such as
- tire pressure and wear
- engine diagnostics (fuel consumption, richness of mixture, timing of valves)
- properties of the battery
- properties and fluid levels of hydraulic, cooling and lubrication systems
- and so on.
We could also instrument its parking location and ambient proximity properties (home/office/parking lot), including duration of parking.
Finally we could instrument a rich set of attributes about how the driver and passengers of the car are using the vehicle such as
- speed and acceleration
- braking, skids, hydroplaning, swerving
- how many times, and which, doors are opened and closed, including moon and sun roofs
- how seats, mirrors, windows and other user-specific controls are adjusted
Be assured that I am merely scratching the surface here. Every leading car manufacturer probably has teams of PhDs thinking deeply about what signals to instrument, why and how.
But the truth of the matter is that there is always the possibility that you forgot or neglected to instrument something that in fact does have a significant impact on answering your question. That is the nature of the beast: there is no such thing as perfect instrumentation. Which brings me to the most important message I want to get across: our instrumentation system, above all, needs to be adaptive and adaptable. The incremental cost of instrumenting a new sensor or metering a new property should be minimal and should never become the bottleneck. So for example, if during our data analysis phase about the safety of HUDs we come to the conclusion that knowing the weather conditions is absolutely essential, we should be able to add a sensor to the car (during a routine recall or service appointment, or in the next model of the car) that can measure the presence of moisture, snow or fog on the windshield, and get the telemetry of that signal flowing into our cloud with minimal effort. This kind of extensible instrumentation will be at the heart of the future data-driven product. It will encapsulate, in miniature, the same principle of extreme flexibility and agility that the entire product development cycle is driving towards.
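To make the idea of "minimal incremental cost" concrete, here is a small sketch in Python. It is only an illustration under my own assumptions, not how any real car telemetry stack works: the `TelemetryRegistry` class, the signal names and the lambda "sensors" are all hypothetical. The point is that adding the windshield moisture sensor later is one extra `register` call, and nothing else in the pipeline has to change.

```python
import time
from typing import Callable, Dict, Optional

# Hypothetical type: a sensor is any zero-argument function returning a reading.
SensorReader = Callable[[], float]


class TelemetryRegistry:
    """Illustrative registry that maps signal names to sensor readers."""

    def __init__(self) -> None:
        self._readers: Dict[str, SensorReader] = {}

    def register(self, name: str, reader: SensorReader) -> None:
        """Add (or replace) a signal; no other code needs to change."""
        self._readers[name] = reader

    def snapshot(self, now: Optional[float] = None) -> dict:
        """Read every registered signal once and bundle the readings."""
        return {
            "timestamp": time.time() if now is None else now,
            "signals": {name: read() for name, read in self._readers.items()},
        }


# First iteration of the product: only user-interaction signals (made-up values).
registry = TelemetryRegistry()
registry.register("speed_kmh", lambda: 62.0)
registry.register("brake_pressure", lambda: 0.0)

# Later, the analysis shows weather matters: adding the new sensor is one line.
registry.register("windshield_moisture", lambda: 0.3)

record = registry.snapshot()  # now carries all three signals
```

In a real system the snapshot would be serialized and shipped to the cloud. The design choice that matters here is that the record is keyed by signal name, so downstream consumers can tolerate signals appearing over time.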
Having said that, we do need to use some domain knowledge and common sense to choose the set of signals to instrument in the first iteration. One general principle to follow is to instrument all extrinsic signals, that is, signals generated by the particular choices that the user makes when using the product. These signals are first-class citizens and a veritable goldmine of intelligence because they come as close as one can get to actually reading the mind of the user. Thus in our example case, user interaction signals like speed, acceleration, braking, starting and stopping are all highly valuable and must be instrumented. Furthermore, these signals need to be sampled at a rate that is faster than the typical time constant of human actions (which is of the order of a second). Thus speed and braking need to be instrumented at a sub-second level, otherwise we risk throwing away critical information.
Intrinsic signals are typically instrumented around system-level processes that the user is not directly aware of and does not directly influence, such as the air-fuel mixture ratio of the engine, minute adjustments made by the traction control system, timing of the engine valves, fluid levels, etc. These signals are obviously important for understanding product health, but they do not tell us much directly about the user. That is, almost any user could sit in the driver’s seat and produce practically the same intrinsic signals in similar road and weather conditions. For the same reason, they can be sampled at a variety of rates depending solely on the physics of each signal. For example, fluid levels could be measured once a day, while valve timing has to be measured at millisecond resolution. The sampling rate does not really have anything to do with human time constants.
Historically such intrinsic/system-health signals have received inordinate attention because they could be directly used in product engineering and fine-tuning. They allowed the engineers to check and ensure that the product they built was functioning in the field according to their a priori expectations and specifications. But these signals did not actually give much bang for the buck in terms of product innovation because they did not tell the engineers whether they had designed and built the right product in the first place. So for example, the engineers could know if the efficiency of their internal combustion engine was good enough, but these signals would not tell them that certain users could greatly benefit from a hybrid or even a fully electric vehicle due to the nature of their driving habits. This is the classic "don't know what I don't know" problem.
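The sampling rule described above can be sketched in a few lines of Python. Treat this as a rough illustration under assumptions of mine: the `Signal` class, the `physics_interval_s` field, the oversampling factor of 10 and the example numbers are all hypothetical. Only the one-second human time constant, the once-a-day fluid level and the millisecond valve timing come from the discussion above.

```python
from dataclasses import dataclass

HUMAN_TIME_CONSTANT_S = 1.0  # order of magnitude of human actions
OVERSAMPLE = 10.0            # assumed safety factor, purely illustrative


@dataclass(frozen=True)
class Signal:
    name: str
    extrinsic: bool            # True if driven directly by the user's choices
    physics_interval_s: float  # interval suggested by the signal's own physics


def sampling_interval_s(signal: Signal) -> float:
    """Return the sampling interval (seconds) for a signal.

    Extrinsic signals are capped well below the human time constant;
    intrinsic signals follow the physics of the signal alone.
    """
    if signal.extrinsic:
        return min(signal.physics_interval_s, HUMAN_TIME_CONSTANT_S / OVERSAMPLE)
    return signal.physics_interval_s


signals = [
    Signal("brake_pedal", extrinsic=True, physics_interval_s=5.0),
    Signal("coolant_level", extrinsic=False, physics_interval_s=86_400.0),
    Signal("valve_timing", extrinsic=False, physics_interval_s=0.001),
]

for s in signals:
    print(f"{s.name}: sample every {sampling_interval_s(s)} s")
```

Notice that the human time constant only enters the rule for extrinsic signals. An intrinsic signal may be sampled far slower (coolant level) or far faster (valve timing) than a human could ever act.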
Therefore, in my view, while intrinsic signals are important and need to be logged at a certain level, we need to start paying much more attention to extrinsic signals which directly measure the man-machine interface.
Coming back to our HUD problem, I would, for example, invest the money needed to track the user's eye gaze. If this means putting a stereoscopic camera and depth sensor in the car, then so be it. Similarly, I would want to pay close attention to how the user changed his driving habits when the HUD display was shown to him, so I would minutely instrument acceleration, braking and swerving, as well as minor crashes and dents (fender benders).
Designing good instrumentation is a very hard and very crucial problem. Concentrating on instrumenting some mundane signals simply because you have the technical know-how to do it, while ignoring other difficult signals because they are outside your comfort zone, is a recipe for disaster. (I highly recommend reading the book How to Measure Anything by Douglas Hubbard!) Treating instrumentation and measurement as first-class citizens of your product, and not merely as a band-aid that you put on as an afterthought, is crucial for modern data-driven engineering. People who bravely tackle difficult instrumentation problems should be encouraged and rewarded.
Generating reams and reams of useless data is no panacea for product engineering problems. (Which is the reason I feel pretty uncomfortable with the phrase BIG data.) Having diverse and relevant data is much more crucial. When the product attains scale and market share, the BIG part will automatically follow!