Data Fusion: Combining Multiple Analyses
Executive Summary
Every year, large sums are spent on advertising, so advertisers want to know the pay-off of their advertisements. A key question is therefore how many people will see an advertisement, that is, how many people it will reach. Several respondent studies are available to meet this need for information. For example, the reach of magazines and newspapers is measured by print research. In the print research, a so-called ‘reading probability’ is available for every respondent, and this probability serves as an indicator in computing reach.
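The text does not spell out how reading probabilities become a reach figure. A common approach, assumed here, is that expected reach is the weighted sum of the respondents' reading probabilities, where each weight is the number of people in the population a respondent represents. The `PrintRespondent` type, its fields and the example numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class PrintRespondent:
    # Hypothetical fields: a projection weight (number of people in the
    # population this respondent represents) and the reading probability
    # for one title.
    weight: float
    reading_probability: float


def print_reach(respondents: Iterable[PrintRespondent]) -> float:
    """Expected number of people reached by one insertion in the title,
    assuming reach is the weighted sum of reading probabilities."""
    return sum(r.weight * r.reading_probability for r in respondents)


respondents = [
    PrintRespondent(weight=1000, reading_probability=0.8),
    PrintRespondent(weight=1500, reading_probability=0.2),
    PrintRespondent(weight=500, reading_probability=0.0),
]
print(print_reach(respondents))  # about 1100 people
```

A respondent with a reading probability of 0.8 thus contributes 80% of the people they represent to the expected reach.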
In contrast, the reach of websites is measured by an internet research that tracks the behaviour of internet respondents. The results are used to compute the probability that a respondent visits a certain website in a certain period. These data are published by independent agencies and serve as the accepted standard (the ‘currency’) for trading in the market.
Advertisers increasingly demand combined reach figures. This results from the growing use of several media within a single advertising campaign. Moreover, publishers of print media often have an accompanying website. Hence the question: who is reached both by an advertisement in a magazine or newspaper and by an advertisement on the internet?
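Once every respondent carries both a reading probability and a website visit probability, the question above can be answered per respondent. The sketch below makes two assumptions that the text does not state: reach is a weighted sum of probabilities, and, for a given respondent, reading the title and visiting the website are independent events. The tuple layout and the example data are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# (weight, print reading probability, website visit probability) per respondent.
# All values are hypothetical illustration data.
Respondent = Tuple[float, float, float]


def combined_reach(respondents: Iterable[Respondent]) -> Dict[str, float]:
    """Expected reach of a combined print + internet campaign.

    Assumes that, for a given respondent, reading the title and visiting the
    website are independent events, so P(both) = p_print * p_web.
    """
    totals = {"print": 0.0, "internet": 0.0, "both": 0.0, "either": 0.0}
    for weight, p_print, p_web in respondents:
        p_both = p_print * p_web
        totals["print"] += weight * p_print
        totals["internet"] += weight * p_web
        totals["both"] += weight * p_both
        # Inclusion-exclusion: P(A or B) = P(A) + P(B) - P(A and B)
        totals["either"] += weight * (p_print + p_web - p_both)
    return totals


campaign = [(1000, 0.8, 0.5), (1500, 0.2, 0.6), (500, 0.0, 0.9)]
print(combined_reach(campaign))
```

Note that `both` can only be computed when the two probabilities sit on the same respondent, which is why a single data set containing both is needed.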
Data fusion
Combined information on print reach and internet reach could be obtained by setting up a single research that measures both. However, this is not cost-efficient. An alternative is to complement the print research with information about internet reach. This is done with a mathematical technique that uses overlapping information, i.e. variables measured in both analyses. This technique is called data fusion.
Data fusion combines the information of two analyses by means of this overlapping information. We used one of the data fusion techniques to generate combined print and internet data. This technique is related to a well-known statistical technique called imputation, which fills in missing values in a data set. In this view, the print research is simply a data set in which the internet data are missing.
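To make the imputation view concrete, the simplest way to fill a print respondent's missing internet value is to copy it from a similar internet respondent. The sketch below shows nearest-neighbour hot-deck imputation, a different, textbook data fusion technique included only for contrast; it is not the model-based method this document uses. The common variables, function names and data are hypothetical.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hot_deck_impute(
    print_common: List[Sequence[float]],
    donor_common: List[Sequence[float]],
    donor_visit_prob: List[float],
) -> List[float]:
    """Nearest-neighbour hot-deck imputation (a textbook alternative).

    Each print respondent ('recipient') receives the website visit
    probability of the internet respondent ('donor') whose common
    variables are closest in squared Euclidean distance.
    """
    imputed = []
    for recipient in print_common:
        nearest = min(
            range(len(donor_common)),
            key=lambda i: squared_distance(recipient, donor_common[i]),
        )
        imputed.append(donor_visit_prob[nearest])
    return imputed


# Hypothetical common variables: (is_female, age scaled to 0..1)
donors = [(0, 0.2), (1, 0.8)]
donor_probs = [0.7, 0.1]
recipients = [(0, 0.3), (1, 0.7)]
print(hot_deck_impute(recipients, donors, donor_probs))  # [0.7, 0.1]
```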
The data fusion method consists of two sequential steps. First, econometric models are estimated on the respondent data of the internet research. Second, these models are applied to the respondents of the print research. Since the data set contains a large number of websites, both steps are fully automated.
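The two steps could be sketched as follows. The text does not say which econometric model is used; as an assumption, this sketch fits a fractional logit (a logistic regression whose target is a visit probability between 0 and 1) for each website on hypothetical common variables, using plain batch gradient descent, and then applies every fitted model to the print respondents. The function names, variables and data are all illustrative.

```python
import math
from typing import Dict, List, Sequence

Row = Sequence[float]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(coef: List[float], x: Row) -> float:
    # coef[0] is the intercept; coef[1:] match the common variables in x.
    return sigmoid(coef[0] + sum(c * v for c, v in zip(coef[1:], x)))


def fit_fractional_logit(X: List[Row], y: List[float],
                         lr: float = 0.5, epochs: int = 2000) -> List[float]:
    """Step 1: estimate one model on the internet respondents.

    Minimises the binomial cross-entropy by batch gradient descent;
    y may be fractional visit probabilities rather than 0/1 outcomes.
    """
    n, k = len(X), len(X[0])
    coef = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = predict(coef, xi) - yi
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        coef = [c - lr * g / n for c, g in zip(coef, grad)]
    return coef


def fuse(internet_X: List[Row], internet_visits: Dict[str, List[float]],
         print_X: List[Row]) -> Dict[str, List[float]]:
    """Step 1 for every website, then step 2: apply each model to print respondents."""
    models = {site: fit_fractional_logit(internet_X, y)
              for site, y in internet_visits.items()}
    return {site: [predict(coef, x) for x in print_X]
            for site, coef in models.items()}


# Hypothetical common variables: (is_female, is_under_35)
internet_X = [(0, 0), (0, 1), (1, 0), (1, 1), (0, 1), (1, 1)]
internet_visits = {
    "site_a": [0.1, 0.7, 0.2, 0.9, 0.6, 0.8],
    "site_b": [0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
}
print_X = [(0, 1), (1, 0)]
fused = fuse(internet_X, internet_visits, print_X)
print({site: [round(p, 2) for p in probs] for site, probs in fused.items()})
```

Fitting one independent model per website keeps the loop trivially automatable, which matches the point that both steps run without manual work across many sites.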





