Data science defeats intuition: twin data points are the norm, not the exception

This is an example where data science and statistical analysis are superior to intuition. Here, intuition misleads you into the wrong conclusions.

By twin data points, I mean observations that are almost identical. In any 2- or 3-dimensional data set with 300+ rows, if the data is quantitative and evenly distributed in a bounded space, you should expect to see a large proportion – above 15% – of data points that have a very close neighbor.
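
One way to check this claim is a small simulation. The sketch below is only an illustration under stated assumptions: the points are drawn uniformly in the unit square (the 2-dimensional case), and "very close" is taken to mean a nearest-neighbor distance below 2% of the side length. That threshold is my assumption, not a definition given in the text, and the function names are hypothetical. The closed-form comparison ignores boundary effects, so it should only roughly match the simulated value.

```python
import math
import random


def twin_fraction(n_points=300, threshold=0.02, rng=None):
    """Fraction of points whose nearest neighbor is closer than `threshold`.

    Points are uniform in the unit square. The threshold of 0.02 is an
    assumed definition of "very close", chosen for illustration only.
    """
    rng = rng or random.Random()
    pts = [(rng.random(), rng.random()) for _ in range(n_points)]
    twins = 0
    for i, (xi, yi) in enumerate(pts):
        # Brute-force nearest-neighbor distance; fine for a few hundred rows.
        nearest = min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(pts)
            if j != i
        )
        if nearest < threshold:
            twins += 1
    return twins / n_points


def average_twin_fraction(trials=20, seed=0, **kwargs):
    """Average the twin fraction over several simulated data sets."""
    rng = random.Random(seed)
    return sum(twin_fraction(rng=rng, **kwargs) for _ in range(trials)) / trials


def approx_twin_fraction(n_points=300, threshold=0.02):
    """Approximation ignoring edges: each of the other n - 1 points falls
    inside a disc of radius `threshold` with probability pi * threshold**2."""
    return 1.0 - (1.0 - math.pi * threshold ** 2) ** (n_points - 1)


if __name__ == "__main__":
    print("simulated:", average_twin_fraction())
    print("approximation:", approx_twin_fraction())
```

With these settings the approximation evaluates to about 0.31, well above 15%, even though a distance of 0.02 looks tiny next to the size of the square. A different threshold or a different number of rows changes the figure, so treat the result as a sanity check on the intuition rather than as a reproduction of the 15% number.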

Smart Data Scientists Virtualize Data

Smart data scientists use data virtualization to integrate data from many diverse sources into a logical, virtualized layer that different analytical applications can consume on demand. For example, data virtualization is used to address challenges involving rogue data marts, business intelligence apps, enterprise resource planning systems, content systems, and portals.