What Is Data Science? [Article Excerpts]

Source PDF (O’Reilly Radar Report © 2010)

The future belongs to the companies and people that turn data into products

Page 6:

“Hadoop has been instrumental in enabling ‘agile’ data analysis. In software development, ‘agile practices’ are associated with faster product cycles, closer interaction between developers and consumers, and testing. Traditional data analysis has been hampered by extremely long turn-around times. If you start a calculation, it might not finish for hours, or even days. But Hadoop (and particularly Elastic MapReduce) make it easy to build clusters that can perform computations on long datasets quickly. Faster computations make it easier to test different assumptions, different datasets, and different algorithms. It’s easier to consult with clients to figure out whether you’re asking the right questions, and it’s possible to pursue intriguing possibilities that you’d otherwise have to drop for lack of time.”

“There are many libraries available for machine learning: PyBrain in Python, Elefant, Weka in Java, and Mahout (coupled to Hadoop). Google has just announced their Prediction API, which exposes their machine learning algorithms for public use via a RESTful interface. For computer vision, the OpenCV library is a de-facto standard.”

Page 8:

“Data scientists combine entrepreneurship with patience, the willingness to build data products incrementally, the ability to explore, and the ability to iterate over a solution. They are inherently interdisciplinary. They can tackle all aspects of a problem, from initial data collection and data conditioning to drawing conclusions. They can think outside the box to come up with new ways to view the problem, or to work with very broadly defined problems: ‘here’s a lot of data, what can you make from it?’”

Page 9:

“The part of Hal Varian’s quote that nobody remembers says it all:

The ability to take data—to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it—that’s going to be a hugely important skill in the next decades.”

