News Picks: Data science tackles massive digital output

By: Physics Today
18 August 2014

New York Times: Because the Web, smartphones, and other technologies generate ever-increasing amounts of data, data scientists must wrangle the vast output, paring it down and organizing it into a usable format. “You spend a lot of your time being a data janitor, before you can get to the cool, sexy things that got you into the field in the first place,” said Matt Mohebbi, a data scientist and cofounder of Iodine, a new health startup. Several companies are writing software to automate the data-wrangling process. Among other challenges, the programs must be able to merge many different data formats. In much the same way that spreadsheets revolutionized data analysis in business and finance, machine-learning technology could help free data scientists from the more mundane sorting tasks so they can concentrate on the bigger picture.


