David Hill

Network Computing Blogger


EMC's Data Science Summit 2012: Envisioning the Future of Data

Although it was a separate event, the one-day Data Science Summit 2012 was held in conjunction with EMC World 2012. The summit's purpose was not to get buried in technical details. Instead, it set out to discuss the expanding role of data (such as big data), the analytics that need to be applied to that data, and the role of data from economic, political and social perspectives. In short, the summit broadly envisioned the future of data and its implications.

Last year, the event was called the Data Scientist Summit and focused largely on "rock star" data scientists as speakers and panel members. This year's title signaled an intention to focus instead on data science teams. EMC and its Greenplum division sponsored the event both years.

What is data science? According to Wikipedia, "data science defines a discipline that incorporates applying various degrees of statistics, data visualizations, computer programming, data mining, machine learning and database engineering to solve complex data problems." The very short article goes on to say that a Data Science Journal has been published since April 2002, so data science is at least a decade old.

Why is data science relevant? Think about it this way: When questioned about the value of the first hot air balloons, Benjamin Franklin is said to have replied, "What is the value of a newborn baby?" Data science is probably well past the newborn-baby stage, although it still has a long way to go before it reaches full maturity. Data science underlies technologies such as search engines like Google's, which draw on data from outside the page itself; friendship relationships (think Facebook); big-data analysis; and product recommendation systems. In short, data science and data scientists are all about thinking creatively about what information might be useful and placing it in a context from which value can be derived.

Below are descriptions of some of the topics discussed at Data Science Summit 2012:

  • Predictive modeling: Predictive modeling has been with us for a long time, but data science goes far beyond traditional regression analysis to push the boundaries of what is possible, often drawing on disciplines beyond statistical learning, such as techniques for mining massive data sets.
  • Data visualization: Making use of the power of our eyes to process a lot of information all at once, visualization can provide illumination where insight might not otherwise be easy to obtain.
  • Impact of data science: The individual speakers and panels were keenly aware of how collaboration and other social tools affect products developed by teams of data scientists. They were also focused on the data collected by products that are widely deployed on the Web. Such data collection may create a conflict between convenience and privacy. For example, analyzing aggregated medical records from many people may yield information that can improve the treatment of disease. However, even if individuals allow their information to be pooled anonymously, effectively securing that very private information is difficult, at best.
  • Tidbits: With torrents of real-world data captured naturally from the Web, data conditioning is often enough, rather than the strict data quality that traditional enterprise systems require, because the outliers may actually contain information of value. At the same time, one of the key challenges of data science is being able to separate correlation from causality.
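The challenge of separating correlation from causality can be made concrete with a small simulation. The sketch below is a hypothetical illustration, not something presented at the summit: it assumes a hidden confounder `z` that drives two variables `x` and `y`, neither of which causes the other. The raw correlation between `x` and `y` comes out strong, but once the linear effect of `z` is removed from each (a simple partial correlation), it largely disappears. All names and parameters here are illustrative assumptions.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient, computed from first principles."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def simulate(n=10_000, seed=42):
    """Generate x and y that share a hypothetical confounder z."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    # x and y each depend on z plus independent noise; neither causes the other.
    x = [zi + rng.gauss(0, 0.5) for zi in z]
    y = [zi + rng.gauss(0, 0.5) for zi in z]
    return x, y, z


def residuals(target, predictor):
    """Remove the linear effect of predictor from target (least-squares fit)."""
    mp, mt = statistics.fmean(predictor), statistics.fmean(target)
    slope = sum((p - mp) * (t - mt) for p, t in zip(predictor, target)) / sum(
        (p - mp) ** 2 for p in predictor
    )
    return [t - (mt + slope * (p - mp)) for p, t in zip(predictor, target)]


x, y, z = simulate()
raw = pearson(x, y)  # strong, driven entirely by the shared z
partial = pearson(residuals(x, z), residuals(y, z))  # near zero once z is controlled for
print(f"raw correlation:         {raw:.2f}")
print(f"after controlling for z: {partial:.2f}")
```

The example sticks to the standard library so the arithmetic stays visible; on Python 3.10 and later, `statistics.correlation` computes the same Pearson coefficient. In real Web-scale data the confounder is rarely observed this neatly, which is what makes the problem hard.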

Overall, Data Science Summit 2012 was interesting and useful and should be continued, but much work remains in the field to build a superstructure that can focus and promote clear thinking about data science and its potential impacts.

The "horse and carriage" relationship between computation and information has long been expressed by the old term "data processing." Both are needed, but if data is becoming the center of the IT solar system, then data science, as the next stage of computer science, becomes more attractive and important.

However, the data science industry also requires more exposure. Data Science Summit 2012 was useful for sparking thought about the broad issues affecting data science, but its messages need to reach a wider audience. Why? So that more people can understand and take part in a dialog that is likely to affect their lives in many ways (not all of them necessarily beneficial).

The data science community needs to think not just in terms of individuals, teams and projects, but also in terms of how it will act as a functioning industry. The summit was a valuable starting point, but much work needs to be done before the next event. As projects lead to findings and conclusions that expand upon case studies, the results will give deeper direction and substance to the data science movement.

EMC is a client of David Hill and the Mesabi Group.