Using Amazon’s MapReduce Clusters

I’ve been a database analyst and designer since 1984, and over the years I’ve created dozens of relational database solutions. Working with large sets of data has always interested me, so when I first read about Hadoop and Big Data a few months ago, I quickly started learning as much about it as I could.

Not long ago I read that Amazon Web Services (AWS) had created a new service, Amazon Elastic MapReduce, and I started reading about how to use it. Below are some of the documents I used in my research:

Then, after reading a recent InfoWorld article, Big data to get even bigger in 2011, I raised the priority of getting some hands-on time with Amazon’s MapReduce. Today, using my Mac and my AWS account, I worked my way through the sample application, Parsing Logs with Apache Pig and Elastic MapReduce, in about an hour at a cost of $0.31 in cloud server time.

That sample application uses Apache Pig and its Pig Latin language. I’ve written before about some of my learning exercises with Pig. You’ll need to understand the basics of Hadoop and Pig to get the most out of the sample application; the sketch below shows the general shape of a Pig Latin log-parsing script.
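To give a feel for what such a script looks like, here is a minimal Pig Latin sketch that counts requests per URL path in an Apache-style access log. The S3 paths, field layout, and regular expression are my own illustrative assumptions, not the actual script from the sample application:

    -- Load raw log lines from S3 (these paths are placeholders).
    logs = LOAD 's3://mybucket/logs/access_log' USING TextLoader() AS (line:chararray);

    -- Pull the request path out of each line, e.g. "GET /index.html HTTP/1.1".
    requests = FOREACH logs GENERATE REGEX_EXTRACT(line, '"[A-Z]+ ([^ ]+)', 1) AS path;

    -- Group by path and count the hits for each one.
    grouped = GROUP requests BY path;
    counts = FOREACH grouped GENERATE group AS path, COUNT(requests) AS hits;

    -- Sort so the busiest paths come first, then write the results back to S3.
    sorted = ORDER counts BY hits DESC;
    STORE sorted INTO 's3://mybucket/output/path-counts';

Running a script like this on Elastic MapReduce rather than a local Hadoop installation is mostly a matter of pointing the LOAD and STORE statements at S3 and submitting the script as a Pig step when creating the job flow.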

My software partnership, Newbound, Inc., offers a Database API for working with Hadoop data as part of the Newbound Software Library. We’ll have some announcements about the AWS services in the first quarter. Stay tuned!

The next list covers some of the sample code resources I reviewed in my research on Amazon’s MapReduce:

There are some development tools available to assist:

Don’t be afraid to dive into these new technologies. Understanding and working with them are great skills to add to your resume.
