I’ve been a database analyst and designer since 1984, and over the years I have created dozens of relational database solutions. Working with large sets of data has always interested me, so when I first read about Hadoop for working with Big Data a few months ago, I quickly started learning as much as I could about it.
Not long ago I read that Amazon Web Services (AWS) had created a new service, Amazon Elastic MapReduce, and started reading about how to use it. The list below includes some of the documents I used in my research:
- Get Started with Amazon Elastic MapReduce
- Amazon Elastic MapReduce FAQs
- How Do I?
Then, after reading a recent InfoWorld article, Big data to get even bigger in 2011, I raised the priority of getting some hands-on time with Amazon’s MapReduce. Today, using my Mac and my AWS account, I worked my way through the sample application, Parsing Logs with Apache Pig and Elastic MapReduce, in about an hour at a cost of $0.31 in cloud server time.
That sample application uses the Apache Pig project’s Pig Latin language. I have written about some of my previous learning exercises with Pig. You’ll need to understand the basics of Hadoop and Pig to get the most out of the sample application above.
My software partnership, Newbound, Inc., offers, as part of the Newbound Software Library, a database API for working with Hadoop data. We’ll have some announcements this first quarter about the AWS services. Stay tuned!
The next list contains some of the sample-code resources I reviewed in my research on Amazon’s MapReduce:
- How to Create and Debug an Amazon Elastic MapReduce Job Flow
- Running Hadoop MapReduce on Amazon EC2 and Amazon S3
- Contextual Advertising using Apache Hive and Amazon Elastic MapReduce with High Performance Computing instances
- Running Hive on Amazon Elastic MapReduce
- Additional Features of Hive in Amazon Elastic MapReduce
- Operating a Data Warehouse with Hive, Amazon Elastic MapReduce and Amazon SimpleDB
- Finding trending topics using Google Books n-grams data and Apache Hive on Elastic MapReduce
- Writing An Hadoop MapReduce Program In Python
- Python Library for Amazon Elastic MapReduce
- Getting Started with the AWS SDK for PHP
- AWS SDK for PHP
- AWS SDK for PHP Tips and Tricks
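The Python entries above center on Hadoop Streaming, where any program that reads lines from stdin and writes key/value lines to stdout can act as a mapper or reducer. Here is a minimal word-count sketch of my own to illustrate the idea (the script name and structure are my assumptions, not code from those articles):

```python
import sys
from itertools import groupby

def mapper(lines):
    """Emit one tab-separated (word, 1) pair per word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(lines):
    """Sum the counts for each word. Assumes input is sorted by key,
    which Hadoop's shuffle/sort phase guarantees between map and reduce."""
    parsed = (line.rstrip("\n").split("\t") for line in lines)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

if __name__ == "__main__":
    # Invoke as "wordcount.py map" or "wordcount.py reduce" from the
    # hadoop-streaming jar's -mapper and -reducer options.
    step = mapper if sys.argv[1:] == ["map"] else reducer
    for out in step(sys.stdin):
        print(out)
```

You can simulate the whole job flow locally before paying for cluster time: `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce`.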
Several development tools are available to assist:
- How to Use the Hadoop User Interface
- IBM MapReduce Tools for Eclipse
- Karmasphere Studio Community Edition
- Amazon Elastic MapReduce Ruby Client
Don’t be afraid to dive into these new technologies. Understanding and working with them are great skills to add to your resume.