WHAT S AMAZON SQS:
Amazon Simple Queue Service is a distributed message queuing service introduced by Amazon.com in late 2004. It supports programmatic sending of messages via web service applications as a way to communicate over the Internet. Amazon Simple Queue Service (Amazon SQS) is a pay-per-use web service for storing messages in transit between computers. Developers use SQS to build distributed applications with decoupled components without having to deal with the overhead of creating and maintaining message queues.
Oyster Case Study
New York-based Oyster.com shares unvarnished reviews of hotels in nearly 200 destinations worldwide. The company’s own investigators visit each location to assess cleanliness, amenities, service and overall quality. What sets Oyster apart from similar sites is its extensive collection of photographs. Oyster takes hundreds of photos for each property, and every review includes dozens of untouched images (submitted by guests as well as investigators) that allow potential visitors to compare a hotel’s marketing material with reality.
It took less time to rewrite the code and do a full processing job with AWS than it took to do a single run with the old method.”
Cofounder and Vice President of Product, Oyster
Since its 2009 launch, Oyster has published more than one million high-quality digital images. When this massive volume of images became too cumbersome to handle in-house, the company decided to offload the content to a central repository on Amazon Simple Storage Service (Amazon S3). “We migrated to Amazon S3 in 2010,” says Eytan Seidman, Co-Founder and Vice President of Product. “We chose moving to the cloud and Amazon S3 because storing images in our data center would have been too costly. Amazon S3 was a more economical solution.”
Oyster reprocesses its entire collection of photographic images a few times each year to update the copyright year and, if necessary, to change the watermarks. Using their previous solution, reprocessing the entire collection of photographs required about 800 hours to complete. In addition, Oyster often recreated existing images in new formats and sizes for mobile and tablet devices. Resizing existing images and adding new ones was slowing down the rate at which the company was able to process the collection. “Our processes were slowing down,” says Seidman. “When the iPad with Retina display came out, for example, it took us more than a week to create new sizes specifically for that resolution.” Oyster considered purchasing additional hardware, but found the cost of new hardware and routine maintenance was too high, especially when the machines would sit idle most of the time.
Moreover, there were numerous software bugs in the multiprocessing solution that the company used, but since the solution didn’t scale, Oyster didn’t bother to fix them.
Why Amazon Web Services
“We were already using Amazon S3 to store the images, so using Amazon Elastic Compute Cloud (Amazon EC2) to process the images was a natural choice,” Seidman says. Chris McBride, a software engineer at Oyster, adds, “We wanted a cloud environment that could be ramped up for the large processing jobs and downsized for the smaller daily jobs.”
While the company is still running one local server, the bulk of the processing work now takes place on the AWS Cloud. Oyster is using a customized Amazon Linux AMI within Amazon EC2. Within this new environment, the company connects to Amazon S3 and Amazon Simple Queue Service (Amazon SQS) using boto, a Python interface to AWS. The images themselves are processed with the ImageMagick software available in the AMI package.
Oyster uses Amazon EC2 instances and Amazon SQS in an integrated workflow to generate the sizes they need for each photo. The team processes a few thousand photos each night, using Amazon EC2 Spot Instances. When Oyster processes the entire collection, it can use up to 100 Amazon EC2 instances. The team uses Amazon SQS to communicate the photos that need to be processed and the status of the jobs.
Oyster’s old system needed approximately 400 hours to process one million photos. By using AWS, the company can process the same number of photos in about 20 hours — a 95 percent improvement. “It took less time to rewrite the code and do a full processing job with AWS than it took to do a single run with the old method,” says Seidman. “It used to take close to a week to produce photos specifically for the iPad. With AWS, we can create the photos in just a few hours. The documentation is straightforward and the dashboards are incredibly helpful.”
Oyster has also been able to reduce in-house hardware expenses by repurposing two of its old servers, which were sitting idle more than 80 percent of the time. “We estimate that we saved roughly $10,000 in capital expenditures by moving to AWS, and reduced our operating expenses by an additional $10,000,” Seidman says. He believes that AWS is a perfect match for any company performing similar batch processing. “AWS lets us move faster without worrying about machine expenditures or maintenance, which frees us to focus on other things.”