Most of the Web 2.0, social networking, and enterprise websites we build every day, for business or as a hobby, gradually become a major bottleneck in terms of storage and scalability. If your website is an online e-commerce site that must serve large volumes of data 24/7 and treats user feedback as a top priority, your users expect the site to stay up and never serve broken links.

(Let us take a web application as an example to get a better understanding of the system.)

– Performance

      Every day, we upload several images and Flash files to promote the website and to give users a more intuitive look and feel. Irrespective of the code (PHP, Ruby, or Perl) running on the server, the site slows down as the number of static files grows, because the web server's load increases with every image and file it has to read and serve.

            So we need a solution for hosting these files, either on a RAID server or on a fast machine with lots of RAM.
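Whichever hosting option you choose, the key step is to stop making the application's web server deliver static files itself. A minimal sketch of that idea, assuming the files are moved to an S3 bucket (the bucket name `example-static-assets` and the helper `asset_url` are hypothetical), is a small function that rewrites asset paths into S3 URLs:

```python
from urllib.parse import quote

# Hypothetical bucket name: replace with the bucket that holds your static files.
STATIC_BUCKET = "example-static-assets"


def asset_url(path: str, bucket: str = STATIC_BUCKET) -> str:
    """Return a public S3 URL for a static file such as 'images/logo.png'.

    Uses the virtual-hosted-style endpoint <bucket>.s3.amazonaws.com, so the
    application server never has to read or send the file itself.
    """
    key = path.lstrip("/")
    # Percent-encode each path segment (e.g. spaces) but keep the '/' separators.
    return f"https://{bucket}.s3.amazonaws.com/{quote(key, safe='/')}"
```

Page templates would then emit `asset_url("images/logo.png")` instead of a local path, so image and Flash requests go to S3 rather than to the PHP, Ruby, or Perl server.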

– Storage, Backup and Hosting

     If your website has to record all daily transactions and data generated by user activity, you should already have a plan for storage and backup. Track your backups carefully, because day-to-day activity logs must be kept in a secure and safe location. It may sound far-fetched, but a natural disaster or a crashed storage server can put us in a lot of trouble.
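One way to keep daily logs easy to track is to give each day's backup a predictable, date-partitioned object name and to compress it before uploading. The sketch below assumes a key layout of `backups/logs/YYYY/MM/DD/<name>.gz`; that layout and both function names are hypothetical, and the upload call itself is left out.

```python
import datetime
import gzip


def backup_key(log_name: str, day: datetime.date, prefix: str = "backups/logs") -> str:
    """Build a date-partitioned key, e.g. backups/logs/2008/03/15/access.log.gz."""
    # date.__format__ passes the spec to strftime, giving YYYY/MM/DD.
    return f"{prefix}/{day:%Y/%m/%d}/{log_name}.gz"


def compress_log(raw: bytes) -> bytes:
    """Gzip one day's activity log to reduce storage and transfer costs."""
    return gzip.compress(raw)
```

Because every day maps to exactly one key, a missing backup shows up as a missing key, and restoring a given day means fetching a single known object name.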

Putting all of this in a single sentence, we need something very reliable that provides:

– Easy Upload and Retrieval of Data from Anywhere, at Any Time.

– A Highly Scalable, Reliable, Fast and Inexpensive Data Storage Infrastructure.

How can we achieve this?

A Brief Look at Amazon Web Services: S3

What Is Amazon Simple Storage Service (Amazon S3)?

Amazon S3 is intentionally built with a minimal feature set.

  • Write, read, and delete objects containing from 1 byte to 5 gigabytes of data each. The number of objects you can store is unlimited.

  • Each object is stored in a bucket and retrieved via a unique, developer-assigned key.

  • A bucket can be located in the United States or in Europe. All objects within the bucket will be stored in the bucket’s location, but the objects can be accessed from anywhere.

  • Authentication mechanisms are provided to ensure that data is kept secure from unauthorized access. Objects can be made private or public, and rights can be granted to specific users.

  • Uses standards-based REST and SOAP interfaces designed to work with any Internet-development toolkit.

  • Built to be flexible so that protocol or functional layers can easily be added. Default download protocol is HTTP. A BitTorrent™ protocol interface is provided to lower costs for high-scale distribution. Additional interfaces will be added in the future.

  • Reliability backed with the Amazon S3 Service Level Agreement.

Ref: http://aws.amazon.com/s3/
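To make the bucket/key model concrete, here is a hedged sketch of how an object is addressed over the REST interface and how the per-object size limit quoted above could be checked before an upload. It assumes path-style URLs on the `s3.amazonaws.com` endpoint and treats "5 gigabytes" as 5 × 2^30 bytes (an assumption); the function names are hypothetical. Appending `?torrent` to an object's URL is how S3 exposes the BitTorrent interface mentioned above.

```python
from urllib.parse import quote

MIN_OBJECT_BYTES = 1
MAX_OBJECT_BYTES = 5 * 1024**3  # "5 gigabytes", assumed here to mean 5 * 2**30 bytes


def check_object_size(size: int) -> None:
    """Raise ValueError if an object falls outside the 1 byte .. 5 GB range."""
    if not MIN_OBJECT_BYTES <= size <= MAX_OBJECT_BYTES:
        raise ValueError(f"object size {size} is outside 1 byte .. 5 GB")


def object_url(bucket: str, key: str, torrent: bool = False) -> str:
    """Path-style URL for bucket/key; '?torrent' asks S3 for a .torrent file."""
    url = f"https://s3.amazonaws.com/{bucket}/{quote(key, safe='/')}"
    return url + "?torrent" if torrent else url
```

The same key can be fetched over plain HTTP for normal downloads, or through the torrent URL when many clients need the same large file.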

How Does It Help Websites?

Amazon S3 is based on the idea that quality Internet-based storage should be taken for granted.

It helps free developers from worrying about how they will store their data, whether it will be safe and secure, or whether they will have enough storage available.

It frees them from the upfront costs of setting up their own storage solution as well as the ongoing costs of maintaining and scaling their storage servers.

The functionality of Amazon S3 is simple and robust:

Store any amount of data inexpensively and securely, while ensuring that the data will always be available when you need it.

Amazon S3 enables developers to focus on innovating with data, rather than figuring out how to store it.

Amazon S3 Image Hosting and Storage Services

Amazon S3 was built to fulfill the following design requirements:

  • Scalable: Amazon S3 can scale in terms of storage, request rate, and users to support an unlimited number of web-scale applications. It uses scale as an advantage: Adding nodes to the system increases, not decreases, its availability, speed, throughput, capacity, and robustness.

  • Reliable: Store data durably, with 99.99% availability. There can be no single points of failure. All failures must be tolerated or repaired by the system without any downtime.

  • Fast: Amazon S3 must be fast enough to support high-performance applications. Server-side latency must be insignificant relative to Internet latency. Any performance bottlenecks can be fixed by simply adding nodes to the system.

  • Inexpensive: Amazon S3 is built from inexpensive commodity hardware components. As a result, frequent node failure is the norm and must not affect the overall system. It must be hardware-agnostic, so that savings can be captured as Amazon continues to drive down infrastructure costs.

  • Simple: Building highly scalable, reliable, fast, and inexpensive storage is difficult. Doing so in a way that makes it easy to use for any application anywhere is more difficult. Amazon S3 must do both.
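To put the 99.99% availability target in concrete terms, the allowed downtime is the unavailable fraction multiplied by the length of the period: 0.0001 × 525,600 minutes per year is about 52.6 minutes per year, or about 4.3 minutes in a 30-day month. A small helper (its name is hypothetical) makes the arithmetic explicit:

```python
def max_downtime_minutes(availability: float, days: float = 365.0) -> float:
    """Minutes of allowed unavailability over `days` at the given availability."""
    minutes_in_period = days * 24 * 60
    return (1.0 - availability) * minutes_in_period
```

For example, `max_downtime_minutes(0.9999)` is roughly 52.56, and `max_downtime_minutes(0.9999, days=30)` is roughly 4.32.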

Principles

The following principles of distributed system design were used to meet Amazon S3's requirements:

  • Decentralization: Use fully decentralized techniques to remove scaling bottlenecks and single points of failure.

  • Asynchrony: The system makes progress under all circumstances.

  • Autonomy: The system is designed such that individual components can make decisions based on local information.

  • Local responsibility: Each individual component is responsible for achieving its consistency; this is never the burden of its peers.

  • Controlled concurrency: Operations are designed such that no or limited concurrency control is required.

  • Failure tolerant: The system considers the failure of components to be a normal mode of operation, and continues operation with no or minimal interruption.

  • Controlled parallelism: Abstractions used in the system are of such granularity that parallelism can be used to improve performance and robustness of recovery or the introduction of new nodes.

  • Decompose into small well-understood building blocks: Do not try to provide a single service that does everything for everyone, but instead build small components that can be used as building blocks for other services.

  • Symmetry: Nodes in the system are identical in terms of functionality, and require no or minimal node-specific configuration to function.

  • Simplicity: The system should be made as simple as possible (but no simpler).

 Advantages

  • No capital outlay

  • Pay as you go – pricing seems competitive

  • Data replicated at multiple sites

  • Can behave like network-attached storage (NAS)

  • Can be accessed via a browser

  • Unlimited storage

  • Choose the USA or Europe for the storage location

  • Can be accessed from any location with Internet access.

  • Low storage and operational costs; it is hard to compete with their prices for large internet-visible datastores.

  • Geo-location ensures high availability.

  • EU datastore can be used to remain compliant with EU data protection legislation.

  • The RESTful interface is easy to use through third-party libraries and tools.

  • A public bucket can be used to serve content directly to third parties, with no need for any other hosting. All static content can be served this way.
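Serving static content straight from a bucket requires the objects themselves to be publicly readable. Below is a minimal sketch, assuming uploads go through the REST `PUT` operation, of the headers such an upload could carry. `x-amz-acl: public-read` is S3's canned ACL for world-readable objects, and `Content-MD5` lets S3 check that the body arrived intact. The helper name is hypothetical, and the request still has to be signed before it is sent.

```python
import base64
import hashlib
import mimetypes


def public_upload_headers(filename: str, body: bytes) -> dict:
    """Headers for a PUT that stores `body` as a world-readable S3 object."""
    content_type, _ = mimetypes.guess_type(filename)
    return {
        # Browsers need the right type to render images and Flash inline.
        "Content-Type": content_type or "application/octet-stream",
        "Content-Length": str(len(body)),
        # Base64 of the raw 128-bit MD5 digest, not of its hex string.
        "Content-MD5": base64.b64encode(hashlib.md5(body).digest()).decode("ascii"),
        # Canned ACL: anyone may GET this object without credentials.
        "x-amz-acl": "public-read",
    }
```

Setting the content type at upload time matters because S3 returns it as-is on every download; a missing type would make browsers treat images as opaque binary files.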

  What People Say About S3

“You can’t complain at the prices, and it’s nice to know that your data can grow as large as you like without crossing any major billing boundaries, and is taking advantage of Amazon’s infrastructure. I can see why startups are using these services a lot – being able to start small and grow without the infrastructural pain is a serious bonus.”

   – Steve Streeting, a software developer based on a little rock called Guernsey, an island off the coast of Normandy, France.

“In other words, it’s great for backing up data,” Ippolito told SearchStorage, “but I wouldn’t recommend it for anything that needs to be on the public Internet or in continuous use.” Right now, the company is still using S3 for some backups and archives, but has looked elsewhere to store its primary content.

   – Bob Ippolito, chief technology officer and co-founder of Mochi Media LLC, a Web company that serves advertisements into online video games for sponsors through Mochiads.com

Nearly 7,000 miles away in Redmond, Washington, Microsoft wanted to expand its MSDN Direct Student Download program: “We needed a storage and delivery solution that made it simple, fast, and dependable for students in hundreds of countries around the world to download our software at any time.”

   – Joe Wilson, Director of Academic Initiatives in the Developer Marketing division at Microsoft Corp.

Disadvantages

  • Relies on your Internet connection

  • Relies on availability of Amazon’s network

  • Online payment in US$ by credit card – not so good if you are based in Europe.

  • As it is not WebDAV compatible, WebDAV clients cannot use it.

  • Its non-standard authentication restricts secure access to AWS-enabled client libraries.

  • Its non-standard authentication mechanism is brittle against client-side clock problems: if the client’s clock is far off (or the client is configured to be in a different time zone from where it really is), a request may fail.

  • Dependent on Amazon Web Services (AWS) for high availability; there was one outage in February 2008, caused by an overload of the authentication service rather than by S3 itself.
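The clock problem is easier to see with the signing scheme in front of you. In S3's original REST authentication (now known as Signature Version 2), the client signs a string that includes the request's `Date` header with HMAC-SHA1 and its secret key, and S3 rejects requests whose date is more than 15 minutes away from its own clock (`RequestTimeTooSkewed`). The sketch below covers only the simple case with no `x-amz-*` headers and no sub-resources; the function names and the keys used with them are hypothetical.

```python
import base64
import datetime
import hashlib
import hmac
from email.utils import format_datetime
from typing import Optional

# S3 refuses requests whose Date is more than 15 minutes off its own clock.
MAX_CLOCK_SKEW = datetime.timedelta(minutes=15)


def sign_request(secret_key: str, verb: str, resource: str, date: str,
                 content_md5: str = "", content_type: str = "") -> str:
    """Signature V2 for a request with no x-amz-* headers or sub-resources."""
    # StringToSign = Verb \n Content-MD5 \n Content-Type \n Date \n Resource
    string_to_sign = "\n".join([verb, content_md5, content_type, date, resource])
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def signed_headers(access_key: str, secret_key: str, verb: str, resource: str,
                   now: Optional[datetime.datetime] = None) -> dict:
    """Build Date and Authorization headers, always from a UTC timestamp."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    # RFC 1123 date in GMT, so a local time-zone setting never leaks in.
    date = format_datetime(now.astimezone(datetime.timezone.utc), usegmt=True)
    signature = sign_request(secret_key, verb, resource, date)
    return {"Date": date, "Authorization": f"AWS {access_key}:{signature}"}


def clock_too_skewed(client_time: datetime.datetime,
                     server_time: datetime.datetime) -> bool:
    """True if the skew exceeds the 15-minute window S3 allows."""
    return abs(client_time - server_time) > MAX_CLOCK_SKEW
```

Generating the `Date` header from a UTC timestamp, as here, helps with the misconfigured-time-zone case, but it cannot fix a client clock that is simply wrong; only clock synchronization does that.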

Who Is Using Amazon S3


    Fetch Up Scalable Hosting