
Most votes on amazon-web-services questions 9

  • #81 Technically what is the difference between s3n, s3a and s3?
  • #82 What is the difference between a task and a service in AWS ECS?
  • #83 Can you attach Amazon EBS to multiple instances?
  • #84 What is the difference between Elastic Beanstalk and CloudFormation for a .NET project?
  • #85 Downloading folders from aws s3, cp or sync?
  • #86 How can I use wildcards to cp a group of files with the AWS CLI
  • #87 What is CPU Credit Balance in EC2?
  • #88 Why do we need private subnet in VPC?
  • #89 Nodejs AWS SDK S3 Generate Presigned URL
  • #90 How to write a file or data to an S3 object using boto3

Read all the top-voted questions and answers on a single page.

#81: Technically what is the difference between s3n, s3a and s3? (Score: 140)

Created: 2015-10-26 Last updated: 2020-06-20

Tags: amazon-web-services, amazon-s3, aws-sdk

I’m aware of the existence of https://wiki.apache.org/hadoop/AmazonS3 and the following words:

S3 Native FileSystem (URI scheme: s3n) A native filesystem for reading and writing regular files on S3. The advantage of this filesystem is that you can access files on S3 that were written with other tools. Conversely, other tools can access files written using Hadoop. The disadvantage is the 5GB limit on file size imposed by S3.

S3A (URI scheme: s3a) A successor to the S3 Native (s3n) filesystem, the S3A system uses Amazon’s libraries to interact with S3. This allows S3A to support larger files (no more 5GB limit), higher-performance operations and more. The filesystem is intended to be a replacement for/successor to S3 Native: all objects accessible from s3n:// URLs should also be accessible from s3a simply by replacing the URL scheme.

S3 Block FileSystem (URI scheme: s3) A block-based filesystem backed by S3. Files are stored as blocks, just like they are in HDFS. This permits efficient implementation of renames. This filesystem requires you to dedicate a bucket for the filesystem - you should not use an existing bucket containing files, or write other files to the same bucket. The files stored by this filesystem can be larger than 5GB, but they are not interoperable with other S3 tools.

Why could a letter change in the URI scheme make such a difference? For example, changing

val data = sc.textFile("s3n://bucket-name/key")

to

val data = sc.textFile("s3a://bucket-name/key")

What is the technical difference underlying this change? Are there any good articles that I can read on this?

#81 Best answer 1 of Technically what is the difference between s3n, s3a and s3? (Score: 147)

Created: 2015-10-26 Last updated: 2018-10-04

The letter change on the URI scheme makes a big difference because it causes different software to be used to interface to S3. Somewhat like the difference between http and https - it’s only a one-letter change, but it triggers a big difference in behavior.

The difference between s3 and s3n/s3a is that s3 is a block-based overlay on top of Amazon S3, while s3n/s3a are not (they are object-based).

The difference between s3n and s3a is that s3n supports objects up to 5GB in size, while s3a supports objects up to 5TB and has higher performance (both improvements come from its use of multipart upload). s3a is the successor to s3n.

If you’re here because you want to understand which S3 file system you should use with Amazon EMR, then read this article from Amazon (only available on the Wayback Machine). The bottom line: use s3://, because s3:// and s3n:// are functionally interchangeable in the context of EMR, while s3a:// is not compatible with EMR.

For additional advice, read Work with Storage and File Systems.
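
For readers on the open-source Hadoop/Spark stack (as in the question, rather than EMR), here is a minimal PySpark sketch of switching to the s3a connector. It assumes the hadoop-aws module and its AWS SDK dependency are on Spark’s classpath; the bucket, key, and credentials are placeholders, and an IAM instance role can replace the key settings.

from pyspark.sql import SparkSession

# Assumes the hadoop-aws module (and its bundled AWS SDK) is available to Spark.
spark = (
    SparkSession.builder
    .appName("s3a-example")
    # Credentials for the s3a connector; omit these if an IAM instance role is used.
    .config("spark.hadoop.fs.s3a.access.key", "YOUR_ACCESS_KEY")
    .config("spark.hadoop.fs.s3a.secret.key", "YOUR_SECRET_KEY")
    .getOrCreate()
)

# Same call as in the question, but the s3a:// scheme selects the S3A connector.
data = spark.sparkContext.textFile("s3a://bucket-name/key")
print(data.count())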

#81 Best answer 2 of Technically what is the difference between s3n, s3a and s3? (Score: 62)

Created: 2016-11-29 Last updated: 2018-01-11

In Apache Hadoop, “s3://” refers to the original S3 client, which used a non-standard structure for scalability. That library is deprecated and will soon be deleted.

s3n is its successor, which used direct path names to objects, so you can read and write data with other applications. Like s3://, it uses jets3t.jar to talk to S3.

On Amazon’s EMR service, s3:// refers to Amazon’s own S3 client, which is different. A path in s3:// on EMR refers directly to an object in the object store.

In Apache Hadoop, S3N and S3A are both connectors to S3, with S3A the successor built using Amazon’s own AWS SDK. Why the new name? So we could ship it side-by-side with the one which was stable. S3A is where all ongoing work on scalability, performance, security, etc. goes. S3N is left alone so we don’t break it. S3A shipped in Hadoop 2.6, but was still stabilising until 2.7, primarily with some minor scale problems surfacing.

If you are using Hadoop 2.7 or later, use s3a. If you are using Hadoop 2.5 or earlier, use s3n. If you are using Hadoop 2.6, it’s a tougher choice; I’d try s3a and switch back to s3n if there were problems.

For more of the history, see http://hortonworks.com/blog/history-apache-hadoops-support-amazon-s3/

2017-03-14 Update: partitioning is actually broken on S3A in Hadoop 2.6, as the block size returned in a listFiles() call is 0: things like Spark & Pig partition the work into one task per byte. You cannot use S3A for analytics work in Hadoop 2.6, even if core filesystem operations & data generation are happy. Hadoop 2.7 fixes that.

2018-01-10 Update: Hadoop 3.0 has cut its s3: and s3n implementations: s3a is all you get. It is now significantly better than its predecessor and performs at least as well as the Amazon implementation. Amazon’s “s3:” is still offered by EMR, which is their closed-source client. Consult the EMR docs for more info.

See also the original question on Stack Overflow

#82: What is the difference between a task and a service in AWS ECS? (Score: 139)

Created: 2017-03-22 Last updated: 2019-05-09

Tags: amazon-web-services, amazon-ecs

It appears that one can either run a Task or a Service based on a Task Definition. What are the differences and similarities between Task and Service? Is there a clue in the fact that one can specify “Task Group” when creating Task but not Service? Are Task and Service hierarchically equal instantiations of Task Definition, or is Service composed of Tasks?

#82 Best answer 1 of What is the difference between a task and a service in AWS ECS? (Score: 291)

Created: 2017-03-22 Last updated: 2017-10-01

A Task Definition is a collection of 1 or more container configurations. Some Tasks may need only one container, while other Tasks may need 2 or more potentially linked containers running concurrently. The Task definition allows you to specify which Docker image to use, which ports to expose, how much CPU and memory to allot, how to collect logs, and define environment variables.

A Task is created when you run a Task directly, which launches container(s) (defined in the task definition) until they are stopped or exit on their own, at which point they are not replaced automatically. Running Tasks directly is ideal for short-running jobs, for example the kind of work that might otherwise be run via cron.

A Service is used to guarantee that you always have some number of Tasks running at all times. If a Task’s container exits due to error, or the underlying EC2 instance fails and is replaced, the ECS Service will replace the failed Task. This is why we create Clusters so that the Service has plenty of resources in terms of CPU, Memory and Network ports to use. To us it doesn’t really matter which instance Tasks run on so long as they run. A Service configuration references a Task definition. A Service is responsible for creating Tasks.

Services are typically used for long running applications like web servers. For example, if I deployed my website powered by Node.JS in Oregon (us-west-2) I would want say at least three Tasks running across the three Availability Zones (AZ) for the sake of High-Availability; if one fails I have another two and the failed one will be replaced (read that as self-healing!). Creating a Service is the way to do this. If I had 6 EC2 instances in my cluster, 2 per AZ, the Service will automatically balance Tasks across zones as best it can while also considering cpu, memory, and network resources.

UPDATE:

I’m not sure it helps to think of these things hierarchically.

Another very important point is that a Service can be configured to use a load balancer, so that as it creates the Tasks—that is, it launches containers defined in the Task Definition—the Service will automatically register the container’s EC2 instance with the load balancer. Tasks cannot be configured to use a load balancer, only Services can.
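
To make the Task vs. Service distinction concrete, here is a small boto3 sketch (hedged: the cluster, task definition, and service names are placeholders, and the task definition is assumed to be registered already).

import boto3

ecs = boto3.client("ecs", region_name="us-west-2")

# Run a one-off Task directly: the containers run to completion and are not replaced.
ecs.run_task(
    cluster="my-cluster",              # placeholder cluster name
    taskDefinition="my-task-def:1",    # placeholder task definition (family:revision)
    count=1,
    launchType="EC2",
)

# Create a Service: ECS keeps desiredCount Tasks running and replaces any that fail.
ecs.create_service(
    cluster="my-cluster",
    serviceName="my-web-service",      # placeholder service name
    taskDefinition="my-task-def:1",
    desiredCount=3,
    launchType="EC2",
)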

#82 Best answer 2 of What is the difference between a task and a service in AWS ECS? (Score: 50)

Created: 2018-10-10

Beautifully explained in words by @talentedmrjones. The picture below will help you visualize it easily :)

[Image: Cluster, Service, EC2 Instance and Task in action]

See also the original question on Stack Overflow

#83: Can you attach Amazon EBS to multiple instances? (Score: 139)

Created: 2009-05-08 Last updated: 2011-06-20

Tags: linux, file, amazon-ec2, amazon-web-services

We currently use multiple webservers accessing one MySQL server and fileserver. Looking at moving to the cloud, can I use this same setup and attach the EBS volume to multiple machine instances, or is there another solution?

#83 Best answer 1 of Can you attach Amazon EBS to multiple instances? (Score: 136)

Created: 2010-10-13 Last updated: 2015-06-03

UPDATE (April 2015): For this use-case, you should start looking at the new Amazon Elastic File System (EFS), which is designed to be multiply attached in exactly the way you are wanting. The key difference between EFS and EBS is that they provide different abstractions: EFS exposes the NFSv4 protocol, whereas EBS provides raw block IO access.

Below you’ll find my original explanation as to why it’s not possible to safely mount a raw block device on multiple machines.


ORIGINAL POST (2011):

Even if you were able to get an EBS volume attached to more than one instance, it would be a REALLY_BAD_IDEA. To quote Kekoa, “this is like using a hard drive in two computers at once”

Why is this a bad idea? … The reason you can’t attach a volume to more than one instance is that EBS provides a “block storage” abstraction upon which customers run a filesystem like ext2/ext3/etc. Most of these filesystems (eg, ext2/3, FAT, NTFS, etc) are written assuming they have exclusive access to the block device. Two instances accessing the same filesystem would almost certainly end in tears and data corruption.

In other words, double mounting an EBS volume would only work if you were running a cluster filesystem that is designed to share a block device between multiple machines. Furthermore, even this wouldn’t be enough. EBS would need to be tested for this scenario to ensure that it provides the same consistency guarantees as other shared block device solutions … ie, that blocks aren’t cached at intermediate non-shared levels like the Dom0 kernel, Xen layer, and DomU kernel. And then there are the performance considerations of synchronizing blocks between multiple clients - most clustered filesystems are designed to work on high-speed dedicated SANs, not best-effort commodity Ethernet. It sounds so simple, but what you are asking for is a very nontrivial thing.

Alternatively, see if your data sharing scenario can be NFS, SMB/CIFS, SimpleDB, or S3. These solutions all use higher layer protocols that are intended to share files without having a shared block device subsystem. Many times such a solution is actually more efficient.

In your case, you can still have a single MySQL instance / fileserver that is accessed by multiple web front-ends. That fileserver could then store its data on an EBS volume, allowing you to take nightly snapshot backups. If the instance running the fileserver is lost, you can detach the EBS volume and reattach it to a new fileserver instance and be back up and running in minutes.

“Is there anything like S3 as a filesystem?" - yes and no. Yes, there are 3rd party solutions like s3fs that work “ok”, but under the hood they still have to make relatively expensive web service calls for each read / write. For a shared tools dir, works great. For the kind of clustered FS usage you see in the HPC world, not a chance. To do better, you’d need a new service that provides a binary connection-oriented protocol, like NFS. Offering such a multi-mounted filesystem with reasonable performance and behavior would be a GREAT feature add-on for EC2. I’ve long been an advocate for Amazon to build something like that.

#83 Best answer 2 of Can you attach Amazon EBS to multiple instances? (Score: 80)

Created: 2009-05-08 Last updated: 2020-04-17

Update (2020) It is now possible!

This is possible now with the newest instance types running in AWS Nitro within the same Availability Zone. There are some caveats but this is great for certain use cases that need the speed of EBS and where EFS isn’t feasible.

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-volumes-multi.html
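
As a hedged illustration of the feature linked above, the boto3 sketch below creates a Multi-Attach-enabled io2 volume and attaches it to two instances. The Availability Zone, device name, and instance IDs are placeholders; the instances must be Nitro-based and in the same AZ as the volume.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Multi-Attach requires a Provisioned IOPS volume (io1/io2).
volume = ec2.create_volume(
    AvailabilityZone="us-east-1a",   # placeholder AZ; must match the instances' AZ
    Size=100,
    VolumeType="io2",
    Iops=1000,
    MultiAttachEnabled=True,
)
volume_id = volume["VolumeId"]
ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])

# Attach the same volume to two Nitro-based instances (placeholder IDs).
for instance_id in ("i-0123456789abcdef0", "i-0fedcba9876543210"):
    ec2.attach_volume(Device="/dev/sdf", InstanceId=instance_id, VolumeId=volume_id)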


Original Post (2009)

No, this is like using a hard drive in two computers.

If you want shared data, you can set up a server that all your instances can access. If you want a simple storage area for all your instances, you can use Amazon’s S3 storage service to store data that is distributed and scalable.

Moving to the cloud, you can have the exact same setup, but you can possibly replace the fileserver with S3, or have all your instances connect to your fileserver.

You have a lot of options, but sharing a hard drive between instances is probably not the best option.

See also the original question on Stack Overflow

#84: What is the difference between Elastic Beanstalk and CloudFormation for a .NET project? (Score: 138)

Created: 2013-01-20 Last updated: 2013-01-20

Tags: amazon-web-services, amazon-elastic-beanstalk, amazon-cloudformation

I have developed a .NET MVC application and have started playing around with AWS and deploying it via the Visual Studio Toolkit. I have successfully deployed the application using the Elastic Beanstalk option in the toolkit.

As I was going over the tutorials for deploying .NET apps to AWS with the toolkit, I noticed there are tutorials for deploying with both Elastic Beanstalk and CloudFormation. What is the difference between these two?

From what I can tell, it seems like they both essentially are doing the same thing - making it easier to deploy your application to the AWS cloud (setting up EC2 instances, load balancer, auto-scaling, etc). I have tried reading up on them both, but I can’t seem to get anything other than a bunch of buzz-words that sound like the same thing to me. I even found an FAQ on the AWS website that is supposed to answer this exact question, yet I don’t really understand.

Should I be using one or the other? Both?

#84 Best answer 1 of What is the difference between Elastic Beanstalk and CloudFormation for a .NET project? (Score: 250)

Created: 2013-01-20

They’re actually pretty different. Elastic Beanstalk is intended to make developers' lives easier. CloudFormation is intended to make systems engineers' lives easier.

Elastic Beanstalk is a PaaS-like layer on top of AWS’s IaaS services which abstracts away the underlying EC2 instances, Elastic Load Balancers, auto scaling groups, etc. This makes it a lot easier for developers, who don’t want to be dealing with all the systems stuff, to get their application quickly deployed on AWS. It’s very similar to other PaaS products such as Heroku, EngineYard, Google App Engine, etc. With Elastic Beanstalk, you don’t need to understand how any of the underlying magic works.

CloudFormation, on the other hand, doesn’t automatically do anything. It’s simply a way to define all the resources needed for deployment in a huge JSON file. So a CloudFormation template might actually create two ElasticBeanstalk environments (production and staging), a couple of ElastiCache clusters, a DynamoDB table, and then the proper DNS in Route53. I then upload this template to AWS, walk away, and 45 minutes later everything is ready and waiting. Since it’s just a plain-text JSON file, I can stick it in my source control which provides a great way to version my application deployments. It also ensures that I have a repeatable, “known good” configuration that I can quickly deploy in a different region.
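
As a minimal, hedged sketch of that “define resources in a template, upload it, walk away” workflow, the example below creates a stack containing a single S3 bucket via boto3. The stack name and template are illustrative only, not the answerer’s actual template.

import json
import boto3

# A tiny illustrative template: one S3 bucket. Real templates typically declare many
# resources (Beanstalk environments, ElastiCache clusters, DynamoDB tables, Route53 records, ...).
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        "AssetsBucket": {"Type": "AWS::S3::Bucket"},
    },
}

cfn = boto3.client("cloudformation", region_name="us-west-2")
cfn.create_stack(StackName="demo-stack", TemplateBody=json.dumps(template))

# Block until CloudFormation has finished creating every resource in the stack.
cfn.get_waiter("stack_create_complete").wait(StackName="demo-stack")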

#84 Best answer 2 of What is the difference between Elastic Beanstalk and CloudFormation for a .NET project? (Score: 66)

Created: 2017-01-04 Last updated: 2020-06-20

For getting started quickly deploying a standard .NET web-application, Elastic Beanstalk is the right service for you.

[Image: App Services Comparison Graphic]

AWS CloudFormation: “Template-Driven Provisioning”

AWS CloudFormation gives developers and systems administrators an easy way to create and manage a collection of related AWS resources, provisioning and updating them in an orderly and predictable fashion.

CloudFormation (CFn) is a lightweight, low-level abstraction over existing AWS APIs. Using a static JSON/YAML template document, you declare a set of Resources (such as an EC2 instance or an S3 bucket) that correspond to CRUD operations on the AWS APIs.

When you create a CloudFormation stack, CloudFormation calls the corresponding APIs to create the associated Resources, and when you delete a stack, CloudFormation calls the corresponding APIs to delete them. Most (but not all) AWS APIs are supported.

AWS Elastic Beanstalk: “Web Apps Made Easy”

AWS Elastic Beanstalk is an easy-to-use service for deploying and scaling web applications and services developed with Java, .NET, PHP, Node.js, Python, Ruby, Go, and Docker on familiar servers such as Apache, Nginx, Passenger, and IIS.

You can simply upload your code and Elastic Beanstalk automatically handles the deployment, from capacity provisioning, load balancing, auto-scaling to application health monitoring.

Elastic Beanstalk (EB) is a higher-level, managed ‘platform as a service’ (PaaS) for hosting web applications, similar in scope to Heroku. Rather than deal with low-level AWS resources directly, EB provides a fully-managed platform where you create an application environment using a web interface, select which platform your application uses, create and upload a source bundle, and EB handles the rest.

Using EB, you get all sorts of built-in features for monitoring your application environment and deploying new versions of your application.

Under the hood, EB uses CloudFormation to create and manage the application’s various AWS resources. You can customize and extend the default EB environment by adding CloudFormation Resources to an EB configuration file deployed with your application.

Conclusion

If your application is a standard web-tier application using one of Elastic Beanstalk’s supported platforms, and you want easy-to-manage, highly-scalable hosting for your application, use Elastic Beanstalk.

If you:

  • Want to manage all of your application’s AWS resources directly;
  • Want to manage or heavily customize your instance-provisioning or deployment process;
  • Need to use an application platform not supported by Elastic Beanstalk; or
  • Just don’t want/need any of the higher-level Elastic Beanstalk features

then use CloudFormation directly and avoid the added configuration layer of Elastic Beanstalk.

See also the original question on Stack Overflow

#85: Downloading folders from aws s3, cp or sync? (Score: 137)

Created: 2015-01-13 Last updated: 2017-10-31

Tags: windows, amazon-web-services, amazon-s3

If I want to download all the contents of a directory on S3 to my local PC, which command should I use: cp or sync?

Any help would be highly appreciated.

For example,

if I want to download all the contents of “this folder” to my desktop, would it look like this?

 aws s3 sync s3://"myBucket"/"this folder" C:\\Users\Desktop

#85 Best answer 1 of Downloading folders from aws s3, cp or sync? (Score: 249)

Created: 2015-01-14 Last updated: 2016-10-24

Using aws s3 cp from the AWS Command-Line Interface (CLI) will require the --recursive parameter to copy multiple files.

aws s3 cp s3://myBucket/dir localdir --recursive

The aws s3 sync command will, by default, copy a whole directory, but it will only copy files that are new or modified compared to the destination.

aws s3 sync s3://mybucket/dir localdir

Just experiment to get the result you want.

Documentation:

#85 Best answer 2 of Downloading folders from aws s3, cp or sync? (Score: 4)

Created: 2017-08-08 Last updated: 2017-08-08

In the case you want to download a single file, you can try the following command:

aws s3 cp s3://bucket/filename /path/to/dest/folder

See also the original question on Stack Overflow

#86: How can I use wildcards to cp a group of files with the AWS CLI (Score: 136)

Created: 2016-08-08 Last updated: 2016-12-13

Tags: amazon-web-services, amazon-s3, aws-cli

I’m having trouble using * in the AWS CLI to select a subset of files from a certain bucket.

Adding * to the path like this does not seem to work

aws s3 cp s3://data/2016-08* .

#86 Best answer 1 of How can I use wildcards to cp a group of files with the AWS CLI (Score: 235)

Created: 2016-08-08 Last updated: 2021-01-21

To download multiple files from an AWS S3 bucket to your current directory, you can use the --recursive, --exclude, and --include flags. The order of the parameters matters.

Example command:

aws s3 cp s3://data/ . --recursive --exclude "*" --include "2016-08*"

For more info on how to use these filters: http://docs.aws.amazon.com/cli/latest/reference/s3/#use-of-exclude-and-include-filters
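
If you are scripting this rather than using the CLI, a hedged boto3 alternative is to list keys by prefix and download them one by one. The bucket name below comes from the question and the prefix mirrors the “2016-08*” filter; adjust both for your case.

import os
import boto3

s3 = boto3.client("s3")
bucket = "data"      # bucket name taken from the question
prefix = "2016-08"   # keys starting with this prefix, equivalent to the "2016-08*" filter

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        if key.endswith("/"):
            continue  # skip folder placeholder keys
        # Save into the current directory using the last path component of the key.
        s3.download_file(bucket, key, os.path.basename(key))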

#86 Best answer 2 of How can I use wildcards to cp a group of files with the AWS CLI (Score: 67)

Created: 2017-10-12

The Order of the Parameters Matters

The --exclude and --include filters must be used in a specific order: exclude first, then include. The reverse will not work.

aws s3 cp s3://data/ . --recursive  --include "2016-08*" --exclude "*"

This will fail because the order of the parameters matters here: the --exclude "*" that comes last overrides the earlier --include, so everything is excluded.

aws s3 cp s3://data/ . --recursive --exclude "*" --include "2016-08*"

This one works because we first exclude everything and then include the keys matching the specific prefix.

See also the original question on Stack Overflow

#87: What is CPU Credit Balance in EC2? (Score: 136)

Created: 2015-03-11 Last updated: 2017-09-10

Tags: amazon-web-services, amazon-ec2

I came across CPU Credit Balance in EC2 monitoring. What is CPU Credit Balance?

#87 Best answer 1 of What is CPU Credit Balance in EC2? (Score: 246)

Created: 2015-04-06 Last updated: 2018-11-21

AWS EC2 has two different types of instances: Fixed Performance Instances (e.g. M3, C3, etc.) and Burstable Performance Instances (e.g. T2). Fixed Performance Instances provide consistent CPU performance, whereas Burstable Performance Instances provide a baseline CPU performance under normal workload. But when the workload increases, Burstable Performance Instances have the ability to burst, i.e. increase their CPU performance.

CPU Credits regulate the amount of CPU burst available to an instance. You spend CPU Credits to raise CPU performance during a burst period. Suppose you operate an instance at 100% CPU performance for 5 minutes: you will spend 5 (i.e. 5 * 1.0) CPU Credits. Similarly, if you run an instance at 50% CPU performance for 5 minutes, you will spend 2.5 (i.e. 5 * 0.5) CPU Credits.

CPU Credit Balance is simply the amount of CPU Credit available to your instance at any moment.

When you create an instance you get an initial CPU Credit balance. Every hour you automatically earn a certain number of CPU Credits (the amount depends on the instance type). If you don’t burst the CPU performance, the earned credits are added to your instance’s CPU Credit Balance. If you run out of CPU Credits (i.e. the CPU Credit Balance reaches 0), your instance works at its baseline performance.

Read more on CPU Credits and Baseline Performance for Burstable Performance Instances
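
If you want to watch this number programmatically rather than in the console, here is a hedged boto3 sketch that reads the CPUCreditBalance CloudWatch metric for a single instance; the region and instance ID are placeholders.

from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
now = datetime.now(timezone.utc)

# CPUCreditBalance is published by EC2 for burstable (e.g. T2) instances.
resp = cloudwatch.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="CPUCreditBalance",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],  # placeholder ID
    StartTime=now - timedelta(hours=3),
    EndTime=now,
    Period=300,
    Statistics=["Average"],
)

for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Average"])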

#87 Best answer 2 of What is CPU Credit Balance in EC2? (Score: 26)

Created: 2015-04-28

According to the official documentation:

Amazon EC2 allows you to choose between Fixed Performance Instances (e.g. M3, C3, and R3) and Burstable Performance Instances (e.g. T2). Burstable Performance Instances provide a baseline level of CPU performance with the ability to burst above the baseline. T2 instances are for workloads that don’t use the full CPU often or consistently, but occasionally need to burst.

T2 instances’ baseline performance and ability to burst are governed by CPU Credits. Each T2 instance receives CPU Credits continuously, the rate of which depends on the instance size. T2 instances accrue CPU Credits when they are idle, and use CPU credits when they are active. A CPU Credit provides the performance of a full CPU core for one minute.

See also the original question on Stack Overflow

#88: Why do we need private subnet in VPC? (Score: 134)

Created: 2014-03-05 Last updated: 2017-10-20

Tags: amazon-web-services, amazon-vpc, vpc

There are 4 scenarios in the AWS VPC configuration wizard, but let’s look at these two:

  • Scenario 1: 1 public subnet.
  • Scenario 2: 1 public subnet and 1 private subnet.

Since any instance launched in a public subnet does not have an EIP (unless one is assigned), it is already not addressable from the Internet. Then:

  • Why is there a need for private subnet?
  • What exactly are the differences between private and public subnets?

#88 Best answer 1 of Why do we need private subnet in VPC? (Score: 247)

Created: 2014-03-05 Last updated: 2016-10-13

Update: in late December, 2015, AWS announced a new feature, a Managed NAT Gateway for VPC. This optional service provides an alternative mechanism for VPC instances in a private subnet to access the Internet, where previously, the common solution was an EC2 instance on a public subnet within the VPC, functioning as a “NAT instance,” providing network address translation (technically, port address translation) for instances in other, private subnets, allowing those machines to use the NAT instance’s public IP address for their outbound Internet access.

The new managed NAT service does not fundamentally change the applicability of the following information, but this option is not addressed in the content that follows. A NAT instance can still be used as described, or the Managed NAT Gateway service can be provisioned, instead. An expanded version of this answer integrating more information about NAT Gateway and how it compares to a NAT instance will be forthcoming, as these are both relevant to the private/public subnet paradigm in VPC.

Note that the Internet Gateway and NAT Gateway are two different features. All VPC configurations with Internet access will have an Internet Gateway virtual object.


To understand the distinction between “private” and “public” subnets in Amazon VPC requires an understanding of how IP routing and network address translation (NAT) work in general, and how they are specifically implemented in VPC.

The core differentiation between a public and private subnet in VPC is defined by what that subnet’s default route is, in the VPC routing tables.

This configuration, in turn, dictates the validity of using, or not using, public IP addresses on instances on that particular subnet.

Each subnet has exactly one default route, which can be only one of two things:

  • the VPC’s “Internet Gateway” object, in the case of a “public” subnet, or
  • a NAT device – that is, either a NAT Gateway or an EC2 instance, performing the “NAT instance” role, in the case of a “private” subnet.

The Internet Gateway does not do any network address translation for instances without public IP addresses, so an instance without a public IP address cannot connect outward to the Internet – to do things like downloading software updates, or accessing other AWS resources like S3¹ and SQS – if the default route on its VPC subnet is the Internet Gateway object. So, if you are an instance on a “public” subnet, then you need a public IP address in order to do a significant number of things that servers commonly need to do.

For instances with only a private IP address, there’s an alternate way of outbound access to the Internet. This is where Network Address Translation² and a NAT instance come in.

The machines on a private subnet can access the Internet because the default route on a private subnet is not the VPC “Internet Gateway” object – it is an EC2 instance configured as a NAT instance.

A NAT instance is an instance on a public subnet with a public IP, and specific configuration. There are AMIs that are pre-built to do this, or you can build your own.

When the private-addressed machines send traffic outward, the traffic is sent, by VPC, to the NAT instance, which replaces the source IP address on the packet (the private machine’s private IP address) with its own public IP address, sends the traffic out to the Internet, accepts the response packets, and forwards them back to the private address of the originating machine. (It may also rewrite the source port, and in any case, it remembers the mappings so it knows which internal machine should receive the response packets). A NAT instance does not allow any “unexpected” inbound traffic to reach the private instances, unless it’s been specifically configured to do so.

Thus, when accessing external Internet resource from a private subnet, the traffic traverses the NAT instance, and appears to the destination to have originated from the public IP address of the NAT instance… so the response traffic comes back to the NAT instance. Neither the security group assigned to the NAT instance nor the security group assigned to the private instance need to be configured to “allow” this response traffic, because security groups are stateful. They realize the response traffic is correlated to sessions originated internally, so it is automatically allowed. Unexpected traffic is, of course, denied unless the security group is configured to permit it.

Unlike conventional IP routing, where your default gateway is on your same subnet, the way it works in VPC is different: the NAT instance for any given private subnet is always on a different subnet, and that other subnet is always a public subnet, because the NAT instance needs to have a public external IP, and its default gateway has to be the VPC “Internet Gateway” object.

Similarly… you cannot deploy an instance with a public IP on a private subnet. It doesn’t work, because the default route on a private subnet is (by definition) a NAT instance (which performs NAT on the traffic), and not the Internet Gateway object (which doesn’t). Inbound traffic from the Internet would hit the public IP of the instance, but the replies would try to route outward through the NAT instance, which would either drop the traffic (since it would be composed of replies to connections it’s not aware of, so they’d be deemed invalid) or would rewrite the reply traffic to use its own public IP address, which wouldn’t work since the external origin would not accept replies that came from an IP address other than the one they were trying to initiate communications with.

In essence, then, the “private” and “public” designations are not really about accessibility or inaccessibility from the Internet. They are about the kinds of addresses that will be assigned to the instances on that subnet, which is relevant because of the need to translate – or avoid translating – those IP addresses for Internet interactions.
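
To make the “default route defines the subnet type” point concrete, here is a hedged boto3 sketch that builds one public and one private subnet. The region and CIDR blocks are placeholders, and it uses the Managed NAT Gateway mentioned in the update above rather than a NAT instance.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

vpc_id = ec2.create_vpc(CidrBlock="10.0.0.0/16")["Vpc"]["VpcId"]
public_id = ec2.create_subnet(VpcId=vpc_id, CidrBlock="10.0.0.0/24")["Subnet"]["SubnetId"]
private_id = ec2.create_subnet(VpcId=vpc_id, CidrBlock="10.0.1.0/24")["Subnet"]["SubnetId"]

# Public subnet: its default route points at the VPC's Internet Gateway object.
igw_id = ec2.create_internet_gateway()["InternetGateway"]["InternetGatewayId"]
ec2.attach_internet_gateway(InternetGatewayId=igw_id, VpcId=vpc_id)
public_rt = ec2.create_route_table(VpcId=vpc_id)["RouteTable"]["RouteTableId"]
ec2.create_route(RouteTableId=public_rt, DestinationCidrBlock="0.0.0.0/0", GatewayId=igw_id)
ec2.associate_route_table(RouteTableId=public_rt, SubnetId=public_id)

# Private subnet: its default route points at a NAT Gateway living in the public subnet.
alloc_id = ec2.allocate_address(Domain="vpc")["AllocationId"]
nat_id = ec2.create_nat_gateway(SubnetId=public_id, AllocationId=alloc_id)["NatGateway"]["NatGatewayId"]
ec2.get_waiter("nat_gateway_available").wait(NatGatewayIds=[nat_id])
private_rt = ec2.create_route_table(VpcId=vpc_id)["RouteTable"]["RouteTableId"]
ec2.create_route(RouteTableId=private_rt, DestinationCidrBlock="0.0.0.0/0", NatGatewayId=nat_id)
ec2.associate_route_table(RouteTableId=private_rt, SubnetId=private_id)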

Since VPC has implicit routes from all VPC subnets to all other VPC subnets, the default route does not play a role in internal VPC traffic. Instances with private IP addresses will connect to other private IP addresses in the VPC “from” their private IP address, not “from” their public IP address (if they have one)… as long as the destination address is another private address within the VPC.

If your instances with private IP addresses never, under any circumstances, need to originate outbound Internet traffic, then they technically could be deployed on a “public” subnet and would still be inaccessible from the Internet… but under such a configuration, it is impossible for them to originate outbound traffic towards the Internet, which includes connections with other AWS infrastructure services, again, like S3¹ or SQS.


1. Regarding S3, specifically, to say that Internet access is always required is an oversimplification that will likely grow in scope over time and spread to other AWS services, as the capabilities of VPC continue to grow and evolve. There is a relatively new concept called a VPC Endpoint that allows your instances, including those with only private IP addresses, to directly access S3 from selected subnets within the VPC, without touching “the Internet,” and without using a NAT instance or NAT gateway, but this does require additional configuration, and is only usable to access buckets within the same AWS region as your VPC. By default, S3 – which is, as of this writing, the only service that has exposed the capability of creating VPC endpoints – is only accessible from inside VPC via the Internet. When you create a VPC endpoint, this creates a prefix list (pl-xxxxxxxx) that you can use in your VPC route tables to send traffic bound for that particular AWS service direct to the service via the virtual “VPC Endpoint” object. It also solves a problem of restricting outbound access to S3 for particular instance, because the prefix list can be used in outbound security groups, in place of a destination IP address or block – and an S3 VPC endpoint can be subject to additional policy statements, restricting bucket access from inside, as desired.

2. As noted in the documentation, what’s actually being discussed here is port as well as network address translation. It’s common, though technically a bit imprecise, to refer to the combined operation as “NAT.” This is somewhat akin to the way many of us tend to say “SSL” when we actually mean “TLS.” We know what we’re talking about, but we don’t use the most correct word to describe it. “Note We use the term NAT in this documentation to follow common IT practice, though the actual role of a NAT device is both address translation and port address translation (PAT)."

#88 Best answer 2 of Why do we need private subnet in VPC? (Score: 28)

Created: 2016-10-24

I’d suggest a different tack - ditch “private” subnets and NAT instances / gateways. They aren’t necessary. If you don’t want the machine to be accessible from the internet, don’t put it in a security group that allows such access.

By ditching the NAT instance / gateway, you are eliminating the running cost of the instance / gateway, and you eliminate the speed limit (be it 250mbit or 10gbit).

If you have a machine that also does not need to access the internet directly (and I would ask how you are patching it*), then by all means, don’t assign a public IP address.

*If the answer here is some kind of proxy, well, you’re incurring an overhead, but each to his own.

See also the original question on Stack Overflow

#89: Nodejs AWS SDK S3 Generate Presigned URL (Score: 131)

Created: 2016-08-08 Last updated: 2020-05-29

Tags: node.js, amazon-web-services, amazon-s3, aws-sdk-js

I am using the NodeJS AWS SDK to generate a presigned S3 URL. The docs give an example of generating a presigned URL.

Here is my exact code (with sensitive info omitted):

const AWS = require('aws-sdk')

const s3 = new AWS.S3()
AWS.config.update({accessKeyId: 'id-omitted', secretAccessKey: 'key-omitted'})

// Tried with and without this. Since s3 is not region-specific, I don't
// think it should be necessary.
// AWS.config.update({region: 'us-west-2'})

const myBucket = 'bucket-name'
const myKey = 'file-name.pdf'
const signedUrlExpireSeconds = 60 * 5

const url = s3.getSignedUrl('getObject', {
    Bucket: myBucket,
    Key: myKey,
    Expires: signedUrlExpireSeconds
})

console.log(url)

The generated URL looks like this:

https://bucket-name.s3-us-west-2.amazonaws.com/file-name.pdf?AWSAccessKeyId=[access-key-omitted]&Expires=1470666057&Signature=[signature-omitted]

I am copying that URL into my browser and getting the following response:

<Error>
  <Code>NoSuchBucket</Code>
  <Message>The specified bucket does not exist</Message>
  <BucketName>[bucket-name-omitted]</BucketName>
  <RequestId>D1A358D276305A5C</RequestId>
  <HostId>
    bz2OxmZcEM2173kXEDbKIZrlX508qSv+CVydHz3w6FFPFwC0CtaCa/TqDQYDmHQdI1oMlc07wWk=
  </HostId>
</Error>

I know the bucket exists. When I navigate to this item via the AWS Web GUI and double click on it, it opens the object with a URL like the following, and it works just fine:

https://s3-us-west-2.amazonaws.com/[bucket-name-omitted]/[file-name-omitted].pdf?X-Amz-Date=20160808T141832Z&X-Amz-Expires=300&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Signature=[signature-omitted]&X-Amz-Credential=ASIAJKXDBR5CW3XXF5VQ/20160808/us-west-2/s3/aws4_request&X-Amz-SignedHeaders=Host&x-amz-security-token=[really-long-key]

So I am led to believe that I must be doing something wrong with how I’m using the SDK.

#89 Best answer 1 of Nodejs AWS SDK S3 Generate Presigned URL (Score: 121)

Created: 2016-08-08 Last updated: 2019-07-05

Dustin,

Your code is correct; double-check the following:

  1. Your bucket access policy.

  2. Your bucket permission via your API key.

  3. Your API key and secret.

  4. Your bucket name and key.

#89 Best answer 2 of Nodejs AWS SDK S3 Generate Presigned URL (Score: 6)

Created: 2020-09-10

This question is very popular and the most popular answer says your code is correct, but there is a small problem in the code that can lead to a frustrating issue. So here is working code:

    const AWS = require('aws-sdk') // load the SDK before configuring it

    AWS.config.update({ 
        accessKeyId: ':)))',
        secretAccessKey: ':DDDD',
        region: 'ap-south-1',
        signatureVersion: 'v4'
    });

    const s3 = new AWS.S3()
    const myBucket = ':)))))'
    const myKey = ':DDDDDD'
    const signedUrlExpireSeconds = 60 * 5

    const url = s3.getSignedUrl('getObject', {
        Bucket: myBucket,
        Key: myKey,
        Expires: signedUrlExpireSeconds
    });

    console.log(url);

The noticeable difference is that the s3 object is created after the config update; without this, the config does not take effect and the generated URL doesn’t work.

See also the original question on Stack Overflow

#90: How to write a file or data to an S3 object using boto3 (Score: 128)

Created: 2016-10-31 Last updated: 2017-10-24

Tags: python, amazon-web-services, amazon-s3, boto, boto3

In boto 2, you can write to an S3 object using these methods on a Key object:

  • Key.set_contents_from_string()
  • Key.set_contents_from_file()
  • Key.set_contents_from_filename()
  • Key.set_contents_from_stream()

Is there a boto 3 equivalent? What is the boto3 method for saving data to an object stored on S3?

#90 Best answer 1 of How to write a file or data to an S3 object using boto3 (Score: 255)

Created: 2016-10-31 Last updated: 2020-06-20

In boto 3, the ‘Key.set_contents_from_…’ methods were replaced by Object.put() and Client.put_object().

For example:

import boto3

some_binary_data = b'Here we have some data'
more_binary_data = b'Here we have some more data'

# Method 1: Object.put()
s3 = boto3.resource('s3')
object = s3.Object('my_bucket_name', 'my/key/including/filename.txt')
object.put(Body=some_binary_data)

# Method 2: Client.put_object()
client = boto3.client('s3')
client.put_object(Body=more_binary_data, Bucket='my_bucket_name', Key='my/key/including/anotherfilename.txt')

Alternatively, the binary data can come from reading a file, as described in the official docs comparing boto 2 and boto 3:

Storing Data

Storing data from a file, stream, or string is easy:

# Boto 2.x
from boto.s3.key import Key
key = Key('hello.txt')
key.set_contents_from_file('/tmp/hello.txt')

# Boto 3
s3.Object('mybucket', 'hello.txt').put(Body=open('/tmp/hello.txt', 'rb'))

#90 Best answer 2 of How to write a file or data to an S3 object using boto3 (Score: 63)

Created: 2018-03-08 Last updated: 2020-11-22

boto3 also has a method for uploading a file directly:

s3 = boto3.resource('s3')    
s3.Bucket('bucketname').upload_file('/local/file/here.txt','folder/sub/path/to/s3key')

http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Bucket.upload_file

See also the original question on Stack Overflow


Notes:
  1. This page uses an API to get the relevant data from the Stack Overflow community.
  2. Content license on this page is CC BY-SA 3.0.
  3. score = up votes - down votes.