astrobase.awsutils module

This contains functions that handle various AWS services for use with lcproc_aws.py.

astrobase.awsutils.ec2_ssh(ip_address, keypem_file, username='ec2-user', raiseonfail=False)

This opens an SSH connection to the EC2 instance at ip_address.

Parameters:
  • ip_address (str) – IP address of the AWS EC2 instance to connect to.
  • keypem_file (str) – The path to the keypair PEM file generated by AWS to allow SSH connections.
  • username (str) – The username to use to login to the EC2 instance.
  • raiseonfail (bool) – If True, will re-raise whatever Exception caused the operation to fail and break out immediately.
Returns:

The connected SSH client. This has all the usual paramiko functionality:

  • Use SSHClient.exec_command(command, environment=None) to execute a shell command.
  • Use SSHClient.open_sftp() to get an SFTPClient for the server. Then call SFTPClient.get() and .put() to copy files from and to the server.

Return type:

paramiko.SSHClient
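
As a minimal sketch of using the returned client, the helper below runs one command and collects its output. The name run_remote is hypothetical and not part of astrobase; it relies only on the standard paramiko behaviour that exec_command returns stdin, stdout, and stderr file-like objects. It reads stdout fully before stderr, which is fine for short commands but may stall on very large outputs.

```python
def run_remote(ssh_client, command):
    """Run `command` on an open SSH connection.

    Returns (exit_code, stdout_text, stderr_text).
    `ssh_client` is expected to behave like paramiko.SSHClient.
    """
    # exec_command returns three file-like objects: stdin, stdout, stderr
    _stdin, stdout, stderr = ssh_client.exec_command(command)
    out = stdout.read().decode("utf-8")
    err = stderr.read().decode("utf-8")
    # the exit status lives on the channel behind stdout
    exit_code = stdout.channel.recv_exit_status()
    return exit_code, out, err


# Usage sketch (needs astrobase, paramiko, and a reachable instance;
# the address and key path below are placeholders):
#
#   from astrobase.awsutils import ec2_ssh
#   client = ec2_ssh("203.0.113.10", "/path/to/keypair.pem")
#   code, out, err = run_remote(client, "uname -a")
#   client.close()
```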

astrobase.awsutils.s3_get_file(bucket, filename, local_file, altexts=None, client=None, raiseonfail=False)

This gets a file from an S3 bucket.

Parameters:
  • bucket (str) – The AWS S3 bucket name.
  • filename (str) – The full filename of the file to get from the bucket.
  • local_file (str) – Path to where the downloaded file will be stored.
  • altexts (None or list of str) – If not None, this is a list of alternate extensions to try for the file other than the one provided in filename. For example, to get anything that’s an .sqlite where .sqlite.gz is expected, use altexts=[''] to strip the .gz.
  • client (boto3.Client or None) – If None, this function will instantiate a new boto3.Client object to use in its operations. Alternatively, pass in an existing boto3.Client instance to re-use it here.
  • raiseonfail (bool) – If True, will re-raise whatever Exception caused the operation to fail and break out immediately.
Returns:

Path to the downloaded filename or None if the download was unsuccessful.

Return type:

str or None
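
The altexts description can be illustrated with a small sketch. The exact substitution rule is not spelled out above, so this assumes each alternate extension replaces only the final extension of filename, which matches the .sqlite.gz example. The helper candidate_keys and the key name are hypothetical.

```python
import os.path


def candidate_keys(filename, altexts=None):
    """List the S3 keys that would be tried, in order, under the assumed rule."""
    keys = [filename]
    if altexts is not None:
        # os.path.splitext splits off only the last extension:
        # 'lc.sqlite.gz' -> ('lc.sqlite', '.gz')
        stem, _last_ext = os.path.splitext(filename)
        keys.extend(stem + alt for alt in altexts)
    return keys


# altexts=[''] replaces '.gz' with nothing, leaving the bare .sqlite name
print(candidate_keys("lcs/object-1.sqlite.gz", altexts=[""]))
```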

astrobase.awsutils.s3_get_url(url, altexts=None, client=None, raiseonfail=False)

This gets a file from an S3 bucket based on its s3:// URL.

Parameters:
  • url (str) – S3 URL to download. This should begin with ‘s3://’.
  • altexts (None or list of str) – If not None, this is a list of alternate extensions to try for the file other than the one provided in url. For example, to get anything that’s an .sqlite where .sqlite.gz is expected, use altexts=[''] to strip the .gz.
  • client (boto3.Client or None) – If None, this function will instantiate a new boto3.Client object to use in its operations. Alternatively, pass in an existing boto3.Client instance to re-use it here.
  • raiseonfail (bool) – If True, will re-raise whatever Exception caused the operation to fail and break out immediately.
Returns:

Path to the downloaded filename or None if the download was unsuccessful. The file will be downloaded into the current working directory and will have a filename == basename of the file on S3.

Return type:

str or None
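
To make the naming rule concrete, here is a minimal sketch of how an s3:// URL breaks down into a bucket, a key, and the local filename described above. The helper split_s3_url and the example URL are hypothetical; this is not necessarily how the function parses URLs internally.

```python
import posixpath


def split_s3_url(url):
    """Split 's3://bucket/some/key' into (bucket, key, local_basename)."""
    prefix = "s3://"
    if not url.startswith(prefix):
        raise ValueError("expected a URL beginning with 's3://'")
    # everything up to the first '/' after the scheme is the bucket name
    bucket, _, key = url[len(prefix):].partition("/")
    # the download lands in the current directory under the key's basename
    local_name = posixpath.basename(key)
    return bucket, key, local_name


print(split_s3_url("s3://my-bucket/lcs/object-1.sqlite.gz"))
```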

astrobase.awsutils.s3_put_file(local_file, bucket, client=None, raiseonfail=False)

This uploads a file to S3.

Parameters:
  • local_file (str) – Path to the file to upload to S3.
  • bucket (str) – The AWS S3 bucket to upload the file to.
  • client (boto3.Client or None) – If None, this function will instantiate a new boto3.Client object to use in its operations. Alternatively, pass in an existing boto3.Client instance to re-use it here.
  • raiseonfail (bool) – If True, will re-raise whatever Exception caused the operation to fail and break out immediately.
Returns:

If the file upload is successful, returns the s3:// URL of the uploaded file. If it failed, will return None.

Return type:

str or None
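
Because a failed upload is reported as None rather than as an exception (unless raiseonfail is set), callers uploading several files may want to track failures explicitly. The sketch below shows one way to do that; upload_all is a hypothetical helper, and put_file stands in for s3_put_file so the logic can be exercised without AWS access.

```python
def upload_all(put_file, local_files, bucket):
    """Upload each path and separate successes from failures.

    `put_file` is any callable with the (local_file, bucket) calling
    convention that returns an s3:// URL on success and None on failure.
    """
    uploaded = {}   # local path -> s3:// URL
    failed = []     # local paths whose upload returned None
    for path in local_files:
        url = put_file(path, bucket)
        if url is None:
            failed.append(path)
        else:
            uploaded[path] = url
    return uploaded, failed


# Usage sketch (bucket name is a placeholder):
#
#   from astrobase.awsutils import s3_put_file
#   done, failed = upload_all(s3_put_file, ["a.pkl", "b.pkl"], "my-output-bucket")
```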

astrobase.awsutils.s3_delete_file(bucket, filename, client=None, raiseonfail=False)

This deletes a file from S3.

Parameters:
  • bucket (str) – The AWS S3 bucket to delete the file from.
  • filename (str) – The full file name of the file to delete, including any prefixes.
  • client (boto3.Client or None) – If None, this function will instantiate a new boto3.Client object to use in its operations. Alternatively, pass in an existing boto3.Client instance to re-use it here.
  • raiseonfail (bool) – If True, will re-raise whatever Exception caused the operation to fail and break out immediately.
Returns:

If the file was successfully deleted, will return the delete-marker (https://docs.aws.amazon.com/AmazonS3/latest/dev/DeleteMarker.html). If it wasn’t, returns None.

Return type:

str or None

astrobase.awsutils.sqs_create_queue(queue_name, options=None, client=None)

This creates an SQS queue.

Parameters:
  • queue_name (str) – The name of the queue to create.
  • options (dict or None) – A dict of options indicating extra attributes the queue should have. See the SQS docs for details. If None, no custom attributes will be attached to the queue.
  • client (boto3.Client or None) – If None, this function will instantiate a new boto3.Client object to use in its operations. Alternatively, pass in an existing boto3.Client instance to re-use it here.
Returns:

This returns a dict of the form:

{'url': SQS URL of the queue,
 'name': name of the queue}

Return type:

dict
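
As an illustration of what an options dict might look like, the sketch below builds two standard SQS queue attributes. It assumes options is handed to SQS as queue attributes, where both names and values are strings; the helper queue_options is hypothetical, and the limits checked are the documented SQS ranges for these two attributes.

```python
def queue_options(visibility_timeout_s=30, retention_days=4):
    """Build an SQS attribute dict with string values."""
    retention_s = retention_days * 24 * 3600
    # SQS limits: visibility timeout 0 s to 12 h, retention 60 s to 14 days
    if not 0 <= visibility_timeout_s <= 43200:
        raise ValueError("visibility timeout must be within 0..43200 seconds")
    if not 60 <= retention_s <= 1209600:
        raise ValueError("retention must be within 60 seconds..14 days")
    return {
        "VisibilityTimeout": str(visibility_timeout_s),
        "MessageRetentionPeriod": str(retention_s),
    }


# Usage sketch (queue name follows the lcproc_queue_<action> convention):
#
#   from astrobase.awsutils import sqs_create_queue
#   queue = sqs_create_queue("lcproc_queue_runcp", options=queue_options(300, 4))
#   queue_url = queue["url"]
```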

astrobase.awsutils.sqs_delete_queue(queue_url, client=None)

This deletes an SQS queue given its URL.

Parameters:
  • queue_url (str) – The SQS URL of the queue to delete.
  • client (boto3.Client or None) – If None, this function will instantiate a new boto3.Client object to use in its operations. Alternatively, pass in an existing boto3.Client instance to re-use it here.
Returns:

True if the queue was deleted successfully. False otherwise.

Return type:

bool

astrobase.awsutils.sqs_put_item(queue_url, item, delay_seconds=0, client=None, raiseonfail=False)

This pushes a dict serialized to JSON to the specified SQS queue.

Parameters:
  • queue_url (str) – The SQS URL of the queue to push the object to.
  • item (dict) – The dict passed in here will be serialized to JSON.
  • delay_seconds (int) – The amount of time in seconds the pushed item will be held before going ‘live’ and being visible to all queue consumers.
  • client (boto3.Client or None) – If None, this function will instantiate a new boto3.Client object to use in its operations. Alternatively, pass in an existing boto3.Client instance to re-use it here.
  • raiseonfail (bool) – If True, will re-raise whatever Exception caused the operation to fail and break out immediately.
Returns:

If the item was successfully put on the queue, will return the response from the service. If it wasn’t, will return None.

Return type:

boto3.Response or None
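
The following sketch builds a work item in the layout that lcproc_aws.py expects (the keys are the ones listed under sqs_get_item) and shows what JSON serialization does to it. The helper make_work_item and all values are hypothetical placeholders. One detail worth noting: JSON has no tuple type, so args comes back as a list on the consumer side.

```python
import json


def make_work_item(target, action, args=(), kwargs=None,
                   outbucket=None, outqueue=None):
    """Assemble a work-item dict using the documented keys."""
    item = {
        "target": target,
        "action": action,
        "args": tuple(args),
        "kwargs": dict(kwargs or {}),
        "outbucket": outbucket,
    }
    # 'outqueue' is optional, so only include it when given
    if outqueue is not None:
        item["outqueue"] = outqueue
    return item


item = make_work_item(
    "s3://my-input-bucket/lcs/object-1.sqlite.gz",  # placeholder target
    "runpf",
    args=("placeholder-arg",),                      # placeholder args
    kwargs={"placeholder_option": 1},               # placeholder kwargs
    outbucket="my-output-bucket",
)

# round trip through JSON, as happens between producer and consumer
received = json.loads(json.dumps(item))
print(type(item["args"]).__name__, "->", type(received["args"]).__name__)
```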

astrobase.awsutils.sqs_get_item(queue_url, max_items=1, wait_time_seconds=5, client=None, raiseonfail=False)

This gets items from the SQS queue: a single item by default, or up to max_items per request.

The queue_url consists of an SQS-assigned prefix followed by the queue name. For our purposes (lcproc_aws.py), the queue name will be something like:

lcproc_queue_<action>

where action is one of:

runcp
runpf

The item is always a JSON object:

{'target': S3 bucket address of the file to process,
 'action': the action to perform on the file ('runpf', 'runcp', etc.),
 'args': the action's args as a tuple (not including filename, which is
         generated randomly as a temporary local file),
 'kwargs': the action's kwargs as a dict,
 'outbucket': S3 bucket to write the result to,
 'outqueue': SQS queue to write the processed item's info to (optional)}

The action MUST match the <action> in the queue name for this item to be processed.

Parameters:
  • queue_url (str) – The SQS URL of the queue to get messages from.
  • max_items (int) – The number of items to pull from the queue in this request.
  • wait_time_seconds (int) – This specifies how long the function should block until a message is received on the queue. If the timeout expires, an empty list will be returned. If the timeout doesn’t expire, the function will return a list of items received (up to max_items).
  • client (boto3.Client or None) – If None, this function will instantiate a new boto3.Client object to use in its operations. Alternatively, pass in an existing boto3.Client instance to re-use it here.
  • raiseonfail (bool) – If True, will re-raise whatever Exception caused the operation to fail and break out immediately.
Returns:

For each item pulled from the queue in this request (up to max_items), a dict will be deserialized from the retrieved JSON, containing the message items and various metadata. The most important item of the metadata is the receipt_handle, which can be used to acknowledge receipt of all items in this request (see sqs_delete_item below).

If the queue pull fails outright, returns None. If no messages are available for this queue pull, returns an empty list.

Return type:

list of dicts or None

astrobase.awsutils.sqs_delete_item(queue_url, receipt_handle, client=None, raiseonfail=False)

This deletes a message from the queue, effectively acknowledging its receipt.

Call this only when all messages retrieved from the queue have been processed, since this will prevent redelivery of these messages to other queue workers pulling from the same queue channel.

Parameters:
  • queue_url (str) – The SQS URL of the queue where we got the messages from. This should be the same queue used to retrieve the messages in sqs_get_item.
  • receipt_handle (str) – The receipt handle of the queue message that we’re responding to, and will acknowledge receipt of. This will be present in each message retrieved using sqs_get_item.
  • client (boto3.Client or None) – If None, this function will instantiate a new boto3.Client object to use in its operations. Alternatively, pass in an existing boto3.Client instance to re-use it here.
  • raiseonfail (bool) – If True, will re-raise whatever Exception caused the operation to fail and break out immediately.
Returns:

Nothing.

Return type:

None
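
The receive, process, acknowledge cycle can be sketched as below. The helper process_batch is hypothetical, and get_items and delete_item stand in for sqs_get_item and sqs_delete_item so the control flow can be tested offline. The 'receipt_handle' key follows the description above; the 'item' key for the deserialized payload is an assumption, so check the dicts you actually receive. Each message is acknowledged only after its handler returns, so a handler that raises leaves the message eligible for redelivery.

```python
def process_batch(get_items, delete_item, queue_url, handler, max_items=1):
    """Pull up to `max_items` messages, handle each, then acknowledge it.

    Returns the number of messages processed, or None if the pull failed.
    """
    messages = get_items(queue_url, max_items=max_items)
    if messages is None:
        # the queue pull failed outright
        return None
    processed = 0
    for message in messages:
        # 'item' is an ASSUMED key name for the deserialized payload
        handler(message["item"])
        # acknowledge only after the handler succeeded
        delete_item(queue_url, message["receipt_handle"])
        processed += 1
    return processed


# Usage sketch:
#
#   from astrobase.awsutils import sqs_get_item, sqs_delete_item
#   n = process_batch(sqs_get_item, sqs_delete_item, queue_url, print)
```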

astrobase.awsutils.make_ec2_nodes(security_groupid, subnet_id, keypair_name, iam_instance_profile_arn, launch_instances=1, ami='ami-04681a1dbd79675a5', instance='t3.micro', ebs_optimized=True, user_data=None, wait_until_up=True, client=None, raiseonfail=False)

This makes new EC2 worker nodes.

This requires a security group ID attached to a VPC config and subnet, a keypair generated beforehand, and an IAM role ARN for the instance. See:

https://docs.aws.amazon.com/cli/latest/userguide/tutorial-ec2-ubuntu.html

Use user_data to launch tasks on instance launch.

Parameters:
  • security_groupid (str) – The security group ID of the AWS VPC where the instances will be launched.
  • subnet_id (str) – The subnet ID of the AWS VPC where the instances will be launched.
  • keypair_name (str) – The name of the keypair to be used to allow SSH access to all instances launched here. This corresponds to an already downloaded AWS keypair PEM file.
  • iam_instance_profile_arn (str) – The ARN string corresponding to the AWS instance profile that describes the permissions the launched instances have to access other AWS resources. Set this up in AWS IAM.
  • launch_instances (int) – The number of instances to launch in this request.
  • ami (str) – The Amazon Machine Image ID that describes the OS the instances will use after launch. The default ID is Amazon Linux 2 in the US East region.
  • instance (str) – The instance type to launch. See the following URL for a list of IDs: https://aws.amazon.com/ec2/pricing/on-demand/
  • ebs_optimized (bool) – If True, will enable EBS optimization to speed up IO. This is usually True for all instances made available in the last couple of years.
  • user_data (str or None) – This is either the path to a file on disk that contains a shell-script or a string containing a shell-script that will be executed by root right after the instance is launched. Use to automatically set up workers and queues. If None, will not execute anything at instance start up.
  • wait_until_up (bool) – If True, will not return from this function until all launched instances are verified as running by AWS.
  • client (boto3.Client or None) – If None, this function will instantiate a new boto3.Client object to use in its operations. Alternatively, pass in an existing boto3.Client instance to re-use it here.
  • raiseonfail (bool) – If True, will re-raise whatever Exception caused the operation to fail and break out immediately.
Returns:

Returns launched instance info as a dict, keyed by instance ID.

Return type:

dict
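
Since user_data may be either a path or a literal script, the sketch below shows one plausible way to tell the two apart: treat the value as a path if a file exists there, otherwise as the script text itself. This is an illustration of the documented behaviour, not necessarily the function's own logic; resolve_user_data and the bootstrap script are hypothetical.

```python
import os.path

# hypothetical bootstrap script; replace the body with your own worker setup
BOOTSTRAP = """#!/bin/bash
echo "worker starting" > /tmp/worker-bootstrap.log
"""


def resolve_user_data(user_data):
    """Return the shell-script text for `user_data`, or None for no script."""
    if user_data is None:
        return None
    # an existing file on disk is read; anything else is taken literally
    if os.path.isfile(user_data):
        with open(user_data, "r") as infd:
            return infd.read()
    return user_data


print(resolve_user_data(BOOTSTRAP).splitlines()[0])
```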

astrobase.awsutils.delete_ec2_nodes(instance_id_list, client=None)

This deletes EC2 nodes and terminates the instances.

Parameters:
  • instance_id_list (list of str) – A list of EC2 instance IDs to terminate.
  • client (boto3.Client or None) – If None, this function will instantiate a new boto3.Client object to use in its operations. Alternatively, pass in an existing boto3.Client instance to re-use it here.
Returns:

Nothing.

Return type:

None
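
Launched instances keep running until they are terminated, so it can help to pair the launch with a guaranteed teardown. The sketch below does this with try/finally. The helper with_ec2_nodes is hypothetical; make_nodes and delete_nodes stand in for make_ec2_nodes and delete_ec2_nodes, and the only behaviour relied on is that the launch result is a dict keyed by instance ID.

```python
def with_ec2_nodes(make_nodes, delete_nodes, launch_kwargs, work):
    """Launch nodes, run `work(instances)`, and always terminate the nodes."""
    instances = make_nodes(**launch_kwargs)
    if not instances:
        raise RuntimeError("no instances were launched")
    # the launch result is keyed by instance ID
    instance_ids = list(instances)
    try:
        return work(instances)
    finally:
        # runs whether `work` returned normally or raised
        delete_nodes(instance_ids)


# Usage sketch (all IDs below are placeholders):
#
#   from astrobase.awsutils import make_ec2_nodes, delete_ec2_nodes
#   launch = dict(security_groupid="sg-...", subnet_id="subnet-...",
#                 keypair_name="my-keypair",
#                 iam_instance_profile_arn="arn:aws:iam::...",
#                 launch_instances=2)
#   with_ec2_nodes(make_ec2_nodes, delete_ec2_nodes, launch, do_work)
```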

astrobase.awsutils.make_spot_fleet_cluster(security_groupid, subnet_id, keypair_name, iam_instance_profile_arn, spot_fleet_iam_role, target_capacity=20, spot_price=0.4, expires_days=7, allocation_strategy='lowestPrice', instance_types=['m5.xlarge', 'm5.2xlarge', 'c5.xlarge', 'c5.2xlarge', 'c5.4xlarge'], instance_weights=None, instance_ami='ami-04681a1dbd79675a5', instance_user_data=None, instance_ebs_optimized=True, wait_until_up=True, client=None, raiseonfail=False)

This makes an EC2 spot-fleet cluster.

This requires a security group ID attached to a VPC config and subnet, a keypair generated beforehand, and an IAM role ARN for the instance. See:

https://docs.aws.amazon.com/cli/latest/userguide/tutorial-ec2-ubuntu.html

Use instance_user_data to run tasks when each instance launches.

Parameters:
  • security_groupid (str) – The security group ID of the AWS VPC where the instances will be launched.
  • subnet_id (str) – The subnet ID of the AWS VPC where the instances will be launched.
  • keypair_name (str) – The name of the keypair to be used to allow SSH access to all instances launched here. This corresponds to an already downloaded AWS keypair PEM file.
  • iam_instance_profile_arn (str) – The ARN string corresponding to the AWS instance profile that describes the permissions the launched instances have to access other AWS resources. Set this up in AWS IAM.
  • spot_fleet_iam_role (str) – This is the name of the AWS IAM role that allows the Spot Fleet Manager to scale up and down instances based on demand and instances failing, etc. Set this up in IAM.
  • target_capacity (int) – The number of instances to target in the fleet request. The fleet manager service will attempt to maintain this number over the lifetime of the Spot Fleet Request.
  • spot_price (float) – The bid price in USD for the instances. This is per hour. Keep this at about half the hourly on-demand price of the desired instances to make sure your instances aren’t taken away by AWS when it needs capacity.
  • expires_days (int) – The number of days this request is active for. All instances launched by this request will live at least this long and will be terminated automatically after.
  • allocation_strategy ({'lowestPrice', 'diversified'}) – The allocation strategy used by the fleet manager.
  • instance_types (list of str) – List of the instance types to launch. See the following URL for a list of IDs: https://aws.amazon.com/ec2/pricing/on-demand/
  • instance_weights (list of float or None) – If instance_types is a list of different instance types, this is the relative weight applied towards launching each instance type. This can be used to launch a mix of instances in a defined ratio among their types. Doing this can make the spot fleet more resilient to AWS taking back the instances if it runs out of capacity.
  • instance_ami (str) – The Amazon Machine Image ID that describes the OS the instances will use after launch. The default ID is Amazon Linux 2 in the US East region.
  • instance_user_data (str or None) – This is either the path to a file on disk that contains a shell-script or a string containing a shell-script that will be executed by root right after the instance is launched. Use to automatically set up workers and queues. If None, will not execute anything at instance start up.
  • instance_ebs_optimized (bool) – If True, will enable EBS optimization to speed up IO. This is usually True for all instances made available in the last couple of years.
  • wait_until_up (bool) – If True, will not return from this function until the spot fleet request is acknowledged by AWS.
  • client (boto3.Client or None) – If None, this function will instantiate a new boto3.Client object to use in its operations. Alternatively, pass in an existing boto3.Client instance to re-use it here.
  • raiseonfail (bool) – If True, will re-raise whatever Exception caused the operation to fail and break out immediately.
Returns:

This is the spot fleet request ID if successful. Otherwise, returns None.

Return type:

str or None
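
The pricing and weighting guidance above can be sketched as a small helper that prepares keyword arguments for a fleet request. The name fleet_settings is hypothetical, the reference price is a placeholder and not a real AWS price, and the check that instance_weights has one entry per instance type is an assumption based on the parameter description.

```python
def fleet_settings(instance_types, reference_on_demand_usd,
                   instance_weights=None, target_capacity=20):
    """Prepare spot-fleet keyword arguments from a reference hourly price."""
    if instance_weights is not None:
        # ASSUMPTION: exactly one weight per instance type
        if len(instance_weights) != len(instance_types):
            raise ValueError("need exactly one weight per instance type")
        instance_weights = [float(w) for w in instance_weights]
    return {
        "instance_types": list(instance_types),
        "instance_weights": instance_weights,
        # bid about half the hourly on-demand price, as suggested above
        "spot_price": round(reference_on_demand_usd / 2.0, 4),
        "target_capacity": target_capacity,
    }


# 0.80 USD/hour is a made-up reference price for illustration only
settings = fleet_settings(["m5.xlarge", "c5.xlarge"], 0.80,
                          instance_weights=[1, 1])
print(settings["spot_price"])
```

The resulting dict could then be merged with the required network, keypair, and IAM arguments and passed as keyword arguments to make_spot_fleet_cluster.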

astrobase.awsutils.delete_spot_fleet_cluster(spot_fleet_reqid, client=None)

This deletes a spot-fleet cluster.

Parameters:
  • spot_fleet_reqid (str) – The fleet request ID returned by make_spot_fleet_cluster.
  • client (boto3.Client or None) – If None, this function will instantiate a new boto3.Client object to use in its operations. Alternatively, pass in an existing boto3.Client instance to re-use it here.
Returns:

Nothing.

Return type:

None