
Mastering AWS Lambda: Optimize Cost and Performance


Let’s debunk a common myth: that serverless is always more expensive. Depending on how you manage resources, operational costs can grow in both server-based and serverless architectures, and both require sound optimization techniques.

In a serverless offering like AWS Lambda, you don’t manage or provision servers. Amazon Web Services (AWS) automatically handles scaling, patching and infrastructure, allowing you to focus on code and deployments. In a server-based architecture, on the other hand, you’re responsible for provisioning virtual or physical servers, as well as setting up, configuring and maintaining the server infrastructure.

Why Serverless?

The choice depends on your application’s specific requirements and business goals. Some organizations adopt both approaches to get the best of both worlds, but for now, let’s focus on serverless architectures using AWS Lambda as the example.

Advantages

  1. Cost efficiency: Pay only for what you use (e.g., duration of AWS Lambda executions).
  2. Scalability: AWS Lambda autoscales to meet demand, whether during a high-traffic event or a quiet period.
  3. Reduced operational overhead: No need for server management.
  4. Speed: Rapid development and deployment.

Trade-offs

Although AWS Lambda has numerous advantages, some reasons organizations might not adopt a serverless strategy include:

1. Cold Starts

AWS defines a cold start as the time it takes to download your code and initialize a new execution environment. This added latency can degrade the user experience.

There are ways to mitigate cold starts, such as making the right architectural decisions (e.g., runtime optimizations, or AWS features like provisioned concurrency and manual warm-up strategies). AWS provides other tools, such as Lambda@Edge and API Gateway caching, but at an extra cost.
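For example, a manual warm-up can be as simple as a scheduled EventBridge rule that pings the function so an execution environment stays initialized. Here’s a minimal sketch; the warmup flag and schedule are assumptions, not an official AWS pattern:

// Hypothetical keep-warm handler: an EventBridge schedule (e.g., rate(5 minutes))
// invokes the function with a payload like { "warmup": true }
export const handler = async (event: { warmup?: boolean }) => {
  if (event.warmup) {
    // Exit immediately; the only goal is to keep the environment warm
    return { statusCode: 200, body: 'warm' };
  }
  // ... normal business logic for real requests ...
  return { statusCode: 200, body: 'ok' };
};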

2. Cost Misalignment

Serverless pricing can become expensive with high-frequency invocations or long execution times compared to server-based solutions. Running containers on Elastic Container Service (ECS) or EC2 instances can be more cost-effective for consistent workloads because you pay a flat per-instance rate.

While cost savings are achievable, they may diminish if configurations are not optimized. For example, to manage traffic effectively on servers, you need to set a threshold that caps maximum throughput before an additional server is provisioned. However, this approach can result in extra charges and delays while the new server boots.

Overprovisioning is also a potential problem as it can lead to unnecessary cost overhead.

3. Customization Needs

Some use cases require low-level customization, which serverless abstractions may not support.

Serverless solutions are highly tied to specific cloud providers. Therefore, moving workloads to another provider, such as from AWS Lambda to Azure Functions, can involve significant reengineering.

Companies with hybrid cloud strategies may prefer portable solutions like Docker or Kubernetes and therefore choose services like ECS or EKS.

4. Use Case Mismatch With Serverless

Applications with long-running processes or those requiring fine-grained control over hardware resources might not be a good fit for serverless, in part because of AWS Lambda’s 15-minute timeout. Video processing, batch jobs and machine learning training workloads, for example, may not fit.

Serverless functions are also stateless, which makes them a poor fit for long-lived connections or in-memory state. For such cases, AWS Step Functions can orchestrate event-driven workflows across multiple short-lived functions.
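For instance, rather than one long-running invocation, a state machine can chain several short Lambda executions. A sketch (the state machine ARN and input shape are placeholders):

import { StepFunctions } from 'aws-sdk';

const stepFunctions = new StepFunctions();

// Start a long-running, event-driven workflow; each state can be a short
// Lambda invocation that stays well under the 15-minute limit
export const startVideoPipeline = async (videoId: string): Promise<string> => {
  const result = await stepFunctions.startExecution({
    stateMachineArn: 'arn:aws:states:us-east-1:123456789012:stateMachine:VideoPipeline',
    input: JSON.stringify({ videoId }),
  }).promise();
  return result.executionArn;
};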

5. Poor Organizational Fit

Large companies like Google and Amazon have the resources and infrastructure to run server-based architectures internally, or even serverless platforms on their own servers, giving them more flexibility to choose the best solution. These providers often have idle servers with no code executing that can absorb workloads whenever execution is needed.

Serverless emerged as an evolution of server-based architectures, primarily to solve challenges such as:

  • Overprovisioning and manual scaling: AWS Lambda automatically scales based on demand. It also scales to zero when not in use, thus eliminating the cost of maintaining idle servers.
  • Maintenance costs: Lambda abstracts the underlying infrastructure, so AWS handles server maintenance, including patching and availability, and developers can focus on writing and deploying code.
  • Long development and deployment cycles: Developers can iterate faster since they don’t need to think about server management or provisioning, which also streamlines DevOps practices.

Estimating Serverless Costs

To get the right cost estimates for serverless applications, you need to account for:

  1. Invocation frequency: Number of function calls.
  2. Execution duration: Time your function takes to execute.
  3. Memory allocation: Amount of memory assigned to the function.
  4. Data transfer: Costs incurred for data transferred in and out of your application.

Suppose you’re running a Lambda function with 512MB of memory, invoked 100 million times per month, with each execution lasting 200 milliseconds. The cost calculation would be:

const memoryInMB = 512;
const executionTimeInSeconds = 200 / 1000; // 200 ms converted to seconds
const monthlyInvocations = 100000000;

const gbSeconds = (memoryInMB / 1024) * executionTimeInSeconds * monthlyInvocations;
const costPerGBSecond = 0.00001667; // AWS Lambda’s pricing (as of January 2025)

// Note: this covers compute only; per-request charges and data transfer are extra
const totalCost = gbSeconds * costPerGBSecond;
console.log(`Estimated Monthly Cost: $${totalCost.toFixed(2)}`); // $166.70

The above example covers AWS Lambda alone, but your application may also use other services, such as Simple Queue Service (SQS), Simple Notification Service (SNS), EventBridge or CloudWatch, each with its own pricing.

You can use the AWS Pricing Calculator to get better estimates by selecting the required resources and provisioning.

Example Implementation

Let’s consider a real-life e-commerce example. If you’re focusing your e-commerce operation on a specific region, these are the things to consider when deploying your application:

  1. Operation time: Regular business hours may see steady traffic.
  2. Sales events: Black Friday or flash sales generate traffic spikes.
  3. Holidays and weekends: Regional and seasonal trends can impact usage.

Implementation Steps

  1. For event-driven triggers, use S3, API Gateway or EventBridge to trigger Lambda functions.
  2. Integrate DynamoDB to store metadata and orders.
  3. Use autoscaling with reserved concurrency during predictable high-traffic events.

These events require anticipating traffic spikes and managing resources carefully to handle the changing load.
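As a minimal sketch of steps 1 and 2 above, an API Gateway-triggered function might persist an order to DynamoDB like this (the Orders table and field names are assumptions):

import { DynamoDB } from 'aws-sdk';
import { APIGatewayProxyEvent, APIGatewayProxyResult } from 'aws-lambda';

const dynamodb = new DynamoDB.DocumentClient();

// API Gateway invokes this handler; the order lands in a hypothetical
// "Orders" table keyed by orderId
export const handler = async (event: APIGatewayProxyEvent): Promise<APIGatewayProxyResult> => {
  const order = JSON.parse(event.body ?? '{}');

  await dynamodb.put({
    TableName: 'Orders',
    Item: {
      orderId: order.orderId,
      items: order.items,
      createdAt: Date.now(),
    },
  }).promise();

  return { statusCode: 201, body: JSON.stringify({ orderId: order.orderId }) };
};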

Cost Optimization and Resource Management Strategies

By design, Lambda is cost-efficient because you pay for what you use. However, your costs can increase with traffic spikes and inefficient configurations. Some ways to optimize resource utilization and Lambda costs are:

1. Predictive Autoscaling

Predictive autoscaling is an AWS cloud resource management technique that anticipates changes in demand by utilizing historical data and real-time metrics.

One building block is adjusting a function’s configuration programmatically, for example, applying a memory size suggested by the AWS Lambda Power Tuning tool:

const AWS = require('aws-sdk');
const lambda = new AWS.Lambda();

// Apply a memory size (in MB), e.g., one suggested by power tuning results
await lambda.updateFunctionConfiguration({
    FunctionName: functionName,
    MemorySize: memorySize,
}).promise();

Machine learning-based predictions can be used with Amazon SageMaker (or any other solution) for better results. For example:

// types.ts
import * as dynamoose from "dynamoose";

interface UsageMetrics {
  timestamp: number;
  concurrentExecutions: number;
  averageLatency: number;
  errorRate: number;
  coldStarts: number;
  cost: number;
}


const usageMetricsSchema = new dynamoose.Schema({
  timestamp: Number,
  concurrentExecutions: Number,
  averageLatency: Number,
  errorRate: Number,
  coldStarts: Number,
  cost: Number
});

export const UsageMetrics = dynamoose.model("UsageMetrics", usageMetricsSchema);


// predictive-scaling.ts
import { Lambda, SageMakerRuntime } from 'aws-sdk';
import { UsageMetrics } from './types';

interface PredictionResult {
  predictedConcurrency: number;
  confidence: number;
}

export class PredictiveScaling {
  private sagemaker: SageMakerRuntime;
  private readonly ENDPOINT_NAME = 'streaming-prediction-endpoint';

  constructor() {
    this.sagemaker = new SageMakerRuntime();
  }

  async getHistoricalMetrics(hoursBack: number): Promise<any[]> {
    const endTime = Date.now();
    const startTime = endTime - (hoursBack * 60 * 60 * 1000);
    
    // A scan keeps the example simple; a production table would define
    // a key schema that supports querying this range directly
    return await UsageMetrics.scan('timestamp')
      .between(startTime, endTime)
      .exec();
  }

  async predictWorkload(): Promise<PredictionResult> {
    const historicalData = await this.getHistoricalMetrics(24);
    
    const payload = {
      instances: historicalData.map(metric => ({
        timestamp: metric.timestamp,
        concurrentExecutions: metric.concurrentExecutions,
        timeOfDay: new Date(metric.timestamp).getHours(),
        dayOfWeek: new Date(metric.timestamp).getDay()
      }))
    };

    const prediction = await this.sagemaker.invokeEndpoint({
      EndpointName: this.ENDPOINT_NAME,
      ContentType: 'application/json',
      Body: JSON.stringify(payload)
    }).promise();

    return JSON.parse(prediction.Body.toString());
  }

  async adjustConcurrency(prediction: PredictionResult): Promise<void> {
    const lambda = new Lambda();
    
    await lambda.putFunctionConcurrency({
      FunctionName: 'streaming-service-handler',
      ReservedConcurrentExecutions: Math.ceil(prediction.predictedConcurrency * 1.1) // 10% buffer
    }).promise();
  }
}

// metrics-collector.ts
import { CloudWatch } from 'aws-sdk';
import { UsageMetrics } from './types';

export class MetricsCollector {
  private cloudwatch: CloudWatch;

  constructor() {
    this.cloudwatch = new CloudWatch();
  }

  async collectMetrics(): Promise<void> {
    const endTime = new Date();
    const startTime = new Date(endTime.getTime() - (5 * 60 * 1000)); // Last 5 minutes

    const metrics = await this.cloudwatch.getMetricData({
      MetricDataQueries: [
        {
          Id: 'concurrent_executions',
          MetricStat: {
            Metric: {
              MetricName: 'ConcurrentExecutions',
              Namespace: 'AWS/Lambda'
            },
            Period: 300,
            Stat: 'Average'
          }
        },
        // Add other relevant metrics
      ],
      StartTime: startTime,
      EndTime: endTime
    }).promise();

    await UsageMetrics.create({
      timestamp: Date.now(),
      concurrentExecutions: metrics.MetricDataResults[0].Values[0],
      // The calculate* helpers are assumed aggregation functions implemented elsewhere
      averageLatency: calculateAverageLatency(metrics),
      errorRate: calculateErrorRate(metrics),
      coldStarts: calculateColdStarts(metrics),
      cost: calculateCost(metrics)
    });
  }
}

This enables automatic concurrency adjustments based on the predicted values.
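Tying the pieces together, a scheduled function could collect metrics, request a prediction and apply it. A sketch of that wiring, assuming the files above:

import { MetricsCollector } from './metrics-collector';
import { PredictiveScaling } from './predictive-scaling';

// Run on a schedule (e.g., an EventBridge rate(5 minutes) rule)
export const handler = async (): Promise<void> => {
  const collector = new MetricsCollector();
  const scaler = new PredictiveScaling();

  await collector.collectMetrics();                   // persist the latest metrics
  const prediction = await scaler.predictWorkload();  // ask SageMaker for a forecast
  await scaler.adjustConcurrency(prediction);         // apply it with a 10% buffer
};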

2. Optimize Cold Start Impacts

Use provisioned concurrency for latency-sensitive workloads. Here’s a simple implementation that configures provisioned concurrency for a function alias or version:

const AWS = require('aws-sdk');
const lambda = new AWS.Lambda();

async function configureProvisionedConcurrency(functionName, qualifier, concurrency) {
    await lambda.putProvisionedConcurrencyConfig({
        FunctionName: functionName,
        Qualifier: qualifier, // required: a published version or alias, not $LATEST
        ProvisionedConcurrentExecutions: concurrency,
    }).promise();

    console.log(`Provisioned Concurrency set to ${concurrency} for ${functionName}:${qualifier}`);
}

configureProvisionedConcurrency('MyLambdaFunction', 'prod', 10);

You can also reduce deployment package size to decrease initialization time. This can be done by bundling dependencies with tools like Webpack or Rollup. Techniques like tree-shaking help ensure you ship only the code you actually use, and you can exclude dependencies that are already available in the Lambda runtime.
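As one illustration, here’s a minimal bundling script using esbuild (a bundler comparable to Webpack or Rollup); the paths and runtime target are assumptions:

// build.ts: bundle the handler, tree-shake unused code and skip the SDK
import { build } from 'esbuild';

build({
  entryPoints: ['src/handler.ts'],
  bundle: true,          // inline only the modules the handler actually imports
  platform: 'node',
  target: 'node16',
  minify: true,
  external: ['aws-sdk'], // aws-sdk v2 ships with the Node.js 16 Lambda runtime
  outfile: 'dist/handler.js',
}).catch(() => process.exit(1));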

3. Scheduled Scaling

Use AWS EventBridge to schedule scaling for predictable traffic patterns. For instance, you could trigger a function every 10 minutes:

Resources:
  ScheduledScalingRule:
    Type: "AWS::Events::Rule"
    Properties:
      Name: "RunEvery10Minutes"
      ScheduleExpression: "rate(10 minutes)"
      State: "ENABLED"
      Targets:
        - Arn: !GetAtt MyLambdaFunction.Arn
          Id: "TargetFunction"

  # EventBridge also needs permission to invoke the target function
  ScheduledScalingPermission:
    Type: "AWS::Lambda::Permission"
    Properties:
      FunctionName: !Ref MyLambdaFunction
      Action: "lambda:InvokeFunction"
      Principal: "events.amazonaws.com"
      SourceArn: !GetAtt ScheduledScalingRule.Arn

4. Optimize Invocation Frequency

Batch process events (e.g., SQS messages) to reduce the number of invocations. For instance, if delivery drivers need to pick up multiple orders:

// Lambda receives SQS messages in batches via an event source mapping

exports.handler = async (event) => {
    const messages = event.Records.map(record => JSON.parse(record.body));

    // Process batch of messages
    for (const message of messages) {
        console.log(`Processing message: ${message.id}`);
        // Business logic here
    }

    return {
        statusCode: 200,
        body: 'Batch processed successfully',
    };
};
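The batch size itself is set on the event source mapping that connects the queue to the function. A sketch, with a placeholder queue ARN:

import { Lambda } from 'aws-sdk';

const lambda = new Lambda();

// Deliver up to 10 messages per invocation and wait up to 30 seconds to
// fill a batch, which cuts invocation counts for bursty queues
export const configureBatching = async (): Promise<void> => {
  await lambda.createEventSourceMapping({
    EventSourceArn: 'arn:aws:sqs:us-east-1:123456789012:orders-queue',
    FunctionName: 'MyLambdaFunction',
    BatchSize: 10,
    MaximumBatchingWindowInSeconds: 30,
  }).promise();
};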

5. Use Scale-to-Zero

Scale-to-zero is a concept where resources automatically scale down to zero when there’s no demand or workload. This eliminates idle resource costs, creating a cost-efficient model for applications with unpredictable traffic patterns. So, when the functions are not in use, the infrastructure releases all associated resources, and you aren’t charged for idle capacity.

Organizations can significantly reduce costs, improve resource utilization and simplify infrastructure management by adopting this model. If you combine it with the cost optimization and resource management strategies above, a highly optimized, near scale-to-zero setup is achievable.

For low-peak hours, you can reduce the function’s reserved concurrency to zero (note that a reserved concurrency of zero throttles all invocations until it is raised again) with:

const AWS = require('aws-sdk');
const lambda = new AWS.Lambda();
const defaultConcurrency = 5;

exports.handler = async (event) => {
    const traffic = event.traffic; // Example traffic data

    // Assumes a single function, but the logic can be extended to more.
    // You can add business-hours logic as well.
    if (!traffic) {
        // A reserved concurrency of 0 throttles all invocations
        await lambda.putFunctionConcurrency({
            FunctionName: 'MyLambdaFunction',
            ReservedConcurrentExecutions: 0
        }).promise();
        console.log('Scaled to zero');
    } else {
        await lambda.putFunctionConcurrency({
            FunctionName: 'MyLambdaFunction',
            ReservedConcurrentExecutions: defaultConcurrency
        }).promise();
        console.log('Scaled up for traffic');
    }
};

Conclusion

Serverless architecture, especially AWS Lambda, offers a scalable, cost-efficient solution for modern applications. While it has limitations, combining strategies like predictive autoscaling, cold start optimization and scale-to-zero allows organizations to maximize performance and minimize costs, paving the way for efficient resource utilization and better application performance.

Want to learn more about the potential of AWS? Learn how to seamlessly deploy React on AWS Amplify using Terraform and streamline your workflows.

