Serverless Charging Architecture with AWS

Jaime Casero
9 min read · Dec 27, 2022

Charging is an important part of any service. It keeps track of customer service usage and is an important source of information for many other systems in a solution (analytics, billing, troubleshooting API, usage API…)

This article provides an implementation based on AWS technologies, delivering a completely serverless architecture with a minimal baseline cost that grows only with actual traffic.

High-level picture

aws charging architecture

Charging is a publisher/consumer problem. Publishers print CDRs (Charging Data Records) while processing traffic, and consumer systems wait for that information in order to process it. This problem is best solved with a broker that decouples publishers from consumers, allows optimal throughput, and reduces dependencies among systems.

This architecture becomes a serverless template that is instantiated for each service exposed in the solution. Assuming a microservice architecture, every Service (referred to in the picture as “cdrPublisher”) prints a different type of CDR. Monoliths printing multiple CDR types are also possible. Each Service CDR defines a specific document model and is injected into a dedicated Kinesis stream for appropriate handling.

The key AWS technologies used in this architecture are:

  • AWS Kinesis Stream: For each CDR type a new Kinesis stream is created to ingest the CDRs generated by the corresponding Service. This stream becomes a broker that lets delivery/target systems with different requirements and capacities subscribe independently.
  • AWS Kinesis Agent: This pre-built Java app is deployed colocated with the corresponding Service; it tails the CDR log files and pushes their content to the stream.
  • AWS Firehose: There is a separate Firehose delivery stream per target/delivery system, subscribed to the specific Service/CDR Kinesis stream. Using Firehose makes it easy to configure the transformations required by the target system, while throttling delivery according to the target system's real-time capacity.

Another important aspect of this architecture is the document-oriented approach. CDR content usually changes frequently: as the service evolves and more features are implemented, new fields need to be tracked in the CDR. This architecture adapts to document model changes with minimal schema management.

Anatomy of a CDR

CDR content depends on the specific Service and sometimes on business requirements. Even though some specifications provide standard formats, most of the time the content will differ per solution.

In any case, there are some common fields worth reviewing that will help us understand how charging works:

{
  "sid": "SA99b4159476de42ea85eeefce29363c36",
  "call_sid": "ID000000000031302e322e3131312e3433-CA39ce5a80a1c143c5ae7991227bf0e419",
  "date_created": "Mon, 19 Dec 2022 13:23:20 +0000",
  "account_sid": "AC40ea78485341fe8a47763043506064b5",
  "org_sid": "OR87a155237d5241b6bbe11b05fe6f9836",
  "api_version": "2012-04-24",
  "status": "completed",
  "reason": {"code": 0, "message": "", "protocol": ""},
  "duration": 6,
  "character_count": 51,
  "provider": "voicerss",
  "language": "en-us",
  "voice": "man",
  "loop": 1
}

This CDR sample corresponds to a Text-To-Speech (TTS) service invoked by the customer in the context of a voice call. There are some important fields to highlight:

  • “sid”: Identifies this specific TTS session. In this case it is a simple UUID.
  • “account_sid”: Identifies the charging account or subscriber behind this TTS session.
  • “org_sid”: Identifies the organization associated with the account. Most subscriber models are built on some sort of hierarchy that requires careful tracking on the charging side.
  • “date_created”: It's important that all CDRs contain some sort of date-time timestamp. Since CDR collection happens asynchronously, each session must save the original timestamp of when the usage was actually performed.
  • “status”: Most CDRs have a similar field marking the final status of the session. Most of the time this is an enum with known values that tells whether the session was successful or not.
  • “reason”: Reason fields are also common, to enrich the final status. The reason field explains why the session reached a certain status.
  • “character_count”: All CDRs contain some scalar field that measures the cost of the service provided. In this case the cost of a Text-To-Speech operation depends on the number of characters in the original string.
  • “call_sid”: Some complex services bundle internal services that track their own usage. Keeping the general session's charging identifier allows linking the CDRs of the different services. In this case this field stores the identifier of the original voice call.

Network Element CDR Publishing

The Service prints to a local file, usually using some sort of logging library (log4j, logback…), while the agent scans the file and sends its content to the specific Kinesis stream.
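As a sketch, a service written in Python could emit CDR lines with the standard logging module in the same spirit (the file path, logger name, and field set are illustrative, not part of the original design):

```python
import json
import logging

# Hypothetical CDR logger: the service appends one JSON document per line to a
# local log file that the Kinesis Agent tails. Path and names are examples.
cdr_logger = logging.getLogger("cdr")
cdr_logger.setLevel(logging.INFO)
cdr_logger.addHandler(logging.FileHandler("tts-cdr.log"))
cdr_logger.propagate = False  # keep CDR lines out of the root logger

def emit_cdr(sid, account_sid, status, character_count, **extra):
    """Print one CDR line; extra fields need no schema change downstream."""
    record = {"sid": sid, "account_sid": account_sid,
              "status": status, "character_count": character_count}
    record.update(extra)
    cdr_logger.info(json.dumps(record))
```

Because each record is just a JSON document, adding a new field later is a one-line change at the publisher and requires no coordination with the broker.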

Since the Service prints to a local resource (normally an EBS volume), the traffic can scale independently from the Kinesis stream's resources (horizontal scalability), and the dependency on the Kinesis Stream API is removed from the Service.

The Kinesis Agent's buffering can be configured with size and time policies, allowing the load towards the Kinesis stream to be tuned.

The agent is a simple Java app that can be deployed via Docker. Proper volumes must be configured in both the Service container and the agent container so they can access the log files.

At this level there is no document record processing. The Kinesis Agent supports both CSV and JSON formats. For CDR purposes JSON is preferred, since it has richer data modelling support and fits the document-oriented approach better; CSV is better if saving bytes is important. Adding to or modifying the existing CDR schema won't require any change in the agent configuration.
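A minimal agent configuration for this setup could look like the following (file pattern, stream name, and buffer values are examples); the `flows` section maps each tailed log file to its Kinesis stream:

```json
{
  "cloudwatch.emitMetrics": true,
  "flows": [
    {
      "filePattern": "/var/log/cdr/tts-cdr.log*",
      "kinesisStream": "tts-cdr-stream",
      "partitionKeyOption": "RANDOM",
      "maxBufferAgeMillis": 10000,
      "maxBufferSizeRecords": 500
    }
  ]
}
```

The two `maxBuffer*` settings are the size and time policies mentioned above: the agent flushes a batch whenever either threshold is reached.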

Charging Broker

The broker is implemented using Kinesis Stream. The configuration looks like:

stream configuration

Some important stream settings:

  • On-Demand mode: this mode, introduced in Nov 2021 (aws doc), allows a serverless approach to Kinesis. There is no need to configure shard counts or provisioned capacity: the stream automatically scales as needed, up to 200K records/second by default. The limit can be increased on request, and pricing is pay-per-use.
  • Server-side encryption: Since CDRs may contain PII or other kinds of sensitive data, this setting ensures we can meet regulations and keep data private.
  • Data retention: This is the maximum time the stream keeps the data for subscribers to complete their processing. By default this is set to 24 hours, but greater values can be set, which translates into further costs. It's important to set your data retention according to your incident management capacity/MTTR: you need to ensure any issue with a downstream delivery system is fixed before the retention window is consumed, otherwise data loss will occur.
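For reference, a stream with these settings could be provisioned from the AWS CLI roughly like this (stream name, key alias, and retention value are examples):

```shell
# Create the CDR stream in On-Demand (serverless) capacity mode
aws kinesis create-stream \
    --stream-name tts-cdr-stream \
    --stream-mode-details StreamMode=ON_DEMAND

# Enable server-side encryption with the AWS-managed Kinesis KMS key
aws kinesis start-stream-encryption \
    --stream-name tts-cdr-stream \
    --encryption-type KMS \
    --key-id alias/aws/kinesis

# Extend data retention beyond the default 24 hours (billed extra)
aws kinesis increase-stream-retention-period \
    --stream-name tts-cdr-stream \
    --retention-period-hours 48
```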

At this level there is no document record processing. Adding to or modifying the existing CDR schema won't require any change in the stream configuration.

Troubleshooting API Delivery Destination

This API allows customers to search what happened with their Service sessions and troubleshoot them. The API supports searching CDRs by different filters (status, date…)

Troubleshooting API

Key elements in the architecture:

  • Firehose: The Firehose is subscribed to the Charging Broker and configured to support and optimize the Troubleshooting queries. In this case no format transformation is required. A Firehose transformation Lambda is provided to anonymize/mask records in case the subscriber requires it. The Firehose target is a simple S3 bucket, with buffering configuration. We can use the standard Firehose partitioning, which follows a year/month/day/hour scheme, or configure our own partitioning scheme using record content queried with JQ paths. Firehose compresses records in order to minimize S3 storage cost.
  • Masking Lambda: This Lambda is automatically invoked by Firehose, with independent buffering tuning. The Lambda masks the fields marked as anonymous. These fields are part of the Lambda's environment variable configuration, so the Lambda code remains independent of the CDR model.
  • S3 bucket: The bucket stores the CDRs with proper partitioning to optimize the queries. The S3 bucket is encrypted to meet regulations, and a retention policy is configured to remove data as it expires.
  • Athena: Athena allows querying data in S3 buckets. In order to query the data, an AWS Glue table is required. In this table we define the minimal set of fields needed to support the defined filters.
  • Searching Lambda: This lambda uses Athena API to execute queries for the user. Date is a mandatory filter with a max time interval restriction, allowing queries to have low latency.
  • APIGateway: The gateway implements the Troubleshooting API using OpenAPI 3 definitions. It defines the resources and methods supported, and the request and response models. It transforms the request/response to/from the Searching Lambda. The APIGateway is configured with a Cognito User Pool authorizer to authenticate the API requests.
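The Masking Lambda can be sketched as follows, assuming the standard Firehose transformation record contract (`recordId`/`result`/`data`); the environment variable name and masking strategy are examples:

```python
import base64
import json
import os

# Fields to mask come from an environment variable, so the Lambda code stays
# independent of the CDR document model (the variable name is an example).
MASK_FIELDS = [f for f in os.environ.get("MASK_FIELDS", "").split(",") if f]

def handler(event, context):
    """Firehose transformation handler: mask configured fields in each record."""
    out = []
    for rec in event["records"]:
        # Firehose delivers the record payload base64-encoded
        cdr = json.loads(base64.b64decode(rec["data"]))
        for field in MASK_FIELDS:
            if field in cdr:
                cdr[field] = "****"  # simple masking; hashing is an alternative
        out.append({
            "recordId": rec["recordId"],
            "result": "Ok",
            "data": base64.b64encode((json.dumps(cdr) + "\n").encode()).decode(),
        })
    return {"records": out}
```

Changing which fields are masked is then a configuration change on the Lambda, not a code deployment.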

Analytics Delivery Destination

Analytics dashboards allow customers to understand their traffic patterns and to make business decisions based on the data.

Key elements in the architecture:

  • Firehose: The Firehose is subscribed to the Charging Broker and configured to support and optimize the Analytics queries. In this case record conversion is configured to transform records into Parquet format. For this conversion a proper AWS Glue table schema is required; it lists the fields needed to support the analytics queries, ignoring the rest. The conversion process also anonymizes the record, since the full CDR content is not required. Partitioning may again use the default S3 scheme, or a combination of partition keys and bucketing to optimize the queries.
  • S3 bucket: The bucket stores the CDRs with proper partitioning to optimize the queries. The S3 bucket is encrypted to meet regulations, and a retention policy is configured to remove data as it expires.
  • Athena: Athena allows querying data in S3 buckets. In order to query the data, an AWS Glue table is required. In this table we define the minimal set of fields needed to support the defined filters.
  • QuickSight: Allows defining the dashboards and datasets supporting the Analytics queries. SPICE can be used on top of S3 to accelerate the queries and minimize latency.
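As an illustration, the Glue table behind the analytics queries could be declared with a DDL like the following (table, column, and bucket names are examples; only the fields needed by the dashboards are listed):

```sql
-- Sketch of the Glue table backing the Parquet-converted analytics CDRs
CREATE EXTERNAL TABLE cdr_tts_analytics (
  sid             string,
  account_sid     string,
  status          string,
  character_count int,
  date_created    timestamp
)
PARTITIONED BY (year string, month string, day string)
STORED AS PARQUET
LOCATION 's3://example-cdr-analytics-bucket/tts/';
```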

Billing Delivery Destination

Billing systems allow creating and sending invoices to customers.

billing arch

Key elements in the architecture:

  • Firehose: The Firehose is subscribed to the Charging Broker and configured to support and optimize synchronization with the Billing system. In this case no format conversion is required. A transformation Lambda is used to anonymize CDR content, and default partitioning is used.
  • S3 bucket: The bucket stores the CDRs with proper partitioning to optimize synchronization. The S3 bucket is encrypted to meet regulations, and a retention policy is configured to remove data as it expires. In this case retention is low, since the Billing system stores the CDRs in its own data store.
  • AWS Transfer: AWS Transfer provides secure FTP access for the Billing system to synchronize the CDRs. In this case a simple SSH key is generated for the Billing system to authenticate.

FTP Export Delivery Destination

This subsystem allows customers to easily download all their CDRs in order to feed their own billing or analytics systems.

Export arch

Key elements in the architecture:

  • Firehose: The Firehose is subscribed to the Charging Broker and configured to support and optimize the CDR export. In this case no format conversion is required. A JQ query is used to enable subscriber-based partitioning (the subscriber id is extracted from the incoming record). This is key to ensuring customers are only allowed to access their own CDRs.
  • S3 bucket: The S3 bucket is encrypted to meet regulations, and a retention policy is configured to remove data as it expires. In this case retention is low, since customers will store the CDRs in their own data stores.
  • AWS Transfer: AWS Transfer provides secure FTP access for customers to download their CDRs. In this case a custom identity provider is deployed to allow customers to use their regular API key to access the FTP site.
  • Transfer Auth Lambda: This Lambda takes the API key from the FTP session and authenticates it against the IAM solution. In this case a Cognito User Pool is used.
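A sketch of the Transfer Auth Lambda, assuming the Transfer Family custom identity provider contract (an empty response denies access); the API-key lookup table stands in for the real Cognito validation, and all names and ARNs are examples:

```python
# In the real deployment the API key would be validated against the Cognito
# User Pool; here an in-memory lookup table stands in for that call.
VALID_API_KEYS = {"demo-api-key": "AC40ea78485341fe8a47763043506064b5"}

def handler(event, context):
    """Transfer Family custom IdP: map an API key to a scoped FTP session."""
    api_key = event.get("password", "")
    account_sid = VALID_API_KEYS.get(api_key)
    if account_sid is None:
        return {}  # empty response tells Transfer to reject the login
    return {
        "Role": "arn:aws:iam::123456789012:role/cdr-export-transfer-role",
        # Restrict the session to the subscriber's own partition of the bucket,
        # matching the JQ-based partitioning done by the Firehose
        "HomeDirectory": f"/example-cdr-export-bucket/{account_sid}",
    }
```

Scoping `HomeDirectory` to the subscriber's prefix is what enforces that customers only ever see their own CDRs.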

Conclusion

  • Since all components used here are standard AWS services, they support monitoring through CloudWatch as usual. Metric dashboards can easily be built to monitor the production environment, and alarms can be configured to escalate to support teams when the corresponding conditions are met.
  • All the AWS technologies used are completely serverless, allowing very flexible scalability-versus-cost models (as in the case of Kinesis Stream On-Demand).
  • All components are designed with a document-oriented approach to minimize data model schema management. All specific data schema information is conveniently stored in separate stores (Lambda env vars, AWS Glue catalog…) that can be managed from infrastructure automation tools (Terraform, Pulumi…)


Jaime Casero

Software Engineer with 20 years experience in the Telco sector. Currently working at 1nce.