Turning Audit Logging into a Shared Library in a Fintech Microservices System

Audit logging is one of those things that sounds simple until you actually have to do it properly.

In a fintech system, especially a multi-tenant one, audit logs aren’t optional. You need to know:

  • who did what
  • from which service
  • on which tenant
  • and at what time

At first glance, it feels like something every service should just handle on its own.

That assumption is what got me thinking deeper.

Why I didn’t want audit logic inside every service

The obvious solution would have been:

  • add audit logging logic to each microservice
  • push logs to a database
  • move on

But that approach breaks down quickly.

Each service would:

  • implement audit logic slightly differently
  • evolve its schema independently
  • log different levels of detail
  • become harder to reason about over time

Worse, compliance questions would turn into:

“It depends on which service handled that request.”

That wasn’t acceptable.

So instead of solving audit logging per service, I treated it as a platform concern.

The decision: audit logging as a shared library

Rather than building a central audit service that every microservice had to call synchronously, I built a shared audit library.

The idea was simple:

  • every service imports the same library
  • audit behaviour is consistent everywhere
  • services declare what to audit, not how

From the consuming service’s point of view, it’s just:

  • add a dependency
  • annotate the method
  • configure a message broker

No duplicated logic. No drift.

High-level architecture

At a high level, the system looks like this:

  • Microservices
    • use the shared audit library
    • emit audit events asynchronously
  • Message broker (RabbitMQ)
    • decouples audit logging from business flow
  • Audit service
    • consumes audit events
    • persists them to MongoDB
    • exposes filtering by service, operation, user, tenant

The important part:

Business logic never waits for audit logging to succeed.

Audit failures shouldn’t block onboarding or updates unless explicitly configured to do so.

How auditing is injected without touching business logic

The key technical decision here was using Spring AOP.

The audit library exposes a custom @Audit annotation, and internally uses an @Around advice to intercept annotated methods at runtime.

This is important.

By intercepting at the method boundary — not at the HTTP layer — auditing works consistently for:

  • controllers
  • service methods
  • internal calls between Spring-managed beans
  • async flows

Using ProceedingJoinPoint allows the audit layer to:

  • capture method arguments before execution (request context)
  • proceed with the actual business logic
  • capture the return value or exception (response context)

All of this happens without polluting domain logic.

The consuming service doesn’t need to know how auditing works, only that it exists.
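
To make this concrete, here is a rough sketch of what the interception layer can look like. The class, method, and publisher names are illustrative rather than the library’s actual API, and the failOnError handling around the publish calls is covered later.

import java.util.UUID;

import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.springframework.stereotype.Component;

// Hypothetical collaborator: whatever transports the event (RabbitMQ today)
interface AuditEventPublisher {
    void publish(String processId, String activity, String phase, Object payload);
}

@Aspect
@Component
public class AuditAspect {

    private final AuditEventPublisher publisher;

    public AuditAspect(AuditEventPublisher publisher) {
        this.publisher = publisher;
    }

    // Binds the @Audit annotation on the intercepted method to the 'audit' parameter
    @Around("@annotation(audit)")
    public Object intercept(ProceedingJoinPoint joinPoint, Audit audit) throws Throwable {
        // One process identifier correlates the pre- and post-execution publishes
        String processId = UUID.randomUUID().toString();

        // Pre-execution: capture intent and input (request context)
        publisher.publish(processId, audit.activity(), "STARTED", joinPoint.getArgs());

        try {
            // Run the actual business logic
            Object result = joinPoint.proceed();

            // Post-execution: enrich the same audit trail with the outcome
            publisher.publish(processId, audit.activity(), "SUCCEEDED", result);
            return result;
        } catch (Throwable ex) {
            // Failed operations still leave a trace
            publisher.publish(processId, audit.activity(), "FAILED", ex.getMessage());
            throw ex;
        }
    }
}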

What it looks like for a consuming service

From a consuming service’s perspective, integration with the audit library is intentionally minimal.

There’s no SDK to learn, no client to manage, and no audit-specific business logic scattered around the codebase.

Adding the dependency

The service simply imports the audit library like any other internal dependency:

<dependency>
    <groupId>com.bestcompany.worldauditlib</groupId>
    <artifactId>bestcompany-audit-library</artifactId>
    <version>0.3.0</version>
</dependency>

Once this is on the classpath, the auditing infrastructure becomes available automatically through Spring.

Required configuration

At minimum, the consuming service only needs to declare:

  • its service name (used for attribution)
  • a RabbitMQ connection string

spring.application.name=payment-service
spring.rabbitmq.addresses=amqp://user:password@rabbitmq-host:5672

That’s it.



Annotating what actually matters

Auditing is opt-in and explicit. A service only audits what it chooses to audit.

Here’s a simple example:

@Audit(
    activity = "User Login",
    isMetaDataRequired = true,
    failOnError = false
)
public AuthResponse login(LoginRequest request) {
    return authenticationService.authenticate(request);
}

A few things are happening here:

  • activity defines what is being recorded
  • isMetaDataRequired controls whether user and tenant context is attached
  • failOnError determines how tightly this operation is coupled to audit reliability

This makes auditing a design decision, not a side effect.
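
For reference, the annotation declaration itself can stay tiny. This is only a sketch of roughly what it looks like; the defaults shown are assumptions, not the library’s documented behaviour.

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

@Target(ElementType.METHOD)
@Retention(RetentionPolicy.RUNTIME)
public @interface Audit {

    // Human-readable name of the operation being recorded
    String activity();

    // Whether user and tenant context should be attached to the event (assumed default)
    boolean isMetaDataRequired() default false;

    // Whether the business operation should fail if the audit publish fails (assumed default)
    boolean failOnError() default false;
}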


When audit failure should not block the request

Most endpoints fall into this category.

Read operations, non-critical mutations, or user-facing actions should continue to work even if the audit pipeline is temporarily unavailable.

That’s why failOnError = false exists.

If RabbitMQ is down:

  • the business logic still executes
  • the request still succeeds
  • audit failure is logged internally

This prevents operational issues in the audit pipeline from cascading into user-facing outages.

When audit failure must block the request

Some operations are different.

For example:

  • payments
  • balance changes
  • irreversible state transitions

For these, audit guarantees matter more than availability.

Here’s what that looks like:

@Audit(
    activity = "Respond to Chargeback",
    isMetaDataRequired = true,
    failOnError = true
)
public ChargebackResponse respondToChargeback(ChargebackDecisionRequest request) {
    return chargebackService.processDecision(request);
}

Chargeback handling is a high-risk financial operation.
Failing fast on audit errors prevents irreversible state changes from happening without an audit record.

Why failOnError exists

The failOnError flag was added after a real incident.

We had a period when RabbitMQ was unavailable, and because auditing was synchronous at the interception point, every audited endpoint started failing, including ones that shouldn’t have.

That forced a clear distinction:

  • Which operations require audit guarantees?
  • Which ones should remain available even if audit logging fails?

Instead of baking in a global rule, the decision was pushed down to the method level, where the context actually exists.

This keeps the system flexible without sacrificing correctness.
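
In code, honouring the flag comes down to a single decision around the publish call. A minimal sketch, assuming the actual send is handed in as a Runnable:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class AuditFailurePolicy {

    private static final Logger log = LoggerFactory.getLogger(AuditFailurePolicy.class);

    void publish(Runnable publishAction, boolean failOnError) {
        try {
            // Whatever actually sends the event to the broker
            publishAction.run();
        } catch (RuntimeException ex) {
            if (failOnError) {
                // Critical operations (payments, chargebacks): reject the business call
                throw ex;
            }
            // Everything else: record the failure internally and let the request continue
            log.warn("Audit publish failed, continuing without blocking the request", ex);
        }
    }
}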

Why we record both request and response (intentionally)

One thing worth calling out explicitly is how auditing is handled across the lifecycle of a method call.

We publish audit data twice per operation:

  • once before execution
  • once after execution

This is intentional, but it does not mean we create two unrelated audit records.

Under the hood, a single process identifier is generated at interception time and reused throughout the method’s lifecycle. The same audit object is enriched as execution progresses.

In practical terms:

  • the pre-execution publish captures intent and input
  • the post-execution publish updates the same audit context with the outcome

This ensures that:

  • failed operations still leave a trace
  • partial executions can be investigated
  • intent and outcome are correlated using the same process identifier

Rather than creating duplicate records, this approach produces a single logical audit trail that evolves over time.

There is some additional messaging overhead, but in systems where traceability matters, the ability to reconstruct what was attempted is more important than optimizing away a second publish.

This was a deliberate tradeoff.
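
To illustrate, a single logical audit record could end up carrying fields like these once both publishes have been applied. The field names are illustrative, not the actual event schema.

import java.time.Instant;

public class AuditEvent {

    private String processId;       // generated at interception time, shared by both publishes
    private String serviceName;     // attribution: which service emitted the event
    private String activity;        // e.g. "User Login"
    private String tenantId;        // attached when isMetaDataRequired = true
    private String userId;
    private Object requestPayload;  // captured before execution: intent and input
    private Object responsePayload; // captured after execution: the outcome
    private String status;          // STARTED, SUCCEEDED or FAILED
    private Instant timestamp;

    // getters and setters omitted for brevity
}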

Improvements I’d make before open sourcing it

1. RabbitMQ connection handling

The initial version uses the raw RabbitMQ client.

That works, but it comes with tradeoffs:

  • manual connection management
  • fragile channel reuse
  • limited resilience

For a public library, this should move to Spring AMQP, which gives:

  • connection pooling
  • better error handling
  • cleaner abstractions

This is the first thing I would change.
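
As a sketch, a Spring AMQP version of the publisher could be as small as this. The exchange and routing key names are placeholders, and AuditEvent is the hypothetical event shape sketched earlier.

import org.springframework.amqp.rabbit.core.RabbitTemplate;
import org.springframework.stereotype.Component;

@Component
public class RabbitMqAuditEventPublisher {

    private final RabbitTemplate rabbitTemplate;

    public RabbitMqAuditEventPublisher(RabbitTemplate rabbitTemplate) {
        this.rabbitTemplate = rabbitTemplate;
    }

    public void publish(AuditEvent event) {
        // Spring AMQP manages connections and channels; assumes a JSON
        // message converter is configured on the template
        rabbitTemplate.convertAndSend("audit.exchange", "audit.events", event);
    }
}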

2. Decoupling from RabbitMQ entirely

Right now, the library assumes RabbitMQ. That’s fine internally, but not ideal for public adoption.

A better design would introduce a small abstraction, something like an AuditPublisher interface, with pluggable implementations:

  • RabbitMQ
  • Kafka
  • SQS
  • maybe even HTTP

The audit library shouldn’t care how events are transported, only that they are.
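
The abstraction itself can stay tiny. The interface name comes from above; the method signature is an assumption.

public interface AuditPublisher {

    // Transport-agnostic contract: implementations decide how the event travels
    void publish(AuditEvent event);
}

The Spring AMQP publisher sketched in the previous section would then become just one implementation, with Kafka, SQS or HTTP variants swapped in through configuration.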

3. Event schema versioning

Audit logs live long.

Without schema versioning:

  • older consumers break
  • historical data becomes harder to reason about

A simple version field in the audit payload would go a long way.
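
As a sketch, this could be as simple as extending the event from earlier; the field name is an assumption.

public class AuditEvent {

    public static final int CURRENT_SCHEMA_VERSION = 1;

    // Bumped on any breaking change to the payload shape
    private int schemaVersion = CURRENT_SCHEMA_VERSION;

    // ... remaining audit fields as sketched earlier
}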

4. Servlet vs Reactive

This implementation targets Servlet-based Spring Boot applications running on:

  • Tomcat
  • Jetty
  • Undertow

The audit library relies on:

  • Spring AOP method interception
  • HttpServletRequest for request-level context
  • thread-local security and request-context propagation

Because of these, the current implementation does not work with:

  • Spring WebFlux
  • Netty-based reactive servers
  • Reactor driven execution models

Reactive applications handle request lifecycles and context propagation differently.

A future iteration of this library will separate audit lifecycle capture from transport concerns, with dedicated implementations for servlet-based and reactive applications.

Using Spring Boot’s conditional auto-configuration, the appropriate audit module can be selected automatically at runtime without requiring consuming services to change how they use the @Audit annotation.
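
A sketch of how that selection could work with Spring Boot’s @ConditionalOnWebApplication; the AuditContextCapture types are hypothetical placeholders for the servlet and reactive implementations.

import org.springframework.boot.autoconfigure.condition.ConditionalOnWebApplication;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// Hypothetical abstraction over how request context is captured per stack
interface AuditContextCapture { }
class ServletAuditContextCapture implements AuditContextCapture { }
class ReactiveAuditContextCapture implements AuditContextCapture { }

@Configuration
public class AuditAutoConfiguration {

    @Bean
    @ConditionalOnWebApplication(type = ConditionalOnWebApplication.Type.SERVLET)
    AuditContextCapture servletAuditContextCapture() {
        // Servlet stack: HttpServletRequest and thread-locals are available
        return new ServletAuditContextCapture();
    }

    @Bean
    @ConditionalOnWebApplication(type = ConditionalOnWebApplication.Type.REACTIVE)
    AuditContextCapture reactiveAuditContextCapture() {
        // Reactive stack: context flows through the Reactor Context instead of thread-locals
        return new ReactiveAuditContextCapture();
    }
}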

Final thoughts

This started as a task to “add audit logging”.

It ended up becoming:

  • a shared library
  • a platform decision
  • and a lesson in treating cross-cutting concerns seriously

Audit logging isn’t glamorous.
But when it’s done right, everything else becomes easier:

  • compliance
  • debugging
  • trust

I plan to rebuild and open source this properly, with the improvements above and any others I discover while documenting the journey.