I recently completed a proof of concept for a Jira AI Agent: a system that converts natural language into sophisticated JQL (Jira Query Language) and analyzes Jira tickets to provide actionable insights.
While Atlassian already has a Jira Service Agent, we aimed for something different. We wanted a custom AI agent that could:
- Search Jira issues across standard and custom fields
- Analyze issue history, comments, and attachments (PDFs, PPTs)
- Conduct lightweight research across tickets
- Suggest solutions or generate structured outputs (tables, explanations, resolution steps) based on user intent
For example, if a user asked:
“How was error X resolved earlier?”
the agent would:

- Identify relevant Jira issues
- Parse comments and attachments
- Summarize the resolution into a clear, reusable answer
This agent was designed mainly for field engineers, who usually sift through tickets manually to find solutions. The user interface also allowed users to download generated answers for offline use. Functionally, the system worked well. It had high accuracy and reasonable reasoning. However, there was one major issue.
The Problem: A Smart Agent That Felt Painfully Slow
Every time a user typed a query like “Show me all high-priority bugs in the Mobile project assigned to me,” they faced a loading spinner for about five to six seconds. The feedback I kept getting was: it's useful, but very slow. The real challenge now was to make it faster and give users a good experience.
Diagnosing the Bottleneck: The “Waterfall of Death”
I started analyzing the logs, and the issue was obvious: the entire system was running as a strictly sequential pipeline.
Original Execution Flow (Sequential)
| Step | Description | Time Cost |
|---|---|---|
| Authentication | Jira API auth & permission validation | ~800 ms |
| Prompt Construction | Sending full project schema (~1,200 tokens) | ~2 s |
| Query Validation | LLM re-validates generated JQL | ~1 s |
| Execution + UI | Backend waits for full JSON before responding | ~2+ s |
| **Total** | | **~6 s** |
So I redesigned the flow and implemented a four-pillar optimization strategy:
1. Intent-Aware Memory (Redis Semantic Cache)
The first thing I did was stop treating every request as brand new. For most queries, the wording differs but the underlying intent stays the same. For example:

- “My open bugs”
- “Show me my bugs”
- “What are the issues assigned to me?”
What I Changed

- Implemented a Redis-backed intent cache
- Cached generated JQL and, in some cases, final responses
- Used semantic matching to detect intent similarity
| Metric | Result |
|---|---|
| Cache hit latency | ~15 ms |
| LLM bypass rate | ~40% of repeated queries |
| Cost impact | Significant reduction in inference calls |
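A minimal sketch of the idea, assuming an embedding model sits behind an `embed()` helper; the key names, similarity threshold, and helper functions below are illustrative, not the actual production code:

```python
import json

import numpy as np
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
SIMILARITY_THRESHOLD = 0.92  # illustrative cutoff for "same intent"

def embed(text: str) -> np.ndarray:
    """Placeholder: call whatever embedding model you use (sentence embeddings, an API, etc.)."""
    raise NotImplementedError

def cache_jql(query: str, jql: str) -> None:
    # Store the query's embedding alongside the JQL the LLM generated for it.
    entry = {"embedding": embed(query).tolist(), "jql": jql}
    r.rpush("jql_intent_cache", json.dumps(entry))

def lookup_jql(query: str) -> str | None:
    # Return cached JQL if an earlier query had a similar enough intent.
    q_vec = embed(query)
    for raw in r.lrange("jql_intent_cache", 0, -1):
        entry = json.loads(raw)
        c_vec = np.array(entry["embedding"])
        similarity = float(
            np.dot(q_vec, c_vec) / (np.linalg.norm(q_vec) * np.linalg.norm(c_vec))
        )
        if similarity >= SIMILARITY_THRESHOLD:
            return entry["jql"]  # cache hit: the LLM is bypassed entirely
    return None  # cache miss: fall through to JQL generation
```

The linear scan here is only for illustration; a vector-capable setup (for example Redis with the RediSearch module) is the more scalable way to do the similarity lookup.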
2. Parallel Execution (Breaking the Sequential Chain)
Previously, every downstream step waited for the LLM to finish before it could start, which was the main reason everything felt so slow. The new flow splits the work:
```
User Query
├── Process A: Generate optimized JQL
└── Process B: Validate permissions + fetch metadata
    (runs in parallel)
```
I decided to use Python multiprocessing:

- JQL generation and permission checks ran concurrently
- Metadata was pre-fetched while the LLM was still working
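A sketch of that split, assuming `generate_jql` wraps the LLM call and `fetch_permissions_and_metadata` wraps the Jira REST calls (both names are illustrative stand-ins); since the write-up uses Python multiprocessing, a ProcessPoolExecutor is shown here:

```python
from concurrent.futures import ProcessPoolExecutor

def generate_jql(user_query: str) -> str:
    """Illustrative stand-in for the LLM call that produces optimized JQL."""
    ...

def fetch_permissions_and_metadata(user_id: str) -> dict:
    """Illustrative stand-in for Jira permission checks plus metadata pre-fetch."""
    ...

def handle_query(user_query: str, user_id: str) -> tuple[str, dict]:
    # Submit both branches at once instead of chaining them one after another.
    with ProcessPoolExecutor(max_workers=2) as pool:
        jql_future = pool.submit(generate_jql, user_query)
        meta_future = pool.submit(fetch_permissions_and_metadata, user_id)
        # Total wait is now max(branch times), not their sum.
        return jql_future.result(), meta_future.result()
```

Since both branches are mostly I/O-bound (HTTP calls), a thread pool or `asyncio.gather` would give the same overlap with less overhead; separate processes mainly pay off if there is heavy local post-processing.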
3. Context Distillation (Smaller Targeted Prompts, Faster Reasoning)
Originally, I sent the entire Jira schema to the model every time, which was unnecessary and expensive.
What I Changed

- Built a Minified Metadata Map
- Sent only:
  - Relevant project keys
  - Required field IDs
  - Session-specific context
| Benefit | Outcome |
|---|---|
| Prompt size | Reduced to small, targeted prompts |
| Model responsiveness | Faster first visible output |
| Cost | Lower token usage |
Smaller context → faster reasoning → better UX
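A rough sketch of how such a minified metadata map could be assembled; the schema shape, session keys, and field names below are assumptions for illustration only:

```python
def build_minified_context(session: dict, full_schema: dict) -> dict:
    """Distill the full Jira schema down to only what this session needs.

    `full_schema` maps project keys to field definitions; the keys used here
    (project_keys, allowed_fields, account_id, timezone) are illustrative.
    """
    # Keep only the projects this session is scoped to.
    relevant_projects = [p for p in full_schema if p in session["project_keys"]]

    # Keep only the field IDs the JQL generator is allowed to reference.
    required_fields = {
        p: [f["id"] for f in full_schema[p]["fields"]
            if f["id"] in session["allowed_fields"]]
        for p in relevant_projects
    }

    return {
        "projects": relevant_projects,
        "fields": required_fields,
        "user": {"account_id": session["account_id"],
                 "timezone": session["timezone"]},
    }

# The distilled map replaces the ~1,200-token full schema in the prompt, e.g.:
# prompt = SYSTEM_PROMPT + "\n" + json.dumps(minified_map)
```

The exact filtering rules depend on which fields the JQL generator actually references; the win comes from no longer shipping the whole schema on every request.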
4. Async Execution & Streaming (Perceived Speed Matters)
The final bottleneck was not computation at all: users were frustrated with how results were delivered.
Before

- Backend waited for the full JQL response
- UI rendered only after everything completed
After

- Migrated webhook handling to FastAPI
- Used:
  - Async endpoints
  - Background tasks for JQL execution
  - Streaming responses to the frontend
Now, the UI starts responding as soon as the first token is ready.
Effect
Users see progress immediately—even if computation continues in the background.
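A minimal FastAPI sketch of the async + streaming pattern; `generate_answer_tokens` is a hypothetical stand-in for the real JQL + LLM pipeline, not the actual endpoint:

```python
from collections.abc import AsyncIterator

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

async def generate_answer_tokens(query: str) -> AsyncIterator[str]:
    """Stand-in async generator: yield partial results as the pipeline produces them."""
    yield "Searching Jira...\n"          # immediate feedback while JQL executes
    # ... run JQL, then stream the LLM answer chunk by chunk ...
    yield "Found matching issues, summarizing...\n"  # illustrative output

@app.get("/query")
async def query_endpoint(query: str) -> StreamingResponse:
    # The response starts flowing as soon as the generator yields its first
    # chunk, instead of waiting for the whole answer to be assembled.
    return StreamingResponse(generate_answer_tokens(query), media_type="text/plain")
```

StreamingResponse begins sending bytes at the first yield, which is what makes the spinner disappear almost immediately even though the heavy work is still running.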
The Result: The "Instant" Feeling
The result? We dropped from ~5 s to ~200 ms.
When users tried it again, the responses felt instant, and they started using the agent much more.
