I recently completed a proof of concept for a Jira AI Agent: a system that converts natural language into sophisticated JQL (Jira Query Language) and analyzes Jira tickets to provide actionable insights.
While Atlassian already has a Jira Service Agent, we aimed for something different. We wanted a custom AI agent that could:
- Search Jira issues across standard and custom fields
- Analyze issue history, comments, and attachments (PDFs, PPTs)
- Conduct lightweight research across tickets
- Suggest solutions or generate structured outputs (tables, explanations, resolution steps) based on user intent
For example, if a user asked:
“How was error X resolved earlier?”
the agent would:

- Identify relevant Jira issues
- Parse comments and attachments
- Summarize the resolution into a clear, reusable answer
This agent was designed mainly for field engineers, who usually sift through tickets manually to find solutions. The user interface also allowed users to download generated answers for offline use. Functionally, the system worked well. It had high accuracy and reasonable reasoning. However, there was one major issue.
The Problem: A Smart Agent That Felt Painfully Slow
Every time a user typed a query like “Show me all high-priority bugs in the Mobile project assigned to me,” they faced a loading spinner for about five to six seconds. The feedback I kept getting was: it's useful, but very slow. The real challenge now was to make it faster and give users a good experience.
Diagnosing the Bottleneck: The “Waterfall of Death”
I started analyzing the logs, and the issue was obvious: the entire system was running as a strictly sequential pipeline.
Original Execution Flow (Sequential)
| Step | Description | Time Cost |
|---|---|---|
| Authentication | Jira API auth & permission validation | ~800 ms |
| Prompt Construction | Sending full project schema (~1,200 tokens) | ~2 s |
| Query Validation | LLM re-validates generated JQL | ~1 s |
| Execution + UI | Backend waits for full JSON before responding | ~2+ s |
| **Total** | | **~6 s** |
So I redesigned the flow and implemented a four-pillar optimization strategy:
1. Intent-Aware Memory (Redis Semantic Cache)
The first thing I did was stop treating every request as brand new. For most queries, the wording differs but the underlying intent stays the same. For example:

- “My open bugs”
- “Show me my bugs”
- “What are the issues assigned to me?”
What I Changed

- Implemented a Redis-backed intent cache
- Cached generated JQL and, in some cases, final responses
- Used semantic matching to detect intent similarity
| Metric | Result |
|---|---|
| Cache hit latency | ~15 ms |
| LLM bypass rate | ~40% of repeated queries |
| Cost impact | Significant reduction in inference calls |
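A minimal sketch of the idea, assuming an embedding model sits behind an `embed()` helper; the key names, similarity threshold, and helper functions below are illustrative, not the actual production code:

```python
import json

import numpy as np
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
SIMILARITY_THRESHOLD = 0.92  # illustrative cutoff for "same intent"

def embed(text: str) -> np.ndarray:
    """Placeholder: call whatever embedding model you use (sentence embeddings, an API, etc.)."""
    raise NotImplementedError

def cache_jql(query: str, jql: str) -> None:
    # Store the query's embedding alongside the JQL the LLM generated for it.
    entry = {"embedding": embed(query).tolist(), "jql": jql}
    r.rpush("jql_intent_cache", json.dumps(entry))

def lookup_jql(query: str) -> str | None:
    # Return cached JQL if an earlier query had a similar enough intent.
    q_vec = embed(query)
    for raw in r.lrange("jql_intent_cache", 0, -1):
        entry = json.loads(raw)
        c_vec = np.array(entry["embedding"])
        similarity = float(
            np.dot(q_vec, c_vec) / (np.linalg.norm(q_vec) * np.linalg.norm(c_vec))
        )
        if similarity >= SIMILARITY_THRESHOLD:
            return entry["jql"]  # cache hit: the LLM is bypassed entirely
    return None  # cache miss: fall through to JQL generation
```

The linear scan here is only for illustration; a vector-capable setup (for example Redis with the RediSearch module) is the more scalable way to do the similarity lookup.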
2. Parallel Execution (Breaking the Sequential Chain)
Previously, every downstream step waited for the LLM to finish before it could start, which was the main reason everything felt so slow. The new flow splits the work:
```
User Query
├── Process A: Generate optimized JQL
└── Process B: Validate permissions + fetch metadata
    (runs in parallel)
```
I decided to use Python multiprocessing:

- JQL generation and permission checks ran concurrently
- Metadata was pre-fetched while the LLM was still working
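A sketch of that split, assuming `generate_jql` wraps the LLM call and `fetch_permissions_and_metadata` wraps the Jira REST calls (both names are illustrative stand-ins); since the write-up uses Python multiprocessing, a ProcessPoolExecutor is shown here:

```python
from concurrent.futures import ProcessPoolExecutor

def generate_jql(user_query: str) -> str:
    """Illustrative stand-in for the LLM call that produces optimized JQL."""
    ...

def fetch_permissions_and_metadata(user_id: str) -> dict:
    """Illustrative stand-in for Jira permission checks plus metadata pre-fetch."""
    ...

def handle_query(user_query: str, user_id: str) -> tuple[str, dict]:
    # Submit both branches at once instead of chaining them one after another.
    with ProcessPoolExecutor(max_workers=2) as pool:
        jql_future = pool.submit(generate_jql, user_query)
        meta_future = pool.submit(fetch_permissions_and_metadata, user_id)
        # Total wait is now max(branch times), not their sum.
        return jql_future.result(), meta_future.result()
```

Since both branches are mostly I/O-bound (HTTP calls), a thread pool or `asyncio.gather` would give the same overlap with less overhead; separate processes mainly pay off if there is heavy local post-processing.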
3. Context Distillation (Smaller Targeted Prompts, Faster Reasoning)
Originally, I sent the entire Jira schema to the model every time, which was unnecessary and expensive.
What I Changed

- Built a Minified Metadata Map
- Sent only:
  - Relevant project keys
  - Required field IDs
  - Session-specific context
| Benefit | Outcome |
|---|---|
| Prompt size | Reduced to small, targeted prompts |
| Model responsiveness | Faster first visible output |
| Cost | Lower token usage |
Smaller context → faster reasoning → better UX
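A rough sketch of how such a minified metadata map could be assembled; the schema shape, session keys, and field names below are assumptions for illustration only:

```python
def build_minified_context(session: dict, full_schema: dict) -> dict:
    """Distill the full Jira schema down to only what this session needs.

    `full_schema` maps project keys to field definitions; the keys used here
    (project_keys, allowed_fields, account_id, timezone) are illustrative.
    """
    # Keep only the projects this session is scoped to.
    relevant_projects = [p for p in full_schema if p in session["project_keys"]]

    # Keep only the field IDs the JQL generator is allowed to reference.
    required_fields = {
        p: [f["id"] for f in full_schema[p]["fields"]
            if f["id"] in session["allowed_fields"]]
        for p in relevant_projects
    }

    return {
        "projects": relevant_projects,
        "fields": required_fields,
        "user": {"account_id": session["account_id"],
                 "timezone": session["timezone"]},
    }

# The distilled map replaces the ~1,200-token full schema in the prompt, e.g.:
# prompt = SYSTEM_PROMPT + "\n" + json.dumps(minified_map)
```

The exact filtering rules depend on which fields the JQL generator actually references; the win comes from no longer shipping the whole schema on every request.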
4. Async Execution & Streaming (Perceived Speed Matters)
The final bottleneck was not computation at all: users were frustrated with how results were delivered.
Before

- Backend waited for the full JQL response
- UI rendered only after everything completed
After

- Migrated webhook handling to FastAPI
- Used:
  - Async endpoints
  - Background tasks for JQL execution
  - Streaming responses to the frontend
Now, the UI starts responding as soon as the first token is ready.
Effect
Users see progress immediately—even if computation continues in the background.
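A minimal FastAPI sketch of the async + streaming pattern; `generate_answer_tokens` is a hypothetical stand-in for the real JQL + LLM pipeline, not the actual endpoint:

```python
from collections.abc import AsyncIterator

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

async def generate_answer_tokens(query: str) -> AsyncIterator[str]:
    """Stand-in async generator: yield partial results as the pipeline produces them."""
    yield "Searching Jira...\n"          # immediate feedback while JQL executes
    # ... run JQL, then stream the LLM answer chunk by chunk ...
    yield "Found matching issues, summarizing...\n"  # illustrative output

@app.get("/query")
async def query_endpoint(query: str) -> StreamingResponse:
    # The response starts flowing as soon as the generator yields its first
    # chunk, instead of waiting for the whole answer to be assembled.
    return StreamingResponse(generate_answer_tokens(query), media_type="text/plain")
```

StreamingResponse begins sending bytes at the first yield, which is what makes the spinner disappear almost immediately even though the heavy work is still running.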
The Result: The "Instant" Feeling
The result? We dropped from ~5 s to ~200 ms.
When users tried it again, the responses felt instant, and they started using the agent much more.
