I recently completed a proof of concept for a Jira AI Agent: a system that converts natural language into sophisticated JQL (Jira Query Language) and analyzes Jira tickets to provide actionable insights.

While Atlassian already has a Jira Service Agent, we aimed for something different. We wanted a custom AI agent that could:

  • Search Jira issues across standard and custom fields

  • Analyze issue history, comments, and attachments (PDFs, PPTs)

  • Conduct lightweight research across tickets

  • Suggest solutions or generate structured outputs (tables, explanations, resolution steps) based on user intent

For example, if a user asked:

“How was error X resolved earlier?”

the agent would:

  1. Identify relevant Jira issues

  2. Parse comments and attachments

  3. Summarize the resolution into a clear, reusable answer

This agent was designed mainly for field engineers, who usually sift through tickets manually to find solutions. The user interface also let users download generated answers for offline use. Functionally, the system worked well: accuracy was high and the reasoning was sound. However, there was one major issue.

The Problem: A Smart Agent That Felt Painfully Slow

Every time a user typed a query like “Show me all high-priority bugs in the Mobile project assigned to me”, they faced a loading spinner for about five to six seconds. The feedback I got was consistent: useful, but very slow. The real challenge now was to make it faster and deliver a good user experience.

Diagnosing the Bottleneck: The “Waterfall of Death”

I started analyzing the logs, and the issue was obvious: the entire system was running as a strictly sequential pipeline.

Original Execution Flow (Sequential)

| Step | Description | Time Cost |
| --- | --- | --- |
| Authentication | Jira API auth & permission validation | ~800 ms |
| Prompt Construction | Sending full project schema (~1,200 tokens) | ~2 s |
| Query Validation | LLM re-validates generated JQL | ~1 s |
| Execution + UI | Backend waits for full JSON before responding | ~2+ s |
| Total | | ~6 seconds |

That led me to rethink the flow and implement a four-pillar optimization strategy:

1. Intent-Aware Memory (Redis Semantic Cache)

The first thing I did was stop treating every request as brand new. For most queries, the wording differs but the user intent remains the same. For example:

  • “My open bugs”

  • “Show me my bugs”

  • “What are the issues assigned to me?”

What I Changed

  • Implemented a Redis-backed intent cache (see the sketch below)

  • Cached generated JQL and, in some cases, final responses

  • Used semantic matching to detect intent similarity
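
To make this concrete, here is a minimal sketch of such a cache. The `embed` and `generate_jql` callables, the key naming, and the 0.9 similarity threshold are illustrative assumptions, not the production implementation; any sentence-embedding model and LLM wrapper would slot in.

```python
# Minimal sketch of a Redis-backed semantic intent cache (illustrative only).
# `embed` and `generate_jql` are supplied by the caller; the 0.9 threshold
# is an assumed tuning value, not a measured one.
import hashlib
import json

import numpy as np
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
SIMILARITY_THRESHOLD = 0.9


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def cached_jql(query: str, embed, generate_jql) -> str:
    query_vec = np.array(embed(query))

    # Compare the new query's embedding against previously cached intents.
    for key in r.scan_iter("intent:*"):
        entry = json.loads(r.get(key))
        if cosine(query_vec, np.array(entry["vector"])) >= SIMILARITY_THRESHOLD:
            return entry["jql"]  # cache hit: ~15 ms, LLM bypassed

    # Cache miss: call the LLM once, then remember this intent.
    jql = generate_jql(query)
    key = "intent:" + hashlib.sha1(query.encode()).hexdigest()[:12]
    r.set(key, json.dumps({"vector": query_vec.tolist(), "jql": jql}))
    return jql
```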

| Metric | Result |
| --- | --- |
| Cache hit latency | ~15 ms |
| LLM bypass rate | ~40% of repeated queries |
| Cost impact | Significant reduction in inference calls |

2. Parallel Execution (Breaking the Sequential Chain)

Previously, everything waited for the LLM to finish processing, which was a big part of why the system felt so slow.

User Query
├── Process A: Generate optimized JQL
└── Process B: Validate permissions + fetch metadata
(runs in parallel)

I decided to use Python multiprocessing, sketched below:

  • JQL generation and permission checks ran concurrently

  • Metadata was pre-fetched while the LLM was still working
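
A minimal sketch of that fan-out using `concurrent.futures.ProcessPoolExecutor` (which is built on multiprocessing). `generate_jql` and `check_permissions_and_metadata` are illustrative stubs standing in for the real LLM and Jira API calls.

```python
# Sketch: run JQL generation (Process A) and permission/metadata checks
# (Process B) at the same time instead of one after the other.
# The worker functions are illustrative stubs, not the real implementation.
from concurrent.futures import ProcessPoolExecutor


def generate_jql(user_query: str) -> str:
    # Stand-in for the LLM call that turns natural language into JQL.
    return "project = MOBILE AND priority = High AND assignee = currentUser()"


def check_permissions_and_metadata(user_id: str) -> dict:
    # Stand-in for Jira permission validation plus metadata pre-fetch.
    return {"user": user_id, "allowed_projects": ["MOBILE"]}


def handle_query(user_query: str, user_id: str) -> tuple[str, dict]:
    # Total latency becomes max(A, B) instead of A + B.
    with ProcessPoolExecutor(max_workers=2) as pool:
        jql_future = pool.submit(generate_jql, user_query)                  # Process A
        meta_future = pool.submit(check_permissions_and_metadata, user_id)  # Process B
        return jql_future.result(), meta_future.result()


if __name__ == "__main__":
    print(handle_query("Show me all high-priority bugs in Mobile assigned to me", "user-123"))
```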

3. Context Distillation (Smaller Targeted Prompts, Faster Reasoning)

Originally, I sent the entire Jira schema to the model every time, which was unnecessary and expensive.

What I Changed

  • Built a Minified Metadata Map (illustrated below)

  • Sent only:

    • Relevant project keys

    • Required field IDs

    • Session-specific context
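
As a rough illustration, the distillation step might look like the following. The shape of `full_schema` and the field names are assumptions; the real map depends on how the Jira metadata is fetched.

```python
# Sketch: distill the full Jira schema into a minimal, session-specific map.
# The structure of `full_schema` and the field names are illustrative.


def build_minified_metadata_map(full_schema: dict, project_keys: list[str],
                                required_fields: set[str]) -> dict:
    """Keep only the projects and field IDs the current query can touch."""
    minified = {"projects": {}, "fields": {}}

    for key, project in full_schema.get("projects", {}).items():
        if key in project_keys:
            # Only the identifiers the LLM needs to write valid JQL.
            minified["projects"][key] = {"issue_types": project["issue_types"]}

    for field_id, field in full_schema.get("fields", {}).items():
        if field["name"] in required_fields:
            minified["fields"][field["name"]] = field_id

    return minified


# Example: a ~1,200-token schema collapses to a handful of keys and IDs.
full_schema = {
    "projects": {"MOBILE": {"issue_types": ["Bug", "Task"]},
                 "WEB": {"issue_types": ["Story"]}},
    "fields": {"customfield_10010": {"name": "Severity"},
               "customfield_10020": {"name": "Sprint"}},
}
print(build_minified_metadata_map(full_schema, ["MOBILE"], {"Severity"}))
```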

| Benefit | Outcome |
| --- | --- |
| Prompt size | Targeted prompts |
| Model responsiveness | Faster first visible output |
| Cost | Lower token usage |
Smaller context → faster reasoning → better UX

4. Async Execution & Streaming (Perceived Speed Matters)

The final bottleneck was how results were delivered: users were getting frustrated because nothing appeared until everything had finished.

Before

  • Backend waited for the full JQL response

  • UI rendered only after everything completed

After

  • Migrated webhook handling to FastAPI

  • Used:

    • Async endpoints

    • Background tasks for JQL execution

    • Streaming responses to the frontend

Now, the UI starts responding as soon as the first token is ready.
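
A minimal sketch of what such an endpoint could look like. The `stream_answer` generator is a placeholder for the real async pipeline that yields tokens as the LLM produces them.

```python
# Sketch: FastAPI endpoint that streams tokens to the UI as they arrive.
# `stream_answer` is a placeholder for the real async LLM/JQL pipeline.
import asyncio

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()


async def stream_answer(query: str):
    # Stand-in generator: in the real system, tokens come from the LLM
    # while JQL execution continues as a background task.
    for token in ["Found ", "3 ", "matching ", "issues..."]:
        await asyncio.sleep(0.05)
        yield token


@app.get("/query")
async def run_query(q: str):
    # The client starts rendering as soon as the first chunk is yielded.
    return StreamingResponse(stream_answer(q), media_type="text/plain")
```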

Effect: Users see progress immediately, even if computation continues in the background.

The Result: The "Instant" Feeling

We went from roughly five to six seconds down to about 200 ms.

When users tried it again, they found the responses noticeably faster and started using the tool more.
