How to build a production-ready Retrieval-Augmented Generation pipeline that turns your Knowledge Base into an intelligent assistant — using Just247Pipes visual pipeline designer.
Why RAG? The Next Step Beyond Search
Your Knowledge Base (set up in the previous guide) already delivers instant, accurate answers to user questions. But it has a limitation: it can only return existing FAQ entries. What happens when a user asks a question that doesn’t match any FAQ exactly? Or when the answer needs to synthesize information from multiple sources?
That’s where RAG — Retrieval-Augmented Generation — transforms your Knowledge Base from a search engine into an intelligent assistant.
| Knowledge Base Alone | Knowledge Base + RAG |
| Returns the best-matching FAQ entry | Generates a natural-language answer from multiple sources |
| Only answers questions that match existing FAQs | Answers any question grounded in your documents |
| “Close enough” matching | Synthesizes and reasons across multiple passages |
| No citations | Every claim linked to a source document |
RAG works in three steps:
1. Retrieve — Find the most relevant passages from your Knowledge Base and vector store
2. Augment — Pass those passages as context to a Large Language Model
3. Generate — The LLM produces a coherent, grounded answer — always citing your data, never fabricating information
And with Just247Pipes, building a RAG pipeline is a visual, step-by-step process. No machine learning expertise required.
What You’ll Build
By the end of this guide, you’ll have a complete RAG pipeline that:
– ✅ Accepts natural-language questions from your users
– ✅ Retrieves the most relevant passages from your vector store and Knowledge Base
– ✅ Generates accurate, grounded answers using an LLM — always citing your data
– ✅ Verifies factual claims against retrieved context to prevent hallucinations
– ✅ Includes source citations so users can verify every answer
– ✅ Runs 24/7 without manual intervention
Here’s the pipeline at a glance:
Ingestion Pipeline:
File Input → Processor → Embedder → PGVector Upsert
Query Pipeline:
Text Input → Embedder → PGVector Search → Knowledge Base → LLM → Response Output
Step 1: Create Vector Embeddings for Your Documents
Your Knowledge Base handles structured FAQ matching. RAG adds a second retrieval layer: vector similarity search — finding relevant passages by meaning, not just by keyword or FAQ match.
To enable this, you need to convert your documents into vector embeddings and store them in a vector database.
The Ingestion Pipeline
File Input → Processor → Embedder → PGVector Upsert
File Input
Drag a File Input component to start:
– Supported formats: PDF, TXT, DOCX, CSV, Markdown
– Configuration: Point to your document directory or connect cloud storage
Processor (Chunking)
Large documents must be broken into smaller chunks for effective retrieval:
| Setting | Recommended Value | Why |
| Chunk Size | 500–1000 tokens | Keeps chunks focused and retrievable |
| Overlap | 50–100 tokens | Preserves context across chunk boundaries |
| Strategy | Sliding window | Ensures no information is lost at chunk edges |
Embedder
The Embedder converts each text chunk into a vector — a mathematical representation of meaning:
| Property | Recommended Value | Notes |
| Provider | openai | Best quality; also supports `huggingface`, `cohere`, `local` |
| Model | text-embedding-3-small | Great balance of speed, cost, and quality |
| Dimensions | 1536 | Standard for OpenAI embeddings |
| Normalize | true | Ensures consistent similarity comparisons |
| Enable Caching | true | Avoid re-embedding unchanged documents |
| Enable Vector Store | true | Connect directly to your vector database |
PGVector Upsert
The PGVector Upsert component stores your embeddings in PostgreSQL with the pgvector extension:
{
"database_url": "postgresql://user:pass@host:5432/your_db",
"table_name": "knowledge_vectors",
"distance_metric": "cosine"
}
What it does:
– Auto-creates the table and index if they don’t exist
– Upserts documents — updates existing ones, inserts new ones
– Returns confirmation with stored document IDs
💡 Business Value Tip
> Incremental ingestion means zero downtime. The PGVector Upsert component handles updates seamlessly. Changed a policy document? Re-ingest it and the old version is replaced — no pipeline rebuild required. Your vector store stays current without any manual work.
Step 2: Build the Query Pipeline
This is the heart of your RAG system — the pipeline that accepts a user question, retrieves relevant context, and generates a grounded answer.
The Complete Pipeline
Text Input → Embedder → PGVector Search → Knowledge Base → LLM → Response Output
Let’s configure each component.
Text Input
| Property | Value | Purpose |
| Label | User Question | Clear identification |
| Placeholder | Ask a question… | Guide your users |
| Max Length | 500 | Prevent overly verbose questions |
| Trim Whitespace | true | Clean input automatically |
Embedder (Query)
This is a second Embedder instance — same model as ingestion, but now converting the user’s question into a vector for search:
| Property | Value | Why |
| Provider | openai | Must match the ingestion embedder |
| Model | text-embedding-3-small | Must match the ingestion embedder |
| Dimensions | 1536 | Must match the ingestion embedder |
> ⚠️ Critical: The query Embedder must use the same model and dimensions as the ingestion Embedder. Mismatched models produce incompatible vectors, and search quality will drop dramatically.
PGVector Search
The PGVector Search component performs similarity search — finding the most relevant document chunks:
{
"database_url": "postgresql://user:pass@host:5432/your_db",
"table_name": "knowledge_vectors",
"distance_metric": "cosine"
}
Key settings:
| Property | Value | Why |
| Top K | 5 | Retrieve the 5 most relevant passages |
| Similarity Threshold | 0.7 | Only return results above 70% relevance |
| Include Metadata | true | Preserve source information for citations |
| Hybrid Search | true (optional) | Combine semantic + keyword search |
Knowledge Base (Search)
Connect the PGVector Search results into the Knowledge Base component for an additional FAQ matching layer:
| Property | Value |
| Operation | search_faq |
| Max Results | 5 |
| Auto Categorize | true |
This dual retrieval approach — vector similarity plus structured FAQ lookup — gives you the best of both worlds: deep semantic understanding and precise FAQ matching.
💡 Business Value Tip
> Dual retrieval catches what single methods miss. Vector search finds semantically relevant passages even when keywords differ. Knowledge Base search finds exact FAQ matches with high confidence. Together, they ensure your RAG pipeline always has the best possible context — leading to more accurate, complete answers.
Step 3: Configure the LLM for Grounded Answers
The LLM component is where retrieval meets generation. It takes the retrieved context and the user’s question, then produces a coherent, factual answer.
LLM Configuration for RAG
| Property | Recommended Value | Why |
| Primary Provider | openai | Industry-leading quality |
| Primary Model | gpt-4 | Best accuracy for RAG answers |
| Temperature | 0.1 | Low temperature = factual, consistent answers |
| Max Tokens | 1000 | Sufficient for detailed answers |
| Enable RAG | true | Activates built-in RAG support |
| Max Context Documents | 5 | Include top 5 retrieved passages |
| Enable Citations | true | Attribute answers to source documents |
| Citation Format | markdown | Clean, readable source attribution |
| Verify Claims in Context | true | Reduce hallucinations |
| Check Factuality | true | Extra safety layer |
System Prompt (The Most Important Setting for RAG)
The system prompt is what prevents the LLM from making things up.
Here’s an optimized RAG system prompt:
You are a knowledgeable assistant that answers questions based solely on the provided context.
Follow these rules:
1. Answer ONLY using information from the provided context
2. If the context doesn't contain relevant information, say: "I don't have enough information to answer this question."
3. Always cite the source document when providing facts
4. Be concise but thorough
5. Do not speculate or add information not present in the context
User Prompt Template
Context:
{context}
Question: {question}
Please provide a detailed answer based only on the context above.
Include source citations where applicable.
💡 Business Value Tip
> Low temperature + citations = trustworthy AI. Setting the LLM temperature to 0.1 ensures consistent, factual answers — not creative fiction. Enabling citations means every answer links back to your source documents, building user trust and enabling verification. Your customers and employees can verify what the AI says, rather than blindly trusting it.
Step 4: Wire the Pipeline on the Canvas
On the Just247Pipes canvas, connect components by drawing edges between their ports:
1. Text Input (output: question) → Embedder (input: text)
2. Embedder (output: embeddings) → PGVector Search (input: query_embedding)
3. PGVector Search (output: results) → Knowledge Base (input: question)
4. Knowledge Base (output: matched_faq) → LLM (input: context)
5. Text Input (output: question) → LLM (input: prompt) (passthrough)
6. LLM (output: response) → Response Output
The result is a clear, auditable pipeline that anyone on your team can understand and modify — not a black-box script that only one developer can maintain.
Step 5: Test and Deploy
Testing Your Pipeline
Just247Pipes provides built-in testing tools:
1. Click “Run” on the canvas to execute the pipeline with sample input
2. Inspect each component’s output — verify embeddings, search results, and LLM responses
3. Test with edge cases — questions outside your knowledge base, ambiguous queries, multi-part questions
4. Monitor execution logs — track data flow, latencies, and errors at every stage
Going Live
Once validated, deploy with confidence:
– Schedule ingestion pipelines to re-index documents on a cron schedule (e.g., nightly)
– Expose the query pipeline via REST API for integration into your website, app, or chatbot
– Monitor executions through the built-in dashboard — track success rates, latencies, and costs
Real-World Examples
Example 1: Customer Support AI Assistant
A SaaS company adds RAG on top of their Knowledge Base (built in the previous guide):
Ingestion Pipeline:
Help Center Articles → File Input → Processor → Embedder → PGVector Upsert
Query Pipeline:
Customer Question → Text Input → Embedder → PGVector Search
→ Knowledge Base (search_faq) → LLM → Natural Language Answer with Citations
Result: 65% of tickets resolved automatically. Response time drops from hours to seconds. Every answer includes a citation linking to the source document.
Example 2: Internal Knowledge Assistant
An enterprise connects its internal wiki, HR policies, and compliance documents:
Ingestion Pipeline:
Internal Wiki + Policies → File Input → Processor → Embedder → PGVector Upsert
Query Pipeline:
Employee Question → Text Input → Embedder → PGVector Search
→ Knowledge Base (search_faq) → LLM → Answer with Source Citations
Result: Employees get instant, sourced answers to policy questions. HR saves 20+ hours per week. New hires onboard faster with 24/7 access to organizational knowledge.
Example 3: Smart Escalation
A financial services firm adds confidence-based escalation:
Query Pipeline:
Customer Question → Text Input → Embedder → PGVector Search
→ Knowledge Base → LLM → Response
If LLM confidence < threshold:
→ Escalation → Email/Slack Notification (to human agent)
Result: 80% of inquiries resolved automatically. Complex or low-confidence questions seamlessly escalated to human experts. Full audit trail maintained for compliance.
Advanced: Taking Your RAG Pipeline Further
Once your core pipeline is running, Just247Pipes makes it easy to add sophisticated capabilities.
Hybrid Search
Combine semantic (vector) search with keyword matching for the best retrieval quality:
– PGVector Search: Set `hybrid_search: true`
– Embedder: Enable `Hybrid Embeddings` with configurable `Sparse Weight` and `Dense Weight`
Hybrid search catches both meaning-based matches (“How do I reset my password?”) and exact-term matches (error codes, product names, SKU numbers).
Conversation Memory
The LLM component supports multi-turn conversations:
| Setting | Value | Purpose |
| Include Conversation History | true | Enable follow-up questions |
| Max History Messages | 10 | Remember last 10 exchanges |
| Enable History Summarization | true | Compress long conversations |
This means your RAG assistant handles follow-ups naturally — “Tell me more about that” or “What about enterprise pricing?”
Intent Detection + Smart Routing
Add an Intent Detection component before your retrieval to classify questions:
– Route billing questions to billing-specific knowledge
– Route technical questions to technical documentation
– Route account questions to account management flows
Escalation
Add an Escalation component for confidence-based routing:
| Confidence Level | Action |
| High (> 0.85) | Deliver answer automatically |
| Medium (0.6–0.85) | Deliver with a disclaimer |
| Low (< 0.6) | Escalate to a human agent via Slack, Email, or Telegram |
Cost Optimization
Just247Pipes includes built-in cost controls:
– Enable Cost Tracking — Monitor spending per user, model, and intent
– Daily/Monthly Cost Limits — Set budgets and alerts
– Model Routing — Use cheaper models (GPT-3.5) for simple queries, premium models (GPT-4) for complex ones
– Semantic Caching — Avoid re-processing identical or similar questions
Why Just247Pipes for RAG? Business Value, Simplicity, and Flexibility
🎯 Business Value
| Metric | Knowledge Base Only | Knowledge Base + RAG |
| Question coverage | Only pre-written FAQs | Any question answerable from your documents |
| Answer quality | Best-matching FAQ entry | Synthesized, contextual answer with citations |
| Hallucination risk | N/A (returns stored answers) | Minimized by grounding + verification |
| User trust | Moderate | High (every claim linked to a source) |
| Support cost per ticket | $2–$5 (search-based) | $0.50–$2.00 (AI-generated) |
🧩 Simplicity
– Visual pipeline designer — Drag, connect, configure. No ML expertise required.
– Template system — Start from pre-built RAG templates and customize in minutes
– Built-in RAG support — The LLM component has RAG mode, citations, and factuality checking built in
– One-click deployment — No DevOps gymnastics
🔄 Flexibility
– Swap LLM providers — OpenAI, Anthropic, local models — change one dropdown, pipeline stays the same
– Swap vector stores — pgvector, Pinecone, Weaviate, Milvus, Qdrant — pick what fits your infrastructure
– Add components freely — Need intent detection? Drop it in. Need escalation? Add it. Want notifications? Add Slack or Email output
– Scale deployment — Docker, AWS, GCP, Azure, or on-premises behind your firewall
Technology Glossary
| Term | Plain-English Definition |
| RAG (Retrieval-Augmented Generation) | A technique where an AI first *retrieves* relevant documents from a knowledge base and vector store, then *generates* an answer based on those documents — instead of relying solely on its training data. This makes answers grounded, factual, and specific to your organization. |
| Embedding | A mathematical representation of text as a list of numbers (a vector). Texts with similar *meaning* get similar embeddings, enabling computers to find semantically related content — even if the exact words differ. |
| Embedder | The component that converts raw text into embeddings. Just247Pipes supports OpenAI, HuggingFace, Cohere, Azure, and local models. |
| Vector Store / Vector Database | A specialized database that stores embeddings and supports fast similarity search. Examples: pgvector, Pinecone, Weaviate, Milvus, Qdrant. Think of it as a search engine that finds content by *meaning*, not just by keywords. |
| pgvector | A PostgreSQL extension that adds vector similarity search to the world’s most popular open-source database. Store and query embeddings without adding a separate database to your stack. |
| Cosine Similarity | A mathematical measure of how similar two vectors are, ranging from -1 (opposite) to 1 (identical). In RAG, it ranks how closely a document matches a question. A score of 0.7+ typically indicates strong relevance. |
| Top-K | The number of most-relevant results to retrieve. Top-K = 5 means “give me the 5 best-matching documents.” Higher K = more context but more noise; lower K = less context but higher precision. |
| Semantic Search | Search that understands *meaning* rather than just matching keywords. “How do I reset my password?” and “password recovery process” would match in semantic search, even though they share few words. |
| Hybrid Search | Combining semantic (meaning-based) search with keyword (exact match) search. Catches both conceptual matches and exact terms like error codes or product names. |
| Chunking | Breaking large documents into smaller pieces (chunks) so they can be individually embedded and retrieved. Without chunking, a long document would produce a single embedding that dilutes specific topics. |
| LLM (Large Language Model) | An AI model (like GPT-4) that generates human-like text. In RAG, the LLM receives the user’s question plus the retrieved context and generates a coherent, grounded answer. |
| System Prompt | Instructions given to the LLM that define its behavior. In RAG, the system prompt typically says “Answer only using the provided context” to prevent the AI from making things up. |
| Temperature | A setting that controls how creative (high) vs. deterministic (low) the LLM’s responses are. For RAG, low temperature (0.1–0.3) ensures factual, consistent answers. |
| Hallucination | When an AI generates confident-sounding information that isn’t based on facts. RAG dramatically reduces hallucinations by grounding answers in retrieved documents. |
| Citation | A reference to the source document from which an answer was derived. Citations let users verify AI responses and build trust. |
| Upsert | A database operation that either inserts a new record or updates an existing one. In RAG, “upserting” embeddings means your vector store stays current without manual cleanup. |
| Similarity Threshold | A minimum relevance score (0–1) below which results are discarded. Setting it to 0.7 means “only show results that are at least 70% relevant.” |
| Semantic Caching | Storing previous AI responses and reusing them when a *semantically similar* question comes in — even if worded differently. Reduces costs and latency significantly. |
| Escalation | Routing a query to a human agent when the AI can’t confidently handle it. Essential for maintaining quality and trust in production AI systems. |
Quick-Start Checklist
– [ ] Complete the Knowledge Base guide — Set up and populate your Knowledge Base first (see previous guide)
– [ ] Choose your embedding model — OpenAI `text-embedding-3-small` is a great starting point
– [ ] Build the ingestion pipeline — File Input → Processor → Embedder → PGVector Upsert
– [ ] Run ingestion — Index your documents into the vector store
– [ ] Build the query pipeline — Text Input → Embedder → PGVector Search → Knowledge Base → LLM → Response
– [ ] Configure the LLM for RAG — Enable RAG mode, citations, factuality checking; set temperature to 0.1
– [ ] Write a strong system prompt — Instruct the LLM to answer only from context
– [ ] Test with real questions — Try queries your users would actually ask, including edge cases
– [ ] Deploy — Expose via API, integrate into your application, set up monitoring
– [ ] Iterate — Add hybrid search, conversation memory, intent detection, and escalation as needed
Conclusion: From Knowledge Base to Intelligent Assistant
In the previous guide, you built a Knowledge Base that delivers instant, accurate answers from your curated FAQs. Now, with RAG, you’ve transformed that Knowledge Base into an intelligent assistant that can:
– Answer any question — not just those with pre-written FAQs
– Synthesize information — draw from multiple sources to build complete answers
– Cite sources — every claim linked to a document your team can verify
– Prevent hallucinations — grounded responses with factuality checking
– Scale effortlessly — handle thousands of concurrent queries without adding staff
All through Just247Pipes’ visual pipeline designer. No machine learning team required. No months of development. Just drag, connect, configure, and deploy.
Your organization’s knowledge is its most valuable asset. RAG makes that knowledge universally accessible — through natural conversation, grounded in your own data, with citations your users can trust.
—
Just247Pipes — Transform your Knowledge Base into an AI-powered assistant with visual pipeline design.
Leave a Reply