Building a RAG Pipeline on Just247Pipes: AI-Powered Answers Grounded in Your Data

How to build a production-ready Retrieval-Augmented Generation pipeline that turns your Knowledge Base into an intelligent assistant — using Just247Pipes visual pipeline designer.

Why RAG? The Next Step Beyond Search

Your Knowledge Base (set up in the previous guide) already delivers instant, accurate answers to user questions. But it has a limitation: it can only return existing FAQ entries. What happens when a user asks a question that doesn’t match any FAQ exactly? Or when the answer needs to synthesize information from multiple sources?

That’s where RAG — Retrieval-Augmented Generation — transforms your Knowledge Base from a search engine into an intelligent assistant.

Knowledge Base AloneKnowledge Base + RAG
Returns the best-matching FAQ entryGenerates a natural-language answer from multiple sources
Only answers questions that match existing FAQsAnswers any question grounded in your documents
“Close enough” matchingSynthesizes and reasons across multiple passages
No citationsEvery claim linked to a source document

RAG works in three steps:

1. Retrieve — Find the most relevant passages from your Knowledge Base and vector store

2. Augment — Pass those passages as context to a Large Language Model

3. Generate — The LLM produces a coherent, grounded answer — always citing your data, never fabricating information

And with Just247Pipes, building a RAG pipeline is a visual, step-by-step process. No machine learning expertise required.

What You’ll Build

By the end of this guide, you’ll have a complete RAG pipeline that:

– ✅ Accepts natural-language questions from your users

– ✅ Retrieves the most relevant passages from your vector store and Knowledge Base

– ✅ Generates accurate, grounded answers using an LLM — always citing your data

– ✅ Verifies factual claims against retrieved context to prevent hallucinations

– ✅ Includes source citations so users can verify every answer

– ✅ Runs 24/7 without manual intervention

Here’s the pipeline at a glance:

Ingestion Pipeline:
File Input → Processor → Embedder → PGVector Upsert

Query Pipeline:
Text Input → Embedder → PGVector Search → Knowledge Base → LLM → Response Output

Step 1: Create Vector Embeddings for Your Documents

Your Knowledge Base handles structured FAQ matching. RAG adds a second retrieval layer: vector similarity search — finding relevant passages by meaning, not just by keyword or FAQ match.

To enable this, you need to convert your documents into vector embeddings and store them in a vector database.

The Ingestion Pipeline

File Input → Processor → Embedder → PGVector Upsert

File Input

Drag a File Input component to start:

Supported formats: PDF, TXT, DOCX, CSV, Markdown

Configuration: Point to your document directory or connect cloud storage

Processor (Chunking)

Large documents must be broken into smaller chunks for effective retrieval:

SettingRecommended ValueWhy
Chunk Size500–1000 tokensKeeps chunks focused and retrievable
Overlap50–100 tokensPreserves context across chunk boundaries
StrategySliding windowEnsures no information is lost at chunk edges

Embedder

The Embedder converts each text chunk into a vector — a mathematical representation of meaning:

PropertyRecommended ValueNotes
ProvideropenaiBest quality; also supports `huggingface`, `cohere`, `local`
Modeltext-embedding-3-smallGreat balance of speed, cost, and quality
Dimensions1536Standard for OpenAI embeddings
NormalizetrueEnsures consistent similarity comparisons
Enable CachingtrueAvoid re-embedding unchanged documents
Enable Vector StoretrueConnect directly to your vector database

PGVector Upsert

The PGVector Upsert component stores your embeddings in PostgreSQL with the pgvector extension:

{
  "database_url": "postgresql://user:pass@host:5432/your_db",
  "table_name": "knowledge_vectors",
  "distance_metric": "cosine"
}

What it does:

– Auto-creates the table and index if they don’t exist

– Upserts documents — updates existing ones, inserts new ones

– Returns confirmation with stored document IDs

💡 Business Value Tip

> Incremental ingestion means zero downtime. The PGVector Upsert component handles updates seamlessly. Changed a policy document? Re-ingest it and the old version is replaced — no pipeline rebuild required. Your vector store stays current without any manual work.

Step 2: Build the Query Pipeline

This is the heart of your RAG system — the pipeline that accepts a user question, retrieves relevant context, and generates a grounded answer.

The Complete Pipeline

Text Input → Embedder → PGVector Search → Knowledge Base → LLM → Response Output

Let’s configure each component.

Text Input

PropertyValuePurpose
LabelUser QuestionClear identification
PlaceholderAsk a question…Guide your users
Max Length500Prevent overly verbose questions
Trim WhitespacetrueClean input automatically

Embedder (Query)

This is a second Embedder instance — same model as ingestion, but now converting the user’s question into a vector for search:

PropertyValueWhy
ProvideropenaiMust match the ingestion embedder
Modeltext-embedding-3-smallMust match the ingestion embedder
Dimensions1536Must match the ingestion embedder

> ⚠️ Critical: The query Embedder must use the same model and dimensions as the ingestion Embedder. Mismatched models produce incompatible vectors, and search quality will drop dramatically.

PGVector Search

The PGVector Search component performs similarity search — finding the most relevant document chunks:

{
  "database_url": "postgresql://user:pass@host:5432/your_db",
  "table_name": "knowledge_vectors",
  "distance_metric": "cosine"
}

Key settings:

PropertyValueWhy
Top K5Retrieve the 5 most relevant passages
Similarity Threshold0.7Only return results above 70% relevance
Include MetadatatruePreserve source information for citations
Hybrid Searchtrue (optional)Combine semantic + keyword search

Knowledge Base (Search)

Connect the PGVector Search results into the Knowledge Base component for an additional FAQ matching layer:

PropertyValue
Operationsearch_faq
Max Results5
Auto Categorizetrue

This dual retrieval approach — vector similarity plus structured FAQ lookup — gives you the best of both worlds: deep semantic understanding and precise FAQ matching.

💡 Business Value Tip

> Dual retrieval catches what single methods miss. Vector search finds semantically relevant passages even when keywords differ. Knowledge Base search finds exact FAQ matches with high confidence. Together, they ensure your RAG pipeline always has the best possible context — leading to more accurate, complete answers.

Step 3: Configure the LLM for Grounded Answers

The LLM component is where retrieval meets generation. It takes the retrieved context and the user’s question, then produces a coherent, factual answer.

LLM Configuration for RAG

PropertyRecommended ValueWhy
Primary ProvideropenaiIndustry-leading quality
Primary Modelgpt-4Best accuracy for RAG answers
Temperature0.1Low temperature = factual, consistent answers
Max Tokens1000Sufficient for detailed answers
Enable RAGtrueActivates built-in RAG support
Max Context Documents5Include top 5 retrieved passages
Enable CitationstrueAttribute answers to source documents
Citation FormatmarkdownClean, readable source attribution
Verify Claims in ContexttrueReduce hallucinations
Check FactualitytrueExtra safety layer

System Prompt (The Most Important Setting for RAG)

The system prompt is what prevents the LLM from making things up.

Here’s an optimized RAG system prompt:

You are a knowledgeable assistant that answers questions based solely on the provided context.

 Follow these rules:

1. Answer ONLY using information from the provided context

2. If the context doesn't contain relevant information, say: "I don't have enough information to answer this question."

3. Always cite the source document when providing facts

4. Be concise but thorough

5. Do not speculate or add information not present in the context

User Prompt Template

Context:
{context}

Question: {question}

Please provide a detailed answer based only on the context above. 
Include source citations where applicable.

💡 Business Value Tip

> Low temperature + citations = trustworthy AI. Setting the LLM temperature to 0.1 ensures consistent, factual answers — not creative fiction. Enabling citations means every answer links back to your source documents, building user trust and enabling verification. Your customers and employees can verify what the AI says, rather than blindly trusting it.

Step 4: Wire the Pipeline on the Canvas

On the Just247Pipes canvas, connect components by drawing edges between their ports:

1. Text Input (output: question) → Embedder (input: text)

2. Embedder (output: embeddings) → PGVector Search (input: query_embedding)

3. PGVector Search (output: results) → Knowledge Base (input: question)

4. Knowledge Base (output: matched_faq) → LLM (input: context)

5. Text Input (output: question) → LLM (input: prompt) (passthrough)

6. LLM (output: response) → Response Output

The result is a clear, auditable pipeline that anyone on your team can understand and modify — not a black-box script that only one developer can maintain.

Step 5: Test and Deploy

Testing Your Pipeline

Just247Pipes provides built-in testing tools:

1. Click “Run” on the canvas to execute the pipeline with sample input

2. Inspect each component’s output — verify embeddings, search results, and LLM responses

3. Test with edge cases — questions outside your knowledge base, ambiguous queries, multi-part questions

4. Monitor execution logs — track data flow, latencies, and errors at every stage

Going Live

Once validated, deploy with confidence:

Schedule ingestion pipelines to re-index documents on a cron schedule (e.g., nightly)

Expose the query pipeline via REST API for integration into your website, app, or chatbot

Monitor executions through the built-in dashboard — track success rates, latencies, and costs

Real-World Examples

Example 1: Customer Support AI Assistant

A SaaS company adds RAG on top of their Knowledge Base (built in the previous guide):

Ingestion Pipeline:
  Help Center Articles → File Input → Processor → Embedder → PGVector Upsert

Query Pipeline:
  Customer Question → Text Input → Embedder → PGVector Search
    → Knowledge Base (search_faq) → LLM → Natural Language Answer with Citations

Result: 65% of tickets resolved automatically. Response time drops from hours to seconds. Every answer includes a citation linking to the source document.

Example 2: Internal Knowledge Assistant

An enterprise connects its internal wiki, HR policies, and compliance documents:

Ingestion Pipeline:
  Internal Wiki + Policies → File Input → Processor → Embedder → PGVector Upsert

Query Pipeline:
  Employee Question → Text Input → Embedder → PGVector Search
    → Knowledge Base (search_faq) → LLM → Answer with Source Citations

Result: Employees get instant, sourced answers to policy questions. HR saves 20+ hours per week. New hires onboard faster with 24/7 access to organizational knowledge.

Example 3: Smart Escalation

A financial services firm adds confidence-based escalation:

Query Pipeline:
  Customer Question → Text Input → Embedder → PGVector Search
    → Knowledge Base → LLM → Response

  If LLM confidence < threshold:
    → Escalation → Email/Slack Notification (to human agent)

Result: 80% of inquiries resolved automatically. Complex or low-confidence questions seamlessly escalated to human experts. Full audit trail maintained for compliance.

Advanced: Taking Your RAG Pipeline Further

Once your core pipeline is running, Just247Pipes makes it easy to add sophisticated capabilities.

Hybrid Search

Combine semantic (vector) search with keyword matching for the best retrieval quality:

PGVector Search: Set `hybrid_search: true`

Embedder: Enable `Hybrid Embeddings` with configurable `Sparse Weight` and `Dense Weight`

Hybrid search catches both meaning-based matches (“How do I reset my password?”) and exact-term matches (error codes, product names, SKU numbers).

Conversation Memory

The LLM component supports multi-turn conversations:

SettingValuePurpose
Include Conversation HistorytrueEnable follow-up questions
Max History Messages10Remember last 10 exchanges
Enable History SummarizationtrueCompress long conversations

This means your RAG assistant handles follow-ups naturally — “Tell me more about that” or “What about enterprise pricing?”

Intent Detection + Smart Routing

Add an Intent Detection component before your retrieval to classify questions:

– Route billing questions to billing-specific knowledge

– Route technical questions to technical documentation

– Route account questions to account management flows

Escalation

Add an Escalation component for confidence-based routing:

Confidence LevelAction
High (> 0.85)Deliver answer automatically
Medium (0.6–0.85)Deliver with a disclaimer
Low (< 0.6)Escalate to a human agent via Slack, Email, or Telegram

Cost Optimization

Just247Pipes includes built-in cost controls:

Enable Cost Tracking — Monitor spending per user, model, and intent

Daily/Monthly Cost Limits — Set budgets and alerts

Model Routing — Use cheaper models (GPT-3.5) for simple queries, premium models (GPT-4) for complex ones

Semantic Caching — Avoid re-processing identical or similar questions

Why Just247Pipes for RAG? Business Value, Simplicity, and Flexibility

🎯 Business Value

MetricKnowledge Base OnlyKnowledge Base + RAG
Question coverageOnly pre-written FAQsAny question answerable from your documents
Answer qualityBest-matching FAQ entrySynthesized, contextual answer with citations
Hallucination riskN/A (returns stored answers)Minimized by grounding + verification
User trustModerateHigh (every claim linked to a source)
Support cost per ticket$2–$5 (search-based)$0.50–$2.00 (AI-generated)

🧩 Simplicity

Visual pipeline designer — Drag, connect, configure. No ML expertise required.

Template system — Start from pre-built RAG templates and customize in minutes

Built-in RAG support — The LLM component has RAG mode, citations, and factuality checking built in

One-click deployment — No DevOps gymnastics

🔄 Flexibility

Swap LLM providers — OpenAI, Anthropic, local models — change one dropdown, pipeline stays the same

Swap vector stores — pgvector, Pinecone, Weaviate, Milvus, Qdrant — pick what fits your infrastructure

Add components freely — Need intent detection? Drop it in. Need escalation? Add it. Want notifications? Add Slack or Email output

Scale deployment — Docker, AWS, GCP, Azure, or on-premises behind your firewall

Technology Glossary

TermPlain-English Definition
RAG (Retrieval-Augmented Generation)A technique where an AI first *retrieves* relevant documents from a knowledge base and vector store, then *generates* an answer based on those documents — instead of relying solely on its training data. This makes answers grounded, factual, and specific to your organization.
EmbeddingA mathematical representation of text as a list of numbers (a vector). Texts with similar *meaning* get similar embeddings, enabling computers to find semantically related content — even if the exact words differ.
EmbedderThe component that converts raw text into embeddings. Just247Pipes supports OpenAI, HuggingFace, Cohere, Azure, and local models.
Vector Store / Vector DatabaseA specialized database that stores embeddings and supports fast similarity search. Examples: pgvector, Pinecone, Weaviate, Milvus, Qdrant. Think of it as a search engine that finds content by *meaning*, not just by keywords.
pgvectorA PostgreSQL extension that adds vector similarity search to the world’s most popular open-source database. Store and query embeddings without adding a separate database to your stack.
Cosine SimilarityA mathematical measure of how similar two vectors are, ranging from -1 (opposite) to 1 (identical). In RAG, it ranks how closely a document matches a question. A score of 0.7+ typically indicates strong relevance.
Top-KThe number of most-relevant results to retrieve. Top-K = 5 means “give me the 5 best-matching documents.” Higher K = more context but more noise; lower K = less context but higher precision.
Semantic SearchSearch that understands *meaning* rather than just matching keywords. “How do I reset my password?” and “password recovery process” would match in semantic search, even though they share few words.
Hybrid SearchCombining semantic (meaning-based) search with keyword (exact match) search. Catches both conceptual matches and exact terms like error codes or product names.
ChunkingBreaking large documents into smaller pieces (chunks) so they can be individually embedded and retrieved. Without chunking, a long document would produce a single embedding that dilutes specific topics.
LLM (Large Language Model)An AI model (like GPT-4) that generates human-like text. In RAG, the LLM receives the user’s question plus the retrieved context and generates a coherent, grounded answer.
System PromptInstructions given to the LLM that define its behavior. In RAG, the system prompt typically says “Answer only using the provided context” to prevent the AI from making things up.
TemperatureA setting that controls how creative (high) vs. deterministic (low) the LLM’s responses are. For RAG, low temperature (0.1–0.3) ensures factual, consistent answers.
HallucinationWhen an AI generates confident-sounding information that isn’t based on facts. RAG dramatically reduces hallucinations by grounding answers in retrieved documents.
CitationA reference to the source document from which an answer was derived. Citations let users verify AI responses and build trust.
UpsertA database operation that either inserts a new record or updates an existing one. In RAG, “upserting” embeddings means your vector store stays current without manual cleanup.
Similarity ThresholdA minimum relevance score (0–1) below which results are discarded. Setting it to 0.7 means “only show results that are at least 70% relevant.”
Semantic CachingStoring previous AI responses and reusing them when a *semantically similar* question comes in — even if worded differently. Reduces costs and latency significantly.
EscalationRouting a query to a human agent when the AI can’t confidently handle it. Essential for maintaining quality and trust in production AI systems.

Quick-Start Checklist

– [ ] Complete the Knowledge Base guide — Set up and populate your Knowledge Base first (see previous guide)

– [ ] Choose your embedding model — OpenAI `text-embedding-3-small` is a great starting point

– [ ] Build the ingestion pipeline — File Input → Processor → Embedder → PGVector Upsert

– [ ] Run ingestion — Index your documents into the vector store

– [ ] Build the query pipeline — Text Input → Embedder → PGVector Search → Knowledge Base → LLM → Response

– [ ] Configure the LLM for RAG — Enable RAG mode, citations, factuality checking; set temperature to 0.1

– [ ] Write a strong system prompt — Instruct the LLM to answer only from context

– [ ] Test with real questions — Try queries your users would actually ask, including edge cases

– [ ] Deploy — Expose via API, integrate into your application, set up monitoring

– [ ] Iterate — Add hybrid search, conversation memory, intent detection, and escalation as needed

Conclusion: From Knowledge Base to Intelligent Assistant

In the previous guide, you built a Knowledge Base that delivers instant, accurate answers from your curated FAQs. Now, with RAG, you’ve transformed that Knowledge Base into an intelligent assistant that can:

Answer any question — not just those with pre-written FAQs

Synthesize information — draw from multiple sources to build complete answers

Cite sources — every claim linked to a document your team can verify

Prevent hallucinations — grounded responses with factuality checking

Scale effortlessly — handle thousands of concurrent queries without adding staff

All through Just247Pipes’ visual pipeline designer. No machine learning team required. No months of development. Just drag, connect, configure, and deploy.

Your organization’s knowledge is its most valuable asset. RAG makes that knowledge universally accessible — through natural conversation, grounded in your own data, with citations your users can trust.

Just247Pipes — Transform your Knowledge Base into an AI-powered assistant with visual pipeline design.

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *