Building a Research Agent — Web Search, Summarization, and Report Generation
Author: Sanjeev Sharma (@webcoderspeed1)
Introduction
Research agents gather information from multiple sources, synthesize findings, and generate comprehensive reports. Unlike simple web search, agents follow up on interesting findings, verify facts across sources, and detect contradictions. This post covers building research agents that produce trustworthy, well-cited reports.
- Search Tool Integration
- Source Credibility Scoring
- Deduplication and Clustering
- Progressive Deepening
- Citation Tracking
- Fact Extraction
- Report Structuring
- Human Review Checkpoint
- Checklist
- Conclusion
Search Tool Integration
Integrate multiple search providers for redundancy and better coverage.
interface SearchResult {
url: string;
title: string;
snippet: string;
source: string;
retrievedAt: number;
}
interface SearchProvider {
name: string;
search: (query: string, limit?: number) => Promise<SearchResult[]>;
}
class SearchOrchestrator {
private providers: Map<string, SearchProvider> = new Map();
registerProvider(provider: SearchProvider): void {
this.providers.set(provider.name, provider);
}
async search(query: string, topK: number = 5): Promise<SearchResult[]> {
// Search with multiple providers in parallel
const allResults = await Promise.all(
Array.from(this.providers.values()).map((provider) =>
provider
.search(query, topK)
.catch((error) => {
console.warn(`Provider ${provider.name} failed: ${error}`);
return [];
}),
),
);
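// Interleave providers round-robin so the topK cut keeps results from every provider
const merged: SearchResult[] = [];
const longest = Math.max(0, ...allResults.map((r) => r.length));
for (let i = 0; i < longest; i++) {
for (const providerResults of allResults) {
if (i < providerResults.length) merged.push(providerResults[i]);
}
}
return this.deduplicateResults(merged).slice(0, topK);
}
private deduplicateResults(results: SearchResult[]): SearchResult[] {
const seen = new Set<string>();
const unique: SearchResult[] = [];
for (const result of results) {
// URL paths can be case-sensitive, so trim a trailing slash instead of lowercasing
const key = result.url.replace(/\/+$/, '');
if (!seen.has(key)) {
seen.add(key);
unique.push(result);
}
}
// Keep the merged order: every retrievedAt is effectively "now", so sorting on it adds nothing
return unique;
}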
}
// Example providers
const tavilyProvider: SearchProvider = {
name: 'tavily',
async search(query: string, limit: number = 10): Promise<SearchResult[]> {
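const response = await fetch('https://api.tavily.com/search', {
method: 'POST',
// Without this header the JSON body may not be parsed by the server
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
api_key: process.env.TAVILY_API_KEY,
query,
max_results: limit,
}),
});
if (!response.ok) {
throw new Error(`Tavily request failed with status ${response.status}`);
}
const data = (await response.json()) as any;
return (data.results || []).map((r: any) => ({
url: r.url,
title: r.title,
// Tavily returns the excerpt as `content`; fall back to `snippet` if present
snippet: r.content ?? r.snippet ?? '',
source: 'tavily',
retrievedAt: Date.now(),
}));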
},
};
const googleProvider: SearchProvider = {
name: 'google',
async search(query: string, limit: number = 10): Promise<SearchResult[]> {
const response = await fetch(
`https://www.googleapis.com/customsearch/v1?q=${encodeURIComponent(query)}&key=${process.env.GOOGLE_API_KEY}&cx=${process.env.GOOGLE_CX}`,
);
const data = (await response.json()) as any;
return (data.items || []).slice(0, limit).map((item: any) => ({
url: item.link,
title: item.title,
snippet: item.snippet,
source: 'google',
retrievedAt: Date.now(),
}));
},
};
Multiple providers reduce bias and improve coverage.
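Different providers often return the same page under slightly different URLs (tracking parameters, fragments, trailing slashes), which plain string comparison treats as distinct. The following is a minimal sketch of canonicalizing URLs before deduplicating. `normalizeUrl`, `dedupeByUrl`, and the tracking-parameter list are illustrative assumptions, not part of any provider's API.

```typescript
// Hypothetical helper: canonicalize a URL so trivially different forms compare equal.
// The tracking-parameter list is an assumption; extend it for your own sources.
const TRACKING_PARAMS = ['utm_source', 'utm_medium', 'utm_campaign', 'utm_term', 'utm_content', 'ref'];

function normalizeUrl(raw: string): string {
  let parsed: URL;
  try {
    parsed = new URL(raw);
  } catch {
    return raw.trim(); // Not a valid absolute URL: compare it as-is
  }
  parsed.hash = ''; // Fragments do not change the document
  for (const param of TRACKING_PARAMS) {
    parsed.searchParams.delete(param);
  }
  parsed.searchParams.sort(); // Parameter order should not matter
  // The URL parser lowercases the hostname; the path is left case-sensitive
  const path = parsed.pathname.replace(/\/+$/, '') || '/';
  const query = parsed.searchParams.toString();
  return `${parsed.protocol}//${parsed.host}${path}${query ? `?${query}` : ''}`;
}

// Keeps the first item seen for each canonical URL, preserving input order.
function dedupeByUrl<T extends { url: string }>(items: T[]): T[] {
  const seen = new Set<string>();
  return items.filter((item) => {
    const key = normalizeUrl(item.url);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}
```

Because the first occurrence wins, the order in which provider results are merged decides which copy of a duplicate survives.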
Source Credibility Scoring
Not all sources are equally trustworthy.
interface SourceScore {
url: string;
credibility: number; // 0-1
reasons: string[];
}
class SourceScorer {
async scoreCredibility(url: string, snippet: string): Promise<SourceScore> {
let score = 0.5; // Start neutral
const reasons: string[] = [];
// Check 1: Domain reputation
const domain = new URL(url).hostname;
if (this.isTrustedDomain(domain)) {
score += 0.2;
reasons.push('From trusted domain');
} else if (this.isSuspiciousDomain(domain)) {
score -= 0.3;
reasons.push('From suspicious domain');
}
// Check 2: Academic/institutional sources (match the TLD, not any substring; .org is only a weak signal)
if (/\.(edu|gov|org)$/.test(domain)) {
score += 0.15;
reasons.push('Educational or institutional source');
}
// Check 3: Content analysis
if (this.hasReferences(snippet)) {
score += 0.1;
reasons.push('Contains citations/references');
}
if (this.hasAuthorship(snippet)) {
score += 0.05;
reasons.push('Has author attribution');
}
// Check 4: Recency
if (this.isRecent(url)) {
score += 0.1;
reasons.push('Recently updated');
}
// Check 5: Bias detection
if (this.hasObviousBias(snippet)) {
score -= 0.15;
reasons.push('Possible bias detected');
}
return {
url,
credibility: Math.max(0, Math.min(1, score)),
reasons,
};
}
private isTrustedDomain(domain: string): boolean {
const trusted = [
'wikipedia.org',
'bbc.com',
'reuters.com',
'apnews.com',
'scholar.google.com',
'github.com',
];
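// Match the domain or its subdomains only, so "bbc.com.evil.net" does not pass
return trusted.some((t) => domain === t || domain.endsWith(`.${t}`));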
}
private isSuspiciousDomain(domain: string): boolean {
return domain.includes('scam') || domain.includes('fake') || domain.length > 50;
}
private hasReferences(snippet: string): boolean {
return /\[.*?\]|cite|reference|source/i.test(snippet);
}
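private hasAuthorship(snippet: string): boolean {
// Word boundaries keep "nearby" or "standby" from matching "by <name>"
return /\b(?:written\s+)?by\s+\w+|\bauthor\b/i.test(snippet);
}
private isRecent(url: string): boolean {
// Heuristic: a year path segment within the last two years suggests recent content
const yearMatch = url.match(/\/(20\d{2})(?:[\/-]|$)/);
if (!yearMatch) return false;
const age = new Date().getFullYear() - Number(yearMatch[1]);
return age >= 0 && age <= 2;
}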
private hasObviousBias(snippet: string): boolean {
const biasIndicators = ['opinion', 'believes', 'obviously', 'clearly', 'naturally'];
return biasIndicators.some((indicator) =>
new RegExp(`\\b${indicator}\\b`, 'i').test(snippet),
);
}
}
Use the score to rank sources rather than to accept or reject them outright; keyword checks like these are coarse heuristics.
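As one way to apply the scores downstream, the sketch below drops sources under a cutoff and orders the rest from most to least credible. `rankByCredibility`, `ScoredSource`, and the 0.4 default cutoff are assumptions chosen for illustration.

```typescript
// Illustrative type: mirrors the url and credibility fields of SourceScore.
interface ScoredSource {
  url: string;
  credibility: number; // 0-1
}

// Hypothetical helper: keep sources at or above minCredibility, best first.
// The default cutoff of 0.4 is an arbitrary assumption, not a recommended value.
function rankByCredibility<T extends ScoredSource>(
  sources: T[],
  minCredibility: number = 0.4,
): T[] {
  return sources
    .filter((s) => s.credibility >= minCredibility) // filter returns a new array
    .sort((a, b) => b.credibility - a.credibility); // so sorting does not touch the input
}

const ranked = rankByCredibility([
  { url: 'https://example.com/blog', credibility: 0.3 },
  { url: 'https://example.gov/report', credibility: 0.9 },
  { url: 'https://example.org/faq', credibility: 0.6 },
]);
console.log(ranked.map((s) => s.url));
```

Under the additive scheme above, a `.gov` page whose snippet contains references would score roughly 0.5 + 0.15 + 0.1 = 0.75 before any other adjustments.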
Deduplication and Clustering
Similar results from different sources should be clustered.
interface ResultCluster {
topic: string;
results: SearchResult[];
consensus: string;
contradictions: string[];
}
class ResultDeduplicate {
async clusterResults(results: SearchResult[]): Promise<ResultCluster[]> {
const clusters: Map<string, SearchResult[]> = new Map();
for (const result of results) {
// Extract topic from result
const topic = await this.extractTopic(result.title, result.snippet);
if (!clusters.has(topic)) {
clusters.set(topic, []);
}
clusters.get(topic)!.push(result);
}
// Convert clusters to structured format
const clustered: ResultCluster[] = [];
for (const [topic, clusterResults] of clusters.entries()) {
const consensus = await this.findConsensus(clusterResults);
const contradictions = await this.findContradictions(clusterResults);
clustered.push({
topic,
results: clusterResults,
consensus,
contradictions,
});
}
return clustered;
}
private async extractTopic(title: string, snippet: string): Promise<string> {
// Use LLM to extract main topic
const prompt = `Extract the main topic (2-3 words) from this content:
Title: ${title}
Snippet: ${snippet}
Respond with just the topic, nothing else.`;
return this.llmCall(prompt);
}
private async findConsensus(results: SearchResult[]): Promise<string> {
// Find common theme across results
const snippets = results.map((r) => r.snippet).join('\n\n');
const prompt = `What's the consensus across these sources?
${snippets}
Summarize the common points in 1-2 sentences.`;
return this.llmCall(prompt);
}
private async findContradictions(results: SearchResult[]): Promise<string[]> {
if (results.length < 2) {
return [];
}
const snippets = results.map((r) => `Source: ${r.url}\n${r.snippet}`).join('\n\n---\n\n');
const prompt = `Do these sources contradict each other? List any contradictions found.
${snippets}
Return JSON: { "contradictions": ["contradiction 1", "contradiction 2"] }`;
const response = await this.llmCall(prompt);
try {
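const parsed = JSON.parse(response);
// Guard against valid JSON that lacks the expected array
return Array.isArray(parsed.contradictions) ? parsed.contradictions : [];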
} catch {
return [];
}
}
private async llmCall(prompt: string): Promise<string> {
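// Placeholder: connect this to your LLM client. With this stub, every result lands in a single cluster with an empty topic.
return '';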
}
}
Clustering reveals where sources agree and where they contradict.
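Calling an LLM once per result can be slow and costly, so a cheap lexical pass is sometimes used first. The sketch below is an alternative to the LLM-based topic extraction above, not the same method: it groups snippets by Jaccard similarity of their word sets. The function names and the 0.3 threshold are assumptions.

```typescript
// Split text into a set of lowercase word tokens, ignoring very short words.
function tokenize(text: string): Set<string> {
  const words = text.toLowerCase().match(/[a-z0-9]+/g) || [];
  return new Set(words.filter((w) => w.length > 2));
}

// Jaccard similarity: shared tokens divided by total distinct tokens (0-1).
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 0;
  let shared = 0;
  a.forEach((token) => {
    if (b.has(token)) shared++;
  });
  return shared / (a.size + b.size - shared);
}

// Greedy single-pass clustering: each snippet joins the first cluster whose
// seed snippet is similar enough, otherwise it starts a new cluster.
function clusterBySimilarity(snippets: string[], threshold: number = 0.3): string[][] {
  const clusters: { seed: Set<string>; members: string[] }[] = [];
  for (const snippet of snippets) {
    const tokens = tokenize(snippet);
    const match = clusters.find((c) => jaccard(c.seed, tokens) >= threshold);
    if (match) {
      match.members.push(snippet);
    } else {
      clusters.push({ seed: tokens, members: [snippet] });
    }
  }
  return clusters.map((c) => c.members);
}
```

Lexical overlap misses paraphrases, so this works best as a pre-filter that reduces how many results need an LLM or embedding comparison.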
Progressive Deepening
Follow up on interesting findings with deeper searches.
interface ResearchQuery {
query: string;
depth: number; // 0 for the initial query, +1 for each level of follow-up
}
class ProgressiveResearcher {
private queryCount: number = 0;
private findings: Map<string, SearchResult[]> = new Map();
async research(
initialQuery: string,
maxDepth: number = 3,
maxQueries: number = 20, // Stop after this many queries
): Promise<Map<string, SearchResult[]>> {
const queue: ResearchQuery[] = [{ query: initialQuery, depth: 0 }];
// Track queries already queued so the same question is not searched twice
const seen = new Set<string>([initialQuery.toLowerCase()]);
this.queryCount = 0;
this.findings = new Map();
while (queue.length > 0 && this.queryCount < maxQueries) {
const { query, depth } = queue.shift()!;
this.queryCount++;
console.log(`Query ${this.queryCount} (depth ${depth}): ${query}`);
// Search
const results = await this.search(query);
this.findings.set(query, results);
// Stop expanding once the depth limit is reached
if (depth >= maxDepth) continue;
// Extract follow-up questions
const followUps = await this.generateFollowUpQueries(query, results);
// Only the top 2 follow-ups per query, skipping ones already seen
for (const followUp of followUps.slice(0, 2)) {
const key = followUp.toLowerCase();
if (!seen.has(key)) {
seen.add(key);
queue.push({ query: followUp, depth: depth + 1 });
}
}
}
return this.findings;
}
private async search(query: string): Promise<SearchResult[]> {
// Perform search
return [];
}
private async generateFollowUpQueries(
currentQuery: string,
results: SearchResult[],
): Promise<string[]> {
const prompt = `Based on these search results, what follow-up questions would deepen understanding?
Query: ${currentQuery}
Results:
${results.map((r) => `- ${r.title}: ${r.snippet}`).join('\n')}
Return JSON: { "followUpQueries": ["question 1", "question 2", "question 3"] }`;
const response = await this.llmCall(prompt);
try {
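const parsed = JSON.parse(response);
// Guard against valid JSON that lacks the expected array
return Array.isArray(parsed.followUpQueries) ? parsed.followUpQueries : [];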
} catch {
return [];
}
}
private async llmCall(prompt: string): Promise<string> {
return '';
}
}
Progressive deepening explores a topic systematically instead of stopping at surface results.
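Before picking a depth limit, it helps to estimate the worst-case number of searches. If every query spawns at most b follow-ups and the initial query counts as depth 0, exploring to depth d costs at most 1 + b + b^2 + ... + b^d searches. The sketch below computes that bound; `worstCaseQueries` and `maxDepthWithinBudget` are hypothetical helper names.

```typescript
// Worst-case number of searches for a breadth-first exploration where each
// query spawns at most `branching` follow-ups and the initial query is depth 0.
function worstCaseQueries(branching: number, maxDepth: number): number {
  let total = 0;
  let levelSize = 1; // One query at depth 0
  for (let depth = 0; depth <= maxDepth; depth++) {
    total += levelSize;
    levelSize *= branching;
  }
  return total;
}

// Largest depth whose worst case still fits within `budget` searches.
// Returns -1 if even the initial query does not fit. The `depth + 1 <= budget`
// cap only matters for a degenerate branching factor of 0.
function maxDepthWithinBudget(branching: number, budget: number): number {
  let depth = -1;
  while (depth + 1 <= budget && worstCaseQueries(branching, depth + 1) <= budget) {
    depth++;
  }
  return depth;
}

console.log(worstCaseQueries(2, 3)); // 1 + 2 + 4 + 8 = 15
console.log(maxDepthWithinBudget(2, 20)); // 3, because depth 4 would need 31
```

With two follow-ups per query, a depth of 3 fits inside a 20-query cap, while a depth of 4 would be cut short by the cap rather than by the depth limit.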
Citation Tracking
Keep track of sources for proper attribution.
interface Citation {
url: string;
title: string;
source: string;
accessedAt: number;
quote?: string;
pageNumber?: number;
}
class CitationManager {
private citations: Citation[] = [];
addCitation(result: SearchResult, quote?: string): void {
this.citations.push({
url: result.url,
title: result.title,
source: result.source,
accessedAt: Date.now(),
quote,
});
}
generateBibliography(): string {
// Keep insertion order (and do not sort in place) so entry numbers match the inline [n] markers
return this.citations
.map((c, i) => {
const title = c.title || new URL(c.url).hostname;
return `[${i + 1}] ${title}. Retrieved from ${c.url} on ${new Date(c.accessedAt).toLocaleDateString()}`;
})
.join('\n');
}
generateInlineReferences(text: string): string {
// Append a citation marker after each quoted passage
let result = text;
this.citations.forEach((citation, i) => {
if (citation.quote) {
// split/join matches the quote literally, so regex metacharacters in it are harmless
result = result.split(citation.quote).join(`${citation.quote} [${i + 1}]`);
}
});
return result;
}
}
Proper citations enable fact-checking and maintain transparency.
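When the same source backs several claims, it reads better if that source keeps one number throughout the report. The following sketch shows one way to do that by keying citation numbers on the URL. `CitationIndex` is a hypothetical class, separate from the `CitationManager` above.

```typescript
// Hypothetical registry that gives each distinct URL one stable citation number.
class CitationIndex {
  private numbers = new Map<string, number>();
  private order: string[] = [];

  // Returns the existing number for a URL, or assigns the next one.
  cite(url: string): number {
    const existing = this.numbers.get(url);
    if (existing !== undefined) return existing;
    this.order.push(url);
    this.numbers.set(url, this.order.length);
    return this.order.length;
  }

  // Formats "[n] url" lines in the order the sources were first cited.
  bibliography(): string {
    return this.order.map((url, i) => `[${i + 1}] ${url}`).join('\n');
  }
}

const index = new CitationIndex();
const first = index.cite('https://example.com/a');
const second = index.cite('https://example.com/b');
const repeat = index.cite('https://example.com/a'); // Same number as `first`
console.log(`Claim one [${first}], claim two [${second}], claim three [${repeat}]`);
console.log(index.bibliography());
```

Numbering by first use means the bibliography never needs re-sorting, so inline markers and entries cannot drift apart.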
Fact Extraction
Extract factual claims from search results.
interface ExtractedFact {
claim: string;
source: string;
credibility: number;
evidence: string;
}
class FactExtractor {
async extractFacts(results: SearchResult[], topic: string): Promise<ExtractedFact[]> {
const facts: ExtractedFact[] = [];
for (const result of results) {
const prompt = `Extract factual claims from this source about "${topic}".
Source: ${result.title}
Content: ${result.snippet}
Return JSON: {
"facts": [
{ "claim": "...", "evidence": "..." }
]
}`;
const response = await this.llmCall(prompt);
try {
const parsed = JSON.parse(response);
for (const fact of parsed.facts) {
facts.push({
claim: fact.claim,
source: result.url,
credibility: 0.7, // Placeholder value; in practice, use the credibility score computed for result.url
evidence: fact.evidence,
});
}
} catch {
// Skip malformed responses
}
}
return facts;
}
async deduplicateFacts(facts: ExtractedFact[]): Promise<ExtractedFact[]> {
// Group similar facts
const groups: Map<string, ExtractedFact[]> = new Map();
for (const fact of facts) {
const normalized = this.normalizeClaim(fact.claim);
if (!groups.has(normalized)) {
groups.set(normalized, []);
}
groups.get(normalized)!.push(fact);
}
// Return one fact per group: the most credible one, boosted slightly for each agreeing source
const deduplicated: ExtractedFact[] = [];
for (const group of groups.values()) {
const best = group.reduce((a, b) => (a.credibility > b.credibility ? a : b));
// Copy rather than mutate, so the caller's input facts keep their original scores
deduplicated.push({
...best,
credibility: Math.min(1, best.credibility + (group.length - 1) * 0.05), // Boost for consensus
});
}
return deduplicated;
}
private normalizeClaim(claim: string): string {
// Remove common variations to match similar claims
return claim.toLowerCase().replace(/[^\w\s]/g, '').trim();
}
private async llmCall(prompt: string): Promise<string> {
return '';
}
}
Fact extraction and deduplication surface the best-supported findings.
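Model replies often wrap the requested JSON in extra prose, and a bare `JSON.parse` then discards the whole response. A more forgiving parser might look like the sketch below, which assumes the reply contains a single top-level object with a `facts` array. `parseFactsResponse` and `RawFact` are hypothetical names.

```typescript
interface RawFact {
  claim: string;
  evidence: string;
}

// Hypothetical helper: pull the outermost {...} out of a model reply and
// keep only entries that have string `claim` and `evidence` fields.
function parseFactsResponse(raw: string): RawFact[] {
  const start = raw.indexOf('{');
  const end = raw.lastIndexOf('}');
  if (start === -1 || end <= start) return [];
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw.slice(start, end + 1));
  } catch {
    return []; // Not valid JSON even after trimming the surrounding text
  }
  const facts = (parsed as { facts?: unknown }).facts;
  if (!Array.isArray(facts)) return [];
  const valid: RawFact[] = [];
  for (const item of facts) {
    if (item && typeof item.claim === 'string' && typeof item.evidence === 'string') {
      valid.push({ claim: item.claim, evidence: item.evidence });
    }
  }
  return valid;
}

const reply =
  'Sure, here is the JSON:\n' +
  '{"facts":[{"claim":"Claim A","evidence":"Quoted passage"},{"claim":42}]}\n' +
  'Let me know if you need more.';
console.log(parseFactsResponse(reply)); // Only the well-formed first entry survives
```

Validating each entry keeps one malformed item from dropping the whole batch.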
Report Structuring
Organize findings into a coherent report.
interface ResearchReport {
title: string;
executive_summary: string;
sections: Array<{
heading: string;
content: string;
citations: Citation[];
}>;
contradictions: Array<{
claim: string;
sources: string[];
}>;
bibliography: string;
methodology: string;
}
class ReportGenerator {
async generate(
topic: string,
findings: Map<string, SearchResult[]>,
facts: ExtractedFact[],
): Promise<ResearchReport> {
const sections = await this.structureSections(topic, facts);
const contradictions = await this.identifyContradictions(facts);
const summary = await this.generateSummary(topic, facts);
return {
title: `Research Report: ${topic}`,
executive_summary: summary,
sections,
contradictions,
bibliography: 'Generated bibliography', // Placeholder; in practice, use the output of CitationManager.generateBibliography()
methodology:
'Data gathered from multiple web sources. Facts extracted and deduplicated. Sources scored for credibility.',
};
}
private async structureSections(
topic: string,
facts: ExtractedFact[],
): Promise<ResearchReport['sections']> {
// Group facts into logical sections
const sections: ResearchReport['sections'] = [];
const grouped = this.groupFactsByTheme(facts);
for (const [theme, themeFacts] of grouped) {
const content = themeFacts.map((f) => `- ${f.claim}`).join('\n');
sections.push({
heading: theme,
content,
citations: [], // Would map facts to citations
});
}
return sections;
}
private groupFactsByTheme(
facts: ExtractedFact[],
): Map<string, ExtractedFact[]> {
// Placeholder: groups by the first word of each claim. In practice, use an LLM to categorize facts into themes.
const groups: Map<string, ExtractedFact[]> = new Map();
for (const fact of facts) {
const theme = fact.claim.split(' ')[0];
if (!groups.has(theme)) {
groups.set(theme, []);
}
groups.get(theme)!.push(fact);
}
return groups;
}
private async generateSummary(topic: string, facts: ExtractedFact[]): Promise<string> {
const factSummary = facts.map((f) => f.claim).join(', ');
const prompt = `Write a 2-3 sentence executive summary for this research topic.
Topic: ${topic}
Key findings: ${factSummary}`;
return this.llmCall(prompt);
}
private async identifyContradictions(
facts: ExtractedFact[],
): Promise<ResearchReport['contradictions']> {
// Find facts that contradict each other
return [];
}
private async llmCall(prompt: string): Promise<string> {
return '';
}
}
Well-structured reports are easy to read and fact-check.
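The report object still has to be turned into something readable. As a rough sketch, the function below renders it as Markdown text. `renderReport`, the `RenderableReport` subset type, and the section titles "Open contradictions" and "Sources" are assumptions for illustration.

```typescript
// Minimal report shape for this sketch; it mirrors a subset of ResearchReport.
interface RenderableReport {
  title: string;
  executive_summary: string;
  sections: { heading: string; content: string }[];
  contradictions: { claim: string; sources: string[] }[];
  bibliography: string;
}

// Hypothetical renderer: turns the report object into Markdown text.
function renderReport(report: RenderableReport): string {
  const lines: string[] = [`# ${report.title}`, '', report.executive_summary, ''];
  for (const section of report.sections) {
    lines.push(`## ${section.heading}`, '', section.content, '');
  }
  // Only emit the contradictions section when there is something to show
  if (report.contradictions.length > 0) {
    lines.push('## Open contradictions', '');
    for (const c of report.contradictions) {
      lines.push(`- ${c.claim} (sources: ${c.sources.join(', ')})`);
    }
    lines.push('');
  }
  lines.push('## Sources', '', report.bibliography);
  return lines.join('\n');
}
```

Listing contradictions in their own section keeps them visible to reviewers instead of burying them inside the findings.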
Human Review Checkpoint
Before finalizing, have humans review key findings.
interface ReviewRequest {
reportId: string;
topic: string;
contradictions: string[];
unusualFindings: string[];
facts: ExtractedFact[];
}
class HumanReviewCheckpoint {
async requestReview(report: ResearchReport): Promise<void> {
const request: ReviewRequest = {
reportId: `report-${Date.now()}`,
topic: report.title,
contradictions: report.contradictions.map((c) => c.claim),
unusualFindings: await this.identifyUnusualFindings(report),
facts: [],
};
console.log(`Report ready for review: ${request.reportId}`);
console.log(`Contradictions to verify: ${request.contradictions.join(', ')}`);
console.log(`Unusual findings: ${request.unusualFindings.join(', ')}`);
// In production: send to review queue
}
private async identifyUnusualFindings(report: ResearchReport): Promise<string[]> {
// Findings that differ significantly from common knowledge
return [];
}
}
Human review catches errors and validates unusual claims.
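Reviewer time is limited, so it helps to decide which claims reach the queue. One possible triage rule is sketched below: flag claims that score low or rest on too few sources. `selectForReview`, `ReviewableFact`, and both default thresholds are assumptions to tune against your own review capacity.

```typescript
interface ReviewableFact {
  claim: string;
  credibility: number; // 0-1
  sourceCount: number; // How many distinct sources support the claim
}

interface ReviewItem {
  claim: string;
  reasons: string[];
}

// Hypothetical triage rule: flag claims that are weakly sourced or low-scoring.
function selectForReview(
  facts: ReviewableFact[],
  minCredibility: number = 0.6,
  minSources: number = 2,
): ReviewItem[] {
  const flagged: ReviewItem[] = [];
  for (const fact of facts) {
    const reasons: string[] = [];
    if (fact.credibility < minCredibility) reasons.push('low credibility');
    if (fact.sourceCount < minSources) reasons.push(`fewer than ${minSources} sources`);
    if (reasons.length > 0) flagged.push({ claim: fact.claim, reasons });
  }
  return flagged;
}

const queue = selectForReview([
  { claim: 'Well supported', credibility: 0.9, sourceCount: 3 },
  { claim: 'Single source', credibility: 0.8, sourceCount: 1 },
  { claim: 'Weak and alone', credibility: 0.4, sourceCount: 1 },
]);
console.log(queue);
```

Attaching the reasons to each flagged claim tells the reviewer what to check first.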
Checklist
- Search: integrate multiple providers for coverage
- Scoring: rank sources by credibility
- Deduplication: cluster similar results
- Deepening: follow up on interesting findings
- Citations: track all sources for attribution
- Facts: extract and deduplicate key claims
- Structure: organize findings into sections
- Review: human verification of unusual claims
Conclusion
Research agents that systematically search, verify, and synthesize information produce better reports than single searches. Use multiple sources, score credibility, follow up on findings, and maintain proper citations. Humans should verify unusual claims and contradictions. The result is research you can trust.