# CSV File Support

NeuroLink treats CSV files as a multimodal input type: attach them directly to your AI prompts for data analysis, insights, and processing.
## Overview
CSV support in NeuroLink works just like image support - it's a multimodal input that gets automatically processed and injected into your prompts. The system:
- Auto-detects CSV files using FileDetector (magic bytes, MIME types, extensions, content heuristics)
- Parses CSV data using streaming parser for memory efficiency
- Formats CSV content into LLM-optimized text (markdown/json)
- Injects formatted CSV data into your prompt text
- Works with ALL AI providers (not limited to vision models)
## Quick Start

### SDK Usage

```typescript
import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink();

// Basic CSV analysis
const result = await neurolink.generate({
  input: {
    text: "What are the key trends in this sales data?",
    csvFiles: ["sales-2024.csv"],
  },
});

// Multiple CSV files
const comparison = await neurolink.generate({
  input: {
    text: "Compare Q1 vs Q2 performance and identify growth areas",
    csvFiles: ["q1-sales.csv", "q2-sales.csv"],
  },
});

// Auto-detect file types (mix CSV and images)
const multimodal = await neurolink.generate({
  input: {
    text: "Analyze this data and compare with the chart",
    files: ["data.csv", "chart.png"], // Auto-detects which is CSV vs image
  },
});

// Customize CSV processing
const custom = await neurolink.generate({
  input: {
    text: "Summarize the top 100 customers by revenue",
    csvFiles: ["customers.csv"],
  },
  csvOptions: {
    maxRows: 100, // Limit to first 100 rows
    formatStyle: "markdown", // Use markdown table format
    includeHeaders: true, // Include CSV headers
  },
});
```
### CLI Usage

```bash
# Attach CSV files to your prompt
neurolink generate "Analyze this sales data" --csv sales.csv

# Multiple CSV files
neurolink generate "Compare these datasets" --csv q1.csv --csv q2.csv

# Auto-detect file types
neurolink generate "Analyze data and image" --file data.csv --file chart.png

# Customize CSV processing
neurolink generate "Summarize trends" \
  --csv large-dataset.csv \
  --csv-max-rows 500 \
  --csv-format json

# Stream mode also supports CSV
neurolink stream "Explain this data in detail" --csv data.csv

# Batch processing with CSV
echo "Summarize sales data" > prompts.txt
echo "Find top performers" >> prompts.txt
neurolink batch prompts.txt --csv sales.csv
```
## API Reference

### GenerateOptions

```typescript
type GenerateOptions = {
  input: {
    text: string;
    images?: Array<Buffer | string>;
    csvFiles?: Array<Buffer | string>; // Explicit CSV files
    files?: Array<Buffer | string>; // Auto-detect file types
  };
  csvOptions?: {
    maxRows?: number; // Default: 1000
    formatStyle?: "raw" | "markdown" | "json"; // Default: "raw"
    includeHeaders?: boolean; // Default: true
  };
  // ... other options
};
```
### CSV Input Types

CSV files can be provided as:

- File paths: `"./data.csv"` or `"/absolute/path/data.csv"`
- URLs: `"https://example.com/data.csv"`
- Buffers: `Buffer.from("name,age\nAlice,30")`
- Data URIs: `"data:text/csv;base64,..."`

```typescript
// File path
await neurolink.generate({
  input: {
    text: "Analyze this",
    csvFiles: ["./data.csv"],
  },
});

// URL
await neurolink.generate({
  input: {
    text: "Analyze this",
    csvFiles: ["https://example.com/data.csv"],
  },
});

// Buffer
const csvBuffer = Buffer.from("name,age\nAlice,30\nBob,25");
await neurolink.generate({
  input: {
    text: "Analyze this",
    csvFiles: [csvBuffer],
  },
});
```
### CSV Processing Options

#### maxRows

Limit the number of rows processed (default: 1000). Useful for large datasets.
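For example, to bound how much of a large file is injected into the prompt (the limit value here is illustrative):

```typescript
csvOptions: {
  maxRows: 250, // Only the first 250 data rows are parsed and injected
}
```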
#### formatStyle

Control how CSV data is formatted for the LLM:

- `raw` (default, RECOMMENDED): original CSV format with proper escaping
  - Best for large files and minimal token usage
  - Preserves the original structure
  - Handles commas, quotes, and newlines correctly
  - File size stays minimal (a 63KB file stays 63KB, not 199KB)
- `json`: JSON array format
  - Best for structured data processing
  - Easy to parse programmatically
  - Higher token usage (can expand 3x for large files)
- `markdown`: Markdown table format
  - Best for small datasets (<100 rows)
  - Most readable for humans
  - Highest token usage
```typescript
// Raw CSV (recommended for large files)
csvOptions: {
  formatStyle: "raw",
}
// Output: name,age\nAlice,30\nBob,25

// JSON array
csvOptions: {
  formatStyle: "json",
}
// Output: [{"name":"Alice","age":30},{"name":"Bob","age":25}]

// Markdown table
csvOptions: {
  formatStyle: "markdown",
}
// Output: | name  | age |
//         | ----- | --- |
//         | Alice | 30  |
```
#### includeHeaders

Include CSV headers in the output (default: true).
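For instance, to send only the data rows when your prompt already describes the columns:

```typescript
csvOptions: {
  includeHeaders: false, // Omit the header row from the formatted output
}
```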
## File Detection System

NeuroLink uses a multi-strategy detection system with confidence scores:

### Detection Strategies (in priority order)
1. **Magic bytes** (95% confidence)
   - Detects file type from binary headers
   - Works for images (PNG, JPEG, GIF, WebP), PDFs, and other binary formats
2. **MIME type** (85% confidence)
   - Uses HTTP Content-Type headers for URLs
   - Detects `text/csv`, `image/*`, etc.
3. **Extension** (70% confidence)
   - File extension-based detection
   - Supports `.csv`, `.tsv`, `.jpg`, `.png`, etc.
4. **Content heuristics** (75% confidence)
   - Analyzes file content patterns
   - Detects CSV by checking for consistent comma-separated columns

The system stops at the first strategy that reaches 80%+ confidence.
```typescript
// Example: FileDetector workflow
// 1. Check magic bytes -> Not binary (0% confidence)
// 2. Check MIME type (if URL) -> text/csv (85% confidence) ✓ STOP
// Result: Detected as CSV with 85% confidence
```
## How It Works

### Internal Processing Flow

```typescript
// When you call generate() with CSV files:
await neurolink.generate({
  input: {
    text: "Analyze this data",
    csvFiles: ["data.csv"],
  },
});

// Internal flow:
// 1. messageBuilder.ts detects the csvFiles array
// 2. Calls FileDetector.detectAndProcess("data.csv")
// 3. FileDetector runs detection strategies
// 4. Loads file content (from path/URL/buffer)
// 5. Routes to CSVProcessor.process(buffer)
// 6. CSV parsed using the streaming csv-parser library
// 7. Formatted to LLM-optimized text (raw/markdown/json)
// 8. Appended to the prompt text:
//    "Analyze this data
//
//    ## CSV Data from "data.csv":
//    ```csv
//    name,age,city
//    Alice,30,New York
//    Bob,25,London
//    ```"
// 9. Sends to the AI provider
```
### Memory Efficiency

CSV files are parsed using streaming for memory efficiency:
```typescript
// CSVProcessor uses Readable streams with csv-parser
const rows: Record<string, string>[] = [];
let count = 0;
Readable.from([csvString])
  .pipe(csvParser())
  .on("data", (row) => {
    if (count < maxRows) rows.push(row);
    count++;
  });
```
Large CSV files are handled efficiently:

- Streaming parser: processes data line-by-line
- Row limit: configurable `maxRows` (default: 1000)
- Memory bounded: only the limited set of rows is held in memory
## Examples

### Data Analysis

```typescript
const result = await neurolink.generate({
  input: {
    text: `Analyze this customer data and provide:
1. Total customers
2. Average age
3. Top 5 cities by customer count
4. Any notable patterns or insights`,
    csvFiles: ["customers.csv"],
  },
});
```
### Data Comparison

```typescript
const result = await neurolink.generate({
  input: {
    text: "Compare Q1 vs Q2 sales data. What changed? Which products improved?",
    csvFiles: ["q1-sales.csv", "q2-sales.csv"],
  },
});
```
### Data Cleaning

```typescript
const result = await neurolink.generate({
  input: {
    text: `Review this data for:
- Missing values
- Duplicate entries
- Data quality issues
- Suggested corrections`,
    csvFiles: ["raw-data.csv"],
  },
  csvOptions: {
    maxRows: 100,
    formatStyle: "markdown",
  },
});
```
### Schema Generation

```typescript
const result = await neurolink.generate({
  input: {
    text: "Generate a JSON schema for this CSV data with appropriate types and constraints",
    csvFiles: ["sample-data.csv"],
  },
  csvOptions: {
    maxRows: 50,
    formatStyle: "json",
  },
});
```
### Multimodal Analysis

```typescript
const result = await neurolink.generate({
  input: {
    text: "Compare the sales chart with the actual CSV data. Do they match?",
    files: ["sales-chart.png", "sales-data.csv"],
  },
});
```
## TypeScript Types

Only types are exposed from the package (not classes):

```typescript
import type {
  FileType,
  FileInput,
  FileSource,
  FileDetectionResult,
  FileProcessingResult,
  CSVProcessorOptions,
  FileDetectorOptions,
  CSVContent,
} from "@juspay/neurolink";

// FileType union
type FileType = "csv" | "image" | "pdf" | "text" | "unknown";

// CSV processing options
type CSVProcessorOptions = {
  maxRows?: number;
  formatStyle?: "raw" | "markdown" | "json";
  includeHeaders?: boolean;
};

// File detector options
type FileDetectorOptions = {
  maxSize?: number;
  timeout?: number;
  allowedTypes?: FileType[];
};
```
## Best Practices

### 1. Use Raw Format for Large Files

The `raw` format is recommended for large files and the best token efficiency:

```typescript
csvOptions: {
  formatStyle: "raw",
} // ✅ RECOMMENDED for large files

// Use json for smaller datasets or when you need structured parsing
csvOptions: {
  formatStyle: "json",
} // ✅ Good for small-medium files
```
### 2. Limit Rows for Large Files

For large datasets, limit rows to avoid token limits:
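For example (the value is illustrative; tune it to your provider's context window):

```typescript
csvOptions: {
  maxRows: 500, // ✅ Caps the data injected into the prompt
}
```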
### 3. Use Markdown for Small Datasets

For <100 rows, markdown tables are more readable:
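An illustrative configuration:

```typescript
csvOptions: {
  maxRows: 100,
  formatStyle: "markdown", // ✅ Readable tables for small datasets
}
```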
### 4. Provide Clear Instructions

Give the AI clear instructions about what to analyze:

```typescript
input: {
  text: `Analyze this sales data and provide:
1. Total revenue
2. Top 5 products
3. Revenue trend
4. Recommendations`,
  csvFiles: ["sales.csv"],
}
```
### 5. Use Auto-Detection

Let FileDetector handle mixed file types:
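For example (file names are illustrative), pass everything through the `files` array and let detection route each one:

```typescript
input: {
  text: "Analyze the data and cross-check it against the chart",
  files: ["metrics.csv", "chart.png"], // FileDetector classifies each file
}
```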
## Limitations
- Max file size: 10MB by default (configurable)
- Max rows: 1000 by default (configurable)
- Encoding: UTF-8 recommended (auto-detected)
- Token limits: Large CSV files may exceed provider token limits
- Streaming: CSV content is parsed and formatted before sending (not streamed to LLM)
## Error Handling

```typescript
try {
  const result = await neurolink.generate({
    input: {
      text: "Analyze this",
      csvFiles: ["data.csv"],
    },
  });
} catch (error) {
  const message = (error as Error).message;
  if (message.includes("File too large")) {
    // Handle file size error
  } else if (message.includes("not allowed")) {
    // Handle file type restriction
  } else if (message.includes("CSV")) {
    // Handle CSV parsing error
  }
}
```
## Related Features
- Image Support: Similar multimodal input for images
- File Detection: Auto-detect file types with confidence scores
- Memory Efficient: Streaming parser for large files
- Provider Agnostic: Works with all AI providers
- CLI Integration: Full CLI support with options
## Summary

- CSV support is a multimodal input (like images)
- Use the `csvFiles` array or the `files` array (auto-detect)
- Customize with `csvOptions` (maxRows, formatStyle, includeHeaders)
- Works with ALL providers (not just vision models)
- Memory-efficient streaming parser
- Full CLI support with `--csv`, `--file`, `--csv-max-rows`, `--csv-format`
- Only types are exposed from the package (not classes)