CSV File Support

NeuroLink provides seamless CSV file support as a multimodal input type: attach CSV files directly to your AI prompts for data analysis, insights, and processing.

Overview

CSV support in NeuroLink works just like image support: it is a multimodal input that is automatically processed and injected into your prompts. The system:

  1. Auto-detects CSV files using FileDetector (magic bytes, MIME types, extensions, content heuristics)
  2. Parses CSV data using streaming parser for memory efficiency
  3. Formats CSV content into LLM-optimized text (markdown/json)
  4. Injects formatted CSV data into your prompt text
  5. Works with ALL AI providers (not limited to vision models)

Quick Start

SDK Usage

import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink();

// Basic CSV analysis
const result = await neurolink.generate({
  input: {
    text: "What are the key trends in this sales data?",
    csvFiles: ["sales-2024.csv"],
  },
});

// Multiple CSV files
const comparison = await neurolink.generate({
  input: {
    text: "Compare Q1 vs Q2 performance and identify growth areas",
    csvFiles: ["q1-sales.csv", "q2-sales.csv"],
  },
});

// Auto-detect file types (mix CSV and images)
const multimodal = await neurolink.generate({
  input: {
    text: "Analyze this data and compare with the chart",
    files: ["data.csv", "chart.png"], // Auto-detects which is CSV vs image
  },
});

// Customize CSV processing
const custom = await neurolink.generate({
  input: {
    text: "Summarize the top 100 customers by revenue",
    csvFiles: ["customers.csv"],
  },
  csvOptions: {
    maxRows: 100, // Limit to first 100 rows
    formatStyle: "markdown", // Use markdown table format
    includeHeaders: true, // Include CSV headers
  },
});

CLI Usage

# Attach CSV files to your prompt
neurolink generate "Analyze this sales data" --csv sales.csv

# Multiple CSV files
neurolink generate "Compare these datasets" --csv q1.csv --csv q2.csv

# Auto-detect file types
neurolink generate "Analyze data and image" --file data.csv --file chart.png

# Customize CSV processing
neurolink generate "Summarize trends" \
  --csv large-dataset.csv \
  --csv-max-rows 500 \
  --csv-format json

# Stream mode also supports CSV
neurolink stream "Explain this data in detail" --csv data.csv

# Batch processing with CSV
echo "Summarize sales data" > prompts.txt
echo "Find top performers" >> prompts.txt
neurolink batch prompts.txt --csv sales.csv

API Reference

GenerateOptions

type GenerateOptions = {
  input: {
    text: string;
    images?: Array<Buffer | string>;
    csvFiles?: Array<Buffer | string>; // Explicit CSV files
    files?: Array<Buffer | string>; // Auto-detect file types
  };

  csvOptions?: {
    maxRows?: number; // Default: 1000
    formatStyle?: "raw" | "markdown" | "json"; // Default: "raw"
    includeHeaders?: boolean; // Default: true
  };

  // ... other options
};
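
Putting the whole shape together (values are illustrative; every field shown is part of the type above):

const result = await neurolink.generate({
  input: {
    text: "Cross-check the screenshot against the spreadsheet",
    csvFiles: ["figures.csv"], // explicitly CSV
    files: ["dashboard.png"], // type auto-detected
  },
  csvOptions: {
    maxRows: 250,
    formatStyle: "raw",
    includeHeaders: true,
  },
});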

CSV Input Types

CSV files can be provided as:

  • File paths: "./data.csv" or "/absolute/path/data.csv"
  • URLs: "https://example.com/data.csv"
  • Buffers: Buffer.from("name,age\nAlice,30")
  • Data URIs: "data:text/csv;base64,..." (example below)

// File path
await neurolink.generate({
  input: {
    text: "Analyze this",
    csvFiles: ["./data.csv"],
  },
});

// URL
await neurolink.generate({
  input: {
    text: "Analyze this",
    csvFiles: ["https://example.com/data.csv"],
  },
});

// Buffer
const csvBuffer = Buffer.from("name,age\nAlice,30\nBob,25");
await neurolink.generate({
  input: {
    text: "Analyze this",
    csvFiles: [csvBuffer],
  },
});
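
Data URIs, the fourth input form listed above, follow the same pattern; a minimal sketch (the base64 payload encodes the same CSV as the buffer example):

// Data URI
const dataUri = `data:text/csv;base64,${Buffer.from("name,age\nAlice,30\nBob,25").toString("base64")}`;
await neurolink.generate({
  input: {
    text: "Analyze this",
    csvFiles: [dataUri],
  },
});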

CSV Processing Options

maxRows

Limit the number of rows processed (default: 1000). Useful for large datasets.

csvOptions: {
  maxRows: 100, // Only process the first 100 rows
}

formatStyle

Control how CSV data is formatted for the LLM:

  • raw (default, RECOMMENDED): original CSV format with proper escaping
      • Best for large files and minimal token usage
      • Preserves the original structure
      • Handles commas, quotes, and newlines correctly
      • Output size stays close to the source file (a 63KB file stays 63KB instead of expanding to 199KB)

  • json: JSON array format
      • Best for structured data processing
      • Easy to parse programmatically
      • Higher token usage (can expand 3x for large files)

  • markdown: Markdown table format
      • Best for small datasets (<100 rows)
      • More readable for humans
      • Uses the most tokens

// Raw CSV (recommended for large files)
csvOptions: {
  formatStyle: "raw",
}
// Output: name,age\nAlice,30\nBob,25

// JSON array
csvOptions: {
  formatStyle: "json",
}
// Output: [{"name":"Alice","age":30},{"name":"Bob","age":25}]

// Markdown table
csvOptions: {
  formatStyle: "markdown",
}
// Output: | name | age |
//         | ---- | --- |
//         | Alice | 30 |

includeHeaders

Include CSV headers in output (default: true).

csvOptions: {
  includeHeaders: false, // Skip headers
}
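
With the raw format, the only visible difference is whether the header line appears in the injected text (the output shown is an expectation based on the raw example above, not captured output):

csvOptions: {
  formatStyle: "raw",
  includeHeaders: false,
}
// Output: Alice,30\nBob,25  (the "name,age" header line is omitted)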

File Detection System

NeuroLink uses a multi-strategy detection system with confidence scores:

Detection Strategies (in priority order)

  1. Magic Bytes (95% confidence)
      • Detects file type from binary headers
      • Works for images (PNG, JPEG, GIF, WebP)
      • Covers PDFs and other binary formats

  2. MIME Type (85% confidence)
      • Uses HTTP Content-Type headers for URLs
      • Detects text/csv, image/*, etc.

  3. Extension (70% confidence)
      • File extension-based detection
      • Supports .csv, .tsv, .jpg, .png, etc.

  4. Content Heuristics (75% confidence)
      • Analyzes file content patterns
      • Detects CSV by checking for consistent comma-separated columns

The system stops at the first strategy with 80%+ confidence.

// Example: FileDetector workflow
// 1. Check magic bytes -> Not binary (0% confidence)
// 2. Check MIME type (if URL) -> text/csv (85% confidence) ✓ STOP
// Result: Detected as CSV with 85% confidence
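
The priority-and-threshold logic can be pictured with the TypeScript sketch below. The strategy implementations are illustrative stand-ins rather than FileDetector's actual internals; only the ordering and the 80% early-exit rule come from the description above.

type Detection = { type: "csv" | "image" | "pdf" | "text" | "unknown"; confidence: number };
type Strategy = (buf: Buffer) => Detection;

// Hypothetical strategies in priority order (real FileDetector logic will differ)
const strategies: Array<[string, Strategy]> = [
  [
    "magic-bytes",
    (buf) =>
      buf.subarray(0, 4).equals(Buffer.from([0x89, 0x50, 0x4e, 0x47])) // PNG header
        ? { type: "image", confidence: 0.95 }
        : { type: "unknown", confidence: 0 },
  ],
  [
    "content-heuristics",
    (buf) => {
      // Rough CSV check: consistent comma counts across the first few lines
      const lines = buf.toString("utf8").split("\n").slice(0, 5).filter(Boolean);
      const counts = lines.map((line) => line.split(",").length);
      const consistent = lines.length > 1 && counts.every((c) => c === counts[0] && c > 1);
      return consistent ? { type: "csv", confidence: 0.75 } : { type: "unknown", confidence: 0 };
    },
  ],
];

function detect(buf: Buffer): Detection & { strategy?: string } {
  let best: Detection & { strategy?: string } = { type: "unknown", confidence: 0 };
  for (const [name, strategy] of strategies) {
    const result = strategy(buf);
    if (result.confidence >= 0.8) return { ...result, strategy: name }; // stop at 80%+ confidence
    if (result.confidence > best.confidence) best = { ...result, strategy: name };
  }
  return best; // otherwise return the highest-confidence guess
}

console.log(detect(Buffer.from("name,age\nAlice,30\nBob,25\n")));
// -> { type: "csv", confidence: 0.75, strategy: "content-heuristics" }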

How It Works

Internal Processing Flow

// When you call generate() with CSV files:
await neurolink.generate({
  input: {
    text: "Analyze this data",
    csvFiles: ["data.csv"],
  },
});

// Internal flow:
// 1. messageBuilder.ts detects csvFiles array
// 2. Calls FileDetector.detectAndProcess("data.csv")
// 3. FileDetector runs detection strategies
// 4. Loads file content (from path/URL/buffer)
// 5. Routes to CSVProcessor.process(buffer)
// 6. CSV parsed using streaming csv-parser library
// 7. Formatted to LLM-optimized text (raw/markdown/json)
// 8. Appends to prompt text:
//    "Analyze this data
//
//    ## CSV Data from "data.csv":
//    ```csv
//    name,age,city
//    Alice,30,New York
//    Bob,25,London
//    ```"
// 9. Sends to AI provider

Memory Efficiency

CSV files are parsed using streaming for memory efficiency:

// CSVProcessor uses Readable streams (simplified)
import { Readable } from "stream";
import csvParser from "csv-parser";

const rows: Record<string, string>[] = [];
Readable.from([csvString]) // csvString holds the raw CSV text
  .pipe(csvParser())
  .on("data", (row) => {
    if (rows.length < maxRows) rows.push(row); // keep at most maxRows rows in memory
  });

Large CSV files are handled efficiently:

  • Streaming parser: Processes line-by-line
  • Row limit: Configurable maxRows (default: 1000)
  • Memory bounded: Only holds limited rows in memory

Examples

Data Analysis

const result = await neurolink.generate({
  input: {
    text: `Analyze this customer data and provide:
    1. Total customers
    2. Average age
    3. Top 5 cities by customer count
    4. Any notable patterns or insights`,
    csvFiles: ["customers.csv"],
  },
});

Data Comparison

const result = await neurolink.generate({
  input: {
    text: "Compare Q1 vs Q2 sales data. What changed? Which products improved?",
    csvFiles: ["q1-sales.csv", "q2-sales.csv"],
  },
});

Data Cleaning

const result = await neurolink.generate({
  input: {
    text: `Review this data for:
    - Missing values
    - Duplicate entries
    - Data quality issues
    - Suggested corrections`,
    csvFiles: ["raw-data.csv"],
  },
  csvOptions: {
    maxRows: 100,
    formatStyle: "markdown",
  },
});

Schema Generation

const result = await neurolink.generate({
  input: {
    text: "Generate a JSON schema for this CSV data with appropriate types and constraints",
    csvFiles: ["sample-data.csv"],
  },
  csvOptions: {
    maxRows: 50,
    formatStyle: "json",
  },
});

Multimodal Analysis

const result = await neurolink.generate({
  input: {
    text: "Compare the sales chart with the actual CSV data. Do they match?",
    files: ["sales-chart.png", "sales-data.csv"],
  },
});

TypeScript Types

Only types are exposed from the package (not classes):

import type {
  FileType,
  FileInput,
  FileSource,
  FileDetectionResult,
  FileProcessingResult,
  CSVProcessorOptions,
  FileDetectorOptions,
  CSVContent,
} from "@juspay/neurolink";

// FileType union
type FileType = "csv" | "image" | "pdf" | "text" | "unknown";

// CSV processing options
type CSVProcessorOptions = {
  maxRows?: number;
  formatStyle?: "raw" | "markdown" | "json";
  includeHeaders?: boolean;
};

// File detector options
type FileDetectorOptions = {
  maxSize?: number;
  timeout?: number;
  allowedTypes?: FileType[];
};
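
As one way to use these types, a thin wrapper can keep CSV calls consistent across an app. This assumes the csvOptions field of generate() accepts the exported CSVProcessorOptions shape (its fields match one-for-one above); analyzeCsv is a local helper, not part of the package.

import { NeuroLink } from "@juspay/neurolink";
import type { CSVProcessorOptions } from "@juspay/neurolink";

const neurolink = new NeuroLink();

async function analyzeCsv(prompt: string, file: string, csvOptions?: CSVProcessorOptions) {
  return neurolink.generate({
    input: { text: prompt, csvFiles: [file] },
    csvOptions,
  });
}

const summary = await analyzeCsv("Summarize monthly revenue by region", "revenue.csv", {
  maxRows: 200,
  formatStyle: "raw",
});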

Best Practices

1. Use Raw Format for Large Files

The raw format is recommended for large files and best token efficiency:

csvOptions: {
  formatStyle: "raw",
} // ✅ RECOMMENDED for large files

// Use json for smaller datasets or when you need structured parsing
csvOptions: {
  formatStyle: "json",
} // ✅ Good for small-medium files

2. Limit Rows for Large Files

For large datasets, limit rows to avoid token limits:

csvOptions: {
  maxRows: 500,
} // Process first 500 rows

3. Use Markdown for Small Datasets

For <100 rows, markdown tables are more readable:

csvOptions: {
  maxRows: 50,
  formatStyle: "markdown"
}

4. Provide Clear Instructions

Give the AI clear instructions about what to analyze:

input: {
  text: `Analyze this sales data and provide:
  1. Total revenue
  2. Top 5 products
  3. Revenue trend
  4. Recommendations`,
  csvFiles: ["sales.csv"],
}

5. Use Auto-Detection

Let FileDetector handle mixed file types:

files: ["data.csv", "chart.png", "report.pdf"]; // Auto-detects each type

Limitations

  • Max file size: 10MB by default (configurable); see the pre-flight size check below
  • Max rows: 1000 by default (configurable)
  • Encoding: UTF-8 recommended (auto-detected)
  • Token limits: Large CSV files may exceed provider token limits
  • Streaming: CSV content is parsed and formatted before sending (not streamed to LLM)
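
If a file may be near the default 10MB ceiling, a quick pre-flight check in plain Node.js (not a NeuroLink API) avoids a failed call:

import { statSync } from "fs";

const MAX_CSV_BYTES = 10 * 1024 * 1024; // default limit noted above
const path = "large-dataset.csv";

if (statSync(path).size > MAX_CSV_BYTES) {
  throw new Error(`${path} exceeds the default 10MB limit; raise the limit or trim the file first`);
}

const result = await neurolink.generate({
  input: { text: "Summarize trends", csvFiles: [path] },
  csvOptions: { maxRows: 500 },
});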

Error Handling

try {
  const result = await neurolink.generate({
    input: {
      text: "Analyze this",
      csvFiles: ["data.csv"],
    },
  });
} catch (error) {
  const message = error instanceof Error ? error.message : String(error);
  if (message.includes("File too large")) {
    // Handle file size error
  } else if (message.includes("not allowed")) {
    // Handle file type restriction
  } else if (message.includes("CSV")) {
    // Handle CSV parsing error
  }
}

Key Features

  • Image Support: Similar multimodal input for images
  • File Detection: Auto-detect file types with confidence scores
  • Memory Efficient: Streaming parser for large files
  • Provider Agnostic: Works with all AI providers
  • CLI Integration: Full CLI support with options

Summary

  • CSV support is multimodal input (like images)
  • Use csvFiles array or files array (auto-detect)
  • Customize with csvOptions (maxRows, formatStyle, includeHeaders)
  • Works with ALL providers (not just vision models)
  • Memory efficient streaming parser
  • CLI support with --csv, --file, --csv-max-rows, --csv-format
  • Only types exposed from package (not classes)