
Build a Resume Database: Extract and Query Hundreds of Resumes as SQLite

Managing hundreds of resume PDFs is a recruiting nightmare. You can’t search across them, can’t filter by skills or experience, and comparing candidates means opening dozens of files manually. This tutorial shows you how to transform that pile of unstructured PDFs into a queryable SQLite database using PdfParse - enabling powerful candidate searches, automated screening, and seamless ATS integration.


1) Schema: parent + repeating children

A well-designed resume schema captures the hierarchical nature of candidate data: one candidates parent table, plus repeating child tables for work experience, education, and skills.

This normalized structure lets you run powerful queries like “find all candidates with 5+ years of Python experience who worked at FAANG companies.”

Copyable schema (tables)

candidates

| key | type | prompt |
| --- | --- | --- |
| full_name | string | "Candidate's full name as shown on resume" |
| email | string | "Primary email address" |
| phone | string | "Phone number with country code if present" |
| location | string | "City and state/country (e.g., 'San Francisco, CA')" |
| linkedin_url | string | "LinkedIn profile URL if present" |
| portfolio_url | string | "Personal website or portfolio URL if present" |
| summary | string | "Professional summary or objective statement" |
| years_experience | number | "Total years of professional work experience" |
| current_title | string | "Most recent or current job title" |

work_experience (repeating child of candidates)

| key | type | prompt |
| --- | --- | --- |
| candidate_id | number | "Link to parent candidate id" |
| company | string | "Company or organization name" |
| title | string | "Job title or role" |
| start_date | date | "Start date in YYYY-MM format" |
| end_date | string | "End date in YYYY-MM format, or 'Present' if current" |
| description | string | "Job responsibilities, achievements, and key projects" |
| technologies_used | string | "Technologies, tools, or skills used in this role" |

education (repeating child of candidates)

| key | type | prompt |
| --- | --- | --- |
| candidate_id | number | "Link to parent candidate id" |
| institution | string | "School, university, or educational institution name" |
| degree | string | "Degree type: BS, BA, MS, MBA, PhD, etc." |
| field_of_study | string | "Major or field of study" |
| graduation_year | number | "Year of graduation (YYYY format)" |
| gpa | string | "GPA if clearly stated (e.g., '3.8/4.0')" |

skills (repeating child of candidates)

| key | type | prompt |
| --- | --- | --- |
| candidate_id | number | "Link to parent candidate id" |
| skill_name | string | "Name of the skill, technology, or tool" |
| proficiency_level | string | "Proficiency if stated: beginner, intermediate, advanced, expert" |

Copyable schema (JSON)

{
  "candidates": {
    "fields": [
      { "key": "full_name", "type": "string", "prompt": "Candidate's full name as shown on resume" },
      { "key": "email", "type": "string", "prompt": "Primary email address" },
      { "key": "phone", "type": "string", "prompt": "Phone number with country code if present" },
      { "key": "location", "type": "string", "prompt": "City and state/country (e.g., 'San Francisco, CA')" },
      { "key": "linkedin_url", "type": "string", "prompt": "LinkedIn profile URL if present" },
      { "key": "portfolio_url", "type": "string", "prompt": "Personal website or portfolio URL if present" },
      { "key": "summary", "type": "string", "prompt": "Professional summary or objective statement" },
      { "key": "years_experience", "type": "number", "prompt": "Total years of professional work experience" },
      { "key": "current_title", "type": "string", "prompt": "Most recent or current job title" }
    ]
  },
  "work_experience": {
    "parent": "candidates",
    "repeating": true,
    "fields": [
      { "key": "candidate_id", "type": "number", "prompt": "Link to parent candidate id" },
      { "key": "company", "type": "string", "prompt": "Company or organization name" },
      { "key": "title", "type": "string", "prompt": "Job title or role" },
      { "key": "start_date", "type": "date", "prompt": "Start date in YYYY-MM format" },
      { "key": "end_date", "type": "string", "prompt": "End date in YYYY-MM format, or 'Present' if current" },
      { "key": "description", "type": "string", "prompt": "Job responsibilities, achievements, and key projects" },
      { "key": "technologies_used", "type": "string", "prompt": "Technologies, tools, or skills used in this role" }
    ]
  },
  "education": {
    "parent": "candidates",
    "repeating": true,
    "fields": [
      { "key": "candidate_id", "type": "number", "prompt": "Link to parent candidate id" },
      { "key": "institution", "type": "string", "prompt": "School, university, or educational institution name" },
      { "key": "degree", "type": "string", "prompt": "Degree type: BS, BA, MS, MBA, PhD, etc." },
      { "key": "field_of_study", "type": "string", "prompt": "Major or field of study" },
      { "key": "graduation_year", "type": "number", "prompt": "Year of graduation (YYYY format)" },
      { "key": "gpa", "type": "string", "prompt": "GPA if clearly stated (e.g., '3.8/4.0')" }
    ]
  },
  "skills": {
    "parent": "candidates",
    "repeating": true,
    "fields": [
      { "key": "candidate_id", "type": "number", "prompt": "Link to parent candidate id" },
      { "key": "skill_name", "type": "string", "prompt": "Name of the skill, technology, or tool" },
      { "key": "proficiency_level", "type": "string", "prompt": "Proficiency if stated: beginner, intermediate, advanced, expert" }
    ]
  }
}

2) Why this schema structure

Normalized relational design unlocks powerful recruiting workflows: the parent candidates table holds one row per person, while work_experience, education, and skills hold any number of child rows linked by candidate_id.

This structure mirrors how recruiting databases and Applicant Tracking Systems (ATS) store candidate data - making integration seamless.
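
That FAANG example maps directly onto this schema. A minimal Python sketch (the database filename and the exact company list are illustrative assumptions):

import sqlite3

# One pass over the normalized tables: skill match + experience threshold
# + employer filter, all joined through the candidate_id foreign keys.
QUERY = """
SELECT DISTINCT c.full_name, c.email, c.years_experience
FROM candidates c
JOIN skills s ON s.candidate_id = c.id
JOIN work_experience we ON we.candidate_id = c.id
WHERE s.skill_name = 'Python'
  AND c.years_experience >= 5
  AND we.company IN ('Meta', 'Apple', 'Amazon', 'Netflix', 'Google')
ORDER BY c.years_experience DESC
"""

conn = sqlite3.connect('resumes.sqlite')  # assumed filename, used throughout this post
for name, email, years in conn.execute(QUERY):
    print(f'{name} <{email}> - {years} years')
conn.close()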

3) Schema prompts and extraction tips

Prompt engineering is what keeps the extracted data clean. A few example prompt refinements:

skills.skill_name: "Extract individual skills from the skills section. Return each skill separately (e.g., 'Python', 'React', 'AWS'). Do not include proficiency levels in the skill name."

work_experience.description: "Complete description of responsibilities and achievements for this role. Include quantifiable results if present (e.g., 'increased sales by 30%')."

years_experience: "Calculate total years of professional work experience by summing all work history. Count partial years (e.g., 6 months = 0.5 years). Exclude internships unless specifically labeled as professional roles."

4) Powerful SQL queries for recruiting

Once you’ve processed resumes and downloaded your SQLite database, these queries unlock candidate intelligence:

Find candidates with specific skill combinations:

SELECT DISTINCT
  c.full_name,
  c.email,
  c.current_title,
  c.years_experience,
  GROUP_CONCAT(DISTINCT s.skill_name) as matching_skills
FROM candidates c
JOIN skills s ON s.candidate_id = c.id
WHERE s.skill_name IN ('Python', 'React', 'AWS', 'Docker')
GROUP BY c.id
HAVING COUNT(DISTINCT s.skill_name) >= 3  -- Must have at least 3 of the 4 skills
ORDER BY c.years_experience DESC;

Search by experience level and location:

SELECT
  full_name,
  email,
  location,
  current_title,
  years_experience
FROM candidates
WHERE years_experience BETWEEN 5 AND 10
  AND location LIKE '%San Francisco%'
ORDER BY years_experience DESC;

Find candidates from target companies:

SELECT
  c.full_name,
  c.email,
  c.phone,
  we.company,
  we.title,
  we.start_date,
  we.end_date
FROM candidates c
JOIN work_experience we ON we.candidate_id = c.id
WHERE we.company IN ('Google', 'Meta', 'Amazon', 'Microsoft', 'Apple')
ORDER BY c.full_name, we.start_date DESC;

Full-text search across job descriptions:

SELECT
  c.full_name,
  c.email,
  we.company,
  we.title,
  we.description
FROM candidates c
JOIN work_experience we ON we.candidate_id = c.id
WHERE we.description LIKE '%machine learning%'
   OR we.description LIKE '%artificial intelligence%'
   OR we.technologies_used LIKE '%TensorFlow%'
   OR we.technologies_used LIKE '%PyTorch%';

Filter by education criteria:

SELECT
  c.full_name,
  c.email,
  e.degree,
  e.field_of_study,
  e.institution,
  e.graduation_year
FROM candidates c
JOIN education e ON e.candidate_id = c.id
WHERE e.degree IN ('MS', 'PhD')
  AND e.field_of_study LIKE '%Computer Science%'
ORDER BY e.graduation_year DESC;

Most in-demand skills in your candidate pool:

SELECT
  skill_name,
  COUNT(*) as candidate_count,
  ROUND(COUNT(*) * 100.0 / (SELECT COUNT(DISTINCT candidate_id) FROM skills), 2) as percentage
FROM skills
GROUP BY skill_name
ORDER BY candidate_count DESC
LIMIT 20;

Candidates with recent experience at senior levels:

SELECT DISTINCT
  c.full_name,
  c.email,
  c.years_experience,
  we.title,
  we.company,
  we.end_date
FROM candidates c
JOIN work_experience we ON we.candidate_id = c.id
WHERE (we.title LIKE '%Senior%'
   OR we.title LIKE '%Lead%'
   OR we.title LIKE '%Principal%'
   OR we.title LIKE '%Staff%')
  AND (we.end_date = 'Present' OR we.end_date >= '2023-01')
ORDER BY c.years_experience DESC;

5) Download and integrate your resume database

From extraction to automation in three steps:

  1. Process your resumes: Upload PDFs to PdfParse, process with your schema, review any extraction errors
  2. Download SQLite database: Navigate to Dashboard → [Project Name] → Downloads and click “Download Database”
  3. Integrate anywhere: Use the portable .sqlite file in Python scripts, Excel, BI tools, or custom applications


Full path: Dashboard → Projects → [Your Project] → Downloads → Download Database

The downloaded SQLite file contains all your tables with proper foreign key relationships - ready to query immediately.
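
Before wiring it into anything, a quick sanity check is cheap. A sketch assuming the file was saved as resumes.sqlite:

import sqlite3

# List every table in the download and its row count.
conn = sqlite3.connect('resumes.sqlite')
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
for table in tables:
    count = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    print(f'{table}: {count} rows')
conn.close()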

6) Python integration for automated screening

Build a candidate matching engine:

import sqlite3
import pandas as pd
from typing import List, Optional

def find_matching_candidates(
    db_path: str,
    required_skills: List[str],
    min_experience: int = 0,
    target_companies: Optional[List[str]] = None,
    location: Optional[str] = None
) -> pd.DataFrame:
    """
    Find candidates matching specific criteria.

    Args:
        db_path: Path to the downloaded SQLite database
        required_skills: List of required skills (e.g., ['Python', 'React'])
        min_experience: Minimum years of experience
        target_companies: Optional list of companies to filter by
        location: Optional location filter (partial match)

    Returns:
        DataFrame of matching candidates with their details
    """
    conn = sqlite3.connect(db_path)

    # Build dynamic query based on filters
    query = """
    SELECT DISTINCT
        c.id,
        c.full_name,
        c.email,
        c.phone,
        c.location,
        c.current_title,
        c.years_experience,
        c.linkedin_url,
        GROUP_CONCAT(DISTINCT s.skill_name) as all_skills,
        GROUP_CONCAT(DISTINCT we.company) as companies
    FROM candidates c
    LEFT JOIN skills s ON s.candidate_id = c.id
    LEFT JOIN work_experience we ON we.candidate_id = c.id
    WHERE c.years_experience >= ?
    """

    params = [min_experience]

    if location:
        query += " AND c.location LIKE ?"
        params.append(f"%{location}%")

    query += " GROUP BY c.id"

    # Execute query
    df = pd.read_sql_query(query, conn, params=params)

    # Filter by required skills
    if required_skills:
        def has_required_skills(skills_str):
            if pd.isna(skills_str):
                return False
            skills = skills_str.lower().split(',')
            return all(
                any(req.lower() in skill for skill in skills)
                for req in required_skills
            )

        df = df[df['all_skills'].apply(has_required_skills)]

    # Filter by target companies
    if target_companies:
        def worked_at_target(companies_str):
            if pd.isna(companies_str):
                return False
            return any(
                target.lower() in companies_str.lower()
                for target in target_companies
            )

        df = df[df['companies'].apply(worked_at_target)]

    conn.close()
    return df.sort_values('years_experience', ascending=False)

# Example usage
candidates = find_matching_candidates(
    db_path='resumes.sqlite',
    required_skills=['Python', 'AWS', 'Docker'],
    min_experience=5,
    target_companies=['Google', 'Amazon', 'Microsoft'],
    location='San Francisco'
)

print(f"Found {len(candidates)} matching candidates:")
print(candidates[['full_name', 'email', 'years_experience', 'current_title']])

Export for ATS import:

import sqlite3
import csv

def export_candidates_to_csv(db_path: str, output_file: str):
    """Export all candidates to CSV for ATS import."""
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()

    # Correlated subqueries avoid row fan-out from joining two child tables
    # at once, and sidestep SQLite's rule that GROUP_CONCAT(DISTINCT ...)
    # cannot take a custom separator.
    cursor.execute("""
        SELECT
            c.full_name,
            c.email,
            c.phone,
            c.location,
            c.current_title,
            c.years_experience,
            c.linkedin_url,
            (SELECT GROUP_CONCAT(s.skill_name, '; ')
             FROM skills s
             WHERE s.candidate_id = c.id) as skills,
            (SELECT GROUP_CONCAT(we.company || ' (' || we.title || ')', '; ')
             FROM work_experience we
             WHERE we.candidate_id = c.id) as experience
        FROM candidates c
        ORDER BY c.full_name
    """)

    with open(output_file, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow([
            'Full Name', 'Email', 'Phone', 'Location',
            'Current Title', 'Years Experience', 'LinkedIn',
            'Skills', 'Work History'
        ])
        writer.writerows(cursor.fetchall())

    conn.close()
    print(f"Exported to {output_file}")

# Export all candidates
export_candidates_to_csv('resumes.sqlite', 'candidates_import.csv')

AI-powered candidate screening:

from typing import Any, Dict

from anthropic import Anthropic
import sqlite3

def ai_screen_candidate(
    db_path: str,
    candidate_id: int,
    job_description: str
) -> Dict[str, Any]:
    """
    Use Claude to evaluate candidate fit for a specific role.

    Args:
        db_path: Path to SQLite database
        candidate_id: ID of candidate to evaluate
        job_description: Full job description text

    Returns:
        Dictionary with score, reasoning, and strengths/concerns
    """
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()

    # Get the complete candidate profile. Correlated subqueries keep the
    # three child tables from fanning out into a cross product and avoid
    # SQLite's one-argument limit on GROUP_CONCAT(DISTINCT ...).
    cursor.execute("""
        SELECT
            c.*,
            (SELECT GROUP_CONCAT(we.company || ' - ' || we.title ||
                     ' (' || we.start_date || ' to ' || we.end_date || '): ' ||
                     COALESCE(we.description, 'No description'), '\n\n')
             FROM work_experience we
             WHERE we.candidate_id = c.id) as experience,
            (SELECT GROUP_CONCAT(s.skill_name, ', ')
             FROM skills s
             WHERE s.candidate_id = c.id) as skills,
            (SELECT GROUP_CONCAT(e.degree || ' in ' || e.field_of_study ||
                     ' from ' || e.institution || ' (' || e.graduation_year || ')', '\n')
             FROM education e
             WHERE e.candidate_id = c.id) as education
        FROM candidates c
        WHERE c.id = ?
    """, [candidate_id])

    candidate = cursor.fetchone()
    conn.close()

    if not candidate:
        return {"error": "Candidate not found"}

    # Build candidate summary
    candidate_summary = f"""
CANDIDATE PROFILE

Name: {candidate[1]}
Location: {candidate[4]}
Current Title: {candidate[9]}
Years of Experience: {candidate[8]}
Email: {candidate[2]}
LinkedIn: {candidate[5]}

SUMMARY
{candidate[7] or 'No summary provided'}

WORK EXPERIENCE
{candidate[10] or 'No work experience listed'}

SKILLS
{candidate[11] or 'No skills listed'}

EDUCATION
{candidate[12] or 'No education listed'}
"""

    client = Anthropic(api_key="your-api-key")

    message = client.messages.create(
        model="claude-sonnet-4-5-20250929",
        max_tokens=2000,
        messages=[{
            "role": "user",
            "content": f"""You are an expert technical recruiter. Evaluate this candidate's fit for the following role.

JOB DESCRIPTION:
{job_description}

{candidate_summary}

Provide your evaluation in this format:
1. OVERALL FIT SCORE: [1-10]
2. KEY STRENGTHS: [bullet points]
3. POTENTIAL CONCERNS: [bullet points]
4. RECOMMENDATION: [Hire/Interview/Pass with reasoning]
5. SUGGESTED INTERVIEW QUESTIONS: [3-5 questions to validate fit]
"""
        }]
    )

    return {
        "candidate_id": candidate_id,
        "candidate_name": candidate[1],
        "evaluation": message.content[0].text
    }

# Screen a candidate
job_desc = """
Senior Backend Engineer - Python/AWS
We're seeking a senior backend engineer with 5+ years of experience building
scalable APIs using Python, AWS, and modern database technologies...
"""

result = ai_screen_candidate('resumes.sqlite', candidate_id=42, job_description=job_desc)
print(result['evaluation'])

7) Excel and BI tool integration

Connect Excel to your resume database (a scripted pandas alternative follows these steps):

  1. Download and install the SQLite ODBC Driver
  2. In Excel: Data → Get Data → From Database → From ODBC
  3. Select your downloaded .sqlite file
  4. Choose tables to import (candidates, skills, work_experience, education)
  5. Use Power Query to create relationships between tables
  6. Build pivot tables, charts, and dashboards
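
If you would rather script the Excel hand-off than go through ODBC, a short pandas sketch can dump each table to its own worksheet (assumes pandas and openpyxl are installed):

import sqlite3
import pandas as pd  # assumption: pandas + openpyxl available

conn = sqlite3.connect('resumes.sqlite')
with pd.ExcelWriter('candidates.xlsx') as writer:  # openpyxl writes .xlsx
    for table in ('candidates', 'work_experience', 'education', 'skills'):
        df = pd.read_sql_query(f'SELECT * FROM {table}', conn)
        df.to_excel(writer, sheet_name=table, index=False)
conn.close()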

Power BI integration:

  1. Open Power BI Desktop
  2. Get Data → More → Database → SQLite
  3. Browse to your downloaded database file
  4. Select all tables and load
  5. Power BI auto-detects foreign key relationships
  6. Create visualizations:
    • Skills distribution (bar chart)
    • Candidates by location (map)
    • Experience level breakdown (pie chart)
    • Top companies represented (tree map)

Google Sheets integration (requires Google Apps Script):

// Google Apps Script to query SQLite via web service
function importCandidates() {
  // First, upload your SQLite to a web-accessible endpoint
  // or use a service like Datasette to expose an API

  const response = UrlFetchApp.fetch('https://your-datasette-instance.com/resumes/candidates.json?_shape=array');
  const candidates = JSON.parse(response.getContentText());

  const sheet = SpreadsheetApp.getActiveSpreadsheet().getActiveSheet();

  // Clear existing data
  sheet.clear();

  // Write headers
  sheet.getRange(1, 1, 1, 6).setValues([[
    'Name', 'Email', 'Location', 'Title', 'Experience', 'Skills'
  ]]);

  // Write candidate data
  const rows = candidates.map(c => [
    c.full_name,
    c.email,
    c.location,
    c.current_title,
    c.years_experience,
    c.skills
  ]);

  sheet.getRange(2, 1, rows.length, 6).setValues(rows);
}

8) Building a resume search API

Simple Express/TypeScript API for internal recruiting tools:

import express from 'express';
import Database from 'better-sqlite3';
import cors from 'cors';

const app = express();
const db = new Database('resumes.sqlite', { readonly: true });

app.use(cors());
app.use(express.json());

// Search candidates by multiple criteria
app.get('/api/candidates/search', (req, res) => {
  const {
    skills,
    minExperience,
    maxExperience,
    location,
    company,
    degree
  } = req.query;

  let query = `
    SELECT DISTINCT c.*
    FROM candidates c
    WHERE 1=1
  `;
  const params: any[] = [];

  if (minExperience) {
    query += ' AND c.years_experience >= ?';
    params.push(Number(minExperience));
  }

  if (maxExperience) {
    query += ' AND c.years_experience <= ?';
    params.push(Number(maxExperience));
  }

  if (location) {
    query += ' AND c.location LIKE ?';
    params.push(`%${location}%`);
  }

  let candidates = db.prepare(query).all(...params) as any[];

  // Filter by skills (requires join)
  if (skills) {
    const skillList = (skills as string).split(',').map(s => s.trim().toLowerCase());

    candidates = candidates.filter(c => {
      const candidateSkills = db.prepare(
        'SELECT skill_name FROM skills WHERE candidate_id = ?'
      ).all(c.id);

      const candidateSkillNames = candidateSkills.map(
        (s: any) => s.skill_name.toLowerCase()
      );

      return skillList.every(requiredSkill =>
        candidateSkillNames.some(candidateSkill =>
          candidateSkill.includes(requiredSkill)
        )
      );
    });
  }

  // Filter by company experience
  if (company) {
    candidates = candidates.filter(c => {
      const experience = db.prepare(
        'SELECT company FROM work_experience WHERE candidate_id = ?'
      ).all(c.id);

      return experience.some((exp: any) =>
        exp.company.toLowerCase().includes((company as string).toLowerCase())
      );
    });
  }

  // Filter by education
  if (degree) {
    candidates = candidates.filter(c => {
      const education = db.prepare(
        'SELECT degree FROM education WHERE candidate_id = ?'
      ).all(c.id);

      return education.some((edu: any) =>
        edu.degree.toLowerCase() === (degree as string).toLowerCase()
      );
    });
  }

  res.json({
    count: candidates.length,
    candidates: candidates
  });
});

// Get full candidate profile
app.get('/api/candidates/:id', (req, res) => {
  const candidate = db.prepare('SELECT * FROM candidates WHERE id = ?').get(req.params.id);

  if (!candidate) {
    return res.status(404).json({ error: 'Candidate not found' });
  }

  const workExperience = db.prepare(
    'SELECT * FROM work_experience WHERE candidate_id = ? ORDER BY start_date DESC'
  ).all(req.params.id);

  const education = db.prepare(
    'SELECT * FROM education WHERE candidate_id = ? ORDER BY graduation_year DESC'
  ).all(req.params.id);

  const skills = db.prepare(
    'SELECT * FROM skills WHERE candidate_id = ?'
  ).all(req.params.id);

  res.json({
    ...candidate,
    work_experience: workExperience,
    education: education,
    skills: skills
  });
});

// Get skills analytics
app.get('/api/analytics/skills', (req, res) => {
  const skillStats = db.prepare(`
    SELECT
      skill_name,
      COUNT(*) as candidate_count,
      ROUND(COUNT(*) * 100.0 / (SELECT COUNT(DISTINCT candidate_id) FROM skills), 2) as percentage
    FROM skills
    GROUP BY skill_name
    ORDER BY candidate_count DESC
    LIMIT 50
  `).all();

  res.json(skillStats);
});

// Get company distribution
app.get('/api/analytics/companies', (req, res) => {
  const companyStats = db.prepare(`
    SELECT
      company,
      COUNT(DISTINCT candidate_id) as candidate_count
    FROM work_experience
    GROUP BY company
    ORDER BY candidate_count DESC
    LIMIT 30
  `).all();

  res.json(companyStats);
});

app.listen(3000, () => {
  console.log('Resume API running on http://localhost:3000');
});

Frontend search interface example (React):

import { useState } from 'react';

interface Candidate {
  id: number;
  full_name: string;
  email: string;
  location: string;
  current_title: string;
  years_experience: number;
}

export default function CandidateSearch() {
  const [results, setResults] = useState<Candidate[]>([]);
  const [filters, setFilters] = useState({
    skills: '',
    minExperience: '',
    location: '',
    company: ''
  });

  const searchCandidates = async () => {
    const queryParams = new URLSearchParams();

    if (filters.skills) queryParams.append('skills', filters.skills);
    if (filters.minExperience) queryParams.append('minExperience', filters.minExperience);
    if (filters.location) queryParams.append('location', filters.location);
    if (filters.company) queryParams.append('company', filters.company);

    const response = await fetch(`http://localhost:3000/api/candidates/search?${queryParams}`);
    const data = await response.json();

    setResults(data.candidates);
  };

  return (
    <div className="p-6">
      <h1 className="text-2xl font-bold mb-6">Resume Database Search</h1>

      <div className="grid grid-cols-2 gap-4 mb-6">
        <input
          type="text"
          placeholder="Skills (comma-separated)"
          value={filters.skills}
          onChange={e => setFilters({...filters, skills: e.target.value})}
          className="border p-2 rounded"
        />

        <input
          type="number"
          placeholder="Min years experience"
          value={filters.minExperience}
          onChange={e => setFilters({...filters, minExperience: e.target.value})}
          className="border p-2 rounded"
        />

        <input
          type="text"
          placeholder="Location"
          value={filters.location}
          onChange={e => setFilters({...filters, location: e.target.value})}
          className="border p-2 rounded"
        />

        <input
          type="text"
          placeholder="Company"
          value={filters.company}
          onChange={e => setFilters({...filters, company: e.target.value})}
          className="border p-2 rounded"
        />
      </div>

      <button
        onClick={searchCandidates}
        className="bg-blue-600 text-white px-6 py-2 rounded hover:bg-blue-700"
      >
        Search Candidates
      </button>

      <div className="mt-8">
        <h2 className="text-xl font-semibold mb-4">
          Results ({results.length} candidates)
        </h2>

        <div className="space-y-4">
          {results.map(candidate => (
            <div key={candidate.id} className="border p-4 rounded shadow-sm">
              <h3 className="font-bold">{candidate.full_name}</h3>
              <p className="text-gray-600">{candidate.current_title}</p>
              <div className="mt-2 text-sm text-gray-500">
                <p>{candidate.location} • {candidate.years_experience} years experience</p>
                <p>{candidate.email}</p>
              </div>
            </div>
          ))}
        </div>
      </div>
    </div>
  );
}

9) Advanced analytics and reporting

Skill gap analysis - identify missing skills in your talent pool:

-- Define your target skill set and compare against available candidates
WITH desired_skills(skill) AS (
  VALUES
    ('Kubernetes'),
    ('Docker'),
    ('AWS'),
    ('Terraform'),
    ('Python'),
    ('Go'),
    ('React'),
    ('PostgreSQL')
)
SELECT
  d.skill,
  COUNT(DISTINCT s.candidate_id) as candidates_with_skill,
  ROUND(COUNT(DISTINCT s.candidate_id) * 100.0 / (SELECT COUNT(*) FROM candidates), 2) as coverage_percentage,
  (SELECT COUNT(*) FROM candidates) - COUNT(DISTINCT s.candidate_id) as candidates_missing_skill
FROM desired_skills d
LEFT JOIN skills s ON LOWER(s.skill_name) = LOWER(d.skill)
GROUP BY d.skill
ORDER BY coverage_percentage DESC;

Experience distribution analysis:

SELECT
  CASE
    WHEN years_experience < 2 THEN '0-2 years (Junior)'
    WHEN years_experience < 5 THEN '2-5 years (Mid-level)'
    WHEN years_experience < 10 THEN '5-10 years (Senior)'
    ELSE '10+ years (Expert/Lead)'
  END as experience_level,
  COUNT(*) as candidate_count,
  ROUND(COUNT(*) * 100.0 / (SELECT COUNT(*) FROM candidates), 2) as percentage
FROM candidates
GROUP BY experience_level
ORDER BY MIN(years_experience);

Top educational institutions:

SELECT
  e.institution,
  COUNT(DISTINCT e.candidate_id) as candidate_count,
  GROUP_CONCAT(DISTINCT e.degree) as degrees_offered,
  GROUP_CONCAT(DISTINCT e.field_of_study) as fields
FROM education e
GROUP BY e.institution
ORDER BY candidate_count DESC
LIMIT 20;

Career path analysis - common job progressions:

-- Find candidates who moved from Company A to Company B
SELECT
  c.full_name,
  c.email,
  we1.company as previous_company,
  we1.title as previous_title,
  we2.company as current_company,
  we2.title as current_title
FROM candidates c
JOIN work_experience we1 ON we1.candidate_id = c.id
JOIN work_experience we2 ON we2.candidate_id = c.id
WHERE we1.company = 'Google'
  AND we2.company = 'Amazon'
  AND we1.start_date < we2.start_date;

Diversity metrics - location distribution:

SELECT
  CASE
    WHEN location LIKE '%San Francisco%' OR location LIKE '%SF%' THEN 'San Francisco Bay Area'
    WHEN location LIKE '%New York%' OR location LIKE '%NYC%' THEN 'New York'
    WHEN location LIKE '%Seattle%' THEN 'Seattle'
    WHEN location LIKE '%Austin%' THEN 'Austin'
    WHEN location LIKE '%Remote%' THEN 'Remote'
    ELSE 'Other'
  END as location_group,
  COUNT(*) as candidate_count
FROM candidates
GROUP BY location_group
ORDER BY candidate_count DESC;

10) Best practices and troubleshooting

Extraction quality tips:

  1. Test with diverse formats: Canva templates, LaTeX academic CVs, Word documents, LinkedIn exports
  2. Handle edge cases:
    • Career gaps (unemployed periods)
    • Contractor/freelance work (many short-term positions)
    • International education (different degree naming conventions)
    • Bilingual resumes (choose primary language)
  3. Validate critical fields (see the validation sketch after this list):
    • Email regex validation: ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
    • Phone number normalization: convert all to +1-XXX-XXX-XXXX format
    • Date consistency: enforce YYYY-MM format for all date fields
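
A minimal validation sketch covering those three checks (the email regex comes from the list above; the phone normalizer assumes US-style 10/11-digit numbers):

import re
import sqlite3
from typing import Optional

EMAIL_RE = re.compile(r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$')
DATE_RE = re.compile(r'^\d{4}-(0[1-9]|1[0-2])$')  # strict YYYY-MM

def normalize_phone(raw: str) -> Optional[str]:
    """Normalize to +1-XXX-XXX-XXXX; return None for anything else."""
    digits = re.sub(r'\D', '', raw or '')
    if len(digits) == 11 and digits.startswith('1'):
        digits = digits[1:]
    if len(digits) != 10:
        return None
    return f'+1-{digits[:3]}-{digits[3:6]}-{digits[6:]}'

conn = sqlite3.connect('resumes.sqlite')
for cid, email, phone in conn.execute('SELECT id, email, phone FROM candidates'):
    if email and not EMAIL_RE.match(email):
        print(f'candidate {cid}: suspicious email {email!r}')
    if phone and normalize_phone(phone) is None:
        print(f'candidate {cid}: unparseable phone {phone!r}')
for cid, start in conn.execute('SELECT candidate_id, start_date FROM work_experience'):
    if start and not DATE_RE.match(start):
        print(f'candidate {cid}: non-YYYY-MM start_date {start!r}')
conn.close()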

Common extraction issues and fixes:

| Issue | Cause | Solution |
| --- | --- | --- |
| Skills extracted as paragraphs instead of individual entries | Vague prompt | Prompt: "Extract each skill separately. Return individual skills (e.g., 'Python', 'AWS'), not full sentences." |
| Inconsistent date formats | Resumes use various formats | Prompt: "Return dates in YYYY-MM format. For 'Jan 2020', return '2020-01'." |
| Missing work descriptions | Some resumes only list titles | Adjust expectations: make description optional; rely on technologies_used for details |
| Duplicate skills with slight variations | "JavaScript" vs "Javascript" vs "JS" | Post-processing: normalize skill names with a mapping table (see the sketch below) |
| Years of experience calculated incorrectly | Model misunderstands overlapping positions | Be explicit: "Sum all work periods, excluding overlaps and internships" |
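
For the duplicate-skill row above, a post-processing sketch with a hand-maintained alias map (the alias entries, and the assumption that skills rows carry their own id primary key, are illustrative):

import sqlite3

# Illustrative alias map; extend it as new variants show up in your pool.
SKILL_ALIASES = {
    'js': 'JavaScript',
    'javascript': 'JavaScript',
    'reactjs': 'React',
    'aws': 'AWS',
    'postgres': 'PostgreSQL',
}

conn = sqlite3.connect('resumes.sqlite')
rows = conn.execute('SELECT id, skill_name FROM skills').fetchall()  # fetch first, then update
for skill_id, name in rows:
    canonical = SKILL_ALIASES.get((name or '').strip().lower())
    if canonical and canonical != name:
        conn.execute('UPDATE skills SET skill_name = ? WHERE id = ?',
                     (canonical, skill_id))
conn.commit()
conn.close()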

11) Scaling to thousands of resumes

Batch processing workflow:

  1. Organize by job posting: Create separate PdfParse projects for each role or hiring campaign
  2. Process in batches: Upload 50-100 resumes at a time to monitor extraction quality
  3. Review and retry: Check error rates after each batch, refine prompts if >5% fail
  4. Download incrementally: Download SQLite after each batch, merge databases locally

Merging multiple SQLite databases:

import sqlite3

def merge_resume_databases(output_db: str, input_dbs: list):
    """Merge multiple resume databases into one."""
    conn_out = sqlite3.connect(output_db)
    cursor_out = conn_out.cursor()

    # Create tables in output database
    cursor_out.execute('''
        CREATE TABLE IF NOT EXISTS candidates (
            id INTEGER PRIMARY KEY,
            full_name TEXT,
            email TEXT,
            phone TEXT,
            location TEXT,
            linkedin_url TEXT,
            portfolio_url TEXT,
            summary TEXT,
            years_experience REAL,
            current_title TEXT
        )
    ''')

    cursor_out.execute('''
        CREATE TABLE IF NOT EXISTS work_experience (
            id INTEGER PRIMARY KEY,
            candidate_id INTEGER REFERENCES candidates(id),
            company TEXT,
            title TEXT,
            start_date TEXT,
            end_date TEXT,
            description TEXT,
            technologies_used TEXT
        )
    ''')

    # education and skills follow the same pattern: create the table,
    # then copy its rows with candidate_id shifted by the offset

    candidate_id_offset = 0

    for db_file in input_dbs:
        conn_in = sqlite3.connect(db_file)
        cursor_in = conn_in.cursor()

        # Copy candidates with ID offset
        cursor_in.execute('SELECT * FROM candidates')
        for row in cursor_in.fetchall():
            new_row = (row[0] + candidate_id_offset,) + row[1:]
            cursor_out.execute('''
                INSERT INTO candidates VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
            ''', new_row)

        # Copy work_experience with updated foreign keys
        cursor_in.execute('SELECT * FROM work_experience')
        for row in cursor_in.fetchall():
            new_row = (None, row[1] + candidate_id_offset) + row[2:]
            cursor_out.execute('''
                INSERT INTO work_experience (id, candidate_id, company, title, start_date, end_date, description, technologies_used)
                VALUES (?, ?, ?, ?, ?, ?, ?, ?)
            ''', new_row)

        # Update offset for next database
        cursor_in.execute('SELECT MAX(id) FROM candidates')
        max_id = cursor_in.fetchone()[0] or 0
        candidate_id_offset += max_id

        conn_in.close()

    conn_out.commit()
    conn_out.close()
    print(f"Merged {len(input_dbs)} databases into {output_db}")

# Merge multiple job posting databases
merge_resume_databases(
    output_db='all_candidates_2025.sqlite',
    input_dbs=[
        'backend_engineer_q1.sqlite',
        'frontend_engineer_q1.sqlite',
        'data_scientist_q1.sqlite'
    ]
)

Performance optimization for large databases:

-- Create indexes for common query patterns
CREATE INDEX idx_candidates_experience ON candidates(years_experience);
CREATE INDEX idx_candidates_location ON candidates(location);
CREATE INDEX idx_skills_name ON skills(skill_name);
CREATE INDEX idx_skills_candidate ON skills(candidate_id);
CREATE INDEX idx_work_company ON work_experience(company);
CREATE INDEX idx_work_candidate ON work_experience(candidate_id);

-- Analyze database for query optimization
ANALYZE;
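
To confirm a query actually uses the new indexes, EXPLAIN QUERY PLAN is a cheap check (a sketch; the database filename matches the merge example above):

import sqlite3

conn = sqlite3.connect('all_candidates_2025.sqlite')
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM skills WHERE skill_name = 'Python'"
).fetchall()
for row in plan:
    print(row)  # the detail column should mention idx_skills_name
conn.close()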

12) Real-world use cases

Recruiting agency workflow:

  1. Create separate projects for each client company
  2. Process incoming resumes daily
  3. Download and merge weekly into master candidate database
  4. Run weekly reports: new skills trending, top universities, experience distribution
  5. Export matches to client ATS systems

Internal HR talent pool:

  1. Process all applicants from career site submissions
  2. Tag candidates by application date, source (LinkedIn, referral, career site)
  3. Quarterly reviews: re-screen previous applicants against new openings
  4. Build internal referral pipeline by searching for companies employees worked at

Portfolio company recruiting (VC/PE firms):

  1. Aggregate resumes across all portfolio companies
  2. Identify candidates who applied to multiple portfolio companies
  3. Cross-pollinate talent: suggest candidates to portfolio companies based on skill matches
  4. Track hiring metrics: time-to-hire, source quality, skill demand trends

Get started building your resume database

Ready to transform your recruiting workflow? Here’s how to get started:

  1. Create a PdfParse account - free tier includes 20 pages to test with sample resumes
  2. Set up your schema - copy the JSON schema from this article or customize to your needs
  3. Upload and process - drag and drop resume PDFs, click “Process Files”
  4. Review and refine - check extraction quality, adjust prompts if needed
  5. Download SQLite - get your portable database file ready for integration
  6. Build your workflow - integrate with Python, Excel, BI tools, or custom applications

Need help with your setup? Contact us at support@pdfparse.net.
