How to Convert PDF to Text in .NET (C#)

PDFBox in .NET
PDFBox.NET is a .NET port of PDFBox created using IKVM.NET. The latest version (1.8.9) is available for download.
How to extract plain text from PDF file using PDFBox.NET library. Sample Visual Studio project download (C#).
Downloads

This sample requires the following DLLs from the PDFBox.NET package.

Add these to the project as references:

  • IKVM.OpenJDK.Core.dll
  • IKVM.OpenJDK.SwingAWT.dll
  • pdfbox-1.8.9.dll

In addition to these libraries, it is necessary to copy the following files to the application directory:

  • commons-logging.dll
  • fontbox-1.8.9.dll
  • IKVM.OpenJDK.Text.dll
  • IKVM.OpenJDK.Util.dll
  • IKVM.Runtime.dll

You can also download the full PDFBox.NET package (including all dependencies).
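
Because the second group of files is not referenced directly, a missing file usually shows up only at run time. As a rough sketch, the check below lists which of the files named above are absent from a directory. The class name PdfBoxDependencies and the method FindMissing are hypothetical helpers introduced for illustration; they are not part of PDFBox.NET or IKVM.NET.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class PdfBoxDependencies
{
    // File names taken from the two lists above (PDFBox.NET 1.8.9 package).
    private static readonly string[] RequiredFiles =
    {
        "IKVM.OpenJDK.Core.dll",
        "IKVM.OpenJDK.SwingAWT.dll",
        "pdfbox-1.8.9.dll",
        "commons-logging.dll",
        "fontbox-1.8.9.dll",
        "IKVM.OpenJDK.Text.dll",
        "IKVM.OpenJDK.Util.dll",
        "IKVM.Runtime.dll"
    };

    // Hypothetical helper: returns the required file names that are
    // not present in the given directory.
    public static List<string> FindMissing(string directory)
    {
        var missing = new List<string>();
        foreach (string file in RequiredFiles)
        {
            if (!File.Exists(Path.Combine(directory, file)))
            {
                missing.Add(file);
            }
        }
        return missing;
    }
}
```

One way to use it is to call PdfBoxDependencies.FindMissing(AppDomain.CurrentDomain.BaseDirectory) at startup and report any names it returns before attempting to load a PDF.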

Sample code (C#)

using org.apache.pdfbox.pdmodel;
using org.apache.pdfbox.util;

// ...

private static string ExtractTextFromPdf(string path)
{
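  PDDocument doc = null;
  try {
    // Load the PDF document from the given file path.
    doc = PDDocument.load(path);
    // Extract the text of all pages as a single string.
    PDFTextStripper stripper = new PDFTextStripper();
    return stripper.getText(doc);
  }
  finally {
    // Always release the document, even if extraction fails.
    if (doc != null) {
      doc.close();
    }
  }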
}
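
To turn the extracted string into an actual text file, a small wrapper along the following lines could be used. This is only a sketch: the class PdfTextExport and the method SaveAsTextFile are hypothetical names, and the extractor is passed in as a delegate so that the helper itself does not depend on PDFBox.NET.

```csharp
using System;
using System.IO;
using System.Text;

public static class PdfTextExport
{
    // Hypothetical helper: runs the supplied extractor on pdfPath and writes
    // the result as UTF-8 to a file with the same name and a .txt extension.
    // Returns the path of the text file that was written.
    public static string SaveAsTextFile(string pdfPath, Func<string, string> extractText)
    {
        string text = extractText(pdfPath);
        string txtPath = Path.ChangeExtension(pdfPath, ".txt");
        File.WriteAllText(txtPath, text, Encoding.UTF8);
        return txtPath;
    }
}
```

From within the class that defines ExtractTextFromPdf, a call such as PdfTextExport.SaveAsTextFile("input.pdf", ExtractTextFromPdf) would write input.txt next to the PDF; the file name here is only an example.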

See also how to convert PDF to text in VB (.NET).

Other Methods