PDF Merge for .NET: Fast, Reliable Document Combining


When to merge PDFs server-side vs client-side

Merging on the server is typical when:

  • You need consistent output owned by the application (e.g., generated reports).
  • Files are uploaded by many users and must be combined centrally.
  • You must apply permissions, stamps, or other processing.

Client-side merging (e.g., using WebAssembly libraries or browser-based utilities) can reduce server load and improve privacy when users handle their own documents.


Library options

  • iText7 (successor to iTextSharp): powerful and feature-rich. Licensed under AGPL; a commercial license is required for closed-source use.
  • PdfSharp / PdfSharpCore: MIT-licensed; good for basic operations but with a limited feature set.
  • PdfPig: read-focused library; writing/merging is more limited.
  • QuestPDF: more for PDF generation than manipulation.
  • Syncfusion, Aspose.PDF, GemBox.Pdf: commercial components with strong capabilities and support.

Choose based on license, features (form fields, annotations, compression), platform (.NET Framework vs .NET Core/.NET 6+), and performance.


Key considerations before merging

  • Maintain bookmarks, outlines, and page labels if needed — not all libraries preserve these.
  • AcroForms (PDF forms) require special handling to avoid name collisions.
  • Metadata (XMP), permissions, and encryption often need merging or reapplying.
  • Memory usage: merging many or large PDFs can be memory‑intensive. Stream-based approaches are preferable.
  • Threading and concurrency: ensure the library is thread-safe if used in parallel.

Example approaches

Below are code examples for four approaches: PdfSharpCore (open source), iText7 (AGPL, with commercial licensing available), a commercial library, and external command-line tools.

1) Using PdfSharpCore (suitable for .NET Core/.NET 5+)

PdfSharpCore is a cross-platform port of PdfSharp usable on .NET Core. It works well for basic merging.

Install:

dotnet add package PdfSharpCore

Code example:

using System.IO;
using PdfSharpCore.Pdf;
using PdfSharpCore.Pdf.IO;

public static void MergePdfs(string[] sourceFiles, string outputFile)
{
    using (var outDoc = new PdfDocument())
    {
        foreach (var file in sourceFiles)
        {
            using (var inStream = File.OpenRead(file))
            {
                // Import mode is required to copy pages into another document.
                var inputDoc = PdfReader.Open(inStream, PdfDocumentOpenMode.Import);
                for (int i = 0; i < inputDoc.PageCount; i++)
                {
                    var page = inputDoc.Pages[i];
                    outDoc.AddPage(page);
                }
            }
        }
        // Pages are appended in the order of sourceFiles.
        outDoc.Save(outputFile);
    }
}

Notes:

  • PdfSharpCore’s import mode copies pages but may not preserve all interactive features (forms, some annotations).
  • For very large PDFs, note that PdfSharpCore generally reads each input document into memory; make sure sufficient memory is available, or merge in smaller batches.
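In a web application the inputs are often uploaded streams rather than files on disk. The following is a minimal sketch of a stream-to-stream variant of the merge above, not a definitive implementation. The class and method names (StreamMergeSketch, MergePdfStreams) are hypothetical, and the sketch assumes that both the input and output streams are seekable (for example FileStream or MemoryStream).

```csharp
using System.Collections.Generic;
using System.IO;
using PdfSharpCore.Pdf;
using PdfSharpCore.Pdf.IO;

// Hypothetical helper; the names are illustrative and not part of PdfSharpCore.
public static class StreamMergeSketch
{
    // Merges the given PDF streams, in order, into 'output'.
    // The caller creates and disposes all streams.
    public static void MergePdfStreams(IEnumerable<Stream> sources, Stream output)
    {
        using (var outDoc = new PdfDocument())
        {
            foreach (var source in sources)
            {
                // Import mode is required to copy pages into another document.
                var inputDoc = PdfReader.Open(source, PdfDocumentOpenMode.Import);
                foreach (PdfPage page in inputDoc.Pages)
                {
                    outDoc.AddPage(page);
                }
            }
            // The second argument (closeStream: false) leaves 'output' open for the caller.
            outDoc.Save(output, false);
        }
    }
}
```

The caller keeps ownership of the streams, so the same method can serve uploads, temporary files, or in-memory buffers.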

2) Using iText7

iText7 has robust support for forms, bookmarks, encryption, and more. Its license is AGPL; commercial licensing is available.

Install:

dotnet add package itext7 

Code example:

using System.IO;
using iText.Kernel.Pdf;
using iText.Kernel.Utils;

public static void MergeWithIText(string[] sourceFiles, string destinationFile)
{
    using (var pdfWriter = new PdfWriter(destinationFile))
    {
        using (var pdfDoc = new PdfDocument(pdfWriter))
        {
            var merger = new PdfMerger(pdfDoc);
            foreach (var file in sourceFiles)
            {
                using (var src = new PdfDocument(new PdfReader(file)))
                {
                    // Page numbers are 1-based; this copies every page of the source.
                    merger.Merge(src, 1, src.GetNumberOfPages());
                }
            }
        }
    }
}

Notes:

  • iText preserves bookmarks and most document structure. For AcroForms, you may need to explicitly flatten or rename fields to avoid collisions.
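  • iText's writer can be tuned through WriterProperties (for example, smart mode to deduplicate identical resources, or full compression to shrink the output); consult the documentation for memory-related options.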
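As one illustration of writer configuration, the sketch below passes a WriterProperties instance to the PdfWriter. It is a variation on the merge above under stated assumptions, not a recommended default: the class and method names (ITextMergeSketch, MergeCompact) are hypothetical, and the two options shown mainly affect output size rather than peak memory.

```csharp
using System.Collections.Generic;
using iText.Kernel.Pdf;
using iText.Kernel.Utils;

// Hypothetical helper; the names are illustrative and not part of iText.
public static class ITextMergeSketch
{
    public static void MergeCompact(IEnumerable<string> sourceFiles, string destinationFile)
    {
        var props = new WriterProperties()
            .UseSmartMode()                 // reuse identical objects (such as shared fonts) across sources
            .SetFullCompressionMode(true);  // write compressed object streams and cross-reference data

        using (var pdfDoc = new PdfDocument(new PdfWriter(destinationFile, props)))
        {
            var merger = new PdfMerger(pdfDoc);
            foreach (var file in sourceFiles)
            {
                using (var src = new PdfDocument(new PdfReader(file)))
                {
                    merger.Merge(src, 1, src.GetNumberOfPages());
                }
            }
        }
    }
}
```

Smart mode tends to help most when the inputs share fonts or images, such as reports generated from one template; it is worth measuring on representative files before enabling it.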

3) Using a commercial library (example: Syncfusion or Aspose)

Commercial libraries often offer the easiest integration, richer features, and support for enterprise needs (form merging, annotations, font embedding, performance tuning, and licensing that avoids AGPL).

Typical workflow:

  • Install vendor NuGet package.
  • Use their Merge API (often a single call with files/streams).
  • Configure options (preserve bookmarks, handle forms).

Example pseudo-code (vendor-specific):

var merger = new VendorPdfMerger();
merger.PreserveBookmarks = true;
merger.MergeFiles(sourceFiles, outputFile);

4) Shelling out to command-line tools

For some scenarios, using an external tool (like qpdf or Ghostscript) via process calls is practical, especially when licensing or platform constraints limit library choices.

Example with qpdf:

qpdf --empty --pages file1.pdf file2.pdf -- out.pdf

In .NET:

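using System.Diagnostics;

// ArgumentList (.NET Core 2.1 or later) passes each value as a separate argument,
// so paths with spaces need no manual quoting.
var psi = new ProcessStartInfo("qpdf") { UseShellExecute = false };
foreach (var arg in new[] { "--empty", "--pages", file1, file2, "--", outFile })
    psi.ArgumentList.Add(arg);
using (var process = Process.Start(psi))
{
    process.WaitForExit();
}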

Be mindful of security (sanitize filenames), availability on target servers, and performance.
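A slightly more defensive wrapper might look like the sketch below. It is illustrative only: the class and method names (QpdfSketch, BuildMergeArguments, Merge) are hypothetical, it assumes qpdf is on the PATH, and it relies on qpdf's documented exit codes (0 for success, 3 for success with warnings). Converting every path to a full path means no argument can begin with "-" and be mistaken for an option.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;

// Hypothetical helper; the names are illustrative.
public static class QpdfSketch
{
    // Builds: --empty --pages <in1> <in2> ... -- <out>
    public static List<string> BuildMergeArguments(IReadOnlyList<string> inputs, string output)
    {
        if (inputs.Count == 0) throw new ArgumentException("No input files.", nameof(inputs));

        var args = new List<string> { "--empty", "--pages" };
        foreach (var input in inputs)
        {
            // Full paths never start with '-', so they cannot be read as options.
            args.Add(Path.GetFullPath(input));
        }
        args.Add("--");
        args.Add(Path.GetFullPath(output));
        return args;
    }

    public static void Merge(IReadOnlyList<string> inputs, string output)
    {
        var psi = new ProcessStartInfo("qpdf")
        {
            UseShellExecute = false,
            RedirectStandardError = true
        };
        foreach (var arg in BuildMergeArguments(inputs, output))
        {
            psi.ArgumentList.Add(arg);
        }

        using (var process = Process.Start(psi))
        {
            if (process == null) throw new InvalidOperationException("qpdf could not be started.");

            // Read stderr before waiting so a full pipe cannot block the child process.
            string errors = process.StandardError.ReadToEnd();
            process.WaitForExit();

            if (process.ExitCode != 0 && process.ExitCode != 3)
            {
                throw new InvalidOperationException($"qpdf failed ({process.ExitCode}): {errors}");
            }
        }
    }
}
```

Keeping argument construction in its own method makes it possible to unit-test the command line without qpdf installed.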


Handling PDF forms (AcroForms)

If input PDFs contain forms, merging naively can result in fields with identical names colliding. Strategies:

  • Flatten forms before merging (convert fields to regular content).
  • Rename fields to unique names per document (programmatically).
  • Use library-specific form-merge features (iText can handle complex cases).

Example: flatten with iText7:

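using iText.Forms;
using iText.Kernel.Pdf;

// flattenedFile is a hypothetical path for the intermediate, flattened copy.
using (var doc = new PdfDocument(new PdfReader(srcFile), new PdfWriter(flattenedFile)))
{
    // Returns null when the document has no form (second argument: do not create one).
    PdfAcroForm.GetAcroForm(doc, false)?.FlattenFields();
}
// then merge flattenedFile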
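For the renaming strategy, the naming scheme itself can be kept independent of any PDF library. The sketch below only computes a rename map that prefixes each field name with the document's position; the names (FieldNamingSketch, BuildRenameMap) and the "doc{n}_" prefix format are assumptions made for illustration. Applying the map is library-specific.

```csharp
using System.Collections.Generic;

// Hypothetical helper; the names and the prefix format are illustrative.
public static class FieldNamingSketch
{
    // Maps each original field name to a name prefixed with the document's position,
    // e.g. "signature" in document 2 becomes "doc2_signature".
    public static Dictionary<string, string> BuildRenameMap(IEnumerable<string> fieldNames, int documentIndex)
    {
        var map = new Dictionary<string, string>();
        foreach (var name in fieldNames)
        {
            map[name] = $"doc{documentIndex}_{name}";
        }
        return map;
    }
}
```

With iText, for instance, each entry could then be applied through the form object's rename method; check the API of the version in use, since form-related method names have changed between major releases.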

Preserving bookmarks and outlines

Not all libraries preserve bookmarks by default. iText’s PdfMerger keeps outlines; some other libraries require copying outlines explicitly. If bookmarks are important, verify library behavior with sample documents.


Performance and memory tips

  • Use stream-based APIs rather than loading entire files into memory when possible.
  • Merge sequentially and write to disk incrementally if memory is constrained.
  • For large batches, split work into smaller groups and merge intermediate results.
  • Reuse PdfWriter/PdfDocument where supported to avoid repeated initialization costs.
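The batching tip can be sketched independently of the PDF library by passing the merge routine in as a delegate. This is a single-level sketch under stated assumptions: the names (BatchMergeSketch, MergeInBatches) are hypothetical, intermediate files go to the system temp directory, and whether batching actually lowers peak memory depends on the library behind the delegate.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper; the names are illustrative.
public static class BatchMergeSketch
{
    // 'merge' is any routine that combines input files into one output file,
    // such as a PdfSharpCore- or iText-based merge method.
    public static void MergeInBatches(
        IReadOnlyList<string> sourceFiles,
        string outputFile,
        int batchSize,
        Action<string[], string> merge)
    {
        if (sourceFiles.Count == 0) throw new ArgumentException("No input files.", nameof(sourceFiles));
        if (batchSize < 2) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var intermediates = new List<string>();
        try
        {
            // First pass: merge each group of 'batchSize' files into a temporary file.
            for (int i = 0; i < sourceFiles.Count; i += batchSize)
            {
                string temp = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName() + ".pdf");
                intermediates.Add(temp);
                merge(sourceFiles.Skip(i).Take(batchSize).ToArray(), temp);
            }

            // Final pass: combine the intermediate files in their original order.
            merge(intermediates.ToArray(), outputFile);
        }
        finally
        {
            foreach (var temp in intermediates)
            {
                if (File.Exists(temp)) File.Delete(temp);
            }
        }
    }
}
```

A call could look like BatchMergeSketch.MergeInBatches(files, "out.pdf", 50, MergePdfs), reusing a merge method with a (string[], string) signature.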

Security and sanitization

  • Validate files from users (check file type, scan for malware).
  • Sanitize filenames and avoid injection when calling external tools.
  • When handling encrypted PDFs, require passwords and handle securely in memory.
  • Apply output encryption only if needed and supported by the library.
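Two of these checks can be sketched with the base class library alone: a strict header check and file-name sanitization. The names (UploadValidationSketch, LooksLikePdf, SafeFileName) and the "upload.pdf" fallback are hypothetical. The header check is deliberately strict and assumes a seekable stream; it is a first filter, not a substitute for malware scanning or for the parsing done by the PDF library.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper; the names are illustrative.
public static class UploadValidationSketch
{
    private static readonly byte[] PdfMagic = Encoding.ASCII.GetBytes("%PDF-");

    // Returns true when the stream starts with the "%PDF-" header.
    // The stream must be seekable; its position is restored afterwards.
    public static bool LooksLikePdf(Stream stream)
    {
        long start = stream.Position;
        var header = new byte[PdfMagic.Length];
        int read = 0;
        while (read < header.Length)
        {
            int n = stream.Read(header, read, header.Length - read);
            if (n == 0) break; // end of stream
            read += n;
        }
        stream.Position = start;

        if (read != header.Length) return false;
        for (int i = 0; i < PdfMagic.Length; i++)
        {
            if (header[i] != PdfMagic[i]) return false;
        }
        return true;
    }

    // Drops directory components and replaces characters that are
    // invalid in file names on the current platform.
    public static string SafeFileName(string uploadedName)
    {
        string name = Path.GetFileName(uploadedName);
        foreach (char c in Path.GetInvalidFileNameChars())
        {
            name = name.Replace(c, '_');
        }
        return string.IsNullOrWhiteSpace(name) ? "upload.pdf" : name;
    }
}
```

In practice it is often safer still to store uploads under server-generated names and keep the user-supplied name only as display metadata.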

Testing checklist

  • Merge PDFs with images, fonts, annotations, and forms.
  • Validate resulting PDF in multiple viewers (Adobe Reader, Chrome, PDF.js).
  • Check metadata, bookmarks, and page order.
  • Measure memory and CPU for target batch sizes.
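One check that is easy to automate is that the merged file has as many pages as its inputs combined. The sketch below uses PdfSharpCore only to count pages; the names (MergeChecks, CountPages, PageCountsMatch) are hypothetical, and the check says nothing about page content, bookmarks, or forms, which still need manual or viewer-based verification.

```csharp
using System.Collections.Generic;
using System.Linq;
using PdfSharpCore.Pdf.IO;

// Hypothetical helper; the names are illustrative.
public static class MergeChecks
{
    public static int CountPages(string path)
    {
        using (var doc = PdfReader.Open(path, PdfDocumentOpenMode.Import))
        {
            return doc.PageCount;
        }
    }

    // True when the merged file has exactly as many pages as all sources together.
    public static bool PageCountsMatch(IEnumerable<string> sources, string merged)
    {
        return sources.Sum(p => CountPages(p)) == CountPages(merged);
    }
}
```

A helper like this can be called from any test framework's assertion, with sample PDFs checked into the test project.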

Troubleshooting common issues

  • Blank pages after merge: ensure pages are imported, not referenced incorrectly; test with sample files.
  • Missing fonts: embed fonts or ensure target viewers have required fonts.
  • Lost annotations/forms: use a library that supports interactive features or flatten before merging.
  • File corruption: ensure streams are closed and libraries are used per their threading model.

Example project structure

  • Services/PdfMergeService.cs — central merge logic, unit-testable.
  • Controllers/UploadController.cs — handles file uploads and validation.
  • BackgroundJobs/MergeJob.cs — for large/async merge operations.
  • Tests/MergeTests — automated checks with diverse sample PDFs.

Conclusion

Merging PDFs in .NET is straightforward for basic needs and can become complex when forms, bookmarks, encryption, or high performance are involved. For simple merges, open-source libraries like PdfSharpCore work well. For production features and robustness, consider iText7 or a commercial component. Always test with representative documents, handle forms carefully, and adopt stream-based patterns for large files.


