Monday, February 20, 2012

PDF to XML using iTextSharp

Here is a very simple way to create an XML document from a PDF document. I used form fields in the PDF, then used iTextSharp to loop through the AcroFields (the form fields) and build a flat XML document from them, with one element per field. You can customize this to create more complex structures if need be.

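// Requires a reference to iTextSharp.dll.
using System.Xml;
using iTextSharp.text.pdf;

// Name of the XML root element; "Fields" is an arbitrary example, any valid XML name works.
string root = "Fields";

XmlDocument doc = new XmlDocument();
doc.LoadXml(string.Format("<{0}/>", root));
PdfReader reader = new PdfReader(@"C:\Input.pdf");
AcroFields fields = reader.AcroFields;
foreach (string keyName in fields.Fields.Keys)
{
    // PDF field names can contain characters that XML element names cannot (for example '[' or ']'), so encode them.
    XmlElement elt = doc.CreateElement(XmlConvert.EncodeLocalName(keyName));
    // Wrap the field value in a CDATA section.
    elt.AppendChild(doc.CreateCDataSection(fields.GetField(keyName)));
    doc.DocumentElement.AppendChild(elt);
}
reader.Close();

doc.Save(@"C:\output.xml");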
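
If you need a nested structure instead of a flat one, one option is to split each field name on '.' and create one element per segment. The following is only a minimal sketch under that assumption: the class name NestedFieldXml, the Build method and the sample field names are made up for illustration, and it uses a plain dictionary in place of the PDF so that it runs without iTextSharp. In the real code you would fill the dictionary from fields.GetField(keyName) for each key.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

static class NestedFieldXml
{
    // Hypothetical helper: builds nested XML from (field name, value) pairs,
    // treating '.' in the field name as a level separator.
    public static XmlDocument Build(string rootName, IDictionary<string, string> values)
    {
        XmlDocument doc = new XmlDocument();
        doc.AppendChild(doc.CreateElement(rootName));

        foreach (KeyValuePair<string, string> pair in values)
        {
            XmlElement current = doc.DocumentElement;
            foreach (string part in pair.Key.Split('.'))
            {
                // Encode characters that XML element names cannot contain.
                string name = XmlConvert.EncodeLocalName(part);
                // Reuse the first child element with this name, or create it.
                XmlElement child = current[name];
                if (child == null)
                {
                    child = doc.CreateElement(name);
                    current.AppendChild(child);
                }
                current = child;
            }
            // InnerText escapes markup characters such as '<' when the document is written.
            // Note: this replaces any children, so a name should not be both a field and a parent.
            current.InnerText = pair.Value ?? string.Empty;
        }
        return doc;
    }

    static void Main()
    {
        // Sample data standing in for values read from the PDF form.
        var values = new Dictionary<string, string>
        {
            { "applicant.name", "Jane Doe" },
            { "applicant.city", "Pune" },
            { "notes", "a < b" }
        };
        Console.WriteLine(Build("Form", values).OuterXml);
    }
}
```

With this approach, "applicant.name" and "applicant.city" end up as two children of a single applicant element rather than as two top-level elements.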

13 comments:

  1. I also have the same requirement. Please suggest which DLLs are required to run the code.

  2. It is not working properly. Please verify it again.

  3. Please list the libraries to add to achieve the above task.

    Replies
    1. Add a reference to the core iTextSharp.dll to the project and add the following using directives to the application.
      using System.Xml;
      using iTextSharp.text;
      using iTextSharp.text.pdf;

    2. Hi Nagebdra, I have a question: do you know what the root value is?
      Thanks

  4. You can try this free online PDF to text converter to convert PDF to text online.

  5. What is the root value in the code?

  6. Could you please send me the complete code?

  7. The above code does not work. Please share code for PDF to XML.

  8. The above code does not work. Please share code for PDF to XML.

  9. Thank you for such a well-written article. It’s full of insightful information and entertaining descriptions. Your point of view is the best among many.
