Html to Pdf in .NET
I searched around for some open source projects converting HTML to PDF, and stumbled upon a great library called iTextSharp. It's actually a .NET port of the Java library iText (typical, isn't it?), and it has some really nice features. There are some good commercial products out there doing the same, but in my opinion this is core functionality, so if one can get it for free and even get the source, nothing is better than that!
Regarding PDF creation, iTextSharp's main purpose is generating PDF documents from scratch, not converting them from HTML. The library has a really understandable API for developers who are not familiar with the PDF specification, myself being one of them.
Playing around with the API and reading some mailing lists and forum posts, I finally managed to get a working sample of how to export a GridView to PDF. This is a pretty simple sample, but you can play around with the API to add more formatting and do your own stuff. In fact, you can export any XHTML, provided the markup is valid. However, not all HTML tags are supported. The author's intention was not to write an HTML-to-PDF converter, but rather to let you create PDFs from HTML written with the engine's limitations in mind. So you will probably not be able to convert dynamic content you do not control, but it's excellent for creating reports etc. Supported tags are: "ol ul li a pre font span br p div body table td th tr i b u sub sup em strong s strike h1 h2 h3 h4 h5 h6 img"
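HTMLWorker tends to fail on markup outside this list, so it can help to check the rendered HTML against the list before parsing. The sketch below is a hypothetical helper. `TagCheck.FindUnsupportedTags` is not part of iTextSharp. It scans the markup with a simple regular expression and reports tag names that are missing from the list above. It assumes the list is accurate for your iTextSharp version. It is also a rough check, not an HTML parser: text that only looks like a tag, for example inside an HTML comment, is matched too.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TagCheck
{
    // Tag list copied from the article (assumption: it matches your iTextSharp version).
    private static readonly HashSet<string> Supported = new HashSet<string>(
        ("ol ul li a pre font span br p div body table td th tr i b u sub sup " +
         "em strong s strike h1 h2 h3 h4 h5 h6 img").Split(' '),
        StringComparer.OrdinalIgnoreCase);

    // Captures the tag name in opening, closing and self-closing tags: <td>, </td>, <br />.
    private static readonly Regex TagName = new Regex(@"</?([a-zA-Z][a-zA-Z0-9]*)\b");

    // Returns the distinct (lower-cased) tag names that are not in the supported list.
    public static List<string> FindUnsupportedTags(string html)
    {
        return TagName.Matches(html)
            .Cast<Match>()
            .Select(m => m.Groups[1].Value.ToLowerInvariant())
            .Where(name => !Supported.Contains(name))
            .Distinct()
            .ToList();
    }
}
```

In the click handler, something like `TagCheck.FindUnsupportedTags(stringWrite.ToString())` would list the tags worth removing or replacing before the parse.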
So how does it work, then? Well, first you need to get the latest version of iTextSharp. Place the DLL in the bin folder of your Web Application Project (you're not using Web Site Projects, are you?) and add a reference to it. Then create a new page and add a PlaceHolder to it. The placeholder marks the section of the HTML that will be exported to PDF. You can add more controls to the placeholder if you need to. Inside the placeholder, add a GridView and bind it to your data source.
Add an asp:Button to the page. It triggers the export, and its click handler does all the exporting work:
// Namespaces used (at the top of the code-behind file):
// using System; using System.IO; using System.Web; using System.Web.UI;
// using iTextSharp.text; using iTextSharp.text.pdf; using iTextSharp.text.html.simpleparser;
protected void ButtonCreatePdf_Click(object sender, EventArgs e)
{
    // Set content type and download headers on the response
    Response.ContentType = "application/pdf";
    Response.AddHeader("content-disposition", "attachment;filename=FileName.pdf");
    Response.Cache.SetCacheability(HttpCacheability.NoCache);

    // Render the PlaceHolder to a temporary string
    StringWriter stringWrite = new StringWriter();
    HtmlTextWriter htmlWrite = new HtmlTextWriter(stringWrite);
    PlaceholderPdf.RenderControl(htmlWrite);
    StringReader reader = new StringReader(stringWrite.ToString());

    // Create the PDF document and write it directly to the response stream
    Document doc = new Document(PageSize.A4);
    HTMLWorker parser = new HTMLWorker(doc);
    PdfWriter.GetInstance(doc, Response.OutputStream);
    doc.Open();
    try
    {
        // Create a footer that displays the page number.
        // HeaderFooter and Document.Footer exist in iTextSharp 4.x; they were removed in 5.x.
        HeaderFooter footer = new HeaderFooter(new Phrase("This is page: "), true)
            { Border = Rectangle.NO_BORDER };
        doc.Footer = footer;
        // Parse the HTML into the document
        parser.Parse(reader);
    }
    catch (Exception ex)
    {
        // Display parser errors in the PDF.
        // Parser errors are also visible in the Debug Output window in Visual Studio.
        Paragraph paragraph = new Paragraph("Error! " + ex.Message);
        paragraph.SetAlignment("center");
        Chunk text = paragraph.Chunks[0] as Chunk;
        if (text != null)
        {
            text.Font.Color = Color.RED;
        }
        doc.Add(paragraph);
    }
    finally
    {
        doc.Close();
    }
    // Stop ASP.NET from appending the page's own HTML after the PDF bytes
    Response.End();
}
Almost there. Clicking the button will result in an exception:
Control 'GridView1' of type 'GridView' must be placed inside a form tag with runat=server.
This is because we are rendering the PlaceHolder into a stream rather than as part of the Web Form, and ASP.NET checks that controls such as the GridView are rendered inside a server-side form. The check is simply switched off by adding the following code to the page:
// Intentionally empty: lets the placeholder's controls render outside <form runat="server">
public override void VerifyRenderingInServerForm(Control control)
{
}
Now, clicking the button again results in another exception:
RegisterForEventValidation can only be called during Render();
Again, this is a security feature of ASP.NET. Get around this exception by setting EnableEventValidation="false" in the @Page directive of the page. If you are concerned about the security of the page, please check the official documentation for the features that have been disabled.
Now, clicking the button should produce a PDF document. If not, debug the application and check the Debug Output window for exceptions. Most likely the XHTML in the placeholder is not valid.
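A parse failure usually means malformed markup, so a quick well-formedness check can narrow down the problem before iTextSharp is involved. The sketch below is a hypothetical helper; the name `XhtmlCheck.CheckWellFormed` is made up for illustration. It wraps the fragment in a dummy root element so that several top-level elements are allowed. It also replaces `&nbsp;` with `&#160;`, because GridView emits `&nbsp;` for empty cells but it is not a predefined XML entity. Other named HTML entities would need the same treatment. The fragment is then parsed with `XDocument.Parse`. Passing this check does not guarantee that HTMLWorker will accept the markup. It only rules out unclosed or mismatched tags.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class XhtmlCheck
{
    // Returns null if the fragment is well-formed XML, otherwise the parser's error message.
    public static string CheckWellFormed(string fragment)
    {
        // Dummy root allows multiple top-level elements; &nbsp; is not an XML entity.
        string xml = "<root>" + fragment.Replace("&nbsp;", "&#160;") + "</root>";
        try
        {
            XDocument.Parse(xml);
            return null;
        }
        catch (XmlException ex)
        {
            return ex.Message; // includes line/position of the first problem
        }
    }
}
```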
Note that I am using the HTMLWorker class instead of the HtmlParser class in the iTextSharp library. According to the author, HtmlParser is not supported. I tried both, and HTMLWorker accepts a lot more HTML markup than HtmlParser.
You probably also want to clean the HTML before parsing it with HTMLWorker. Typically you want to remove JavaScript postback links, anchors etc. from the GridView. This can be achieved with the following code:
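// Replaces the StringReader line in the click handler.
// Requires: using System.Text.RegularExpressions;
string html = stringWrite.ToString();
// Remove opening and closing tags whose name starts with "a" (anchors), keeping the link text
html = Regex.Replace(html, "</?(a|A).*?>", "");
StringReader reader = new StringReader(html);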
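The pattern above removes every tag whose name starts with "a", so tags such as <abbr> or <address> would disappear as well. If you only want to drop anchors, a stricter variant puts a word boundary after the tag name. This is a sketch with a hypothetical `HtmlCleaner` class. Like any regex approach to HTML, it assumes that no attribute value inside an anchor contains a `>` character.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlCleaner
{
    // Matches <a ...>, </a>, <A ...> but not <abbr> or <address>:
    // \b requires the tag name to end right after the "a".
    private static readonly Regex AnchorTag =
        new Regex(@"</?a\b[^>]*>", RegexOptions.IgnoreCase);

    // Removes anchor tags (e.g. GridView javascript:__doPostBack sort links),
    // keeping the link text itself.
    public static string StripAnchors(string html)
    {
        return AnchorTag.Replace(html, string.Empty);
    }
}
```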
You can also extend the HTMLWorker class to specialize it for your purpose. For instance, it would be useful to be able to define page breaks in the final PDF document. Simply create a new class that inherits from HTMLWorker.
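// Requires: using System; using System.Collections;
//           using iTextSharp.text; using iTextSharp.text.html.simpleparser;
public class HTMLWorkerExtended : HTMLWorker
{
    public HTMLWorkerExtended(IDocListener document) : base(document)
    {
    }

    public override void StartElement(String tag, Hashtable h)
    {
        // Treat the custom <newpage /> element as a page break
        if (tag.Equals("newpage"))
            document.Add(Chunk.NEXTPAGE);
        else
            base.StartElement(tag, h);
    }
}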
Now, simply replace the HTMLWorker with the extended version and add a <newpage /> element to the HTML in the placeholder where you want a new page.
Some CSS styles are not supported by default. For instance, it is not possible to set the background-color style for an image or a table cell/row. The only way to add more style support is to change the iTextSharp source. Fortunately, that's pretty simple. Open \text\html\simpleparser\FactoryProperties.cs. In the InsertStyle method, add the following code to the foreach loop:
// Map the CSS background-color style to the "bgcolor" property (requires System.Globalization)
else if (key.Equals(Markup.CSS_KEY_BGCOLOR))
{
    Color c = Markup.DecodeColor(prop[key]);
    if (c != null)
    {
        int hh = c.ToArgb() & 0xffffff;
        String hs = "#" + hh.ToString("X06", NumberFormatInfo.InvariantInfo);
        h["bgcolor"] = hs;
    }
}
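The two lines that build `hs` turn the color into the `#RRGGBB` string that ends up in the property table. The sketch below isolates that arithmetic on a plain `int` ARGB value, as returned by `ToArgb()`, so it runs without iTextSharp. The `ColorHex` class name is hypothetical. Masking with `0xffffff` drops the alpha byte. The `"X06"` format writes uppercase hex padded to six digits, so pure blue becomes `#0000FF` rather than `#FF`.

```csharp
using System;
using System.Globalization;

public static class ColorHex
{
    // Converts a 32-bit ARGB value to an HTML-style "#RRGGBB" string.
    public static string ToHtmlHex(int argb)
    {
        int rgb = argb & 0xffffff; // discard the alpha byte
        return "#" + rgb.ToString("X06", NumberFormatInfo.InvariantInfo);
    }
}
```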
Update: Another example for adding border color to a table:
First, set the border color style on the table, i.e. style="border-color: #ff0000;"
Then, as in the example above, handle the style in the FactoryProperties.cs file:
// Map the CSS border-color style to the "border-color" property
else if (key.Equals(Markup.CSS_KEY_BORDERCOLOR))
{
    Color c = Markup.DecodeColor(prop[key]);
    if (c != null)
    {
        int hh = c.ToArgb() & 0xffffff;
        String hs = "#" + hh.ToString("X06", NumberFormatInfo.InvariantInfo);
        h["border-color"] = hs;
    }
}
In addition, you need to alter the table output, because the IncTable class has no built-in "border-color" style property.
Open IncTable.cs and change the following code in the BuildTable method:
Existing code:
for (int row = 0; row < rows.Count; ++row) {
    ArrayList col = (ArrayList)rows[row];
    for (int k = 0; k < col.Count; ++k) {
        table.AddCell((PdfPCell)col[k]);
    }
}
Replace with:
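String bordercolor = (String)props["border-color"];
// Decode once; stays null when the table has no border-color style
Color tableBorderColor = bordercolor != null ? Markup.DecodeColor(bordercolor) : null;
for (int row = 0; row < rows.Count; ++row)
{
    ArrayList col = (ArrayList)rows[row];
    for (int k = 0; k < col.Count; ++k)
    {
        PdfPCell cell = (PdfPCell)col[k];
        // Only override the cell border when the table defines a border color
        if (tableBorderColor != null)
            cell.BorderColor = tableBorderColor;
        table.AddCell(cell);
    }
}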
This sets the border color of every cell in the table. Hint: it may be wise to check whether a cell already has a border color before overwriting it with the table's border color.
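One way to follow that hint is to let a color set on the cell take precedence and fall back to the table's color otherwise. The sketch below shows only that decision. It uses strings in place of iTextSharp's Color objects. `BorderColors.Resolve` is a hypothetical helper, and treating null or an empty string as "not set" is an assumption about how you represent a missing color.

```csharp
using System;

public static class BorderColors
{
    // Hypothetical helper: decide which border color a cell should get.
    // Assumption: null or "" means "no color set".
    public static string Resolve(string cellColor, string tableColor)
    {
        if (!string.IsNullOrEmpty(cellColor))
            return cellColor;                 // the cell's own color wins
        return string.IsNullOrEmpty(tableColor) ? null : tableColor; // table fallback
    }
}
```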
Recompile and add the new DLL to your project.
Happy Coding!
20 Comments to Html to Pdf in .NET
Hi Hamang,
When I'm exporting my placeholder to PDF, I'm getting unnecessary spaces in table cells. Can you please help me with this?
Rajasekhar…
October 14, 2008
"The library has a really understandable API for developers that are not familiar with the PDF specification, me beeing one of them"
What are you referring to? There IS NOT an API for this product, just posts on forums from developers in need of an API…
October 15, 2008
Christoffer, I agree that the documentation is somewhat lacking. You can read the iTextSharp tutorials at http://itextsharp.sourceforge.net/tutorial/index.html and there are some discussions over at http://www.nabble.com/iTextSharp-f4188.html. The API, however, is pretty straightforward and understandable in my opinion.
I have long been looking for this information. Thank you for your work.
January 19, 2009
With itextsharp.dll I can display all the HTML page data in the PDF, but the table border color always comes out black. Please suggest a solution.
waiting for reply!!
January 19, 2009
@saharsh, it's not supported by default, as with many of the styles. You need to add it yourself. I have updated the article with the code.
March 16, 2009
Can you please send me the complete working sample code? There is some issue when using HTMLWorker.
thanks in advance..
April 2, 2009
thanks, another waste of time.
April 8, 2009
Hi,
Great article btw, it's been a godsend for me.
I have a question which I hope you can help with. I am using your method detailed above for converting HTML into a PDF, but some of the text on my HTML page is in Czech, and unfortunately some of the characters are omitted from the PDF because the default font for the page does not support them.
I have done some digging around and it looks like I need to change the font Codepage to cp1250 in order to support the special characters.
Could you give any pointers on how this can be achieved please….?
Thanks in advance
James
May 21, 2009
You really gave me a good solution for my issue. Thank you very much!
June 10, 2009
Wonderful article, it helped me a lot.
Thanks….:)
August 22, 2009
Great article, thanks, but I'm having a problem with characters in the GridView. Some Turkish characters do not appear. Can we make a PDF that supports UTF-8 encoding or something else?
August 25, 2009
Hi,
Can you send me the updated DLL? I cannot find it anywhere on the web.
I am facing problems such as the background color of table cells.
Thanks,
Mahavir Shah
093289 35308
October 6, 2009
Can you please provide full sample code for converting HTML to PDF? Thanks in advance…anc
October 23, 2009
The picture is not being exported. How can I export pictures?
December 20, 2009
hello,
First of all, thank you very much. I got a PDF file from an ASP.NET page using this article, but I have one problem: how can I remove unnecessary space in the PDF file? In short, I want a well-formatted file, and if possible please give me code for displaying images in the PDF file as well.
December 29, 2009
hello Khem Raj
There are two ways to add images:
1. give an online path for the image
2. use the following code:
doc.Open();
//add an image to document
iTextSharp.text.Image img1 = iTextSharp.text.Image.GetInstance(Request.MapPath("~/Images/RelIcon-Pdf.jpg"));
img1.Alignment = iTextSharp.text.Image.TITLE;
img1.ScalePercent(100f); // change it's size
doc.Add(img1);
// end of code for image
// here start your code
It looks as though the PDF is being created, but I keep getting "There was an error opening this document. The file is damaged and could not be repaired."
Any advice?
July 16, 2010
Hey, doc has no Footer property and I can't create a HeaderFooter.
September 22, 2008