Html to Pdf in .NET
I searched around for open source projects that convert HTML to PDF, and stumbled upon a great library called iTextSharp. It's actually a Java library ported to .NET (typical, isn't it?), and it has some really nice features. There are some good commercial products out there doing the same, but in my opinion this is core functionality, so if one can get it for free and even get the source, nothing is better than that!
Regarding PDF creation, iTextSharp's main function is generating PDFs from scratch, not converting them from HTML. The library has a really understandable API for developers who are not familiar with the PDF specification, me being one of them.
Playing around with the API and reading some news lists and forum posts, I finally managed to get a working sample of how to export a GridView to PDF. This is a pretty simple sample, but you can play around with the API to add more formatting and do your own stuff. Actually, you can export anything in the XHTML, provided the markup is valid. However, not all HTML tags are supported. The intention of the author was not to make an HTML2PDF converter, but rather to create PDFs from HTML when the engine supports the markup. So you will probably not be able to convert dynamic content you do not have control over, but it's excellent for creating reports etc. Supported tags are: "ol ul li a pre font span br p div body table td th tr i b u sub sup em strong s strike h1 h2 h3 h4 h5 h6 img"
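Since only the tags listed above are handled, it can help to scan the markup for anything else before handing it to the parser. The following is a minimal sketch, not part of iTextSharp: the class and method names are made up for illustration, and the regular expression only looks at tag names, so it is a rough pre-check rather than a real HTML parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SupportedTagCheck
{
    // Tag list copied from the article; everything else here is hypothetical.
    private static readonly HashSet<string> Supported = new HashSet<string>(
        ("ol ul li a pre font span br p div body table td th tr i b u sub sup " +
         "em strong s strike h1 h2 h3 h4 h5 h6 img").Split(' '),
        StringComparer.OrdinalIgnoreCase);

    // Returns the distinct (lower-cased) tag names that are not in the supported list.
    public static List<string> FindUnsupportedTags(string html)
    {
        var found = new List<string>();
        // Matches the name of every opening or closing tag, e.g. "<td" or "</td".
        foreach (Match m in Regex.Matches(html, @"</?([A-Za-z][A-Za-z0-9]*)"))
        {
            string tag = m.Groups[1].Value.ToLowerInvariant();
            if (!Supported.Contains(tag) && !found.Contains(tag))
            {
                found.Add(tag);
            }
        }
        return found;
    }
}
```

Calling SupportedTagCheck.FindUnsupportedTags on the rendered placeholder markup before parsing gives a quick list of tags that are likely to be ignored or cause trouble.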
So how does it work, then? Well, first you need to get the latest version of iTextSharp. Just place the DLL in the bin folder of your Web Application Project (you're not using Web Projects, are you?) and add a reference to it. Then create a new page and add a PlaceHolder to it. The placeholder will be the section of the HTML that you will export to PDF. You can add more controls to the placeholder if you need to. Inside the placeholder, add a GridView and bind it to your data source.
Add an asp:Button to the page. This will trigger the export. The click handler of the button does all the exporting:
// Requires: using System; using System.IO; using System.Web; using System.Web.UI;
// using iTextSharp.text; using iTextSharp.text.pdf; using iTextSharp.text.html.simpleparser;
protected void ButtonCreatePdf_Click(object sender, EventArgs e)
{
    // Set content type in response stream
    Response.ContentType = "application/pdf";
    Response.AddHeader("content-disposition", "attachment;filename=FileName.pdf");
    Response.Cache.SetCacheability(HttpCacheability.NoCache);

    // Render PlaceHolder to temporary stream
    System.IO.StringWriter stringWrite = new StringWriter();
    System.Web.UI.HtmlTextWriter htmlWrite = new HtmlTextWriter(stringWrite);
    PlaceholderPdf.RenderControl(htmlWrite);
    StringReader reader = new StringReader(stringWrite.ToString());

    // Create PDF document
    Document doc = new Document(PageSize.A4);
    HTMLWorker parser = new HTMLWorker(doc);
    PdfWriter.GetInstance(doc, Response.OutputStream);
    doc.Open();
    try
    {
        // Create a footer that will display page number
        HeaderFooter footer = new HeaderFooter(new Phrase("This is page: "), true)
            { Border = Rectangle.NO_BORDER };
        doc.Footer = footer;

        // Parse Html
        parser.Parse(reader);
    }
    catch (Exception ex)
    {
        // Display parser errors in PDF.
        // Parser errors will also be visible in Debug.Output window in VS
        Paragraph paragraph = new Paragraph("Error! " + ex.Message);
        paragraph.SetAlignment("center");
        Chunk text = paragraph.Chunks[0] as Chunk;
        if (text != null)
        {
            text.Font.Color = Color.RED;
        }
        doc.Add(paragraph);
    }
    finally
    {
        doc.Close();
    }
}
Almost there. Clicking the button will result in an exception:
Control 'GridView1' of type 'GridView' must be placed inside a form tag with runat=server.
This is because we are trying to render the PlaceHolder control to a stream and not inside a WebForm. It is a neat .NET security feature to prevent injection attacks. The exception can be suppressed by adding the following empty override to the page:
public override void VerifyRenderingInServerForm(Control control)
{
    // Intentionally empty: skips the "must be placed inside a form tag" check.
}
Now, clicking the button again results in another Exception:
RegisterForEventValidation can only be called during Render();
Again, a security feature of .NET. Avoid this exception by setting EnableEventValidation="False" in the Page directive. If you are concerned about the security of the page, please check the official documentation for the features that have been disabled for the page.
Now, clicking the button should result in a PDF document. If not, debug the application and check the Debug Output window for exceptions. Most likely the XHTML in the placeholder is not valid.
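One quick way to narrow down markup problems is to check whether the rendered fragment is at least well-formed XML before parsing it. The sketch below uses only the .NET XmlDocument class; the class and method names are hypothetical. Note that it is stricter than XHTML in one respect: named entities such as &nbsp; are not defined in plain XML and will be reported as errors, so treat the result as a hint rather than a verdict.

```csharp
using System;
using System.Xml;

public static class XhtmlCheck
{
    // Returns null when the fragment is well-formed XML,
    // otherwise the message from the XML parser.
    public static string GetWellFormednessError(string fragment)
    {
        try
        {
            var doc = new XmlDocument();
            // Wrap in a dummy root so several top-level elements are accepted.
            doc.LoadXml("<root>" + fragment + "</root>");
            return null;
        }
        catch (XmlException ex)
        {
            return ex.Message;
        }
    }
}
```

Passing the rendered string (stringWrite.ToString() in the handler above) to this helper and logging a non-null result usually points to the offending line and position.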
Note that I am using the HTMLWorker class instead of the HtmlParser class in the iTextSharp library. According to the author, the HtmlParser is not supported. I tried both, and the HTMLWorker swallows a lot more HTML markup than the HtmlParser.
You probably also want to clean the HTML before parsing it with the HTMLWorker. Typically you want to remove JavaScript postbacks, anchors etc. from the GridView. This can be achieved with the following code:
string html = stringWrite.ToString();
html = Regex.Replace(html, "</?(a|A).*?>", "");
StringReader reader = new StringReader(html);
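Note that a pattern of the form "</?(a|A).*?>" also matches other tags whose names begin with "a", such as <abbr> or <address>. If that matters for your markup, a narrower pattern can be used. The helper below is a minimal sketch (the class and method names are made up) that removes only opening and closing <a> tags and keeps their inner text.

```csharp
using System.Text.RegularExpressions;

public static class HtmlCleanup
{
    // Removes <a ...> and </a> tags (case-insensitive) but keeps the link text.
    // Tags such as <abbr> or <address> are left untouched.
    public static string StripAnchors(string html)
    {
        return Regex.Replace(html, @"</?a(\s[^>]*)?>", "", RegexOptions.IgnoreCase);
    }
}
```

In the click handler this would be used as: StringReader reader = new StringReader(HtmlCleanup.StripAnchors(stringWrite.ToString()));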
You can also extend the HTMLWorker class to make it more specialized for your purpose. For instance, it would be great to be able to define page breaks in the final PDF document. Simply create a new class that inherits from HTMLWorker:
public class HTMLWorkerExtended : HTMLWorker
{
    public HTMLWorkerExtended(IDocListener document) : base(document)
    {
    }

    public override void StartElement(String tag, Hashtable h)
    {
        if (tag.Equals("newpage"))
            document.Add(Chunk.NEXTPAGE);
        else
            base.StartElement(tag, h);
    }
}
Now, simply replace the HTMLWorker with the extended version and add a <newpage /> element to the HTML in the placeholder where you want a new page.
There are some CSS styles that are not supported by default. For instance, it is not possible to set the background-color style for an image or a table cell/row. The only way to add more style support is to change the iTextSharp source. It's pretty simple, however. Open \text\html\simpleparser\FactoryProperties.cs. In the InsertStyle method, add the following code to the foreach loop:
else if (key.Equals(Markup.CSS_KEY_BGCOLOR))
{
    Color c = Markup.DecodeColor(prop[key]);
    if (c != null)
    {
        int hh = c.ToArgb() & 0xffffff;
        String hs = "#" + hh.ToString("X06", NumberFormatInfo.InvariantInfo);
        h["bgcolor"] = hs;
    }
}
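The two lines in the middle of that snippet turn a 32-bit ARGB value into an HTML-style "#RRGGBB" string: the mask drops the alpha byte, and the hex format pads the rest to six digits. As a standalone sketch of just that conversion (the class and method names are hypothetical and not part of iTextSharp):

```csharp
using System;
using System.Globalization;

public static class ColorHex
{
    // Drops the alpha byte of a 32-bit ARGB value and formats
    // the remaining 24 bits as "#RRGGBB".
    public static string ToHtmlHex(int argb)
    {
        int rgb = argb & 0xffffff;
        return "#" + rgb.ToString("X6", CultureInfo.InvariantCulture);
    }
}
```

For example, an opaque red value of 0xFFFF0000 becomes "#FF0000", which is the form the parser stores under the "bgcolor" key.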
Update: Another example, adding a border color to a table.
First, set the border style for the table, e.g. style="border-color: #ff0000;"
Then, as in the example above, add support for the style to the FactoryProperties.cs file:
else if (key.Equals(Markup.CSS_KEY_BORDERCOLOR))
{
    Color c = Markup.DecodeColor(prop[key]);
    if (c != null)
    {
        int hh = c.ToArgb() & 0xffffff;
        String hs = "#" + hh.ToString("X06", NumberFormatInfo.InvariantInfo);
        h["border-color"] = hs;
    }
}
In addition you need to alter the output of the table as there is no default "bordercolor" style property in the IncTable class.
Open IncTable.cs and change the following code in the BuildTable method:
Existing code:
for (int row = 0; row < rows.Count; ++row) {
    ArrayList col = (ArrayList)rows[row];
    for (int k = 0; k < col.Count; ++k) {
        table.AddCell((PdfPCell)col[k]);
    }
}
Replace with:
String bordercolor = (String)props["border-color"];
for (int row = 0; row < rows.Count; ++row)
{
    ArrayList col = (ArrayList)rows[row];
    for (int k = 0; k < col.Count; ++k)
    {
        PdfPCell cell = (PdfPCell)col[k];
        cell.BorderColor = Markup.DecodeColor(bordercolor);
        table.AddCell(cell);
    }
}
This will change the border color on the cell in the table. Hint: It could be wise to check if the cell already has a border color before overwriting it with the table border color.
Recompile and add the new DLL to your project.
Happy Coding!
43 Comments to Html to Pdf in .NET
Hi Hamang,
When I export my placeholder to PDF, I get unnecessary space in the table cells. Can you please help me with this?
Rajasekhar…
October 14, 2008
"The library has a really understandable API for developers that are not familiar with the PDF specification, me beeing one of them"
What are you referring to? There IS NOT an API for this product, just posts on forums from developers in need of an API…
October 15, 2008
Christoffer, I agree that the documentation is somewhat lacking. You can read the iTextSharp tutorials at http://itextsharp.sourceforge.net/tutorial/index.html and there are some discussions over at http://www.nabble.com/iTextSharp-f4188.html. The API, however, is pretty straightforward and understandable in my opinion.
I have long been looking for this information. Thank you for your work.
January 19, 2009
With itextsharp.dll I can display all the HTML page data in the PDF, but the table border color always comes out black. Please suggest a solution.
Waiting for your reply!
January 19, 2009
@saharsh, it's not supported by default, as with many of the styles. You need to add this yourself. I have updated the article with the code.
March 16, 2009
Can you please send me the complete working sample code, as I have some issues when using HTMLWorker.
Thanks in advance.
April 2, 2009
thanks, another waste of time.
April 8, 2009
Hi,
Great article btw, it's been a godsend for me.
I have a question which I hope you can help with. I am using your method detailed above for converting HTML into a PDF, but some of the text on my HTML page is in Czech and unfortunately some of the characters are omitted from the PDF, as the default font for the page does not support them.
I have done some digging around and it looks like I need to change the font Codepage to cp1250 in order to support the special characters.
Could you give any pointers on how this can be achieved please….?
Thanks in advance
James
May 21, 2009
Really you gave me good solution for my issue thank you very much
June 10, 2009
Wonderful article, it helped me alot..
Thanks….:)
August 22, 2009
Great article, thanks, but I am having a problem with characters in the GridView. Some Turkish characters do not show up. Can we make a PDF that supports UTF-8 encoding or something else?
August 25, 2009
Hi,
Can you send me the updated DLL? I cannot find it anywhere on the web.
I am facing problems such as the background color of table cells.
Thanks,
Mahavir Shah
093289 35308
October 6, 2009
Can you please provide the full sample code for converting HTML to PDF? Thanks in advance.
October 23, 2009
The picture is not exported. How can I export a picture?
December 20, 2009
hello,
First of all, thank you very much. I got a PDF file from an ASP.NET page using this article, but I have one problem: how can I remove unnecessary space in the PDF file? In short, I want a well-formatted file, and if possible please give me code for displaying an image in the PDF file as well.
December 29, 2009
hello Khem Raj
There are two ways to add images:
1. Give an online path for the image
2. Use the following code
doc.Open();
//add an image to document
iTextSharp.text.Image img1 = iTextSharp.text.Image.GetInstance(Request.MapPath("~/Images/RelIcon-Pdf.jpg"));
img1.Alignment = iTextSharp.text.Image.TITLE;
img1.ScalePercent(100f); // change its size
doc.Add(img1);
// end of code for image
// here start your code
It looks as though the PDF is being created, but I keep getting "There was an error opening this document. The file is damaged and could not be repaired."
Any advice?
July 16, 2010
Hey, doc has no Footer property and I can't create a HeaderFooter.
September 7, 2010
This is a nice article, but I am facing a problem when I need to parse HTML containing a Fusion Chart. It will not parse the HTML and throws an error.
Please help me if you have any idea regarding this.
Regards,
TJ
September 14, 2010
I can't find the HeaderFooter class. I think I missed the namespace for it. Does anybody have an idea about that?
Thanks for the good article, but I have one question:
\text\html\simpleparser\FactoryProperties.cs
What is this path? I don't understand where to use it for the CSS.
Thanks in advance
Thanks man. I am looking for this coding.
October 4, 2010
I am very happy to see this post. Can you please explain how to let the user print the PDF after the conversion? I have a print button on the aspx page, and when users click it they need to print the page as a PDF.
November 12, 2010
Hello,
Can you please help me?
I want PDF output from HTML and CSS.
I have searched a lot but can't find any satisfactory solution.
It is very urgent.
iTextSharp is good but it can't do what this html to pdf converter can do.
April 1, 2011
I have a question about where you say:
"Open \text\html\simpleparser\FactoryProperties.cs. In the InsertStyle method add the following code to the foreach loop:"
I didn't understand where this code is.
April 16, 2011
Convert a DataGrid to PDF
=====================================================================================
Using 3.0.3.0 version of iTextSharp.dll
=====================================================================================
Document document = new Document(PageSize.A4, 0, 0, 50, 50);
System.IO.MemoryStream msReport = new System.IO.MemoryStream();
try
{
PdfWriter writer = PdfWriter.GetInstance(document, msReport);
document.AddAuthor("Vimal Lak");
document.AddSubject("Export to PDF");
document.Open();
iTextSharp.text.Table datatable = new iTextSharp.text.Table(gridlastthree.Columns.Count);
datatable.Padding = 2;
datatable.Spacing = 0;
float[] headerwidths = new float[gridlastthree.Columns.Count];
for (int i = 0; i < gridlastthree.Columns.Count; i++)
{
headerwidths[i] = 20;
}
datatable.Widths = headerwidths;
Cell cell = new Cell(new Phrase("Previous SRS History Of Vehicle No :"+ Label2.Text, FontFactory.GetFont(FontFactory.HELVETICA, 16, Font.BOLD)));
cell.HorizontalAlignment = Element.ALIGN_CENTER;
cell.Leading = 30;
cell.Colspan = 5;
cell.Border = Rectangle.NO_BORDER;
// cell.BackgroundColor = new iTextSharp.text.Color(System.Drawing.Color.Gray);
cell.BackgroundColor = new iTextSharp.text.Color(System.Drawing.Color.SteelBlue);
datatable.AddCell(cell);
datatable.DefaultCellBorderWidth = 1;
datatable.DefaultHorizontalAlignment = 1;
datatable.DefaultRowspan = 2;
datatable.AddCell("Sr.No");
datatable.AddCell("SRS NO");
datatable.AddCell("K.M. Reading");
datatable.AddCell("SRS DATE");
datatable.AddCell("Amount");
int count = 0;
for (int i = 0; i < gridlastthree.Items.Count; i++)
{
datatable.DefaultHorizontalAlignment = Element.ALIGN_LEFT;
count = i + 1;
datatable.AddCell(count.ToString());
datatable.AddCell(gridlastthree.Items[i].Cells[1].Text);
datatable.AddCell(gridlastthree.Items[i].Cells[2].Text);
datatable.AddCell(gridlastthree.Items[i].Cells[3].Text);
datatable.AddCell(gridlastthree.Items[i].Cells[4].Text);
}
document.Add(datatable);
}
catch(Exception ex)
{
Response.Write(ex.Message); // write the caught exception, not the EventArgs parameter
}
document.Close();
Response.Clear();
Response.AddHeader("content-disposition", "attachment;filename=Report.pdf");
Response.ContentType = "application/pdf";
Response.BinaryWrite(msReport.ToArray());
Response.End();
=====================================================================================
Using 4.1.2.0 Version of ITextSharp.dll
=====================================================================================
Response.ContentType = "application/pdf";
Response.AddHeader("content-disposition","attachment;filename=GridViewExport.pdf");
Response.Cache.SetCacheability(HttpCacheability.NoCache);
StringWriter sw = new StringWriter();
HtmlTextWriter hw = new HtmlTextWriter(sw);
gridlastthree.AllowPaging = false;
gridlastthree.RenderControl(hw);
StringReader sr = new StringReader(sw.ToString());
Document pdfDoc = new Document(PageSize.A4, 10f, 10f, 10f, 0f);
HTMLWorker parser = new HTMLWorker(pdfDoc);
PdfWriter.GetInstance(pdfDoc, Response.OutputStream);
pdfDoc.Open();
parser.Parse(sr);
pdfDoc.Close();
Response.End();
HtmlForm form = new HtmlForm();
StringWriter sw = new StringWriter();
HtmlTextWriter hTextWriter = new HtmlTextWriter(sw);
string html = sw.ToString();
string htmlDisplayText = @"<html><body bgcolor=""red""><h4>Dear bishnu2</h4> your address pdp isAn early version of the patterns was workshopped at PLoP After several internal workshops and updates, a later version was
workshopped at PLoP The patterns are now mature enough that I teach a
class based on the patterns at AG Communication
Systems.
Copyright © 1999 AG
Communication Systems Corporation
</body></html>";
htmlDisplayText = htmlDisplayText.Replace("{EMPLOYEETABLE}", html);
Document document = new Document();
MemoryStream ms = new MemoryStream();
PdfWriter writer = PdfWriter.GetInstance(document, ms);
StringReader se = new StringReader(htmlDisplayText);
HTMLWorker obj = new HTMLWorker(document);
document.Open();
obj.Parse(se);
// step 5: we close the document
document.Close();
Response.Clear();
Response.AddHeader("content-disposition", "attachment; filename=report.pdf");
Response.ContentType = "application/pdf";
Response.Buffer = true;
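// ToArray() returns only the bytes written; GetBuffer() may include unused trailing bytes
byte[] pdfBytes = ms.ToArray();
Response.OutputStream.Write(pdfBytes, 0, pdfBytes.Length);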
Response.OutputStream.Flush();
Response.End();
July 30, 2011
Thank you very much, Mr. Vimal Lak.
You saved me a lot of time. God bless you.
September 1, 2011
I'm using iTextSharp version 5.1.2. There is no FactoryProperties.cs file, no InsertStyle method, and no IncTable.cs file. How can I modify iTextSharp's source to get background color and table border color? I can't figure out how to achieve that functionality in this version.
Thanks!
I tried the above code (without the grid and with iTextSharp version 5+) with some modifications. Strangely, there is no error, but the PDF that is rendered is BLANK.
Does the parser.Parse(reader) call automatically add the parsed HTML to the PDF?
If yes, then it is not happening in my case.
Any ideas?
October 19, 2011
thks..
November 28, 2011
I found FactoryProperties.cs in version 5.0, but when I include your snippet it gives me this error:
Cannot implicitly convert type 'iTextSharp.text.BaseColor' to 'System.Drawing.Color'
at the line
Color c = Markup.DecodeColor(prop[key]);
How can I fix this error?
March 2, 2012
I am using iTextSharp version 5.1.3 and I am not able to find FactoryProperties.cs in it. My application converts HTML to PDF, but the table widths and styles do not follow the CSS styles. Please help me out with this.
April 27, 2012
Hmm.. on button click, a .pdf is generated, but with this error in the .pdf that opens:
Error! Unable to cast object of type 'iTextSharp.text.html.simpleparser.CellWrapper' to type
'iTextSharp.text.Paragraph'.
May 4, 2012
Can you show how you would also add page numbers to this? I've got this working, but am not sure how to add x of y when using your sample.
May 14, 2012
Love your samples for changing styles in gridviews, but like Ananth, not finding them in the file you say. What version is this for?
June 8, 2012
awesome article….. helped me lots…. thank u very much
Knut Hamang
September 22, 2008