

Html to Pdf in .NET

Thursday, August 14th, 2008 | .NET | 43 Comments

I searched around for open-source projects that convert HTML to PDF, and stumbled upon a great library called iTextSharp. It's actually a Java library ported to .NET (typical, isn't it?), and it has some really nice features. There are some good commercial products out there doing the same, but in my opinion this is core functionality, so if one can get it for free and even get the source, nothing is better than that!

Regarding PDF creation, iTextSharp's main function is generating PDFs from scratch, not converting them from HTML. The library has a really understandable API for developers who are not familiar with the PDF specification, me being one of them.

Playing around with the API and reading some mailing lists and forum posts, I finally managed to get a working sample of how to export a GridView to PDF. This is a pretty simple sample, but you can play around with the API to add more formatting and do your own stuff. Actually, you can export anything in the XHTML, provided the markup is valid. However, not all HTML tags are supported. The intention of the author was not to make an HTML2PDF converter, but rather to create PDFs from HTML as long as the engine supports the markup. So you will probably not be able to convert dynamic content you do not control, but it's excellent for creating reports etc. Supported tags are: "ol ul li a pre font span br p div body table td th tr i b u sub sup em strong s strike h1 h2 h3 h4 h5 h6 img"

So how does it work, then? Well, first you need to get the latest version of iTextSharp. Just place the DLL in the bin folder of your Web Application Project (you're not using Web Projects, are you ;-) ) and add a reference to it. Then create a new page and add a PlaceHolder to it. The placeholder will be the section of the HTML that you will export to PDF. You can add more controls to the placeholder if you need them. Inside the placeholder, add a GridView and bind it to your data source.

Add an asp:Button to the page. This will trigger the export. Its click handler does all the exporting work.
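For orientation, the page markup described above might look roughly like the sketch below. The IDs PlaceholderPdf and GridView1 and the handler name ButtonCreatePdf_Click match the names used in the code and error messages in this post; the form ID, the button ID, its caption and the GridView settings are assumptions made for illustration.

```aspx
<form id="form1" runat="server">
    <%-- Everything inside this PlaceHolder is what gets rendered to the PDF --%>
    <asp:PlaceHolder ID="PlaceholderPdf" runat="server">
        <asp:GridView ID="GridView1" runat="server" AutoGenerateColumns="true" />
    </asp:PlaceHolder>

    <%-- Triggers the export; kept outside the PlaceHolder so it is not exported --%>
    <asp:Button ID="ButtonCreatePdf" runat="server" Text="Create PDF"
        OnClick="ButtonCreatePdf_Click" />
</form>
```

The click handler wired up through OnClick follows.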

// Requires: using System; using System.IO; using System.Web; using System.Web.UI;
// using iTextSharp.text; using iTextSharp.text.pdf; using iTextSharp.text.html.simpleparser;
protected void ButtonCreatePdf_Click(object sender, EventArgs e)
{
    // Set content type in response stream
    Response.ContentType = "application/pdf";
    Response.AddHeader("content-disposition", "attachment;filename=FileName.pdf");
    Response.Cache.SetCacheability(HttpCacheability.NoCache);

    // Render PlaceHolder to temporary stream
    StringWriter stringWrite = new StringWriter();
    HtmlTextWriter htmlWrite = new HtmlTextWriter(stringWrite);
    PlaceholderPdf.RenderControl(htmlWrite);
    StringReader reader = new StringReader(stringWrite.ToString());

    // Create PDF document that writes straight to the response
    Document doc = new Document(PageSize.A4);
    HTMLWorker parser = new HTMLWorker(doc);
    PdfWriter.GetInstance(doc, Response.OutputStream);
    doc.Open();
    try
    {
        // Create a footer that will display the page number
        HeaderFooter footer = new HeaderFooter(new Phrase("This is page: "), true)
            { Border = Rectangle.NO_BORDER };
        doc.Footer = footer;

        // Parse HTML
        parser.Parse(reader);
    }
    catch (Exception ex)
    {
        // Display parser errors in the PDF.
        // Parser errors will also be visible in the Debug Output window in VS
        Paragraph paragraph = new Paragraph("Error! " + ex.Message);
        paragraph.SetAlignment("center");
        Chunk text = paragraph.Chunks[0] as Chunk;
        if (text != null)
        {
            text.Font.Color = Color.RED;
        }
        doc.Add(paragraph);
    }
    finally
    {
        doc.Close();
    }
}

Almost there. Clicking the button will result in an exception:

Control 'GridView1' of type 'GridView' must be placed inside a form tag with runat=server.

This is because we are rendering the PlaceHolder control to a stream and not as part of a WebForm, and ASP.NET verifies that such controls sit inside a server-side form. The check can be suppressed by overriding the following method in the page with an empty body:

public override void VerifyRenderingInServerForm(Control control)
{
    // Intentionally empty: skips the "must be placed inside a form tag" check
}

Now, clicking the button again results in another Exception:

RegisterForEventValidation can only be called during Render();

Again, this is a security feature of .NET. Avoid this exception by setting EnableEventValidation="false" in the @ Page directive at the top of the page. If you are concerned about the security of the page, please check the official documentation of the features that have been disabled for the page.

Now, clicking the button should result in a PDF document. If not, debug the application and check the Debug Output window for exceptions. The most likely cause is that the XHTML in the placeholder is not valid.

Note that I am using the HTMLWorker class instead of the HtmlParser class in the iTextSharp library. According to the author, the HtmlParser is not supported. I tried both, and the HTMLWorker accepts a lot more HTML markup than the HtmlParser.

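You probably also want to clean the HTML before parsing it with the HTMLWorker. Typically you want to remove JavaScript postbacks, anchors etc. from the GridView. This can be achieved by replacing the line that creates the StringReader with the following code (it needs using System.Text.RegularExpressions):

string html = stringWrite.ToString();
// Strip opening and closing anchor tags only, keeping the link text
html = Regex.Replace(html, @"</?a(\s[^>]*)?>", "", RegexOptions.IgnoreCase);
StringReader reader = new StringReader(html);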
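As a rough sketch of that clean-up step, the helper below strips anchor tags (keeping their inner text) and removes script blocks entirely before the markup reaches the HTMLWorker. The names HtmlCleaner and StripLinksAndScripts are made up for this example and are not part of iTextSharp. Regular expressions are a pragmatic choice for markup you generate yourself; this is not meant as a general HTML sanitizer.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of iTextSharp.
public static class HtmlCleaner
{
    // Matches <script ...>...</script> including everything in between.
    private static readonly Regex ScriptBlock = new Regex(
        @"<script\b[^>]*>.*?</script\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Matches opening and closing anchor tags only (<a ...> and </a>),
    // not other tags that merely start with "a", such as <abbr>.
    private static readonly Regex AnchorTag = new Regex(
        @"</?a(\s[^>]*)?>",
        RegexOptions.IgnoreCase);

    public static string StripLinksAndScripts(string html)
    {
        if (string.IsNullOrEmpty(html))
            return string.Empty;

        // Remove scripts first so their content never reaches the PDF parser.
        string withoutScripts = ScriptBlock.Replace(html, string.Empty);

        // Then drop the anchor tags but keep the text they wrapped.
        return AnchorTag.Replace(withoutScripts, string.Empty);
    }
}
```

In the click handler this could be used as new StringReader(HtmlCleaner.StripLinksAndScripts(stringWrite.ToString())).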

You can also extend the HTMLWorker class to make it more specialized for your purpose. For instance, it would be great to be able to define page breaks in the final PDF document. Simply create a new class that inherits from HTMLWorker.

public class HTMLWorkerExtended : HTMLWorker
{
    public HTMLWorkerExtended(IDocListener document) : base(document)
    {}

    public override void StartElement(String tag, Hashtable h)
    {
        if (tag.Equals("newpage"))
            document.Add(Chunk.NEXTPAGE);
        else
            base.StartElement(tag, h);
    }
}

Now, simply replace the HTMLWorker with the extended version and add a <newpage /> element to the HTML in the placeholder where you want a new page.

There are some CSS styles that are not supported by default. For instance, it is not possible to set the background-color style for an image or for a table cell or row. The only way to add more style support is to change the iTextSharp source. It's pretty simple, however. Open \text\html\simpleparser\FactoryProperties.cs. In the InsertStyle method, add the following code to the foreach loop:

else if (key.Equals(Markup.CSS_KEY_BGCOLOR))
{
    Color c = Markup.DecodeColor(prop[key]);
    if (c != null)
    {
        int hh = c.ToArgb() & 0xffffff;
        String hs = "#" + hh.ToString("X06", NumberFormatInfo.InvariantInfo);
        h["bgcolor"] = hs;
    }
}
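To make the bit masking in the snippet above concrete, here is a small stand-alone sketch of the same conversion, using a plain ARGB integer instead of an iTextSharp Color. ColorHex and ToHtmlHex are names invented for this illustration.

```csharp
using System.Globalization;

// Hypothetical helper that mirrors the conversion used in the snippet above.
public static class ColorHex
{
    // argb is a 32-bit color in 0xAARRGGBB layout, as returned by ToArgb().
    public static string ToHtmlHex(int argb)
    {
        // Mask off the alpha byte so only the RGB part remains.
        int rgb = argb & 0xffffff;

        // "X6" pads with leading zeros to six upper-case hex digits.
        return "#" + rgb.ToString("X6", NumberFormatInfo.InvariantInfo);
    }
}
```

The mask matters because an opaque color carries 0xFF in its top byte; without it the hex string would come out eight digits long instead of the six-digit form that the bgcolor attribute expects.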

Update: Another example, this time for adding a border color to a table:

First, set the border style for the table, e.g. style="border-color: #ff0000;"

Then, as in the example above, add handling for the style to the InsertStyle method in FactoryProperties.cs:

else if (key.Equals(Markup.CSS_KEY_BORDERCOLOR))
{
    Color c = Markup.DecodeColor(prop[key]);
    if (c != null)
    {
        int hh = c.ToArgb() & 0xffffff;
        String hs = "#" + hh.ToString("X06", NumberFormatInfo.InvariantInfo);
        h["border-color"] = hs;
    }
}

In addition you need to alter the output of the table as there is no default "bordercolor" style property in the IncTable class.

Open IncTable.cs and change the following code in the BuildTable method:
Existing code:

for (int row = 0; row < rows.Count; ++row) {
    ArrayList col = (ArrayList)rows[row];
    for (int k = 0; k < col.Count; ++k) {
        table.AddCell((PdfPCell)col[k]);
    }
}

Replace with:

String bordercolor = (String)props["border-color"];
for (int row = 0; row < rows.Count; ++row)
{
    ArrayList col = (ArrayList)rows[row];
    for (int k = 0; k < col.Count; ++k)
    {
        PdfPCell cell = (PdfPCell)col[k];
        cell.BorderColor = Markup.DecodeColor(bordercolor);
        table.AddCell(cell);
    }
}

This will change the border color on the cell in the table. Hint: It could be wise to check if the cell already has a border color before overwriting it with the table border color.

Recompile and add the new DLL to your project.

Happy Coding!
