Detect Link Clicks Inside dangerouslySetInnerHTML

About dangerouslySetInnerHTML

This prop simply allows us to inject a raw HTML string into our React code.

It appears quite scary at first with the word dangerously right at the front, but that's just a simple reminder to the developer using it that it can potentially be dangerous if not used mindfully.

Why Is It Dangerous?

dangerouslySetInnerHTML accepts a raw HTML string, any HTML string. Now imagine that the raw HTML string you are going to pass into dangerouslySetInnerHTML has a line like this:

<script src="//website.com/trojan.js"></script>

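If that HTML ends up in a page rendered on the server, the browser will load this malicious script, and who knows what harm it will do on your site. Even where the browser refuses to run a <script/> tag injected on the client, attributes such as onerror on an <img/> tag can still execute code. This kind of attack is known as Cross-Site Scripting (XSS).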

The lesson? Be mindful of what HTML we pass into dangerouslySetInnerHTML. For example, don't take user input straight from, say, a textbox and display it with dangerouslySetInnerHTML in your code.
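If user-typed text really does have to end up in an HTML string, one option is to escape it first, so the browser treats it as text rather than markup. Here is a minimal sketch, assuming a hypothetical helper named escapeHtml (it is not part of React). When you only need to show plain text, rendering it as a normal JSX child is simpler, since React escapes it for you.

```typescript
// Hypothetical helper: maps the characters that are special in HTML
// to their entity equivalents.
const HTML_ESCAPES: Record<string, string> = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
};

function escapeHtml(input: string): string {
  // Replace every special character with its entity in a single pass.
  return input.replace(/[&<>"']/g, (char) => HTML_ESCAPES[char]);
}

console.log(escapeHtml('<b>hi</b>')); // &lt;b&gt;hi&lt;/b&gt;
```

Escaping turns all markup into visible text, so it only suits input that is meant to be plain text, not content that should keep its formatting.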

Can I Use dangerouslySetInnerHTML?

There are a couple of places where I've used dangerouslySetInnerHTML.

Headless CMS

A Headless CMS is just the back-end part of a CMS. This allows us to use whatever we want for the front-end. We can retrieve the data we need from the Headless CMS via API calls.

A CMS usually has a WYSIWYG editor, and the content in the editor is typically saved as HTML. If we retrieve the content via an API call, we get back an HTML string that represents the content in the WYSIWYG editor.

If we want to display a whole blog post on our site, the easiest way is to dump the whole HTML string into dangerouslySetInnerHTML. Like so:

<article dangerouslySetInnerHTML={{ __html: postHtml }} />

Now, depending on the CMS platform, the WYSIWYG editor may allow <script/> tags. If someone makes a mistake or the CMS platform gets hacked, a malicious <script/> tag could end up going all the way into your React app.

A good way to mitigate XSS, in this case, is to sanitize the HTML string before rendering it. You can do this with a library such as sanitize-html. By default it keeps only an allowlist of tags and attributes and strips everything else, including <script/> tags.

Static HTML Pages

We may have the copy for our site in a Word or Google Doc, for example a privacy policy page or a terms & conditions page. It would be nice if we could just copy and paste the text and dump it onto our site instead of modifying the HTML and turning it into React components.

For example, turning this:

<h1>Hello World</h1>

Into this:

<Typography component="h1" variant="h2">Hello World</Typography>

What I typically do in this case is use a Google Doc to HTML converter: I copy my Google Doc, paste it into the converter, and get raw HTML back. From there I paste the raw HTML into my project, have my React code read it (either from a file or a variable), and render it with dangerouslySetInnerHTML.

Now, whenever the content changes and we need to update those static pages, I can just dump the raw HTML onto my site and not worry about turning things into React components again.

Using The Router For <a/> Tag Clicks

So for this blog, I have each post in raw HTML. The downside is that the <a/> tags, when clicked, fall back to the browser's default behaviour instead of behaving like a Next.js <Link/> component. That means we lose the benefits of super-fast internal page switching.

The Approach

One approach would be to parse the HTML string, detect each <a/> tag, and replace it with our desired React component. That doesn't sound too bad, but luckily I used my gift of Googling things for a bit and found a pretty interesting library, html-to-react. We'll be using it as an alternative to dangerouslySetInnerHTML.

Current Method

This is my current solution.

// Whichever HTML tags you want to support
type HtmlTag = 'div' | 'article' | 'section';

type Props = {
  tag: HtmlTag,
  html: string,
};

const RawHtml = ({
  html,
  tag: Tag
}: Props) => (
  <Tag dangerouslySetInnerHTML={{
    __html: html
  }} />
);

export default RawHtml;

As you can see, it just dangerously inserts everything. And since we are not using the <Link/> component from Next.js, all of our link clicks inside the html string will force the browser to hit our server for that new page.

Replace <a/> tags with <Link/>

Let's modify our component to use html-to-react:

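import HtmlToReact from 'html-to-react';

// The parser can be created once and reused across renders
const htmlParser = new HtmlToReact.Parser();

type Props = {
  html: string,
};

const RawHtml = ({ html }: Props) =>
  htmlParser.parseWithInstructions(
    html,
    () => true, // treat every node as valid
    processingInstructions // defined in the steps below
  );

export default RawHtml;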

processingInstructions is an array of instructions that tells html-to-react how to process the raw HTML.

Now let's create the special instruction for processing the <a/> tags.

import Link from 'next/link';

const aTagInstruction = {
  replaceChildren: false,
  shouldProcessNode: node => node.name === 'a',
  processNode: (node, children, idx) => {
    const { href, ...props } = node.attribs;
    // Anything that is not a root-relative path (e.g. /about) counts as external
    const isExternal = !/^\/[\w-]/.test(href);
    // The key goes on the outermost element returned for this node
    return (
      <Link key={idx} href={href}>
        <a
          rel={isExternal ? 'noopener' : undefined}
          target={isExternal ? '_blank' : undefined}
          {...props}
        >
          {children}
        </a>
      </Link>
    );
  },
};

Be sure to omit replaceChildren or set it to false. Otherwise, you'll end up scratching your head for a good while wondering why you have nested <a/> tags when you inspect element.

We're also adding the rel attribute and setting it to noopener. This stops the page we open from reaching back into our tab through window.opener, for example to redirect it somewhere malicious. Not sure why we would be linking to such sites, but better safe than sorry.

And of course, we make external sites open in a new tab by setting the target to _blank.
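To see what the regular expression in aTagInstruction actually classifies as external, the check can be pulled out into a plain function. This is only a sketch: the names isExternalHref and linkAttributes are hypothetical, while the regex is the one used in the instruction above.

```typescript
// Same test as in aTagInstruction: an href counts as internal only when it
// starts with "/" followed by a word character or a hyphen (e.g. "/about").
function isExternalHref(href: string): boolean {
  return !/^\/[\w-]/.test(href);
}

// Hypothetical helper: the extra attributes an external link receives.
function linkAttributes(href: string): { rel?: string; target?: string } {
  return isExternalHref(href) ? { rel: 'noopener', target: '_blank' } : {};
}

console.log(isExternalHref('/blog/my-post'));       // false
console.log(isExternalHref('https://example.com')); // true
console.log(isExternalHref('//example.com/page'));  // true
console.log(isExternalHref('/'));                   // true
```

Note that the bare home page path "/" and fragment links such as "#section" are treated as external by this pattern, so it may need adjusting if your posts link to them.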

Next, let's create a default instruction to tell the processor how to handle the rest of the other nodes.

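// Provides processDefaultNode, the library's stock node handler.
// Older versions of html-to-react may expect React as an argument here.
const processNodeDefinitions = new HtmlToReact.ProcessNodeDefinitions();

const defaultInstruction = {
  shouldProcessNode: () => true,
  processNode: processNodeDefinitions.processDefaultNode,
};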

And then put everything in processingInstructions:

const processingInstructions = [
  aTagInstruction,
  defaultInstruction,
];
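The order of the array matters: as far as I understand it, html-to-react hands each node to the first instruction whose shouldProcessNode returns true, so the catch-all default has to come last. Here is a simplified, dependency-free model of that dispatch. The types and the pickInstruction function are hypothetical stand-ins for illustration, not the library's actual code.

```typescript
// Hypothetical, stripped-down shapes for illustration only.
type HtmlNode = { name: string };
type Instruction = {
  label: string;
  shouldProcessNode: (node: HtmlNode) => boolean;
};

const aTag: Instruction = {
  label: 'aTagInstruction',
  shouldProcessNode: (node) => node.name === 'a',
};
const fallback: Instruction = {
  label: 'defaultInstruction',
  shouldProcessNode: () => true,
};

// Returns the label of the first instruction that accepts the node.
function pickInstruction(node: HtmlNode, instructions: Instruction[]): string {
  const match = instructions.find((i) => i.shouldProcessNode(node));
  return match ? match.label : 'none';
}

console.log(pickInstruction({ name: 'a' }, [aTag, fallback])); // aTagInstruction
console.log(pickInstruction({ name: 'p' }, [aTag, fallback])); // defaultInstruction
console.log(pickInstruction({ name: 'a' }, [fallback, aTag])); // defaultInstruction
```

With the default instruction listed first, the <a/> tags never reach the custom handler, which is why aTagInstruction sits at the top of processingInstructions.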

And that's it. Pretty neat, eh?

Conclusion

Now the internal links in the raw HTML become Next.js <Link/> components, meaning they are super fast 🏃‍♂️💨