On Extracting an Element From a Web Page With CSS Styles

The goal: to extract an element from a web page and display it independently of its original website.

Note: I initially toyed with running this whole process server-side, and it still may be the more efficient way to do this. It is extraordinarily difficult, though, as you will soon see.

First, we fetch the complete web page from our target site, for example news.google.com. We then dump the page into an invisible div.

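<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://news.google.com/');
// Return the response as a string instead of printing it straight to the output
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$html = curl_exec($ch);
curl_close($ch);
?>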

<div style="display:none;">
    <?php echo $html; ?>
</div>

Our target element is the Recent News box on the right of the page. The id is ‘s_BREAKING_NEWS_BOX’. Extracting the HTML is easy enough with jQuery:

$('<div>').append($('#s_BREAKING_NEWS_BOX').clone()).remove().html();

If we were to just output the HTML at this point, we would get text with default browser styles. We need to copy the computed CSS styles from the original rendering to our new element. Firefox, Chrome and Safari use getComputedStyle; IE uses currentStyle.

Here’s our JavaScript function to do just that:

function getPropValue(ele, styleProp) {
    var y;
    if (window.getComputedStyle) {
        // Standards-based browsers (Firefox, Chrome, Safari)
        y = window.getComputedStyle(ele, null).getPropertyValue(styleProp);
    } else if (ele.currentStyle) {
        // Older IE
        y = ele.currentStyle[styleProp];
    }
    return y;
}
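
One caveat with the IE branch: getPropertyValue takes hyphenated names such as "font-family", while currentStyle is, as far as I know, keyed by camelCase names such as fontFamily. Below is a minimal sketch of a conversion helper; toCamelCase is a hypothetical name, not part of the original code, and it assumes the property names are plain lowercase hyphenated strings.

```javascript
// Hypothetical helper (not in the original post): converts a hyphenated CSS
// property name into the camelCase form used as a key on style objects.
function toCamelCase(prop) {
    return prop.replace(/-([a-z])/g, function (match, letter) {
        return letter.toUpperCase();
    });
}

console.log(toCamelCase("font-family")); // fontFamily
console.log(toCamelCase("margin-top"));  // marginTop
console.log(toCamelCase("color"));       // color
```

With this, the IE lookup could read ele.currentStyle[toCamelCase(styleProp)]. "float" would likely still need a special case, since old IE is reported to expose it as styleFloat.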

This function requires an element, and the CSS property that we want to grab. So we can define an array of properties we want to copy, then loop through these properties to get the entire style for the element.

var styles = ["color", "font-family", "font-size", "line-height", "white-space", "padding", "display", "float", "border", "border-top", "border-right", "border-bottom", "border-left", "border-color", "border-width", "border-style", "padding-top", "padding-right", "padding-bottom", "padding-left", "height", "font-weight", "margin-top", "margin-left", "margin-bottom", "margin-right", "text-decoration"];

// Returns the computed values for ele, in the same order as the styles array
function getStyles(ele, styles) {
    var values = [];
    for (var i = 0; i < styles.length; i++) {
        values[i] = getPropValue(ele, styles[i]);
    }
    return values;
}

If we give getStyles an element reference and an array of styles, we get an array of computed CSS values for that element. But remember, we need to do this for every single child element, not just the parent s_BREAKING_NEWS_BOX element. So we need a recursive loop that runs getStyles on the parent, the children, grandchildren, etc. We also keep a running count, index, to give each element's styles a position in an array, element_styles. Thanks to Adam Bratt for help getting this to work.

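// Running position in element_styles; reset to 0 before each top-level call
var index = 0;

function loopChildrenGrab(this_ele, styles, element_styles) {
    // Record this element's styles before descending into its children
    element_styles[index] = getStyles(this_ele, styles);
    index++;

    if ( $(this_ele).children().length > 0 ) {
        $(this_ele).children().each(function(){
            loopChildrenGrab(this, styles, element_styles);
        });
    }
    return element_styles;
}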
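
The index bookkeeping depends on visiting elements in a fixed order: the parent first, then each child in document order, recursively. Here is a minimal sketch of that ordering using plain objects as a stand-in for DOM nodes, so it runs without a browser. The tree shape and the collectNames function are made up for illustration.

```javascript
// Stand-in for a DOM subtree: plain objects with a name and a children array.
var tree = {
    name: "box",
    children: [
        { name: "title", children: [] },
        { name: "list", children: [
            { name: "item1", children: [] },
            { name: "item2", children: [] }
        ] }
    ]
};

// Visit the node first, then each child in order (pre-order, depth-first).
// The output array is passed along, so no global counter is needed.
function collectNames(node, out) {
    out.push(node.name);
    for (var i = 0; i < node.children.length; i++) {
        collectNames(node.children[i], out);
    }
    return out;
}

console.log(collectNames(tree, []));
// [ 'box', 'title', 'list', 'item1', 'item2' ]
```

Any second pass that walks the same structure in the same order will hit position 0 for the parent, 1 for its first child, and so on, which is what lets a flat array line up with a tree.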

Now we have a multidimensional array of every single element's styles, ordered by the element's position in the DOM. So we can delete the whole web page that we loaded at first, and then run the reverse of the previous functions to assign each CSS property to our copied HTML:

function pushStyles(ele, styles, values) {
    for (var i=0; i < styles.length; i++) {
        $(ele).css(styles[i], values[i]);
    }
}

// Running position in element_styles; reset to 0 before each top-level call
var count = 0;

function loopChildrenPush(this_ele, styles, element_styles) {
    // Apply the styles saved at this position, then descend in the same order
    pushStyles(this_ele, styles, element_styles[count]);
    count++;

    if ( $(this_ele).children().length > 0 ) {
        $(this_ele).children().each(function() {
            loopChildrenPush(this, styles, element_styles);
        });
    }
}
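
To make "flattened" concrete: each call to .css() ends up as an inline style on the element. A rough sketch of the equivalent string for one element, built from the property names and one row of saved values, might look like this. toInlineStyle is a hypothetical helper, not part of the original code, and the sample values are invented.

```javascript
// Hypothetical helper: joins parallel arrays of property names and values
// into the text of an inline style attribute.
function toInlineStyle(styles, values) {
    var parts = [];
    for (var i = 0; i < styles.length; i++) {
        // Skip properties for which no value was captured
        if (values[i] !== undefined && values[i] !== null && values[i] !== "") {
            parts.push(styles[i] + ": " + values[i]);
        }
    }
    return parts.join("; ");
}

console.log(toInlineStyle(
    ["color", "font-size", "border"],
    ["rgb(0, 0, 0)", "13px", ""]
));
// color: rgb(0, 0, 0); font-size: 13px
```

Skipping empty values is a deliberate choice here: some browsers may report an empty string for shorthand properties such as border, which is presumably why the styles list also names the longhand variants.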

Tada! We now have successfully copied the HTML and flattened the CSS for each element. s_BREAKING_NEWS_BOX can now be loaded independently of its parent page.

What would it take to do this server-side? In short, render the HTML in some sort of web view, then extract the flattened CSS for every element. While that is possible, and most likely the way to proceed with this project, letting the browser do the heavy lifting of rendering the CSS is definitely easier in the short term.
