Cross-Browser Scripting – Part one

December 27, 2009 2 Comments by Richard

The Problem:

I’ll be up-front about something here; I don’t particularly like ActiveX.  I understand a lot of the reasons for creating it, and I won’t go so far as to claim that it shouldn’t exist or anything like that; in fact, it does very well for certain types of things.  The main thing I don’t like about it is the complexity; it’s not something that you just pick up and start using one day.  You have to do a lot of research to understand what is happening, and though I’ve been working on ActiveX Controls hosted in Internet Explorer for over a year now, there is still a lot I don’t understand about what is really happening under the covers.

Because of this, I am a big fan of NPAPI.  I think it’s a great technology.  It’s simple.  It’s straightforward.  It provides an interface for one purpose, and that is to allow 3rd party developers to write browser plugins.  It’s great.  But, there is a problem: in my world, you can’t just say “I like Firefox better than IE, so I’m not going to support IE.”  Why?  I will make no claims here about security, usability, viability, or even disability of any browsers.  I will simply refer to one statistic: according to the w3schools browser statistics, Internet Explorer accounted for over 25% of the total web traffic every month last year.

So we find the classic challenge of all browser plugin developers: Internet Explorer only supports ActiveX controls, other browsers support NPAPI, and a plugin written for one isn’t necessarily going to be easy to port to the other.  Thus was born FireBreath, in an attempt to solve this issue.  Today, however, I want to address only one piece of the solution that we use in FireBreath: how to create a scripting interface that will be “write once, run anywhere” on web browsers.

Common Scripting Interface

When I first began working on plugins, I worked on a project that had 3 separate definitions of the javascript interface to the plugin.  One was for Mac, one for Internet Explorer on Windows, and one for Firefox on Windows.  The IE version was defined in an ActiveX .idl file and implemented in an ActiveX object.  The Firefox/win one was defined in an “XPCOM” .idl file and implemented in an nsScriptablePeer object (copied from a sample plugin from the Mozilla source), and the Mac one was defined using NPRuntime and wrapped the XPCOM class.

This was, in short, a nightmare.

The problem I was initially trying to solve is that we had some APIs that had to be used differently on Mac and Windows, and they differed in less noticeable ways between Firefox and IE on Windows.  Don’t get me wrong — it all worked just fine.  However, from a maintenance point of view, this was certainly not optimal.  I could give a lot of meaningless history here, but eventually I came up with an idea that I dubbed (at that company) the Common Scripting Interface — a class that contained all the aspects that a good javascript object would need, but was not specific to any one browser.

Down with the IDL

Okay, this is where I’m going to start raising a lot of eyebrows.  I respectfully submit that using an IDL, as it is defined by Internet Explorer, is generally a bad idea on a control intended to be used from Javascript.  Why?  Simply put, Javascript is a dynamic language; when you pass parameters around, they are not type constrained.  Most javascript functions don’t care if you give them an integer, a string, a bool, or something else; they just use the parameters they are given.  Of course it’s a bit more complicated than that, but Javascript inherently is dynamic.  Thus, it can be important to have the ability to use a dynamic interface on a plugin that interfaces with it.

I hasten to remind you at this point that making an interface dynamic just for the sake of it being dynamic is a bad idea.  However, there are many use cases where you may have members (particularly properties that expose other javascript objects) that may only be named and become accessible at runtime; it’s too late to add something to the IDL.  I believe this is part of the reason that we have settled on NPRuntime on Gecko-based browsers, and Microsoft created an IDispatchEx interface to better support it on ActiveX.  Thus, we have all the tools we need to create a dynamic interface, and I have loosely patterned it after the NPAPI/NPRuntime interface, which I consider to be a fairly good interface for browser plugins.

What is JSAPI?

If you have looked at all at the FireBreath project that I have been working on for the last several months, you may have looked at or seen reference to the JSAPI class.  JSAPI stands for Javascript API.  JSAPI is intended to be a Javascript compatible object written in C++.  It relies heavily on the FB::variant class, which essentially is a class that you can put any datatype in.  If there is enough interest, I can blog more about that type later, but suffice it to say it’s very similar to boost::any but with data conversion tools built in.  A JSAPI object (an object that extends FB::JSAPI) provides all functionality that a Javascript object should, whether it is used or not.
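To give a rough feel for what “boost::any with data conversion built in” means, here is a minimal sketch using the standard std::any (C++17).  This is not FB::variant and does not use its API; the toInt helper is a hypothetical name invented purely for illustration.

```cpp
#include <any>
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper, not part of FireBreath: reads an int out of a
// type-erased value, converting from a few other stored types when it can.
inline int toInt(const std::any& value) {
    if (const int* i = std::any_cast<int>(&value)) return *i;
    if (const double* d = std::any_cast<double>(&value)) return static_cast<int>(*d);
    if (const bool* b = std::any_cast<bool>(&value)) return *b ? 1 : 0;
    if (const std::string* s = std::any_cast<std::string>(&value)) return std::stoi(*s);
    // Any other stored type is rejected rather than guessed at.
    throw std::invalid_argument("value cannot be converted to int");
}
```

The point is only that the caller asks for the type it wants and the container does the conversion, which is the kind of behaviour a script-facing value type needs when Javascript hands over a string where C++ expected a number.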

What does a Javascript object do?

Let’s take a big step backwards now.  Forget anything you know about Browser Plugins or ActiveX Controls.  Forget anything you even know about C++, Java, C#, Logo, or any other language, even, except Javascript, HTML, DHTML, and CSS.  Those are the tools of the browser.  When we design the interface for a Browser Plugin, we shouldn’t be thinking like a C++ developer.  We should be looking at it the way we would if we were writing an object in Javascript; we should provide the interfaces that we would expect to find, and use the tools and syntax that we would expect to find in Javascript.

So, let’s look at what a Javascript object can do:

  • Javascript objects can have methods that accept parameters and return a value
    • var coolData = plugin.doSomething(1, 1, 2, 3, 5, "right now");
  • Javascript objects can expose properties that a caller can get or set:
    • var oldName = plugin.name;
    • plugin.name = "Some cool name";
  • Javascript objects can provide events, to which a caller can attach an event handler in the syntax of their browser:
    • plugin.addEventListener("click", function() { alert("The user clicked on me! Ouch!"); }, false);
    • plugin.attachEvent("onclick", function() { alert("The user clicked on me! Ouch!"); });

When you write it out like this, it doesn’t seem like much; remember, though, that a property of a javascript object could return any primitive datatype, an array, or even another javascript object, itself containing methods, properties, and events.

The JSAPI Interface

One additional thing to remember about a Javascript object (now taking into account what we know about C++ as well) is that we don’t control its lifecycle.  What that means is that your plugin may go away before Javascript is done with the object that you gave it.  So, if that object holds any references to your plugin (and what plugin javascript object wouldn’t? It’s the interface to the page, right?), you need a way not only to clean up your object after the plugin goes away, but also to make sure that the javascript object finds out that the plugin it depends on is gone and invalidates itself in some way.  One way to do that is by using shared pointers and weak pointers, but that’s a topic for another day.
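Without getting ahead of that discussion, the basic shape of the shared/weak pointer idea can be sketched with the standard library.  The PluginCore and ScriptObject names below are hypothetical stand-ins, not FireBreath classes, and this is only one possible way to arrange things.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical plugin core; stands in for whatever owns the real resources.
struct PluginCore {
    std::string name = "demo plugin";
};

// Hypothetical scripting object handed to the page. It holds only a weak
// reference, so it never keeps the plugin alive on its own.
class ScriptObject {
public:
    explicit ScriptObject(const std::shared_ptr<PluginCore>& core) : core_(core) {
        assert(core && "a script object must start out attached to a live plugin");
    }

    // True while the plugin this object depends on still exists.
    bool isValid() const { return !core_.expired(); }

    // Throws once the plugin is gone instead of touching freed memory.
    std::string pluginName() const {
        std::shared_ptr<PluginCore> core = core_.lock();
        if (!core) throw std::runtime_error("plugin has been destroyed");
        return core->name;
    }

private:
    std::weak_ptr<PluginCore> core_;
};
```

Because the script object never owns the plugin, the plugin can be torn down on its own schedule, and every later call from the page fails cleanly instead of dereferencing a dangling pointer.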

Like the NPObject and IDispatch types that provide its interface with the browser, JSAPI is reference counted, so that it gets destroyed once the browser (and anything else holding a reference) is done with it.  Here are the method prototypes from JSAPI.h to satisfy the requirements mentioned above.  You can also look at the full source of JSAPI.h if you want.

Methods

// ...
    virtual bool HasMethod(std::string methodName) = 0;
    virtual variant Invoke(std::string methodName, std::vector<variant>& args) = 0;
// ...

Not much is required: one call to find out if the method exists, one to invoke the method.  Notice that the arguments are passed in an STL vector of type variant; you can pass any datatype in here, limited only by what the browser-level wrapper can convert from the browser datatype (VARIANT on IE, NPVariant on Firefox) to the JSAPI variant datatype.  I’ll discuss this in more detail in a later post.
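To make the two-call pattern concrete, here is a minimal sketch of an object that resolves method names at run time.  It does not derive from FB::JSAPI: Value is a std::variant stand-in for FB::variant (C++17), and DynamicMethods, addMethod, and sumInts are hypothetical names used only for this example.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Stand-in for FB::variant (an assumption for this sketch, not the real class).
using Value = std::variant<std::monostate, bool, int, double, std::string>;
using Args = std::vector<Value>;

// Hypothetical object that looks its methods up by name at run time.
class DynamicMethods {
public:
    using Method = std::function<Value(const Args&)>;

    void addMethod(const std::string& name, Method fn) {
        assert(fn && "a registered method must be callable");
        methods_[name] = std::move(fn);
    }

    // First call the wrapper makes: does this name exist?
    bool HasMethod(const std::string& methodName) const {
        return methods_.count(methodName) != 0;
    }

    // Second call: run it with whatever arguments the script passed.
    Value Invoke(const std::string& methodName, const Args& args) const {
        auto it = methods_.find(methodName);
        if (it == methods_.end())
            throw std::invalid_argument("no such method: " + methodName);
        return it->second(args);
    }

private:
    std::map<std::string, Method> methods_;
};

// Example method: adds every int argument and ignores anything else.
inline Value sumInts(const Args& args) {
    int total = 0;
    for (const Value& v : args)
        if (const int* i = std::get_if<int>(&v)) total += *i;
    return total;
}
```

Note how sumInts tolerates a string mixed in with the numbers, much like the plugin.doSomething(1, 1, 2, 3, 5, "right now") call shown earlier; the method itself decides what to do with each argument type.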

Properties

// ...
    virtual bool HasProperty(std::string propertyName) = 0;
    virtual bool HasProperty(int idx) = 0;

    virtual variant GetProperty(std::string propertyName) = 0;
    virtual variant GetProperty(int idx) = 0;
    virtual void SetProperty(std::string propertyName, const variant value) = 0;
    virtual void SetProperty(int idx, const variant value) = 0;
// ...

You’ll notice one strange thing right off the bat here; there are two of each method.  One takes a std::string for the first parameter, one takes an int.  Any thoughts on why?

Imagine the following code:

for (var i = 0; i < plugin.files.length; i++) {
    alert(plugin.files[i]);
}

This is a pretty standard piece of javascript code, right?  Array access notation is pretty common in Javascript.  Javascript, however, treats everything as an object of one kind or another. An array to javascript is really just an object with numeric properties.  Similarly, we need to be able to support numeric properties for array style access.  That is why we have extra property methods for a property identified by an “int”.
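As a rough illustration, an array-like object could answer both kinds of lookup as follows.  FileList and listFiles are hypothetical names, Value is again a std::variant stand-in for FB::variant (C++17), and the class does not derive from the real JSAPI.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Stand-in for FB::variant (an assumption for this sketch).
using Value = std::variant<std::monostate, int, std::string>;

// Hypothetical array-like object: "length" by name, elements by index.
class FileList {
public:
    explicit FileList(std::vector<std::string> files) : files_(std::move(files)) {}

    bool HasProperty(const std::string& name) const { return name == "length"; }
    bool HasProperty(int idx) const {
        return idx >= 0 && static_cast<std::size_t>(idx) < files_.size();
    }

    Value GetProperty(const std::string& name) const {
        if (name != "length") throw std::invalid_argument("no such property: " + name);
        return static_cast<int>(files_.size());
    }
    Value GetProperty(int idx) const {
        if (!HasProperty(idx)) throw std::out_of_range("index out of range");
        return files_[static_cast<std::size_t>(idx)];
    }

private:
    std::vector<std::string> files_;
};

// Mirrors the JavaScript loop above: read "length", then each numeric property.
inline std::vector<std::string> listFiles(const FileList& files) {
    std::vector<std::string> out;
    const int length = std::get<int>(files.GetProperty("length"));
    for (int i = 0; i < length; ++i) {
        assert(files.HasProperty(i));
        out.push_back(std::get<std::string>(files.GetProperty(i)));
    }
    return out;
}
```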

Events

// ...
    virtual void registerEventMethod(std::string name, BrowserObjectAPI *event);
    virtual void unregisterEventMethod(std::string name, BrowserObjectAPI *event);
    virtual void registerEventInterface(BrowserObjectAPI *event);
    virtual void unregisterEventInterface(BrowserObjectAPI *event);
    virtual BrowserObjectAPI *getDefaultEventMethod(std::string name);
    virtual void setDefaultEventMethod(std::string name, BrowserObjectAPI *event);

protected:
    // Used to fire an event to the listeners attached to this JSAPI
    virtual void FireEvent(std::string eventName, std::vector<variant>&);
// ...

Events are by far the most difficult piece of this whole class; we have to make allowances for multiple methods of registering events and multiple browsers’ methods for keeping track of the handlers. The BrowserObjectAPI type that you see used here is also a JSAPI object, but one that wraps a browser object (an NPObject or an IDispatch object, depending on the browser).

There are three different ways an event can be registered (at this point in time):

  1. Through a “user-defined” event registering method implemented by our browser scripting object wrapper (this is used by all objects on Firefox and all but the root object on IE).
    • The BrowserObjectAPI object will essentially be a method that we can invoke by calling InvokeAsync with an empty string (“”) as the method name
  2. Through the Connection Point interface on an ActiveX Control (used on the root object in IE)
    • The BrowserObjectAPI object will be an object on which we can invoke a method of the same name as the event it is attached to
  3. Through a property of the same name as the event (old-style javascript events, as in “document.onload = …”)
    • The BrowserObjectAPI object will essentially be a method that we can invoke by calling InvokeAsync with an empty string (“”) as the method name

These three types are the reason that we have three different sets of registration methods for events.  When FireEvent is called by the JSAPI child object (that you write), it iterates through all possible event handlers and invokes them asynchronously.  (Events have to be invoked on the main UI thread of the plugin, so the InvokeAsync call on BrowserObjectAPI automatically makes sure that is where it gets called; hence it’s async.)
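The “collect every handler, then invoke later” flow can be sketched in a few lines.  This is an illustration under simplifying assumptions, not the FireBreath implementation: Handler is a plain std::function standing in for a BrowserObjectAPI object, only two of the three registration styles are modelled, and the hypothetical runPending plays the role of the main UI thread draining queued calls.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a BrowserObjectAPI callback.
using Handler = std::function<void(const std::vector<std::string>&)>;

class EventSource {
public:
    // Handlers attached through an addEventListener/attachEvent-style call.
    void registerEventMethod(const std::string& name, Handler h) {
        assert(h);
        listeners_[name].push_back(std::move(h));
    }
    // Old-style "obj.onclick = fn" handler; at most one per event name.
    void setDefaultEventMethod(const std::string& name, Handler h) {
        defaults_[name] = std::move(h);
    }

    // Queues every matching handler instead of calling it directly.
    void FireEvent(const std::string& name, const std::vector<std::string>& args) {
        auto it = listeners_.find(name);
        if (it != listeners_.end())
            for (const Handler& h : it->second)
                pending_.push([h, args] { h(args); });
        auto def = defaults_.find(name);
        if (def != defaults_.end() && def->second) {
            Handler h = def->second;
            pending_.push([h, args] { h(args); });
        }
    }

    // Stand-in for the main UI thread; returns how many calls were run.
    int runPending() {
        int ran = 0;
        while (!pending_.empty()) {
            std::function<void()> call = std::move(pending_.front());
            pending_.pop();
            call();
            ++ran;
        }
        return ran;
    }

private:
    std::map<std::string, std::vector<Handler>> listeners_;
    std::map<std::string, Handler> defaults_;
    std::queue<std::function<void()>> pending_;
};
```

Copying the handler and the arguments into each queued call matters here: by the time the call actually runs, the code that fired the event may already have returned and its locals may be gone.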

Next time

We have discussed and looked at the basic needs of the JSAPI object.  Part two will discuss the COMJavascriptObject class and the JSAPI_IDispatch helper class that constitute the interface between ActiveX and our JSAPI interfaces.  Part three will cover the NPJavascriptObject that provides the interface between NPRuntime and JSAPI.  Feel free to read ahead in the source =]

In the meantime, if you like where this is going and want to play with FireBreath and see JSAPI in action for yourself, there are some really easy getting started instructions on the FireBreath project page.

2 Comments

  1. Guybrush
    7 years ago

    Hi, I am reading about Internet Explorer extensibility and I found Browser Extensions (Shortcut menu extensions, Toolbars, Explorer Bars, Browser Helper Objects) and Content Extensions (Active Documents, ActiveX Controls, etc.). My problem is that when I delved further, I found BHOs and ActiveX Controls a bit confusing, because they both seem to have the same power. Wikipedia makes it even worse by stating “The Adobe Acrobat plugin that allows Internet Explorer users to read PDF files within their browser is a BHO.” Is a plugin a BHO or an ActiveX control?

    http://msdn.microsoft.com/en-us/library/aa74131
    http://en.wikipedia.org/wiki/Browser_Helper_Obj

  2. taxilian
    7 years ago

    Gee, I’m sure slow at answering comments sometimes. Sorry about that. Technically an ActiveX control can be a BHO; BHO just means “Browser Helper Object”. I believe that BHOs must be ActiveX objects, in fact, though I don’t think they have to be controls.

    BHOs have a little more capability to tie into the browser than a normal ActiveX Control, however; you might say that a BHO is analogous to an Extension in Firefox, whereas an ActiveX Control is like an NPAPI plugin. Of course, an ActiveX Control could also be a BHO, and vice versa; they are not mutually exclusive.
