How to lex javascript for an assert_js system?


#1

Ruboids:

Someone recently posted this:

o There’s a difference between syntax checking and verification of
functional correctness

That is indeed why a test case that spot-checks your syntax is less
useful than a test case that understands your needs. All unit testing
starts at the former and aims at the latter. Here’s an example of
testing Javascript’s syntax:

ondblclick = div.attributes['ondblclick']
assert_match /^new Ajax\.Updater\("hammy_id"/, ondblclick

It trivially asserts that a DIV (somewhere) contains an ondblclick
handler, and that this has a Script.aculo.us Ajax.Updater in it. The
assertion naturally cannot test that the Updater will indeed update a
DIV.

To get closer to the problem, we might decide to get closer to the
Javascript. We may need a mock-Javascript system, to evaluate that
string. It could return the list of nuances commonly called a “Log
String Test”. Here’s an example, using a slightly more verbose language:

public void testPaintGraphicsintint() {
    Mock mockGraphics = new Mock(Graphics.class);
    mockGraphics.expects(once()).method("setColor").with(eq(Color.decode("0x6491EE")));
    mockGraphics.expects(once()).method("setColor").with(same(Color.black));
    mockGraphics.expects(once()).method("drawPolygon");
    mockGraphics.expects(once()).method("drawPolygon");
    hex.paint((Graphics) mockGraphics.proxy());
    mockGraphics.verify();
}

From the top, that mocks your graphics display driver, and retains its
non-retained graphics commands. Then the mockGraphics object
verifies a certain series of calls, with such-and-so parameters.

(That is a Log String Test because it’s the equivalent of writing
commands like “setColor” and “drawPolygon” into a log file, and then
reading this to assert things.)

That test case indeed fits the ideal of moving away from testing raw
syntax, and closer to testing semantics. Such a test, for example, could
more easily ignore extraneous calls, and then check that two dynamic
polygons did not overlap.
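
For Rubyists, the same Log String idea can be sketched without a mock library. This is only an illustration: RecordingMock and paint are hypothetical names, and paint stands in for whatever code is under test. The mock records every call as a [name, args] pair, and the test then reads that log.

```ruby
# Hypothetical recording mock: any message sent to it is logged, not executed.
class RecordingMock
  attr_reader :calls

  def initialize
    @calls = []
  end

  # Record unknown messages as [name, args] pairs instead of raising.
  def method_missing(name, *args)
    @calls << [name, args]
    nil
  end

  def respond_to_missing?(_name, _include_private = false)
    true
  end
end

# Hypothetical stand-in for the code under test.
def paint(graphics)
  graphics.set_color('0x6491EE')
  graphics.draw_polygon([0, 0], [4, 0], [2, 3])
  graphics.set_color('black')
  graphics.draw_polygon([0, 0], [4, 0], [2, 3])
end

graphics = RecordingMock.new
paint(graphics)

# "Read the log": pull out just the facts the test cares about.
call_names = graphics.calls.map(&:first)
colors = graphics.calls
                 .select { |name, _args| name == :set_color }
                 .map { |_name, args| args.first }
polygon_count = call_names.count(:draw_polygon)
```

Because the log is plain data, a test can ignore extraneous calls by filtering it, which is what the select above does for the colors.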

Now suppose I envision this testage:

def ondblclick(ypath)
  %(new Ajax.Updater("node",
                     "/ctrl/act",
                     { asynchronous:true,
                       evalScripts:true,
                       method:"get",
                       parameters:"i_b_a=Parameter" })
  ).gsub("\n", '').squeeze(' ')
end

def test_some_js
  js = ondblclick()
  parse = assert_js(js)
  statement = parse.first
  assert_equal 'new Ajax.Updater', statement.get_method
  assert_equal '"node"', statement.get_param(0)
  assert_equal '"/ctrl/act"', statement.get_param(1)
  json = statement.get_param(2)
  assert_equal true, json['evalScripts']
end

The goal is that the target JS can flex easily - it can reorder its
JSON, change fuzzy details, or add new features - without breaking the
tests. Ideally, only changes that break project requirements will break
tests.

Now suppose I want to write that assert_js() using less than seven
billion lines of code.

The first shortcut is to only parse code we expect. I’m aware that runs
against the general philosophy of parsing, but I’m trying to sell an
application, not a JS parser. That’s a private detail. I can accept, for
example, only parsing the JS emitted by Rails’s standard gizmos.

So before getting down to some actual questions, here’s the code my
exquisite parsing skills have thrashed out so far:

def test_assert_js
  source = 'new Ajax.Updater(' +
             '"node", ' +
             '"/controller/action", ' +
             '{ asynchronous:true, ' +
             'evalScripts:true, ' +
             'method:"get", ' +
             'parameters:"i_b_a=Parameter" })'

  js = assert_js(source)
  assert_equal 'new Ajax.Updater', js.keys.first
  parameters = js.values.first['()']
  assert_equal '"node"', parameters[0]
  assert_equal '"/controller/action"', parameters[1]
  json = parameters[2]['{}']
  assert_equal 'true', json['evalScripts']
  assert_equal '"get"', json['method']
  assert_equal '"i_b_a=Parameter"', json['parameters']
end

Now that’s good enough for government work, and I could probably upgrade
the interface to look more like my idealized example…

…but the implementation is a mish-mash of redundant
Regexps and run-on methods:

# A double-quoted string, then an optional comma and whitespace.
Qstr = /^("[^"]*"),?\s*/

def assert_json(source)
  js = {}
  identifier = /([[:alnum:]_]+):/

  # Walk the key:value pairs; a value is a bare word or a quoted string.
  while m = source.match(identifier)
    source = m.post_match
    n = source.match(/^([[:alnum:]_]+),?\s*/)
    n = source.match(Qstr) unless n
    break unless n
    js[m.captures[0]] = n.captures[0]
    source = n.post_match
  end

  return { '{}' => js }
end

def assert_js(source)
  js = {}
  qstr = /^("[^"]*"),?\s*/
  json = /^(\{.*\}),?\s*/

  if source =~ /^([^\("]+)(.*)$/
    # A callee such as 'new Ajax.Updater', followed by its argument list.
    js[$1] = assert_js($2)
  elsif source =~ /^\((.*)\)$/
    # A parenthesized argument list.
    js['()'] = assert_js($1)
  else
    # The arguments themselves: quoted strings or one {...} literal.
    index = 0

    while (m = source.match(qstr)) or
          (m = source.match(json))
      break if m.size < 1

      if source =~ /^\{/
        js[index] = assert_json(m.captures[0])
      else
        js[index] = m.captures[0]
      end

      source = m.post_match
      index += 1
    end
  end

  return js
end

Now the questions. Is there some…

…way to severely beautify that implementation?
…lexing library I could easily throw in?
…robust JS Lexer library already out there?
…assert_js already out there?
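
One possible answer to the lexing-library question is Ruby's standard strscan library. The following is a minimal sketch, not a full JavaScript lexer: TOKEN_PATTERNS and lex_js are hypothetical names, and the patterns cover only the narrow subset of JS shown in this post (quoted strings, dotted identifiers, and a little punctuation).

```ruby
require 'strscan'

# Token patterns for the narrow JS subset in this thread (an assumption,
# not the ECMAScript lexical grammar). Order matters: first match wins.
TOKEN_PATTERNS = [
  [:string,     /"(?:\\.|[^"\\])*"/],
  [:identifier, /[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*/],
  [:punct,      /[(){}:,;]/]
].freeze

# Returns an array of [type, text] pairs, or raises on unexpected input.
def lex_js(source)
  scanner = StringScanner.new(source)
  tokens = []

  until scanner.eos?
    next if scanner.skip(/\s+/)

    type, _pattern = TOKEN_PATTERNS.find { |_, pattern| scanner.scan(pattern) }
    unless type
      raise ArgumentError,
            "unexpected input at #{scanner.pos}: #{scanner.rest[0, 20].inspect}"
    end

    tokens << [type, scanner.matched]
  end

  tokens
end

tokens = lex_js('new Ajax.Updater("node", { evalScripts:true })')
```

Splitting lexing from structure this way keeps each Regexp small, and a parser written over the token list no longer has to re-match quotes and commas itself.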


#2

On 12/30/06, Phlip wrote:

Now the questions. Is there some…

…way to severely beautify that implementation?
…lexing library I could easily throw in?
…robust JS Lexer library already out there?
…assert_js already out there?

Parsing with regexps makes baby Jesus cry. Javascript itself can be
quite flexible, so it may be possible to do enough with a standard
interpreter’s run-time. Alternatively, have a look at the Mozilla
projects repository for a real interpreter you could hack on.


#3

spooq wrote:

Parsing with regexps makes baby Jesus cry. Javascript itself can be
quite flexible, so it may be possible to do enough with a standard
interpreter’s run-time. Alternatively, have a look at the Mozilla
projects repository for a real interpreter you could hack on.

This question is for an academic paper, so it has even more ridiculous
constraints on the amount of fun I can have. (And why hasn’t the
industry invented a Lex in a Bottle, using Regexp-like strings and a
BNF notation?)

Can I add a parser to the Syntax library? It only does Ruby, XML, and
YAML so far…

And, yes, JavaScript was designed to be parsed, unlike some other
languages…


#4

On 1/2/07, Phlip wrote:

BNF notation?)

Not sure exactly how you want to improve on lex?

Can I add a parser to the Syntax library? It only does Ruby, XML, and
YAML so far…

I don’t see how that’s better than grabbing the Javascript grammar off
the web in BNF :
http://www.mozilla.org/js/language/es4/formal/lexer-grammar.html
http://www.antlr.org/grammar/1153976512034/ecmascriptA3.g
etc.

And, yes, JavaScript was designed to be parsed, unlike some other
languages…

No names need be mentioned… ;)

http://corion.net/perl-dev/Javascript-PurePerl.html does javascript to
xml, which seems the best/quickest solution that I’ve seen in my 2
minutes of googling. Just write enough perl to put that into a file
somewhere, and get into a nicer language ASAP ;)


#5

http://lxr.mozilla.org/mozilla/source/js/src/js.c

Have a look around line 2315.

Doing some investigation into aspect-oriented programming in
javascript may also be worthwhile.

I’ll stop now :)


#6

spooq wrote:

I might go with Javascript-Pure-Perl - see below. The following is just
wrap-ups.

http://lxr.mozilla.org/mozilla/source/js/src/js.c

Have a look around line 2315.

Interesting, but I don’t get It. The IT object … exists at debug
time, and traces all its calls?

Not sure exactly how you want to improve on lex?

Regexp is itself also a “little language”. However, to get to the
language, we don’t need to write a .regex file, compile it with special
compilers, produce a .c file, compile this, link into it, bind to it,
yack yack yack, and so on just to use it.

So, I envision Ruby lines like Lex.new.e(' LetterE -> E | e'). Instead
of externally compiling the little language, we just host it.
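
A hosted little language along those lines could be sketched roughly as below. Only the Lex.new.e('Name -> a | b') call shape comes from the post; everything else (the match? method, compiling each rule to a Regexp, letting an alternative be a whitespace-separated sequence of rule names or literals) is an assumption made for illustration.

```ruby
# Hypothetical hosted grammar: rules arrive as strings and are compiled
# to Regexps at run time, with no external lexer-generator step.
class Lex
  def initialize
    @rules = {}
  end

  # Define a rule such as 'LetterE -> E | e'. Returns self for chaining.
  def e(rule)
    name, body = rule.split('->', 2).map(&:strip)
    if body.nil? || name.empty?
      raise ArgumentError, "expected 'Name -> a | b', got #{rule.inspect}"
    end

    alternatives = body.split('|').map(&:strip)
    @rules[name.to_sym] = Regexp.union(alternatives.map { |alt| compile(alt) })
    self
  end

  # True when the whole text matches the named rule.
  def match?(name, text)
    /\A(?:#{@rules.fetch(name.to_sym)})\z/.match?(text)
  end

  private

  # Each word is an already-defined rule name, or else a literal.
  def compile(alternative)
    parts = alternative.split.map do |word|
      @rules[word.to_sym] || Regexp.new(Regexp.escape(word))
    end
    # Regexp#to_s wraps each part in a group, so alternations stay intact.
    Regexp.new(parts.map(&:to_s).join)
  end
end

lex = Lex.new.e(' LetterE -> E | e').e('Pair -> LetterE LetterE')
```

One limitation of this sketch: a literal that happens to share the name of an earlier rule is treated as that rule, so a real version would need a quoting convention.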

That is not important for the current project…

Can I add a parser to the Syntax library? It only does Ruby, XML, and
YAML so far…

I don’t see how that’s better than grabbing the Javascript grammar off
the web in BNF :

In theory, I only need a dirt-simple way to spot-check the source; I’m
not writing a JavaScript interpreter. But it could be better if it
doesn’t force me to externally compile the lexer.

http://corion.net/perl-dev/Javascript-PurePerl.html does javascript to
xml

Righteous! The project already uses XML (and unit tests with
assert_xpath), so that will fit right in!

Thanks! I honestly would never have thought to try Perl…


#7

On 1/2/07, Phlip wrote:

time, and traces all its calls?

It just makes a pre-defined object that you can poke at when you run
scripts in that interpreter. The implication was that you could
recreate the Ajax.* methods and use them to log.

Not sure exactly how you want to improve on lex?

Regexp is itself also a “little language”. However, to get to the
language, we don’t need to write a .regex file, compile it with special
compilers, produce a .c file, compile this, link into it, bind to it,
yack yack yack, and so on just to use it.

Lex generates C and lives by the rules of that coding universe. Doing
stuff at run-time can be difficult and weird there. Much easier to
transform to a familiar language and compile and link with exactly the
same tools you use for the rest of your project. Yack (yacc) is an
entirely different project ;)

So, I envision Ruby lines like Lex.new.e(' LetterE -> E | e'). Instead
of externally compiling the little language, we just host it.

Which would be living by the rules and expectations of the Ruby
universe. Not that there’s anything wrong with that; I happen to quite
like living there myself. It’s just useful to remember there’s more
than one way of doing things.

That is not important for the current project…

Agreed.

http://corion.net/perl-dev/Javascript-PurePerl.html does javascript to
xml

Righteous! The project already uses XML (and unit tests with
assert_xpath), so that will fit right in!

Not sure what assert_xpath is, guess it’s a function from some kind of
test harness.

Thanks! I honestly would never have thought to try Perl…

It’s not exactly my first choice either, but any port will do in a
storm. At least you found one acceptable suggestion in my ramblings :)


#8

To spooq:

Consider this snip of C++, via Boost/Spirit:

rule<> LetterE = chr_p('e') | chr_p('E');

The bad news, of course, is all the excessive chr_p stuff. The good
news is that’s raw C++, not even a string, and it all compiles at
compile time.

Giles B. wrote:

Sorry, why do you want to do this in the first place? The original
post mentioned unit testing, if you want to unit test JavaScript,
there are much easier ways.

It’s a secret. You’ll see!..


#9

Sorry, why do you want to do this in the first place? The original
post mentioned unit testing, if you want to unit test JavaScript,
there are much easier ways.


#10

On 1/2/07, Phlip wrote:

To spooq:

Consider this snip of C++, via Boost/Spirit:

rule<> LetterE = chr_p('e') | chr_p('E');

The bad news, of course, is all the excessive chr_p stuff. The good
news is that’s raw C++, not even a string, and it all compiles at
compile time.

Boost is indeed cool, they push the boundaries of C++ further than
anyone. I really like their XML parser, much better than that horrid
Xerces port. C++ != C though, especially when lex was written. :)

Could you let me know how the perl script is going, either here or
off-list?


#11

On 1/2/07, Giles B. wrote:

Sorry, why do you want to do this in the first place? The original
post mentioned unit testing, if you want to unit test JavaScript,
there are much easier ways.

No doubt, I said as much in my first reply, but I elaborated on the
parsing because that interests me.


#12

spooq wrote:

Could you let me know how the perl script is going, either here or off-list?

Awesome - it works perfectly as a lexer, and it only produces two tiny
bugs (so far). One is 'new Ajax.Updater' doesn’t fly, and you need
'ajax = new Ajax.Updater'. The other is you gotta have a ; on the ends
of lines.

So here’s a sample test case. I am attempting to test-FIRST Javascript
(thru Rails). That’s much harder than just acceptance-testing it thru
the existing test rigs, but the rewards will be substantial.

assert_xpath '/form/textarea' do |textarea|
  assert_js textarea.attributes['onkeydown'] do
    assert_xpath 'Statement[1]' do
      assert_xpath '//Identifier[ @name = "Callee" and . = "editor_keydown" ]'
    end
  end
end

assert_xpath asserts that the hidden @xdoc variable can call
XPath.first() on the given string without returning nil. So you can
pack lots of goodies into your XPath strings, including queries and
string comparisons.
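
For readers who have not seen it, an assert_xpath of that shape might look roughly like the sketch below. It is a guess at the mechanics, not the actual implementation: XpathChecker is a hypothetical wrapper class, a plain raise stands in for the test framework's failure, and REXML is assumed because XPath.first() is its API. Inside a block, @xdoc is narrowed to the matched node so nested calls search within it.

```ruby
require 'rexml/document'

# Hypothetical stand-in for a test case that owns an @xdoc.
class XpathChecker
  def initialize(xml)
    @xdoc = REXML::Document.new(xml)
  end

  # Fails unless the path matches. With a block, yields the node and
  # temporarily makes it the context for nested assert_xpath calls.
  def assert_xpath(path)
    node = REXML::XPath.first(@xdoc, path)
    raise "no node matches #{path.inspect}" if node.nil?
    return node unless block_given?

    outer = @xdoc
    @xdoc = node
    begin
      yield node
    ensure
      @xdoc = outer
    end
    node
  end
end

page = XpathChecker.new(
  '<form><textarea onkeydown="editor_keydown(event);"/></form>'
)

handler = nil
page.assert_xpath('/form') do
  page.assert_xpath('textarea') do |textarea|
    handler = textarea.attributes['onkeydown']
  end
end
```

Restoring the outer @xdoc in an ensure clause keeps a failing inner assertion from leaving the checker pointed at the wrong node.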

The first assert_xpath runs after something has generated XHTML and
loaded it into @xdoc. So we can assert this XHTML contains a FORM
containing a TEXTAREA.

assert_js just copies the given Javascript into a temporary file, calls
jsToXml.pl on it, and loads this into @xdoc.
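
That temp-file round trip can be sketched as follows. The details are assumptions: js_to_xdoc and PERL_CONVERTER are hypothetical names, and the post does not say how jsToXml.pl takes its input or what XML it emits, so the converter is passed in as a callable and the example below swaps in a stub.

```ruby
require 'tempfile'
require 'rexml/document'

# Assumed invocation: the script takes the file path as its argument and
# prints XML on stdout. Check the script's own usage before relying on this.
PERL_CONVERTER = ->(path) { IO.popen(['perl', 'jsToXml.pl', path], &:read) }

# Writes the JavaScript to a temporary file, converts it to XML, and
# returns the parsed document (what the post loads into @xdoc).
def js_to_xdoc(javascript, converter: PERL_CONVERTER)
  Tempfile.create(['assert_js', '.js']) do |file|
    file.write(javascript)
    file.flush # make sure the converter sees the full source
    REXML::Document.new(converter.call(file.path))
  end
end

# Stub converter so the sketch runs without Perl; its XML is made up.
stub = lambda do |path|
  "<Program><Statement>#{File.read(path).length}</Statement></Program>"
end

xdoc = js_to_xdoc('f();', converter: stub)
```

Injecting the converter also makes the helper testable on machines without Perl, at the cost of one more parameter.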

The above test case trivially tests that <TEXTAREA
onkeydown='editor_keydown(event);' …>. That looks too trivial to
test-first, but it’s the little things that add up into big bugs if you
don’t apply a little rigor to your development process!


#13

Phlip wrote:

accept, for example, only parsing the JS emitted by Rails’s standard
gizmos.

Now the questions. Is there some…

…way to severely beautify that implementation?

For Rails apps, by way of not testing the library('s JavaScript
output): stub out RJS’s JavaScriptGenerator (‘page’ object)? That was
the first thing I thought when I heard about RJS. Surfing the code, it
looks quite possible using Mocha if it was desirable. I haven’t thought
through all the ramifications. The OP includes testing the source page
– maybe also some work on the APIs to link to and submit forms to Ajax
actions would permit those to be stubbed out as well.

Has someone already created such a beast? (Besides Google’s GWT?)


Ryan P.
Obtiva Training and Consulting
Agile, Ruby, Rails, Java Eclipse RCP
http://obtiva.com/


#14

Ryan P. wrote:

For Rails apps, by way of not testing the library('s JavaScript
output): stub out RJS’s JavaScriptGenerator (‘page’ object)?

Ultimately we are up against a “log string test”. That’s where you call
an emulator or a real deal of some type, it emits a log of its behavior,
and you parse into this log.

You could, for example, take some server program, turn its log level up,
call a high level function, read the log file as a string, and perform
Regular Expressions on it to pull out target data. (/Error (.*)/ is a
good start! ;)

Next, you might write a mock that records the calls sent to it as a
sequence of data items, like the MockGraphics example I started this
thread with.

Here’s the diagnostic when an assert_rjs fails:

Content did not include:
$("moose_panel").width = "50%";.
<"$("wiki_panel").width = "50%";\n$("mouse_panel").width = "50%";\nElement.update("mouse_panel", "<iframe height=\"100%\" src=\"/character/hammy_squirrel\" id=\"test_frame\" width=\"100%\"/>");\nElement.show("mouse_panel");"> expected to be =~
</$("moose_panel").width\ =\ "50%";/>.

Note that’s not even perfectly robust for a log string test. I could
write page['moose_panel'].width = '50%', then a few lines later write
page['mouse_panel'].width = '50%', and this won’t catch the bug:
assert_rjs :page, 'moose_panel', :width=, '50%'. It would find the
first line spelled right, not the later line spelled wrong.

A better Log String Test would snarf each line as it tested.
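
As a rough illustration of "snarfing" (consume_lines is a hypothetical helper, not part of assert_rjs): each expected pattern must match a distinct line, the matched line is removed, and whatever is left over is returned. A misspelled duplicate then shows up as a leftover instead of hiding behind the correctly spelled line.

```ruby
# Hypothetical consuming matcher for a log string. Returns the lines that
# no expectation claimed; raises if an expectation finds nothing to claim.
def consume_lines(log, expected_patterns)
  remaining = log.lines.map(&:chomp)

  expected_patterns.each do |pattern|
    index = remaining.index { |line| pattern.match?(line) }
    raise "no remaining line matches #{pattern.inspect}" if index.nil?

    remaining.delete_at(index) # snarf it so it cannot match twice
  end

  remaining
end

log = <<~'JS'
  $("moose_panel").width = "50%";
  $("mouse_panel").width = "50%";
JS

leftovers = consume_lines(log, [/moose_panel/])
```

A test would then assert that leftovers is empty, or contains only lines it has explicitly chosen to ignore.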

Don’t get me wrong - assert_rjs is an excellent place to start; it’s a
lowest common denominator that at least matches Rails’ incredible talent
for lean and expressive statements. I would use it first before seeking
a way to test semantics.

But that’s what a mock RJS jigger would do - test that we called our RJS
object in such-and-so ways.

That was
the first thing I thought when I heard about RJS. Surfing the code, it
looks quite possible using Mocha if it was desirable. I haven’t thought
through all the ramifications. The OP includes testing the source page
– maybe also some work on the APIs to link to and submit forms to Ajax
actions would permit those to be stubbed out as well.

I want a test case that fails if two Ajax commands overlap each other
and blot each other out. That requires emulating DOM, and I really think
someone smarter than I could do the equivalent in 2% of the lines of
code I envision.

Has someone already created such a beast? (Besides Google’s GWT?)

http://www.google.com/search?domains=code.google.com&sitesearch=code.google.com&q=test

They seem aware of JUnit. ;)


#15

Giles B. wrote:

… just trying to figure out if it’s an exercise for the
challenge itself or to address some obscure flaw in the existing
techniques.

Yes.


#16

On 1/3/07, spooq wrote:

On 1/2/07, Giles B. wrote:

Sorry, why do you want to do this in the first place? The original
post mentioned unit testing, if you want to unit test JavaScript,
there are much easier ways.

No doubt, I said as much in my first reply, but I elaborated on the
parsing because that interests me.

No harm there, just trying to figure out if it’s an exercise for the
challenge itself or to address some obscure flaw in the existing
techniques.