By exploiting some peculiarities of the popular Web programming
framework Ruby on Rails, MIT researchers have developed a system that
can quickly comb through tens of thousands of lines of application
code to find security flaws.
In tests on 50 popular Web applications written using Ruby on Rails,
the system found 23 previously undiagnosed security flaws, and it took
no more than 64 seconds to analyze any given program.
The researchers will present their results at the International
Conference on Software Engineering, in May.
According to Daniel Jackson, professor in the Department of Electrical
Engineering and Computer Science, the new system uses a technique
called static analysis, which seeks to describe, in a very general
way, how data flows through a program.
"The classic example of this is if you wanted to do an abstract
analysis of a program that manipulates integers, you might divide the
integers into the positive integers, the negative integers, and zero,"
Jackson explains. The static analysis would then evaluate every
operation in the program according to its effect on integers' signs.
Adding two positives yields a positive; adding two negatives yields a
negative; multiplying two negatives yields a positive; and so on.
"The problem with this is that it can't be completely accurate,
because you lose information," Jackson says. "If you add a positive
and a negative integer, you don't know whether the answer will be
positive, negative, or zero. Most work on static analysis is focused
on trying to make the analysis more scalable and accurate to overcome
those sorts of problems."
With Web applications, however, the cost of accuracy is prohibitively
high, Jackson says. "The program under analysis is just huge," he
says. "Even if you wrote a small program, it sits atop a vast edifice
of libraries and plug-ins and frameworks. So when you look at
something like a Web application written in language like Ruby on
Rails, if you try to do a conventional static analysis, you typically
find yourself mired in this huge bog. And this makes it really
infeasible in practice."
That vast edifice of libraries, however, also gave Jackson and his
former student Joseph Near, who graduated from MIT last spring and is
now doing a postdoc at the University of California at Berkeley, a way
to make to make static analysis of programs written in Ruby on Rails
practical.
A library is a compendium of code that programmers tend to use over
and over again. Rather than rewriting the same functions for each new
program, a programmer can just import them from a library.
Ruby on Rails — or Rails, as it's called for short — has the
peculiarity of defining even its most basic operations in libraries.
Every addition, every assignment of a particular value to a variable,
imports code from a library.
Near rewrote those libraries so that the operations defined in them
describe their own behavior in a logical language. That turns the
Rails interpreter, which converts high-level Rails programs into
machine-readable code, into a static-analysis tool. With Near's
libraries, running a Rails program through the interpreter produces a
formal, line-by-line description of how the program handles data.
In his PhD work, Near used this general machinery to build three
different debuggers for Ruby on Rails applications, each requiring
different degrees of programmer involvement. The one described in the
new paper, which the researchers call Space, evaluates a program's
data access procedures.
Near identified seven different ways in which Web applications
typically control access to data. Some data are publicly available,
some are available only to users who are currently logged in, some are
private to individual users, some users — administrators — have access
to select aspects of everyone's data, and so on.
For each of these data-access patterns, Near developed a simple
logical model that describes what operations a user can perform on
what data, under what circumstances. From the descriptions generated
by the hacked libraries, Space can automatically determine whether the
program adheres to those models. If it doesn't, there's likely to be a
security flaw.
Using Space does require someone with access to the application code
to determine which program variables and functions correspond to which
aspects of Near's models. But that isn't an onerous requirement: Near
was able to map correspondences for all 50 of the applications he
evaluated. And that mapping should be even easier for a programmer
involved in an application's development from the outset, rather than
coming to it from the outside as Near did.

No comments:
Post a Comment