Thursday, February 5, 2009

Irregexp, Google Chrome's New Regexp Implementation

Original: Irregexp, Google Chrome's New Regexp Implementation

One of the new features in the most recent dev-channel release of Google Chrome (2.0.160.0) is Irregexp, a completely new implementation of regular expressions (regexps) in the V8 JavaScript engine. Irregexp builds on V8's existing infrastructure for memory management and native code generation and is tailored to work well for the kinds of regexps used by JavaScript programs on the web. The result is a considerable improvement in V8's regexp performance.

While the V8 team has been working hard to improve JavaScript performance, one part of the language that we have not given much attention so far is regexps. [...] intermediate automaton representation. This is in many ways the "natural" and most accessible representation and makes it much easier to analyze and optimize the regexp. For instance, when compiling /Sun|Mon/ the automaton representation lets us recognize that both alternatives have an 'n' as their third character. We can quickly scan the input until we find an 'n' and then start to match the regexp two characters earlier. Irregexp looks up to four characters ahead and matches up to four characters at a time.
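To make the /Sun|Mon/ example concrete, here is a minimal Python sketch of the idea: scan for the shared 'n', then check the two characters before it. The function name and structure are hypothetical and only illustrate the technique; Irregexp itself derives this from the automaton and emits native code, which this sketch does not attempt to model.

```python
def find_sun_or_mon(text: str) -> int:
    """Return the index of the first 'Sun' or 'Mon' in text, or -1.

    Illustrative only: the shared character and its offset are hard-coded
    here, whereas a regexp compiler would discover them by analysis.
    """
    alternatives = ("Sun", "Mon")
    shared_offset = 2  # both alternatives have 'n' as their third character

    # Scan for the shared character; it cannot occur before shared_offset
    # in any successful match, so the search starts there.
    pos = text.find("n", shared_offset)
    while pos != -1:
        start = pos - shared_offset  # begin matching two characters earlier
        if text[start:pos + 1] in alternatives:
            return start
        pos = text.find("n", pos + 1)
    return -1
```

For example, `find_sun_or_mon("See you on Monday")` skips the 'n' in "on" because the two characters before it do not complete either alternative, and then reports the start of "Mon".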

After optimization we generate native machine code which uses backtracking to try different alternatives. Backtracking can be time-consuming, so we use optimizations to avoid as much of it as we can. There are techniques to avoid backtracking altogether but the nature of regexps in JavaScript [...]
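As a rough illustration of what "backtracking to try different alternatives" means, the following Python sketch matches a pattern given as a sequence of alternative literals. The pattern encoding and the name `match_here` are assumptions made for this example; it is a small interpreter, not the native code that Irregexp generates.

```python
def match_here(pattern, text, pos=0):
    """Return the end index of a match of pattern at pos, or None.

    pattern is a list of tuples; each tuple holds the alternative
    literals allowed at that step (a hypothetical encoding).
    """
    if not pattern:
        return pos  # nothing left to match: success
    first, rest = pattern[0], pattern[1:]
    for alternative in first:  # try alternatives in order
        if text.startswith(alternative, pos):
            end = match_here(rest, text, pos + len(alternative))
            if end is not None:
                return end
            # The rest failed after this choice: backtrack and
            # fall through to the next alternative.
    return None


# Roughly /(a|ab)c/ anchored at the start of the input.
pattern = [("a", "ab"), ("c",)]
```

On the input "abc" the matcher first picks "a", fails to find "c" at the next position, backtracks, and then succeeds with "ab". Each such retry costs time, which is why avoiding unnecessary backtracking matters.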
