PHP Cross Reference of Limb3 |
base include file for SimpleTest
| Version: | $Id: parser.php 5999 2007-06-18 13:13:08Z pachanga $ |
| File Size: | 775 lines (30 kb) |
| Included or required: | 0 times |
| Referenced: | 0 times |
| Includes or requires: | 0 files |
ParallelRegex:: (5 methods):
ParallelRegex()
addPattern()
match()
_getCompoundedRegex()
_getPerlMatchingFlags()
SimpleStateStack:: (4 methods):
SimpleStateStack()
getCurrent()
enter()
leave()
SimpleLexer:: (13 methods):
SimpleLexer()
addPattern()
addEntryPattern()
addExitPattern()
addSpecialPattern()
mapHandler()
parse()
_dispatchTokens()
_isModeEnd()
_isSpecialMode()
_decodeSpecial()
_invokeParser()
_reduce()
SimpleHtmlLexer:: (6 methods):
SimpleHtmlLexer()
_getParsedTags()
_addSkipping()
_addTag()
_addInTagTokens()
_addAttributeTokens()
SimpleHtmlSaxParser:: (11 methods):
SimpleHtmlSaxParser()
parse()
createLexer()
acceptStartToken()
acceptEndToken()
acceptAttributeToken()
acceptEntityToken()
acceptTextToken()
ignore()
decodeHtml()
normalise()
SimpleSaxListener:: (4 methods):
SimpleSaxListener()
startElement()
endElement()
addContent()
Class: ParallelRegex - X-Ref
Compounded regular expression. Any of
| ParallelRegex($case) X-Ref |
| Constructor. Starts with no patterns. param: boolean $case True for case sensitive, false |
| addPattern($pattern, $label = true) X-Ref |
| Adds a pattern with an optional label. param: string $pattern Perl style regex, but ( and ) param: string $label Label of regex to be returned |
| match($subject, &$match) X-Ref |
| Attempts to match all patterns at once against a string. param: string $subject String to match against. param: string $match First matched portion of return: boolean True on success. |
| _getCompoundedRegex() X-Ref |
| Compounds the patterns into a single regular expression separated with the "or" operator. Caches the regex. Will automatically escape (, ) and / tokens. param: array $patterns List of patterns in order. |
| _getPerlMatchingFlags() X-Ref |
| Accessor for perl regex mode flags to use. return: string Perl regex flags. |
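The compounding described for `_getCompoundedRegex()` can be illustrated with a minimal standalone sketch. It is not the SimpleTest source: the function names `compoundPatterns()` and `matchFirst()`, the list-of-pairs input, the choice to return the label of the matching pattern, and the use of only the `i` flag are all assumptions made for illustration.

```php
<?php
// Hypothetical sketch: join several patterns with "|" into one regex.
// Assumes the patterns do not already contain backslash-escaped parentheses.
function compoundPatterns(array $patterns, bool $caseSensitive): string
{
    $wrapped = [];
    foreach ($patterns as $pattern) {
        // Escape the delimiter and parentheses so that each pattern
        // contributes exactly one capture group to the compound regex.
        $escaped = str_replace(['/', '(', ')'], ['\/', '\(', '\)'], $pattern);
        $wrapped[] = '(' . $escaped . ')';
    }
    return '/' . implode('|', $wrapped) . '/' . ($caseSensitive ? '' : 'i');
}

// $pairs is a list of [pattern, label]. Returns the label of the pattern
// that produced the leftmost match, or false when nothing matches.
function matchFirst(array $pairs, string $subject, &$match, bool $caseSensitive = false)
{
    $regex = compoundPatterns(array_column($pairs, 0), $caseSensitive);
    if (!preg_match($regex, $subject, $groups)) {
        $match = '';
        return false;
    }
    $match = $groups[0];
    // Capture group i + 1 belongs to pattern i; the group that equals the
    // whole match identifies which pattern fired.
    for ($i = 1; $i < count($groups); $i++) {
        if ($groups[$i] === $match) {
            return $pairs[$i - 1][1];
        }
    }
    return true; // Only reached for an empty match.
}
```

Calling `matchFirst([['a+', 'letters'], ['\d+', 'digits']], 'xx 42 aa', $match)` would try both patterns in a single `preg_match()` call, which is the point of compounding them.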
Class: SimpleStateStack - X-Ref
States for a stack machine.
| SimpleStateStack($start) X-Ref |
| Constructor. Starts in named state. param: string $start Starting state name. |
| getCurrent() X-Ref |
| Accessor for current state. return: string State. |
| enter($state) X-Ref |
| Adds a state to the stack and sets it to be the current state. param: string $state New state. |
| leave() X-Ref |
| Leaves the current state and reverts to the previous one. return: boolean False if we drop off |
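The stack behaviour listed above can be sketched in a few lines. This is a hypothetical re-implementation, not the SimpleTest class: the name `StateStackSketch` is invented, and it assumes that `leave()` returns false when asked to pop the starting state.

```php
<?php
// Hypothetical sketch of a state stack; not the SimpleTest source.
final class StateStackSketch
{
    /** @var string[] Bottom of the stack is the starting state. */
    private array $stack;

    public function __construct(string $start)
    {
        $this->stack = [$start];
    }

    // The current state is whatever sits on top of the stack.
    public function getCurrent(): string
    {
        return $this->stack[count($this->stack) - 1];
    }

    // Push a new state, which becomes the current one.
    public function enter(string $state): void
    {
        $this->stack[] = $state;
    }

    // Pop back to the previous state. Assumption: refusing to pop the
    // starting state is what the false return value signals.
    public function leave(): bool
    {
        if (count($this->stack) === 1) {
            return false;
        }
        array_pop($this->stack);
        return true;
    }
}
```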
Class: SimpleLexer - X-Ref
Accepts text and breaks it into tokens.
| SimpleLexer(&$parser, $start = "accept", $case = false) X-Ref |
| Sets up the lexer in case insensitive matching by default. param: SimpleSaxParser $parser Handling strategy by param: string $start Starting handler. param: boolean $case True for case sensitive. |
| addPattern($pattern, $mode = "accept") X-Ref |
| Adds a token search pattern for a particular parsing mode. The pattern does not change the current mode. param: string $pattern Perl style regex, but ( and ) param: string $mode Should only apply this |
| addEntryPattern($pattern, $mode, $new_mode) X-Ref |
| Adds a pattern that will enter a new parsing mode. Useful for entering parenthesis, strings, tags, etc. param: string $pattern Perl style regex, but ( and ) param: string $mode Should only apply this param: string $new_mode Change parsing to this new |
| addExitPattern($pattern, $mode) X-Ref |
| Adds a pattern that will exit the current mode and re-enter the previous one. param: string $pattern Perl style regex, but ( and ) param: string $mode Mode to leave. |
| addSpecialPattern($pattern, $mode, $special) X-Ref |
| Adds a pattern that has a special mode. Acts as an entry and exit pattern in one go, effectively calling a special parser handler for this token only. param: string $pattern Perl style regex, but ( and ) param: string $mode Should only apply this param: string $special Use this mode for this one token. |
| mapHandler($mode, $handler) X-Ref |
| Adds a mapping from a mode to another handler. param: string $mode Mode to be remapped. param: string $handler New target handler. |
| parse($raw) X-Ref |
| Splits the page text into tokens. Will fail if the handlers report an error or if no content is consumed. If successful then each unparsed and parsed token invokes a call to the held listener. param: string $raw Raw HTML text. return: boolean True on success, else false. |
| _dispatchTokens($unmatched, $matched, $mode = false) X-Ref |
| Sends the matched token and any leading unmatched text to the parser changing the lexer to a new mode if one is listed. param: string $unmatched Unmatched leading portion. param: string $matched Actual token match. param: string $mode Mode after match. A boolean return: boolean False if there was any error |
| _isModeEnd($mode) X-Ref |
| Tests to see if the new mode is actually to leave the current mode and pop an item from the matching mode stack. param: string $mode Mode to test. return: boolean True if this is the exit mode. |
| _isSpecialMode($mode) X-Ref |
| Tests to see if the mode is one that is entered for this token only and left automatically immediately afterwards. param: string $mode Mode to test. return: boolean True if this is a special (single-token) mode. |
| _decodeSpecial($mode) X-Ref |
| Strips the magic underscore marking single token modes. param: string $mode Mode to decode. return: string Underlying mode name. |
| _invokeParser($content, $is_match) X-Ref |
| Calls the parser method named after the current mode. Empty content will be ignored. The lexer has a parser handler for each mode in the lexer. param: string $content Text parsed. param: boolean $is_match Token is recognised rather |
| _reduce($raw) X-Ref |
| Tries to match a chunk of text and if successful removes the recognised chunk and any leading unparsed data. Empty strings will not be matched. param: string $raw The subject to parse. This is the return: array/boolean Three item list of unparsed |
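How entry, exit and plain patterns interact with a mode stack may be easier to see in a miniature. The sketch below is a hypothetical, standalone lexer, not the SimpleLexer implementation: the class name `MiniLexer`, the `$events` array, the `__exit` marker and the decision to report an entry token under the new mode are all assumptions. It also takes complete delimited regexes, whereas the documented methods take undelimited Perl-style patterns.

```php
<?php
// Hypothetical miniature of a mode-driven lexer; not the SimpleTest source.
final class MiniLexer
{
    private array $rules = [];   // mode => list of [regex, next mode or null]
    private array $modes;        // mode stack, top is the current mode
    public array $events = [];   // recorded [mode, text, isMatch] triples

    public function __construct(string $start = 'accept')
    {
        $this->modes = [$start];
    }

    public function addPattern(string $regex, string $mode): void
    {
        $this->rules[$mode][] = [$regex, null];
    }

    public function addEntryPattern(string $regex, string $mode, string $newMode): void
    {
        $this->rules[$mode][] = [$regex, $newMode];
    }

    public function addExitPattern(string $regex, string $mode): void
    {
        $this->rules[$mode][] = [$regex, '__exit'];
    }

    public function parse(string $raw): bool
    {
        while ($raw !== '') {
            $mode = end($this->modes);
            $best = null;
            // Pick the leftmost non-empty match among the current mode's rules.
            foreach ($this->rules[$mode] ?? [] as [$regex, $next]) {
                if (preg_match($regex, $raw, $m, PREG_OFFSET_CAPTURE) && $m[0][0] !== '') {
                    if ($best === null || $m[0][1] < $best[1]) {
                        $best = [$m[0][0], $m[0][1], $next];
                    }
                }
            }
            if ($best === null) {
                $this->events[] = [$mode, $raw, false]; // trailing unmatched text
                return true;
            }
            [$text, $offset, $next] = $best;
            if ($offset > 0) {
                $this->events[] = [$mode, substr($raw, 0, $offset), false];
            }
            if ($next === '__exit') {
                $this->events[] = [$mode, $text, true];
                if (count($this->modes) === 1) {
                    return false; // tried to leave the starting mode
                }
                array_pop($this->modes);
            } elseif ($next !== null) {
                $this->modes[] = $next;
                $this->events[] = [$next, $text, true];
            } else {
                $this->events[] = [$mode, $text, true];
            }
            $raw = (string) substr($raw, $offset + strlen($text));
        }
        return true;
    }
}
```

With a `text` start mode, a digit pattern in `text`, and a double quote registered as both the entry into and the exit from a `string` mode, parsing `a 12 "hi" b` would record the quoted content under `string` and everything else under `text`.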
Class: SimpleHtmlLexer - X-Ref
Breaks HTML into SAX events.
| SimpleHtmlLexer(&$parser) X-Ref |
| Sets up the lexer with case insensitive matching and adds the HTML handlers. param: SimpleSaxParser $parser Handling strategy by |
| _getParsedTags() X-Ref |
| List of parsed tags. Others are ignored. return: array List of searched for tags. |
| _addSkipping() X-Ref |
| The lexer has to skip certain sections such as server code, client code and styles. |
| _addTag($tag) X-Ref |
| Pattern matches to start and end a tag. param: string $tag Name of tag to scan for. |
| _addInTagTokens() X-Ref |
| Pattern matches to parse the inside of a tag including the attributes and their quoting. |
| _addAttributeTokens() X-Ref |
| Matches attributes that are either single quoted, double quoted or unquoted. |
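The three attribute quoting styles mentioned above can be illustrated with a single regular expression. This is only a sketch under stated assumptions: the function name `matchAttributes()` is invented, and it uses one `preg_match_all()` call rather than the lexer modes that the class itself sets up.

```php
<?php
// Hypothetical sketch: extract attributes that are double quoted,
// single quoted or unquoted. Not how SimpleHtmlLexer does it.
function matchAttributes(string $tagBody): array
{
    $pattern = '/([a-z][a-z0-9:_-]*)\s*=\s*(?:"([^"]*)"|\'([^\']*)\'|([^\s>"\']+))/i';
    preg_match_all($pattern, $tagBody, $sets, PREG_SET_ORDER);

    $attributes = [];
    foreach ($sets as $set) {
        // Groups 2, 3 and 4 hold the double quoted, single quoted and
        // unquoted value respectively; at most one of them is non-empty.
        $value = '';
        foreach ([2, 3, 4] as $group) {
            if (isset($set[$group]) && $set[$group] !== '') {
                $value = $set[$group];
                break;
            }
        }
        // Attribute names are lower-cased for consistent lookup.
        $attributes[strtolower($set[1])] = $value;
    }
    return $attributes;
}
```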
Class: SimpleHtmlSaxParser - X-Ref
Converts HTML tokens into selected SAX events.
| SimpleHtmlSaxParser(&$listener) X-Ref |
| Sets the listener. param: SimpleSaxListener $listener SAX event handler. |
| parse($raw) X-Ref |
| Runs the content through the lexer which should call back to the acceptors. param: string $raw Page text to parse. return: boolean False if parse error. |
| createLexer(&$parser) X-Ref |
| Sets up the matching lexer. Starts in 'text' mode. param: SimpleSaxParser $parser Event generator, usually $this. return: SimpleLexer Lexer suitable for this parser. |
| acceptStartToken($token, $event) X-Ref |
| Accepts a token from the tag mode. If the starting element completes then the element is dispatched and the current attributes set back to empty. The element or attribute name is converted to lower case. param: string $token Incoming characters. param: integer $event Lexer event type. return: boolean False if parse error. |
| acceptEndToken($token, $event) X-Ref |
| Accepts a token from the end tag mode. The element name is converted to lower case. param: string $token Incoming characters. param: integer $event Lexer event type. return: boolean False if parse error. |
| acceptAttributeToken($token, $event) X-Ref |
| Part of the tag data. param: string $token Incoming characters. param: integer $event Lexer event type. return: boolean False if parse error. |
| acceptEntityToken($token, $event) X-Ref |
| A character entity. param: string $token Incoming characters. param: integer $event Lexer event type. return: boolean False if parse error. |
| acceptTextToken($token, $event) X-Ref |
| Character data between tags regarded as important. param: string $token Incoming characters. param: integer $event Lexer event type. return: boolean False if parse error. |
| ignore($token, $event) X-Ref |
| Incoming data to be ignored. param: string $token Incoming characters. param: integer $event Lexer event type. return: boolean False if parse error. |
| decodeHtml($html) X-Ref |
| Decodes any HTML entities. param: string $html Incoming HTML. return: string Outgoing plain text. |
| normalise($html) X-Ref |
| Turns HTML into the text a text-only browser would show. Images are converted to their alt text and tags are suppressed. Entities are converted to their visible representation. param: string $html HTML to convert. return: string Plain text. |
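The normalisation steps described for `normalise()` (alt text for images, tags dropped, entities decoded) can be approximated as follows. This is a rough sketch, not the SimpleTest code: the function name `normaliseSketch()` is invented, only double quoted `alt` values are handled, and the whitespace collapsing at the end is an added assumption.

```php
<?php
// Hypothetical approximation of turning HTML into visible text.
function normaliseSketch(string $html): string
{
    // Replace each image with its alt text (double quoted alt values only).
    $text = preg_replace('/<img\b[^>]*\balt\s*=\s*"([^"]*)"[^>]*>/i', ' $1 ', $html);
    // Drop every remaining tag.
    $text = preg_replace('/<[^>]*>/', '', $text);
    // Convert entities such as &amp; to the characters they represent.
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    // Assumption: collapse runs of whitespace, as a browser would when rendering.
    $text = preg_replace('/\s+/', ' ', $text);
    return trim($text);
}
```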
Class: SimpleSaxListener - X-Ref
SAX event handler.
| SimpleSaxListener() X-Ref |
| Sets the document to write to. |
| startElement($name, $attributes) X-Ref |
| Start of element event. param: string $name Element name. param: hash $attributes Name value pairs. return: boolean False on parse error. |
| endElement($name) X-Ref |
| End of element event. param: string $name Element name. return: boolean False on parse error. |
| addContent($text) X-Ref |
| Unparsed, but relevant data. param: string $text May include unparsed tags. return: boolean False on parse error. |
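A listener with the three callbacks listed above might look like the following sketch. To stay runnable on its own it does not extend `SimpleSaxListener`; the class name `LinkCollector` and its `$links` property are hypothetical. Each callback returns true, on the assumption that false is reserved for reporting a parse error.

```php
<?php
// Hypothetical listener that collects the href and text of each link.
final class LinkCollector
{
    public array $links = [];
    private ?string $href = null; // non-null while inside an <a> element
    private string $text = '';

    public function startElement(string $name, array $attributes): bool
    {
        if ($name === 'a') {
            $this->href = $attributes['href'] ?? '';
            $this->text = '';
        }
        return true;
    }

    public function endElement(string $name): bool
    {
        if ($name === 'a' && $this->href !== null) {
            $this->links[] = ['href' => $this->href, 'text' => $this->text];
            $this->href = null;
        }
        return true;
    }

    public function addContent(string $text): bool
    {
        // Only content seen between <a> and </a> is kept.
        if ($this->href !== null) {
            $this->text .= $text;
        }
        return true;
    }
}
```

A parser would be expected to drive such an object by calling `startElement('a', ['href' => '...'])`, then `addContent()` for the link text, then `endElement('a')`.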
| Generated: Sat Nov 22 03:48:54 2008 | Cross-referenced by PHPXref 0.7 |