PHP Lex Engine Sourcecode Analysis(undone)

Andrew.Hann, published 2015-05-21

Contents

1. Introduction to the PHP lexical analysis engine (Lex)
2. PHP tag parsing

1. Introduction to the PHP lexical analysis engine (Lex)

Relevant Link:

2. PHP tag parsing

\php-5.4.41\Zend\zend_language_scanner.l

int lex_scan(zval *zendlval TSRMLS_DC)
{
restart:
    //Set the start of the current token (yy_text) to the current cursor position
    SCNG(yy_text) = YYCURSOR;

yymore_restart:
//This comment block defines the regular expression for each token class; the lexer generator re2c reads it when it converts this file into C code
/*!re2c
re2c:yyfill:check = 0;
LNUM    [0-9]+
DNUM    ([0-9]*"."[0-9]+)|([0-9]+"."[0-9]*)
EXPONENT_DNUM    (({LNUM}|{DNUM})[eE][+-]?{LNUM})
HNUM    "0x"[0-9a-fA-F]+
BNUM    "0b"[01]+
LABEL    [a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*
WHITESPACE [ \n\r\t]+
TABS_AND_SPACES [ \t]*
TOKENS [;:,.\[\]()|^&+-/*=%!~$<>?@]
ANY_CHAR [^]
NEWLINE ("\r"|"\n"|"\r\n")

/* compute yyleng before each rule */
<!*> := yyleng = YYCURSOR - SCNG(yy_text);
...
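To make the definitions above concrete, here is a minimal hand-written C equivalent of the LABEL class (the pattern used for identifiers such as variable and function names). It is a sketch, not re2c output: match_label and its helpers are hypothetical names. The input is treated as raw bytes, so every byte from 0x7f to 0xff counts as a letter, which is how UTF-8 identifiers get through without the scanner knowing anything about encodings.

```c
#include <stddef.h>

/* First character of LABEL: [a-zA-Z_\x7f-\xff] */
static int is_label_start(unsigned char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           c == '_' || c >= 0x7f;
}

/* Following characters of LABEL: [a-zA-Z0-9_\x7f-\xff] */
static int is_label_char(unsigned char c)
{
    return is_label_start(c) || (c >= '0' && c <= '9');
}

/* Hypothetical helper: length of the LABEL at the start of [p, limit),
 * or 0 if the input does not start with one. Like re2c, it consumes the
 * longest possible match. */
size_t match_label(const unsigned char *p, const unsigned char *limit)
{
    const unsigned char *q = p;

    if (q >= limit || !is_label_start(*q))
        return 0;
    for (q++; q < limit && is_label_char(*q); q++)
        ;
    return (size_t)(q - p);
}
```

For example, on "foo_bar1 = 2" the matcher consumes 8 bytes and stops at the space, while "9abc" yields 0 because a label cannot start with a digit.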

0x1: Matching PHP tags

1.1 <script language=php>, <script language='php'>, <script language="php">

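//First, match the <script language=php> tag. Whitespace around "language" and "=" is ignored (at least one whitespace character is required after "<script"), and php may be wrapped in single or double quotes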
<INITIAL>"<script"{WHITESPACE}+"language"{WHITESPACE}*"="{WHITESPACE}*("php"|"\"php\""|"'php'"){WHITESPACE}*">" 
{
    YYCTYPE *bracket = (YYCTYPE*)zend_memrchr(yytext, '<', yyleng - (sizeof("script language=php>") - 1));

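    //A <script> tag can appear in the middle of HTML: if the last '<' in the matched text is not at the start of the token, HTML was scanned before it (via yymore()), so rewind to that '<' and jump to inline_html to emit the HTML first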
    if (bracket != SCNG(yy_text)) 
    {
        /* Handle previously scanned HTML, as possible <script> tags found are assumed to not be PHP's */
        YYCURSOR = bracket;
        goto inline_html;
    }

    //Otherwise switch the state to ST_IN_SCRIPTING and return T_OPEN_TAG, marking this as a PHP open tag
    HANDLE_NEWLINES(yytext, yyleng);
    zendlval->value.str.val = yytext; /* no copying - intentional */
    zendlval->value.str.len = yyleng;
    zendlval->type = IS_STRING;
    BEGIN(ST_IN_SCRIPTING);
    return T_OPEN_TAG;
}
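The rule above can be sketched as a hand-written matcher. match_script_php_tag and its helpers are hypothetical; the function returns the number of bytes the rule would consume, or 0. One assumption worth stating: as far as I know, PHP runs re2c with its --case-inverted option, so double-quoted literals in zend_language_scanner.l such as "<script" match case-insensitively (consistent with inline_char_handler checking both 's' and 'S'). The sketch therefore compares case-insensitively.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* {WHITESPACE} from the definitions: [ \n\r\t] */
static int is_ws(char c)
{
    return c == ' ' || c == '\n' || c == '\r' || c == '\t';
}

/* Case-insensitive literal match; advances *p only on success. */
static int eat(const char **p, const char *limit, const char *lit)
{
    size_t n = strlen(lit);

    if ((size_t)(limit - *p) < n)
        return 0;
    for (size_t i = 0; i < n; i++)
        if (tolower((unsigned char)(*p)[i]) != tolower((unsigned char)lit[i]))
            return 0;
    *p += n;
    return 1;
}

/* Skip whitespace and report how many bytes were skipped. */
static size_t skip_ws(const char **p, const char *limit)
{
    const char *start = *p;

    while (*p < limit && is_ws(**p))
        (*p)++;
    return (size_t)(*p - start);
}

/* Hypothetical equivalent of
 *   "<script"{WHITESPACE}+"language"{WHITESPACE}*"="{WHITESPACE}*
 *   ("php"|"\"php\""|"'php'"){WHITESPACE}*">"
 * Returns the matched length, or 0 if the rule does not match. */
size_t match_script_php_tag(const char *s, const char *limit)
{
    const char *p = s;

    if (!eat(&p, limit, "<script"))
        return 0;
    if (skip_ws(&p, limit) == 0)      /* {WHITESPACE}+ needs at least one */
        return 0;
    if (!eat(&p, limit, "language"))
        return 0;
    skip_ws(&p, limit);
    if (!eat(&p, limit, "="))
        return 0;
    skip_ws(&p, limit);
    if (!eat(&p, limit, "php") && !eat(&p, limit, "\"php\"") &&
        !eat(&p, limit, "'php'"))
        return 0;
    skip_ws(&p, limit);
    if (!eat(&p, limit, ">"))
        return 0;
    return (size_t)(p - s);
}
```

Only the three quoting forms listed in the rule are accepted, so an ordinary <script type="text/javascript"> or a mismatched quote such as language="php' returns 0 and stays HTML.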

1.2 <%=, <%

<INITIAL>"<%" 
{
    //Check whether asp_tags in php.ini is On; if so, enter scripting mode and return T_OPEN_TAG
    if (CG(asp_tags)) 
    {
        zendlval->value.str.val = yytext; /* no copying - intentional */
        zendlval->value.str.len = yyleng;
        zendlval->type = IS_STRING;
        BEGIN(ST_IN_SCRIPTING);
        return T_OPEN_TAG;
    } 
    else 
    {
        //Otherwise treat it as ordinary text and jump to inline_char_handler
        goto inline_char_handler;
    }
}

1.3 <?=, <?

//Short tags <?= and <?
<INITIAL>"<?" 
{
    //Check whether short_open_tag is On
    if (CG(short_tags)) 
    {
        zendlval->value.str.val = yytext; /* no copying - intentional */
        zendlval->value.str.len = yyleng;
        zendlval->type = IS_STRING;
        BEGIN(ST_IN_SCRIPTING);
        return T_OPEN_TAG;
    } 
    else 
    {
        goto inline_char_handler;
    }
}

//"<?=" does not require short_open_tag to be On (it is always available as of PHP 5.4)
<INITIAL>"<?=" 
{
    zendlval->value.str.val = yytext; /* no copying - intentional */
    zendlval->value.str.len = yyleng;
    zendlval->type = IS_STRING;
    BEGIN(ST_IN_SCRIPTING);
    return T_OPEN_TAG_WITH_ECHO;
}

1.4 <?php

<INITIAL>"<?php"([ \t]|{NEWLINE}) 
{
    zendlval->value.str.val = yytext; /* no copying - intentional */
    zendlval->value.str.len = yyleng;
    zendlval->type = IS_STRING;
    HANDLE_NEWLINE(yytext[yyleng-1]);
    BEGIN(ST_IN_SCRIPTING);
    return T_OPEN_TAG;
}
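Sections 1.2 to 1.4 are easiest to understand together, because re2c always picks the longest matching rule: "<?=" wins over "<?" even when short_open_tag is Off, and "<?php " wins over "<?". The classifier below is a sketch of that decision under stated assumptions: the enum values are hypothetical stand-ins for T_OPEN_TAG and T_OPEN_TAG_WITH_ECHO, "<?php" is compared case-insensitively (as the strncasecmp in 1.5 suggests), the "<%=" branch assumes that rule mirrors "<?=" behind asp_tags (the excerpt above only shows "<%"), and NO_OPEN_TAG means the text falls through to inline_char_handler.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical stand-ins for T_OPEN_TAG / T_OPEN_TAG_WITH_ECHO. */
enum open_tag { NO_OPEN_TAG = 0, OPEN_TAG, OPEN_TAG_WITH_ECHO };

/* Case-insensitive prefix test on a NUL-terminated string. */
static int has_prefix_ci(const char *s, const char *lit)
{
    for (; *lit != '\0'; s++, lit++)
        if (tolower((unsigned char)*s) != tolower((unsigned char)*lit))
            return 0;
    return 1;
}

/* Decide which <INITIAL> open-tag rule wins at s. The checks are ordered
 * so that the longest possible match is tried first, mirroring re2c. */
enum open_tag classify_open_tag(const char *s, int short_tags, int asp_tags)
{
    /* 1.4: "<?php" followed by a space, tab or newline. */
    if (has_prefix_ci(s, "<?php") && s[5] != '\0' &&
        strchr(" \t\r\n", s[5]) != NULL)
        return OPEN_TAG;
    /* 1.3: "<?=" needs no short_open_tag. */
    if (strncmp(s, "<?=", 3) == 0)
        return OPEN_TAG_WITH_ECHO;
    /* 1.3: bare "<?" only when short_open_tag is On. */
    if (strncmp(s, "<?", 2) == 0)
        return short_tags ? OPEN_TAG : NO_OPEN_TAG;
    /* Assumption: "<%=" mirrors "<?=" but is gated by asp_tags. */
    if (strncmp(s, "<%=", 3) == 0)
        return asp_tags ? OPEN_TAG_WITH_ECHO : NO_OPEN_TAG;
    /* 1.2: "<%" only when asp_tags is On. */
    if (strncmp(s, "<%", 2) == 0)
        return asp_tags ? OPEN_TAG : NO_OPEN_TAG;
    /* Anything else is left to inline_char_handler. */
    return NO_OPEN_TAG;
}
```

With short_open_tag Off, "<? echo 1;" yields NO_OPEN_TAG and stays HTML, while "<?= $x ?>" still opens PHP mode; that is the practical effect of the longest-match rule.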

1.5 inline_char_handler

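If none of the tag rules above match, the ANY_CHAR rule matches. It first checks whether the scanner has run past the end of the input (YYCURSOR > YYLIMIT) and, if so, returns 0; otherwise execution continues into the inline_char_handler and inline_html code.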

<INITIAL>{ANY_CHAR}
{
    if (YYCURSOR > YYLIMIT) 
    {
        return 0;
    }

inline_char_handler:
    while (1) 
    {
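        /*
        inline_char_handler scans the rest of the buffer: memchr searches the YYLIMIT - YYCURSOR bytes starting at YYCURSOR for '<'.
        1. If one is found, the next character is checked for '?', '%' or 's'; if the relevant condition holds (e.g. short_open_tag or asp_tags is On), the loop ends and the HTML chunk stops before the '<'.
        2. On 's' or 'S', YYCURSOR is moved back onto the '<' and yymore() re-runs the tag rules from there, keeping the HTML scanned so far in the current token.
        */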
        YYCTYPE *ptr = memchr(YYCURSOR, '<', YYLIMIT - YYCURSOR);

        YYCURSOR = ptr ? ptr + 1 : YYLIMIT;

        if (YYCURSOR < YYLIMIT) 
        {
            switch (*YYCURSOR) 
            {
                case '?':
                    if (CG(short_tags) || !strncasecmp((char*)YYCURSOR + 1, "php", 3) || (*(YYCURSOR + 1) == '=')) 
                    { /* Assume [ \t\n\r] follows "php" */
                        break;
                    }
                    continue;
                case '%':
                    if (CG(asp_tags)) 
                    {
                        break;
                    }
                    continue;
                case 's':
                case 'S':
                    /* Probably NOT an opening PHP <script> tag, so don't end the HTML chunk yet
                     * If it is, the PHP <script> tag rule checks for any HTML scanned before it */
                    YYCURSOR--;
                    yymore();
                default:
                    continue;
            }

            YYCURSOR--;
        }

        break;
    }
    ...
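The loop can be restated as a standalone function that returns where the current inline HTML chunk ends. find_inline_html_end, lookahead_ci and the script_probe type are hypothetical names. The real code handles '<s' by calling yymore() and retrying the <script language=php> rule; this sketch delegates that retry to a caller-supplied probe (pass NULL to treat every '<s' as HTML). Unlike the original, the "php" lookahead is bounds-checked against limit.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical probe: returns nonzero if a PHP <script> tag starts at lt. */
typedef int (*script_probe)(const char *lt, const char *limit);

/* Bounds-checked, case-insensitive comparison of lit against [p, limit). */
static int lookahead_ci(const char *p, const char *limit, const char *lit)
{
    size_t n = strlen(lit);

    if ((size_t)(limit - p) < n)
        return 0;
    for (size_t i = 0; i < n; i++)
        if (tolower((unsigned char)p[i]) != tolower((unsigned char)lit[i]))
            return 0;
    return 1;
}

/* Returns a pointer to the '<' that ends the inline HTML chunk starting at
 * cursor, or limit if the rest of the buffer is HTML. */
const char *find_inline_html_end(const char *cursor, const char *limit,
                                 int short_tags, int asp_tags,
                                 script_probe is_php_script_tag)
{
    while (cursor < limit) {
        const char *lt = memchr(cursor, '<', (size_t)(limit - cursor));
        const char *next;

        if (lt == NULL || lt + 1 >= limit)
            return limit;                /* no candidate left: all HTML */
        next = lt + 1;
        switch (*next) {
        case '?':                        /* <?php, <?= or (short tags) <? */
            if (short_tags || lookahead_ci(next + 1, limit, "php") ||
                (next + 1 < limit && next[1] == '='))
                return lt;
            break;
        case '%':                        /* <% only with asp_tags */
            if (asp_tags)
                return lt;
            break;
        case 's':
        case 'S':                        /* maybe <script language=php> */
            if (is_php_script_tag != NULL && is_php_script_tag(lt, limit))
                return lt;
            break;
        default:
            break;
        }
        cursor = next;                   /* resume scanning after this '<' */
    }
    return limit;
}
```

Everything between the original cursor and the returned pointer is what inline_html then emits as one T_INLINE_HTML token.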

1.6 inline_html

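For text handled by inline_html, the scanner copies the chunk as-is (the PHP engine outputs HTML unchanged) and returns T_INLINE_HTML. If SCNG(output_filter) is set, the chunk is passed through that filter instead of being copied with estrndup.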

...
//inline_html handles the text that is not inside PHP tags, i.e. the HTML (or other content) in which the PHP code is embedded
inline_html:
    yyleng = YYCURSOR - SCNG(yy_text);

    if (SCNG(output_filter)) 
    {
        int readsize;
        size_t sz = 0;
        readsize = SCNG(output_filter)((unsigned char **)&(zendlval->value.str.val), &sz, (unsigned char *)yytext, (size_t)yyleng TSRMLS_CC);
        zendlval->value.str.len = sz;
        if (readsize < yyleng) 
        {
            yyless(readsize);
        }
    } 
    else 
    {
      zendlval->value.str.val = (char *) estrndup(yytext, yyleng);
      zendlval->value.str.len = yyleng;
    }
    zendlval->type = IS_STRING;
    HANDLE_NEWLINES(yytext, yyleng);
    return T_INLINE_HTML;
}
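A minimal sketch of the bookkeeping in inline_html, assuming no output_filter is installed: copy the chunk (malloc standing in for estrndup) and advance the line counter. copy_inline_html and count_newlines are hypothetical names. count_newlines counts "\r\n" as a single line break, a lone '\r' or '\n' as one each; that is my assumption about what HANDLE_NEWLINES does, since the macro body is not shown here.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumed HANDLE_NEWLINES convention: '\n' counts, and '\r' counts only
 * when it is not the first half of "\r\n". */
size_t count_newlines(const char *s, size_t len)
{
    size_t lines = 0;

    for (size_t i = 0; i < len; i++)
        if (s[i] == '\n' || (s[i] == '\r' && (i + 1 == len || s[i + 1] != '\n')))
            lines++;
    return lines;
}

/* Hypothetical stand-in for the no-output_filter branch of inline_html:
 * copy the chunk (malloc instead of estrndup) and advance *lineno. */
char *copy_inline_html(const char *text, size_t len, size_t *lineno)
{
    char *out = malloc(len + 1);

    if (out == NULL)
        return NULL;
    memcpy(out, text, len);
    out[len] = '\0';
    *lineno += count_newlines(text, len);
    return out;
}
```

Keeping the line counter in step with skipped HTML is what lets PHP report correct line numbers in errors that occur after a block of markup.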

PHP also treats the closing tag as optional: the scanner and parser do not require a PHP block to end with "?>". The PHP manual explains it as follows:

The closing tag of a PHP block at the end of a file is optional, and in some cases omitting it is helpful when using include or require, so unwanted whitespace will not occur at the end of files, and you will still be able to add headers to the response later. It is also handy if you use output buffering, and would not like to see added unwanted whitespace at the end of the parts generated by the included files.

Omitting the closing tag has several benefits:

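1. If the file is included by other scripts, leaving out the closing tag avoids a whole class of problems. header(), setcookie() and session_start() must be called before any output is sent; if invisible characters (extra spaces or newlines) accidentally end up after ?>, they are sent as output, can break the page layout, and trigger the "Cannot modify header information - headers already sent" warning. Without a closing tag this cannot happen. It also means you can jump straight to the end of the file and keep writing code.
2. For the PHP parser, the closing tag "?>" is optional. If it is used, however, any whitespace inserted after it by a developer, a user, or an FTP client can cause unwanted output, PHP errors, suppressed output, or blank pages. For this reason, all PHP files should omit the closing tag and instead end with a comment that marks the bottom of the file and gives its path relative to the application. This makes it clear that the file ends where intended and was not truncated.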
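Benefit 2 can be checked mechanically. The sketch below is a hypothetical lint helper, not part of PHP: trailing_output_bytes reports how many bytes after the last "?>" would be emitted as T_INLINE_HTML. It assumes the last "?>" really ends PHP mode (it does not skip strings or comments) and relies on the behaviour documented on the manual's instruction-separation page, that the closing tag includes one immediately trailing newline. That is why a single newline after ?> is harmless but a second one, or a trailing space, is not.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical lint helper: bytes after the last "?>" that PHP would send
 * as output. Call it as trailing_output_bytes(src, strlen(src)). */
size_t trailing_output_bytes(const char *src, size_t len)
{
    int found = 0;
    size_t pos = 0;

    /* Remember the offset just past the last "?>". */
    for (size_t i = 0; i + 1 < len; i++) {
        if (src[i] == '?' && src[i + 1] == '>') {
            found = 1;
            pos = i + 2;
        }
    }
    if (!found)
        return 0;   /* no closing tag, so nothing can trail it */

    /* The closing tag swallows one newline: "\r\n", "\n" or "\r". */
    if (pos + 1 < len && src[pos] == '\r' && src[pos + 1] == '\n')
        pos += 2;
    else if (pos < len && (src[pos] == '\n' || src[pos] == '\r'))
        pos += 1;

    return len - pos;   /* bytes that would be sent as inline HTML */
}
```

A nonzero result on an included file is exactly the situation that produces the "headers already sent" warning described in benefit 1.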

Relevant Link:

http://php.net/manual/en/language.basic-syntax.instruction-separation.php
http://doophp.sinaapp.com/archives/php/end-symbol.html
http://blog.csdn.net/yanhui_wei/article/details/7951424
http://blog.csdn.net/wuyangbotianshi/article/details/41728091
