Minutes PHP Developers Meeting

Paris, November 11th and 12th, 2005

Attendees:

Contents

1. Unicode

The first part of the meeting was dedicated to issues related to the Unicode support for PHP 6.

1.1 Unicode on/off modes

Issue: Currently it is possible to have Unicode on or off on a per request basis, requiring the storage of both non-Unicode and Unicode variants of class, method and function names in all the symbol tables.

Discussion: Having to allow both a non-Unicode and a Unicode version of names in the symbol table is deemed unnecessary, and we agree on allowing only a server-wide configuration setting to enable or disable Unicode support. This makes implementation easier for some parts of the engine, causes fewer problems for opcode caches, and is slightly faster as no runtime conversion of the names is necessary.

We also discussed whether we should even allow Unicode mode to be turned off, as current micro benchmarks show that the Unicode implementations of some of the string functions are up to 300% slower, and whole applications up to 25% slower. Not allowing Unicode mode to be turned off is expected to slow down the adoption of PHP 6, as many ISPs would be reluctant to install a version that immediately slows down the applications of their users. When we provide a switch, they can start by turning it off, and users have an easy way of asking their ISPs to turn Unicode mode on by simply reconfiguring it in php.ini.

This is also why we chose a runtime configuration setting as opposed to a configure-time switch. Another reason for providing a runtime switch instead of a compile-time switch is that distributions only have to create one binary. We do need some trickery to parse the setting from php.ini, as we need to know whether to enable or disable Unicode mode before we activate our extensions.

Conclusion:

  1. We provide a run-time switch in php.ini to enable or disable Unicode semantics. This setting will default to "On". When Unicode semantics are off you will still have access to Unicode features.

1.2 Different String Types

Issue: A number of people are unhappy with the current implementation where there are either too many different string types (binary, string, unicode) or the multiple implementations of many internal engine functions and helper functions.

Discussion: After some discussion everybody attending seems to agree that only having two string types (binary and string) makes sense. The Unicode semantics switch will control what type the string literals are by default, of course. Documentation will need to mention that with the switch on, all "strings" are Unicode and with the switch off, all "strings" are Binary.

Conclusions:

  1. We use IS_STRING internally to represent binary data. In documentation and user land exposures we use "binary" as term. (For example as name for casts).
  2. We use IS_UNICODE internally to represent unicode string data. In documentation and user land exposures we use "string" as term.

1.3 Extension upgrading

Issue: Extensions need to be upgraded and we need to be able to make sure that non-upgraded extensions will not be activated when Unicode mode is selected.

Discussion: We need to look at extensions and figure out which common problems have to be solved to support Unicode. For these tasks we then create an API that extensions can use to implement Unicode support.

For PDO we need to have some way where the drivers send back UTF-16 if we are in unicode mode. If the driver does not support it, then PDO needs to up convert based on what the driver communicates.

Conclusions:

  1. We remove the old parameter parsing API so that extensions are forced to use the new one, which helps with Unicode support.
  2. We add a flag to the extension's module structure stating whether it supports Unicode or not. If the extension does not support Unicode it will not be loaded during start up if the Unicode switch is "On".
  3. We create a PHP/Unicode API that extensions can use to support Unicode in an easier way.
  4. Wez looks into what should be done in PDO to support Unicode properly.

1.4 Bundling ICU

Issue: As ICU will be required by PHP regardless of whether the Unicode switch is on or off, we need to decide on bundling the ICU library in part or fully.

Discussion: ICU is quite a large library and bundling it with PHP will increase the download size, but as PHP requires a specific version of ICU (3.4) it is worthwhile to bundle. Another option beside bundling is to push distributions to include ICU 3.4 and make PHP rely on it. Reasoning for this is that now the distributions only have the responsibility to pick an ICU version that works together with PHP. People who compile PHP from source can also easily compile ICU from source or are competent enough to install it with their distribution's tools.

Conclusion:

  1. We will make our build system bail out if a supported ICU library could not be found, and in the bail out message we provide a small set of instructions on where to get ICU. This message also includes a link to our documentation, which covers installing ICU more extensively.
  2. We will write to maintainers of distributions to lobby for including ICU 3.4 in their distribution.

1.5 Filename Encoding

Issue: Files on a file system can have names encoded in different character sets. For example, Windows can make use of its UTF-16 based filename API, while on Linux it simply depends on the application.

Conclusions:

  1. We need to implement the already described "filename_encoding" setting to set the expected filename encoding.
  2. When functions such as readdir() encounter a filename that can not exist in the encoding that is set with the "filename_encoding" option (broken characters for example, or a latin1 name when the "filename_encoding" is set to UTF8) it returns a binary string.
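
The fallback in conclusion 2 can be approximated in present-day user-land code. The sketch below is an illustration only: it assumes the expected encoding is UTF-8, and the helper name classify_filename() is hypothetical, not part of any PHP API. It relies on preg_match() with the "u" modifier failing on byte sequences that are not valid UTF-8.

```php
<?php
// Hypothetical helper: reports whether a raw filename can be treated as
// text in the expected encoding (assumed here to be UTF-8) or must be
// handled as binary data.
function classify_filename($name)
{
    // With the "u" modifier, preg_match() returns false (not 0) when the
    // subject is not valid UTF-8; an empty pattern matches any valid subject.
    return preg_match('//u', $name) === 1 ? 'string' : 'binary';
}

echo classify_filename("caf\xC3\xA9.txt"), "\n"; // valid UTF-8 bytes
echo classify_filename("caf\xE9.txt"), "\n";     // Latin-1 byte, invalid UTF-8
```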

1.6 Collator Caching

Issue: Collators are used for comparing strings and it is quite expensive to open (and close) one each time. A method to cache them needs to be found.

Discussion: In order to prevent a huge amount of memory from being used by this cache we need to limit the number of objects we store in it. We deem the last 3 enough, as in most cases you would only have one default collator, and perhaps a secondary one. We also think that an application would often not use more than one encoder between two character sets.

Conclusion:

  1. We will store the 3 last opened collators and encoding objects in a thread/process wide cache.
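
A "last 3 opened" cache of this kind could look roughly like the following user-land sketch. The class name LastOpenedCache and the opener callback are hypothetical stand-ins; the real cache would live in the engine and hold ICU collator and converter objects.

```php
<?php
// Hypothetical sketch of a "keep the last 3 opened objects" cache.
// The key would be a locale or converter name; the opener callback stands
// in for the expensive open operation.
class LastOpenedCache
{
    private $limit;
    private $items = array(); // key => object, least recently used first

    public function __construct($limit = 3)
    {
        $this->limit = $limit;
    }

    public function get($key, $opener)
    {
        if (array_key_exists($key, $this->items)) {
            // Cache hit: remove the entry so it can be re-added at the end.
            $item = $this->items[$key];
            unset($this->items[$key]);
        } else {
            // Cache miss: open the object and evict the oldest if needed.
            $item = call_user_func($opener, $key);
            if (count($this->items) >= $this->limit) {
                reset($this->items);
                unset($this->items[key($this->items)]);
            }
        }
        $this->items[$key] = $item; // most recently used goes last
        return $item;
    }

    public function keys()
    {
        return array_keys($this->items);
    }
}

$cache = new LastOpenedCache(3);
$open = function ($locale) { return "collator for $locale"; };
foreach (array('en_US', 'nl_NL', 'en_US', 'fr_FR', 'nb_NO') as $locale) {
    $cache->get($locale, $open);
}
print_r($cache->keys());
```

Re-using 'en_US' keeps it in the cache, so the entry evicted by the fourth distinct locale is 'nl_NL'.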

1.7 Optimising []

Issue: Currently using the [] operator to select an arbitrary character is very slow as PHP needs to start scanning the string from the start. This is because it is impossible to calculate the correct memory position as the UTF16 encoding that we use allows either 2 or 4 bytes per character.

Discussion: A suggestion is to optimise [] by storing the offset/char nr (in the zval). This will perhaps increase the size of a zval and we're wondering whether it's worthwhile to optimise already.

Conclusion:

  1. We postpone the decision on whether to implement the suggested optimisation until later.
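
To illustrate why the lookup is linear, the sketch below finds the byte offset of the n-th code point in a UTF-16 byte string by walking it from the start. It is an assumption-laden illustration, not engine code: the function name utf16be_offset() is made up, and the input is assumed to be well-formed big-endian UTF-16.

```php
<?php
// Sketch only: returns the byte offset of code point number $index in a
// UTF-16BE encoded byte string, or -1 if the string is too short.
function utf16be_offset($bytes, $index)
{
    $offset = 0;
    $length = strlen($bytes);
    for ($seen = 0; $offset + 1 < $length; $seen++) {
        if ($seen === $index) {
            return $offset;
        }
        // A high surrogate (0xD800-0xDBFF) starts a 4-byte pair;
        // every other code unit is a complete 2-byte character.
        $unit = (ord($bytes[$offset]) << 8) | ord($bytes[$offset + 1]);
        $offset += ($unit >= 0xD800 && $unit <= 0xDBFF) ? 4 : 2;
    }
    return -1;
}

// "a", U+1F600 (surrogate pair D83D DE00), "b"
$text = "\x00\x61\xD8\x3D\xDE\x00\x00\x62";
var_dump(utf16be_offset($text, 2));
```

Caching the last (character number, offset) pair, as suggested in the discussion, would let a following lookup start from that point instead of from offset 0.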

1.8 Locale Sensitivity

Issue: Some PHP functions currently make use of the system locales which have some problems, such as different names on different platforms and the non-availability of many locales on a specific installation. ICU's support for locales is very extensive and offers a lot of settings.

Discussion: Locale configuration is important for localised applications, and by relying on ICU's database of locale data it is possible to build a locale-aware application that can be deployed in a reliable way on a multitude of platforms and installations. In order to make full use of this functionality, PHP's functions that deal with locales should be converted to ICU locales. ICU's offering of locale-aware functions is extensive; however, in most cases most of this functionality is neither needed nor wanted. This is why we should pick a conservative default for the options, but also provide a more extensive API so that the more advanced features can be used too.

As ICU's string comparison is locale aware, we need to decide how to expose this. We chose not to implement locale-aware comparisons for == and strcmp(), as currently they are not locale aware. There is a separate function called strcoll(), which is now based on POSIX locales. This one will be rewritten for ICU locales, while == and strcmp() will stay the same as they are now. By keeping them as they are we do not break any current usage, and we offer (slightly) faster, generic string comparison functionality.

Conclusions:

  1. == should be the same as strcmp, and not using collation. strcoll() does.
  2. We will use locale based functions where they make sense, and we pick a conservative default. Examples are strtoupper/strtolower, stristr etc.

1.10 Conversion Errors

Issue: While converting between different encodings a conversion error might occur, as not all characters in the source string can necessarily be represented in the target character set.

Discussion: Currently PHP does not use exceptions anywhere in its internals, and implementing conversion errors with exceptions would break with this behaviour. By removing the need for automatic conversions between native strings and Unicode strings (and vice versa) this issue also becomes less important, as the user will now be largely responsible for doing character set conversions. By adding an exception mode to the error modes we already support for character encoding failures, we give the user full flexibility in handling conversion failures.

Conclusions:

  1. We will not use exceptions for implicit conversion errors.
  2. We provide an additional error mode for character set conversion failures that throws exceptions on failures.

2. Cleanup of Functionality

2.1 register_globals

Issue: Register globals are the source of many applications' security problems and cause constant grief.

Discussion: We briefly discussed how we want to alert users to the disappearance of this functionality. We decided that if we find the setting during the startup of PHP we raise an E_CORE_ERROR, which will prevent the server from starting, with a message that points to the documentation. The documentation should explain why this functionality was removed, and give some introduction to safe programming.

Conclusions:

  1. We are going to remove the functionality.
  2. We throw an E_CORE_ERROR when starting PHP if we detect the register_globals setting.

2.2 magic_quotes

Issue: Magic_quotes can be cumbersome for application developers as it is a setting that can be set to on or off without any influence from within the script itself as input parameters are escaped before the script starts.

Discussion: In the same way as with the removal of the register_globals functionality, we decided that if we find the setting during the startup of PHP we raise an E_CORE_ERROR, which will prevent the server from starting, with a message that points to the documentation. The documentation should explain why this functionality was removed, and point users to the input_filter extension as a replacement.

Conclusions:

  1. We remove the magic_quotes feature from PHP.
  2. We throw an E_CORE_ERROR when starting PHP if we detect the magic_quotes, magic_quotes_sybase or magic_quotes_gpc setting.

2.3 safe_mode

Issue: safe_mode is a feature in PHP that checks whether files to be opened or included have the same GID/UID as the starting script. This can cause many problems: for example, if an application generates a cache file, it will do this with the user ID that belongs to the web server (usually "nobody"). As an application is usually uploaded by the user belonging to the web account (say "client"), the scripts can no longer open the files that the application generated. The same problems happen when, for example, an application generates an image.

Discussion: As safe_mode is a name that wrongly suggests that it makes PHP safe, we all agreed that we should remove this feature. It can never be made totally safe, as there will always be ways to circumvent safe_mode through libraries. This kind of functionality also belongs in the web server or another security scheme. open_basedir is a feature that we will keep, and we will point users to it in the error message that is thrown when we detect the safe_mode setting on start-up.

Conclusions:

  1. We remove the safe_mode feature from PHP.
  2. We throw an E_CORE_ERROR when starting PHP if we detect the safe_mode setting.

2.4 Deprecated Behaviour

Issue: There are some places in PHP where we keep deprecated behaviour from earlier PHP versions. Some of those might finally be dropped in PHP 6.

Discussion: We only discussed a few cases where we might want to drop the deprecated behaviour as we didn't have a full list of all cases.

The first issue that we raised was changing the E_NOTICE error for call-time-pass-by-reference to an E_ERROR, or simply throwing a parse error. We argued over this case and we decided to change this E_NOTICE to an E_STRICT instead as it was argued that there is nothing wrong with doing a call-time pass by reference.

The second issue was removing support for "var" altogether in PHP 6. Now it is an alias for "public", but it will raise an E_STRICT warning. As there is no real reason why we should remove it, we agreed on simply making "var" an alias to "public" and removing the warning.

The last issue that came up under this subject is the return-by-reference of the result of "new <object-name>". First we thought that there might be some reason to keep this in case you have a factory having a list of references to already instantiated objects, but as the behaviour would be exactly the same keeping those by value we came to the conclusion that there is no reason to try to do either of the two examples:

<?php
$foo =& new StdClass();
?>

<?php
function &foo()
{
    return new StdClass();
}

$f = foo();
?>

Both of these cases should raise an E_STRICT instead.

Conclusions:

  1. Make the call-time-pass-by-reference an E_STRICT error.
  2. We make "var" an alias for "public" and remove the warning for it.
  3. Assign "new" by reference will throw an E_STRICT error.
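
As a quick check of conclusion 2, the snippet below compares a property declared with "var" to one declared with "public" using reflection. The class name Example is purely illustrative; the snippet only demonstrates that both declarations yield a public property.

```php
<?php
class Example
{
    var $legacy = 1;     // PHP 4 style declaration
    public $modern = 2;  // explicit visibility
}

$legacy = new ReflectionProperty('Example', 'legacy');
$modern = new ReflectionProperty('Example', 'modern');

// Both properties report the same (public) visibility.
var_dump($legacy->isPublic(), $modern->isPublic());
```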

2.5 zend.ze1_compatibility_mode

Issue: zend.ze1_compatibility_mode tries to keep the old PHP 4 behaviour where objects will be copied on assignment unless they are assigned by reference (and the same for passing to a function as that is assigning too). It also affects casting objects to an integer.

Discussion: This setting does not work 100%, and it was introduced to make migrating PHP 4 users to PHP 5 easier. As this is no longer an issue, we intend to remove this setting.

Conclusions:

  1. We remove the zend.ze1_compatibility_mode feature from PHP.
  2. We throw an E_CORE_ERROR when starting PHP if we detect the zend.ze1_compatibility_mode setting.

2.6 Support for Freetype 1 and GD 1

Issue: FreeType 1 and GD 1 are archaic versions of the TrueType font rendering and graphics manipulation libraries.

Discussion: As they are old versions that are no longer maintained, and the new versions are much better, we see no problem in removing support for them.

Conclusions:

  1. We remove support for Freetype 1.
  2. We remove support for GD 1.

2.7 open_basedir

Issue: open_basedir is a feature that restricts the opening of files by PHP to certain directories.

Discussion: This feature is relatively straightforward, and although it also suffers from libraries being able to work around it, we decided to keep it as it serves a useful purpose without causing headaches for users (as safe_mode does).

Conclusions:

  1. We keep the open_basedir functionality.

2.8 dl()

Issue: dl() causes many problems as we are never unloading modules. In threaded environments we already disable dl().

Discussion: The first impression was that we can remove this functionality, but there is some use for it in for example the CLI version of PHP. Instead of registering dl() in the core we will leave it up to each SAPI to register this function, as necessary. For the current SAPIs that we have we will only keep it for CLI and embed.

Conclusions:

  1. We do not remove it fully, but only enable it when a SAPI layer registers it explicitly.

2.9 CGI/FastCGI mode

Issue: The CGI/FastCGI code is messy.

Discussion: FastCGI is better than CGI but it can currently be disabled which results in messy code. We will clean up the code and always enable FastCGI for CGI SAPI.

Conclusions:

  1. Clean up the code, so that FastCGI mode can not be disabled.

2.10 Dynamic class inheritance

Issue: Dynamic class initialisation or inheritance makes things slow(er).

Discussion: It is useful to be able to do "if (...) class {...} else class {...}", although it makes classes slower as inheritance is then done at runtime and not at compile time. It also causes problems for accelerators such as APC. As it is useful and plenty of scripts use it, it would not be a good idea to remove it. It is also possible for compiler caches to detect this, and they can then throw warnings/errors if required.

Conclusions:

  1. We keep it in the engine, and leave it up to the caches to spit out warnings/errors.
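
For reference, a conditional declaration of the kind discussed looks like this. The class name Storage and the condition are illustrative; the point is only that the engine cannot know which declaration applies until the condition is evaluated at run time.

```php
<?php
// Which declaration is used is only known at run time, so the class
// cannot be bound at compile time.
if (PHP_INT_SIZE >= 8) {
    class Storage
    {
        public function describe() { return 'wide'; }
    }
} else {
    class Storage
    {
        public function describe() { return 'narrow'; }
    }
}

$s = new Storage();
echo $s->describe(), "\n";
```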

2.11 register_long_arrays, HTTP_*_VARS

Issue: register_long_arrays and the long versions of the super globals have been deprecated for some time, and do not serve a real purpose.

Discussion: The $_GET[], $_POST[], etc style superglobals are a better alternative since they are shorter and have the same behavior. The register_long_arrays option is also off by default making it less of a problem to remove this.

Conclusions:

  1. We remove the register_long_arrays setting and HTTP_*_VARS globals from PHP.
  2. We throw an E_CORE_ERROR when starting PHP if we detect the register_long_arrays setting.

2.12 old type constructors

Issue: Currently PHP 5 also supports the "old style" constructors from PHP 4, which have the same name as the class name. This makes it impossible to have a class without constructor and a method of the same name as the class (as it would be called as constructor).

Discussion: We discussed this subject and it was brought up that having a constructor with the same name as the class name could cause problems in the following code:

<?php
        class A {
                function B() {
                }
        }
        class B {
        }

        $b = new B();
?>

It was thought that this would call A::B() as constructor when instantiating class B. However, this is not the case, so there are no problems other than the one mentioned above in "Issue".

Conclusions:

  1. We keep the alternative old-style constructor.

2.13 Case sensitivity of identifiers

Issue: Case insensitivity of functions and classes is something a lot of developers want to get rid of for quite some time, as it is an inconsistent behaviour compared to variable names, which are case sensitive. It also causes interesting problems such as in bug #35050.

Discussion: Making this change outright is not a good idea, as there are plenty of people using a "wrong" case for the internal functions, such as "Header", "ImageCreate" etc., which are officially all lower case letters. This would create too much of a headache.

We are looking into how to make this change in a gradual way - perhaps one where we create upper- and lowercase aliases for the functions and do a two-phase lookup; the ideal case is to match the natural function name case on the first lookup. If that fails, then lowercase and try again; if that succeeds, emit a warning about the case mismatch. This gives us a stepping stone for implementing case-sensitivity in the future.

Conclusions:

  1. We're going to try to find a way to see how we can make this change gradually, but do not "fix" it for PHP 6.
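
The two-phase lookup described in the discussion can be sketched in user land as follows. Everything here is hypothetical: lookup_function() and the $table layout are made up for illustration and say nothing about how the engine's function table would actually be changed.

```php
<?php
// Hypothetical sketch of the two-phase lookup. $table maps the canonical
// (natural-case) function names to handlers.
function lookup_function(array $table, $name, array &$warnings)
{
    // Phase 1: exact, case-sensitive match.
    if (isset($table[$name])) {
        return $table[$name];
    }
    // Phase 2: case-insensitive retry, with a warning on success.
    foreach ($table as $canonical => $handler) {
        if (strtolower($canonical) === strtolower($name)) {
            $warnings[] = "case mismatch: '$name' should be '$canonical'";
            return $handler;
        }
    }
    return null;
}

$table = array('header' => 'h', 'imagecreate' => 'ic');
$warnings = array();
lookup_function($table, 'header', $warnings);      // exact match, no warning
lookup_function($table, 'ImageCreate', $warnings); // matches on second try
print_r($warnings);
```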

2.14 break $var

Issue: "break $var" doesn't really work and there is no real reason for this. All you can do with it is assign a number to $var and use that to break out of that many loops.

Discussion: It doesn't work and we don't see any use for it.

Conclusions:

  1. We remove support for dynamic break levels.
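
Breaking out of several loops with a constant level is unaffected by this decision. A minimal illustration, in which the function name first_match() is just an example:

```php
<?php
// Returns the first value equal to $needle, leaving both loops at once
// with a constant break level.
function first_match(array $rows, $needle)
{
    $found = null;
    foreach ($rows as $row) {
        foreach ($row as $value) {
            if ($value === $needle) {
                $found = $value;
                break 2; // constant level: exits the inner and outer loop
            }
        }
    }
    return $found;
}

var_dump(first_match(array(array(1, 2), array(3, 4)), 3));
```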

3. PECL

This section deals with moving extensions in and out of PECL and other extension related issues.

3.1 XMLReader / XMLWriter in the distribution, on by default

Discussion: XML Reader provides a simple XML parser internally based on SAX parsing, and XML Writer provides an easy API for writing XML files. Both extensions should make working with XML files a lot easier.

Conclusions:

  1. XML Reader into the core distribution and on by default
  2. XML Writer into the core distribution and on by default

3.2 Move non-PDO DB extensions to PECL

Issue: PHP 5.1 introduces PDO, an extension that unifies Database APIs. With this we do not "need" older extensions to access databases anymore.

Discussion: We can not remove the "old" extensions, as at least OCI8 and MySQLi provide a very rich set of features, which are not all supported by PDO. Some "old" extensions can probably be moved to PECL as they are either unmaintained, or superseded by PDO.

Conclusions:

  1. We decide on moving DB extensions out of the core later.

3.3 Move ereg to PECL

Issue: Currently we have two extensions dealing with regular expressions, and soon there will be a third one based on ICU.

Discussion: Currently we see some problems with the bundled ereg library in some places due to people specifying --with-regex=system. We also see distributions linking against another library than our bundled one to prevent conflicts with the apache bundled regex library, or the system's one. As most people seem to prefer linking against something else than our bundled version, it seems proper to remove this bundled library. If we remove the bundled library, then we need to make the ereg functions into an extension, otherwise we can not enable them in all cases. Some functionality in the core of PHP also uses POSIX regular expressions, those should be rewritten to use PCRE then.

Conclusions:

  1. We make ereg an extension
  2. The PCRE extension will not be allowed to be disabled.
  3. The core of PHP should be made to work with PCRE so that we can safely disable ereg
  4. We unbundle the regex library

3.4 Split ext/dba into a core extensions and sub-extensions in PECL

Discussion: Marcus wants to make ext/dba into a core extension, with all the drivers in PECL. Splitting it up into separate extensions makes it much easier to change handlers in php.ini.

Conclusions:

  1. ext/dba should be handled in the same way as PDO
  2. All the handlers stay in the distribution.

3.5 Fileinfo extension in the distribution

Issue: PHP currently doesn't have any reliable mechanism for MIME-type detection.

Discussion: The mime_magic extension doesn't work very well, and there is an extension in PECL (Fileinfo). We suggest including this extension in the core and enabling it by default, as MIME-type detection is something that most web applications need. In the meantime we want to get rid of the "mime_magic" extension in the core.

Currently the Fileinfo extension opens its database whenever you request it, and this is not very efficient. We need to change the extension so that it loads its database on MINIT, and possibly see if we can link the database into the binary instead of relying on an external file.

Conclusions:

  1. We move mime_magic from the core to PECL
  2. We move the Fileinfo extension to the core, and enable it by default.
  3. The Fileinfo extension should be updated to only load its database once on MINIT.

3.6 Other extensions to PECL?

Issue: There are some extensions in the distribution that are either unmaintained, or just not generally useful.

Discussion: We had a quick look at the current extensions in the core, but decided not to go over this and just continue the current practise of evaluating them one by one.

Conclusions:

  1. We decide on moving extensions one by one at a later time.

3.7 Fix ext/soap and add support for wsse/secext

Issue: The SOAP extension is getting more and more used, but has some limitations regarding the support for security extensions.

Discussion: The remaining issues need to be fixed, and some (though not all) support for the security extensions needs to be implemented. As the extension is useful for many things, we also decided to turn it on by default.

Conclusions:

  1. ext/soap will be turned on by default
  2. We implement some of the security extensions.

3.8 Allow files with an open stream to be deleted

Issue: Currently it is not possible to delete opened files, and this feature is requested.

Discussion: On Unix this is not a problem, you can simply unlink() the file. On Windows however this is not possible as Windows simply prohibits a file from being deleted when it is open.

Conclusions:

  1. Wez is going to check for a way on how to make this possible on Windows.

3.9 ext/bitset

Issue: ext/bitset is a tiny extension that allows you to do operations on bitsets. It is requested to be put in the core distribution.

Discussion: The extension doesn't rely on any libraries, and is deemed useful enough to put in the core. When we went over the code and tried to compile it we noticed a lot of CS differences compared to our published standards, and there were failed tests.

Conclusions:

  1. We add it to the core distribution only if the above mentioned problems are solved.

4. Engine Additions

4.1 Add a 64bit integer

Issue: Being limited to a signed 32 bit integer is becoming more and more of a nuisance, hence this suggestion for adding a 64 bit integer type.

Discussion: The first idea was to make our current integer into a 64bit version, but that can cause unwanted changes in behaviour that are very hard to detect. We can also not restrict the current integer type to 32bits for the same reason. We do need a new 64bit type, and we will be adding that as a new variable type. The current integer we leave alone, so that it is a 32bit integer on 32bit platforms, and a 64bit integer on 64bit platforms.

Conclusions:

  1. We leave the current integer type alone
  2. We add a new 64bit integer that is always 64bits regardless of platform
  3. The cast name for this new type is (int64) and internally we use IS_INT64, RETURN_INT64 etc.
  4. We do not add a specialised 32bit only integer type
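
The platform dependence of the existing integer type can be observed with the PHP_INT_SIZE and PHP_INT_MAX constants. The snippet below is only a way to inspect the current behaviour; it does not use or emulate the proposed (int64) type.

```php
<?php
// PHP_INT_SIZE is the width of the native integer in bytes: 4 on 32bit
// builds and 8 on typical 64bit builds.
printf("integer width: %d bits\n", PHP_INT_SIZE * 8);

// Going past the largest native integer silently produces a float, so
// results differ between platforms once values exceed 32 bits.
$next = PHP_INT_MAX + 1;
var_dump(is_int(PHP_INT_MAX), is_float($next));
```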

4.2 Adding "goto"

Issue: Goto is currently missing in PHP, and although there is a limited use for this construct in some cases it can reduce the amount of code a lot.

Discussion: There are some inherent problems with implementing goto, as jumping into a foreach() loop will be almost impossible because something is initialised at the start of the loop. The same is most likely true for other loop constructs.

As goto will most often be used to jump out of nested if statements, we think that restricting the construct so that you can only jump out of a construct is possible. Similarly restricting the construct so that you can only jump down should satisfy people who do not want the ability to jump all over the place.

The name "goto" is misleading, and often associated with BAD THINGS(tm). Because our proposed solution is not a real GOTO construct, we will instead reuse the "break" keyword, and extend it with a static label.

An example of using a labeled break:

<?php
for ($i = 0; $i < 9; $i++)
{
        if (true) {
                break blah;
        }
        echo "not shown";
blah:
        echo "iteration $i\n";
}
?>

Conclusions:

  1. We extend "break" by allowing breaking to a label.
  2. We ask Sara to make a patch for this, and we decide once we see what it looks like.

4.3 ifsetor() as "replacement" for $foo = isset($foo) ? $foo : "something else"

Issue: Many people requested the "ifsetor()" operator that can set a variable's default value if it was not set before, akin to:

<?php
// If $_GET['foo'] is set, then its value will be assigned to $foo,
// otherwise 42 will be assigned to $foo.
$foo = ifsetor($_GET['foo'], 42);
?>

Discussion:

The name for this new operator is heavily disputed and we could not agree on a decent name for it. As this operator is most often used for setting default values for input variables we do need some kind of functionality here.

Instead of implementing ifsetor() we remove the requirement for the "middle" parameter to the ?: operator. The middle parameter then defaults to the first one. If the first parameter is not set, then we will still throw an E_NOTICE. An example on how that might work:

<?php
// Evaluates to $_GET['foo'] if it is set and evaluates to true. It evaluates
// to 42 if $_GET['foo'] is not set (with a notice) or evaluates to false.
$foo = $_GET['foo'] ?: 42;

// Evaluates to "true" if $blå equals 42 and it evaluates to 54 otherwise.
$blå = $blå == 42 ?: 54;

$bar = bar() ?: 9;
?>

In combination with the new input_filter extension you then reach the original goal of setting a default value to a non-set input variable with:

<?php
$blahblah = input_filter_get(GET, 'foo', FL_INT) ?: 42;
?>

If the input filter's logical filters (prefixed with FL) do not detect the correct type, the value will be false. If it's false, then the above expression assigns 42 to $blahblah.

Conclusions:

  1. We drop the middle value for the ?: operator.
  2. We did not agree on the implementation of ifsetor().
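
The behaviour of the shortened ?: form can be checked with a small helper in PHP versions that support it. The function name pick() is illustrative; it simply wraps the operator so the cases are easy to compare.

```php
<?php
// The left operand is returned when it evaluates to true, otherwise the
// right operand is returned.
function pick($value, $default)
{
    return $value ?: $default;
}

var_dump(pick('7', 42));   // a non-empty string other than "0" is kept
var_dump(pick(0, 42));     // 0 evaluates to false, so the default is used
var_dump(pick(false, 42)); // a failed filter result falls back as well
```

Note that this is a truthiness test, not an isset() test: a legitimately submitted value such as "0" is replaced by the default too.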

4.4 Allow foreach syntax for multi-dimensional arrays

Issue: There was a suggestion to allow the following construct:

foreach( $array as $k => list($a, $b))

Discussion: Currently this has to be implemented with the following code:

<?php
$array = array(array(1, 2), array(3, 4));
foreach ($array as $k => $v) {
        list($a, $b) = $v;
}
?>

So it is not really required to implement this functionality, but we deemed it useful enough to include this new syntax in PHP 6. This means that the above example can now be written as:

<?php
$array = array(array(1, 2), array(3, 4));
foreach ($array as $k => list($a, $b)) {
}
?>

Conclusions:

  1. We add this syntax, and Andrei prepares a patch.

4.5 Cleanup for {} vs. []

Issue: Currently you can use both {} and [] to access both a certain character in a string and array elements. The suggestion is to make {} only work on strings and add substr() functionality to it, and to make [] only work on arrays.

Discussion: Although we deprecated (through the manual) the use of [] for string indexes, a lot of people still do not follow this advice. Internally there is absolutely no difference between {} and []. Having two syntaxes for the same thing makes no sense, and getting rid of [] would break all sorts of stuff. The original reason for the {} was a technical one to simplify the parser, but the landscape has changed and that reason no longer exists.

As far as code readability and obviousness go, I doubt anybody would guess their way to the $str{5} syntax. If you were new to PHP and you were going to try to guess how you would get a character offset in a string, your first guess for reading characters from a string would be []. Removing the obvious syntax just doesn't make any sense. The other place {} is used outside of control blocks is in quoted strings, where "{$foo{1}}" is much uglier than "{$foo[1]}".

Because having two syntaxes doing exactly the same does not make any sense either, we agreed on deprecating the {} syntax in 5.1 with an E_STRICT, and removing it in PHP 6 altogether.

Conclusions:

  1. We will undeprecate [] for accessing characters in strings.

  2. {} will be deprecated in PHP 5.1.0 with an E_STRICT and removed in PHP 6.

  3. For both strings and arrays, the [] operator will support substr()/array_slice() functionality:

    • [2,3] is elements (or characters) 2, 3, 4
    • [2,] is elements (or characters) 2 to the end
    • [,2] is elements (or characters) 0, 1, 2
    • [,-2] is from the start until the last two elements in the array/string
    • [-3,2] this is the same as substr and array_slice()
    • [,] doesn't work on the left side of an assignment.

    With these rules, the behaviour for strings will be:

    • $str = "foo"; $str[] = "d"; is changed to perform a concatenation, giving "food".
    • $str = "fo"; $str[] = "od"; will concatenate to "food".
    • $str = ""; $str[] = "d"; should become the string "d"; this should raise an E_STRICT in 5.1.1. We need to check how common this is first.
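
The proposed slice syntax does not exist yet, so the sketch below only shows the substr() and array_slice() calls that the slices would presumably map to, assuming [start,length] is passed through unchanged as (start, length). The variable names are illustrative.

```php
<?php
$str = "abcdef";
$arr = array('a', 'b', 'c', 'd', 'e', 'f');

// Proposed $str[2,3]: three characters starting at offset 2.
$mid = substr($str, 2, 3);            // "cde"
// Proposed $str[2,]: from offset 2 to the end.
$tail = substr($str, 2);              // "cdef"
// Proposed $str[,-2]: everything except the last two characters.
$head = substr($str, 0, -2);          // "abcd"

// The same slices on an array use array_slice() with identical arguments.
$arrMid = array_slice($arr, 2, 3);    // array('c', 'd', 'e')
$arrHead = array_slice($arr, 0, -2);  // array('a', 'b', 'c', 'd')

// Single-character access with [] already works on strings.
$third = $str[2];                     // "c"

echo $mid, " ", $tail, " ", $head, " ", $third, "\n";
```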

4.6 Changes to the shut-up (@) operator that disallow (@ini_set(...))

Issue: The @ operator is very slow.

Discussion: If we do not require edge cases like @ini_set("error_reporting", E_ALL); to work correctly, we can make it much faster. Ilia and Marcus already had a patch for that.

Conclusions:

  1. We check with Andi if he has a valid reason to not accept that patch.

4.7 Allow foreach() without "as" part (I guess for iterators)

Issue: In some cases with Iterators you might not need the "as $varname" part in the foreach() statement.

Discussion: This is an edge case, and it does not make sense to add this to the language. It can be much better implemented with a function (such as splforeach()) which allows this behaviour.

Conclusions:

  1. We do not want to add it.
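
A function of the kind mentioned in the discussion could look roughly like the sketch below. The name drain_iterator() is hypothetical (it is not the splforeach() function itself); it simply walks an Iterator to its end without binding the current element to a variable.

```php
<?php
// Hypothetical helper: advances an iterator to its end without an
// "as $varname" part. Returns the number of steps taken.
function drain_iterator(Iterator $it)
{
        $steps = 0;
        for ($it->rewind(); $it->valid(); $it->next()) {
                $steps++;
        }
        return $steps;
}

$it = new ArrayIterator(array('a', 'b', 'c'));
echo drain_iterator($it), "\n";
```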

4.8 Named Parameters

Issue: The functionality of named parameters was suggested. Named parameters allow you to "skip" certain parameters to functions. If it would be implemented, then it might look like:

<?php
function foo ($a = 42, $b = 43, $c = 44, $d = 45)
{
        // echoes 42, 53, 54, 45
        echo "$a $b $c $d\n";
}

foo(c => 54, b => 53);
?>

Discussion: We don't see the real need for named parameters, as they seem to violate PHP's KISS principle. It also makes for messier code.

Conclusions:

  1. We do not want to add it.
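
For comparison, the same "skip some parameters" effect can already be approximated with an associative array. This is only a sketch of one common userland pattern, not something that was discussed at the meeting; the function mirrors the foo() example above.

```php
<?php
// Defaults are merged with the caller-supplied keys, so any subset of
// "parameters" can be given in any order.
function foo(array $args = array())
{
        // The + operator keeps existing keys and only adds missing ones.
        $args += array('a' => 42, 'b' => 43, 'c' => 44, 'd' => 45);
        return "{$args['a']} {$args['b']} {$args['c']} {$args['d']}";
}

echo foo(array('c' => 54, 'b' => 53)), "\n";
```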

4.9 Make parameter order consistent over all functions

Issue: One point that people find annoying in PHP is the inconsistent order of function parameters. Because there is no consistent order, they always have to use the manual to see what the order is.

Discussion: We went over the string functions and found that there are only two functions that have "needle, haystack" instead of "haystack, needle", namely in_array() and array_search(). For in_array() it makes logical sense to work in the same way as SQL, where you first specify the value and then check whether it fits "in the array". As array_search() was modelled on in_array(), the parameter order is the same.

As there are not many inconsistencies, and changing them would cause quite some problems for current applications we decided not to change the order.

Conclusions:

  1. We do not change parameter ordering for internal functions.
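
The two orderings discussed above, side by side (a small illustration; the variable names are arbitrary):

```php
<?php
$haystack = "hello world";
$list = array('apple', 'pear');

// String functions: haystack first, needle second.
$pos = strpos($haystack, "world");       // int 6

// in_array() and array_search(): needle first, haystack second.
$found = in_array('pear', $list);        // true
$key = array_search('pear', $list);      // int 1

var_dump($pos, $found, $key);
```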

4.10 Minor function changes: microtime()

Issue: It was suggested that microtime(true) become the default behaviour. Currently if you pass no parameters the microtime function returns the current time as "microseconds <space> unix_timestamp".

Discussion: As you usually would want to have the full floating point number back, many people use the following snippet (and perhaps even wrap that in a function):

<?php
$m = microtime();
$e = explode(' ', $m);
echo $e[0] + $e[1], "\n";
?>

We want to change the behaviour to return a normal float straight away (which you can now do by passing "true" as first parameter). The following snippet:

<?php
$m = microtime(true);
echo $m, "\n";
$e = explode(' ', $m);
echo $e[0] + $e[1], "\n";
?>

Throws only a notice, while the result is still correct. As it's only a notice, we feel safe enough to change the default behaviour to return a float. We do need to investigate what happens if any of the following values are passed though: none, null, false and true.

Conclusions:

  1. We will change the default behaviour of microtime() to return a float.
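
The wrapper function that many people write today might look like the sketch below; the name microtime_float() is hypothetical. It is shown next to microtime(true), which returns the float directly.

```php
<?php
// Hypothetical wrapper: combines the two parts of the string form
// "microseconds <space> unix_timestamp" into a single float.
function microtime_float()
{
        list($usec, $sec) = explode(' ', microtime());
        return (float) $usec + (float) $sec;
}

$old = microtime_float();
$new = microtime(true);   // float form, no parsing needed

var_dump(is_float($old), is_float($new));
```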

5. Changes to OO functionality

5.1 "function require __construct(" to force calling the parent's constructor

Issue: Some extensions such as PDO allow their classes to be inherited. The constructors of those inherited classes are required to call the extension class' constructor though as that one needs to initialise the internal structures. Currently there is no way in the engine to require this.

Discussion: In order to address this issue we need to add a flag internally that tells the engine to bail out if methods are called while the extension's constructor has not been called yet. For this to work, we need to add the flag to the bottom-most class in the hierarchy that is still an internal class, and add an additional pointer to the class structure that points to the constructor that should be called.

Conclusions:

  1. We add a flag to the class structure to record this
  2. We do not add new syntax for this to userland
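
The engine-level check has no userland equivalent, but its intended effect can be imitated in plain PHP. In the sketch below, BaseResource is a hypothetical stand-in for an extension class such as PDO, and the $initialised property plays the role of the internal flag.

```php
<?php
// Hypothetical stand-in for an extension class whose constructor must run.
class BaseResource
{
        private $initialised = false;

        public function __construct()
        {
                $this->initialised = true;
        }

        public function query()
        {
                // Userland analogue of the proposed engine check.
                if (!$this->initialised) {
                        throw new LogicException("parent constructor was not called");
                }
                return "ok";
        }
}

class GoodChild extends BaseResource
{
        public function __construct()
        {
                parent::__construct();
        }
}

class BadChild extends BaseResource
{
        public function __construct()
        {
                // Forgets to call parent::__construct().
        }
}

$good = new GoodChild();
echo $good->query(), "\n";

try {
        $bad = new BadChild();
        $bad->query();
} catch (LogicException $e) {
        echo $e->getMessage(), "\n";
}
```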

5.2 Allow interfaces to specify the __construct() signature

Issue: Currently it is not possible to define a __construct() signature in an interface.

Discussion: We didn't see a reason why this shouldn't be allowed, but Andi seems to have a reason for it.

Conclusions:

  1. Zeev asks Andi why he doesn't want constructors in the interface. If there is no sound reason we add this possibility.
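
If the possibility is added, an interface that fixes the constructor signature might look like the sketch below. The names Configurable and Service are hypothetical, and the example assumes a PHP version that accepts __construct() in an interface.

```php
<?php
// Hypothetical interface that prescribes how implementations are constructed.
interface Configurable
{
        public function __construct(array $options);
}

class Service implements Configurable
{
        public $options;

        // Must be compatible with the signature declared in the interface.
        public function __construct(array $options)
        {
                $this->options = $options;
        }
}

$service = new Service(array('debug' => true));
var_dump($service instanceof Configurable);
```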

5.3 Implement inheritance rules for type hints

Issue: Currently we don't check inheritance rules for type-hinted parameters.

Discussion: Marcus explains with an example how inheritance rules for type-hinted parameters should work, and also mentions that most probably no language currently implements this correctly. This is not a very important check, and therefore we see no reason why we should implement this either.

Conclusions:

  1. We are not going to add the checks.

5.4 Late static binding using "this" without "$" (or perhaps with a different name)

Issue: Currently, the following script will print "A::static2":

<?php
        class A {
                static function staticA() {
                        self::static2();
                }

                static function static2() {
                        echo "A::static2\n";
                }
        }

        class B extends A {
                static function static2() {
                        echo "B::static2\n";
                }
        }

        B::staticA();
?>

Discussion: Currently there is no way to do "runtime evaluation" of static members so that we can call B::static2() from A::staticA(), and this is a useful feature. In order to implement this we need a new keyword. As we do not want to introduce yet another reserved word, the re-use of "static" was suggested for this.

The same example, but now with the call to "self::static2()" replaced with "static::static2()", will then print "B::static2".

Conclusions:

  1. We re-use the "static::" keyword to do runtime evaluation of statics.
  2. Marcus prepares an implementation suggestion.
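
A sketch of the example from the issue rewritten with "static::", assuming a PHP version in which this form of late static binding is available. The methods return their strings instead of echoing them so that the result is easy to inspect.

```php
<?php
class A {
        static function staticA() {
                // "static" resolves to the class the call was made on.
                return static::static2();
        }

        static function static2() {
                return "A::static2";
        }
}

class B extends A {
        static function static2() {
                return "B::static2";
        }
}

echo B::staticA(), "\n";
echo A::staticA(), "\n";
```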

5.5 Object casting to primitive types

Issue: PHP does not support a call-back when an object is cast to another (scalar) type.

Discussion: As PHP is a weakly typed language, this kind of functionality does not make sense in PHP. We only leave the __toString() method, which is called on a (string) cast. In PHP 5.1 the following already gives notices on the (int) and (double) casts, while the __toString() method is correctly called for the (string) cast:

<?php
        class a {
                function __toString() {
                        return "string";
                }
        }

        $a = new a;
        echo (int) $a, "\n";
        echo (bool) $a, "\n";
        echo (string) $a, "\n";
        echo (float) $a, "\n";
?>

Conclusions:

  1. We will not add magic call-back functions for other casts.

5.6 name spaces

Issue: PHP currently has no name spaces, which some people find inconvenient as they are required to prefix all their classes with a unique prefix.

Discussion: First we briefly discussed the current name space patch, but as we were not all familiar with its workings we did not go into deep detail for this. Then we saw an alternative implementation of name spaces with "Modules". This is an example of how this should work:

<?php
import M1 as M2;
echo M2::$var,"\n";
echo M2::c,"\n";
echo M2::func(),"\n";
echo M2::C::func(),"\n";
var_dump(new M2::C);
?>

M1.php:

<?php
module M1 {
        var $var = "ok";
        const c = "ok";
        function func() { }

        class C {
                static function func() { return "ok"; }
                static private function bug() { echo "bug\n"; }
        }

        private class FOO {
                public class BAR {
                        static function bug() { echo "bug\n"; }
                }
        }

        function bar() { return new M1::FOO(); }
}
?>

This approach suffers from a few problems:

  • When calling you still have to prefix all your classes.
  • You are forced into a specific naming scheme for your modules.

After the modules, we came up with some implementation guidelines on how we would like to see support for name spaces and decided we would only introduce them if the following rules could be implemented:

  • Implement a "name space" keyword that you can wrap around a class definition with {}.

  • Internally this adds <namespace-name> to the class names defined inside it separated by a separator. The following example would create the class "spl<separator>file":

    <?php
    namespace spl {
            class file {
            }
    }
    ?>
    
  • The suggested separator is "\" as this is the only free choice.

  • import will be request-wide, and the import keyword copies each class entry to its new name

  • If we encounter a conflict due to importing we abort execution

  • "import spl\*" will copy all classes in the spl name spaces to the "normal" namespace which doesn't have a prefix.

  • Functions in name spaces are allowed.

  • Constants in name spaces are allowed unless we find problems with the implementation.

  • No variables are allowed in name spaces.

Conclusions:

  1. If we're going to do this, the name spaces look like above.
  2. Marcus is going to provide a patch.
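
For illustration only, the sketch below uses the bracketed "namespace" form together with the "\" separator as they exist in later PHP releases. Note the assumptions: the "use" keyword stands in for the "import" described above and creates an alias rather than copying the class entry, and the names demo, file and DemoFile are hypothetical.

```php
<?php
namespace demo {
        class file {
                static function name() {
                        // __CLASS__ includes the name space prefix and separator.
                        return __CLASS__;
                }
        }
}

namespace {
        // Makes demo\file reachable under a local name without a prefix.
        use demo\file as DemoFile;

        echo DemoFile::name(), "\n";
}
```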

5.7 Using an undefined property in a class with defined properties should throw a warning

Issue: Current PHP will not throw any warning with the following code, and will just create a new property:

<?php
class foobar {
        public $supercalifragilisticexpialidoceous;

        function rød() {
                $this->supercalifragilistcexpialidoceous = 42;
        }
}

$foo = new foobar;
$foo->rød();
?>

This makes debugging of code harder.

Discussion: Just like with normal variables, you don't have to initialise properties. This is a feature of the language, and is used a lot in projects. A solution would be to mark a class as "strict" but that would introduce a new keyword and is against the KISS approach of PHP.

Conclusions:

  1. We will not start throwing any notice for this.
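
Classes that do want this strictness can opt in themselves with the existing __set() hook. The sketch below is one possible userland approach, not something decided at the meeting; the property names are illustrative.

```php
<?php
class foobar {
        public $known;

        // Called only when writing to a property that is not declared
        // (or not accessible), so typos end up here.
        public function __set($name, $value) {
                throw new LogicException("Undefined property: $name");
        }
}

$foo = new foobar;
$foo->known = 42;          // declared, assigned normally

try {
        $foo->knwon = 42;  // typo, rejected by __set()
} catch (LogicException $e) {
        echo $e->getMessage(), "\n";
}
```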

5.8 Type-hinted properties and return values

Issue: PHP only supports type hints for arguments, not for return values or properties.

Discussion:

We quickly agreed that we don't need type-hinted properties, as it would cause problems when they are assigned to other variables and it's just generally not-PHP style.

For return values it does make some sense, but definitely not as much as type-hinted arguments to functions. One discussion point was how to tell the parser the return type of a function; we came up with the following suggestions for syntax (where ObjectName is the type-hint):

  1. function ObjectName &funcname();
  2. function &ObjectName funcname();
  3. function &funcname ObjectName();
  4. ObjectName function &funcname();
  5. function &funcname() returns ObjectName;

Conclusions:

  1. We do not allow type-hinted properties as it's not the PHP way.
  2. We will add support for type-hinted return values.
  3. We need to pick a syntax for type-hinted return values.

5.10 Method calls

Issue: Currently you can call methods both statically and dynamically, whether they are marked as static or not:

<?php
class gren {
        static function grenStatic($a) { echo "$a - static function\n"; }
        function grenDynamic($a) { echo "$a - dynamic function\n"; }
}

gren::grenStatic("static call");
gren::grenDynamic("static call");

$gren = new gren;
$gren->grenStatic("dynamic call");
$gren->grenDynamic("dynamic call");
?>

Discussion:

The second call will now throw an E_STRICT warning, and as it is dangerous we decided to make this an E_ERROR instead.

Conclusions:

  1. We will make calling a dynamic function with the static call syntax a fatal error (E_ERROR).
  2. We will not disallow calling a static member with dynamic syntax.

5.11 ReflectionClass cache in zend_class_entry* and support "$this::class"

Issue: Reflection is quite slow

Discussion: We don't really care if this is cached, as it will only be done when reflection is used. In this case things are sped up a bit.

Conclusions:

  1. We move the reflection code to its own extension.
  2. Marcus implements the ReflectionClass cache in struct zend_class_entry*.

5.12 Delegates

Issue: PHP does not support delegates, but requires you to implement "delegation" yourself.

Discussion: For some interfaces it is useful to have "delegators" so that you don't have to implement the functions to call delegators yourself. We did not see any real-world code example, but basically this is what the "delegate" keyword would do:

<?php
interface IF {
        function f();
        function g();
}

class whatever implements IF {
        // Generate default delegator functions
        delegate IF $if;

        function __construct(IF $x) {
                $this->if = $x;
        }

        /* Generated automatically internally:
        function f() {
                $this->if->f();
        }
        function g() {
                $this->if->g();
        }
        */
}
?>

Conclusions:

  1. We are not going to implement this.
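
Implementing delegation yourself, as PHP requires today, can be done generically with __call(). The sketch below uses hypothetical names (Greeter, RealGreeter, Wrapper) and is only one way to do it.

```php
<?php
interface Greeter {
        function f();
        function g();
}

class RealGreeter implements Greeter {
        function f() { return "f"; }
        function g() { return "g"; }
}

// Forwards any undefined method to the wrapped object. Unlike the proposed
// keyword, the wrapper cannot itself declare "implements Greeter", because
// __call() does not satisfy the interface's method declarations.
class Wrapper {
        private $target;

        function __construct(Greeter $target) {
                $this->target = $target;
        }

        function __call($name, $args) {
                return call_user_func_array(array($this->target, $name), $args);
        }
}

$w = new Wrapper(new RealGreeter());
echo $w->f(), $w->g(), "\n";
```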

6 Additions

6.1 Add an opcode cache to the distribution (APC)

Issue: Many people are requesting an opcode cache in the default distribution of PHP, as it boosts performance quite a lot.

Discussion: Rasmus suggested putting an opcode cache into PHP, and after a quick discussion we found that the only alternative, license-wise, is APC. A few concerns were raised on whether it should be enabled by default and whether other opcode caches could still be used. Enabling by default is not possible because some configuration needs to be done for the cache.

Conclusions:

  1. We include APC in the core distributions
  2. APC will not be turned on by default.
  3. APC will switch to mmap as default shared memory storage.

6.2 Merge Hardened PHP patch into PHP

Issue: The Hardened PHP patch adds a number of extra checks to PHP to make things more secure.

Discussion: We went over the features that the patch offers, and discussed whether we might want to include them in stock PHP. One of the points that came up was the allow_url_fopen setting we currently have in PHP. Many ISPs disable it for sound security reasons concerning remote paths with include(), but unfortunately by turning this setting off they are also turning off the possibility to use, for example, fopen("http:..."). This is why we want to split this option into two settings.

Conclusions:

  1. We want to include the patch's real-path fix.
  2. We want to include the protection against HTTP Response Splitting attacks (header() shouldn't accept multiple headers in one call).
  3. We split allow_url_fopen into two distinct settings: allow_url_fopen and allow_url_include. If allow_url_fopen is off, then allow_url_include will be off too.
  4. We enable allow_url_fopen by default
  5. We disable allow_url_include by default
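
The dependency between the two settings can be expressed as a small check. This is a sketch only: the helper names ini_flag() and remote_include_allowed() are hypothetical, and it assumes both settings are readable through ini_get().

```php
<?php
// Reads a boolean ini setting. ini_get() returns a string, or false
// when the setting does not exist in this build.
function ini_flag($name)
{
        $value = ini_get($name);
        if ($value === false) {
                return false;
        }
        return in_array(strtolower($value), array('1', 'on', 'yes', 'true'), true);
}

// Remote include is only possible when both settings are on.
function remote_include_allowed()
{
        return ini_flag('allow_url_fopen') && ini_flag('allow_url_include');
}

var_dump(ini_flag('allow_url_fopen'), remote_include_allowed());
```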

6.3 Sand boxing or taint mode

Issue: PHP does not have support for a sand boxed environment.

Discussion: We discussed a taint mode where input variables have to be untainted before use, but this is a moot point as we would need different contexts (SQL, output...) and this cannot be checked without knowing the application. Taint mode is therefore not overly useful in PHP.

Sand boxing might be an option, but we need a good plan and a very solid patch if we even want to consider including it into PHP.

Conclusions:

  1. No taint mode
  2. Only sand boxing if we have a rock solid implementation

6.4 All non-fatal errors should be marked in extensions as E_RECOVERABLE_ERROR

Issue: Currently many extensions use E_ERROR if something goes wrong, which stops the execution of the script immediately, even when this is not really required.

Discussion: PHP 6 (Head) already includes a new error level "E_RECOVERABLE_ERROR" that can be used instead of E_ERROR to signal a severe error that requires handling with a user defined error handler; otherwise it aborts the script. This should be used by the engine when it can still recover from the error, while E_ERROR should be reserved for cases where the engine is in a definitely unstable state.

Extensions should be using E_WARNING for cases where something goes wrong, unless they can really not continue after an error. In this case an E_RECOVERABLE_ERROR error should be used. We need to go over all extensions and engine and fix the error levels according to the policy.

Conclusions:

  1. We go over the engine and extensions and make sure E_ERROR is only used where the engine is in an unrecoverable state.

6.5 All non-fatal errors should become exceptions

Issue: PHP currently does not throw exceptions for notices and warnings.

Discussion: Nothing internally throws an exception and it is hard to figure out which error level should throw an exception or not. Besides this, turning your favourite error level into an exception can already easily be done with the following snippet:

<?php
function error_handler($errorType, $message)
{
        if ($errorType == E_NOTICE) {
                throw new Exception( $message, $errorType);
        }
}

set_error_handler('error_handler');

// Throws a notice
echo $new;
?>

Conclusions:

  1. We are not going to make exceptions out of any error level.
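
A more general variant of the snippet above wraps every reported error in the built-in ErrorException class, which keeps the original severity, file and line. This is a sketch of one possible handler (the name exception_error_handler is arbitrary); it uses trigger_error() so that the example does not depend on how a particular PHP version classifies undefined variables.

```php
<?php
// Converts every reported error into an ErrorException.
function exception_error_handler($severity, $message, $file, $line)
{
        throw new ErrorException($message, 0, $severity, $file, $line);
}

set_error_handler('exception_error_handler');

try {
        trigger_error("something minor", E_USER_NOTICE);
} catch (ErrorException $e) {
        echo get_class($e), ": ", $e->getMessage(), "\n";
}

// Put the previous handler back so later code is not affected.
restore_error_handler();
```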

6.6 E_STRICT on by default

Issue: PHP's E_STRICT error level is meant to point users to language level warnings/errors. E_STRICT is currently not part of E_ALL and thus often those E_STRICT messages will be hidden from users.

Discussion: We want to expose the language level warnings a bit more, and having all error levels in E_ALL except E_STRICT is confusing, so we will be adding E_STRICT to E_ALL. As the current default is E_ALL & ~E_NOTICE, we will effectively turn on E_STRICT by default.

Conclusions:

  1. We add E_STRICT to E_ALL
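
The bit arithmetic behind this can be sketched as follows. Any level that is part of E_ALL and not masked out is reported, which is why adding E_STRICT to E_ALL turns it on under the default setting. The helper is_reported() is hypothetical, and the sketch uses E_WARNING and E_NOTICE so that it does not depend on whether E_STRICT is part of E_ALL in the PHP version at hand.

```php
<?php
// Returns true when $level would be reported under the mask $reporting.
function is_reported($level, $reporting)
{
        return ($reporting & $level) === $level;
}

// The default discussed above: everything in E_ALL except notices.
$default = E_ALL & ~E_NOTICE;

var_dump(is_reported(E_WARNING, $default)); // bool(true)
var_dump(is_reported(E_NOTICE, $default));  // bool(false)
```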

6.7 Remove support for <?, <% and <script language="PHP"> and add "<?php =$var?>"

Issue:

Discussion:

Conclusions:

  1. We kill "<%" but keep "<?".
  2. Jani will prepare a patch that disallows mixing different open/close tags.
  3. We will not add "<?php =".

6.8 Rewrite build system

Issue: The current build system is fine, but has some annoyances such as the requirement to use config.m4 files for configuring parts of PHP.

Discussion: The current stuff works well, except for some annoyances like still requiring autoconf-2.13 and m4. We see no reason to actively start working on a new build system, but if there is a good new idea and somebody who wants to implement it we might have a look at it.

Conclusions:

  1. No active changes.
  2. We might want to look at it if there is a solid plan and a volunteer to implement it.

6.9 Added persistent flag to zval struct

Issue: It is impossible to allocate persistent zvals in PHP.

Discussion: We had support for this before, but it was removed because nothing was using it. The idea is to add this functionality back after figuring out the best way to do this. There are two possible implementations:

  1. Use a specific memory block list for persistent zvals.
  2. Use a different memory allocator if the flag is set.

Conclusions:

  1. We need to find a good implementation suggestion.

6.10 Read-only properties

Issue: It is impossible for extensions to provide read-only properties to user-land.

Discussion: Some extensions provide read-only data in properties, but the engine API does not support this. Therefore all extensions that do this are slower than necessary. If we add support for this, some extensions can be improved as they can directly map a property to a memory element as returned by an extension's internals.

We also discussed whether we should expose this to user land, but if we do then we need to find a way to set the values of the read-only properties in the first place.

Conclusions:

  1. Marcus prepares a patch to add ZEND_ACC_READONLY