# A special-character problem with simplexml_load_string

Today my boss asked me to fix a bug: simplexml\_load\_string could not parse an XML string correctly because the XML file contained some special characters.

Here is the simple code I used to test it:

```php
$xml_file = '139_1356677622_o1_13.jpg.xml';
if (file_exists($xml_file)) {
    $xml = file_get_contents($xml_file);
    $xml = simplexml_load_string($xml);
    print_r($xml);
}
```

After running it, PHP shows warnings like the ones below:

```text
Warning: simplexml_load_string() [function.simplexml-load-string]: Entity: line 36: parser error : Input is not proper UTF-8, indicate encoding
```

and

```text
Warning: simplexml_load_string() [function.simplexml-load-string]: Entity: line 56: parser error : CData section not finished
```

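I thought about replacing the special characters, but what counts as a special character? How do we know which characters simplexml\_load\_string/simplexml\_load\_file cannot parse?

After searching the internet, I found [the definition of valid characters in the XML 1.0 specification](http://www.w3.org/TR/2004/REC-xml-20040204/#NT-Char) and, luckily, [the code of eZ Publish](http://pubsvn.ez.no/doxygen/3.8/html/ezxml_8php_source.html) too :D Among the control characters below U+0020, the specification allows only tab, line feed, and carriage return, so eZ Publish strips the rest: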

```php
$xmlDoc = preg_replace('/[\x00-\x08\x0b-\x0c\x0e-\x1f]/', '', $xmlDoc);
```
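To see what this pattern actually removes, here is a small sketch. The helper name `strip_invalid_xml_chars` and the sample string are my own, made up for illustration; only the regular expression comes from the eZ Publish code above.

```php
<?php
// Hypothetical helper: removes the control characters that XML 1.0 does not allow.
function strip_invalid_xml_chars(string $xml): string
{
    // Tab (\x09), line feed (\x0a), and carriage return (\x0d) are not in the
    // character class, so they are left alone.
    return (string) preg_replace('/[\x00-\x08\x0b-\x0c\x0e-\x1f]/', '', $xml);
}

// Sample input with three forbidden bytes: \x01, \x0b, and \x1f.
$dirty = "<note>\x01Hello\x0b\tworld\x1f\n</note>";
$clean = strip_invalid_xml_chars($dirty);

var_dump($clean === "<note>Hello\tworld\n</note>"); // bool(true)
```

The pattern works byte by byte, which is safe for UTF-8 text because every byte of a multi-byte UTF-8 sequence is 0x80 or higher and so never falls in the stripped range.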

So my code becomes:

```php
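$xml_file = '139_1356677622_o1_13.jpg.xml';
if (file_exists($xml_file)) {
    $xml = file_get_contents($xml_file);
    // utf8_encode() assumes the input is ISO-8859-1 and converts it to UTF-8.
    $xml = utf8_encode($xml);
    // Strip control characters that XML 1.0 does not allow (tab, LF, and CR are kept).
    $xml = preg_replace('/[\x00-\x08\x0b-\x0c\x0e-\x1f]/', '', $xml);
    $xml = simplexml_load_string($xml);
    print_r($xml);
}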
```

It works! The problem is solved :-)
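
One caveat: utf8\_encode treats its input as ISO-8859-1, so a file that is already UTF-8 would be encoded twice, and the function is deprecated as of PHP 8.2. Below is a rough sketch of a more defensive variant, not a definitive fix. It assumes the mbstring extension is available and that any non-UTF-8 input is ISO-8859-1. The helper name `load_xml_leniently` and the sample data are made up for illustration.

```php
<?php
// Hypothetical helper; the name and the ISO-8859-1 fallback are assumptions.
function load_xml_leniently(string $xml)
{
    // Convert only when the bytes are not already valid UTF-8,
    // so UTF-8 input is not encoded twice.
    if (!mb_check_encoding($xml, 'UTF-8')) {
        $xml = mb_convert_encoding($xml, 'UTF-8', 'ISO-8859-1');
    }

    // Same control-character filter as in the code above.
    $xml = (string) preg_replace('/[\x00-\x08\x0b-\x0c\x0e-\x1f]/', '', $xml);

    // Collect parser errors instead of letting libxml emit warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = simplexml_load_string($xml);
    if ($doc === false) {
        foreach (libxml_get_errors() as $error) {
            error_log(sprintf('XML error on line %d: %s', $error->line, trim($error->message)));
        }
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // SimpleXMLElement on success, false on failure.
    return $doc;
}

// ISO-8859-1 "é" (\xe9) followed by a stray control byte (\x01).
$latin1 = "<name>caf\xe9\x01</name>";
$doc = load_xml_leniently($latin1);
```

With `libxml_use_internal_errors(true)`, a file that still cannot be parsed returns `false` and the parser messages go to the error log instead of appearing as warnings on the page.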
