Incorrect parsing when an unindented XML string is supplied

I am trying to parse XML, collecting node values and their attributes into an associative array. In the following class, convert_simple_xml_element_object_into_array is meant to do the work.

But something strange happens. If the input XML is properly indented, the associative array is returned correctly. However, if an unindented XML string is passed in, the method returns an incorrect associative array with empty entries. What could be the cause?

Sample XML string:

<?xml version="1.0"?>
<StreamWebInfo><UserInfo Username="a@b.com" AccountId="19"/><JobInfo Id="594" QualifiedFilePath="https://s.com/main_DIS_23009_1_v_2_1c2_2011_08_30.mpd" ParentContainerType="0" ContainerType="10" EndTime="2016-10-05 11:45:09"/><ProfileInfo><VideoTrackWebInfo CodecType="3" Width="320" Height="240" Bitrate="0" TrackDurationInMin="199" FeaturesUsed="0"/></ProfileInfo><ProfileInfo><VideoTrackWebInfo CodecType="3" Width="320" Height="240" Bitrate="0" TrackDurationInMin="199" FeaturesUsed="0"/></ProfileInfo><ProfileInfo><VideoTrackWebInfo CodecType="3" Width="320" Height="240" Bitrate="0" TrackDurationInMin="199" FeaturesUsed="0"/></ProfileInfo><ProfileInfo><VideoTrackWebInfo CodecType="3" Width="480" Height="368" Bitrate="0" TrackDurationInMin="199" FeaturesUsed="0"/></ProfileInfo><ProfileInfo><VideoTrackWebInfo CodecType="3" Width="480" Height="368" Bitrate="0" TrackDurationInMin="199" FeaturesUsed="0"/></ProfileInfo><ProfileInfo><VideoTrackWebInfo CodecType="3" Width="480" Height="368" Bitrate="0" TrackDurationInMin="199" FeaturesUsed="0"/></ProfileInfo><ProfileInfo><VideoTrackWebInfo CodecType="3" Width="480" Height="368" Bitrate="0" TrackDurationInMin="199" FeaturesUsed="0"/></ProfileInfo><ProfileInfo><VideoTrackWebInfo CodecType="3" Width="480" Height="368" Bitrate="0" TrackDurationInMin="199" FeaturesUsed="0"/></ProfileInfo><ProfileInfo><VideoTrackWebInfo CodecType="3" Width="864" Height="480" Bitrate="0" TrackDurationInMin="199" FeaturesUsed="0"/></ProfileInfo><ProfileInfo><VideoTrackWebInfo CodecType="3" Width="864" Height="480" Bitrate="0" TrackDurationInMin="199" FeaturesUsed="0"/></ProfileInfo><ProfileInfo><VideoTrackWebInfo CodecType="3" Width="1280" Height="720" Bitrate="0" TrackDurationInMin="199" FeaturesUsed="0"/></ProfileInfo><ProfileInfo><VideoTrackWebInfo CodecType="3" Width="1280" Height="720" Bitrate="0" TrackDurationInMin="199" FeaturesUsed="0"/></ProfileInfo><ProfileInfo><VideoTrackWebInfo CodecType="3" Width="1280" Height="720" Bitrate="0" 
TrackDurationInMin="199" FeaturesUsed="0"/></ProfileInfo><ProfileInfo><VideoTrackWebInfo CodecType="3" Width="1280" Height="720" Bitrate="0" TrackDurationInMin="199" FeaturesUsed="0"/></ProfileInfo><ProfileInfo><VideoTrackWebInfo CodecType="3" Width="1920" Height="1088" Bitrate="0" TrackDurationInMin="199" FeaturesUsed="0"/></ProfileInfo><ProfileInfo><VideoTrackWebInfo CodecType="3" Width="1920" Height="1088" Bitrate="0" TrackDurationInMin="199" FeaturesUsed="0"/></ProfileInfo><ProfileInfo><VideoTrackWebInfo CodecType="3" Width="1920" Height="1088" Bitrate="0" TrackDurationInMin="199" FeaturesUsed="0"/></ProfileInfo><ProfileInfo><VideoTrackWebInfo CodecType="3" Width="1920" Height="1088" Bitrate="0" TrackDurationInMin="199" FeaturesUsed="0"/></ProfileInfo><ProfileInfo><VideoTrackWebInfo CodecType="3" Width="1920" Height="1088" Bitrate="0" TrackDurationInMin="199" FeaturesUsed="0"/></ProfileInfo><ProfileInfo><VideoTrackWebInfo CodecType="3" Width="1920" Height="1088" Bitrate="0" TrackDurationInMin="199" FeaturesUsed="0"/></ProfileInfo></StreamWebInfo>

The class containing the method:

<?php

class _xml_parser {

    const EMPTY_STRING = '';
    const MAX_RECURSION_DEPTH_ALLOWED =  200;
    const SIMPLE_XML_ELEMENT_OBJECT_PROPERTY_FOR_ATTRIBUTES = '@attributes';


    /**
     * Get the SimpleXMLElement representation of the function input 
     * parameter that contains XML string. Convert the XML string 
     * contents to SimpleXMLElement type. SimpleXMLElement type is 
     * nothing but an object that can be processed with normal property 
     * selectors and (associative) array iterators.
     * 
     * @param string $xmlStringContents
     * @return SimpleXMLElement get_simple_xml_element returns a SimpleXMLElement object which 
     * contains an instance variable which itself is an associative array of 
     * several SimpleXMLElement objects.
     * 
     *  
     * @version 1.0.0
     */
    public static function get_simple_xml_element($xmlStringContents) {
        $simpleXmlElementObject = self::EMPTY_STRING;
        if('string' == gettype($xmlStringContents)) {
            $simpleXmlElementObject = simplexml_load_string($xmlStringContents);
        }
        return $simpleXmlElementObject; 
    }

    /**
     * This function accepts a SimpleXmlElementObject as a single argument and
     * converts the XML object into a PHP associative array. 
     * If the input XML is in tree (i.e. nested) format, this function will return an associative  
     * array (tree/nested) representation of that XML.
     * 
     * Note: This is a recursive function.
     * 
     * @param SimpleXMLElement|array|string $simpleXmlElementObject
     * @param int $recursionDepth
     * 
     * @return array|string|null If everything is successful, it returns an associative array containing 
     *  the data collected from the XML format. Otherwise, it returns null.
     *
     * 
     */
    public static function convert_simple_xml_element_object_into_array($simpleXmlElementObject, &$recursionDepth=0) {
        // Keep an eye on how deeply we are involved in recursion.
        if ($recursionDepth > self::MAX_RECURSION_DEPTH_ALLOWED) {
            // Fatal error. Exit now.
            return(null);
        }

        if ($recursionDepth == 0) {
            if (!($simpleXmlElementObject instanceof SimpleXMLElement)) {
                // If the external caller doesn't call this function initially
                // with a SimpleXMLElement object, return now.
                return(null);
            } else {
                // Store the original SimpleXmlElementObject sent by the caller.
                // We will need it at the very end when we return from here.
                $callerProvidedSimpleXmlElementObject = $simpleXmlElementObject;
            }
        }   

        if ($simpleXmlElementObject instanceof SimpleXMLElement) {
            // Get a copy of the simpleXmlElementObject
            $copyOfsimpleXmlElementObject = $simpleXmlElementObject;
            // Get the object variables in the SimpleXmlElement object for us to iterate.
            $simpleXmlElementObject = get_object_vars($simpleXmlElementObject);
        }

        // It needs to be an array of object variables.
        if (is_array($simpleXmlElementObject)) {
            // Initialize the result array.
            $resultArray = array();
            // Is the input array size 0? Then, we reached the rare CDATA text if any.
            if (count($simpleXmlElementObject) <= 0) {
                // Let us return the lonely CDATA. It could even be whitespaces.
                return (trim(strval($copyOfsimpleXmlElementObject)));
            }

            // Let us walk through the child elements now.
            foreach($simpleXmlElementObject as $key=>$value) {
                // Uncomment the following block of code if XML attributes are
                // NOT required to be returned as part of the result array.
                /*
                 if((is_string($key)) && ($key == self::SIMPLE_XML_ELEMENT_OBJECT_PROPERTY_FOR_ATTRIBUTES)) {
                    continue;
                 }
                 */
                // Let us recursively process the current element we just visited.
                // Increase the recursion depth by one.
                $recursionDepth++;
                $resultArray[$key] = self::convert_simple_xml_element_object_into_array($value, $recursionDepth);
                // Decrease the recursion depth by one.
                $recursionDepth--;
            } 

            if ($recursionDepth == 0) {
                // That is it. Heading to the exit now.
                // Set the XML root element name as the root [top-level] key of
                // the associative array that we are going to return to the caller of this
                // recursive function.
                $tempArray = $resultArray;
                $resultArray = array();
                $resultArray[$callerProvidedSimpleXmlElementObject->getName()] = $tempArray;
            }

            return ($resultArray);
        } else {
            // We are now looking at either the XML attribute text or
            // the text between the XML tags.
            return (trim(strval($simpleXmlElementObject)));
        } // End of else
    }

    /**
     * Converts XML to JSON
     * @param SimpleXMLElement $simpleXmlElementObject
     * @return JSON string
     *  
     */
    public static function xml2json($simpleXmlElementObject) {
        $json_from_xml = null;
        if($simpleXmlElementObject instanceof SimpleXMLElement) {
            $xml_map = self::convert_simple_xml_element_object_into_array($simpleXmlElementObject);
            $json_from_xml = json_encode($xml_map);
        }
        return $json_from_xml;
    }

}

For the XML above, the returned array contains a key named ProfileInfo, but its value is a map with an empty key/value pair.

1 Answer

  1. In convert_simple_xml_element_object_into_array, you need to check whether a SimpleXMLElement object that has no attributes has child elements.
    If so, you have to call convert_simple_xml_element_object_into_array recursively for each child.

    Replacing the old code with the new code should return the correct array:

    old code:

    // Is the input array size 0? Then, we reached the rare CDATA text if any.
    if (count($simpleXmlElementObject) <= 0) {
      // Let us return the lonely CDATA. It could even be whitespaces.
      return (trim(strval($copyOfsimpleXmlElementObject)));
    }
    

    new code:

    // Is the input array size 0? Then, we reached the rare CDATA text if any.
    if (count($simpleXmlElementObject) <= 0) {
      //Check if the Object have children. If so, call again the function
      if(($copyOfsimpleXmlElementObject instanceof SimpleXMLElement) && (count($copyOfsimpleXmlElementObject->children()) >=1)) {
        foreach($copyOfsimpleXmlElementObject->children() as $child){
          $recursionDepth++;
          $resultArray[$child->getName()] = self::convert_simple_xml_element_object_into_array($child, $recursionDepth);
          $recursionDepth--;
        }                       
      }
      else{
        // Let us return the lonely CDATA. It could even be whitespaces.
        return (trim(strval($copyOfsimpleXmlElementObject)));
      }
    }
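
As an alternative to patching the empty-array branch, the traversal can be written against SimpleXML's own attributes() and children() iterators rather than get_object_vars(). The following is only a sketch under stated assumptions. xml_element_to_array and the '@text' key are hypothetical names that do not appear in the original class. Children with the same tag are grouped into lists, which changes the output shape compared with the class above. Namespaced nodes are not handled.

```php
<?php
// Hypothetical helper, not part of the original class: it walks the tree with
// attributes() and children() instead of get_object_vars().
function xml_element_to_array(SimpleXMLElement $element): array
{
    $result = [];

    // Collect attributes under the same '@attributes' key the class uses.
    foreach ($element->attributes() as $name => $value) {
        $result['@attributes'][$name] = (string) $value;
    }

    // Group children by tag name so repeated tags (e.g. ProfileInfo)
    // are appended to a list instead of overwriting each other.
    foreach ($element->children() as $child) {
        $result[$child->getName()][] = xml_element_to_array($child);
    }

    // Keep text content only if it is more than whitespace.
    // '@text' is an assumed key name chosen for this sketch.
    $text = trim((string) $element);
    if ($text !== '') {
        $result['@text'] = $text;
    }

    return $result;
}

// The same small document, once compact and once indented.
$compact  = '<A><B x="1"/><C><D y="2"/></C><C><D y="3"/></C></A>';
$indented = "<A>\n  <B x=\"1\"/>\n  <C>\n    <D y=\"2\"/>\n  </C>\n  <C>\n    <D y=\"3\"/>\n  </C>\n</A>";

$fromCompact  = xml_element_to_array(simplexml_load_string($compact));
$fromIndented = xml_element_to_array(simplexml_load_string($indented));

// Compare the two results; whitespace between tags is trimmed away above.
var_dump($fromCompact === $fromIndented);
```

Because the helper never inspects the object's property table, indentation between tags should not influence the result. The root element name is not wrapped around the array here; the caller can add it with getName() if the original output shape is needed.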