Socket use in PHP for use with webservices

bcmiller0's picture

I'm at BADCamp, which is awesome. Since we at AllPlayers.com (http://www.allplayers.com) are heavily invested in web services and are big proponents of using them for a variety of reasons, I asked Crell how I could help out. He indicated that starting a discussion on how to perform socket-level communications in PHP would be a small help.

The second part, after a write-up on PHP socket communications, would be to see if there is any code in the open source world that could be leveraged to assist with this.

So I'll start by covering how to perform socket-level communications from a client, and how to set up a server or daemon. Once we have something there, hopefully some quick research can show whether anything is available to leverage in an OOP setup.

The sockets extension in PHP basically exposes the normal BSD socket-level calls. This allows anyone who is familiar with writing C or C++ programs that use sockets, either as a client or a server, to be comfortable performing the same tasks in PHP.

There are two normal ways to perform socket-level operations in PHP: directly with socket calls, or via streams, which allow one to use normal file calls in PHP such as fgets or fwrite.

Let's start by seeing how to simply open a socket connection, send a request, and receive a response using just sockets. This is the case requiring the most steps, as streams will handle some of this for you and allow one to deal with sockets as if they were files.

Here are the steps we'll need to accomplish for socket level communication:

  • Create a socket
  • Optionally set the socket to non-blocking, if we want our script to continue instead of pausing while a write to or read from the socket completes. Pausing is what happens with a blocking socket.
  • Connect the socket
  • Write to the socket
  • Read from the socket

Here is some sample code that simply creates a socket, connects, writes to the socket, and then reads a response. This is done in a blocking manner, meaning we don't need our code to do other things while we may be waiting on the write or read to complete. Sockets are created in blocking mode by default, and socket_set_nonblock would be needed to change the socket's behavior.

  // connect over UDP or TCP depending on the $protocol variable
  if ($protocol == 'udp') {
    $socket_handle = socket_create(AF_INET, SOCK_DGRAM, SOL_UDP);
  }
  else {
    $socket_handle = socket_create(AF_INET, SOCK_STREAM, SOL_TCP);
  }
  // provided we got a socket handle, connect using the specified $host and $port
  if ($socket_handle !== FALSE) {
    // this is where we would make the handle non-blocking if needed
    //socket_set_nonblock($socket_handle);
    if (socket_connect($socket_handle, $host, $port)) {
      // send a message over the socket
      socket_write($socket_handle, $message);

      // receive the response to the message sent, up to 8192 bytes
      $data_response = socket_read($socket_handle, 8192);
    }

    // close the socket, we did all we needed on this one
    socket_close($socket_handle);
  }
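
To make the difference between blocking and non-blocking mode concrete, here is a minimal sketch. It assumes a Unix-like system and uses socket_create_pair to get two connected local sockets, so it does not need a real $host or $port; the variable names are made up for this illustration.

```php
<?php
// Sketch only: socket_create_pair() gives two connected local sockets, so no
// remote host is needed. AF_UNIX assumes a Unix-like system.
if (!socket_create_pair(AF_UNIX, SOCK_STREAM, 0, $pair)) {
  die('socket_create_pair failed: ' . socket_strerror(socket_last_error()));
}
list($reader, $writer) = $pair;

// Switch the reading end to non-blocking mode.
socket_set_nonblock($reader);

// Nothing has been written yet, so a blocking socket_read() would pause here.
// The non-blocking one returns FALSE right away instead.
$early_read = @socket_read($reader, 8192);
$would_block = in_array(socket_last_error($reader),
  array(SOCKET_EAGAIN, SOCKET_EWOULDBLOCK));

// Once data is available, the same call returns it.
socket_write($writer, "hello");
$late_read = socket_read($reader, 8192);

socket_close($reader);
socket_close($writer);
```

In a real client the FALSE result is not treated as a failure; it just means "nothing yet, come back later", which is why non-blocking code needs some way of finding out when a socket is ready.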

Next we'll see how to handle the same case with non-blocking code, so we can write to and read from multiple sockets at the same time and loop through those that are ready to be read. In this case we could also implement our own timeout to flag those sockets that have hit a timeout condition. This is what would be needed for a server-side implementation, or for a more sophisticated client that sends many requests to a server in parallel and manages all of them at the same time for performance purposes.

Here is an example for non-blocking. We have an array of requests that we want the server to process in parallel as quickly as possible. Assume we are doing some sort of lookups on a server, and we would like to perform all of them in parallel for performance reasons instead of doing them one at a time. Let's say we are an internet domain registrar (like 1and1 or GoDaddy) doing domain lookups to see which domains are available, and we have a different server to query for each TLD to determine whether a domain is available or not.

When dealing with non-blocking sockets, one sets the socket handle to non-blocking prior to connecting. Then one uses socket_select to determine which sockets are ready to be read from or written to: we pass socket_select an array of read sockets and an array of write sockets to check. socket_select also takes a timeout value for how long it may run before it returns control to our code.
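
Before the full class, here is a small sketch of socket_select on its own. It is not part of the lookup code: two local socket pairs (again assuming a Unix-like system) stand in for connections to two servers, and the names "first" and "second" are invented for the illustration.

```php
<?php
// Sketch only: two local socket pairs stand in for connections to real servers.
$readers = array();
$writers = array();
foreach (array('first', 'second') as $name) {
  if (!socket_create_pair(AF_UNIX, SOCK_STREAM, 0, $pair)) {
    die('socket_create_pair failed: ' . socket_strerror(socket_last_error()));
  }
  socket_set_nonblock($pair[0]);
  $readers[$name] = $pair[0];
  $writers[$name] = $pair[1];
}

// Only the "second" connection has a response waiting.
socket_write($writers['second'], "response");

// socket_select() takes its arrays by reference and trims them down to the
// sockets that are ready, so pass copies and keep the originals.
$select_reads = $readers;
$select_writes = NULL;
$select_except = NULL;
// Wait at most 1 second and 0 microseconds.
$ready_count = socket_select($select_reads, $select_writes, $select_except, 1, 0);

$responses = array();
if ($ready_count > 0) {
  foreach ($select_reads as $socket_handle) {
    // Strict search, so we match this exact socket.
    $name = array_search($socket_handle, $readers, TRUE);
    $responses[$name] = socket_read($socket_handle, 8192);
  }
}

foreach (array_merge(array_values($readers), array_values($writers)) as $socket_handle) {
  socket_close($socket_handle);
}
```

The point to notice is that the arrays passed in come back modified, which is why they have to be rebuilt from the list of pending sockets before each call.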

Here we'll have a DomainLookup class that can be used to send a number of domain lookups in parallel, then read the responses to determine whether each domain is available. A few parts of this code that are not relevant to using sockets are left out; they are labeled in comments as "left out" or "magic".

/* Example class for performing lookups on domains to see if they are available for use.
  * checkDomains will run through them and allow one to see what domains are available.
  **/
class DomainLookup
{
  /*
   * Checks multiple domains in parallel
   * Returns an array containing the same keys as the input $domains, and
   * with values of:
   * array("domain" => $domain, "status" => $status)
   * where $domain is the value of the entry in $domains, and $status is:
   * 0 when the domain is unavailable
   * 1 when the domain is available
   * -1 when an error occurred
   */
  function checkDomains($domains) {
    $domains_info = array();
     foreach ($domains as $key => $domain) {
       // left out ... sets the variables needed to process
       $domains_info[$key] = array(
          "domain" => $domain,
           "basename" => $domain_basename,
           "tld" => $tld,
           "method" => $method,
           "status" => -1,
           "result" => ""
       );

    } // foreach end

    // Run the parallel lookups; this fills in the status of each entry
    $this->doQueries($domains_info);

    $result = array();
    foreach ($domains as $key => $domain)  {
      $entry = array("domain" => $domain,
        "status" => $domains_info[$key]["status"]);
      $result[$key] = $entry;
    }
    return $result;
  }


  // Connects to the appropriate service for each domain, returns a socket
  // handle, host and port
  function connect($domain_info)  {
   // ..left out ..
   // $host, $protocol, $port are set based on the domain

    if ($protocol == 'udp')  {
       $socket_handle = socket_create(AF_INET, SOCK_DGRAM, SOL_UDP);
    }
    else {
      $socket_handle = socket_create(AF_INET, SOCK_STREAM, SOL_TCP);
    }
    if ($socket_handle != FALSE) {
       socket_set_nonblock($socket_handle);
       @socket_connect($socket_handle, $host, $port);
    }

    return array(
       "socket_handle" => $socket_handle,
       "protocol" => $protocol,
       "host" => $host,
       "port" => $port
    );
  }

    // Generates a string to send to the service
    function getQuery($domain_info)
    {
      $domain = $domain_info["domain"];
      // magic that sets the $query based on the domain TLD and method
      // .. left out ..

      return $query;
    }

    // Converts an array (with keys corresponding to the keys in
    // $domains_info) of domains we still have to process, and returns a new
    // array (containing the socket handles for the corresponding
    // connections) to be used in a select call
    function pending2selectArray($pending_array, $domains_info)  {
      $select_array = array();
      foreach ($pending_array as $key => $value) {
        $select_array[$key] = $domains_info[$key]["socket_handle"];
      }

      return $select_array;
    }

    // Calculate timeout
    function getTimeout($starttime, &$domains_info,  $pending_reads, $pending_writes) {
      $pending_keys = array_unique(array_merge(array_keys($pending_reads), array_keys($pending_writes)));
      $timeout = 0;
      foreach ($pending_keys as $key)   {
        $method = $domains_info[$key]['method'];
        // $timeout_array maps each lookup method to its timeout in seconds
        // .. left out ..
        $connection_timeout = $timeout_array[$method];
        if ($timeout < $connection_timeout)  {
          $timeout = $connection_timeout;
        }
      }
      $endtime = $starttime;
      $endtime["sec"] += $timeout;

      $curtime = gettimeofday();
      $timeout_sec = $endtime["sec"] - $curtime["sec"];
      if ($endtime["usec"] < $curtime["usec"])    {
        $timeout_usec = 1000000 + $endtime["usec"] - $curtime["usec"];
        $timeout_sec--;
      }
      else {
        $timeout_usec = $endtime["usec"] - $curtime["usec"];
      }

      return array($timeout_sec, $timeout_usec);
    }

    // Writes queries to the writable sockets passed through $select_array,
    // and removes the entries from $pending_array
    function sendQueries($select_array, &$pending_array, $socket_map)  {
      foreach ($select_array as $socket_handle) {
        // strict search so we match this exact socket handle
        $key = array_search($socket_handle, $socket_map, TRUE);
        socket_write($socket_handle, $pending_array[$key]);
        unset($pending_array[$key]);
      }
    }

    // Processes data that has been returned from a socket
    function getResponses($select_array, &$pending_reads, &$pending_writes, &$domains_info,
       $socket_map, $domains) {
       foreach ($select_array as $socket_handle)  {
         // strict search so we match this exact socket handle
         $key = array_search($socket_handle, $socket_map, TRUE);
         $data_read = @socket_read($socket_handle, 8192);
         $domains_info[$key]["result"] .= $data_read;
         if (strlen($data_read) == 0
            // For UDP only read the first packet
            || $domains_info[$key]["protocol"] == "udp")  {
           // We've received a complete response
           unset($pending_reads[$key]);
           unset($pending_writes[$key]);
         }
       }
    }

    // Parses the result of a query and returns the status of the domain
    function parseResults(&$domains_info, $key)  {
      $result = $domains_info[$key]["result"];
      $domain = $domains_info[$key]["domain"];
      $status = -1; // if it can't be parsed, return -1 (error)

       // magic that takes result and parses that to set $status for that $domain
       // ... left out .... it's only an example
       /// ...
       
       $domains_info[$key]["result"] = "";
       $domains_info[$key]["status"] = $status;
       return;
    }

    // Does the actual parallel lookups.
    // Every domain in $domains_info is looked up
    function doQueries(&$domains_info)  {
      // Open up connections for each lookup
      $pending_reads = array();
      $pending_writes = array();
      $socket_map = array();
      $domains = array();
      $available_connections = array();

      foreach ($domains_info as $key => $domain_info)  {
         $tld = $domain_info['tld'];
         $method = $domain_info['method'];
         $domains[$domain_info['domain']][$domain_info['method']] = $key;
         $result_array = $this->connect($domain_info);
         foreach (array("socket_handle", "protocol", "host", "port") as $value)    {
            $domains_info[$key][$value] = $result_array[$value];
         }
               
         if ($domains_info[$key]["socket_handle"] != FALSE)   {
            $pending_reads[$key] = TRUE;
            $pending_writes[$key] = $this->getQuery($domains_info[$key]);
            $socket_map[$key] = $domains_info[$key]["socket_handle"];   
         }
      }

      // Wait for any connection to become ready for reading or writing, and
      // then send/receive, until timeout happens
      $starttime = gettimeofday();

      $select_reads  =  $this->pending2selectArray($pending_reads, $domains_info);
      $select_writes = $this->pending2selectArray($pending_writes, $domains_info);

      list($timeout_sec, $timeout_usec) = $this->getTimeout($starttime, $domains_info, $pending_reads, $pending_writes);

      // Do the big select
      // socket_select takes its arrays by reference, so use a variable for NULL
      $select_except = NULL;
      while (!(  $timeout_sec < 0
                || count($pending_reads) + count($pending_writes) == 0)
            && socket_select($select_reads, $select_writes,
            $select_except, $timeout_sec, $timeout_usec) > 0)  {
          // Write pending queries
          $this->sendQueries($select_writes, $pending_writes, $socket_map);
          $this->getResponses($select_reads, $pending_reads, $pending_writes,
          $domains_info, $socket_map, $domains);

          $select_reads  =  $this->pending2selectArray($pending_reads, $domains_info);
          $select_writes =  $this->pending2selectArray($pending_writes, $domains_info);

          list($timeout_sec, $timeout_usec) = $this->getTimeout($starttime, $domains_info, $pending_reads, $pending_writes);
     }
       
     foreach ($domains_info as $key => $domain_info)  {
       // Close connections
       if ($domains_info[$key]["socket_handle"] != FALSE)   {
          socket_close($domains_info[$key]["socket_handle"]);
          unset($domains_info[$key]["socket_handle"]);
       }
           
     }
     foreach ($domains_info as $key => $domain_info)   {
        // Parse results for successful queries
        if ($domain_info["result"] != "")   {
           $this->parseResults($domains_info, $key);
        }
     }
  }

}

Next we'll use the streams approach so we can deal with sockets as if they were files.
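
As a rough sketch of what the streams version of the simple blocking example might look like: the connection is opened with stream_socket_client and then handled with the normal file calls. To keep it runnable without a remote service, a throwaway local server in the same script plays the part of the server; the "ping" and "pong" messages and the 5 second timeouts are made up for the illustration.

```php
<?php
// Sketch only: a throwaway local server stands in for the remote service.
// Port 0 asks the OS for any free port.
$server = stream_socket_server('tcp://127.0.0.1:0', $errno, $errstr);
if ($server === FALSE) {
  die("server failed: $errstr ($errno)");
}
// Returns the local address as "127.0.0.1:<port>".
$address = stream_socket_get_name($server, FALSE);

// Client side: open the connection as a stream, with a 5 second timeout.
$client = stream_socket_client('tcp://' . $address, $errno, $errstr, 5);
if ($client === FALSE) {
  die("connect failed: $errstr ($errno)");
}

// Send the request with the normal file call.
fwrite($client, "ping\n");

// Server side: accept the connection, read one line, answer with one line.
$connection = stream_socket_accept($server, 5);
$request = fgets($connection);
fwrite($connection, "pong\n");

// Back on the client: read the response one line at a time with fgets().
$data_response = fgets($client);

fclose($connection);
fclose($client);
fclose($server);
```

Compared with the raw socket calls, creating and connecting the socket collapse into the single stream_socket_client call, and the rest is plain fwrite, fgets and fclose.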

Comments

HTTPRL

mikeytown2's picture

90% of this has been done in the HTTPRL module using streams; including redirect handling.
http://drupal.org/project/httprl

Doing it with sockets can be done as well; as long as socket_select() works as advertised.

HTTPRL

bcmiller0's picture

I'll have to check out HTTPRL; I'm not familiar with it. I have used socket_select in my past life and it works well. I've used it both on the daemon side and with a client that sent requests in parallel for speed. Thanks for the tip.

Wrong sockets :-(

Crell's picture

I'm afraid I wasn't entirely clear. I wasn't talking about sockets in general. I specifically mean HTML5 WebSockets: http://dev.w3.org/html5/websockets/

How can we do that in Drupal/PHP, when it's based on a persistent connection with the browser, which PHP and Drupal in particular were not designed for?

Sorry for the confusion. :-(

Other language

AndrzejG's picture

Isn't it possible to integrate a block written in higher level language?

phpwebsocket project

bcmiller0's picture

I wondered why we were not just looking for a library that already handles this for us. I do see the phpwebsocket project, which would seem to possibly handle this from a WebSocket perspective. See http://code.google.com/p/phpwebsocket/ (might be a starting point).

It has a quick example for client side and server side on their page.

It looks like the WebSocket API uses a handshake, but otherwise most socket code that has already been built would work without much in the way of modifications. It does look like, when sending data through the socket, one also needs to add a character to the front of the message (chr 0) and another to the end (chr 255), something like chr(0).$msg.chr(255).
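
A minimal sketch of that wrapping, just to show the idea. Both function names are hypothetical and are not taken from phpwebsocket. This chr(0) / chr(255) framing comes from the early WebSocket drafts; the final protocol (RFC 6455) uses a different frame format, so treat this as an illustration rather than something to ship.

```php
<?php
// Sketch only: hypothetical helpers for the chr(0) ... chr(255) framing.
function websocket_frame($msg) {
  return chr(0) . $msg . chr(255);
}

function websocket_unframe($frame) {
  $length = strlen($frame);
  if ($length < 2 || $frame[0] !== chr(0) || $frame[$length - 1] !== chr(255)) {
    return FALSE; // not a complete frame
  }
  // Drop the leading and trailing marker bytes.
  return substr($frame, 1, $length - 2);
}

$frame = websocket_frame("hello");
$msg = websocket_unframe($frame);
```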

I'll see what else I can dig up..

3rd party libraries

Crell's picture

I am 100% OK with leveraging external libraries to help us here, provided that:

1) They're properly namespaced in PHP 5.3.

2) They actually work. :-)

3) The licensing is OK. (GPL, LGPL, or MIT/BSD.)

4) They let us integrate it into Drupal's request handler in a sane fashion. Of course, our request handler is in the process of changing anyway but let's keep that in mind. :-)

Correct me if I'm wrong here,

tizzo's picture

Correct me if I'm wrong here, but isn't a PHP websocket connection sort of asking for failure? Wouldn't we have to hold an Apache process open for the entire duration of the open socket? That ties up a lot of RAM just sitting around waiting for an event to happen. It seems like it would be better to rely on an evented system like node.js or twistd to maintain the persistent connections and to do the actual message routing. This is the approach that we have taken on the node.js Drupal module. All the smarts live in Drupal and the messaging lives somewhere else, somewhere that will eat hardly any CPU and can handle tens of thousands of concurrent connections with a single process without wasted memory or clock cycles.

Normally we try to keep capabilities for Drupal in PHP & MySQL land so that even mom-and-pop sites hosted on $6/month accounts can share in the glory, but $6 hosting is going to fall over right quick if a few visitors open web socket connections at the same time anyway...

Possibly

Crell's picture

I haven't worked with websockets, but I agree that they seem dangerous in a PHP world. I do believe we need to have support for websockets in core, or very easily addable via a known, semi-official contrib, though. If we end up having to do that via a node.js bridge, then we need to make sure that bridge is a lot more capable than it is now. For instance, we would need in-core support for the queue that node.js would be processing to send out responses.

I don't know what else is involved here, to be honest. That's why I was hoping I could coax someone else to figure it out and make a solid recommendation. :-)

Browser support

jdwalling's picture

Browser support for WebSocket protocol is another thing to worry about
http://en.wikipedia.org/wiki/WebSocket#Browser_support

That will change

Crell's picture

During Drupal 8's lifetime, WebSockets will become more well-supported. When they do, we want Drupal to be the go-to server-side component for them. That may be something we do in contrib, or via bridging to node.js, or whatever, but we should be forward thinking about it. For once, we want to get out in front of the market if we can. :-)

bcmiller0's picture

Look at http://socket.io/#faq. It's not strictly WebSockets but tries to be more, and also supports whatever the browser can do. It has an MIT license: https://github.com/LearnBoost/socket.io. It also looks to work well with nginx and other setups. Looks like it warrants some investigation. I agree that Drupal's normal web server doesn't look like the best way to deal with WebSocket connections. This would allow using an evented type of setup for the messaging and leaving the smarts in Drupal proper. The node.js Drupal project seems to have merit in this regard.