Development of applications that rely on application-layer protocols like HTTP and FTP is not overly complex, but it's also not trivial. Further, it's not really the focus of the application because in the majority of cases, what's above the protocol is what's actually important. That's what makes libcurl so interesting: it lets you focus on the application rather than on the protocol plumbing beneath it. Few applications develop their own TCP/IP stack; in the same vein, reusing what you can shortens your development schedule and increases the reliability of your application.
This article begins with a short introduction to application-layer protocols, then jumps into cURL, libcurl, and an exploration of their use.
Web protocols
Building applications today is considerably different from the recent past. Today, applications are expected to communicate over a network or the Internet, to present a network API or an interface for human consumption, and to be flexible through user scripting. Modern applications commonly export a Web interface using HTTP and provide notification of alarms through Simple Mail Transfer Protocol (SMTP). These protocols allow you to point a Web browser at the device for configuration or status (HTTP) and to receive standard e-mail from the device in your usual e-mail client (SMTP).
These Web services are typically built on top of the socket layer of the networking stack (see Figure 1). The socket layer implements an API that originated in the Berkeley Software Distribution (BSD) operating system and abstracts the details of the underlying transport and networking-layer protocols.
Figure 1. Networking stack and libcurl
Web services occur in protocol conversations between a client and a server. In the context of HTTP, the server is the end device and the client is the browser at the endpoint. For SMTP, the server is the mail gateway or endpoint user, and the client is the end device. In some cases, the protocol conversation occurs in two steps (request and response), but in others, there's substantially more traffic to negotiate and communicate. This negotiation can create a considerable amount of complexity, which can be abstracted through an API, such as libcurl.
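To make the two-step conversation concrete, here is a minimal sketch of an HTTP request and response carried over a plain socket. Everything in it is an assumption made for illustration: the loopback server, the `HelloHandler` class, and the `raw_http_get` and `demo` helpers are hypothetical names and are not part of libcurl or of the listings in this article. The sketch shows the bookkeeping (framing the request, reading until the peer closes the connection, separating headers from body) that a library such as libcurl takes off your hands.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a device's Web interface."""

    def do_GET(self):
        body = b"status: ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the example quiet


def raw_http_get(host, port, path="/"):
    """Send one GET request by hand and return (status_line, body)."""
    request = (
        f"GET {path} HTTP/1.0\r\n"
        f"Host: {host}\r\n"
        "\r\n"
    ).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(request)        # step 1: the request
        chunks = []
        while True:                  # step 2: the response
            data = sock.recv(4096)
            if not data:             # the server closed the connection
                break
            chunks.append(data)
    response = b"".join(chunks)
    head, _, body = response.partition(b"\r\n\r\n")
    status_line = head.split(b"\r\n", 1)[0].decode("ascii")
    return status_line, body


def demo():
    """Start a throwaway loopback server and fetch one page from it."""
    # Port 0 asks the operating system for any free port.
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return raw_http_get("127.0.0.1", server.server_address[1])
    finally:
        server.shutdown()
        server.server_close()
```

Calling `demo()` returns the status line and the body of the response. Even this small case needs hand-written framing and parsing, and it ignores redirects, chunked encoding, and TLS, which is the kind of complexity the article goes on to hand to libcurl.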
Introduction to cURL
cURL was originally designed as a way to move files between endpoints using different protocols, such as FTP, HTTP, SCP, and others. It started as a command-line utility but is now also a library with bindings to more than 30 languages. So now instead of just using cURL from the shell, you can build applications that incorporate this important functionality. The libcurl library is also portable, supporting Linux®, IBM® AIX® operating system, BSD, Solaris, and many other UNIX® variants.
Getting and installing cURL/libcurl
Getting and installing libcurl is simple, depending upon what Linux distribution you run. If you run Ubuntu, you can easily install these packages with apt-get. The two following lines illustrate how to install libcurl and the Python bindings for libcurl:
$ sudo apt-get install libcurl3
$ sudo apt-get install python-pycurl
The apt-get utility ensures that any dependencies are satisfied in the process.
cURL on the command line
cURL began as a command-line tool for performing data transfer using Uniform Resource Locator (URL) syntax. Given its popularity on the command line, a library to integrate the behavior into applications was created. Today, the command-line cURL is a wrapper over the cURL library. This article starts by exploring cURL on the command line, then digs into its use as a library.
Two of the typical uses of cURL are file transfers using the HTTP and FTP protocols. cURL provides a simple interface to these protocols and others. To get a file from a Web site using HTTP, you simply tell cURL a local file name into which you want the Web page to be written and a URL for the Web site and file to be retrieved. That's a lot of words for the simple command line shown in Listing 1.
Listing 1. Example use of cURL to retrieve a file from a Web site
$ curl -o test.html www.exampledomain.com
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 43320  100 43320    0     0  55831      0 --:--:-- --:--:-- --:--:-- 89299
$
Note that because I specify the domain, but not a file, I'll get the root file (index.html). To move this file to an FTP site using cURL, specify the file to upload using the -T option, then provide a URL for the FTP site and path to file.
Listing 2. Example use of cURL to upload a file to an FTP site
$ curl -T test.html ftp://user:[email protected]/ftpdir/
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 43320    0     0  100 43320      0  38946  0:00:01  0:00:01 --:--:--  124k
$
Could it be much simpler? After you learn some of the patterns, cURL is fairly easy to use. But the breadth of options available to you is large: requesting help from cURL on the command line (using --help) results in 129 lines of options. While that's not huge, the options control everything from verbosity to security, along with a variety of protocol-specific settings.
From a developer's perspective, this isn't the most exciting aspect of cURL. Dig into the cURL library to see how you can add these file transfer protocols to your applications.
cURL as a library
If you've watched scripting languages over the past 10 years, you've noticed a distinct change in their makeup. Scripting languages like Python, Ruby, Perl, and many others include not only a sockets layer, which you can find in C or C++, but also application-layer protocol APIs. These scripting languages incorporate higher-level functionality that make creating an HTTP server or client, for example, trivial. The libcurl library adds similar functionality to languages like C and C++, but it does so in a way that's portable across myriad languages. You'll find roughly equivalent behaviors for libcurl in all the languages supported by it, though because these languages can differ greatly (consider C and Scheme), the way they provide the behaviors can also differ.
The libcurl library encapsulates the behavior illustrated in Listings 1 and 2 in an API form so it can be used by high-level languages (more than 30 today). This article provides two examples of libcurl. The first explores a simple HTTP client in C (suitable for building Web spiders), and the second is a simple HTTP client in Python.
C-based HTTP client
The C API offers two interfaces to the libcurl functionality. The easy interface is a simple API that's synchronous (meaning that when you call libcurl with your request, it does not return until the request is complete or an error occurs). The multi interface provides more control over libcurl, allowing your application to perform multiple simultaneous transfers and to control where and when libcurl moves data.
This example uses the easy interface. This API still gives you some control over the data movement process (using callbacks), but lives up to its name. Listing 3 provides the C language example for HTTP.
Listing 3. C HTTP client using libcurl's easy interface
#include <stdio.h>
#include <string.h>
#include <curl/curl.h>

#define MAX_BUF 65536

char wr_buf[MAX_BUF+1];
int  wr_index;

/*
 * Write data callback function (called within the context of
 * curl_easy_perform).
 */
size_t write_data( void *buffer, size_t size, size_t nmemb, void *userp )
{
  int segsize = size * nmemb;

  /* Check to see if this data exceeds the size of our buffer. If so,
   * set the user-defined context value and return 0 to indicate a
   * problem to curl.
   */
  if ( wr_index + segsize > MAX_BUF ) {
    *(int *)userp = 1;
    return 0;
  }

  /* Copy the data from the curl buffer into our buffer */
  memcpy( (void *)&wr_buf[wr_index], buffer, (size_t)segsize );

  /* Update the write index */
  wr_index += segsize;

  /* Null terminate the buffer */
  wr_buf[wr_index] = 0;

  /* Return the number of bytes received, indicating to curl that all is okay */
  return segsize;
}

/*
 * Simple curl application to read the index.html file from a Web site.
 */
int main( void )
{
  CURL *curl;
  CURLcode ret;
  int wr_error;

  wr_error = 0;
  wr_index = 0;

  /* First step, init curl */
  curl = curl_easy_init();
  if (!curl) {
    printf("couldn't init curl\n");
    return 0;
  }

  /* Tell curl the URL of the file we're going to retrieve */
  curl_easy_setopt( curl, CURLOPT_URL, "www.exampledomain.com" );

  /* Tell curl that we'll receive data to the function write_data, and
   * also provide it with a context pointer for our error return.
   */
  curl_easy_setopt( curl, CURLOPT_WRITEDATA, (void *)&wr_error );
  curl_easy_setopt( curl, CURLOPT_WRITEFUNCTION, write_data );

  /* Allow curl to perform the action */
  ret = curl_easy_perform( curl );

  printf( "ret = %d (write_error = %d)\n", ret, wr_error );

  /* Emit the page if curl indicates that no errors occurred */
  if ( ret == 0 ) printf( "%s\n", wr_buf );

  curl_easy_cleanup( curl );

  return 0;
}
At the top are the necessary include files, including the cURL root file. Next, I define a couple of variables for the transfer. The first, wr_buf, represents the buffer where the incoming data will be written. wr_index represents the current write index to the buffer.
Jump down to the main function, which performs the setup using the easy API. All cURL calls operate through a handle that maintains state for the particular request. This handle is declared as a CURL pointer. This example also declares a variable of the return-code type, CURLcode. Before using any other easy-interface functions, you need to call curl_easy_init to get the CURL handle. Next, notice a number of curl_easy_setopt calls. These configure the handle for a particular operation. For these calls, you provide the handle, the option to set, and a value for that option. First, this example uses CURLOPT_URL to specify the URL to retrieve. Next, it uses CURLOPT_WRITEDATA to provide a context variable (in this case, it's the internal write error variable). Finally, it uses CURLOPT_WRITEFUNCTION to specify the function that libcurl should call when data is available. The API will call this function one or more times with data it has read after you instruct it to start.
To kick off the transfer, call curl_easy_perform. Its job is to perform the transfer given the prior configuration. When you call this function, it will not return until the transfer is complete or an error occurs. The final elements of main are to emit the return statuses, emit the page read, and, finally, clean up using curl_easy_cleanup (when you're done with the handle).
Now look at the write_data function. This function is a callback called as data is received for the particular operation. Note that while you're reading data from the Web site, the data is written to you (write_data). The callback is provided with a buffer (containing the data available), the number of members and their size (the product being the total data available in the buffer), and the context pointer. The first task is to ensure that the buffer (wr_buf) has sufficient space for the write data. If not, it sets the context pointer and returns zero, indicating that there was a problem. Otherwise, it copies the data from the cURL buffer into your buffer and increments the index to point to the next location in which to write. This example also terminates the string so you can use printf on it later. Finally, it returns the number of bytes that were operated on to libcurl. This tells libcurl that the data was ingested, and it can discard that data.
And that's it: a relatively simple way to read a file from a Web site into memory.
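The contract between the transfer and the write callback can be sketched without libcurl at all. The following is a model only, written under stated assumptions: `BoundedBuffer` mirrors the roles of wr_buf, wr_index, and wr_error from Listing 3, while `deliver` is a hypothetical stand-in for the loop that runs during the transfer and is not a libcurl function. It shows the one rule the callback has to honor: report how many bytes were accepted, and report fewer than were offered to stop the transfer.

```python
MAX_BUF = 65536  # same limit as the C listing


class BoundedBuffer:
    """Collects chunks up to a fixed capacity, like wr_buf in Listing 3."""

    def __init__(self, capacity=MAX_BUF):
        self.capacity = capacity
        self.data = bytearray()
        self.overflowed = False  # plays the role of wr_error

    def write_data(self, chunk):
        """Return the number of bytes accepted; 0 signals a problem."""
        if len(self.data) + len(chunk) > self.capacity:
            self.overflowed = True
            return 0
        self.data += chunk
        return len(chunk)


def deliver(chunks, callback):
    """Hypothetical stand-in for the transfer loop.

    Hands each chunk to the callback and stops as soon as the callback
    accepts fewer bytes than it was offered.
    """
    for chunk in chunks:
        if callback(chunk) != len(chunk):
            return False  # transfer aborted
    return True  # transfer completed
```

With a capacity of 8 bytes, delivering `b"abc"` and `b"def"` succeeds, and a further `b"ghi"` is refused: `deliver` returns `False`, `overflowed` becomes `True`, and the six bytes already stored are left intact.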
Python-based HTTP client
This section provides an example similar to the C-based HTTP client but in Python. Python is a useful object-oriented scripting language that's great for prototyping and building production software. The example assumes you have some familiarity with Python, but uses very little of it, so not much is expected.
The simple Python HTTP client using pycurl is shown in Listing 4.
Listing 4. Python HTTP client using libcurl's pycurl interface
import sys
import pycurl

wr_buf = ''

def write_data( buf ):
    global wr_buf
    wr_buf += buf

def main():
    c = pycurl.Curl()
    c.setopt( pycurl.URL, 'http://www.exampledomain.com' )
    c.setopt( pycurl.WRITEFUNCTION, write_data )
    c.perform()
    c.close()

main()
sys.stdout.write(wr_buf)
This one is considerably simpler than the C version. It begins by importing the necessary modules (sys, the standard system module, and pycurl). Next, it defines the write buffer (wr_buf). As in the C program, I declare a write_data function. Note that this function takes a single argument: the data buffer read from the HTTP server. I simply take that buffer and concatenate it to the global write buffer. The main function starts by creating a Curl handle, then uses the setopt method to define the URL and WRITEFUNCTION for the transaction. It calls the perform method to start the transfer and closes the handle. Finally, the script calls the main function and emits the write buffer to stdout.
Note that in this case, you don't need the error-context pointer because you're using Python string concatenation, which means you don't use a statically sized string.
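One practical caveat: on Python 3, PycURL passes the write callback `bytes` rather than `str`, so concatenating onto a `''` string as in Listing 4 raises a TypeError there. The variant below is a minimal sketch of one way to handle that, not the article's method. The names `BodyCollector` and `fetch` are hypothetical, the UTF-8 default is an assumption about the page being fetched, and `pycurl` is imported inside `fetch` only so that the collector can be tried on its own without PycURL installed.

```python
import io


class BodyCollector:
    """Accumulates the chunks that the write callback receives."""

    def __init__(self):
        self._buffer = io.BytesIO()

    def write(self, chunk):
        # Returning None tells PycURL that the whole chunk was consumed.
        self._buffer.write(chunk)

    def text(self, encoding="utf-8"):
        # Assumption: the page is UTF-8; undecodable bytes are replaced.
        return self._buffer.getvalue().decode(encoding, errors="replace")


def fetch(url):
    """Retrieve url with PycURL and return the body as text."""
    import pycurl  # third-party; imported here so BodyCollector works alone

    collector = BodyCollector()
    c = pycurl.Curl()
    try:
        c.setopt(pycurl.URL, url)
        c.setopt(pycurl.WRITEFUNCTION, collector.write)
        c.perform()
    finally:
        c.close()  # release the handle even if perform() raises
    return collector.text()
```

With PycURL installed, `sys.stdout.write(fetch('http://www.exampledomain.com'))` would play the role of the last two lines of Listing 4. Keeping the buffer in an object instead of a global also makes it straightforward to run several transfers without their output mixing.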
Going further
This article hardly scratches the surface of libcurl, given the vast number of protocols and languages it supports. But hopefully, this shows how simple it is to build applications that use application-layer protocols like HTTP. The libcurl Web site (see Resources) provides a large number of examples and a considerable amount of useful documentation. So next time you're developing a Web browser, spider, or other application that has application-layer protocol requirements, give libcurl a try. It will certainly cut down your development time and bring joy back to coding.
Resources
Learn
- cURL is a command-line tool and library that implements a variety of client-side protocols. It supports more than 12 protocols including FTP, HTTP, Telnet, and their secure variants. You'll find cURL on a number of platforms, including Linux, AIX, BSD, and Solaris, supporting more than 30 languages.
- PycURL is a thin layer over the libcurl API. As a thin layer, PycURL is extremely fast. With PycURL, you can develop Python applications using the libcurl library.
- Speaking of application flexibility, you can learn more about integration of scripting capabilities into your application in "Scripting with Guile."