Google searches for an enterprise space

The Google Search Appliance packages up the company's famously accurate technology into an easy-to-use search engine for intranets and public-facing corporate sites. In our Clear Choice test of the GB-1001 model, we found that while the searching and indexing features live up to the Google name, the product lacks polish and advanced management features.

The appliance's honeycomb case caught our eye, but the whimsy wore off as we began to notice rough spots in the design. For example, the appliance takes several minutes to start up and run its various system checks. To alert you that it is done, it plays a little tune. In testing in our server room and at a colocation facility, we couldn't hear the tune over the dull roar of such environments and had to manually probe for the system's state.

The GB-1001 does not provide obvious light indicators or a small LCD screen on the unit. No on-off switch is provided, as the designer likely intended you to go through the proper shutdown procedure. We experienced an unplanned UPS failure, and upon power restoration the box recovered properly once it performed an automated rebuild of its RAID system that lasted several hours. Even when you trigger a shutdown through the Web administration system, you need to be careful not to cut power too early; otherwise, you will have to wait out the same RAID rebuild.

We also found other polish points lacking. Within the administration system, confirmations of configuration changes didn't appear in a logical place, form fields were slightly misaligned or oddly arranged, warning messages did not appear reliably, help information was too concise or lacked good examples, result output previews didn't always work, and, in some cases, error messages lacked detail.

There were some bright spots, including clear installation documentation, color-coded cables and a built-in DHCP server that allowed us to plug in a laptop and quickly configure the network settings.

Using a Web-based GUI, your first step after installation would likely be to define a search index by indicating starting URLs, URL patterns and file types that should be recorded or discarded by the crawler (see "How we did it").
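
To make the idea of start URLs plus include and exclude patterns concrete, here is a minimal sketch in Python. It is not the appliance's own pattern syntax: the host name, the patterns and the should_crawl helper are all assumptions for illustration, and the matching uses ordinary glob-style wildcards.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse

# Hypothetical crawl configuration; hosts and patterns are illustrative only.
START_URLS = ["http://intranet.example.com/"]
FOLLOW_PATTERNS = ["intranet.example.com/*"]
EXCLUDE_PATTERNS = ["*/private/*", "*.zip", "*.exe"]


def should_crawl(url: str) -> bool:
    """Return True if the URL matches a follow pattern and no exclude pattern."""
    parsed = urlparse(url)
    # Match against host plus path, ignoring the scheme and query string.
    target = parsed.netloc + parsed.path
    if not any(fnmatchcase(target, pattern) for pattern in FOLLOW_PATTERNS):
        return False
    return not any(fnmatchcase(target, pattern) for pattern in EXCLUDE_PATTERNS)
```

A crawler built this way would start from START_URLS and call should_crawl on every link it discovers, so exclusions always win over inclusions.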

According to Google, the crawler is capable of indexing 220 types of content. In our test we saw no limitation in the crawler, and found that the device tended to discover files that we were not aware of in some test data sets.

You will likely want to break up the indexed documents into different collections based upon a URL pattern. The GB-1001 allows for an unlimited number of collections.

The crawler is quite adept at dealing with secured content. It handles Secure-HTTP connections and can negotiate basic authentication, NT LAN Manager authentication, and custom cookie and form-based access. The GB-1001 can crawl content from databases, including Oracle, SQL Server, MySQL, IBM DB2 and Sybase. If you happen upon a data type the crawler cannot access, you can feed it directly to the device in an XML format.

Google does limit its appliances by document count, starting with 500,000 for the base unit (for smaller deployments, use the Google Mini). You can of course increase your license and associated hardware to build out a search infrastructure that could support millions of documents. When you size your appliance, be aware that if you plan on doing direct database indexing, Google will count each record as a document, so you might chew up a license very quickly.
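
The sizing arithmetic is simple but easy to overlook. The sketch below assumes only what is stated above (a 500,000-document base license, and one document counted per database record); the function names and the sample volumes are hypothetical.

```python
BASE_LICENSE_LIMIT = 500_000  # document limit of the base unit, per the review


def documents_needed(web_pages: int, file_documents: int, database_records: int) -> int:
    """Total licensed documents, counting every database record as one document."""
    return web_pages + file_documents + database_records


def fits_base_license(web_pages: int, file_documents: int, database_records: int) -> bool:
    """Return True if the combined content fits within the base license."""
    total = documents_needed(web_pages, file_documents, database_records)
    return total <= BASE_LICENSE_LIMIT
```

For instance, a site with 20,000 pages and 5,000 files fits comfortably on its own, but adding a 600,000-row product table would push the total to 625,000 documents and past the base license.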

One aspect of the crawl process that we especially liked was the diagnostics facility, which not only showed us what the crawler was doing but also helped us isolate indexing problems such as broken links, server issues and access-denied errors.

The GB-1001 provides a great deal of flexibility for the search page and result listings. Some administrators may be happy to use the page layout helper and modify the logo and basic aspects of the search page. However, most folks will probably want to modify the results to fully integrate them into the look and feel of the site. If you are familiar with Extensible Stylesheet Language Transformations (XSLT), you can modify a near-3,000-line template that controls just about every aspect of the search form and results. If this doesn't suit you, just use the raw XML returned from the appliance and do whatever you like, including putting it into another system.
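
As a rough idea of what consuming raw XML results in another system might look like, here is a minimal Python sketch. The element names (results, result, url, title) and the sample data are placeholders invented for illustration, not the appliance's actual output schema, which you would need to take from Google's documentation.

```python
import xml.etree.ElementTree as ET

# Hypothetical result document; the real appliance schema will differ.
SAMPLE_XML = """<results>
  <result>
    <url>http://intranet.example.com/hr/expenses.html</url>
    <title>Expense policy</title>
  </result>
  <result>
    <url>http://intranet.example.com/hr/travel.html</url>
    <title>Travel guidelines</title>
  </result>
</results>"""


def parse_results(xml_text: str) -> list[dict[str, str]]:
    """Turn an XML result document into a list of {'url', 'title'} dicts."""
    root = ET.fromstring(xml_text)
    parsed = []
    for node in root.findall("result"):
        parsed.append({
            # findtext returns the default when the child element is missing.
            "url": node.findtext("url", default=""),
            "title": node.findtext("title", default=""),
        })
    return parsed
```

Once the results are plain dictionaries, they can be rendered with whatever templating the site already uses instead of the supplied stylesheet.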

Google's approach is to implement searches in an easy-to-use "black box" fashion, which could place constraints on a private search. You turn the appliance loose, and it ranks based upon the Google algorithm. We were pleased that the accuracy of the test search lived up to what we see in everyday use of the Google Internet search. It easily found buried test phrases and correctly identified primary documents.

The GB-1001 provides features to massage the results; unfortunately, some are a bit limited or not well documented. The most valuable feature for search customization is the KeyMatch configuration, which allows you to define matches on keywords, phrases and exact queries. KeyMatch returns up to three matches, or five if you dig to find out about a setting change. The Synonym setting provides a useful way to suggest alternate search terms triggered by the original query. It is also possible to create filters against the domain in which a document is found, the language in which a document is written, its file type or the meta tags it was given. The meta tag facility, if carefully applied, can provide a rich system to slice indexed data in a variety of ways: by author, owner, or rating, for example.
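
The three kinds of match can be pictured with a short Python sketch. This is only one reading of the description above, not Google's implementation: the KeyMatch class, the matching rules and the sample entries are assumptions, and the default limit of three simply mirrors the figure quoted in the review.

```python
from dataclasses import dataclass


@dataclass
class KeyMatch:
    term: str   # the word, phrase or full query to match
    kind: str   # "keyword", "phrase" or "exact"
    url: str    # promoted link shown when the rule fires


def matches(rule: KeyMatch, query: str) -> bool:
    """Return True if the rule fires for the query (case-insensitive)."""
    words = query.lower().split()
    normalized = " ".join(words)
    term = " ".join(rule.term.lower().split())
    if rule.kind == "exact":
        return normalized == term          # whole query must equal the term
    if rule.kind == "phrase":
        return f" {term} " in f" {normalized} "  # term appears as a word sequence
    if rule.kind == "keyword":
        return term in words               # term appears as a single word
    raise ValueError(f"unknown match kind: {rule.kind}")


def promoted_results(rules: list[KeyMatch], query: str, limit: int = 3) -> list[KeyMatch]:
    """Return the rules that fire for the query, capped at the limit."""
    return [rule for rule in rules if matches(rule, query)][:limit]


# Hypothetical rules for illustration.
EXAMPLE_RULES = [
    KeyMatch("vacation", "keyword", "http://intranet.example.com/hr/leave.html"),
    KeyMatch("vacation policy", "phrase", "http://intranet.example.com/hr/policy.html"),
    KeyMatch("vacation policy", "exact", "http://intranet.example.com/hr/faq.html"),
    KeyMatch("policy", "keyword", "http://intranet.example.com/policies/"),
]
```

With these rules, the query "vacation policy" fires all four entries but only the first three are returned, while "hr vacation policy 2006" skips the exact-match entry.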

Various front-end and search-result features we tested took an unpredictable length of time to register our changes. If you add synonyms, keyword matches or a variety of other template changes, you typically can't see the result right away. You must be patient if you like to tinker.

In terms of performance, the GB-1001 appliances start at around 300 queries per minute (vs. the Mini's rate of 60 queries per minute). Our test verified that the Google Search Appliance unit was roughly four times faster than the lower-end unit. Under heavy load well beyond 300 queries per minute, we were able to push response time past 1 second per query, but we did not see any drop-off that would suggest the device failed to perform to specification.
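
For readers running their own load checks, the sketch below shows one way to reduce a test run to the figures discussed here: queries per minute and the share of responses slower than 1 second. The function name and the input format (a list of per-query latencies plus the wall-clock duration of the run) are assumptions; it does not reproduce the tool used in this test.

```python
def summarize_load_run(latencies_s: list[float], duration_s: float) -> dict[str, float]:
    """Summarize a load run from per-query latencies and total run duration."""
    if not latencies_s or duration_s <= 0:
        raise ValueError("need at least one query and a positive duration")
    count = len(latencies_s)
    slow = sum(1 for latency in latencies_s if latency > 1.0)
    return {
        # Completed queries scaled to a one-minute rate.
        "queries_per_minute": count / duration_s * 60,
        "mean_latency_s": sum(latencies_s) / count,
        # Fraction of queries that took longer than 1 second.
        "slow_fraction": slow / count,
    }
```

A run of 10 queries completed in 2 seconds works out to 300 queries per minute, the rated starting point for the GB-1001.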

The GB-1001 provides monitoring facilities, including graphs on queries per second, an event log detailing basic system activity, and a device health report. The device is also SNMP-capable and provides a MIB for basic monitoring of device health, crawler status, index size and query rates.

The most valuable reports we found outlined the number of searches over time and the common keywords and queries. Many corporate Webmasters pay surprisingly little attention to search activity, despite the great insight it provides into customer intention, so we are glad to see Google making this data easily available to its appliance customers. For those looking for more than these standard reports provide, the GB-1001 offers search logs in a common log format, useful for crunching in Web log analysis or standard reporting systems. We would add in this category some indication of user click rates on various search terms, though with a little bit of work you could collect that data.
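
As an illustration of crunching such logs yourself, the following Python sketch counts the most frequent queries in common-log-format lines. It assumes the search terms travel in a URL parameter named q, which may not match the appliance's actual request format; the sample lines are made up.

```python
import re
from collections import Counter
from urllib.parse import parse_qs, urlparse

# host ident user [timestamp] "METHOD path protocol" status bytes
LOG_LINE = re.compile(
    r'^(\S+) (\S+) (\S+) \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) (\S+)'
)

# Hypothetical log lines for illustration.
SAMPLE_LOG = [
    '10.0.0.5 - - [12/Mar/2006:10:15:32 -0800] "GET /search?q=vacation+policy HTTP/1.0" 200 5120',
    '10.0.0.9 - - [12/Mar/2006:10:16:02 -0800] "GET /search?q=Vacation+Policy HTTP/1.0" 200 5090',
    '10.0.0.7 - - [12/Mar/2006:10:17:45 -0800] "GET /search?q=expense+report HTTP/1.0" 200 4312',
    'not a log line',
]


def top_queries(lines, param="q", limit=10):
    """Return (query, count) pairs for the most common search terms."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in common log format
        request_path = match.group(6)
        values = parse_qs(urlparse(request_path).query).get(param)
        if values:
            # Normalize case and whitespace so variants count together.
            counts[" ".join(values[0].lower().split())] += 1
    return counts.most_common(limit)
```

Calling top_queries(SAMPLE_LOG) groups the two spellings of "vacation policy" together and ranks that query first.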

Security on the GB-1001 is a mixed bag. Google states emphatically that the box is secured because it comes with a built-in firewall allowing access on permitted ports only. Beyond this lone measure, we found a disturbing security posture in place.

The security setup for the GB-1001's administration environment is weak. It's strange that the device allows you to create users and delegate administrative authority, but the Web-based administration system does not enforce any password strength or length requirements, even allowing single-letter passwords. Worse, the appliance does not limit password attempts, which leaves it vulnerable to brute-force password-guessing tools. The GB-1001 will note logon failures in its event log, but provides little to work with other than IP address and full-event logging. There is no SSL requirement to access the administrative back end and no restriction by IP range or domain.
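
A little arithmetic shows why unlimited attempts and unenforced password length are a bad combination. The sketch below computes the number of possible passwords and the worst-case time to try them all; the guess rate used in the example is an assumption, not a measurement from this test.

```python
def search_space(alphabet_size: int, length: int) -> int:
    """Number of possible passwords of exactly this length."""
    return alphabet_size ** length


def seconds_to_exhaust(alphabet_size: int, length: int, guesses_per_second: float) -> float:
    """Worst-case time to try every password when attempts are not limited."""
    return search_space(alphabet_size, length) / guesses_per_second
```

At an assumed 10 guesses per second, a single lowercase letter (26 possibilities) falls in under 3 seconds, while an 8-character password drawn from letters and digits has 62 ** 8, or roughly 218 trillion, possibilities.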

The GB-1001 has its rough edges, notably in hardware design, administration and security. However, the overall ease of use and the power of the Google search algorithm outweigh the limitations of the appliance. For companies looking for a powerful yet easy-to-administer search facility, the Google Search Appliance gets a fairly high ranking.

How we did it

We tested the Google Search Appliance (GB-1001) within an intranet and on public Web sites, and indexed production sites with up to around 20,000 documents each. A full implementation test was performed on a multinational and multilingual Web site with just under 20 sub-collections and server farms around the world.

Search pages were customized to test integration with existing site styles. Load verification was done using inexpensive load generation tools (www.loadtestingtool.com), because the load required to prove the devices operated as specified did not warrant more sophisticated tools.

About PINT

Headquartered in San Diego since 1994, PINT Inc. (http://www.pint.com) is a nationally recognized interactive Web agency providing Web strategy, interactive design, development, user experience, analytics, search marketing, and optimization to global companies and institutions. PINT founder Thomas Powell is the author of eleven best-selling industry textbooks on HTML and Web design. Clients include San Diego Chargers, ViewSonic, Hewlett-Packard, Allergan, Biogen Idec, UCSD, Linksys, Scripps Health, and USC. For updates and information about PINT and the Web, please subscribe to the PINT blog at http://blog.pint.com and follow PINT on Twitter at http://twitter.com/PINTSD