Data filtering works by examining the content a web server returns
rather than the request for that content. Because this requires
inspecting far more data, these solutions are generally less scalable
and more expensive than those using URL filtering.
The most widely-used type of data filtering is keyword filtering. The
keywords may be chosen by humans or by computers. Unfortunately, this
approach is prone to errors. Everyone has heard how the White House
home page was blocked because a filtering company had "couple" on its
keyword list. In addition, many inappropriate sites consist entirely
of pictures or carry accompanying text in other languages, and
keyword filters handle neither situation well.
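The failure modes above are easy to demonstrate. Below is a minimal
keyword-filter sketch (the blocklist and function names are
illustrative, not taken from any real product); naive substring
matching blocks an innocuous page that mentions "couple", while a
page made up entirely of images sails through untouched.

```python
BLOCKED_KEYWORDS = ["couple", "xxx"]  # hypothetical blocklist

def is_blocked(page_text: str) -> bool:
    """Return True if any blocked keyword appears in the page text."""
    text = page_text.lower()
    return any(kw in text for kw in BLOCKED_KEYWORDS)

# A legitimate page tripped by substring matching:
print(is_blocked("The President and the First Couple host a tour."))  # True

# A page consisting only of pictures slips through entirely:
print(is_blocked("<img src='photo1.jpg'><img src='photo2.jpg'>"))     # False
```

Real products use longer lists and fancier matching, but the
underlying problem is the same: the filter only sees text, and only
text in languages its list covers.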
Some of the limitations of keyword filters can be partially overcome by
using sophisticated AI techniques (most notably neural networks) to build
more complex rules. Ultimately, however, these are still keyword
filters, and they suffer from most of the limitations inherent in
textual analysis.
A few people have tried to implement real-time or near-real-time image
recognition ("nipple filters") with limited success.
Another approach to data filtering is to embed meta-tags in the web
pages or the HTTP reply headers. These self-labelling schemes have
been tried by various parties (PICS, RSACi, SafeSurf) over the past
several years with minimal success. This approach requires complete
acceptance and compliance by all of the relevant content providers,
which is extremely unlikely.
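A self-labelling filter might look for a label roughly as sketched
below. The "PICS-Label" header and http-equiv meta-tag are the
mechanism PICS actually defined, but the parsing here is deliberately
simplified and the function name is made up for illustration. The
sketch also shows the scheme's central weakness: an unlabelled page
yields no information at all, so the filter must either block it or
pass it blindly.

```python
import re

def find_pics_label(headers: dict, html: str):
    """Return the raw PICS label string, or None if the page is unlabelled."""
    # First check the HTTP reply headers.
    label = headers.get("PICS-Label")
    if label:
        return label
    # Fall back to an http-equiv meta-tag embedded in the page itself.
    m = re.search(
        r'<meta\s+http-equiv=["\']PICS-Label["\']\s+content=["\'](.*?)["\']',
        html, re.IGNORECASE)
    return m.group(1) if m else None

# A labelled page is easy to classify:
print(find_pics_label(
    {}, '<meta http-equiv="PICS-Label" content="(pics-1.1 ...)">'))

# But the vast majority of pages carry no label at all:
print(find_pics_label({}, "<html><body>No labels here</body></html>"))  # None
```

Since labelling is voluntary, the second case dominates in practice,
which is why these schemes never achieved the universal compliance
they require.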