Indexes and metadata

Introduction

Indexing is the act of making object data searchable. Plone stores the available indexes in the database. You can create indexes through the web and inspect the existing ones on the Indexes tab of portal_catalog.

Indexes and metadata

portal_catalog stores a copy of a subset of each object's field values and makes them searchable.

  • Indexes make content searchable: Indexes are stored values that are used to match queries. Indexed values may be preprocessed to make matching possible. For example, full-text indexes run incoming text through splitters and similar filters to produce data that can be searched quickly.
  • Metadata makes content summarizable: Metadata, also known as columns, are stored values that can be displayed to the user with each search hit. They usually copy the field value as is.

Metadata can exist without an index and vice versa.
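To make the distinction concrete, here is a minimal toy model in plain Python. It is not how portal_catalog is implemented; ToyCatalog and its methods are hypothetical names invented for this sketch. It only shows that an index maps values to matching objects, while metadata is a per-object copy of values returned with each hit.

```python
class ToyCatalog(object):
    """Toy stand-in for a catalog. NOT Plone's implementation."""

    def __init__(self, indexes, columns):
        # index name -> {indexed value: set of object paths}
        self.indexes = dict((name, {}) for name in indexes)
        self.column_names = list(columns)
        # object path -> {column name: copied value}
        self.metadata = {}

    def catalog_object(self, path, obj):
        """Index a dict-like object once (re-cataloging is not handled)."""
        for name, index in self.indexes.items():
            index.setdefault(obj.get(name), set()).add(path)
        self.metadata[path] = dict((c, obj.get(c)) for c in self.column_names)

    def search(self, **query):
        """Return the metadata rows of objects matching every query term."""
        hits = None
        for name, value in query.items():
            matched = self.indexes[name].get(value, set())
            hits = matched if hits is None else hits & matched
        return [self.metadata[p] for p in sorted(hits or [])]


# portal_type is an index without a column; Title is a column without an index.
catalog = ToyCatalog(indexes=["portal_type"], columns=["Title"])
catalog.catalog_object("/site/front-page",
                       {"portal_type": "Document", "Title": "Welcome"})
catalog.catalog_object("/site/news",
                       {"portal_type": "Folder", "Title": "News"})

results = catalog.search(portal_type="Document")
print(results)
```

In this sketch you can search by portal_type but not display it, and display Title but not search by it, which mirrors the statement that metadata and indexes are independent.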

Viewing indexed data

You can view the indexed data of an object through the portal_catalog tool in the ZMI.

  • Click portal_catalog in the portal root
  • Click the Catalog tab
  • Click any object

Index types

The Zope 2 product PluginIndexes defines the various portal_catalog index types used by Plone.

  • FieldIndex stores values as is
  • DateIndex and DateRangeIndex store dates (Zope 2 DateTime objects) in a searchable format. The latter indexes a start/end pair of dates, so you can find objects whose date range contains a given date.
  • KeywordIndex allows keyword-style look-ups (the query term is matched against all values of a stored list)
  • ZCTextIndex is used for full-text indexing
  • ExtendedPathIndex is used for indexing content object locations
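The matching rules of some of these index types can be sketched in plain Python. The functions below are hypothetical illustrations, not PluginIndexes code, and they use datetime.date instead of Zope DateTime objects to stay self-contained.

```python
from datetime import date


def field_match(stored, term):
    """FieldIndex-style lookup: the stored value is compared as is."""
    return stored == term


def keyword_match(stored_values, term):
    """KeywordIndex-style lookup: the term must equal one stored value."""
    return term in stored_values


def date_range_match(start, end, when):
    """DateRangeIndex-style lookup: does `when` fall inside [start, end]?"""
    return start <= when <= end


subjects = ["plone", "catalog", "indexing"]

print(field_match("Document", "Document"))
print(keyword_match(subjects, "catalog"))
# A keyword look-up is not a substring search: "cat" is not a stored keyword.
print(keyword_match(subjects, "cat"))
print(date_range_match(date(2010, 1, 1), date(2010, 12, 31), date(2010, 6, 15)))
```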

Default Plone indexes and metadata columns

Some interesting indexes

  • start and end: Calendar event timestamps, used to build the calendar portlet
  • sortable_title: Title normalized for sorting
  • portal_type: Content type as it appears in portal_types
  • Type: Translated, human-readable type of the content
  • path: Where the object is located (getPhysicalPath accessor method)
  • object_provides: Which interfaces and marker interfaces the object provides. A KeywordIndex of full interface names.
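As an illustration of how these indexes are typically combined, the sketch below builds a catalog query as a plain dictionary. The helper build_listing_query and the example paths are hypothetical; the keys are the index names listed above plus the sort_on query option, and the path/depth form assumes the ExtendedPathIndex query convention.

```python
def build_listing_query(folder_path, interface_name=None, portal_type=None):
    """Build a query for the direct children of a folder, sorted by title."""
    query = {
        # depth=1 limits an ExtendedPathIndex query to direct children
        "path": {"query": folder_path, "depth": 1},
        "sort_on": "sortable_title",
    }
    if interface_name is not None:
        # object_provides stores full (dotted) interface names
        query["object_provides"] = interface_name
    if portal_type is not None:
        query["portal_type"] = portal_type
    return query


query = build_listing_query("/site/events", portal_type="Event")
print(query)
```

Inside a Plone site, such a dictionary would be passed to the catalog as portal_catalog(**query); that call is omitted here because it needs a running site.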

Some interesting columns

  • getRemoteURL: Where to go when the object is clicked
  • getIcon: Which content type icon is used for this object in the navigation
  • exclude_from_nav: If True, the object won't appear in the sitemap or the navigation tree

Indexing an object

Warning

Unit test warning: Plone usually reindexes modified objects at the end of each request (each transaction). If you modify an object yourself, you are responsible for notifying the related catalogs about the new object data.

An object is indexed by calling its reindexObject() method, which is defined in the ICatalogAware interface.

Plone calls reindexObject() if

  • The object is modified by the user using the standard edit forms

You must call reindexObject() if you

  • Directly call object field mutators
  • Otherwise directly change object data

The reindexObject() method takes an optional argument idxs, which lists the indexes to update. If idxs is not given, all related indexes are updated, even those whose values did not change.

Example:

object.setTitle("Foobar")

# reindexObject() must be called to reflect the changed data in portal_catalog.
# In this example we change the title. Without reindexing, the new title would
# not show up in the navigation, since the navigation tree and folder listings
# pull the object title from the catalog.

object.reindexObject(idxs=["Title"])
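The staleness that the example above guards against can be reproduced with a small toy model. ToyContent and StaleDemoCatalog are hypothetical classes written for this sketch; they only imitate the idea that a catalog holds copies of values and that idxs limits which copies are refreshed.

```python
class ToyContent(object):
    def __init__(self, title, description):
        self.title = title
        self.description = description


class StaleDemoCatalog(object):
    """Keeps copies of attribute values, as a catalog does. Not Plone code."""

    ALL_INDEXES = ["title", "description"]

    def __init__(self):
        self.copies = {}

    def reindex(self, obj, idxs=None):
        # Without idxs every known index is refreshed
        names = idxs if idxs else self.ALL_INDEXES
        row = self.copies.setdefault(id(obj), {})
        for name in names:
            row[name] = getattr(obj, name)

    def lookup(self, obj):
        return dict(self.copies[id(obj)])


page = ToyContent("Old title", "Old description")
catalog = StaleDemoCatalog()
catalog.reindex(page)

# Direct changes do not reach the catalog copy by themselves
page.title = "Foobar"
page.description = "New description"
stale = catalog.lookup(page)

# A partial reindex refreshes only the listed index
catalog.reindex(page, idxs=["title"])
partial = catalog.lookup(page)

print(stale)
print(partial)
```

After the partial reindex the title copy is fresh while the description copy is still stale, which is why idxs should list every index whose source value changed.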

Also, if you modify security-related settings (permissions), you need to call reindexObjectSecurity().

Custom index methods

Version warning: Available since Plone 3.3.

plone.indexer provides a decorator for creating custom indexing functions.

import Missing

from plone.indexer.decorator import indexer
from zope.component import getUtility

# IConvergenceSupport and IConvergenceMediaFilter are this example's own
# interfaces; import them from your add-on package.

# The indexer decorator restricts the function to objects providing the
# given marker interface. plone.indexer calls the function with the object
# being indexed as its only positional argument.
@indexer(IConvergenceSupport)
def getContentMedias(object, **kw):
    """ Provide indexing hooks for portal_catalog """

    if IConvergenceSupport.providedBy(object):

        schema = object.Schema()

        if "contentMedias" not in schema:
            # Missing.Value must be returned if the indexing
            # cannot be completed for the object
            return Missing.Value
        else:
            media_filter = getUtility(IConvergenceMediaFilter)
            return media_filter.getContentMedia(object)

TextIndexNG3

TextIndexNG3 is an advanced text indexing solution for Zope.

Please read the TextIndexNG3 README.txt for how to add support for custom fields. Besides adding the TextIndexNG3 index through GenericSetup XML, you need to provide a custom indexing adapter.

Add the TextIndexNG3 index in catalog.xml. Example:

<index name="getYourFieldName" meta_type="TextIndexNG3">

  <field value="getYourFieldName"/>

  <autoexpand value="off"/>
  <autoexpand_limit value="4"/>
  <dedicated_storage value="False"/>
  <default_encoding value="utf-8"/>
  <index_unknown_languages value="True"/>
  <language value="en"/>
  <lexicon value="txng.lexicons.default"/>
  <query_parser value="txng.parsers.en"/>
  <ranking value="True"/>
  <splitter value="txng.splitters.simple"/>
  <splitter_additional_chars value="_-"/>
  <splitter_casefolding value="True"/>
  <storage value="txng.storages.term_frequencies"/>
  <use_normalizer value="False"/>
  <use_stemmer value="False"/>
  <use_stopwords value="False"/>
</index>

Create an adapter that adds TextIndexNG3 indexing support for your custom fields. Example:

import logging

from Products.TextIndexNG3.adapters.cmf_adapters import CMFContentAdapter
from zope.component import adapts

logger = logging.getLogger("Plone")

class TextIndexNG3SearchAdapter(CMFContentAdapter):
    """ Adapter which provides custom field specific index information for TextIndexNG3
    """

    # Your content marker interface here (IDescriptionBase is an example;
    # import it from your own package)
    adapts(IDescriptionBase)

    def indexableContent(self, fields):
        """ Produce TextIndexNG3 indexing information for the object

        Traceback::

              ZCatalog.py(536)catalog_object()
            -> update_metadata=update_metadata)
              Catalog.py(360)catalogObject()
            -> blah = x.index_object(index, object, threshold)
              Products/TextIndexNG3/TextIndexNG3.py(91)index_object()
            -> result = self.index.index_object(obj, docid)
              Products/TextIndexNG3/src/textindexng/index.py(114)index_object()
            -> default_language=self.languages[0])
              Products/TextIndexNG3/src/textindexng/content.py(99)extract_content()
            -> icc = adapter.indexableContent(fields)
            > indexableContent()

        """
        logger.debug("Indexing " + str(self.context))

        # Use superclass to construct generic field adapters (id, title, description, SearchableText)
        icc = CMFContentAdapter.indexableContent(self, fields)

        # These fields have their own TextIndexNG3 indexes which
        # are queried separately from SearchableText
        accessors = [ "getClassifications", "getOtherNames" ]

        for accessor in accessors:

            try:
                method = getattr(self.context, accessor)
            except AttributeError:
                logger.warn("Declared indexing for unsupported accessor: " + accessor)
                continue

            value = method()

            # We might have a value which is not a real string,
            # but must be first stringified
            try:
                value = unicode(value)
            except UnicodeDecodeError, e:
                # The value could not be decoded to unicode; log and skip this field
                logger.warn("Failed to index field: " + accessor)
                logger.exception(e)
                continue

            # Convert value to text format (utf-8) expected
            # by the indexer
            text = self._c(value)

            icc.addContent(accessor, text, self.language)

        return icc

Add the adapter in your ZCML:

<adapter factory=".customcontent.TextIndexNG3SearchAdapter"/>