Identifiers, Authentication, and Directories:
Best Practices for Higher Education

internet2-mi-best-practices-00.html
Internet2 Middleware Initiative
9 May 2000

This document is a product of the Internet2 Middleware Initiative's Early Harvest workshop, held in Denver in September 1999, and subsequent discussions. This document is a work in progress; it is inappropriate to cite this document as anything other than a work in progress.

Identifiers

"Any problem in Computer Science can be solved with another level of indirection."
— Butler Lampson

"...except the problem of indirection complexity."
— Bob Morgan

Motivation

For the following reasons, it is increasingly important that identifiers be made coherent and consistent throughout the enterprise.

Integrating the various identifiers requires:

A campus will want, for example, enterprise-wide email addresses, netid logins, and several other identifiers. For each of these identifiers there should be explicit policies; for example, whether the identifier is permanently assigned or can be reissued, who is permitted to assign the identifier, and what applications can read or use the identifier. Given one identifier, such as a person's network ID, it is important to have effective mechanisms to obtain related identifiers, such as the person's email address or local LAN account.

Identifier Characteristics

Before we begin the discussion of specific identifiers, we note some important generic characteristics of identifiers.

Needs for Identifiers in Higher Education

As well as different kinds of people (students, faculty, staff, alumni, guests, etc.), there are likely to be several other types of objects, such as printers and groups, whose identifiers will need to coexist with people IDs in the same namespaces. For example, when sending email, it is useful to be able to use email addresses for individuals, group names, and printer names interchangeably in the To: field. Little is yet known about what kinds of separate identifiers will prove best for groups and printers. For this reason, and because the use of non-people identifiers depends heavily on how people identifiers are used, people identifiers are the main type discussed in this document.

In a modern university setting, there are likely to be needs for several types of enterprise-wide people identifiers. Ten such types are discussed here; additional identifiers may be generated by local requirements. A single identifier may fulfill several of the roles listed below; for example, a name may serve as both a netid and an email address. In such a case, attention must be paid to the fact that different roles may have different eligibility requirements. For example, if the same string serves as a revocable account name and as a netid that is not revocable, problems may arise when the account is revoked. Where least-authority authorization is practiced, it is recommended that distinct identifiers be used for different roles; this also helps maintain flexibility in technologies and policies. Where least-authority authorization is not in use, attribute-based authorization is also a reasonable approach.

Identifier Types

Identifier Relationships and Mapping

Directories (and in particular person registries — see below) are where different types of identifiers are correlated. Often having one ID permits you to get other types of IDs; for example, having an account login may make it possible for you to use an email address. It is important to understand such relationships among IDs.

One way to do this is to go through an ID mapping process, asking (often difficult) questions about the characteristics associated with each ID type to be used. It is often helpful to create an ID mapping table, with rows for the types of ID to be used and columns for relevant characteristics such as primary use, secondary use, lucency, persistence, provisioning, and who is eligible to get the ID. The lists of characteristics and ID types above provide a place to start.
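An ID mapping table of the kind just described can be kept as simple structured data, which makes it easy to check questions such as "which identifiers can be reissued?" The sketch below is illustrative only: the three ID types, their uses, and their policies are hypothetical placeholders, and "lucent" is taken to mean that the value is meaningful to people rather than opaque.

```python
from dataclasses import dataclass

# Hypothetical rows for an ID mapping table; the ID types, uses, and
# policies below are illustrative placeholders, not recommendations.
@dataclass(frozen=True)
class IdType:
    name: str
    primary_use: str
    secondary_use: str
    lucent: bool       # True if the value is meaningful to people (not opaque)
    persistent: bool   # True if never reassigned to a different person
    provisioning: str  # who assigns the identifier
    eligibility: str   # who may hold one

ID_MAP = [
    IdType("netid", "login", "email local part", True, True,
           "central IT", "all affiliates"),
    IdType("registry ID", "cross-indexing", "directory key", False, True,
           "person registry", "anyone in the registry"),
    IdType("email address", "mail delivery", "white pages", True, False,
           "central IT", "students, faculty, staff"),
]

COLUMNS = ["ID type", "Primary use", "Secondary use", "Lucent",
           "Persistent", "Provisioning", "Eligibility"]

def as_rows(table):
    """Render each IdType as a list of strings, one per column."""
    yes_no = {True: "yes", False: "no"}
    return [[t.name, t.primary_use, t.secondary_use, yes_no[t.lucent],
             yes_no[t.persistent], t.provisioning, t.eligibility]
            for t in table]

def reissuable(table):
    """ID types that can be reassigned and therefore need a reuse policy."""
    return [t.name for t in table if not t.persistent]

if __name__ == "__main__":
    print(" | ".join(COLUMNS))
    for row in as_rows(ID_MAP):
        print(" | ".join(row))
```

Keeping the table as data rather than as a document means the same rows can later drive provisioning checks, though that is an implementation choice rather than a requirement.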

This process can also shed light on the contents of the fields of an X.509 certificate, the CRUD (Create, Read, Update, Delete) matrix used to understand the various roles that individuals and organizations have as data is consolidated within a directory, the RDN for a person within a directory, and many other areas where identifiers are used.
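A CRUD matrix can likewise be represented as a lookup from (organizational unit, attribute) to the permitted operations. In this minimal sketch the units, the attribute names (including the non-standard `studentMajor`), and the permissions are all hypothetical; treating "may create" as a proxy for data ownership is also an assumption.

```python
# Hypothetical CRUD matrix: for each (organizational unit, attribute) pair,
# the operations that unit may perform: Create, Read, Update, Delete.
CRUD = {
    ("Registrar", "studentMajor"): "CRUD",
    ("Personnel", "title"): "CRUD",
    ("Central IT", "mail"): "CRU",
    ("Self", "telephoneNumber"): "RU",
    ("Public", "mail"): "R",
}

def allowed(unit: str, attribute: str, op: str) -> bool:
    """True if `unit` may perform operation `op` (C, R, U, or D) on `attribute`."""
    if len(op) != 1 or op not in "CRUD":
        raise ValueError(f"unknown operation {op!r}")
    return op in CRUD.get((unit, attribute), "")

def creators(attribute: str) -> list:
    """Units that may create an attribute, a rough proxy for data ownership."""
    return sorted(unit for (unit, attr), ops in CRUD.items()
                  if attr == attribute and "C" in ops)
```

Pairs absent from the matrix grant nothing, so the default is deny.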

Person Registries

A person registry is a directory or database whose primary functions are identity management, reconciliation ("Is this person the same as that person?"), and cross-indexing ("Given this person's ID on system X, find their ID on system Y."). The person registry can also serve as a source of reference identifiers for other systems. Other types of registries, such as organization registries or group registries, may also exist; registries in general are also referred to as metadirectories. Both directory and metadirectory products often come with person registries.

Person registries come in two varieties: thick and thin. Thick person registries contain lots of details on each individual; thin person registries contain only the system-identifier pairs needed to enable you to find the details elsewhere. As reconciling identities involves not only gathering up identifier strings, but also rationalizing the expression of the interesting relationships (e.g. faculty/staff/students/alumni/affiliates), registries tend to start thin and end up thick.
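The cross-indexing function of a thin registry can be sketched as two dictionaries: one from an internal registry ID to that person's (system, identifier) pairs, and a reverse index from each pair back to the registry ID. The class name, the `R000001` ID format, and the system names `SIS` and `netid` are hypothetical.

```python
class ThinRegistry:
    """Minimal thin person registry: it holds only (system, identifier) pairs
    per person, keyed by an internal registry ID; details live elsewhere."""

    def __init__(self):
        self._next = 1
        self._by_person = {}  # registry ID -> {system: identifier}
        self._index = {}      # (system, identifier) -> registry ID

    def add_person(self) -> str:
        rid = f"R{self._next:06d}"
        self._next += 1
        self._by_person[rid] = {}
        return rid

    def link(self, rid: str, system: str, identifier: str) -> None:
        """Record that `rid` is known as `identifier` on `system`."""
        key = (system, identifier)
        owner = self._index.get(key)
        if owner is not None and owner != rid:
            raise ValueError(f"{key} already belongs to {owner}")
        ids = self._by_person[rid]          # KeyError for unknown registry IDs
        old = ids.get(system)
        if old is not None:
            del self._index[(system, old)]  # drop the superseded identifier
        ids[system] = identifier
        self._index[key] = rid

    def cross_index(self, from_system: str, identifier: str, to_system: str):
        """Given a person's ID on one system, return their ID on another (or None)."""
        rid = self._index.get((from_system, identifier))
        if rid is None:
            return None
        return self._by_person[rid].get(to_system)

reg = ThinRegistry()
p = reg.add_person()
reg.link(p, "SIS", "900123456")
reg.link(p, "netid", "jdoe")
print(reg.cross_index("SIS", "900123456", "netid"))  # jdoe
```

A thick registry would attach further attributes to each registry ID; the reverse index stays the same either way.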

The person registry may contain more people than the directories it serves; for example, guests on campus may need to be placed in the person registry, but not in enterprise directories. Typically the person registry has feeds from several sources:

The person registry is activated by a trigger event in the source systems, usually the adding of a new person to a source-system database. The feeds from these systems can be done via batch files or with interactive procedures.
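For the batch-file case, one way to picture the trigger is that any row whose source-system ID is not yet known becomes an "add person" event. The sketch below assumes a hypothetical CSV layout with a `source_id` column and a hypothetical source name `HR`; it deliberately omits reconciliation against people already registered from other sources.

```python
import csv
import io
from itertools import count

def apply_feed(feed_text: str, source: str, index: dict, new_ids) -> list:
    """Apply one batch file from a source system to a person registry.

    `index` maps (source system, source-system ID) -> registry ID, and
    `new_ids` is an iterator of fresh registry IDs. A row whose source ID is
    not yet in the index is treated as an "add person" trigger event.
    Returns the registry IDs created by this batch.
    """
    created = []
    for row in csv.DictReader(io.StringIO(feed_text)):
        key = (source, row["source_id"])
        if key in index:
            continue  # already registered; updates are handled elsewhere
        rid = next(new_ids)
        index[key] = rid
        created.append(rid)
    return created

index = {}
new_ids = (f"R{n:06d}" for n in count(1))
feed = "source_id,name\nE100,Ada Lovelace\nE101,Alan Turing\n"
print(apply_feed(feed, "HR", index, new_ids))  # ['R000001', 'R000002']
print(apply_feed(feed, "HR", index, new_ids))  # [] (nothing new)
```

Because rerunning the same batch creates nothing, a feed can be replayed safely after a failure.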

The person registry has two stages in its process.

A third mode of operation is to update information held within the registry, as with a name change. Since the person registry holds very little volatile information, this is an infrequent and straightforward activity.

The person registry may be operated by a central IT organization or by a sponsoring campus unit such as the Registrar or Personnel. This unit may handle the arbitration process as well. Although there is a real cost in labor to this work, there are major institutional efficiencies to having this focused approach.

Because performance is not a critical concern, the person registry may be implemented as a database rather than a directory. Unexpected benefits of a person registry may include cleaning up after errors in the student information system, such as mistyped SSNs. Person registries should be cautious about merging two separate entries into a single real-world subject: once merged, they are difficult to separate.
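The caution about merging suggests classifying candidate pairs rather than merging automatically. In this sketch the matching attributes (name, birth date, SSN) and the rule that a one-digit SSN difference warrants human review are assumptions chosen for illustration; the suggestion that even a "match" should only link records, not collapse them, is likewise an assumption.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SourceRecord:
    source: str
    name: str
    birth_date: str  # ISO 8601, e.g. "1815-12-10"
    ssn: str         # nine digits, as held by the source system

def positions_differing(a: str, b: str) -> int:
    """Count differing characters; strings of unequal length count as fully different."""
    if len(a) != len(b):
        return max(len(a), len(b))
    return sum(x != y for x, y in zip(a, b))

def classify(a: SourceRecord, b: SourceRecord) -> str:
    """Return 'match', 'review', or 'distinct'. Nothing is merged here: even a
    'match' should only link both source records to one registry ID, keeping
    the originals so the link can be undone."""
    same_person_data = (a.name.casefold() == b.name.casefold()
                        and a.birth_date == b.birth_date)
    ssn_diff = positions_differing(a.ssn, b.ssn)
    if same_person_data and ssn_diff == 0:
        return "match"
    if same_person_data and ssn_diff == 1:
        return "review"  # possibly a mistyped SSN; ask a human arbiter
    if ssn_diff == 0:
        return "review"  # same SSN but different name or birth date
    return "distinct"
```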

Authentication

The three main authentication mechanisms currently in use in higher education are PKI, Kerberos, and passwords. To a first approximation, PKI is the future, Kerberos is the present, and password-based authentication, although it is the past, is likely to linger for quite some time. Some institutions have deployed challenge-response systems (e.g., S/Key, smartcards), but because of the expense this has usually been done only for a small number of highly secure accounts. Experience with exotic methods of authentication (e.g., biometrics) is similarly scarce. Ideally, all routine service authentication should be done via Kerberos v5 or (once we figure out how to do it) X.509 certificates.

While some institutions have deployed X.509 and/or other PKI authentication schemes, for the most part these deployments have generated the certificate, or established the trust relationship, by bootstrapping from an existing password-based authentication scheme. In an environment where people move around a lot, such as a university, the fact that certificates are currently easiest to store on machines, rather than (like passwords) with individuals, presents major problems. Ideally, personal certificates would be associated with individuals by means of a device, like a house key or car key, that the individual can easily keep in his or her possession at all times. So far the smartcards and similar systems that make this possible have been too expensive for widespread deployment, but recent hardware developments such as USB suggest that this obstacle may soon be overcome.

The shortcomings of password-based authentication are well known. However, password-based authentication is still overwhelmingly dominant and is only slowly being replaced by other, superior methods. Other methods often make some use of passwords as well. For example, while using a password once in a secure, three-party authentication mechanism such as Kerberos is far superior to passing a password to every service you want to talk to, poor password management practices can still cause problems in a Kerberized environment. For these reasons, the following discussion of authentication best practices focuses on passwords. At the same time, we recognize that passwords are bad: all of these ideas for password management are ways to make them less bad, but they will always be bad. Campuses should strive to move away from using them.

User-side password management

Server-side password management

First password assignment

Policies: Some institutions restrict the use of an identifier/authentication pair to secure environments. Some require the use of certain identifier/authentication services for particular applications.

Directories

The problems of making directories work are closely related to the problems of making identifiers and authentication work. One of the principal functions of identifiers is to serve as indices into directories; metadirectories are important as a means of resolving questions about identifiers; X.509 and similar authentication schemes rely on directories. When used to provide electronic white pages and similar services, directories are also customers of authentication services. The following discussion focuses on the use of directories to provide white pages and similar services.

Campus directory structure

In the complex world of higher education, campus-wide directory services will likely have several components. The enterprise directory, usually covering a single campus, is typically published from a relational directory database. The directory database represents a join of core administrative information from student information systems, human-resources systems, campus-wide IT services (email, Web, etc.), and feeds from related departmental systems such as Alumni and Facilities. The enterprise directory is the center of things. It is the major institutional operational directory, likely used to support white pages, email addresses, account management, access controls, etc.
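The "join" can be taken literally in a relational directory database. The sketch below uses an in-memory SQLite database with hypothetical tables standing in for the student information, human-resources, and IT feeds, all keyed by a registry ID; the table and column names are assumptions.

```python
import sqlite3

def build_directory_db() -> sqlite3.Connection:
    """Build an in-memory directory database joining hypothetical feeds."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE registry (rid TEXT PRIMARY KEY, display_name TEXT);
        CREATE TABLE sis (rid TEXT PRIMARY KEY, class_year INTEGER);  -- student info
        CREATE TABLE hr  (rid TEXT PRIMARY KEY, title TEXT);          -- human resources
        CREATE TABLE it  (rid TEXT PRIMARY KEY, mail TEXT);           -- central IT
        INSERT INTO registry VALUES ('R1', 'Ada Lovelace'), ('R2', 'Alan Turing');
        INSERT INTO sis VALUES ('R1', 2002);
        INSERT INTO hr  VALUES ('R2', 'Lecturer');
        INSERT INTO it  VALUES ('R1', 'ada@example.edu'), ('R2', 'turing@example.edu');
        -- The directory is a join of the feeds on the registry ID.
        CREATE VIEW directory AS
            SELECT r.rid, r.display_name, s.class_year, h.title, i.mail
            FROM registry r
            LEFT JOIN sis s ON s.rid = r.rid
            LEFT JOIN hr  h ON h.rid = r.rid
            LEFT JOIN it  i ON i.rid = r.rid;
    """)
    return conn

conn = build_directory_db()
for row in conn.execute("SELECT * FROM directory ORDER BY rid"):
    print(row)
```

Left joins keep people who appear in only some feeds (a student has no HR row), and the primary key on each feed table prevents one person from fanning out into several directory rows.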

The enterprise directory can provide information to, and receive information from, a metadirectory for the larger organization (a state university system, for example). Registries store and reconcile identifiers; directories hold additional information about items stored in registries. Most metadirectory products include a person registry service, as well as serving to coordinate directories. Roughly, registries deal with nouns, directories add adjectives, and metadirectories add verbs. We anticipate implementations starting with person registries, with other registries being added later. There is value in aggregating and identifying IDs across a wide range of services.

The enterprise directory also provides information to a border directory. The purpose of a border directory is to publish campus personal information for the external world in a secure manner. The border directory need not be a separate directory; it could be implemented by adding appropriate access controls to the enterprise directory. Border directories may become more prevalent over the next year or two. There may also be department-specific, operating-system-specific, and application-specific directories, often operated within a LAN.
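When the border directory is implemented as access controls on the enterprise directory, its effect amounts to filtering each entry down to publishable attributes. In this sketch the public attribute set and the per-entry `suppress` flag are hypothetical policy choices; the attribute names follow common LDAP usage.

```python
# Hypothetical publication policy for the border directory.
PUBLIC_ATTRIBUTES = frozenset({"cn", "mail", "title"})

def border_view(entry: dict) -> dict:
    """Return the part of an enterprise-directory entry that may be published
    to the external world. Entries carrying a (hypothetical) 'suppress' flag,
    e.g. at the person's request, are withheld entirely."""
    if entry.get("suppress"):
        return {}
    return {attr: value for attr, value in entry.items()
            if attr in PUBLIC_ATTRIBUTES}

entry = {"cn": "Ada Lovelace", "mail": "ada@example.edu",
         "title": "Student", "homePhone": "+1 303 555 0100"}
print(border_view(entry))  # homePhone is not published
```

An allow-list is used rather than a deny-list so that newly added attributes stay internal until someone decides to publish them.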

Application-specific directories can help limit the proliferation of schemas and the growth of the directory information tree in the enterprise directory. On the other hand, this minimalist approach to the enterprise directory creates more need for synchronization, and can end up creating more work than it saves. Factors to consider are how many applications need the attributes to be added, how often new attributes are expected to be added, and how long updates will be delayed by the existence of multiple directories. A single "catch-app" directory for all custom attributes is also a possibility.

Universities vary in the degree to which they centralize information in the enterprise directory. While some campuses are putting everything possible in LDAP directories and doing updates there, others do updates exclusively in the RDBMS, and mostly use the enterprise directory to hold "snapshots" taken periodically from the database. The latter approach allows the directory service managers to use standard database tools to deal with administrative issues, but makes it harder to use COTS clients for updates, as these often speak only LDAP. The "snapshot" approach also makes it necessary to write more scripts and SQL code.
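The scripts the "snapshot" approach requires often boil down to dumping database rows as LDIF for loading into the directory. This sketch assumes a hypothetical `people` table, a hypothetical base DN under `dc=example,dc=edu`, and uid values that need no DN escaping; values LDIF cannot carry as plain text (non-ASCII, or with a leading space, colon, or less-than sign) are base64-encoded.

```python
import base64
import sqlite3

def ldif_line(attr: str, value) -> str:
    """Format one attribute line, base64-encoding values LDIF cannot carry as-is."""
    v = str(value)
    safe = (v.isascii()
            and not any(c in v for c in "\0\n\r")
            and not v.startswith((" ", ":", "<"))
            and not v.endswith(" "))
    if safe:
        return f"{attr}: {v}"
    return f"{attr}:: {base64.b64encode(v.encode('utf-8')).decode('ascii')}"

def snapshot_to_ldif(conn, base_dn="ou=people,dc=example,dc=edu") -> str:
    """Dump the hypothetical `people` table as LDIF. Assumes uid values are
    plain ASCII that need no DN escaping."""
    out = ["version: 1", ""]
    rows = conn.execute("SELECT uid, cn, sn, mail FROM people ORDER BY uid")
    for uid, cn, sn, mail in rows:
        out.append(f"dn: uid={uid},{base_dn}")
        out.append("objectClass: inetOrgPerson")
        for attr, value in (("uid", uid), ("cn", cn), ("sn", sn), ("mail", mail)):
            if value is not None:
                out.append(ldif_line(attr, value))
        out.append("")  # a blank line ends each entry
    return "\n".join(out)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (uid TEXT, cn TEXT, sn TEXT, mail TEXT)")
conn.executemany("INSERT INTO people VALUES (?, ?, ?, ?)", [
    ("jdoe", "Jane Doe", "Doe", "jdoe@example.edu"),
    ("jmuller", "Jürgen Müller", "Müller", None),
])
print(snapshot_to_ldif(conn))
```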

Other key big-picture directory issues include dealing with legacy systems (DCE, NDS, NT), integrating current commercial applications (PeopleSoft, SAP), and ensuring that directories will work well with future application-development platforms. Data ownership, and the more general issue of who has authority to make what changes, are also important.

Enterprise directory design and implementation

The following best practices for enterprise directories are grouped into five categories: schema, referrals and redundancy; naming; attributes; replication and synchronization; and groups.

Schema: Overall logical design, inheritance of policy and attributes, referrals to other parts of the directory tree, and delegation and synchronization are all schema issues that are centrally important to making directories work. Neither replication nor group policies have usually been primary concerns in designing directory schemas.

Naming

Attributes: Most LDAP clients do not handle multivalued attributes well (for example, they display the first value and ignore the rest), but splitting the values across several single-valued attributes with separate names is no better. Most workarounds for this problem are kludgey. The best alternative is to build Web clients, which can display multiple values in a scroll window.
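The difference between a client that keeps only the first value and one that shows them all is easy to see once an entry is parsed into lists of values. This sketch assumes plain, unfolded, non-base64 LDIF-style lines; the function names are hypothetical.

```python
from collections import defaultdict

def parse_entry(lines):
    """Collect 'attr: value' lines into {attr: [values...]}, keeping value order.
    Attribute names are lower-cased because LDAP treats them case-insensitively.
    Assumes plain, unfolded, non-base64 lines for simplicity."""
    entry = defaultdict(list)
    for line in lines:
        attr, _, value = line.partition(": ")
        entry[attr.lower()].append(value)
    return dict(entry)

def first_value_only(entry, attr):
    """The problematic client behaviour: later values are silently dropped."""
    values = entry.get(attr.lower(), [])
    return values[0] if values else None

def all_values(entry, attr):
    """What a client such as a Web page should show: every value."""
    return list(entry.get(attr.lower(), []))

entry = parse_entry([
    "cn: Jane Doe",
    "telephoneNumber: +1 303 555 0100",
    "telephoneNumber: +1 303 555 0199",
])
print(first_value_only(entry, "telephoneNumber"))  # +1 303 555 0100
print(all_values(entry, "telephoneNumber"))
```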

Replication and synchronization: For the most part, faster hardware has been an adequate solution for replication problems.

Groups: A very common need is to create groups, particularly of people. Examples include different types of people associated with a university (faculty, students, staff, alumni), faculty by discipline or tenure status, students by class year or dormitory, and staff by benefit plan. Creating groups is sometimes known as "people picking". There are two sets of issues, one relating to user tools and one relating to storage of information.
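On the storage side, one option is to compute group membership from directory attributes on demand rather than storing explicit member lists. The sketch below illustrates that option only; the sample entries and attribute names (`affiliation`, `class_year`, `dorm`, `benefit_plan`) are hypothetical.

```python
from collections import defaultdict

PEOPLE = [  # hypothetical directory entries
    {"uid": "ada", "affiliation": "student", "class_year": 2002, "dorm": "North"},
    {"uid": "alan", "affiliation": "faculty", "department": "Mathematics"},
    {"uid": "grace", "affiliation": "student", "class_year": 2003, "dorm": "North"},
    {"uid": "edsger", "affiliation": "staff", "benefit_plan": "B"},
]

def pick(people, **criteria):
    """'People picking': uids of entries whose attributes equal all criteria."""
    return sorted(p["uid"] for p in people
                  if all(p.get(k) == v for k, v in criteria.items()))

def groups_by(people, attribute):
    """Partition people by the value of one attribute; entries lacking it are skipped."""
    groups = defaultdict(list)
    for p in people:
        if attribute in p:
            groups[p[attribute]].append(p["uid"])
    return dict(groups)

print(pick(PEOPLE, affiliation="student", dorm="North"))  # ['ada', 'grace']
print(groups_by(PEOPLE, "affiliation"))
```

Computed groups stay current as attributes change, at the cost of evaluating criteria at query time; stored member lists make the opposite trade.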