XDS

As described in this chapter, rotation data images are processed in 8 steps, which are called in succession by XDS.

Information between the steps is communicated by files, which allows repetition of selected steps with a different set of input parameters without rerunning the whole program. The files generated by XDS are either ASCII files that can be inspected and modified with a text editor, or binary image files in the CBF format, a byte-offset variant of the CBFlib format. Such images are indicated by the file name extension ".cbf". All files have a fixed name defined by XDS, which makes it mandatory to process each data set in a newly created directory to avoid name clashes. Clearly, one should not run more than one XDS job simultaneously in the same directory. Also, before selected steps are rerun (see Table 1), any output files whose original contents are to be kept should first be given another name, since the rerun overwrites them.

Data processing begins by copying an appropriate input file template into the data processing directory. Input file templates are provided with the XDS package for a number of frequently used data collection facilities. The copied input file must be renamed XDS.INP and edited to provide the correct parameter values for the actual data collection experiment.

All parameters in XDS.INP are named by keywords containing an equal sign as the last character, and many of them will be mentioned here in context to clarify their meaning. Execution of XDS invokes in succession each of the 8 program steps described below - or a subset of the steps named in the parameter JOB=. Results and diagnostics from each step are saved in files with the extension .LP attached to the program step name. These files should always be studied carefully to see whether processing was satisfactory or - in case of failure - to find out what could have gone wrong.
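The keyword convention can be illustrated with a small Python sketch that splits XDS.INP-style text into keyword/value pairs. It assumes that a keyword is an upper-case token ending in "=" that starts a line or follows whitespace, that several keywords may share one line, and that "!" starts a comment; parse_xds_inp is a hypothetical helper, not part of XDS.

```python
import re

# Assumed keyword shape: an upper-case token ending in '=' that starts a line or
# follows whitespace, e.g. JOB=, SPOT_RANGE=, REFINE(IDXREF)=, FRIEDEL'S_LAW=.
KEYWORD = re.compile(r"(?<!\S)([A-Z][A-Z0-9_().'/\-]*=)")


def parse_xds_inp(text):
    """Return {keyword: [value, ...]}; repeated keywords keep every value."""
    params = {}
    for line in text.splitlines():
        line = line.split("!", 1)[0]  # assumption: '!' starts a comment
        matches = list(KEYWORD.finditer(line))
        for i, match in enumerate(matches):
            # The value runs up to the next keyword or the end of the line.
            end = matches[i + 1].start() if i + 1 < len(matches) else len(line)
            value = line[match.end():end].strip()
            params.setdefault(match.group(1), []).append(value)
    return params


example = """JOB= XYCORR INIT COLSPOT IDXREF   ! a comment
SPOT_RANGE=1 10  SPOT_RANGE=50 60
INCLUDE_RESOLUTION_RANGE=20.0 0.0
"""
print(parse_xds_inp(example)["SPOT_RANGE="])  # ['1 10', '50 60']
```

Collecting values in lists keeps keywords such as SPOT_RANGE= that may legitimately appear more than once.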


XYCORR

calculates lookup tables of spatial corrections for each detector pixel, which are stored in the files X-CORRECTIONS.cbf and Y-CORRECTIONS.cbf. In subsequent data processing steps, when the true coordinates of a pixel with respect to the laboratory coordinate system are needed, the correction values for the X- and Y-coordinates are retrieved from the tables and added to the pixel's array coordinates in the data image.

Depending on the detector, XYCORR computes the spatial corrections in different ways.
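How the corrections are used can be sketched in a few lines of Python. The encoding of the real X-CORRECTIONS.cbf and Y-CORRECTIONS.cbf files (units, sampling) is not described in this section, so the sketch assumes hypothetical in-memory tables that already hold one correction per pixel in pixel units.

```python
def corrected_coordinates(ix, iy, x_corr, y_corr):
    """Spatially corrected coordinates of the pixel at array position (ix, iy).

    x_corr and y_corr are hypothetical per-pixel lookup tables (lists of rows,
    indexed [iy][ix]) holding corrections already converted to pixel units.
    """
    return ix + x_corr[iy][ix], iy + y_corr[iy][ix]


# Tiny 2 x 2 detector with made-up correction values.
x_table = [[0.0, 0.1], [0.2, 0.3]]
y_table = [[-0.1, 0.0], [0.0, 0.05]]
print(corrected_coordinates(1, 1, x_table, y_table))
```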

Problems:


INIT

determines three lookup tables, saved as files BLANK.cbf, GAIN.cbf, and BKGINIT.cbf, that are required by the subsequent processing steps for classifying pixels in the data images as background or belonging to a diffraction spot ('strong' pixels).

Problems:
Some detectors with insufficient protection from electromagnetic pulses may generate badly spoiled images whose inclusion leads to a completely wrong X-ray background table. Such images can be identified in INIT.LP by their unexpectedly high mean pixel contents; in this case, the INIT step should be repeated with a different set of images.
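Spotting such images by eye in INIT.LP can be supplemented by a simple check. The sketch below flags images whose mean pixel value exceeds a multiple of the median over all images; the mean values are assumed to have been transcribed from INIT.LP, and both the function and its threshold factor of 2.0 are hypothetical choices, not XDS behaviour.

```python
from statistics import median


def suspicious_images(mean_values, factor=2.0):
    """Image numbers whose mean pixel value exceeds `factor` times the median.

    mean_values: {image_number: mean pixel value}, e.g. transcribed from INIT.LP.
    The factor 2.0 is an arbitrary, hypothetical threshold.
    """
    typical = median(mean_values.values())
    return sorted(n for n, m in mean_values.items() if m > factor * typical)


print(suspicious_images({1: 100.0, 2: 104.0, 3: 98.0, 4: 410.0}))  # [4]
```

The median is used rather than the mean so that a few spoiled images do not shift the reference value.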


COLSPOT

locates strong diffraction spots occurring in a subset of the data images and saves their centroids in the file SPOT.XDS.

The data subset is defined by image number ranges, each specified by the keyword SPOT_RANGE=. Corrupted images can be excluded by using the input parameter EXCLUDE_DATA_RANGE=. COLSPOT identifies 'strong' pixels (STRONG_PIXEL=) that are not in the background region (BACKGROUND_PIXEL=). If the total number of 'strong' pixels occurring in the specified data images exceeds the upper limit given by the input parameter MAXIMUM_NUMBER_OF_STRONG_PIXELS=, the weaker ones are discarded. Spots are defined as sets of 'strong' pixels adjacent in three dimensions. A spot is accepted if it contains a minimum number of 'strong' pixels (MINIMUM_NUMBER_OF_PIXELS_IN_A_SPOT=) and if the spot centroid is sufficiently close to the location of the strongest pixel in the spot (SPOT_MAXIMUM-CENTROID=).
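The spot-building rules above can be sketched as a connected-component search. This is a minimal illustration under stated assumptions, not the COLSPOT implementation: "adjacent in three dimensions" is taken to mean face-adjacent in (image number, y, x), centroids are intensity-weighted, and the centroid-to-maximum distance is measured in the image plane. The function name and its arguments are hypothetical; only the keywords in the comments come from the text.

```python
import math
from collections import deque

# Face-adjacent neighbours in (frame, y, x); using 6-connectivity is an assumption.
NEIGHBOURS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def find_spots(strong, max_pixels, min_pixels, max_centroid_offset):
    """Group 'strong' pixels into spots and return the accepted centroids.

    strong: {(frame, y, x): intensity} for every pixel classified as 'strong'.
    """
    # Discard the weakest pixels if there are too many (cf. MAXIMUM_NUMBER_OF_STRONG_PIXELS=).
    if len(strong) > max_pixels:
        kept = sorted(strong, key=strong.get, reverse=True)[:max_pixels]
        strong = {p: strong[p] for p in kept}

    seen, centroids = set(), []
    for start in strong:
        if start in seen:
            continue
        # Breadth-first search collects one 3-D connected set of strong pixels.
        spot, queue = [], deque([start])
        seen.add(start)
        while queue:
            pixel = queue.popleft()
            spot.append(pixel)
            for step in NEIGHBOURS:
                nb = tuple(c + d for c, d in zip(pixel, step))
                if nb in strong and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)

        if len(spot) < min_pixels:  # cf. MINIMUM_NUMBER_OF_PIXELS_IN_A_SPOT=
            continue
        total = sum(strong[p] for p in spot)
        centroid = tuple(sum(p[k] * strong[p] for p in spot) / total for k in range(3))
        peak = max(spot, key=strong.get)
        # Compare centroid and strongest pixel in the image plane (cf. SPOT_MAXIMUM-CENTROID=).
        if math.dist(centroid[1:], peak[1:]) <= max_centroid_offset:
            centroids.append(centroid)
    return centroids
```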

Problems:
Sharp edges like ice rings in the images can lead to an excessive number of 'strong' pixels that are erroneously classified as contributing to diffraction spots. These aliens can prevent IDXREF from recognizing the crystal lattice.


IDXREF

uses the initial parameters describing the diffraction experiment as provided by XDS.INP and the observed centroids of the spots from the file SPOT.XDS to find the orientation, metric, and symmetry of the crystal lattice. IDXREF refines all or a specified subset of these parameters (input parameter REFINE(IDXREF)=). On return, the complete set of parameters is saved in the file XPARM.XDS, and the original file SPOT.XDS is replaced by a file of identical name, now with indices attached to each observed spot. Spots not belonging to the crystal lattice are given indices 0,0,0. XDS considers the run successful if the coordinates of at least 50% of the given spots can be explained with reasonable accuracy (MINIMUM_FRACTION_OF_INDEXED_SPOTS=, MAXIMUM_ERROR_OF_SPOT_POSITION=); otherwise XDS stops with an error message. Alien spots often arise from ice or small satellite crystals, in which case continuing data processing may still be meaningful. XDS is then called again with an explicit list of the subsequent steps specified in XDS.INP (input parameter JOB=).

To determine a crystal lattice that explains the observed locations of the diffraction spots listed in file SPOT.XDS, IDXREF proceeds as follows.

  1. A subset of spots is selected from file SPOT.XDS and used to find the orientation, metric, and symmetry of the crystal lattice. This selection is controlled via the parameters SPOT_RANGE=, TRUSTED_REGION=, INCLUDE_RESOLUTION_RANGE=, and EXCLUDE_RESOLUTION_RANGE=.
  2. The laboratory coordinates of the diffracted beam wave vector (normalized to 1/λ) that produced the spot at pixel coordinates IX, IY are calculated by using the input parameter values that describe the mapping (see Detector coordinate system).
  3. Subtraction of the incident beam wave vector (determined from the input parameter values INCIDENT_BEAM_DIRECTION= and X-RAY_WAVELENGTH=) from the diffracted beam wave vector results in the corresponding reciprocal lattice vector when the Laue equations are satisfied.
  4. The reciprocal lattice vector for the unrotated crystal is then found from the centroid of image numbers of the spot (as given in SPOT.XDS) and the input parameter values ROTATION_AXIS=, OSCILLATION_RANGE=, STARTING_ANGLE=, and STARTING_FRAME=.
  5. Difference vectors between any two such reciprocal lattice vectors are then accumulated in a 3-dimensional histogram, provided their length exceeds a specified minimum (SEPMIN=). These difference vectors form clusters in the histogram, since there are many different pairs of reciprocal lattice vectors with nearly identical vector differences.
  6. The clusters are found as maxima in the smoothed histogram (CLUSTER_RADIUS=), and a basis of three linearly independent cluster vectors is selected that allows all other cluster vectors to be expressed as nearly integral multiples of small magnitude with respect to this basis. The basis vectors and the 60 most populated clusters with attached indices are listed in IDXREF.LP. If many of the indices deviate significantly from integral values, the program is unable to find a reasonable lattice basis and all further processing will be meaningless.
  7. If space-group and cell constants or unit cell basis vectors are specified (input parameters SPACE_GROUP_NUMBER=, UNIT_CELL_CONSTANTS=, UNIT_CELL_A-AXIS=, UNIT_CELL_B-AXIS=, UNIT_CELL_C-AXIS=), the reciprocal basis vectors found above are interpreted by the given cell; otherwise, a reduced triclinic cell is determined directly from the reciprocal basis. Parameters of the reduced cell, coordinates of the reciprocal basis vectors, and their indices with respect to the reduced cell are reported in IDXREF.LP.
  8. Based on the orientation and metric of the reduced cell now available, IDXREF indexes up to 3,000 of the strongest spots by the local indexing method. This method considers each spot as a node of a tree and identifies the largest subtree of nodes which can be assigned reliable indices. The number of reflections in the ten largest subtrees is reported and usually shows a dominant first tree corresponding to a single lattice, whereas alien spots are found in small subtrees. Input parameters that control the local indexing are INDEX_ERROR=, INDEX_MAGNITUDE=, INDEX_QUALITY=.
  9. Reflections in the largest subtree are used for initial refinement of the basis vectors of the reduced cell, the incident beam wave vector, and the origin of the detector, which is the point in the detector plane nearest to the crystal.
  10. After initial refinement based on the reflections in the largest subtree, all spots which can now be indexed are included.
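Steps 2 to 4 amount to a few lines of vector arithmetic. The sketch below assumes that a pixel's laboratory position relative to the crystal is distance*ed3 + (ix - orgx)*qx*ed1 + (iy - orgy)*qy*ed2, with ed1 and ed2 the detector axes, ed3 the detector normal and (orgx, orgy) the detector origin of step 9; that the rotation angle grows linearly from starting_angle at image starting_frame; and that rotations are right-handed about rotation_axis. All argument names are hypothetical stand-ins for values in XDS.INP, and the spatial corrections from XYCORR are ignored.

```python
import math


def unit(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def rotate(v, axis, angle_deg):
    """Rotate v about the unit vector `axis` by angle_deg (Rodrigues' formula)."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    k_dot_v = sum(k * x for k, x in zip(axis, v))
    k_cross_v = (axis[1] * v[2] - axis[2] * v[1],
                 axis[2] * v[0] - axis[0] * v[2],
                 axis[0] * v[1] - axis[1] * v[0])
    return tuple(v[i] * c + k_cross_v[i] * s + axis[i] * k_dot_v * (1 - c)
                 for i in range(3))


def reciprocal_vector(ix, iy, z, *, orgx, orgy, qx, qy, distance, ed1, ed2, ed3,
                      beam_dir, wavelength, rotation_axis, starting_angle,
                      oscillation_range, starting_frame):
    """Reciprocal lattice vector of the unrotated crystal for a spot at (ix, iy, z)."""
    # Step 2: laboratory position of the pixel relative to the crystal, then the
    # diffracted-beam wave vector s with |s| = 1/wavelength.
    x = tuple((ix - orgx) * qx * ed1[k] + (iy - orgy) * qy * ed2[k] + distance * ed3[k]
              for k in range(3))
    s = tuple(c / wavelength for c in unit(x))
    # Step 3: subtract the incident-beam wave vector s0 (also of length 1/wavelength).
    s0 = tuple(c / wavelength for c in unit(beam_dir))
    p = tuple(a - b for a, b in zip(s, s0))
    # Step 4: undo the crystal rotation at the spot's centroid image number z.
    # Assumed convention: the angle grows linearly from starting_angle at starting_frame.
    phi = starting_angle + oscillation_range * (z - starting_frame)
    return rotate(p, unit(rotation_axis), -phi)
```

A spot at the direct-beam position yields the zero vector, and rotating back preserves the length of the reciprocal lattice vector, which is what steps 5 and 6 rely on when comparing difference vectors.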

IDXREF uses the refined metric parameters of the reduced cell for testing each of the 44 possible lattice types (Kabsch, 1993). For each lattice type IDXREF reports the likelihood of being correct and the conventional cell parameters. IDXREF concludes with an overview of possible lattice symmetries (decision constants MAX_CELL_AXIS_ERROR=, MAX_CELL_ANGLE_ERROR=) but makes no automatic decision about the space group. If the crystal symmetry is unknown, XDS continues data processing with the crystal described by its reduced cell basis vectors and triclinic symmetry.

Problems:


DEFPIX

recognizes regions in the initial background table (file BKGINIT.cbf) that are obscured by intruding hardware and marks the shaded pixels as untrusted. In addition, pixels outside a user-defined resolution range (INCLUDE_RESOLUTION_RANGE=) are marked and eliminated from the trusted region. The marked background table thus obtained is saved in the file BKGPIX.cbf, which is needed by the subsequent program steps.

For recognizing the obscured regions in the initial background, DEFPIX generates a control image (file ABS.cbf) that contains values around 10000 for unshaded pixels and lower values for shaded pixels. The classification of the pixels into reliable and untrusted ones is based on the two input parameters VALUE_RANGE_FOR_TRUSTED_DETECTOR_PIXELS= (default: 6000 30000) and INCLUDE_RESOLUTION_RANGE= (default: 20.0 0.0). Pixels in the table ABS.cbf with a value outside the ranges specified by the two parameters are marked unreliable (by -3) in the background table BKGPIX.cbf.
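The classification rule can be written down directly. How DEFPIX computes the control values in ABS.cbf is not described here, so the sketch only mirrors the stated rule with the stated defaults: a pixel keeps its background value if its control value lies within VALUE_RANGE_FOR_TRUSTED_DETECTOR_PIXELS= and its resolution lies within INCLUDE_RESOLUTION_RANGE=, and is otherwise marked -3. The resolution of a pixel is taken from Bragg's law, d = wavelength / (2 sin(theta)); the function names are hypothetical.

```python
import math

UNTRUSTED = -3  # marker written into BKGPIX.cbf, as stated above


def pixel_resolution(two_theta_deg, wavelength):
    """Bragg resolution d = wavelength / (2 sin(theta)) of a pixel at angle 2*theta."""
    theta = math.radians(two_theta_deg) / 2.0
    return math.inf if theta == 0.0 else wavelength / (2.0 * math.sin(theta))


def classify_pixel(abs_value, d, background,
                   value_range=(6000, 30000), resolution_range=(20.0, 0.0)):
    """Keep the background value of a trusted pixel, otherwise return UNTRUSTED."""
    low, high = value_range
    d_max, d_min = resolution_range  # low-resolution limit first, as in the default
    if (not low <= abs_value <= high) or (not d_min <= d <= d_max):
        return UNTRUSTED
    return background
```

With these defaults, a pixel in the direct-beam region (d larger than 20 Angstrom) is excluded even if its control value is normal.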

Problems:
If the parameter VALUE_RANGE_FOR_TRUSTED_DETECTOR_PIXELS= specifies too narrow a value range, "good" regions will erroneously be excluded from the trusted detector region. Check BKGPIX.cbf with the XDS-Viewer program and, if necessary, repeat the DEFPIX step with more appropriate values.


XPLAN

supports the planning of data collection. It is based on information provided by the input files XPARM.XDS, BKGPIX.cbf, X-CORRECTIONS.cbf, and Y-CORRECTIONS.cbf, which become available after processing a few test images. XPLAN estimates the completeness of the new reflection data expected to be collected for each given starting angle (STARTING_ANGLES_OF_SPINDLE_ROTATION=) and total crystal rotation (TOTAL_SPINDLE_ROTATION_RANGES=), and reports the results for a number of selected resolution shells (RESOLUTION_SHELLS=) in the file XPLAN.LP. To avoid recollecting data that have already been measured, a file containing such reflections (the reference data set) can be named by the input parameter REFERENCE_DATA_SET=.
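XPLAN's prediction of which reflections a given rotation will record is not reproduced here. Assuming that the unique reflections possible up to the resolution limit and the subset expected to be recorded are already known, the per-shell completeness reduces to counting, as in the hypothetical helper below. Treating the reference data set as simply adding to the covered reflections is an assumption about how XPLAN combines the two.

```python
def completeness_by_shell(possible, expected, shell_limits, reference=frozenset()):
    """Percentage completeness per resolution shell.

    possible: {hkl: d spacing in Angstrom} for all unique reflections to the limit.
    expected: unique hkl predicted to be recorded in the planned rotation range.
    shell_limits: decreasing high-resolution limits, e.g. [5.0, 2.5].
    reference: unique hkl already present in the reference data set.
    """
    covered = set(expected) | set(reference)
    result, upper = [], float("inf")
    for lower in shell_limits:
        in_shell = [hkl for hkl, d in possible.items() if lower <= d < upper]
        n_covered = sum(1 for hkl in in_shell if hkl in covered)
        result.append(100.0 * n_covered / len(in_shell) if in_shell else 0.0)
        upper = lower
    return result
```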

Problems:


INTEGRATE

determines the intensity of each reflection predicted to occur in the rotation data images (DATA_RANGE=) and saves the results in the file INTEGRATE.HKL.

Corrupted images can be excluded by using the input parameter EXCLUDE_DATA_RANGE=. The diffraction parameters needed for predicting the reflection positions are initially provided by the file XPARM.XDS. These parameters are either kept constant or refined periodically using strong diffraction spots encountered in the data images. Whether refinement should be carried out at all, and which parameters are to be refined, can be specified by the user (input parameter REFINE(INTEGRATE)=). Centroids of the strong spots in the data images are computed from pixels that exceed the background by a given multiple of standard deviations (input parameters SIGNAL_PIXEL=, BACKGROUND_PIXEL=). Strong spots are used in the refinement if their centroids are reasonably close to their calculated positions (input parameter MAXIMUM_ERROR_OF_SPOT_POSITION=).

For determination of the intensity, approximate values describing the extent and form of the diffraction spot must be specified. The shapes of all spots become very similar when the contents of each of their contributing image pixels are mapped into a 3-dimensional, reflection-specific coordinate system centered on the surface of the Ewald sphere, at the terminus of the diffracted beam wave vector (Kabsch, 1988b). In this coordinate system, alpha and beta span the plane tangential to the Ewald sphere, with the alpha-axis perpendicular to both the incident and the diffracted beam wave vectors. The gamma-axis runs perpendicular to the alpha-axis and to the rotated reciprocal lattice vector representing the reflection when the Laue equations are satisfied. The number of grid points in this coordinate system used for representing the transformed reflection profile is usually chosen automatically by XDS; the user has the option to override the automatic assignment by specifying the two input parameters NUMBER_OF_PROFILE_GRID_POINTS_ALONG_ALPHA/BETA= and NUMBER_OF_PROFILE_GRID_POINTS_ALONG_GAMMA=.
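The axis definitions in the preceding paragraph can be turned into unit vectors as sketched below, for a diffracted beam vector s and incident beam vector s0 of equal length 1/wavelength. The choice gamma = (s + s0)/|s + s0| is one vector satisfying the stated conditions (it is perpendicular to alpha and to p = s - s0); we believe it matches Kabsch (1988b) but have not checked it against the XDS code. Expressing alpha and beta offsets in degrees is likewise an assumption, and the gamma coordinate, which also involves the rotation angle, is omitted.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def unit(v):
    n = math.sqrt(dot(v, v))
    return tuple(c / n for c in v)


def profile_axes(s, s0):
    """Unit axes (alpha, beta, gamma) for a reflection with diffracted beam s."""
    e1 = unit(cross(s, s0))                         # alpha: perpendicular to s and s0
    e2 = unit(cross(s, e1))                         # beta: tangential at s, perpendicular to alpha
    e3 = unit(tuple(a + b for a, b in zip(s, s0)))  # gamma: perpendicular to alpha and to s - s0
    return e1, e2, e3


def alpha_beta(s_pixel, s, axes):
    """Angular offsets (degrees) of another diffracted-beam vector from s."""
    e1, e2, _ = axes
    d = tuple(a - b for a, b in zip(s_pixel, s))
    scale = 180.0 / (math.pi * math.sqrt(dot(s, s)))
    return dot(e1, d) * scale, dot(e2, d) * scale
```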
The transformed spot can roughly be described as a Gaussian. Four parameters are used for this purpose:

  1. BEAM_DIVERGENCE= is twice the opening angle of a cone with the diffracted beam wave vector as cone axis. The intersection of the cone with the data image traces the boundary of the spot and includes some neighbouring background pixels. The parameter value can be estimated as
    BEAM_DIVERGENCE= arctan(spot_diameter/detector_distance).
  2. BEAM_DIVERGENCE_E.S.D.= characterizes the Gaussian spot shape by its standard deviation.
  3. REFLECTING_RANGE= is the approximate rotation angle required for a strong spot recorded perpendicular to the rotation axis to pass completely through the Ewald sphere.
  4. REFLECTING_RANGE_E.S.D.= is the standard deviation of the Gaussian intensity distribution when the reflection is rotated through the Ewald sphere along the shortest route. This is also defined as the mosaicity of the crystal.

All of the four parameters describing shape and extension of the spots can be determined automatically from the data images.
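The estimate for BEAM_DIVERGENCE= and the meaning of a Gaussian standard deviation can be checked numerically. The sketch assumes angles in degrees and spot diameter and detector distance in the same length unit. Its second function gives the fraction of a Gaussian falling inside an angular interval; using it with REFLECTING_RANGE_E.S.D.= as the width only applies to a reflection crossing the Ewald sphere along the shortest route, since the effective width of other reflections depends on geometry not covered here. Both function names are hypothetical.

```python
import math


def beam_divergence_estimate(spot_diameter, detector_distance):
    """BEAM_DIVERGENCE= estimate in degrees; both lengths in the same unit, e.g. mm."""
    return math.degrees(math.atan(spot_diameter / detector_distance))


def gaussian_fraction(lo, hi, center, esd):
    """Fraction of a Gaussian with mean `center` and standard deviation `esd` in [lo, hi]."""
    def z(x):
        return (x - center) / (esd * math.sqrt(2.0))
    return 0.5 * (math.erf(z(hi)) - math.erf(z(lo)))


# A 0.5 mm spot at 200 mm gives roughly 0.14 degrees.
print(round(beam_divergence_estimate(0.5, 200.0), 3))
# Fraction recorded on an image starting at the reflection's centre and spanning
# 0.1 degree, for a standard deviation of 0.1 degree.
print(round(gaussian_fraction(0.0, 0.1, 0.0, 0.1), 3))
```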

Integration is carried out by a two-step procedure. In the first pass, spot templates are generated by superimposing profiles of fully recorded, strong reflections, and all grid points with a value above a minimum percentage of the maximum in the template (CUT=) are defined as elements of the integration domain. To allow for variations of their shape, profile templates are generated from reflections located in nine regions of equal size covering the detector surface, with additional sets of nine for each of the equally sized batches of images (DELPHI=). The actual integration is carried out in the second pass by profile fitting with respect to the spot shape determined in the first pass. Reflections of which less than MINPK= percent of the intensity has been observed are discarded; otherwise, the missing intensity is estimated from the learned reflection profiles.
On return from the INTEGRATE step, the data image last processed, with all expected spots encircled, is saved in the file FRAME.cbf for inspection with the XDS-Viewer program.
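The role of CUT= and MINPK= in the two-pass integration can be illustrated with a generic weighted least-squares profile fit. This is a sketch, not the XDS procedure: the weighting XDS actually uses is not described here, so Poisson-like variances estimated from the counts are assumed, and the 3-dimensional profile grid is flattened into a list for simplicity. Function names are hypothetical.

```python
def integration_domain(template, cut_percent):
    """Grid points whose template value is at least cut_percent % of the maximum."""
    peak = max(template)
    return [i for i, t in enumerate(template) if t >= cut_percent / 100.0 * peak]


def profile_fit(counts, background, template, domain, observed, min_observed_percent):
    """Profile-fitted intensity of one reflection, or None if too little was recorded.

    counts, background, template: values on the (flattened) profile grid points;
    observed: True for grid points that were actually recorded in the images.
    """
    total = sum(template[i] for i in domain)
    recorded = [i for i in domain if observed[i]]
    if 100.0 * sum(template[i] for i in recorded) / total < min_observed_percent:
        return None  # cf. MINPK=
    num = den = 0.0
    for i in recorded:
        p = template[i] / total    # profile normalised over the whole domain
        var = max(counts[i], 1.0)  # assumed Poisson variance estimate
        num += p * (counts[i] - background[i]) / var
        den += p * p / var
    # Because p is normalised over the full domain, the result includes an
    # estimate of the unrecorded part of the intensity.
    return num / den
```

If the recorded counts follow background plus I times the normalised profile exactly, the fit returns I whether or not the whole profile was observed, which is how the learned profile supplies the missing part.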

Problems:


CORRECT

applies correction factors to intensities and standard deviations of all reflections found in the file INTEGRATE.HKL, determines the space group if unknown, refines the unit cell constants, reports the quality and completeness of the data set, and saves the final integrated intensities in the file XDS_ASCII.HKL.

CORRECT accepts reflections from file INTEGRATE.HKL that are

Thus, the user has the option to exclude unreliable reflections from the final data set by repeating the CORRECT step with appropriate parameter values.

Intensities of the accepted reflections are first corrected for effects due to polarization of the incident beam (parameters FRACTION_OF_POLARIZATION=, POLARIZATION_PLANE_NORMAL=) and for absorption effects (parameters AIR=, SILICON=, SENSOR_THICKNESS=) arising from differences in path lengths of the diffracted beams. These corrections do not depend on knowledge of the space group.

The integrated intensities of the reflections on file INTEGRATE.HKL may or may not have been indexed in the correct space group; for the purpose of integration it is important only that all reflections occurring in the data images have been located exactly and indexed using some unit cell basis. The correct reflection indices in the true space group are always a linear transformation of the original indices used in INTEGRATE.HKL. All lattices consistent with the locations of the reflections saved in INTEGRATE.HKL (decision parameters MAX_CELL_AXIS_ERROR=, MAX_CELL_ANGLE_ERROR=) and their corresponding linear transformations are printed to provide a useful overview similar to the one shown in IDXREF.LP.
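Since the correct indices are a linear transformation of the original ones, reindexing is an integer matrix applied to each (h, k, l). The matrix below is chosen purely for illustration (swap the a and c axes and invert b, so the determinant stays positive and the basis right-handed); in practice the transformation would be read from the lattice overview printed by CORRECT. Function names are hypothetical, and a non-zero determinant is required for the transformation to be invertible.

```python
def reindex(hkl, matrix):
    """Apply an integer 3 x 3 reindexing matrix: h' = M h."""
    return tuple(sum(matrix[i][j] * hkl[j] for j in range(3)) for i in range(3))


def determinant(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


# Example: swap the a and c axes and invert b so the new basis stays right-handed.
M = ((0, 0, 1), (0, -1, 0), (1, 0, 0))
print(determinant(M), reindex((1, 2, 3), M))
```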

If the space group is not specified, XDS proposes one of the enantiomorphous space groups without screw axes that is compatible with the observed lattice symmetry and explains the intensities of a subset of the reflections (TEST_RESOLUTION_RANGE=) at an acceptable Rmeas (Diederichs and Karplus, 1997) using a minimum number of unique reflections. The criteria for an acceptable Rmeas are controlled by the decision parameters MIN_RFL_Rmeas= and MAX_FAC_Rmeas=.

The user can always override the automatic decisions by specifying the correct space group number (SPACE_GROUP_NUMBER=) and unit cell constants (UNIT_CELL_CONSTANTS=) in XDS.INP and repeating the CORRECT step. This provides a simple way to rename orthorhombic cell constants if screw axes are present.
In addition, the user has the option to specify in XDS.INP

The possibility to compare the new data with a reference data set is particularly useful for resolving the issue of alternative settings of polar or rhombohedral cells (like P4, P6, R3). Also, reference data are quite useful for recognizing misindexing or for testing potential heavy-atom derivatives.

For refinement of the unit cell constants (parameter REFINE(CORRECT)=), CORRECT uses a subset of the accepted reflections whose observed centroids are sufficiently close to the predicted spot positions (parameter MAXIMUM_ERROR_OF_SPOT_POSITION=). The refined set of parameters is saved in the file GXPARM.XDS, which has the same layout as the file XPARM.XDS produced by IDXREF. If the crystal has not slipped during data collection, these parameters are quite accurate.

Other correction factors, which partially compensate for radiation damage, absorption effects, and variations in sensitivity of the detector surface, are determined from the symmetry-equivalent reflections usually found in the data images. The corrections are chosen such that the integrated intensities of symmetry-equivalent reflections come out as similar as possible. The user may control application of the various corrections by setting the parameter CORRECTIONS= to a combination of the keywords DECAY, MODULATION, and ABSORPTION. Whether Friedel pairs are considered as symmetry-equivalent reflections in the calculation of the correction factors depends on the values of the two parameters STRICT_ABSORPTION_CORRECTION= and FRIEDEL'S_LAW=. The number of correction factors is controlled by the input parameters MINIMUM_I/SIGMA=, NBATCH=, and REFLECTIONS/CORRECTION_FACTOR=.

The residual scatter in intensity of symmetry-equivalent reflections is used to estimate their standard deviations. Here, the initial estimate v0(I) (obtained from the INTEGRATE step) for the variance of the reflection intensity I is replaced by v(I) = a*(v0(I) + b*I^2). The two constants a and b are chosen to minimize discrepancies between v(I) and the variance estimated from sample statistics of symmetry-related reflections. Based on these more realistic error estimates for the intensities, outliers are recognized by comparison with other symmetry-equivalent reflections. These outliers are included in the main output file XDS_ASCII.HKL, in which they are marked by a negative sign attached to the estimated standard deviations of their intensity. Classification of a reflection as a misfit is controlled by a decision constant with the default value WFAC1=1.5. A lower value specified by the user, such as WFAC1=1.0, leads to more misfits and therefore to lower R-factors, because outliers are not included in the reported statistics.
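The text does not say how a and b are fitted. A minimal sketch, assuming an unweighted least-squares fit: because a*(v0 + b*I^2) = A*v0 + B*I^2 with A = a and B = a*b, the problem is linear in A and B and can be solved from the 2 x 2 normal equations. The sample variances are assumed to come from groups of symmetry-equivalent reflections; the function name is hypothetical, and XDS may weight the terms differently.

```python
def fit_error_model(v0, intensity, sample_var):
    """Fit sample_var ~ a * (v0 + b * I**2) by unweighted least squares; return (a, b)."""
    # The model is linear in A = a and B = a * b:  sample_var ~ A * v0 + B * I**2.
    s11 = sum(v * v for v in v0)
    s12 = sum(v * i * i for v, i in zip(v0, intensity))
    s22 = sum(i ** 4 for i in intensity)
    r1 = sum(v * s for v, s in zip(v0, sample_var))
    r2 = sum(i * i * s for i, s in zip(intensity, sample_var))
    det = s11 * s22 - s12 * s12  # solve the 2 x 2 normal equations (Cramer's rule)
    big_a = (r1 * s22 - s12 * r2) / det
    big_b = (s11 * r2 - s12 * r1) / det
    return big_a, big_b / big_a
```

The b*I^2 term lets the error model grow faster than counting statistics for strong reflections, where systematic errors dominate.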

Data quality as a function of resolution is described by the agreement of intensities of symmetry-related reflections and quantified by the R-factors, Rsym, and the more robust indicator, Rmeas (Diederichs and Karplus, 1997). These R-factors as well as the intensities of all reflections with indices of type h 0 0, 0 k 0, and 0 0 l and those expected to be systematically absent provide important information for identification of the correct space-group. Clearly, large R-factors or many rejected reflections (MISFITS) or large observed intensities for reflections expected to be systematically absent suggest that the assumed space-group or the indexing is incorrect. The presence or absence of anomalous scatterers is specified by the parameter FRIEDEL'S_LAW=.
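The two R-factors differ only by a multiplicity factor. With n observations I_i of a unique reflection and their mean <I>, Rsym = sum over reflections of sum over i of |I_i - <I>|, divided by the sum of all I_i; Rmeas multiplies each reflection's inner sum by sqrt(n/(n-1)), which removes the dependence on multiplicity (Diederichs and Karplus, 1997). The sketch below restricts both sums to reflections observed at least twice, a common convention assumed here; the function name is hypothetical.

```python
import math


def r_factors(groups):
    """Return (Rsym, Rmeas) for lists of intensities of symmetry-equivalent observations."""
    num_sym = num_meas = den = 0.0
    for obs in groups:
        n = len(obs)
        if n < 2:
            continue  # single observations carry no agreement information
        mean = sum(obs) / n
        dev = sum(abs(i - mean) for i in obs)
        num_sym += dev
        num_meas += math.sqrt(n / (n - 1)) * dev
        den += sum(obs)
    return num_sym / den, num_meas / den
```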

Finally, CORRECT analyzes the distribution of reflection intensities as a function of their resolution and reports outliers from the Wilson plot. Often these aliens arise from ice rings in the data images. To suppress the unwanted reflections from the final output file XDS_ASCII.HKL, the user copies them to a file named REMOVE.HKL in the current directory and repeats the CORRECT step.

Problems:


© 2009-2016, MPI for Medical Research, Heidelberg
Wolfgang.Kabsch@mpimf-heidelberg.mpg.de
page last updated: April 24, 2016