Microsoft SharePoint Online Connector

For a general introduction to the connector, please refer to https://www.rheininsights.com/en/connectors/sharepoint-online.php .

Entra Id Configuration

Creation of Private and Public Key

The connector uses a private and public key pair to authenticate against the Graph and SharePoint APIs. These keys are created by an administrator, not by the connector or Azure. You will need to upload the public key to Entra Id in one of the steps below.

A private and public key pair can be created with openssl as follows:

openssl req -newkey rsa:2048 -new -nodes -x509 -days 3650 -keyout privatekey.pem -out publickey.pem

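where the resulting publickey.pem is the public key (a self-signed X.509 certificate, valid for 3650 days) and privatekey.pem is the private key. Because of the -nodes option, the private key is stored unencrypted.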
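
Once the certificate has been uploaded in the steps below, the portal lists it with a thumbprint. The following is a minimal sketch for computing that value locally, so that the uploaded file can be matched against publickey.pem. It assumes that the thumbprint is the SHA-1 hash of the DER-encoded certificate; the function name certificate_thumbprint is illustrative and not part of the connector.

```python
import hashlib
import ssl


def certificate_thumbprint(pem_text: str) -> str:
    """Return the upper-case hex SHA-1 thumbprint of a PEM-encoded certificate."""
    # Strip the PEM header and footer and base64-decode the body into DER bytes.
    der_bytes = ssl.PEM_cert_to_DER_cert(pem_text.strip())
    return hashlib.sha1(der_bytes).hexdigest().upper()


def thumbprint_of_file(path: str) -> str:
    """Read a PEM certificate file such as publickey.pem and return its thumbprint."""
    with open(path, "r", encoding="ascii") as handle:
        return certificate_thumbprint(handle.read())
```

Calling thumbprint_of_file("publickey.pem") in the directory where the openssl command was run returns a 40-character hex string that can be compared with the thumbprint column in the portal.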

Application Registration

The connector acts as an Entra Id application. This application must be registered as follows:

  1. Navigate to https://portal.azure.com

  2. Open Entra Id

  3. Open App registrations


  4. Click on New registration


  5. Give it a name


  6. Click on Register

  7. Click on API permissions

    1. Click on Add a permission

    2. Click on Microsoft Graph


    3. Choose Application Permissions


    4. Please search for the following permissions and check the respective boxes:

      1. User.Read.All

      2. Group.Read.All

      3. Sites.FullControl.All (needed for accessing the SharePoint Online site permissions and for secure search)

      4. Sites.Read.All

    5. Click on Add permissions

    6. Then click on Add a permission again

    7. Then choose Microsoft SharePoint

      Adding SharePoint permissions to an application

    8. Choose Application Permissions

    9. Choose Sites.FullControl.All

      Permission needed to get reasonable document ACLs
    10. Click on Add permissions

    11. Grant admin consent for all chosen permissions

  8. Now open Certificates & secrets

  9. Click on Certificates

  10. Here click on Upload certificate and upload the public key (publickey.pem), as generated above

    Upload dialog for certificates which are used to authenticate the connector
  11. Click on Add

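  12. Finally, click on Overview and make a note of the Application (client) ID and the Directory (tenant) ID. These are the client Id and tenant Id needed for the connector configuration.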

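
To show where the values noted in the last step end up, here is a minimal sketch of the token endpoint and the scopes that a certificate-based client credentials flow against the Microsoft identity platform typically uses. It is an illustration only and not the connector's actual implementation; the class name EntraAppSettings, its fields and the placeholder Ids are hypothetical.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class EntraAppSettings:
    """Hypothetical container for the Ids from the Overview page and the SharePoint base URL."""
    tenant_id: str
    client_id: str
    sharepoint_base_url: str

    def token_endpoint(self) -> str:
        # v2.0 token endpoint of the Microsoft identity platform for this tenant.
        return f"https://login.microsoftonline.com/{self.tenant_id}/oauth2/v2.0/token"

    def graph_scope(self) -> str:
        # ".default" requests the application permissions that received admin consent.
        return "https://graph.microsoft.com/.default"

    def sharepoint_scope(self) -> str:
        # The SharePoint resource is the tenant host, e.g. https://company.sharepoint.com
        parsed = urlparse(self.sharepoint_base_url)
        if parsed.scheme != "https" or not parsed.netloc:
            raise ValueError("expected a URL such as https://company.sharepoint.com")
        return f"https://{parsed.netloc}/.default"


settings = EntraAppSettings(
    tenant_id="00000000-0000-0000-0000-000000000000",  # placeholder, not a real tenant
    client_id="11111111-1111-1111-1111-111111111111",  # placeholder, not a real application
    sharepoint_base_url="https://company.sharepoint.com",
)
print(settings.token_endpoint())
print(settings.graph_scope())
print(settings.sharepoint_scope())
```

Graph and SharePoint are separate resources, so a client would request one token per scope. This matches the registration above, where permissions are added once under Microsoft Graph and once under SharePoint.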

Content Source Configuration

The content source configuration of the connector comprises the following mandatory configuration fields.

SharePoint Online Connector Configuration

Within the connector’s configuration please add the following information:

  1. Tenant Id: the tenant Id noted in the steps above.

  2. Client Id: the client Id noted in the steps above.

  3. SharePoint base URL: the base URL of your tenant, e.g. https://company.sharepoint.com

  4. Private key: here you need to upload the private key which you generated in the steps above.

  5. Public key: here you need to upload the public key which you generated in the steps above.

  6. Rate limit: optionally limits the number of API calls per second.

  7. Index One Drives: if turned on, OneDrives are crawled (cf. OneDrive connector)

  8. Index SharePoint Online sites: if turned on, SharePoint Online sites are crawled (cf. SharePoint Online connector)

  9. Index hidden lists: by default, the connector skips hidden SharePoint lists. If turned on, hidden lists are crawled as well.

  10. Included Sites: here you can add site URLs. If given, only these sites will be crawled.
    All previously indexed sites which are no longer included will then be deleted from the search index.

  11. Excluded Sites: here you can add site URLs. If given, these sites will not be crawled.
    All previously indexed sites which are now excluded will then be deleted from the search index.

  12. Excluded attachments: files with one of the suffixes in this list, such as images or executables, will not be indexed.

After entering the configuration parameters, click on validate. This validates the content crawl configuration directly against the content source. If there are issues when connecting, the validator will indicate these on the page. Otherwise, you can save the configuration and continue with Content Transformation configuration.
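
The interplay of Included Sites, Excluded Sites and Excluded attachments can be sketched as follows. This is a hypothetical illustration of the behavior described in the list above, not the connector's code. The function names, the case-insensitive prefix matching on site URLs and the precedence of exclusions over inclusions are assumptions.

```python
from typing import Iterable


def _normalize(url: str) -> str:
    # Assumption: URLs are compared case-insensitively, ignoring a trailing slash.
    return url.strip().rstrip("/").lower()


def _matches(site_url: str, configured: Iterable[str]) -> bool:
    site = _normalize(site_url)
    for entry in configured:
        prefix = _normalize(entry)
        # Match the site itself or anything below it, but not a sibling
        # such as /sites/hr-archive for a configured /sites/hr.
        if site == prefix or site.startswith(prefix + "/"):
            return True
    return False


def should_crawl_site(site_url, included_sites=(), excluded_sites=()):
    """Apply the exclude list first, then the include list if one is given."""
    if _matches(site_url, excluded_sites):
        return False
    if included_sites:
        return _matches(site_url, included_sites)
    return True  # no include list: every non-excluded site is crawled


def should_index_attachment(file_name, excluded_suffixes=()):
    """Skip files whose suffix is listed, e.g. images or executables."""
    name = file_name.lower()
    # Accept suffixes written with or without a leading dot.
    return not any(name.endswith("." + s.lower().lstrip(".")) for s in excluded_suffixes)
```

For example, with Included Sites set to https://company.sharepoint.com/sites/hr, should_crawl_site returns True for that site and False for https://company.sharepoint.com/sites/sales. A site that was indexed earlier and now yields False is one that would be removed from the search index.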

Recommended Crawl Schedules

Content Crawls

The connector supports incremental crawls. These are based on the SharePoint change log and, depending on your tenant’s size, can run every few hours.

The change log might not be complete and might not reflect all permission changes. Therefore, depending on your requirements, we recommend running a Full Scan every week.

For more information see Crawl Scheduling.

Principal Crawls

Depending on your requirements, we recommend running a Full Principal Scan every day or less often.

For more information see Crawl Scheduling.