How to Import and Store Documents



This "How to" describes how to import and store documents (objects) using the yuuvis® API.

Importing and Storing a Single Document

Import using multipart requests

To import and store a document (object), you need at least its metadata. A content file is optional unless you have modified the schema: for each document type, you define whether a content file is required, optional, or not allowed (see "Document Object Type Definitions", contentStreamAllowed).

Format for the metadata file:

metaData.json
{
    "objects": [{
        "properties": {
            "enaio:objectTypeId": {
                "value": "document"
            },
            "Name": {
                "value": "test import"
            }
        },
        "contentStreams": [{
            "cid": "cid_63apple"
        }]
    }]
}

In this example, the schema contains an object type document with a Name property. Objects of this type may or must have content.
The content is referenced in the contentStreams object by specifying a cid (multipart content ID). In the example, the cid references a multipart content with content ID cid_63apple.
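
If the metadata is generated by a program rather than read from a file, the structure described above could be assembled as in the following minimal sketch. The class and method names (MetadataSketch, buildMetadata, escape) are hypothetical and not part of the yuuvis® API. The object type document and the Name property are taken from the example schema above.

```java
// Sketch only: MetadataSketch, buildMetadata and escape are hypothetical names.
class MetadataSketch {

    // Escapes backslashes and double quotes only; control characters
    // are not handled in this sketch.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Returns a metadata document for one object of type "document" whose
    // content is referenced by the given multipart content ID (cid).
    static String buildMetadata(String name, String cid) {
        return "{\n"
            + "    \"objects\": [{\n"
            + "        \"properties\": {\n"
            + "            \"enaio:objectTypeId\": { \"value\": \"document\" },\n"
            + "            \"Name\": { \"value\": \"" + escape(name) + "\" }\n"
            + "        },\n"
            + "        \"contentStreams\": [{ \"cid\": \"" + escape(cid) + "\" }]\n"
            + "    }]\n"
            + "}";
    }

    public static void main(String[] args) {
        // Prints metadata equivalent to the metaData.json example.
        System.out.println(buildMetadata("test import", "cid_63apple"));
    }
}
```

The cid passed to buildMetadata must match the name of the multipart part that carries the content file.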

A content file can be in any of several file formats. We recommend specifying the format correctly in the metadata and in the multipart request. If the content type is not specified, it is determined automatically during content analysis. If the content type cannot be determined unambiguously or content analysis is switched off, the content type application/octet-stream is used.

In the example, we have chosen a text file (Content-Type: text/plain).
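
One possible way to choose the content type on the client side before building the request is sketched below. This is an assumption about how a client might do it. It is not the content analysis that the system performs. The names ContentTypeSketch and resolveContentType are hypothetical. The sketch relies on the JDK's built-in file name mapping and uses the same fallback type as described above.

```java
import java.net.URLConnection;

// Sketch only: ContentTypeSketch and resolveContentType are hypothetical names.
class ContentTypeSketch {

    static final String FALLBACK = "application/octet-stream";

    // Guesses a content type from the file name extension using the JDK's
    // built-in mapping and falls back to application/octet-stream when the
    // JDK has no mapping for the extension.
    static String resolveContentType(String fileName) {
        String guessed = URLConnection.guessContentTypeFromName(fileName);
        return guessed != null ? guessed : FALLBACK;
    }

    public static void main(String[] args) {
        System.out.println(resolveContentType("test.txt"));
        System.out.println(resolveContentType("archive.unknownext"));
    }
}
```

The returned string can be passed to MediaType.parse(...) when the multipart body is built.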

Request

To import and store a document (object) in the system, you send a POST request to the URL /dms-core/objects with a multipart body consisting of metadata and, if applicable, a content file to be stored ("POST store one or more documents" endpoint). To construct such a request, use a MultipartBody.Builder(), which allows you to build the request body from several FORM parts as follows.

Building the Multipart Body with OkHttp3
RequestBody requestBody = new MultipartBody.Builder()
        .setType(MultipartBody.FORM)
        // metadata part: must be named "data"
        .addFormDataPart("data", "metaData.json",
                RequestBody.create(MediaType.parse("application/json; charset=utf-8"),
                        new File("./src/main/resources/metaData.json")))
        // content part: named after the cid referenced in the metadata
        .addFormDataPart("cid_63apple", "test.txt",
                RequestBody.create(MediaType.parse("text/plain; charset=utf-8"),
                        new File("./src/main/resources/test.txt")))
        .build();

Use a Request.Builder() to create a request object with the multipart body, headers, and the URL.
The header Ocp-Apim-Subscription-Key is necessary because it contains the user information to access the endpoint.

Building a POST Request for an Import
Request request = new Request.Builder()
        .header("Ocp-Apim-Subscription-Key", key)
        .url(baseUrl + "/dms-core/objects")
        .post(requestBody)
        .build();

Response

To display the response of the yuuvis® API to the console, create an associated response object when the request is executed. Please note that an IOException can be thrown by the OkHttpClient when creating the response object.

Handling any IOException
try{	
	Response response = client.newCall(request).execute();
	System.out.println(response.body().string());	//print to console
} catch (IOException e) {
	e.printStackTrace();
}

Status Code    Meaning
200            OK
401            Unauthorized
404            Not Found
422            Invalid Metadata
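
The status codes listed above can be turned into readable messages on the client side, for example as in the following sketch. The class and method names are hypothetical. Only the four codes listed above are mapped, and any other code is reported as unexpected.

```java
// Sketch only: ImportStatusSketch and describe are hypothetical names.
class ImportStatusSketch {

    // Maps the documented status codes of the import endpoint to their meaning.
    static String describe(int statusCode) {
        switch (statusCode) {
            case 200: return "OK";
            case 401: return "Unauthorized";
            case 404: return "Not Found";
            case 422: return "Invalid Metadata";
            default:  return "Unexpected status " + statusCode;
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(422));
    }
}
```

With OkHttp3, the value passed to describe would typically come from response.code().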

Import using two POST requests

Using the content type 'multipart/form-data' with purely binary body parts can be tricky to implement in certain languages or libraries (e.g., C++ or JavaScript with Axios). If you are struggling with multipart requests, consider the alternative import method consisting of two HTTP POST requests: the metadata is imported first, and the content is then added to the created object.

Note that this option is only available if the object you are trying to create is of a type that allows missing content (see "Document Object Type Definitions", contentStreamAllowed).

To import metadata without content, post the metaData.json file to the "POST store one or more documents" endpoint (/dms-core/objects) of the yuuvis® API. From the response to this HTTP POST request, retrieve the objectId of your new metadata object.

Using this objectId, you can construct the URL for updating your content, to which you post your content file with an appropriate content type. This update is similar to the content update described in "Update Documents". Once the update is completed, you have a complete document (metadata + content).

Important: If you use two HTTP POST requests, the content update increments the version number. The first complete version of the document is therefore '2', whereas a document imported using a multipart request starts at version '1'.

Two POST requests with OkHttp3
import okhttp3.*;
import okhttp3.MediaType;
import org.json.JSONObject;

import java.io.File;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.util.concurrent.TimeUnit;

public class SimpleTwoPartImport {
    public static final MediaType JSON = MediaType.parse("application/json; charset=utf-8");
    public static final MediaType PLAINTEXT = MediaType.parse("text/plain; charset=utf-8");
    public static final String key = "";
    public static final String baseUrl = "https://api.yuuvis.io";

    public static String metadataFilePath = "D:\\Projects\\metaData.json";
    public static String contentFilePath = "D:\\Projects\\test.txt";

    public static void main(String[] args) {
        try{
            OkHttpClient.Builder builder = new OkHttpClient.Builder();
            builder.cookieJar(new JavaNetCookieJar(new CookieManager(null, CookiePolicy.ACCEPT_ALL)));
            OkHttpClient client = builder.build();

            //first, only import Metadata
            Request importMetadataRequest = new Request.Builder()
                    .header("Ocp-Apim-Subscription-Key", key)
                    .url(baseUrl+"/dms-core/objects")
                    .post(RequestBody.create(JSON, new File(metadataFilePath)))
                    .build();

            Response importMetadataResponse = client.newCall(importMetadataRequest).execute();
            String importMetadataResponseBodyString = importMetadataResponse.body().string();
            System.out.println(importMetadataResponseBodyString);

            //then extract the objectID from the response
            JSONObject importResponseJsonObject = new JSONObject(importMetadataResponseBodyString);
            String objectId = importResponseJsonObject
                    .getJSONArray("objects")
                    .getJSONObject(0)
                    .getJSONObject("properties")
                    .getJSONObject("enaio:objectId")
                    .getString("value");

            // The backend is eventually consistent, so wait briefly to let the
            // system converge to the latest state before updating the content.
            TimeUnit.SECONDS.sleep(1);
            // Finally, post the content to the /contents/file endpoint of the new object.
            Request updateContentRequest = new Request.Builder()
                    .header("Ocp-Apim-Subscription-Key", key)
                    .header("Content-Disposition", "attachment; filename=\"test.txt\"")
                    .url(baseUrl+"/dms-core/objects/"+objectId+"/contents/file")
                    .post(RequestBody.create(PLAINTEXT, new File(contentFilePath)))
                    .build();
            Response updateContentResponse = client.newCall(updateContentRequest).execute();
            System.out.println(updateContentResponse.code());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Importing and Storing Multiple Documents in Batch Mode

If you would like to import and store multiple documents (objects) at the same time, you can use the same endpoint: "POST store one or more documents".
Instead of a single object, the objects list consists of several metadata records. The individual content files of the objects then each require a unique cid as the name of the FormDataParts in the multipart request. This cid is referenced in the associated metadata record in the contentStreams list, which allows metadata to be uniquely assigned to content.

metaDataBatch.json
{
    "objects": [{
        "properties": {
            "enaio:objectTypeId": {
                "value": "document"
            },
            "Name": {
                "value": "test import object 1"
            }
        },
        "contentStreams": [{
            "cid": "cid_63apple"
        }]
    },
    {
        "properties": {
            "enaio:objectTypeId": {
                "value": "document"
            },
            "Name": {
                "value": "test import object 2"
            }
        },
        "contentStreams": [{
            "cid": "cid_64apple"
        }]
    }]
}
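
A batch metadata document like the one above could also be generated programmatically. The following minimal sketch assumes that all objects are of type document and that a cid of the form cid_0, cid_1, ... is acceptable. This naming scheme and the names BatchMetadataSketch, record and buildBatch are hypothetical. Any scheme works as long as each cid is unique within the request.

```java
import java.util.Arrays;
import java.util.List;

// Sketch only: BatchMetadataSketch, record and buildBatch are hypothetical names.
class BatchMetadataSketch {

    // Escapes backslashes and double quotes only; control characters
    // are not handled in this sketch.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds one metadata record; the cid links the record to its content part.
    static String record(String name, String cid) {
        return "{\n"
            + "        \"properties\": {\n"
            + "            \"enaio:objectTypeId\": { \"value\": \"document\" },\n"
            + "            \"Name\": { \"value\": \"" + escape(name) + "\" }\n"
            + "        },\n"
            + "        \"contentStreams\": [{ \"cid\": \"" + cid + "\" }]\n"
            + "    }";
    }

    // Builds the "objects" list with one record per name and a unique cid each.
    static String buildBatch(List<String> names) {
        StringBuilder sb = new StringBuilder("{\n    \"objects\": [");
        for (int i = 0; i < names.size(); i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(record(names.get(i), "cid_" + i));
        }
        return sb.append("]\n}").toString();
    }

    public static void main(String[] args) {
        System.out.println(buildBatch(
            Arrays.asList("test import object 1", "test import object 2")));
    }
}
```

Each generated cid must then be used as the name of the corresponding content part in the multipart request.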

Request

In the multipart body, you create a separate FormDataPart for the content of each object, whose first parameter is the content ID (cid).

Building a POST Request for a Batch Import
RequestBody batchImportRequestBody = new MultipartBody.Builder()
        .setType(MultipartBody.FORM)
        .addFormDataPart("data", "metaDataBatch.json",
                RequestBody.create(MediaType.parse("application/json; charset=utf-8"),
                        new File("./src/main/resources/metaDataBatch.json")))
        .addFormDataPart("cid_63apple", "test1.txt",
                RequestBody.create(MediaType.parse("text/plain; charset=utf-8"),
                        new File("./src/main/resources/test1.txt")))
        .addFormDataPart("cid_64apple", "test2.txt",
                RequestBody.create(MediaType.parse("text/plain; charset=utf-8"),
                        new File("./src/main/resources/test2.txt")))
        .build();

The request object is assembled in the same way as for the import of a single document.

Response

If successful, the response object contains a multi-element objects list that contains the metadata records of all documents (objects) imported in this batch import.
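
To collect the objectId of every imported object from such a response, a client could proceed as in the following sketch. It assumes that the response has the same shape as in the two-request example (objects → properties → enaio:objectId → value) and that value is the first key of the enaio:objectId entry. To stay dependency-free, it uses a regular expression, which is a simplification. A JSON parser such as org.json is the more robust choice. The names ObjectIdSketch and extractObjectIds and the sample IDs are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: ObjectIdSketch and extractObjectIds are hypothetical names.
class ObjectIdSketch {

    // Matches "enaio:objectId": { "value": "<id>" and captures <id>.
    private static final Pattern OBJECT_ID = Pattern.compile(
        "\"enaio:objectId\"\\s*:\\s*\\{\\s*\"value\"\\s*:\\s*\"([^\"]+)\"");

    // Returns all object IDs in the order in which they appear in the response.
    static List<String> extractObjectIds(String responseBody) {
        List<String> ids = new ArrayList<>();
        Matcher matcher = OBJECT_ID.matcher(responseBody);
        while (matcher.find()) {
            ids.add(matcher.group(1));
        }
        return ids;
    }

    public static void main(String[] args) {
        // Hypothetical sample response, reduced to the relevant properties.
        String sample = "{\"objects\":["
            + "{\"properties\":{\"enaio:objectId\":{\"value\":\"id-1\"}}},"
            + "{\"properties\":{\"enaio:objectId\":{\"value\":\"id-2\"}}}]}";
        System.out.println(extractObjectIds(sample));
    }
}
```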

Importing Compound Documents

To import compound documents, refer to "Compound Documents". It describes what compound documents are and what to consider when importing them.

Content Digest - Generation and Validation

For documents imported using the "Store one or more documents (POST)" endpoint, a content digest is automatically generated and stored ("Secure Hash Algorithm", SHA-256).

To validate the content digest of a stored document, use the "Validate content digest by ID" endpoint. Send a request with the objectId. The system generates a new content digest from the currently stored document and compares it with the previously generated and stored digest.

To validate the content digest of a specific document (object) version, simply add a /versions/{versionNr} between the objectId and the suffix beginning with /actions.
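
The two URL variants described above could be built as in the following sketch. It assumes that the validation URL starts with /dms-core/objects/{objectId}, like the content update URL used earlier. The action suffix is deliberately left as a parameter and must be taken from the endpoint reference. The names DigestUrlSketch, currentVersionUrl and versionUrl are hypothetical.

```java
// Sketch only: DigestUrlSketch and its methods are hypothetical names.
class DigestUrlSketch {

    // URL for the current version: {baseUrl}/dms-core/objects/{objectId}{actionSuffix}
    // actionSuffix is the endpoint-specific suffix beginning with /actions.
    static String currentVersionUrl(String baseUrl, String objectId, String actionSuffix) {
        return baseUrl + "/dms-core/objects/" + objectId + actionSuffix;
    }

    // URL for a specific version: /versions/{versionNr} is inserted between
    // the objectId and the action suffix.
    static String versionUrl(String baseUrl, String objectId, int versionNr, String actionSuffix) {
        return baseUrl + "/dms-core/objects/" + objectId
            + "/versions/" + versionNr + actionSuffix;
    }
}
```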

Responses

Status Code    Meaning
200            OK - The value of the content digest of the specified version stored in the index data is still correct.
404            Not Found - The document (object) with this objectId and this version number cannot be found.
409            Conflict - The generated content digest of the specified version does not match the value stored in the index data.