Repository: zeppelin Updated Branches: refs/heads/master 4fd85f773 -> 5e75145ac
[ZEPPELIN-1859] Add MongoNotebookRepo ### What is this PR for? This PR adds Mongo notebook storage. The reason that I made this feature is for HA(High Availability). S3 and Git storage are the only available method for HA as far as I know. I'm managing Ambari cluster in my lab, but Zeppelin is the most vulnerable part of it. Because one server contains all Zeppelin notes. Therefore, by deploying MongoDB's [replica set](https://docs.mongodb.com/manual/replication/) and using it as Zeppelin notebook storage, I would like to achieve HA. #### The way to use Mongo DB as notebook storage ```sh export ZEPPELIN_NOTEBOOK_STORAGE=org.apache.zeppelin.notebook.repo.MongoNotebookRepo ``` or at `zeppelin-site.xml`: ```xml <property> <name>zeppelin.notebook.storage</name> <value>org.apache.zeppelin.notebook.repo.MongoNotebookRepo</value> <description>notebook persistence layer implementation</description> </property> ``` #### Configurable environment variables * `ZEPPELIN_NOTEBOOK_MONGO_URI` MongoDB connection URI * `ZEPPELIN_NOTEBOOK_MONGO_DATABASE` Database name * `ZEPPELIN_NOTEBOOK_MONGO_COLLECTION` Collection name * `ZEPPELIN_NOTEBOOK_MONGO_AUTOIMPORT` If `true`, automatically import your local notes. Default `false` They can be configured at `zeppelin-site.xml` as well: * `zeppelin.notebook.mongo.uri` * `zeppelin.notebook.mongo.database` * `zeppelin.notebook.mongo.collection` * `zeppelin.notebook.mongo.autoimport` #### Future work If we use Mongo DB's [oplog tailing](https://docs.mongodb.com/manual/core/replica-set-oplog/), maybe multi-server architecture is possible. ### What type of PR is it? [Feature] ### Todos * [ ] - Write a documentation for Mongo storage ### What is the Jira issue? https://issues.apache.org/jira/browse/ZEPPELIN-1859 ### How should this be tested? #### Install MongoDB (if you don't have) ```sh brew update brew install mongodb ``` #### Build Zepppelin ```sh mvn clean package -DskipTests ``` #### Run Zeppelin wih Mongo storage ```sh export ZEPPELIN_NOTEBOOK_STORAGE=org.apache.zeppelin.notebook.repo.MongoNotebookRepo export ZEPPELIN_NOTEBOOK_MONGO_AUTOIMPORT=true bin/zeppelin-daemon.sh restart ``` The default database and collection names are `zeppelin`, `notes` respectively. And `ZEPPELIN_NOTEBOOK_MONGO_AUTOIMPORT` option will automatically import your `local notes` that don't exist in MongoDB. #### Check whether a document in MongoDB updated Create, update, remove a note and open mongo shell: ```sh mongo zeppelin ``` And check state of the note is the same as you think: ```sh db.notes.findOne({_id: '<NOTE_ID_THAT_YOU_WANT_TO_SEE>'}) ``` #### Confirm that configurations works ```sh export ZEPPELIN_NOTEBOOK_STORAGE=org.apache.zeppelin.notebook.repo.MongoNotebookRepo export ZEPPELIN_NOTEBOOK_MONGO_AUTOIMPORT=true export ZEPPELIN_NOTEBOOK_MONGO_DATABASE=otherdb export ZEPPELIN_NOTEBOOK_MONGO_COLLECTION=mynotes export ZEPPELIN_NOTEBOOK_MONGO_URI=mongodb://localhost:27017 bin/zeppelin-daemon.sh restart ``` The collection `mynotes` should be created in db `otherdb`. Let's check it! ```sh mongo otherdb db.mynotes.count() ``` The result should not be zero. #### Confirm that configurations from `zeppelin-site.xml` works Open your `conf/zeppelin-site.xml` file (copy from `zeppelin-site.xml.template` if you don't have one), and comment lines below: ```xml <!-- <property> <name>zeppelin.notebook.storage</name> <value>org.apache.zeppelin.notebook.repo.VFSNotebookRepo</value> <description>notebook persistence layer implementation</description> </property> --> ``` And add lines below: ```xml <property> <name>zeppelin.notebook.storage</name> <value>org.apache.zeppelin.notebook.repo.MongoNotebookRepo</value> <description>notebook persistence layer implementation</description> </property> <property> <name>zeppelin.notebook.mongo.uri</name> <value>mongodb://localhost</value> <description>MongoDB connection URI used to connect to a MongoDB database server</description> </property> <property> <name>zeppelin.notebook.mongo.database</name> <value>zepl</value> <description>database name for notebook storage</description> </property> <property> <name>zeppelin.notebook.mongo.collection</name> <value>notes</value> <description>collection name for notebook storage</description> </property> <property> <name>zeppelin.notebook.mongo.autoimport</name> <value>false</value> <description>import local notes into MongoDB automatically on startup</description> </property> ``` This time we will import a note via `mongoimport`. I made it possible to import a note from JSON just in case. ```sh cd $ZEPPELIN_HOME/notebook/<NOTE_ID_YOU_WANT_TO_IMPORT> mongoimport --db zepl --collection notes --file note.json ``` Ensure that your environment variables are clean(just reopen your terminal if you are not), and restart zeppelin: ```sh bin/zeppelin-daemon.sh restart ``` Open browser and go to `localhost:8080`. The note that you imported should be shown. ### Questions: * Does the licenses files need update? Maybe...? I used [java-mongodb-driver](https://mvnrepository.com/artifact/org.mongodb/mongo-java-driver/3.4.1) which has *The Apache Software License, Version 2.0* * Is there breaking changes for older versions? NO * Does this needs documentation? YES Author: Jun Kim <i2r....@gmail.com> Closes #1826 from tae-jun/ZEPPELIN-1859 and squashes the following commits: 98282ae [Jun Kim] Add a documentation for MongoDB notebook storage 77947b8 [Jun Kim] Add license information of mongo-java-driver 08eee3d [Jun Kim] fix style check error a4fba8c [Jun Kim] Add MongoNotebookRepo Project: http://git-wip-us.apache.org/repos/asf/zeppelin/repo Commit: http://git-wip-us.apache.org/repos/asf/zeppelin/commit/5e75145a Tree: http://git-wip-us.apache.org/repos/asf/zeppelin/tree/5e75145a Diff: http://git-wip-us.apache.org/repos/asf/zeppelin/diff/5e75145a Branch: refs/heads/master Commit: 5e75145ac84bf6fb844b90f3a466bca89a6fde9a Parents: 4fd85f7 Author: Jun Kim <i2r....@gmail.com> Authored: Sun Feb 12 20:21:59 2017 +0900 Committer: Lee moon soo <m...@apache.org> Committed: Tue Feb 21 09:29:09 2017 +0900 ---------------------------------------------------------------------- conf/zeppelin-env.sh.template | 4 + conf/zeppelin-site.xml.template | 33 +++ docs/_includes/themes/zeppelin/_navigation.html | 4 +- docs/storage/storage.md | 72 ++++- zeppelin-distribution/src/bin_license/LICENSE | 1 + zeppelin-zengine/pom.xml | 9 +- .../zeppelin/conf/ZeppelinConfiguration.java | 20 ++ .../notebook/repo/MongoNotebookRepo.java | 268 +++++++++++++++++++ 8 files changed, 407 insertions(+), 4 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/zeppelin/blob/5e75145a/conf/zeppelin-env.sh.template ---------------------------------------------------------------------- diff --git a/conf/zeppelin-env.sh.template b/conf/zeppelin-env.sh.template index 46fd481..3dccca6 100644 --- a/conf/zeppelin-env.sh.template +++ b/conf/zeppelin-env.sh.template @@ -36,6 +36,10 @@ # export ZEPPELIN_NOTEBOOK_S3_KMS_KEY_ID # AWS KMS key ID # export ZEPPELIN_NOTEBOOK_S3_KMS_KEY_REGION # AWS KMS key region # export ZEPPELIN_NOTEBOOK_S3_SSE # Server-side encryption enabled for notebooks +# export ZEPPELIN_NOTEBOOK_MONGO_URI # MongoDB connection URI used to connect to a MongoDB database server. Default "mongodb://localhost" +# export ZEPPELIN_NOTEBOOK_MONGO_DATABASE # Database name to store notebook. Default "zeppelin" +# export ZEPPELIN_NOTEBOOK_MONGO_COLLECTION # Collection name to store notebook. Default "notes" +# export ZEPPELIN_NOTEBOOK_MONGO_AUTOIMPORT # If "true" import local notes under ZEPPELIN_NOTEBOOK_DIR on startup. Default "false" # export ZEPPELIN_IDENT_STRING # A string representing this instance of zeppelin. $USER by default. # export ZEPPELIN_NICENESS # The scheduling priority for daemons. Defaults to 0. # export ZEPPELIN_INTERPRETER_LOCALREPO # Local repository for interpreter's additional dependency loading http://git-wip-us.apache.org/repos/asf/zeppelin/blob/5e75145a/conf/zeppelin-site.xml.template ---------------------------------------------------------------------- diff --git a/conf/zeppelin-site.xml.template b/conf/zeppelin-site.xml.template index abaff30..fa47e1b 100755 --- a/conf/zeppelin-site.xml.template +++ b/conf/zeppelin-site.xml.template @@ -182,6 +182,39 @@ </property> --> +<!-- MongoDB notebook storage --> +<!-- +<property> + <name>zeppelin.notebook.storage</name> + <value>org.apache.zeppelin.notebook.repo.MongoNotebookRepo</value> + <description>notebook persistence layer implementation</description> +</property> + +<property> + <name>zeppelin.notebook.mongo.uri</name> + <value>mongodb://localhost</value> + <description>MongoDB connection URI used to connect to a MongoDB database server</description> +</property> + +<property> + <name>zeppelin.notebook.mongo.database</name> + <value>zeppelin</value> + <description>database name for notebook storage</description> +</property> + +<property> + <name>zeppelin.notebook.mongo.collection</name> + <value>notes</value> + <description>collection name for notebook storage</description> +</property> + +<property> + <name>zeppelin.notebook.mongo.autoimport</name> + <value>false</value> + <description>import local notes into MongoDB automatically on startup</description> +</property> +--> + <property> <name>zeppelin.notebook.storage</name> <value>org.apache.zeppelin.notebook.repo.GitNotebookRepo</value> http://git-wip-us.apache.org/repos/asf/zeppelin/blob/5e75145a/docs/_includes/themes/zeppelin/_navigation.html ---------------------------------------------------------------------- diff --git a/docs/_includes/themes/zeppelin/_navigation.html b/docs/_includes/themes/zeppelin/_navigation.html index b53e9e9..ec7abfa 100644 --- a/docs/_includes/themes/zeppelin/_navigation.html +++ b/docs/_includes/themes/zeppelin/_navigation.html @@ -101,6 +101,7 @@ <li><a href="{{BASE_PATH}}/storage/storage.html#notebook-storage-in-s3">S3 Storage</a></li> <li><a href="{{BASE_PATH}}/storage/storage.html#notebook-storage-in-azure">Azure Storage</a></li> <li><a href="{{BASE_PATH}}/storage/storage.html#storage-in-zeppelinhub">ZeppelinHub Storage</a></li> + <li><a href="{{BASE_PATH}}/storage/storage.html#notebook-storage-in-mongodb">MongoDB Storage</a></li> <li role="separator" class="divider"></li> <li class="title"><span><b>REST API</b><span></li> <li><a href="{{BASE_PATH}}/rest-api/rest-interpreter.html">Interpreter API</a></li> @@ -110,7 +111,7 @@ <li><a href="{{BASE_PATH}}/rest-api/rest-helium.html">Helium API</a></li> <li role="separator" class="divider"></li> <li class="title"><span><b>Security</b><span></li> - <li><a href="{{BASE_PATH}}/security/shiroauthentication.html">Shiro Authentication</a></li> + <li><a href="{{BASE_PATH}}/security/shiroauthentication.html">Shiro Authentication</a></li> <li><a href="{{BASE_PATH}}/security/notebook_authorization.html">Notebook Authorization</a></li> <li><a href="{{BASE_PATH}}/security/datasource_authorization.html">Data Source Authorization</a></li> <li role="separator" class="divider"></li> @@ -139,4 +140,3 @@ </nav><!--/.navbar-collapse --> </div> </div> - http://git-wip-us.apache.org/repos/asf/zeppelin/blob/5e75145a/docs/storage/storage.md ---------------------------------------------------------------------- diff --git a/docs/storage/storage.md b/docs/storage/storage.md index 73388da..9deca3d 100644 --- a/docs/storage/storage.md +++ b/docs/storage/storage.md @@ -32,6 +32,7 @@ There are few notebook storage systems available for a use out of the box: * all notes are saved in the notebook folder in your local File System - `VFSNotebookRepo` * storage using Amazon S3 service - `S3NotebookRepo` * storage using Azure service - `AzureNotebookRepo` + * storage using MongoDB - `MongoNotebookRepo` Multiple storage systems can be used at the same time by providing a comma-separated list of the class-names in the configuration. By default, only first two of them will be automatically kept in sync by Zeppelin. @@ -184,7 +185,7 @@ Or using the following setting in **zeppelin-site.xml**: ``` </br> -## Notebook Storage in Azure <a name="Azure"></a> +## Notebook Storage in Azure <a name="Azure"></a> Using `AzureNotebookRepo` you can connect your Zeppelin with your Azure account for notebook storage. @@ -274,3 +275,72 @@ export ZEPPELINHUB_API_ADDRESS = address of ZeppelinHub service (e.g. https://ww ``` You can get more information on generating `token` and using authentication on the corresponding [help page](http://help.zeppelinhub.com/zeppelin_integration/#add-a-new-zeppelin-instance-and-generate-a-token). + + +## Notebook Storage in MongoDB <a name="MongoDB"></a> +Using `MongoNotebookRepo`, you can store your notebook in [MongoDB](https://www.mongodb.com/). + +### Why MongoDB? +* **[High Availability (HA)](https://en.wikipedia.org/wiki/High_availability)** by a [replica set](https://docs.mongodb.com/manual/reference/glossary/#term-replica-set) +* Seperation of storage from server + +### How to use +You can use MongoDB as notebook storage by editting `zeppelin-env.sh` or `zeppelin-site.xml`. + +#### (Method 1) by editting `zeppelin-env.sh` +Add a line below to `$ZEPPELIN_HOME/conf/zeppelin-env.sh`: + +```sh +export ZEPPELIN_NOTEBOOK_STORAGE=org.apache.zeppelin.notebook.repo.MongoNotebookRepo +``` + +> *NOTE:* The default MongoDB connection URI is `mongodb://localhost` + +#### (Method 2) by editting `zeppelin-site.xml` +Or, **uncomment** lines below at `$ZEPPELIN_HOME/conf/zeppelin-site.xml`: + +```xml +<property> + <name>zeppelin.notebook.storage</name> + <value>org.apache.zeppelin.notebook.repo.MongoNotebookRepo</value> + <description>notebook persistence layer implementation</description> +</property> +``` + +And **comment** lines below: + +```xml +<property> + <name>zeppelin.notebook.storage</name> + <value>org.apache.zeppelin.notebook.repo.GitNotebookRepo</value> + <description>versioned notebook persistence layer implementation</description> +</property> +``` + +### Configurable Options + +You can configure options below in `zeppelin-env.sh`. + +* `ZEPPELIN_NOTEBOOK_MONGO_URI` [MongoDB connection URI](https://docs.mongodb.com/manual/reference/connection-string/) used to connect to a MongoDB database server +* `ZEPPELIN_NOTEBOOK_MONGO_DATABASE` Database name +* `ZEPPELIN_NOTEBOOK_MONGO_COLLECTION` Collection name +* `ZEPPELIN_NOTEBOOK_MONGO_AUTOIMPORT` If `true`, import local notes (refer to description below for details) + +Or, you can configure them in `zeppelin-site.xml`. Corresponding option names as follows: + +* `zeppelin.notebook.mongo.uri` +* `zeppelin.notebook.mongo.database` +* `zeppelin.notebook.mongo.collection` +* `zeppelin.notebook.mongo.autoimport` + +#### Example configurations in `zeppelin-env.sh` + +```sh +export ZEPPELIN_NOTEBOOK_MONGO_URI=mongodb://db1.example.com:27017 +export ZEPPELIN_NOTEBOOK_MONGO_DATABASE=myfancy +export ZEPPELIN_NOTEBOOK_MONGO_COLLECTION=notebook +export ZEPPELIN_NOTEBOOK_MONGO_AUTOIMPORT=true +``` + +#### Import your local notes automatically +By setting `ZEPPELIN_NOTEBOOK_MONGO_AUTOIMPORT` as `true` (default `false`), you can import your local notes automatically when Zeppelin daemon starts up. This feature is for easy migration from local file system storage to MongoDB storage. A note with ID already existing in the collection will not be imported. http://git-wip-us.apache.org/repos/asf/zeppelin/blob/5e75145a/zeppelin-distribution/src/bin_license/LICENSE ---------------------------------------------------------------------- diff --git a/zeppelin-distribution/src/bin_license/LICENSE b/zeppelin-distribution/src/bin_license/LICENSE index d27f27d..9f0c74b 100644 --- a/zeppelin-distribution/src/bin_license/LICENSE +++ b/zeppelin-distribution/src/bin_license/LICENSE @@ -216,6 +216,7 @@ The following components are provided under Apache License. (Apache 2.0) Scalatest 2.2.4 (org.scalatest:scalatest_2.10:2.2.4 - https://github.com/scalatest/scalatest) (Apache 2.0) frontend-maven-plugin 1.3 (com.github.eirslett:frontend-maven-plugin:1.3 - https://github.com/eirslett/frontend-maven-plugin/blob/frontend-plugins-1.3/LICENSE (Apache 2.0) frontend-plugin-core 1.3 (com.github.eirslett:frontend-plugin-core) - https://github.com/eirslett/frontend-maven-plugin/blob/frontend-plugins-1.3/LICENSE + (Apache 2.0) mongo-java-driver 3.4.1 (org.mongodb:mongo-java-driver:3.4.1) - https://github.com/mongodb/mongo-java-driver/blob/master/LICENSE.txt ======================================================================== MIT licenses http://git-wip-us.apache.org/repos/asf/zeppelin/blob/5e75145a/zeppelin-zengine/pom.xml ---------------------------------------------------------------------- diff --git a/zeppelin-zengine/pom.xml b/zeppelin-zengine/pom.xml index ff70fed..d5fef16 100644 --- a/zeppelin-zengine/pom.xml +++ b/zeppelin-zengine/pom.xml @@ -144,7 +144,7 @@ <artifactId>jetty-client</artifactId> <version>${jetty.version}</version> </dependency> - + <dependency> <groupId>org.eclipse.jetty.websocket</groupId> <artifactId>websocket-client</artifactId> @@ -284,6 +284,13 @@ <artifactId>commons-lang3</artifactId> <version>${commons.lang3.version}</version> </dependency> + + <dependency> + <groupId>org.mongodb</groupId> + <artifactId>mongo-java-driver</artifactId> + <version>3.4.1</version> + </dependency> + </dependencies> <build> http://git-wip-us.apache.org/repos/asf/zeppelin/blob/5e75145a/zeppelin-zengine/src/main/java/org/apache/zeppelin/conf/ZeppelinConfiguration.java ---------------------------------------------------------------------- diff --git a/zeppelin-zengine/src/main/java/org/apache/zeppelin/conf/ZeppelinConfiguration.java b/zeppelin-zengine/src/main/java/org/apache/zeppelin/conf/ZeppelinConfiguration.java index 0708719..81507ba 100644 --- a/zeppelin-zengine/src/main/java/org/apache/zeppelin/conf/ZeppelinConfiguration.java +++ b/zeppelin-zengine/src/main/java/org/apache/zeppelin/conf/ZeppelinConfiguration.java @@ -384,6 +384,22 @@ public class ZeppelinConfiguration extends XMLConfiguration { return getBoolean(ConfVars.ZEPPELIN_NOTEBOOK_S3_SSE); } + public String getMongoUri() { + return getString(ConfVars.ZEPPELIN_NOTEBOOK_MONGO_URI); + } + + public String getMongoDatabase() { + return getString(ConfVars.ZEPPELIN_NOTEBOOK_MONGO_DATABASE); + } + + public String getMongoCollection() { + return getString(ConfVars.ZEPPELIN_NOTEBOOK_MONGO_COLLECTION); + } + + public boolean getMongoAutoimport() { + return getBoolean(ConfVars.ZEPPELIN_NOTEBOOK_MONGO_AUTOIMPORT); + } + public String getInterpreterListPath() { return getRelativeDir(String.format("%s/interpreter-list", getConfDir())); } @@ -595,6 +611,10 @@ public class ZeppelinConfiguration extends XMLConfiguration { ZEPPELIN_NOTEBOOK_AZURE_CONNECTION_STRING("zeppelin.notebook.azure.connectionString", null), ZEPPELIN_NOTEBOOK_AZURE_SHARE("zeppelin.notebook.azure.share", "zeppelin"), ZEPPELIN_NOTEBOOK_AZURE_USER("zeppelin.notebook.azure.user", "user"), + ZEPPELIN_NOTEBOOK_MONGO_DATABASE("zeppelin.notebook.mongo.database", "zeppelin"), + ZEPPELIN_NOTEBOOK_MONGO_COLLECTION("zeppelin.notebook.mongo.collection", "notes"), + ZEPPELIN_NOTEBOOK_MONGO_URI("zeppelin.notebook.mongo.uri", "mongodb://localhost"), + ZEPPELIN_NOTEBOOK_MONGO_AUTOIMPORT("zeppelin.notebook.mongo.autoimport", false), ZEPPELIN_NOTEBOOK_STORAGE("zeppelin.notebook.storage", GitNotebookRepo.class.getName()), ZEPPELIN_NOTEBOOK_ONE_WAY_SYNC("zeppelin.notebook.one.way.sync", false), // whether by default note is public or private http://git-wip-us.apache.org/repos/asf/zeppelin/blob/5e75145a/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/repo/MongoNotebookRepo.java ---------------------------------------------------------------------- diff --git a/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/repo/MongoNotebookRepo.java b/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/repo/MongoNotebookRepo.java new file mode 100644 index 0000000..9502cf3 --- /dev/null +++ b/zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/repo/MongoNotebookRepo.java @@ -0,0 +1,268 @@ +package org.apache.zeppelin.notebook.repo; + +import com.google.gson.Gson; +import com.google.gson.GsonBuilder; +import com.mongodb.MongoBulkWriteException; +import com.mongodb.MongoClient; +import com.mongodb.MongoClientURI; +import com.mongodb.bulk.BulkWriteError; +import com.mongodb.client.MongoCollection; +import com.mongodb.client.MongoCursor; +import com.mongodb.client.MongoDatabase; +import static com.mongodb.client.model.Filters.eq; +import static com.mongodb.client.model.Filters.type; +import static com.mongodb.client.model.Filters.in; + +import com.mongodb.client.model.InsertManyOptions; +import com.mongodb.client.model.UpdateOptions; +import org.apache.zeppelin.conf.ZeppelinConfiguration; +import org.apache.zeppelin.notebook.Note; +import org.apache.zeppelin.notebook.NoteInfo; +import org.apache.zeppelin.notebook.NotebookImportDeserializer; +import org.apache.zeppelin.notebook.Paragraph; +import org.apache.zeppelin.notebook.ApplicationState; +import org.apache.zeppelin.scheduler.Job; +import org.apache.zeppelin.user.AuthenticationInfo; +import org.bson.BsonType; +import org.bson.Document; +import org.bson.types.ObjectId; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.IOException; +import java.util.*; +import java.util.regex.Matcher; +import java.util.regex.Pattern; + +/** + * Backend for storing Notebook on MongoDB + */ +public class MongoNotebookRepo implements NotebookRepo { + private static final Logger LOG = LoggerFactory.getLogger(MongoNotebookRepo.class); + + private final ZeppelinConfiguration conf; + private final MongoClient mongo; + private final MongoDatabase db; + private final MongoCollection<Document> coll; + + public MongoNotebookRepo(ZeppelinConfiguration conf) throws IOException { + this.conf = conf; + + mongo = new MongoClient(new MongoClientURI(conf.getMongoUri())); + db = mongo.getDatabase(conf.getMongoDatabase()); + coll = db.getCollection(conf.getMongoCollection()); + + if (conf.getMongoAutoimport()) { + // import local notes into MongoDB + insertFileSystemNotes(); + } + } + + /** + * If environment variable ZEPPELIN_NOTEBOOK_MONGO_AUTOIMPORT is true, + * this method will insert local notes into MongoDB on startup. + * If a note already exists in MongoDB, skip it. + */ + private void insertFileSystemNotes() throws IOException { + LinkedList<Document> docs = new LinkedList<>(); // docs to be imported + NotebookRepo vfsRepo = new VFSNotebookRepo(this.conf); + List<NoteInfo> infos = vfsRepo.list(null); + // collect notes to be imported + for (NoteInfo info : infos) { + Note note = vfsRepo.get(info.getId(), null); + Document doc = noteToDocument(note); + docs.add(doc); + } + + /* + * 'ordered(false)' option allows to proceed bulk inserting even though + * there are duplicated documents. The duplicated documents will be skipped + * and print a WARN log. + */ + try { + coll.insertMany(docs, new InsertManyOptions().ordered(false)); + } catch (MongoBulkWriteException e) { + printDuplicatedException(e); //print duplicated document warning log + } + + vfsRepo.close(); // it does nothing for now but maybe in the future... + } + + /** + * MongoBulkWriteException contains error messages that inform + * which documents were duplicated. This method catches those ID and print them. + * @param e + */ + private void printDuplicatedException(MongoBulkWriteException e) { + List<BulkWriteError> errors = e.getWriteErrors(); + for (BulkWriteError error : errors) { + String msg = error.getMessage(); + Pattern pattern = Pattern.compile("[A-Z0-9]{9}"); // regex for note ID + Matcher matcher = pattern.matcher(msg); + if (matcher.find()) { // if there were a note ID + String noteId = matcher.group(); + LOG.warn("Note " + noteId + " not inserted since already exists in MongoDB"); + } + } + } + + @Override + public List<NoteInfo> list(AuthenticationInfo subject) throws IOException { + syncId(); + + List<NoteInfo> infos = new LinkedList<>(); + MongoCursor<Document> cursor = coll.find().iterator(); + + while (cursor.hasNext()) { + Document doc = cursor.next(); + Note note = documentToNote(doc); + NoteInfo info = new NoteInfo(note); + infos.add(info); + } + + cursor.close(); + + return infos; + } + + /** + * Find documents of which type of _id is object ID, and change it to note ID. + * Since updating _id field is not allowed, remove original documents and insert + * new ones with string _id(note ID) + */ + private void syncId() { + // find documents whose id type is object id + MongoCursor<Document> cursor = coll.find(type("_id", BsonType.OBJECT_ID)).iterator(); + // if there is no such document, exit + if (!cursor.hasNext()) + return; + + List<ObjectId> oldDocIds = new LinkedList<>(); // document ids need to update + List<Document> updatedDocs = new LinkedList<>(); // new documents to be inserted + + while (cursor.hasNext()) { + Document doc = cursor.next(); + // store original _id + ObjectId oldId = doc.getObjectId("_id"); + oldDocIds.add(oldId); + // store the document with string _id (note id) + String noteId = doc.getString("id"); + doc.put("_id", noteId); + updatedDocs.add(doc); + } + + coll.insertMany(updatedDocs); + coll.deleteMany(in("_id", oldDocIds)); + + cursor.close(); + } + + /** + * Convert document to note + */ + private Note documentToNote(Document doc) { + // document to JSON + String json = doc.toJson(); + // JSON to note + Gson gson = new GsonBuilder() + .registerTypeAdapter(Date.class, new NotebookImportDeserializer()) + .create(); + Note note = gson.fromJson(json, Note.class); + + for (Paragraph p : note.getParagraphs()) { + if (p.getStatus() == Job.Status.PENDING || p.getStatus() == Job.Status.RUNNING) { + p.setStatus(Job.Status.ABORT); + } + + List<ApplicationState> appStates = p.getAllApplicationStates(); + if (appStates != null) { + for (ApplicationState app : appStates) { + if (app.getStatus() != ApplicationState.Status.ERROR) { + app.setStatus(ApplicationState.Status.UNLOADED); + } + } + } + } + + return note; + } + + /** + * Convert note to document + */ + private Document noteToDocument(Note note) { + // note to JSON + Gson gson = new GsonBuilder().create(); + String json = gson.toJson(note); + // JSON to document + Document doc = Document.parse(json); + // set object id as note id + doc.put("_id", note.getId()); + return doc; + } + + @Override + public Note get(String noteId, AuthenticationInfo subject) throws IOException { + Document doc = coll.find(eq("_id", noteId)).first(); + + if (doc == null) { + throw new IOException("Note " + noteId + "not found"); + } + + return documentToNote(doc); + } + + @Override + public void save(Note note, AuthenticationInfo subject) throws IOException { + Document doc = noteToDocument(note); + coll.replaceOne(eq("_id", note.getId()), doc, new UpdateOptions().upsert(true)); + } + + @Override + public void remove(String noteId, AuthenticationInfo subject) throws IOException { + coll.deleteOne(eq("_id", noteId)); + } + + @Override + public void close() { + mongo.close(); + } + + @Override + public Revision checkpoint(String noteId, String checkpointMsg, AuthenticationInfo subject) + throws IOException { + // no-op + LOG.warn("Checkpoint feature isn't supported in {}", this.getClass().toString()); + return Revision.EMPTY; + } + + @Override + public Note get(String noteId, String revId, AuthenticationInfo subject) throws IOException { + LOG.warn("Get note revision feature isn't supported in {}", this.getClass().toString()); + return null; + } + + @Override + public List<Revision> revisionHistory(String noteId, AuthenticationInfo subject) { + LOG.warn("Get Note revisions feature isn't supported in {}", this.getClass().toString()); + return Collections.emptyList(); + } + + @Override + public Note setNoteRevision(String noteId, String revId, AuthenticationInfo subject) + throws IOException { + // Auto-generated method stub + return null; + } + + @Override + public List<NotebookRepoSettingsInfo> getSettings(AuthenticationInfo subject) { + LOG.warn("Method not implemented"); + return Collections.emptyList(); + } + + @Override + public void updateSettings(Map<String, String> settings, AuthenticationInfo subject) { + LOG.warn("Method not implemented"); + } +}