mneedham opened a new pull request, #8673:
URL: https://github.com/apache/pinot/pull/8673

   When I'm working with nested JSON documents, it's difficult to figure out 
exactly what the end result of a batch/streaming ingestion will be, especially 
if you're using `complexTypeConfig` or transformation functions.
   
   I find myself doing trial and error with different configurations to see 
whether they work, which is quite a frustrating workflow! So this PR introduces 
a command to pinot-admin that lets the user pass in a JSON file + table config 
and outputs the fields and values that will be available for ingestion into 
Pinot once all transformations have been applied.
   
   An example, starting with this JSON document and table config:
   
   
   ```json
   {
      "Title":"The Matrix",
      "Year":"1999",
      "Rated":"R",
      "Meta":{
         "Released":"31 Mar 1999",
         "Runtime":"136 min",
         "Genres":[
            "Action",
            "Sci-Fi"
         ]
      },
      "Director":"Lana Wachowski, Lilly Wachowski",
      "Writer":"Lilly Wachowski, Lana Wachowski",
      "Actors":"Keanu Reeves, Laurence Fishburne, Carrie-Anne Moss",
      "Plot":"When a beautiful stranger leads computer hacker Neo to a forbidding underworld, he discovers the shocking truth--the life he knows is the elaborate deception of an evil cyber-intelligence.",
      "Language":"English",
      "Country":"United States, Australia",
      "Awards":"Won 4 Oscars. 42 wins & 51 nominations total",
      "Ratings":[
         {
            "Source":"Internet Movie Database",
            "Value":"8.7/10"
         },
         {
            "Source":"Rotten Tomatoes",
            "Value":"88%"
         },
         {
            "Source":"Metacritic",
            "Value":"73/100"
         }
      ],
      "Metascore":"73",
      "imdbRating":"8.7",
      "imdbVotes":"1,851,767",
      "imdbID":"tt0133093",
      "Type":"movie",
      "DVD":"15 May 2007",
      "BoxOffice":"$172,076,928",
      "Production":"N/A",
      "Website":"N/A",
      "Response":"True"
   }
   ```
   
   ```json
   {
     "tableName": "movie_ratings_non_primitive",
     "tableType": "OFFLINE",
     "segmentsConfig": {
       "replication": 1,
       "schemaName": "movie_ratings_non_primitive"
     },
     "tenants": {
       "broker": "DefaultTenant",
       "server": "DefaultTenant"
     },
     "tableIndexConfig": {
       "loadMode": "MMAP"
     },
     "ingestionConfig": {
       "batchIngestionConfig": {
         "segmentIngestionType": "APPEND",
         "segmentIngestionFrequency": "DAILY"
       },
       "complexTypeConfig": {
         "collectionNotUnnestedToJson": "NON_PRIMITIVE",
         "delimiter": "__"
       },
       "transformConfigs": [
         {
           "columnName": "id",
           "transformFunction": "imdbID"
         },
         {
           "columnName": "Writers",
           "transformFunction": "split(Writer, ',')"
         }
       ]
     },
     "metadata": {}
   }
   ```
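One detail worth noticing in the `Writers` transform: it splits on a bare comma, so every element after the first keeps the space that followed the comma. A minimal Python sketch of that behaviour follows; it only mimics a plain string split and is not Pinot's `split` implementation, and the trimming step is a hypothetical clean-up rather than something this PR adds.

```python
# Illustrative only: a plain comma split, not Pinot's split() function.
writer = "Lilly Wachowski, Lana Wachowski"

# Splitting on "," alone leaves the leading space on later elements.
writers_raw = writer.split(",")

# Hypothetical clean-up step: strip whitespace from each element.
writers_trimmed = [name.strip() for name in writers_raw]

print(writers_raw)
print(writers_trimmed)
```

Seeing the untrimmed value in a dry run is exactly the kind of surprise that is cheaper to catch before ingestion than after.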
   
   And then we call it like this:
   
   ```
   DataImportDryRun -jsonFile /path/to/movies.json -tableConfigFile /path/to/config/table_non_primitive.json
   ```
   
   Output:
   
   ```
   Available Fields: {
     "Actors" : "Keanu Reeves, Laurence Fishburne, Carrie-Anne Moss",
     "Awards" : "Won 4 Oscars. 42 wins & 51 nominations total",
     "BoxOffice" : "$172,076,928",
     "Country" : "United States, Australia",
     "DVD" : "15 May 2007",
     "Director" : "Lana Wachowski, Lilly Wachowski",
     "Language" : "English",
     "Meta__Genres" : [ "Action", "Sci-Fi" ],
     "Meta__Released" : "31 Mar 1999",
     "Meta__Runtime" : "136 min",
     "Metascore" : "73",
     "Plot" : "When a beautiful stranger leads computer hacker Neo to a forbidding underworld, he discovers the shocking truth--the life he knows is the elaborate deception of an evil cyber-intelligence.",
     "Production" : "N/A",
     "Rated" : "R",
     "Ratings" : "[{\"Value\":\"8.7/10\",\"Source\":\"Internet Movie Database\"},{\"Value\":\"88%\",\"Source\":\"Rotten Tomatoes\"},{\"Value\":\"73/100\",\"Source\":\"Metacritic\"}]",
     "Response" : "True",
     "Title" : "The Matrix",
     "Type" : "movie",
     "Website" : "N/A",
     "Writer" : "Lilly Wachowski, Lana Wachowski",
     "Writers" : [ "Lilly Wachowski", " Lana Wachowski" ],
     "Year" : "1999",
     "id" : "tt0133093",
     "imdbID" : "tt0133093",
     "imdbRating" : "8.7",
     "imdbVotes" : "1,851,767"
   }
   ```
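To make the shape of that output easier to reason about, here is a rough Python sketch of the flattening it reflects: nested objects are flattened with the `__` delimiter, arrays of primitives are kept as-is, and arrays containing objects are serialised to a JSON string. The `flatten` helper is hypothetical and only approximates what the output above shows; it is not Pinot's implementation, and key order inside the serialised string may differ.

```python
import json


def flatten(record, delimiter="__", prefix=""):
    """Flatten nested dicts; keep primitive lists; JSON-encode other lists."""
    out = {}
    for key, value in record.items():
        name = f"{prefix}{delimiter}{key}" if prefix else key
        if isinstance(value, dict):
            # Nested object: recurse, carrying the flattened prefix.
            out.update(flatten(value, delimiter, name))
        elif isinstance(value, list) and any(
            isinstance(v, (dict, list)) for v in value
        ):
            # Non-primitive collection: store it as a JSON string.
            out[name] = json.dumps(value, separators=(",", ":"))
        else:
            # Primitive value or list of primitives: keep unchanged.
            out[name] = value
    return out


doc = {
    "Title": "The Matrix",
    "Meta": {"Released": "31 Mar 1999", "Genres": ["Action", "Sci-Fi"]},
    "Ratings": [{"Source": "Metacritic", "Value": "73/100"}],
}
flat = flatten(doc)
print(json.dumps(flat, indent=2, sort_keys=True))
```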
   
   It's quick to try out a different config. For example, this one unnests 
the `Ratings` field and drops the transformation functions:
   
   ```json
   {
     "tableName": "movie_ratings_non_primitive",
     "tableType": "OFFLINE",
     "segmentsConfig": {
       "replication": 1,
       "schemaName": "movie_ratings_non_primitive"
     },
     "tenants": {
       "broker": "DefaultTenant",
       "server": "DefaultTenant"
     },
     "tableIndexConfig": {
       "loadMode": "MMAP"
     },
     "ingestionConfig": {
       "batchIngestionConfig": {
         "segmentIngestionType": "APPEND",
         "segmentIngestionFrequency": "DAILY"
       },
       "complexTypeConfig": {
         "collectionNotUnnestedToJson": "NON_PRIMITIVE",
         "delimiter": "__",
         "fieldsToUnnest": [
           "Ratings"
         ]
       }
     },
     "metadata": {}
   }
   ```
   
   ```
   Available Fields: {
     "Actors" : "Keanu Reeves, Laurence Fishburne, Carrie-Anne Moss",
     "Awards" : "Won 4 Oscars. 42 wins & 51 nominations total",
     "BoxOffice" : "$172,076,928",
     "Country" : "United States, Australia",
     "DVD" : "15 May 2007",
     "Director" : "Lana Wachowski, Lilly Wachowski",
     "Language" : "English",
     "Meta__Genres" : [ "Action", "Sci-Fi" ],
     "Meta__Released" : "31 Mar 1999",
     "Meta__Runtime" : "136 min",
     "Metascore" : "73",
     "Plot" : "When a beautiful stranger leads computer hacker Neo to a forbidding underworld, he discovers the shocking truth--the life he knows is the elaborate deception of an evil cyber-intelligence.",
     "Production" : "N/A",
     "Rated" : "R",
     "Ratings__Source" : "Internet Movie Database",
     "Ratings__Value" : "8.7/10",
     "Response" : "True",
     "Title" : "The Matrix",
     "Type" : "movie",
     "Website" : "N/A",
     "Writer" : "Lilly Wachowski, Lana Wachowski",
     "Year" : "1999",
     "imdbID" : "tt0133093",
     "imdbRating" : "8.7",
     "imdbVotes" : "1,851,767"
   }
   Available Fields: {
     "Actors" : "Keanu Reeves, Laurence Fishburne, Carrie-Anne Moss",
     "Awards" : "Won 4 Oscars. 42 wins & 51 nominations total",
     "BoxOffice" : "$172,076,928",
     "Country" : "United States, Australia",
     "DVD" : "15 May 2007",
     "Director" : "Lana Wachowski, Lilly Wachowski",
     "Language" : "English",
     "Meta__Genres" : [ "Action", "Sci-Fi" ],
     "Meta__Released" : "31 Mar 1999",
     "Meta__Runtime" : "136 min",
     "Metascore" : "73",
     "Plot" : "When a beautiful stranger leads computer hacker Neo to a forbidding underworld, he discovers the shocking truth--the life he knows is the elaborate deception of an evil cyber-intelligence.",
     "Production" : "N/A",
     "Rated" : "R",
     "Ratings__Source" : "Rotten Tomatoes",
     "Ratings__Value" : "88%",
     "Response" : "True",
     "Title" : "The Matrix",
     "Type" : "movie",
     "Website" : "N/A",
     "Writer" : "Lilly Wachowski, Lana Wachowski",
     "Year" : "1999",
     "imdbID" : "tt0133093",
     "imdbRating" : "8.7",
     "imdbVotes" : "1,851,767"
   }
   Available Fields: {
     "Actors" : "Keanu Reeves, Laurence Fishburne, Carrie-Anne Moss",
     "Awards" : "Won 4 Oscars. 42 wins & 51 nominations total",
     "BoxOffice" : "$172,076,928",
     "Country" : "United States, Australia",
     "DVD" : "15 May 2007",
     "Director" : "Lana Wachowski, Lilly Wachowski",
     "Language" : "English",
     "Meta__Genres" : [ "Action", "Sci-Fi" ],
     "Meta__Released" : "31 Mar 1999",
     "Meta__Runtime" : "136 min",
     "Metascore" : "73",
     "Plot" : "When a beautiful stranger leads computer hacker Neo to a forbidding underworld, he discovers the shocking truth--the life he knows is the elaborate deception of an evil cyber-intelligence.",
     "Production" : "N/A",
     "Rated" : "R",
     "Ratings__Source" : "Metacritic",
     "Ratings__Value" : "73/100",
     "Response" : "True",
     "Title" : "The Matrix",
     "Type" : "movie",
     "Website" : "N/A",
     "Writer" : "Lilly Wachowski, Lana Wachowski",
     "Year" : "1999",
     "imdbID" : "tt0133093",
     "imdbRating" : "8.7",
     "imdbVotes" : "1,851,767"
   }
   ```
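The three records above come from a single input document: unnesting `Ratings` yields one record per array element, with the element's keys prefixed by the field name and delimiter. A small Python sketch of that expansion, under the assumption that every other field is simply copied into each record; the `unnest` helper is hypothetical and not Pinot's code.

```python
def unnest(record, field, delimiter="__"):
    """Return one row per element of record[field], prefixing element keys."""
    elements = record.get(field)
    if not isinstance(elements, list):
        # Nothing to unnest: return the record unchanged as a single row.
        return [dict(record)]
    base = {k: v for k, v in record.items() if k != field}
    rows = []
    for element in elements:
        row = dict(base)
        if isinstance(element, dict):
            for key, value in element.items():
                row[f"{field}{delimiter}{key}"] = value
        else:
            # Primitive element: keep it under the original field name.
            row[field] = element
        rows.append(row)
    return rows


movie = {
    "Title": "The Matrix",
    "Ratings": [
        {"Source": "Rotten Tomatoes", "Value": "88%"},
        {"Source": "Metacritic", "Value": "73/100"},
    ],
}
rows = unnest(movie, "Ratings")
for row in rows:
    print(row)
```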
   
   


