mneedham opened a new pull request, #8673: URL: https://github.com/apache/pinot/pull/8673
When I'm working with nested JSON documents, it's difficult to figure out exactly what the end result of a batch or streaming ingestion is going to be, especially when using `complexTypeConfig` or transformation functions. I find myself doing trial and error with different configurations to see if they work, which is quite a frustrating workflow!

So this PR introduces a command to pinot-admin that lets the user pass in a JSON file and a table config. It outputs the fields and their values that will be available for ingestion into Pinot once all transformations have been applied.

An example input document:

```json
{
  "Title": "The Matrix",
  "Year": "1999",
  "Rated": "R",
  "Meta": {
    "Released": "31 Mar 1999",
    "Runtime": "136 min",
    "Genres": [
      "Action",
      "Sci-Fi"
    ]
  },
  "Director": "Lana Wachowski, Lilly Wachowski",
  "Writer": "Lilly Wachowski, Lana Wachowski",
  "Actors": "Keanu Reeves, Laurence Fishburne, Carrie-Anne Moss",
  "Plot": "When a beautiful stranger leads computer hacker Neo to a forbidding underworld, he discovers the shocking truth--the life he knows is the elaborate deception of an evil cyber-intelligence.",
  "Language": "English",
  "Country": "United States, Australia",
  "Awards": "Won 4 Oscars. 42 wins & 51 nominations total",
  "Ratings": [
    {
      "Source": "Internet Movie Database",
      "Value": "8.7/10"
    },
    {
      "Source": "Rotten Tomatoes",
      "Value": "88%"
    },
    {
      "Source": "Metacritic",
      "Value": "73/100"
    }
  ],
  "Metascore": "73",
  "imdbRating": "8.7",
  "imdbVotes": "1,851,767",
  "imdbID": "tt0133093",
  "Type": "movie",
  "DVD": "15 May 2007",
  "BoxOffice": "$172,076,928",
  "Production": "N/A",
  "Website": "N/A",
  "Response": "True"
}
```

The table config:

```json
{
  "tableName": "movie_ratings_non_primitive",
  "tableType": "OFFLINE",
  "segmentsConfig": {
    "replication": 1,
    "schemaName": "movie_ratings_non_primitive"
  },
  "tenants": {
    "broker": "DefaultTenant",
    "server": "DefaultTenant"
  },
  "tableIndexConfig": {
    "loadMode": "MMAP"
  },
  "ingestionConfig": {
    "batchIngestionConfig": {
      "segmentIngestionType": "APPEND",
      "segmentIngestionFrequency": "DAILY"
    },
    "complexTypeConfig": {
      "collectionNotUnnestedToJson": "NON_PRIMITIVE",
      "delimiter": "__"
    },
    "transformConfigs": [
      {
        "columnName": "id",
        "transformFunction": "imdbID"
      },
      {
        "columnName": "Writers",
        "transformFunction": "split(Writer, ',')"
      }
    ]
  },
  "metadata": {}
}
```

And then we call it like this:

```
DataImportDryRun -jsonFile /path/to/movies.json -tableConfigFile /path/to/config/table_non_primitive.json
```

Output:

```
Available Fields:
{
  "Actors" : "Keanu Reeves, Laurence Fishburne, Carrie-Anne Moss",
  "Awards" : "Won 4 Oscars. 42 wins & 51 nominations total",
  "BoxOffice" : "$172,076,928",
  "Country" : "United States, Australia",
  "DVD" : "15 May 2007",
  "Director" : "Lana Wachowski, Lilly Wachowski",
  "Language" : "English",
  "Meta__Genres" : [ "Action", "Sci-Fi" ],
  "Meta__Released" : "31 Mar 1999",
  "Meta__Runtime" : "136 min",
  "Metascore" : "73",
  "Plot" : "When a beautiful stranger leads computer hacker Neo to a forbidding underworld, he discovers the shocking truth--the life he knows is the elaborate deception of an evil cyber-intelligence.",
  "Production" : "N/A",
  "Rated" : "R",
  "Ratings" : "[{\"Value\":\"8.7/10\",\"Source\":\"Internet Movie Database\"},{\"Value\":\"88%\",\"Source\":\"Rotten Tomatoes\"},{\"Value\":\"73/100\",\"Source\":\"Metacritic\"}]",
  "Response" : "True",
  "Title" : "The Matrix",
  "Type" : "movie",
  "Website" : "N/A",
  "Writer" : "Lilly Wachowski, Lana Wachowski",
  "Writers" : [ "Lilly Wachowski", " Lana Wachowski" ],
  "Year" : "1999",
  "id" : "tt0133093",
  "imdbID" : "tt0133093",
  "imdbRating" : "8.7",
  "imdbVotes" : "1,851,767"
}
```

It's super quick to try out a different config. For example, this one unnests the `Ratings` field and leaves out the transformation functions:

```json
{
  "tableName": "movie_ratings_non_primitive",
  "tableType": "OFFLINE",
  "segmentsConfig": {
    "replication": 1,
    "schemaName": "movie_ratings_non_primitive"
  },
  "tenants": {
    "broker": "DefaultTenant",
    "server": "DefaultTenant"
  },
  "tableIndexConfig": {
    "loadMode": "MMAP"
  },
  "ingestionConfig": {
    "batchIngestionConfig": {
      "segmentIngestionType": "APPEND",
      "segmentIngestionFrequency": "DAILY"
    },
    "complexTypeConfig": {
      "collectionNotUnnestedToJson": "NON_PRIMITIVE",
      "delimiter": "__",
      "fieldsToUnnest": [
        "Ratings"
      ]
    }
  },
  "metadata": {}
}
```

Output:

```
Available Fields:
{
  "Actors" : "Keanu Reeves, Laurence Fishburne, Carrie-Anne Moss",
  "Awards" : "Won 4 Oscars. 42 wins & 51 nominations total",
  "BoxOffice" : "$172,076,928",
  "Country" : "United States, Australia",
  "DVD" : "15 May 2007",
  "Director" : "Lana Wachowski, Lilly Wachowski",
  "Language" : "English",
  "Meta__Genres" : [ "Action", "Sci-Fi" ],
  "Meta__Released" : "31 Mar 1999",
  "Meta__Runtime" : "136 min",
  "Metascore" : "73",
  "Plot" : "When a beautiful stranger leads computer hacker Neo to a forbidding underworld, he discovers the shocking truth--the life he knows is the elaborate deception of an evil cyber-intelligence.",
  "Production" : "N/A",
  "Rated" : "R",
  "Ratings__Source" : "Internet Movie Database",
  "Ratings__Value" : "8.7/10",
  "Response" : "True",
  "Title" : "The Matrix",
  "Type" : "movie",
  "Website" : "N/A",
  "Writer" : "Lilly Wachowski, Lana Wachowski",
  "Year" : "1999",
  "imdbID" : "tt0133093",
  "imdbRating" : "8.7",
  "imdbVotes" : "1,851,767"
}
Available Fields:
{
  "Actors" : "Keanu Reeves, Laurence Fishburne, Carrie-Anne Moss",
  "Awards" : "Won 4 Oscars. 42 wins & 51 nominations total",
  "BoxOffice" : "$172,076,928",
  "Country" : "United States, Australia",
  "DVD" : "15 May 2007",
  "Director" : "Lana Wachowski, Lilly Wachowski",
  "Language" : "English",
  "Meta__Genres" : [ "Action", "Sci-Fi" ],
  "Meta__Released" : "31 Mar 1999",
  "Meta__Runtime" : "136 min",
  "Metascore" : "73",
  "Plot" : "When a beautiful stranger leads computer hacker Neo to a forbidding underworld, he discovers the shocking truth--the life he knows is the elaborate deception of an evil cyber-intelligence.",
  "Production" : "N/A",
  "Rated" : "R",
  "Ratings__Source" : "Rotten Tomatoes",
  "Ratings__Value" : "88%",
  "Response" : "True",
  "Title" : "The Matrix",
  "Type" : "movie",
  "Website" : "N/A",
  "Writer" : "Lilly Wachowski, Lana Wachowski",
  "Year" : "1999",
  "imdbID" : "tt0133093",
  "imdbRating" : "8.7",
  "imdbVotes" : "1,851,767"
}
Available Fields:
{
  "Actors" : "Keanu Reeves, Laurence Fishburne, Carrie-Anne Moss",
  "Awards" : "Won 4 Oscars. 42 wins & 51 nominations total",
  "BoxOffice" : "$172,076,928",
  "Country" : "United States, Australia",
  "DVD" : "15 May 2007",
  "Director" : "Lana Wachowski, Lilly Wachowski",
  "Language" : "English",
  "Meta__Genres" : [ "Action", "Sci-Fi" ],
  "Meta__Released" : "31 Mar 1999",
  "Meta__Runtime" : "136 min",
  "Metascore" : "73",
  "Plot" : "When a beautiful stranger leads computer hacker Neo to a forbidding underworld, he discovers the shocking truth--the life he knows is the elaborate deception of an evil cyber-intelligence.",
  "Production" : "N/A",
  "Rated" : "R",
  "Ratings__Source" : "Metacritic",
  "Ratings__Value" : "73/100",
  "Response" : "True",
  "Title" : "The Matrix",
  "Type" : "movie",
  "Website" : "N/A",
  "Writer" : "Lilly Wachowski, Lana Wachowski",
  "Year" : "1999",
  "imdbID" : "tt0133093",
  "imdbRating" : "8.7",
  "imdbVotes" : "1,851,767"
}
```
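For readers who want to reason about the dry-run output without running Pinot, here is a rough Python sketch of the flattening and unnesting behaviour visible in the two outputs in this description. It is an approximation written for illustration, not Pinot's implementation: the function names (`flatten`, `unnest`, `collections_to_json`, `dry_run`) are hypothetical, it only handles top-level fields in `fields_to_unnest`, it does not evaluate transform functions, and key ordering inside the serialized JSON string may differ from what Pinot prints.

```python
import json


def flatten(record, delimiter="__", prefix=""):
    """Flatten nested dicts into top-level keys joined by `delimiter`."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{delimiter}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, delimiter, name))
        else:
            flat[name] = value
    return flat


def unnest(record, field, delimiter="__"):
    """Return one record per element of the list stored under `field`."""
    elements = record.get(field)
    if not isinstance(elements, list):
        return [record]
    rest = {k: v for k, v in record.items() if k != field}
    rows = []
    for element in elements:
        row = dict(rest)
        if isinstance(element, dict):
            # Element keys become `<field><delimiter><key>` columns.
            row.update(flatten(element, delimiter, field))
        else:
            row[field] = element
        rows.append(row)
    return rows


def collections_to_json(record):
    """Mimic NON_PRIMITIVE mode: lists holding dicts/lists become JSON strings."""
    out = {}
    for key, value in record.items():
        if isinstance(value, list) and any(isinstance(v, (dict, list)) for v in value):
            out[key] = json.dumps(value, separators=(",", ":"))
        else:
            out[key] = value
    return out


def dry_run(document, fields_to_unnest=(), delimiter="__"):
    """Approximate the fields available after complex-type handling."""
    rows = [flatten(document, delimiter)]
    for field in fields_to_unnest:
        rows = [r for row in rows for r in unnest(row, field, delimiter)]
    return [collections_to_json(row) for row in rows]


# A trimmed-down version of the movie document from this description.
movie = {
    "Title": "The Matrix",
    "Meta": {"Runtime": "136 min", "Genres": ["Action", "Sci-Fi"]},
    "Ratings": [
        {"Source": "Internet Movie Database", "Value": "8.7/10"},
        {"Source": "Rotten Tomatoes", "Value": "88%"},
    ],
}

for row in dry_run(movie, fields_to_unnest=["Ratings"]):
    print(json.dumps(row, indent=2, sort_keys=True))
```

Calling `dry_run(movie)` with no fields to unnest yields a single record in which `Ratings` is a JSON string, while passing `fields_to_unnest=["Ratings"]` yields one record per rating with `Ratings__Source` and `Ratings__Value` columns, matching the shape of the two outputs in this description.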