You’re probably right re BBEdit vs other special tools - even Rich suggested this in the beginning of our conversation.
The question is: Which other tool exactly? Excel would not import 3 million+ records 😔 and yes, I encountered the „helpful“ changes and some more too 😶 The Perl-CSV family of modules sounds like heaven 😉 - unfortunately I didn’t learn Perl yet 😶 > Am 26.03.2025 um 15:13 schrieb Bruce Van Allen <[email protected]>: > > Properly formatted .csv files may have CR, LF, and CRLF end of line > constructs in records. However, this would be where the best approach would > be to use tools designed for the .csv protocol, rather than BBEdit’s manifold > text editor tools. > > In Perl, we use the Text::CSV family of modules. You can specify the column > separator as the semi-colon in your files, and read in the records as simple > arrays, with null or blank values handled. There’s also the database route if > you wanted to go a bit further with your Perl - DBI plus DBD::CSV. > > Other modern languages have similar capabilities. > > You could probably also open that file in Excel, if you want the GUI path. > Sometimes I do that just to take a look at a file’s structure. But beware > that Excel will convert some data such as dates to its own format, and make > other “helpful” changes like dropping leading zeroes - a problem with postal > codes, e.g. - unless you first designate all cells in the sheet as “text”. > > HTH > > ------ Original Message ------ > From "Vlad Ghitulescu" <[email protected] <mailto:[email protected]>> > To [email protected] <mailto:[email protected]> > Date 3/26/2025 6:47:17 AM > Subject Re: How to filter / sort CSV-files by certain columns? > >> The „disaster" was most probably caused by errors in the CSV-file: There is >> a " in the line just after the tragedy happens and 3 CRLF in the middle of >> some rows 😣 >> >> >> >>> Am 26.03.2025 um 09:29 schrieb Vlad Ghitulescu <[email protected]>: >>> >>> The rearrange columns worked only on the first 25,816 lines >>> >>> -- >>> This is the BBEdit Talk public discussion group. If you have a feature >>> request or believe that the application isn't working correctly, please >>> email "[email protected]" rather than posting here. Follow @bbedit on >>> Mastodon: <https://mastodon.social/@bbedit> >>> --- >>> You received this message because you are subscribed to the Google Groups >>> "BBEdit Talk" group. >>> To unsubscribe from this group and stop receiving emails from it, send an >>> email to [email protected]. >>> To view this discussion visit >>> https://groups.google.com/d/msgid/bbedit/D78F9924-03B3-4117-9B3D-BC6614B5D1D5%40Ghitulescu.de. >>> <CleanShot 2025-03-26 at 09.21.19.png> >>> >>> The rest of the file was more or less mixed (see above). I’ll talk with the >>> support about it. >>> >>> >>> >>> >>> >>>> Am 26.03.2025 um 09:08 schrieb Vlad Ghitulescu <[email protected]>: >>>> >>>> Hello again, >>>> >>>> >>>> I changed all the semicolons to ",", changed then the grep ^ and $ into " >>>> and then voilá! the columns magically appeared: >>>> >>>> <CleanShot 2025-03-26 at 08.57.09.png> >>>> >>>> After this I could successfully rearrange the columns >>>> >>>> <CleanShot 2025-03-26 at 09.01.19.png> >>>> >>>> I then sorted all but the first line… and this totally destroyed the file >>>> >>>> <CleanShot 2025-03-26 at 09.03.44.png> >>>> >>>> (The copy of the file, that is - naturally! 😉) >>>> >>>> I’ll start analyzing the new problem… >>>> >>>> >>>> Regards, >>>> Vlad >>>> >>>> >>>> >>>> >>>> >>>> >>>>> Am 26.03.2025 um 08:15 schrieb Vlad Ghitulescu <[email protected]>: >>>>> >>>>> Hey Bruce, >>>>> >>>>> >>>>> Thanks for the idea! >>>>> >>>>> I changed the semicolon into tab >>>>> >>>>> <CleanShot 2025-03-26 at 08.11.14.png> >>>>> >>>>> but unfortunately that didn’t unlock the possibility to rearrange the >>>>> columns >>>>> >>>>> <CleanShot 2025-03-26 at 08.11.50.png> >>>>> >>>>> Any idea where I went wrong? >>>>> >>>>> Thanks again! >>>>> >>>>> >>>>> Regards, >>>>> Vlad >>>>> >>>>> >>>>> >>>>> >>>>>> Am 25.03.2025 um 23:49 schrieb Bruce Van Allen <[email protected]>: >>>>>> >>>>>> Hey Vlad, >>>>>> >>>>>> A BBEdit feature that might help is its ability to detect and manipulate >>>>>> *columns*. You’ll find it under the Edit menu. As far as I can tell, the >>>>>> columns it detects are tab or comma separated. A look at your sample >>>>>> showed no quote marks, but it did have a few commas; if you replace the >>>>>> semi-colon (‘;’) separators with tabs, then BBEdit will recognize your >>>>>> columns. How could that help? Well, one idea would be to move the >>>>>> column(s) you care about for search or sorting to the left-most >>>>>> position, greatly simplifying any pattern matching you want to try. >>>>>> >>>>>> Always work on a copy of your file, of course. >>>>>> >>>>>> HTH >>>>>> >>>>>> ------ Original Message ------ >>>>>> From "GP" <[email protected]> >>>>>> To "BBEdit Talk" <[email protected]> >>>>>> Date 3/25/2025 2:32:00 PM >>>>>> Subject Re: How to filter / sort CSV-files by certain columns? >>>>>> >>>>>>> As a follow up... >>>>>>> >>>>>>> BBEdit's Pattern Playground is a great help in constructing tedious >>>>>>> grep patterns like you'll need for your filtering and sorting needs. >>>>>>> The really tedious part is getting the field position(s) you want to >>>>>>> filter or sort on so you can modify that field's match pattern to >>>>>>> conform to the desired filter or sorting criteria. >>>>>>> >>>>>>> For example... For your " Filter all lines that have ADR_CHK_KZ = 1" >>>>>>> using Text -> Process Lines Containing ... with the grep pattern: >>>>>>> >>>>>>> >>>>>>> \d{3};\w{3};[^;]*;[^;]*;\d{10};\w{2};\d{2};\d{5};[^;]*;[^;]*;[^;]*;[^;]*;[^;]*;[^;]*;\w{2};\d{2};\d{5};[^;]*;\d{12};[^;]*;[^;]*;\d{8};[^;]*;\d{12};[^;]*;[^;]*;\d;\d;\d;\d;\d;\d;\d;(1);[^;]*;[^\n]* >>>>>>> >>>>>>> >>>>>>> >>>>>>> will do the trick. For filtering you don't need the group capturing on >>>>>>> the 1 but it is useful with Pattern Playground to verify you're getting >>>>>>> the right field position and field contents matched. >>>>>>> >>>>>>> >>>>>>> >>>>>>> For your "Sort the file by MSGNO, ADRC_COUNTRY, ADRC_REGION, >>>>>>> ADRC_POST_CODE1, ADRC_CITY1, ADRC_CITY2, ADRC_STREET and >>>>>>> ADRC_HOUSE_NUM1" using Text -> Sort Lines ... with a grep pattern of: >>>>>>> >>>>>>> >>>>>>> >>>>>>> \d{3};\w{3};[^;]*;[^;]*;\d{10};(\w{2});(\d{2});(\d{5});([^;]*);[^;]*;([^;]*);([^;]*);([^;]*);[^;]*;\w{2};\d{2};\d{5};[^;]*;\d{12};[^;]*;[^;]*;\d{8};[^;]*;\d{12};[^;]*;[^;]*;\d;\d;\d;\d;\d;\d;\d;\d;([^;]*);[^\n]* >>>>>>> >>>>>>> with "Specific sub-patterns" selected with \8\1\2\3\4\5\6\7 in the fill >>>>>>> in field will sort your example text using your desired field ordering. >>>>>>> >>>>>>> On Tuesday, March 25, 2025 at 12:53:47 PM UTC-7 GP wrote: >>>>>>>> For filtering, look at Text -> Process Lines Containing ... and for >>>>>>>> sorting Text -> Sort Lines ... using grep patterns to identify what >>>>>>>> you want to match for filtering and what subpattern field or fields >>>>>>>> you want to sort ordered on. >>>>>>>> >>>>>>>> If the number of fields in your sample is representative of the real >>>>>>>> CSV files you're working with, it is going to be something of a pain >>>>>>>> in the rear coming up with the grep patterns needed to accomplish the >>>>>>>> desired filtering and sorting. >>>>>>>> >>>>>>>> On Tuesday, March 25, 2025 at 11:03:35 AM UTC-7 Vlad Ghitulescu wrote: >>>>>>>>> Hey, >>>>>>>>> >>>>>>>>> >>>>>>>>> I use BBEdit very often while working with big CSV-files (300 - 500 >>>>>>>>> MB, up to 4 million rows) looking like this: >>>>>>>>> >>>>>>>>> MANDT;BU;IDENTIFIER;OBJNR;ADRC_ADDRNUMBER;ADRC_COUNTRY;ADRC_REGION;ADRC_POST_CODE1;ADRC_CITY1;ADRC_CITY_EXT;ADRC_CITY2;ADRC_STREET;ADRC_HOUSE_NUM1;ADRC_HOUSE_NUM2;LOKAREF_COUNTRY;LOKAREF_REGION;LOKAREF_POST_CODE1;LOKAREF_CITY1;LOKAREF_CITY_CODE;LOKAREF_CITY_EXT;LOKAREF_CITY2;LOKAREF_CITYP_CODE;LOKAREF_STREET;LOKAREF_STRT_CODE;LOKAREF_HOUSE_NUM1;LOKAREF_HOUSE_NUM2;COUNTRY_KZ;REGION_KZ;POST_CODE1_KZ;CITY1_KZ;CITY_EXT_KZ;CITY2_KZ;STREET_KZ;ADR_CHK_KZ;MSGNO;MESSAGE >>>>>>>>> 200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723592;DE;09;86415;Mering;;Sankt >>>>>>>>> Afra;Egerländer Straße;;;DE;09;86415;Mering;500000002795;, >>>>>>>>> Schwab;Sankt >>>>>>>>> Afra;00000006;Egerländerstraße;910011919800;;;0;0;0;0;1;0;1;1;; >>>>>>>>> 200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723657;DE;09;85655;Aying;;Kaps;Kaps;;;DE;09;85653;Aying;500000002262;;Kaps;00000010;Kaps;700055566100;;;0;0;1;0;3;0;0;1;; >>>>>>>>> 200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723658;DE;09;83083;Riedering;;Patting;Patting;;;DE;09;83083;Riedering;500000002552;b >>>>>>>>> Rosenheim, Oberbay;Patting;00000037;Pattinger >>>>>>>>> Straße;910003809300;;;0;0;0;0;1;0;1;1;; >>>>>>>>> 200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723674;DE;09;85655;Aying;;Großhelfendorf;Hirschbergstraße;;;DE;09;85653;Aying;500000002262;;Großhelfendorf;00000007;Hirschbergstraße;910002873200;;;0;0;1;0;3;0;0;1;; >>>>>>>>> 200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723878;DE;09;93336;Altmannstein;;Berghausen;Altmannsteiner >>>>>>>>> >>>>>>>>> Str.;;;DE;09;93336;Altmannstein;500000005266;;Berghausen;00000003;Altmannsteiner >>>>>>>>> Straße;910001339100;;;0;0;0;0;3;0;1;1;; >>>>>>>>> 200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723908;DE;09;93336;Altmannstein;;Berghausen;Altmannsteiner >>>>>>>>> >>>>>>>>> Str.;;;DE;09;93336;Altmannstein;500000005266;;Berghausen;00000003;Altmannsteiner >>>>>>>>> Straße;910001339100;;;0;0;0;0;3;0;1;1;; >>>>>>>>> 200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723918;DE;09;93336;Altmannstein;;Berghausen;Altmannsteiner >>>>>>>>> >>>>>>>>> Str.;;;DE;09;93336;Altmannstein;500000005266;;Berghausen;00000003;Altmannsteiner >>>>>>>>> Straße;910001339100;;;0;0;0;0;3;0;1;1;; >>>>>>>>> 200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007723956;DE;09;93336;Altmannstein;;Berghausen;Altmannsteiner >>>>>>>>> >>>>>>>>> Str.;;;DE;09;93336;Altmannstein;500000005266;;Berghausen;00000003;Altmannsteiner >>>>>>>>> Straße;910001339100;;;0;0;0;0;3;0;1;1;; >>>>>>>>> 200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007724554;DE;09;95131;Schwarzenbach >>>>>>>>> a.Wald;;Schwarzenbach a >>>>>>>>> Wald;Walter-Münch-Straße;;;DE;09;95131;Schwarzenbach >>>>>>>>> a.Wald;500000011836;;Schwarzenbach >>>>>>>>> a.Wald;00000001;Walter-Münch-Straße;910007835500;;;0;0;0;0;3;1;0;1;; >>>>>>>>> 200;BAG;20250324080508_/ETN/PM_EAV_ADR_CHK_ADRC_V14157F;;0007724593;DE;09;95131;Schwarzenbach >>>>>>>>> a.Wald;;Schwarzenbach a >>>>>>>>> Wald;Walter-Münch-Straße;;;DE;09;95131;Schwarzenbach >>>>>>>>> a.Wald;500000011836;;Schwarzenbach >>>>>>>>> a.Wald;00000001;Walter-Münch-Straße;910007835500;;;0;0;0;0;3;1;0;1;; >>>>>>>>> >>>>>>>>> Once in a while I’d like to filter or sort such huge files by one or >>>>>>>>> more columns, like: >>>>>>>>> >>>>>>>>> 1. Filter all lines that have ADR_CHK_KZ = 1 or >>>>>>>>> 2. Sort the file by MSGNO, ADRC_COUNTRY, ADRC_REGION, >>>>>>>>> ADRC_POST_CODE1, ADRC_CITY1, ADRC_CITY2, ADRC_STREET and >>>>>>>>> ADRC_HOUSE_NUM1. >>>>>>>>> >>>>>>>>> Is there a way to do this sort of tasks with BBEdit? >>>>>>>>> >>>>>>>>> Thanks! >>>>>>>>> >>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> Vlad >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>> >>>>>>> -- >>>>>>> This is the BBEdit Talk public discussion group. If you have a feature >>>>>>> request or believe that the application isn't working correctly, please >>>>>>> email "[email protected]" rather than posting here. Follow @bbedit >>>>>>> on Mastodon: <https://mastodon.social/@bbedit> >>>>>>> --- >>>>>>> You received this message because you are subscribed to the Google >>>>>>> Groups "BBEdit Talk" group. >>>>>>> To unsubscribe from this group and stop receiving emails from it, send >>>>>>> an email to [email protected]. >>>>>>> To view this discussion visit >>>>>>> https://groups.google.com/d/msgid/bbedit/50130484-14eb-4298-b762-800f88b2c66en%40googlegroups.com<https://groups.google.com/d/msgid/bbedit/50130484-14eb-4298-b762-800f88b2c66en%40googlegroups.com?utm_medium=email&utm_source=footer>. >>>>>> >>>>>> >>>>>> Thanks, >>>>>> >>>>>> - Bruce >>>>>> >>>>>> _bruce__van_allen__santa_cruz__ca_ >>>>>> >>>>>> >>>>>> -- >>>>>> This is the BBEdit Talk public discussion group. If you have a feature >>>>>> request or believe that the application isn't working correctly, please >>>>>> email "[email protected]" rather than posting here. Follow @bbedit >>>>>> on Mastodon: <https://mastodon.social/@bbedit> >>>>>> --- You received this message because you are subscribed to the Google >>>>>> Groups "BBEdit Talk" group. >>>>>> To unsubscribe from this group and stop receiving emails from it, send >>>>>> an email to [email protected]. >>>>>> To view this discussion visit >>>>>> https://groups.google.com/d/msgid/bbedit/em0f3661a9-5cf2-410e-bf67-3da2c28d5975%40c8f72f7e.com. >>>>> >>>> >>> >>> -- >>> This is the BBEdit Talk public discussion group. If you have a feature >>> request or believe that the application isn't working correctly, please >>> email "[email protected]" rather than posting here. Follow @bbedit on >>> Mastodon: <https://mastodon.social/@bbedit> >>> --- >>> You received this message because you are subscribed to the Google Groups >>> "BBEdit Talk" group. >>> To unsubscribe from this group and stop receiving emails from it, send an >>> email to [email protected]. >>> To view this discussion visit >>> https://groups.google.com/d/msgid/bbedit/D78F9924-03B3-4117-9B3D-BC6614B5D1D5%40Ghitulescu.de. >> > > > Thanks, > > - Bruce > > _bruce__van_allen__santa_cruz__ca_ > > > -- > This is the BBEdit Talk public discussion group. If you have a feature > request or believe that the application isn't working correctly, please email > "[email protected]" rather than posting here. Follow @bbedit on Mastodon: > <https://mastodon.social/@bbedit> > --- You received this message because you are subscribed to the Google Groups > "BBEdit Talk" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to [email protected]. > To view this discussion visit > https://groups.google.com/d/msgid/bbedit/ema35f173d-e03b-43d0-84b2-ae4ce1b51f9d%40181c06df.com. -- This is the BBEdit Talk public discussion group. If you have a feature request or believe that the application isn't working correctly, please email "[email protected]" rather than posting here. Follow @bbedit on Mastodon: <https://mastodon.social/@bbedit> --- You received this message because you are subscribed to the Google Groups "BBEdit Talk" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion visit https://groups.google.com/d/msgid/bbedit/DE1323B5-2C8F-4A38-8F80-943A2A5808D3%40Ghitulescu.de.
