- #Powershell search multiple excel files how to
- #Powershell search multiple excel files drivers
- #Powershell search multiple excel files update
- #Powershell search multiple excel files driver
#Powershell search multiple excel files driver
# The Text OleDB driver is only available in PowerShell x86. This can possibly be addressed by typing via scheme.ini, but this is fast enough for me. Text fields were super fast (6 secs), whereas number fields were much slower (14 secs). Executing just the dupecheck query completed in 5.56 seconds.Īlso, the type of data it's comparing seems to matter. Because the duplicates are added to a datatable, the more dupes you have, the longer it will take to fill. The first is the number of dupes returned. Note that a couple things will impact the speed of this script.
No thanks! We'll just use the 32-bit version of PowerShell.
#Powershell search multiple excel files drivers
64-bit drivers are available but you have to download them separately, and they weren't even faster. Ultimately, the 32-bit versions of OdbcConnection and OleDbConnection Text drivers did the trick. Here's a screenshot of the results after parsing a one million row CSV file:
#Powershell search multiple excel files how to
I ended up figuring out how to stream directly to $bulkcopy.WriteToServer() but was hugely disappointed when it actually decreased performance (go StreamReader!) But then I realized that I had actually come up with a way to process the CSV using and the results are fantastic. I read PowerShell Deep Dives which actually gave me some additional ideas on how to stream text, so I returned to Import-CSVtoSQL.ps1 to see if I could increase the performance even more using some streaming (think $bulkCopy.WriteToServer($sqlcmd.ExecuteReader())). Each of these either weren't fast enough or just didn't work. At the end of the day, I ended up playing with bulkcopy, StreamReader, StringCollection, Dictionary, XmlTextWriter, XDocument, HashSet, Datatable, DataView, $dt.DefaultView.ToTable, notcontains, Select-Unique, Get-Content -ReadCount 0, and other methods. I thought it may be possible to perform a SELECT on this CSV data using XML or a datatable. So how do I accomplish this set-based query natively, without SQL Server? Basically, set-based queries make SQL Server do work only once, whereas row-by-row based queries (such as CURSORS and UDFs) make SQL Server do work for every row. So I wondered if it was possible to search a CSV using a set-based method rather than RBAR (row by agonizing row).įinding duplicates in SQL Server using GROUP BY and HAVING is super fast because the query is set-based. I soon realized, however, that because the technique emptied the dataset, I wouldn't be able to find duplicates within the entire CSV. With this script, I'm able to import more than 1.7 million rows per minute from CSV to SQL. This got me thinking that perhaps I could employ a technique similar to the one that I used in Import-CSVtoSQL.ps1. I actually tried this using LINQ in PowerShell, but it seemed exponetially slower as the dataset grew.Ī user on reddit's PowerShell subreddit asked for the fastest way to search for duplicates in a CSV. My million row CSV with 9k dupes processed in 6.4 seconds.
#Powershell search multiple excel files update
Update : This native PowerShell script can process over 165,000 rows a second.