Regex Help?

ijason jason at infogenix.com
Mon Oct 10 09:04:38 MDT 2011


On 10/8/2011 3:58 PM, Charles Curley wrote:
> On Sat, 8 Oct 2011 13:19:43 -0600
> "S. Dale Morrey"<sdalemorrey at gmail.com>  wrote:
>
>> Ok I'll admit it, I suck at regex.
>> Unfortunately, I now have the task of importing a customers catalog
>> price list into a database and I'm not sure where to begin.
>> I really think a simple regex could convert the whole thing to a CSV
>> and I could then import the CSV directly.
>>
>> Here is an example of what I'm looking at.
>> Item Item Description Page # Retail Member Wholesale Pkg
>> 12137 HOLIDAY PENGUIN FIGURINES 335 9.95 4.25 5.95 1 PR
>>
>> As you can see all the fields are separated by whitespace, but then
>> again so are the individual words in the description.
> I wonder what kinds of white space? For example, are the fields
> delimited by tabs, so that the item description field can contain
> spaces. If so, a simple split on \t would do you nicely.
>
> Perl has an excellent CSV library, which would make exporting to CSV
> very easy.
>
If you have a properly formatted csv file coming in cpan has some great 
tools but i've yet to see one from a client. I do this kind of thing 
often and i usually end up doing this. I don't know your db defs so i 
left that part out. This is perl btw so if you were to add a dbi 
connection above and alter the names you could run this on any linux 
powered terminal. Assuming you have the dbi library from cpan.
|open FILE, ">filename.txt" or print $!;|
while(<FILE>){
     my @line = split(/\W/,$_);
     $db->do("insert into tablename (column names) values 
('$line[0]','$line[1]'...)");
}|
close FILE;

|If \W is too aggressive \s is a good alternative but if they use tabs 
and you know they are using it then \t|. from what you sent it does look 
like they are using tabs to separate columns based on the description 
having more than one space in it. So you will probably end up splitting 
on \t. you may need to concatenate some values on the insert.
|


More information about the PLUG mailing list