Matches in a database.

Nicholas Leippe nick at byu.edu
Tue Jun 28 12:21:43 MDT 2005


On Tuesday 28 June 2005 11:20 am, C. Ed Felt wrote:
> Fellow Open Software Supporters:
>
> I would like to ask for some input on a database issue I am working on.
>
> I am getting repeat records in my billing database for a VoIP company.
> There is no fix for this as it is a SIP, (Session Initiation Protocol),
> specific problem so I am working on a post process solution.
>
> I am writing a script in Python and the database is MySQL.  The
> programming language doesn't matter since I am only looking for a
> suggestion on my sql (MySQL) query.  My current plan is to pull all
> records (called Call Data Records or CDRs) from the database for the
> last 24 hours in to a big array.  If any of the records match on a field
> called 'sessid' then they are copies (though not exact copies as there
> can be timestamp differences).
>
> So, in short, (since a paragraph will get confusing):
>
> 1.  SELECT all records in the CDR table after a requested date (usually
> 24 hours).
> 2.  Store these records in a huge array.
> 3.  Find all repeats on the 'sessid' field and store this in an array.
> 4.  Delete all repeats (save one copy of each repeat CDR).
>
> Is there a MySQL query, (version 3), to select all rows that have one or
> more matching rows on a specific field ('sessid')?  This would
> essentially combine steps 1, 2 and 3 in one MySQL query.

Yes.  There a few ways, some depending on your version of MySQL.
If you have 4.1 or newer with subselects you can do:

You'll need a primary key column in your table to identify rows uniquely for 
this method:


DELETE t.* 
FROM <table> t
LEFT JOIN (
	SELECT	sessid,
		MIN(<uid>) AS <uid>, # any expression pick the uid of the row to keep
		COUNT(*) AS cnt 
	FROM <table> 
	GROUP BY sessid 
	HAVING cnt > 1
	) dups USING(sessid) 
WHERE t.uid != dups.uid;

With MySQL < 4.1, you can simply replace the subquery with a temp table 
populated with the same--only one extra query.

If you don't have a primary key column in your table, it gets a bit more 
difficult--and might require an iterative approach, unless you have some 
other way of singling out rows, such as min(timestamp) != timestamp.




-- 
Respectfully,

Nicholas Leippe
Sales Team Automation, LLC
1335 West 1650 North, Suite C
Springville, UT  84663 +1 801.853.4090
http://www.salesteamautomation.com



More information about the PLUG mailing list